CIOReview | 19 JUNE 2022
CXO INSIGHTS

IOT ANALYTICS ENABLEMENT WITH DATA NAMING STANDARDIZATION

By Darin McCoy, Analytics Manager, Caterpillar Inc. and Andrei Khurshudov, Director of IoT Analytics, CAT Digital

Caterpillar Inc. is synonymous with heavy machinery. It is the world's leading manufacturer of construction and mining equipment, off-highway diesel and natural gas engines, industrial gas turbines and diesel-electric locomotives. Digital technology, including big data analytics, is also a part of the company's journey, adding value to its iron with robust, scalable and high-quality data.

At Caterpillar, advanced IoT analytics are enabled at scale by way of onboard computers, sensors and cameras on more than 1.2 million connected assets worldwide. The data generated on customer assets on job sites across the globe includes time-series data, machine health alerts, fuel usage, GPS and operator-specific usage. This example of "Big Data" is high in volume and velocity and enables the development of new analytics capabilities that can increase safety and the customer value of assets.

However, as the amount of data grows, data quality issues can grow along with it. For example, there can be missing batches or messages, missing data channels, interruptions in cell/satellite coverage, imperfect ETL (extract, transform, load) and other factors that can degrade the efficacy of IoT analytics. Imagine a modern large mining truck with more than 100 IoT sensors. The amount of data collected by the sensors can result in data quality problems that lead to ill-performing analytics applications.

Surprisingly, a 'simple' issue like irregularities in data channel naming can become a major problem for the plurality of IoT analytics applications. Data scientists and data engineers can find themselves investing more time in quality control and various model adjustments than in analytics itself.
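To make the naming problem concrete, here is a minimal Python sketch of how raw channel names might be mapped onto one canonical vocabulary before data reaches an analytics application. The alias table, the function names and the sample record are all hypothetical and chosen for illustration only; nothing here describes Caterpillar's actual pipeline.

```python
import re

# Hypothetical alias table: raw channel names (as different design teams
# might spell them) mapped to one canonical name. Illustrative only.
CHANNEL_ALIASES = {
    "pressure": "pressure",
    "prs": "pressure",
    "press": "pressure",
    "oil press": "pressure",
    "temperature": "temperature",
    "temp": "temperature",
    "tmp": "temperature",
}


def normalize_channel(raw_name):
    """Return the canonical channel name, or None if the name is unknown."""
    # Lowercase, and collapse underscores, hyphens and repeated spaces so
    # that "Oil_Press" and "oil  press" are treated the same.
    key = re.sub(r"[\s_\-]+", " ", raw_name.strip().lower())
    return CHANNEL_ALIASES.get(key)


def standardize_record(record):
    """Rename the keys of one sensor record; set unknown channels aside."""
    known, unknown = {}, {}
    for raw_name, value in record.items():
        canonical = normalize_channel(raw_name)
        if canonical is None:
            # Unknown names are kept for review rather than silently dropped.
            unknown[raw_name] = value
        else:
            # Note: if two raw names map to the same canonical name,
            # the later value overwrites the earlier one in this sketch.
            known[canonical] = value
    return known, unknown


if __name__ == "__main__":
    sample = {"prs": 310.5, "Temp": 88.0, "xyz": 1}
    print(standardize_record(sample))
```

Returning unknown channels separately, instead of discarding them, is a deliberate choice in this sketch: it lets a data engineer see which names still need an entry in the alias table.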
Even the smallest discrepancies can reduce the scalability of deployed analytics solutions, raise the barrier to entry for building new innovative applications on this data and decrease communication effectiveness between engineering and analytics.

For instance, a channel delivering pressure data could be named 'pressure,' 'prs,' 'press,' 'oil press' or something else. Similarly, another channel that carries temperature sensor data can be named 'temperature,' 'temp,' 'tmp' or other related forms. While these channels are generating similar data, they are named differently, likely because they originate from different design teams, types of equipment and product families. And, since these product families often rely on the same analytics algorithms, these differences can disrupt the flow of data to the analytics applications.

Imagine an analytics model designed to monitor conditions inside a particular