A glossary for Industrial DataOps
IT and operational technology (OT) often approach problems, projects, and processes from different perspectives. While the industrial automation community has been writing and discussing the necessity of IT-OT convergence for nearly a decade, this functional collaboration remains a stumbling block for many industrial companies on their Industry 4.0 journeys.
The good news is that the emerging concept of Industrial DataOps can provide some common ground.
DataOps is a new approach to data integration and security that aims to improve data quality and reduce time spent preparing data for use throughout the enterprise. Industrial DataOps provides a toolset—and a mindset—for OT to establish “data contracts” with IT. By using an Industrial DataOps solution, OT is empowered to model, transform, and share plant floor data with IT systems without the integration and security concerns that have long vexed the collaboration.
If we see the value in IT-OT collaboration, the first step is getting these functions to speak the same language. This post documents key terms surrounding Industrial DataOps and provides IT and OT with a common dictionary. Some of these definitions are more technical in nature; others are more business-oriented. Let’s dive in.
Terms and definitions
Aggregation. A consolidated view, within a single property set, of all attribute and variable data from source PLCs, machine controllers, RTUs, smart sensors, and other systems.
Attribute. A data characteristic from a source PLC, machine controller, RTU, smart sensor, or other system with a static value (like location) or a dynamic value (like temperature or machine state).
Connection. A connection in an Industrial DataOps solution represents a path to a source system that contains inputs and outputs. An input represents a path to a data point contained in a connection that can be read. An output represents a path to a data point contained in a connection that can be written to.
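As a rough sketch, a connection with one readable input and one writable output might be described as follows. The field names, protocol, and addresses here are hypothetical illustrations, not the schema of any particular product.

```python
# Hypothetical connection definition: a path to a source system
# with one readable input and one writable output.
connection = {
    "name": "line1_plc",
    "protocol": "opc-ua",                    # assumed source protocol
    "endpoint": "opc.tcp://10.0.0.12:4840",  # example address
    "inputs": {
        # readable data points within the connection
        "zone1_temp": "ns=2;s=Line1/Zone1/Temperature",
    },
    "outputs": {
        # writable data points within the connection
        "zone1_setpoint": "ns=2;s=Line1/Zone1/Setpoint",
    },
}
```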
Contextualization. Data structures on PLCs and machine controllers have minimal descriptive information—if any. In many cases, data points are referenced with cryptic data point naming schemes or references to memory locations. Contextualization transforms raw data into information by presenting human-readable property names and adding static metadata to the data set. Contextualization enables industrial data to be used more easily outside of the controls environment for machine maintenance, process optimization, quality, and traceability.
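To make this concrete, here is a minimal Python sketch of what contextualization does to a raw data set. The tag names and metadata are made up for illustration.

```python
# Raw values keyed by cryptic PLC addresses.
raw = {"N7_0": 72.4, "B3_1": 1}

# Mapping from source tags to human-readable property names.
tag_map = {"N7_0": "zone1_temperature", "B3_1": "machine_running"}

# Static metadata added to the data set.
static_metadata = {"site": "Plant 2", "line": "Line 1", "temp_units": "degF"}

contextualized = {tag_map[tag]: value for tag, value in raw.items()}
contextualized.update(static_metadata)
# -> {'zone1_temperature': 72.4, 'machine_running': 1,
#     'site': 'Plant 2', 'line': 'Line 1', 'temp_units': 'degF'}
```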
Correlation. The purpose of automation data has historically been to control and monitor the production process, so industrial data arrives correlated for process control. When that data must instead be analyzed and aligned by machine for predictive maintenance, by process for process optimization, or by product for quality and traceability, it must be assembled and contextualized appropriately for each use case before it can be used. Correlation prepares information for its end purpose by assembling, contextualizing, and transposing the data into a usable state.
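As a simple illustration (the readings and keys are invented), the same data points can be correlated by machine for maintenance use cases or by product for traceability:

```python
from collections import defaultdict

readings = [
    {"machine": "press_1", "product": "SKU-42", "temp": 71.9},
    {"machine": "press_2", "product": "SKU-42", "temp": 73.1},
    {"machine": "press_1", "product": "SKU-77", "temp": 70.4},
]

by_machine = defaultdict(list)  # aligned by machine for predictive maintenance
by_product = defaultdict(list)  # aligned by product for quality and traceability
for r in readings:
    by_machine[r["machine"]].append(r["temp"])
    by_product[r["product"]].append(r["temp"])
```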
Data Model. The data model forms the basis for standardizing data across a wide range of raw input data. A data model comprises a collection of attributes that are common to a logical item. When working with industrial data, a data model is typically a standard representation of an asset, process, product, system, or role.
From one machine to the next, each industrial device may have its own data model. Historically, vendors, systems integrators, and in-house controls engineers have not focused on creating data standards. They refined the systems and changed the data models over time to suit their needs. This worked for one-off projects, but today’s IIoT projects require more scalability.
To handle the scale of hundreds of machines and controllers—and tens of thousands of data points—a set of standard models can be established within an Industrial DataOps solution. The models correlate the data by machinery, process, and product and present it to the consuming applications. This systematic approach of building data models greatly accelerates the usage of this information and simplifies the management of the integrations.
At the core of the model is the real-time data coming off the machinery and automation equipment. This data must often be augmented with data from many sources, including nearby equipment or controllers, smart devices or sensors, derivations or transformations computed from existing data points, manually entered metadata, and data from other databases or systems. Once the standard models are created in the Industrial DataOps solution, they can be instantiated for each logical asset, process, and/or product.
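One way to picture a standard model is as a typed collection of common attributes. The sketch below uses a Python dataclass with hypothetical attributes for a CNC machine; a real model would be defined within the Industrial DataOps solution itself.

```python
from dataclasses import dataclass

@dataclass
class CncMachineModel:
    """A hypothetical standard model shared by every CNC machine,
    regardless of vendor or controller."""
    machine_id: str           # static metadata
    location: str             # static metadata
    spindle_speed_rpm: float  # live value from the controller
    state: str                # e.g. "RUNNING", "IDLE", "FAULTED"
```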
Data Payload. The data payload is the collection of data that is assembled through the model instance and sent out in the flow to the target connection. The payload must be formulated in a way the target system can consume and must include enough information to be understandable, identifiable, and useful.
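For example, a payload assembled from a model instance might serialize to JSON as in the sketch below. The field names are illustrative, not a fixed schema.

```python
import json
from datetime import datetime, timezone

payload = {
    "model": "CncMachine",
    "instance": "cnc_07",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "values": {"spindle_speed_rpm": 8200.0, "state": "RUNNING"},
}
# Formulated so the target system can identify and consume it.
print(json.dumps(payload))
```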
Edge. Computing and data handling may be done at the edge, in on-premises servers, or in the cloud.
The edge refers to the end of the TCP/IP network where it connects to industrial automation equipment. The edge also includes the computers installed close to this network boundary that influence the data flowing through it. As computers have become less expensive, easier to manage, and more robust, system architectures with minimally configured computers installed at the edge to process, route, and analyze data have become more common. Edge-located computers reduce latency and improve response times, making them ideal self-contained cells to support an Industrial DataOps solution. These computers range from PLCs and network switches with open Linux or Windows cores, to single-board computers like the Raspberry Pi, to PC-sized industrialized computers.
ETL. ETL (Extract, Transform, Load) solutions integrate business systems with analytics systems. They are designed to extract data in a batch process from systems and databases like CRM and ERP, combine this data in an intermediate data store, and then provide tools to manually and automatically transform the data by cleaning, aligning, and normalizing it. The data is then loaded into a final data store to be utilized by analytics, trending, and search tools.
Traditional ETL falls short for industrial data for a number of reasons. First, the data must be processed in real time, not in batches. Second, the data models must be standardized: each machine has its own data definitions, which generates a far higher volume of data models than ETL solutions for business systems typically process. Finally, contextualization is critical and more extensive due to the source devices and protocols.
ETL solutions for business systems are typically used by IT or data science teams. For industrial use cases, however, the data cleanup must be defined and maintained by the OT team, which is familiar with the source systems, data nuances, and changes to the automation equipment.
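The contrast can be sketched in a few lines of Python. This is purely conceptual: a batch pipeline processes a full extract at once, while industrial data must be handled event by event as it arrives.

```python
def batch_etl(extract, transform, load):
    rows = extract()                    # pull a full batch from CRM/ERP
    load([transform(r) for r in rows])  # clean and align, then load at once

def stream_handler(event, transform, publish):
    publish(transform(event))           # each reading processed on arrival

# Minimal demo with in-memory stand-ins:
batch_etl(lambda: [1, 2, 3], lambda r: r * 2, print)  # prints [2, 4, 6]
stream_handler(7, lambda r: r * 2, print)             # prints 14
```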
Flow. A flow defines the mapping and execution of data coming in and data going out of an Industrial DataOps solution. A flow’s source can be a simple primitive input or a modeled instance, while its target is an output. Data flows may be controlled model-by-model by identifying the model to be moved, the target system, and the frequency or trigger for the movement.
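A flow can be pictured as a small piece of configuration that names the source, the target, and the trigger. The fields below are hypothetical illustrations.

```python
flow = {
    "source": "CncMachine/cnc_07",       # a modeled instance (or a raw input)
    "target": "plant_historian/cnc_07",  # an output on a target connection
    "trigger": {"type": "interval", "seconds": 5},  # or an event-driven trigger
}
```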
Instance. Models are leveraged through the creation of model instances, each of which is unique to a specific asset, process, product, system, or role. Whereas a model specifies the standard attributes of a type of asset, process, product, system, or role, a model instance represents one of these items with mappings to actual live data. For example, a model may be created to represent how a Quality Manager’s view of a manufacturing line will be standardized. If there are 10 manufacturing lines, 10 model instances would be created and populated with data to represent each one.
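That example can be sketched in Python. The create_instance helper and tag paths below are hypothetical.

```python
def create_instance(model: str, mappings: dict) -> dict:
    # Hypothetical helper: binds a model's standard attribute names
    # to instance-specific source tags.
    return {"model": model, "mappings": mappings}

# One model, ten manufacturing lines, ten instances.
instances = {
    f"line_{n:02d}": create_instance(
        model="QualityManagerLineView",
        mappings={
            "oee": f"line_{n:02d}/oee",
            "reject_count": f"line_{n:02d}/rejects",
        },
    )
    for n in range(1, 11)
}
```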
Metadata. Metadata is data about data. For instance, metadata on a pressure gauge could include the unit of measure for the pressure value; metadata on a machine could include its location on the factory floor or its make and model.
Normalization. Normalization converts property values to common units of measure, like converting a temperature value from Fahrenheit to Celsius or converting a temperature sensor’s raw, unitless measurement range to degrees Celsius.
Normalization can also be applied to data flow. For example, analytics systems typically expect to receive data at a consistent or normalized frequency. An Industrial DataOps solution can align data after it has been collected and normalize its flow to consuming applications.
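The unit conversions mentioned above are simple arithmetic. A short Python sketch, with example values:

```python
def f_to_c(temp_f: float) -> float:
    # Convert a Fahrenheit temperature to Celsius.
    return (temp_f - 32.0) * 5.0 / 9.0

def scale_raw(raw, raw_min, raw_max, eng_min, eng_max):
    # Map a sensor's raw, unitless range onto engineering units,
    # e.g. 0-4095 counts onto 0-150 degrees Celsius.
    return eng_min + (raw - raw_min) * (eng_max - eng_min) / (raw_max - raw_min)

print(f_to_c(72.4))                      # ~22.4
print(scale_raw(2048, 0, 4095, 0, 150))  # ~75.0
```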
Standardization. The automation in a factory evolves over time, with machinery and equipment sourced from many different hardware vendors. This variety of equipment results in a wide range of available data: some data points may simply have different names, while others may have different units of measure or different measurements entirely. Standardization enables the user to homogenize the property set by asset, process, product, or target system, which allows the data to be rapidly adopted in analytics, visualization, and other systems.
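In practice, standardization often boils down to per-source mapping tables that converge on one standard property set. The vendor tag names below are invented for illustration.

```python
VENDOR_MAPS = {
    "vendor_a": {"Temp_PV": "temperature", "Run_Bit": "running"},
    "vendor_b": {"AI_TEMP01": "temperature", "MACH_STATE": "running"},
}

def standardize(vendor: str, data: dict) -> dict:
    mapping = VENDOR_MAPS[vendor]
    return {mapping[k]: v for k, v in data.items() if k in mapping}

# Two machines with different tag names yield the same property set.
print(standardize("vendor_a", {"Temp_PV": 71.9, "Run_Bit": 1}))
print(standardize("vendor_b", {"AI_TEMP01": 72.3, "MACH_STATE": 1}))
```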
Transformation. Industrial computing devices like PLCs, machine controllers, smart sensors, and embedded devices typically represent values and state with shorthand abbreviations or numbers. While this format is ideal for storage and coding, it is not usable by anyone who is not intimately familiar with the programming of the device. From one machine to the next, unique transformations must be defined and performed, for example, mapping the state value 1 to RUNNING.
Transformations may also include statistical calculations of raw data like the average, min, and max temperature values checked every second but recorded every hour. Transformations may also be used to derive an attribute value when a device does not have a unique data tag for it.
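Both kinds of transformation are easy to sketch in Python. The state codes and sample values below are illustrative; real mappings vary from machine to machine.

```python
STATE_CODES = {0: "STOPPED", 1: "RUNNING", 2: "FAULTED"}

def decode_state(code: int) -> str:
    # Map a controller's numeric state code to a readable value.
    return STATE_CODES.get(code, "UNKNOWN")

def hourly_stats(samples: list) -> dict:
    # Reduce per-second readings to one hourly statistical record.
    return {"avg": sum(samples) / len(samples),
            "min": min(samples),
            "max": max(samples)}

print(decode_state(1))                   # "RUNNING"
print(hourly_stats([70.1, 71.4, 69.8]))  # avg ~70.4, min 69.8, max 71.4
```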
Unified Namespace. A consolidated, abstracted structure by which all business applications are able to consume real-time industrial data in a consistent manner. The benefits of a Unified Namespace include reduced time to implement new integrations, reduced efforts to maintain data integrations, improved agility of integrations, access to new data, and improved data quality and security.
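Implementations vary, but a Unified Namespace is often organized as a hierarchical topic structure, for example along site/area/line/asset lines. A minimal sketch, with an assumed hierarchy:

```python
def uns_topic(site: str, area: str, line: str, asset: str, model: str) -> str:
    # Build a consistent, hierarchical path that every application
    # can use to find the same data.
    return f"{site}/{area}/{line}/{asset}/{model}"

print(uns_topic("plant2", "packaging", "line1", "cnc_07", "CncMachine"))
# -> "plant2/packaging/line1/cnc_07/CncMachine"
```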
Next steps
This application abstraction approach enables OT to take responsibility for plant floor equipment and production, make changes, add new applications, and react to changes in business relationships with outside vendors, all without disrupting access to business insights.
If you’re interested in learning more about Industrial DataOps, please check out a white paper written by my colleague John Harrington titled “DataOps: The Missing Link in Your Industrial Data Architecture.”
Finally, I hope this glossary provides some clarity on terminology and offers a common vocabulary for IT and OT to continue to build upon. If you think an Industrial DataOps solution could be a good fit for your organization, please request a demo to learn more about HighByte Intelligence Hub.