An intro to industrial data modeling
Tony Paine is the CEO of HighByte, focused on the company’s vision and ability to execute to plan. For 20 years, Tony immersed himself in industrial software development and strategy at Kepware. As Kepware's CEO, he led the company through a successful acquisition by PTC in 2016 prior to founding HighByte in 2018.
A data model forms the basis for standardizing a wide range of raw input data. An Industrial DataOps solution like HighByte Intelligence Hub enables users to develop models that standardize and contextualize industrial data. In short, HighByte Intelligence Hub is a data hub with a modeling and transformation engine at its core.
But what exactly is a data model, and why is data modeling important for Industry 4.0? This post aims to address these questions and provide an introduction to modeling data at scale.
What is a data model?
A data model is a definition that describes a rich piece of information. The information can have many different attributes, some containing real-time, raw operational data and others that actually define that data. The latter provides context, which could be a source description, unit of measure, min and max ranges, and other types of information that—when pulled together—define a piece of information that corresponds to a real “thing” like an asset, process, system, or role.
While data models are well-known entities in the industrial automation industry, they go by many names. Depending on your level of experience or function, you may be accustomed to different naming conventions. For instance, if you're a controls engineer modeling complex data sets within a PLC, you might refer to the data model as a “user defined tag”. Others associate data models with the library of built-in complex tags that are provided by default based on the PLC vendor’s implementation. Still others may think of data models as simply the structure of how any data is laid out in terms of names, data points, data types, and whether or not these properties are required.
The industry has also come together to create standardized data models or data sets, which are sometimes specific to a particular vertical industry. These models define nomenclature, how data should be represented, and the model’s structure (how it is laid out). I am referring to standardized models like ISA-95 and companion specs to OPC UA.
However, there are still many data models in use that are vendor specific (sometimes even device specific) and therefore not standardized across the industry. IT systems and Cloud applications also have their expectations on how data should be modeled, received, and stored. Between OT and IT devices, systems, and applications, there is a lot of data model diversity and little standardization in real-world practice.
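To make that diversity concrete, here is a minimal sketch of two vendor-specific payloads for the same kind of asset being mapped onto one common model. The payload shapes, field names, and mapping functions are all hypothetical and chosen only for illustration; they do not come from any particular vendor or from HighByte Intelligence Hub.

```python
from typing import Any, Dict

# Hypothetical payloads: two vendors describing the same pump differently.
vendor_a_payload: Dict[str, Any] = {"PMP_SPD": 1450, "PMP_TMP_F": 172.4}
vendor_b_payload: Dict[str, Any] = {
    "speed": {"value": 1450, "unit": "rpm"},
    "temperature": {"value": 78.0, "unit": "degC"},
}


def from_vendor_a(raw: Dict[str, Any]) -> Dict[str, float]:
    """Map vendor A's flat tags (temperature in Fahrenheit) to the common model."""
    return {
        "speed_rpm": float(raw["PMP_SPD"]),
        # Convert Fahrenheit to Celsius so every consumer sees one unit.
        "temperature_c": round((raw["PMP_TMP_F"] - 32) * 5 / 9, 1),
    }


def from_vendor_b(raw: Dict[str, Any]) -> Dict[str, float]:
    """Map vendor B's nested structure (already in Celsius) to the common model."""
    return {
        "speed_rpm": float(raw["speed"]["value"]),
        "temperature_c": float(raw["temperature"]["value"]),
    }


pump_a = from_vendor_a(vendor_a_payload)
pump_b = from_vendor_b(vendor_b_payload)
```

Once both sources are mapped, downstream systems only need to understand the common shape (`speed_rpm`, `temperature_c`), not each vendor's layout.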
Why is data modeling important?
Data modeling is important because models standardize information, enable interoperability, show intent, determine trust, and ensure proper data governance.
To expand on these ideas, data modeling allows for the standardization of how data is categorized and pulled together for additional meaning. Modeling allows for interoperability when sharing information across various applications or when sharing the information between people with different knowledge of and use for the data. Users with different functions need to be able to look at data and quickly understand its source, structure, and what the model represents (like a pump or a production line). This context and metadata are what make modeling so important.
Moreover, data modeling shows intent: what a value is, what it should be, if it’s in an acceptable range, and whether or not it can be trusted. By modeling data with an abstraction layer dedicated to merging, modeling, and securely sharing data, we help ensure proper data governance. Whether data governance is a dedicated body within the company or simply a published set of rules, data governance dictates how information should be shared across business units and mandates data uniformity. And, of course, governance ensures that the data can be understood by and distributed to only the systems and users that need access to that information.
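As a rough illustration of how a model can carry intent, the sketch below attaches an acceptable range to an attribute and uses it to decide whether a reading can be trusted. The class names, the `quality_good` flag, and the example limits are assumptions made for this example, not part of any specific product or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeSpec:
    """Context for one attribute: what it is and what it should be."""
    name: str
    unit: str
    min_value: float
    max_value: float


@dataclass(frozen=True)
class Reading:
    """A raw value plus a source-reported quality flag (hypothetical)."""
    value: Optional[float]
    quality_good: bool


def can_trust(spec: AttributeSpec, reading: Reading) -> bool:
    """A reading is trusted only if it exists, has good quality, and is in range."""
    if reading.value is None or not reading.quality_good:
        return False
    return spec.min_value <= reading.value <= spec.max_value


# Example limits are illustrative only.
discharge_pressure = AttributeSpec("discharge_pressure", "psi", 0.0, 150.0)

in_range = can_trust(discharge_pressure, Reading(87.5, True))
out_of_range = can_trust(discharge_pressure, Reading(212.0, True))
bad_quality = can_trust(discharge_pressure, Reading(87.5, False))
```

Keeping the range next to the attribute definition means every consumer applies the same rule instead of re-deriving it.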
What does a data model look like?
A data model is not, and should not be, complicated. At its most basic, a data model is a set of one or more name-value pairs. Data models are created as logical collections of name-value pairs that are related in some way and, when put together, become a valuable and useful information object.
For example, you might create a data model that represents a thermostat. Alongside a name, the model has three attributes: a current value, a set point value, and a unit value. The model clearly articulates how a thermostat should be represented for the enterprise. In this example, every thermostat will have a name, current value (a floating-point value), set point, and unit of measure (a static character indicating degrees Fahrenheit or degrees Celsius).
A thermostat is obviously a simple thing to model. But this same concept applies to even the most complex process or piece of equipment that you may want to model. You will distill the model down to its very primitive data points that, all together, have more important meaning. You will include any contextual attributes in the model that describe what the data points are and what they should be such that the information becomes self-describing to any consumer of the resulting data model.
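The thermostat model described above could be sketched in code as follows. This is only one possible representation, assuming a Python dataclass; the class, field, and payload key names are illustrative and are not taken from HighByte Intelligence Hub.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class TemperatureUnit(Enum):
    """Static character indicating the unit of measure."""
    FAHRENHEIT = "F"
    CELSIUS = "C"


@dataclass(frozen=True)
class Thermostat:
    """Every thermostat has a name, current value, set point, and unit."""
    name: str
    current_value: float
    set_point: float
    unit: TemperatureUnit

    def to_payload(self) -> Dict[str, Any]:
        """Flatten the model instance into simple name-value pairs."""
        return {
            "name": self.name,
            "currentValue": self.current_value,
            "setPoint": self.set_point,
            "unit": self.unit.value,
        }


# Two instances of the same model: same shape, different data.
lobby = Thermostat("Lobby", 71.5, 72.0, TemperatureUnit.FAHRENHEIT)
lab = Thermostat("Lab", 20.5, 21.0, TemperatureUnit.CELSIUS)

lobby_payload = lobby.to_payload()
lab_payload = lab.to_payload()
```

Because both instances share one definition, any consumer that understands the lobby thermostat's payload can read the lab thermostat's payload without extra mapping.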
My advice for anyone getting started with data modeling is to start small. Models do not need to be complicated. Effective models distill data sets down into their simplest form so they can easily be reused, helping you achieve standardization at scale.
Get started today!
Join the free trial program to get hands-on access to all the features and functionality within HighByte Intelligence Hub and start testing the software in your unique environment.