Operational technology (OT) architectures can complicate the efforts of information technology (IT) personnel trying to gather data for use throughout the enterprise. OT infrastructure frequently contains a wide variety of legacy equipment and systems, which creates data silos and limits the efficient movement of data across the plant or enterprise. Because these legacy systems often do not integrate seamlessly with new equipment, OT environments are difficult to scale, and implementing new technologies such as analytics becomes less cost-effective.
Many organizations are embracing data lake technologies to better communicate with the wide variety of OT systems and equipment. These platforms provide flexible connectivity that builds on the organization’s existing investment in OT systems while enabling greater visibility across the enterprise.
Organizations face many important choices when searching for the right data lake solution, one that will serve them well today and scale to meet future needs. To implement the best solution, organizations should focus on how the technology will help them connect, collect, and contextualize the data they rely on for continued reliability and operational success.
Data management software provides out-of-the-box connectivity
Most organizations are trying to increase and improve analytics, and many are even moving toward centralized monitoring and reporting to drive operational excellence. However, one of the key complications that IT groups face when trying to connect data across the plant or across the enterprise is the wide variety of new and legacy equipment using varying integration frameworks.
Engineering, reliability, maintenance, operations, research, and other departments may all need to collect and share OT data. Because data types and storage formats vary, enabling this collaboration often requires many separate connectivity software packages for security, buffering, tunneling, bridging, and redundancy management. The result is frequently a complex, difficult-to-maintain connectivity patchwork (see Figure 1).
Moreover, the connections that do get built are typically difficult to extend to emerging technologies such as cloud analytics. Many older OT systems don’t support the modern integration frameworks that IT departments rely on for cloud connectivity. As a result, IT must build makeshift custom integrations to make these connections, which complicates management, security, and reliability. In the most severe cases, these integrations place demands on the OT infrastructure that exceed its capabilities, causing occasional outages.
Data lakes solve the complex web of OT connectivity problems without requiring organizations to rip and replace old systems that reliably perform essential tasks. These data lakes can be deployed locally or in the cloud, and the most advanced ones come with a wide range of out-of-the-box connectors that can link to nearly any OT system (see Figure 2). Organizations whose infrastructure has performance limitations can throttle data collection rates so that collection doesn’t interrupt the OT systems’ operation.
For organizations operating in areas with limited communication infrastructure, or with cybersecurity concerns that inhibit moving data offsite, advanced data lake software can be deployed on-premises or in the cloud, with the ability to migrate from one deployment model to the other. This allows an organization to meet security and regulatory needs by storing data in the cloud, locally, or in a combination of the two. As circumstances change, so can the infrastructure.
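As an illustration of rate throttling, the following is a minimal sketch assuming the collector polls OT tags one request at a time. It uses a token-bucket limiter, a common rate-limiting technique chosen here for illustration; it is not necessarily how any particular data lake product throttles collection. The `read_tag` callable and all other names are hypothetical.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Iterable


class TokenBucket:
    """Token-bucket limiter: on average `rate` requests per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if one is available; never blocks."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def poll_tags(read_tag: Callable[[str], Any], tags: Iterable[str], bucket: TokenBucket) -> dict[str, Any]:
    """Read only as many tags as the bucket allows; the rest wait for the next polling cycle."""
    results: dict[str, Any] = {}
    for tag in tags:
        if not bucket.try_acquire():
            break  # budget exhausted: stop rather than overload the OT system
        results[tag] = read_tag(tag)
    return results
```

Setting `rate` below what the source system can comfortably serve is the point of this design: when demand is high, collection slows down by deferring reads instead of adding load to the controller.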
Unlock efficiency with automatic data collection and storage
Today, many plants rely on a historian to gather critical plant data. However, historians have significant limitations for anyone working outside the facility or the process engineering group. They are typically not well suited to managing a wide variety of data types, and per-tag licensing, the most common historian licensing model, becomes cost-prohibitive for organizations trying to monitor many data points.
In addition, a historian is typically most effective at collecting and storing numerical data. While this is valuable for some functional areas, it leaves out many of the data types that other functional groups rely on, such as photos, videos, and spreadsheets.
As a result, non-numerical data is either not stored in the historian or, when it is stored, is difficult to extract and use. Many groups therefore have no access to historian data, or access to only a small sliver of the data they need. IT and OT are thus forced to manage a wide variety of systems and to develop secure solutions for transferring data among them.
Data lakes deliver much more flexible collection and storage capability. Eliminating tag-based licensing means teams can collect and store data at a much lower price. In addition, advanced data lakes provide automatic aggregation of data from a wide variety of sources, with storage in a central repository (see Figure 3).
Automated aggregation reduces the effort required to access data because users don’t have to open multiple applications to locate information about a particular asset. It also improves efficiency and security at both the individual and plant level because data doesn’t have to be transferred manually, often on insecure devices such as flash drives. Data is also less likely to fall victim to human error during collection and transcription.
Data lakes typically use more advanced data collection and storage methods than historians. Newer database technologies such as NoSQL databases offer greater flexibility and scalability than a historian. The databases used by data lakes readily store unstructured data and scale as data volumes grow, improving the user experience through faster loading and retrieval of data.
Moreover, advanced data collection systems can provide valuable functionality for resource-starved organizations. Data lakes in these systems can connect directly to a computerized maintenance management system (CMMS) to close the loop on maintenance. Even small maintenance teams can quickly identify problems, schedule repairs, and see the results of those repairs, all from one system.
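The aggregation step can be pictured as merging records from several source systems into one repository keyed by asset. The sketch below is a minimal in-memory illustration; it assumes every record already carries a shared `asset_id` field (in practice, reconciling identifiers across systems is often the hard part), and the source names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable


def aggregate_by_asset(sources: dict[str, Iterable[dict[str, Any]]]) -> dict[str, list[dict[str, Any]]]:
    """Merge records from several source systems into one repository keyed by asset ID.

    Each record must carry an "asset_id" field. The name of the source system is
    attached to every stored record so users can still tell where it came from.
    """
    repository: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            entry = dict(record)  # copy so the caller's data is not mutated
            entry["source"] = source_name
            repository[entry["asset_id"]].append(entry)
    return dict(repository)
```

With this shape, everything known about one asset is a single lookup, such as `repo["P-101"]`, rather than a search across several applications.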
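To illustrate why schema-flexible storage suits mixed data, here is a toy in-memory stand-in for a document-style (NoSQL) collection. It is not the API of any real database; it only shows that records with different fields, such as a numeric reading and a photo reference, can share one collection without a predefined schema and still be queried by any field.

```python
from __future__ import annotations

from typing import Any


class DocumentCollection:
    """Toy stand-in for a document-style store; documents need not share the same fields."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> None:
        # No schema check: any set of fields is accepted as-is.
        self._docs.append(dict(doc))

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        """Return documents whose fields equal every criterion; a missing field never matches."""
        return [
            doc for doc in self._docs
            if all(key in doc and doc[key] == value for key, value in criteria.items())
        ]
```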
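Closing the loop can be sketched as two steps: open a work order when a reading crosses a limit, then compare readings before and after the repair. This is a minimal sketch under stated assumptions: `create_work_order` stands in for whatever interface a given CMMS exposes and is hypothetical, as are the simple limit check and the before/after comparison.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable


def raise_work_orders(
    latest: dict[str, float],
    limits: dict[str, float],
    create_work_order: Callable[[str, str], str],
) -> dict[str, str]:
    """Open a work order for every asset whose latest reading exceeds its limit.

    `create_work_order(asset_id, description)` is a hypothetical CMMS hook that
    returns the new work-order ID. Assets without a configured limit are skipped.
    """
    opened: dict[str, str] = {}
    for asset_id, value in latest.items():
        limit = limits.get(asset_id)
        if limit is not None and value > limit:
            opened[asset_id] = create_work_order(asset_id, f"Reading {value} exceeds limit {limit}")
    return opened


def repair_effect(before: list[float], after: list[float]) -> float:
    """Mean reading after the repair minus mean reading before it (negative means it dropped)."""
    return mean(after) - mean(before)
```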
Draw meaning from data with contextualization
Having access to large amounts of data is not enough; organizations also need ways to assign meaningful context to that data. Inconsistent data across the organization can make it difficult to draw conclusions that lead to meaningful change.
Modern, flexible data lakes allow organizations to contextualize data in a hierarchical model. Instead of examining a single stream of numerical data in the historian and comparing it manually with data from other systems, teams can process historical, asset, inspection, and CMMS data in one system (see Figure 4).
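A hierarchical model can be sketched as a tree in which each level (site, area, and asset in this hypothetical example) holds records from any source system. Walking the tree from any node returns every record beneath it together with its path, which is what lets a team view historian, inspection, and CMMS data for one asset or one whole area at once. The class and field names are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Node:
    """One level of a hypothetical site > area > asset hierarchy."""
    name: str
    records: list[dict[str, Any]] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        self.children.append(child)
        return child

    def walk(self, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], dict[str, Any]]]:
        """Yield (path, record) for every record at or below this node, depth first."""
        here = path + (self.name,)
        for record in self.records:
            yield here, record
        for child in self.children:
            yield from child.walk(here)
```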
Having all plant data in one system enables organizations to use modern IT tools, even if the systems supplying the data are far less modern. It also unlocks new strategies and solutions for improving performance across the enterprise and empowers personnel to act on the data.
Today’s data lakes standardize data so that it is effectively system-agnostic. Standardized data can then be sent to nearly any application, or it can be connected automatically to the data lake’s built-in analytics tools for seamless contextualization. From one system, personnel across the enterprise can manage data and establish and drive key performance indicators (KPIs) and other metrics, which are delivered automatically to each user’s preferred device (mobile, tablet, or desktop).
Standardization in a single system is particularly useful when organizations have a wide array of instruments and valves that must operate at peak performance to reduce downtime and emissions, ensure safety, and drive optimum production. These devices are often widely distributed and hard to interconnect, leaving plant personnel without a way to cross-correlate all of their data.
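Standardization amounts to mapping each source’s field names, units, and timestamp formats onto one common record shape. The sketch below assumes two hypothetical vendor formats, one reporting pressure in psi with Unix timestamps and one in kPa with ISO 8601 timestamps, and converts both into the same dictionary layout. A real deployment would need one such mapping per source system.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Callable

PSI_TO_KPA = 6.894757  # 1 psi is approximately 6.894757 kPa


def from_vendor_a(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical format A: {"tag": str, "psi": float, "epoch_s": int}."""
    return {
        "asset_id": raw["tag"],
        "quantity": "pressure",
        "value": float(raw["psi"]) * PSI_TO_KPA,
        "unit": "kPa",
        "timestamp": datetime.fromtimestamp(raw["epoch_s"], tz=timezone.utc).isoformat(),
    }


def from_vendor_b(raw: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical format B: {"equipment": str, "pressure_kpa": str, "time_iso": ISO 8601 with offset}."""
    return {
        "asset_id": raw["equipment"],
        "quantity": "pressure",
        "value": float(raw["pressure_kpa"]),
        "unit": "kPa",
        # Normalize any UTC offset to UTC so timestamps from both sources compare directly.
        "timestamp": datetime.fromisoformat(raw["time_iso"]).astimezone(timezone.utc).isoformat(),
    }


# One standardizer per source system; downstream tools only ever see the common shape.
STANDARDIZERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def standardize(source: str, raw: dict[str, Any]) -> dict[str, Any]:
    return STANDARDIZERS[source](raw)
```

Keeping the conversions at the edge, one function per source, means adding a new system does not require changes to any analytics or reporting code.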
Advanced data lakes enable users from many different functional areas to run and view reports on cross-enterprise data from wherever they are. From their desk or from the field, users can track and trend performance to confirm proper configuration, evaluate performance based on equipment manufacturer or operating conditions, and more.
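Evaluating performance by manufacturer or operating condition is, at its core, a group-and-summarize operation over standardized records. The sketch assumes hypothetical valve records with a `position_error_pct` metric (the gap between commanded and actual position); the field names and values are illustrative and not drawn from any real fleet.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Any


def mean_by(records: list[dict[str, Any]], key: str, metric: str) -> dict[Any, float]:
    """Average `metric` over records grouped by `key`, skipping records that lack the metric."""
    groups: dict[Any, list[float]] = defaultdict(list)
    for record in records:
        if metric in record:
            groups[record[key]].append(record[metric])
    return {group: mean(values) for group, values in sorted(groups.items())}
```

Calling `mean_by(valves, "manufacturer", "position_error_pct")` compares manufacturers, and swapping the key to `"service"` compares operating conditions with the same function.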
Improved efficiency and visibility across the enterprise
Bridging the gap between OT and IT is key to maintaining the flexibility necessary for a competitive advantage in today’s global market. Data lakes improve collaboration and decision making without requiring plants to rip and replace legacy equipment or manage complex infrastructure. The resulting connectivity improvement supports digital transformation initiatives and empowers IT and OT teams to work together to break down silos and put actionable advice and metrics in the hands of users on the plant floor and across the enterprise.
This story originally appeared in the May 2021 issue of Plant Services.