It seems like instrument maintenance (and maintenance in general) is a bit overlooked when people consider process safety.
Even though mechanical integrity is a cornerstone of OSHA’s 1910.119 process safety management (PSM) regulations, maintenance does not always play a big part in process hazard analyses (PHAs), such as hazard and operability (HAZOP) studies, or in risk assessments such as layer-of-protection analysis (LOPA). Its role is usually limited to failures of process equipment and instrumentation being considered as hazard-initiating causes. There may be a guideword or two, but HAZOP and LOPA are scenario-based, whereas maintenance, reliability, and mechanical integrity are system-level safety considerations that must be considered holistically. Maintenance, it seems, is often taken for granted, yet it can have a significant effect on safety.
This article explores how process safety is affected by system-level considerations in the instrument maintenance program; reviews how instrument maintenance has affected and can affect process safety, organizational performance excellence, and the minimum required maintenance level (MRML); and identifies signs of a poorly performing instrument maintenance organization.
Safety – reliability – maintenance
Safety and reliability are system-level considerations that are intertwined in a process plant. You cannot have a safe plant that is not a reliable plant, and a reliable plant is a safer plant. Reliability also directly affects the availability (uptime) of a plant, which improves the plant production. A good portion of the initiating causes for hazardous scenarios in HAZOPs and LOPAs are related to instrument and control failures, which is indicative of their importance to safety.
Instruments in a plant by far represent the largest number of devices and are the most technically challenging assets to maintain. Instrument unreliability affects plant uptime: When instrument and controls maintenance is ineffective, poor plant operability and off-spec product can result, with more process material rework and blending required, affecting product quality and production rates. In addition, poor measurements and controls can lead to a lack of operability and operator confidence in the information being provided by the instruments and controls. This can lead to a loss of situation awareness during abnormal operation, which can lead to potential safety incidents. (As an example, I once asked a 30-year operator in a major refinery what he did first in a developing problem situation and he replied, “To see whether my instruments were lying to me.”)
In some plants, industrial control system (ICS) maintenance is performed by a single maintenance organization, but more commonly maintenance of the distributed control system (DCS) equipment, and in some cases of the programmable logic controllers (PLCs), is done by separate organizations within the plant with different management structures. The principles remain the same for all types of ICS maintenance, which is referred to here as “instrument maintenance” for simplicity. Also, “organization,” “group,” and “department” are used interchangeably in the article.
Maintenance organizational performance
Instrument maintenance, and maintenance in general, has contributed to major accidents (see timeline). It is obvious that instrument maintenance performance is one of the keys to a profitable and safe plant, yet there seems to be a constant drive to reduce maintenance cost, often without apparent regard to the system-level effect on safety and reliability or to the long-term effect on lifecycle costs. How do we ensure that maintenance activities yield the desired effect on the plant’s bottom line while still being safe and environmentally conscious?
The simple answer is that the maintenance organization must have the goal of performing at a high level written into its vision and actions in order to ensure that both the short-term and long-term business and safety goals are met or exceeded. This requires a holistic view of maintenance activities and consideration of lifecycle costs and benefits, with a balance between maintenance costs and the performance required to have a safe and reliable plant. Focusing only on this quarter’s results, prioritizing keeping the shareholder happy, or going the cheap route is counterproductive to this goal. Asking maintenance management and supervisors to reduce cost without having equivalent maintenance performance goals is also counterproductive.
The main purpose of a maintenance organization is to ensure that the physical equipment or assets that the organization has responsibility for are maintained to a standard that meets the company’s business objectives, while ensuring the safety and environmental integrity of the facility. Maintenance excellence is part of operational excellence, which is an ongoing journey that is focused on minimizing and managing downside risks while maximizing operational performance and safety. Maintenance excellence focuses on maintenance discipline, which is centered around carrying out each maintenance task the right way, every time and in a timely manner. The maintenance organization’s practices and procedures should reflect these goals, and the organization should strive to consistently adhere to the company’s maintenance practices and procedures regardless of external pressures. The maintenance organization should not, however, be so tightly constrained that it is not resilient or flexible in the face of new or unique maintenance challenges.
A description of a minimum performance organization requires introducing the minimum required maintenance level (MRML), which is illustrated as an orange rectangle in the accompanying figure. There is an MRML required in a process plant to assure a plant’s efficiency, production rate, reliability, and safety. If you fall below this minimum, there is an increased risk of a negative outcome to production and safety. The effect is not typically immediate but is rather cumulative; as time goes on, the risk of seriously affecting the plant’s operation and safety increases (and the risk increase may not be linear).
An organization that is at the edge of the orange rectangle is a minimum performance organization. If the organization falls below the MRML, there is an increased risk of a negative operational or safety outcome for the plant. Exceeding the high-performance organization line results in diminishing returns for increased maintenance performance.
There is also a negative effect that can occur at all levels of performance, called “drift into failure,” in which it is easy to miss little things drifting toward failure. The closer you are to the MRML, the more likely you are to stray below it and increase your risk of a negative outcome if you drift that way.
Also shown in the figure is a current performance line, which illustrates that an organization has a management decision to make as to what type of maintenance organization the company wants to have in the future. It is also possible for a local organization to approach high performance through personal leadership and team effort, if adequate resources are available, regardless of higher-level management decisions. Unfortunately, sustaining such an organization over the long term is difficult due to changes in personnel over time.
The MRML applies to instrument and controls maintenance as well as equipment maintenance and mechanical integrity. The MRML is a function of many diverse elements, which include but are not limited to:
- Plant design (process properties, physical design and installation of equipment, instrumentation, and controls, materials of construction, etc.)
- Operational and maintenance discipline
- Operational rate vs. plant nameplate (i.e., how hard the plant is running)
- Past history of operations, maintenance, and engineering of the plant (e.g., what abuse has the plant suffered)
- Maintenance of mechanical integrity
- Current and past level of performance of the maintenance department.
When you approach or fall below the MRML, there is a tendency to get into a firefighting mode and forget that your main goal is to ensure the overall availability, quality, operability, and reliability of the plant instrument and control systems. Many “little” negative actions will start to happen and will add up to bigger issues as time goes by (i.e., little alligators grow up into big alligators). Some of these negative actions are making poor or incomplete repairs, documenting incompletely or incorrectly, skipping procedure steps, failing to complete tests before placing instruments back into service, delaying testing of SIS and critical alarms, and letting bad actors fester. These actions can lead to the decay of instrument system functionality, increased safety risk, and reduced production rates and quality.
People are one of the few renewable maintenance resources, and they are the resilience bulwark against unplanned or unique problems that occur during plant operations. One of the big effects on the performance of human resources is morale. It is a simple fact that if you have good people with adequate resources, some ownership, and not too many management roadblocks, your organization will do well. Good and fair treatment, recognition of merit, and ownership in the maintenance activities are some of the things that affect morale.
Safety instrumented systems (SIS) and safety critical instrument systems
To achieve and sustain a high-performance maintenance organization, the organization must be one of continuous, real improvement. Once an instrument asset is turned over to operations, it becomes the responsibility of the instrument maintenance organization to maintain it and improve it if its reliability is lacking. Continuous improvement also includes improving maintenance practices and procedures to ensure quality and safe repairs.
The performance of the instrument maintenance organization is a function of the leadership, experience, competency, capabilities, resources, and work quality needed to achieve a reliable, efficient, and safe plant. These must be optimized around the installed base of instrumentation and operating conditions. Instrument and controls maintenance differs from regular maintenance in several ways. Some of these are the quantity of devices involved, the wide technology range (mechanical, pneumatic, electrical, electronic, and digital technologies), rapidly evolving technology, and varying process interactions. This range of technology requires educated, trained, and well-motivated technicians with a wide range of experience and capabilities. Asset management systems (AMS) can also help manage the technology and ICS devices.
Instrument maintenance has several critical functions in maintaining the safety reliability of safety instrumented systems, safety critical alarms, and instrumented protective systems. These functions involve making repairs that maintain the safety integrity of the safety systems and proof-testing the SIS, safety critical alarms, and instrumented protective systems at their designated test intervals.
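As a rough illustration of keeping proof tests on schedule, the sketch below flags devices whose test is past due and lists the most overdue first. It is only a minimal example under stated assumptions: the tag names, dates, and intervals are hypothetical, and `ProofTestRecord` and `overdue_tests` are illustrative names, not part of any particular maintenance or asset management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProofTestRecord:
    """One safety device and its proof-test schedule (hypothetical structure)."""
    tag: str
    last_tested: date
    interval_days: int  # designated proof-test interval

    def due_date(self) -> date:
        # The next test is due one interval after the last completed test.
        return self.last_tested + timedelta(days=self.interval_days)


def overdue_tests(records, today):
    """Return (tag, days_overdue) pairs for overdue tests, most overdue first."""
    late = [
        (r.tag, (today - r.due_date()).days)
        for r in records
        if today > r.due_date()
    ]
    return sorted(late, key=lambda item: item[1], reverse=True)


# Hypothetical example data, for illustration only.
records = [
    ProofTestRecord("PT-101", date(2023, 1, 10), 365),
    ProofTestRecord("LT-205", date(2023, 6, 1), 365),
    ProofTestRecord("XV-310", date(2022, 3, 1), 730),
]

for tag, days in overdue_tests(records, date(2024, 3, 15)):
    print(f"{tag}: proof test overdue by {days} days")
```

A list like this could feed a regular review so that delayed tests are visible to both maintenance and operations rather than accumulating unnoticed.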
Failure to proof-test the SIS at the test interval given in the safety requirements specification (SRS) negatively affects the calculated safety reliability and may cause the SIS to not meet the specified safety integrity level (SIL). Failure to test safety critical instrument systems on time increases the risk that a serious hazardous incident may occur without all the protections operating correctly.
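The effect of a stretched test interval can be illustrated with the commonly used simplified approximation for a single-channel (1oo1) safety function in low-demand mode, PFDavg ≈ λDU × TI / 2, where λDU is the dangerous undetected failure rate and TI is the proof-test interval. The sketch below is only an illustration under stated assumptions: the failure rate is a hypothetical value, and the formula ignores diagnostics, common-cause failures, imperfect proof-test coverage, and repair time, so it is not a substitute for a proper SIL verification calculation. The function names are illustrative.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour, test_interval_years):
    """Simplified average probability of failure on demand for a 1oo1 function."""
    ti_hours = test_interval_years * HOURS_PER_YEAR
    return lambda_du_per_hour * ti_hours / 2.0


def sil_from_pfd(pfd):
    """Map PFDavg to the low-demand SIL band (0 means SIL 1 is not met)."""
    if pfd >= 1e-1:
        return 0
    if pfd >= 1e-2:
        return 1
    if pfd >= 1e-3:
        return 2
    if pfd >= 1e-4:
        return 3
    return 4


# Hypothetical dangerous undetected failure rate, for illustration only.
LAMBDA_DU = 2e-6  # failures per hour

for years in (1, 2, 4):
    pfd = pfd_avg_1oo1(LAMBDA_DU, years)
    print(f"TI = {years} yr: PFDavg = {pfd:.2e}, SIL band = {sil_from_pfd(pfd)}")
```

With these assumed numbers, a function that falls in the SIL 2 band when tested yearly drops into the SIL 1 band if the test is allowed to slip to every two years, which is the kind of silent degradation that overdue proof tests can cause.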
Also, failure to maintain the instrument safety systems in a plant can directly affect the plant’s safety by removing protection layers against hazards. The instrument department is encouraged to participate in HAZOPs (yes, they are long and sometimes boring) and LOPAs, because maintenance can contribute information about the instrument systems, past history of instrumentation systems, and the instrumentation that might be used as independent protection layers (IPLs).
Technicians are required by OSHA 1910.119 PSM to be trained on the safety systems they support and should understand the consequences of failing to maintain the safety systems properly.
To ensure that the maintenance organization contributes to the reliable operation and safety of a plant as well as to its production rate and quality, the maintenance organization must perform at a high level. Instrument maintenance in particular can contribute to improved operability, reliable operations, and operator confidence. The contribution can be significant, particularly in the safety arena. Where will your company land on the performance scale, and where will it be a few years down the road?