Get your reliability game running: Know where you are and where you're going

This is part 1 of Plant Services' August cover story.

My company, Lucite International, is a global player in acrylics manufacturing, and we pride ourselves on always setting our standards high. However, in 2015, our reliability program was not meeting our high organizational standards. Over the years, Lucite invested a good deal of money in smart control valves, monitoring software, and other new technologies, but these improvements were never fully integrated into a comprehensive reliability program.

Our reliability team operated with two main strategies. The most-critical assets were designated as “stop to fix”: When vibration monitoring consultants detected a problem, maintenance would shut down the device and troubleshoot. Assets in this category included cooling-tower pumps, process fans, boiler feed-water pumps, large refrigeration compressors, gearboxes, and critical motors.

Less-critical assets were considered “run to fail” and would simply run with problems until a maintenance window presented itself or the machine shut down. Assets in this category included process pumps and noncritical motors.

In late 2015, the reliability team decided it was time for a change. The team spent two years overhauling the reliability program to improve runtime in tandem with improving productivity. The result has been a predictive monitoring program that lets maintenance see problems coming and keep assets running at peak efficiency while avoiding shutting down equipment.

But the real story isn’t where our organization has been; it’s where our reliability program is going. Throughout the design of Lucite’s new reliability program, the reliability team has kept four major strategies in mind to ensure that the new program will continue to grow and become a sustainable world-class reliability initiative.

Strategy 1: Know where you are and where you're going

Perhaps the hardest task for the Lucite reliability team was figuring out our direction. It was obvious that the plant’s reliability strategy wasn’t optimal, but the most difficult part of implementing a reliability program is knowing where to start. Because we couldn’t fix everything at once, the team started by taking stock of what we needed to change. This meant asking hard questions about standard Lucite operations, but the answers to those questions provided the necessary direction to take the first steps.

What are our failure windows?

When the reliability program implementation began, the consultants delivering vibration data looked only one week into the future. Any problem expected to take more than a week to cause a failure simply wasn’t reported to maintenance.

Having a window of only one week meant that any detected problems would require immediate, unplanned shutdown of equipment for repairs. Each issue detected was considered critical. After rethinking this approach, we created our first revised goal: Failures needed to be detected with a two-month lead time. However, to improve lead time, we would need to be sure our technicians had the necessary support.

What is the status of our primary maintenance tools?

The reliability team learned early on that a big part of the problem with maintenance routes was that maintenance teams didn’t have the tools they needed to perform their jobs properly. Over the years, assets and devices on maintenance routes had been added in an ad-hoc manner. Sometimes they were properly documented, but often they were simply installed in a hurry and never added to the reliability plan.

This would commonly happen with pumps along the vibration consultant’s maintenance route. While taking planned vibration readings, the technician often would notice other pumps that were making noise and add readings for those devices to the route. These assets would be added to the vibration software in an unstructured manner, with nondescriptive names such as “Pump 1.” Over time, the number of these devices increased, resulting in a maintenance route that contained many assets that only the consulting technician could identify.

This data drift made it all but impossible to reconcile data collected in the field with the asset registry in our custom Lotus Notes-based computerized maintenance management system (CMMS). For example, a pump labeled “Pump 1” in the vibration software would likely be labeled differently in the CMMS, making it hard for maintenance to identify the correct equipment quickly when writing work orders. Therefore, the reliability team updated the vibration software database to match the CMMS.
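As a rough illustration of this kind of cleanup, the sketch below checks each name in a vibration database against a CMMS asset registry and lists the names that still need attention. Every identifier here (the `reconcile` function, asset tags such as `P-1107B`, and the manual mapping) is hypothetical and chosen for the example; nothing is taken from Lucite's actual systems.

```python
def reconcile(vib_names, cmms_ids, manual_map):
    """Split vibration-software names into matched and unmatched.

    vib_names  -- names as they appear in the vibration software
    cmms_ids   -- set of asset IDs in the CMMS registry
    manual_map -- hand-built translations for ad-hoc names
    """
    matched, unmatched = {}, []
    for name in vib_names:
        # Use the manual translation if one exists, else try the name itself.
        asset_id = manual_map.get(name, name)
        if asset_id in cmms_ids:
            matched[name] = asset_id
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical sample data for illustration only.
vib_names = ["Pump 1", "P-2301A", "Fan 3"]
cmms_ids = {"P-2301A", "P-1107B", "FN-0042"}
manual_map = {"Pump 1": "P-1107B"}

matched, unmatched = reconcile(vib_names, cmms_ids, manual_map)
print("matched:", matched)
print("needs review:", unmatched)
```

The unmatched list is the useful output: each entry is a reading that someone who knows the route has to walk down and identify before it can be renamed.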

Another roadblock to reliability was that settings in our vibration software still matched the hardware restrictions the plant had been operating under since the 1980s. With these tools, the maximum resolution for vibration readings was set at 800 lines, and the limited number of data points never allowed the reliability team to catch all vibration frequencies. Missing data points often matched up with bearing defects or other errors that analysts otherwise could have diagnosed.
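To make the resolution limit concrete, here is a small sketch of the arithmetic. The spacing between adjacent lines in a spectrum is the maximum frequency divided by the number of lines. The 800-line figure comes from the text above; the 1,000 Hz maximum frequency and the 6,400-line comparison are assumptions picked for illustration, not the plant's actual settings.

```python
def bin_width_hz(fmax_hz: float, lines: int) -> float:
    """Frequency spacing between adjacent spectral lines."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    return fmax_hz / lines


FMAX_HZ = 1000.0  # assumed maximum frequency, for illustration only

for lines in (800, 6400):  # 800 is from the article; 6400 is an assumed comparison
    width = bin_width_hz(FMAX_HZ, lines)
    print(f"{lines} lines: {width:.3f} Hz per line")
```

Under these assumptions, peaks closer together than about 1.25 Hz cannot be separated at 800 lines, which is one way closely spaced components can go unnoticed in a low-resolution reading.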

Adding Emerson AMS 2140 machinery health analyzers allowed the Lucite reliability team to increase the resolution of readings, which has dramatically improved technicians’ troubleshooting capabilities. In the past, when we got reports of vibration problems, we would immediately shut down and check for loose bolts. Now, with more data available, analysts have the ability to check phase alignment and rule out misalignment before shutting down a machine.

How effective is our current maintenance?

Examining the existing reliability program showed that it was particularly ineffective. Auditing the program delivered key data showing critical reliability improvement areas:

Only 2% of work orders came from the vibration program, though the majority of maintenance work was resolving vibration-related issues.
The percentage of routes completed was low; standard routes weren’t being performed as assigned.
The percentage of equipment monitored was low; equipment on maintenance routes was regularly skipped.

Seeing that much of the predictive and preventive work assigned wasn’t being performed correctly was a wake-up call for the team. Lucite reached out to vibration equipment providers to obtain customized, one-on-one professional training for route technicians. With this training, new equipment, and a cleaned CMMS database, technicians could complete routes faster and more effectively, increasing completion percentages to approximately 80%.

Strategy 2: Document everything

Delivering a top-tier reliability program requires a lot of documentation. Throughout program development, a variety of stakeholders will need data, both to get the program off the ground and to ensure that it continues evolving into the future, regardless of changes in staff, budget, or technology.

Document what you need

The reliability team started by documenting what Lucite was already doing well. Two interns followed operators on their rounds, documenting what they were doing, what failure modes they were trying to address, and how well what they needed to do worked with the CMMS.

Documenting operations activities helped management see two important factors. First, they could see that operations was already performing the beginning stages of predictive maintenance; operators knew what was going wrong and what needed to change. Lucite simply needed a program to support those changes – to integrate the operator findings into corrective work orders.

Second, we identified 214 preventive maintenance activities that operators were already performing, though they hadn’t been identified as contributing to reliability. This helped solidify the idea that this work needed to be done but could be performed much more efficiently. The operators and maintenance personnel were doing great work, but there was a lot of overlap.

We built a custom view in our CMMS that allowed us to see all activities completed by operations and maintenance in one location. The team used that data to identify maintenance activities that overlapped. For example, on one double-mechanical-seal oil pot, operations performed a level check twice a day, and maintenance once a day, which was far more checks than the asset needed. Reducing this inspection to an operations task freed maintenance to use that time to address other issues.
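The overlap check described above can be sketched in a few lines, assuming the combined activity list can be exported as simple records. The field names (`asset`, `check`, `group`, `per_day`) and the `find_overlaps` function are hypothetical. The seal oil pot frequencies come from the example in the text, and the gearbox row is invented to show a task that does not overlap.

```python
from collections import defaultdict


def find_overlaps(tasks):
    """Return the (asset, check) pairs performed by more than one group.

    Each result maps the pair to a dict of group -> checks per day.
    """
    by_key = defaultdict(dict)
    for t in tasks:
        key = (t["asset"], t["check"])
        by_key[key][t["group"]] = by_key[key].get(t["group"], 0) + t["per_day"]
    # Keep only the checks that two or more groups are doing.
    return {k: v for k, v in by_key.items() if len(v) > 1}


tasks = [
    {"asset": "seal oil pot", "check": "level", "group": "operations", "per_day": 2},
    {"asset": "seal oil pot", "check": "level", "group": "maintenance", "per_day": 1},
    # Hypothetical row: only one group performs this task, so it is not flagged.
    {"asset": "gearbox", "check": "oil sample", "group": "maintenance", "per_day": 1},
]

overlaps = find_overlaps(tasks)
for (asset, check), groups in overlaps.items():
    print(asset, check, groups)
```

A report like this only flags candidates. Deciding which group keeps the task, and at what frequency, still takes a conversation with the people doing the work.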

After identifying the need for improvements to the reliability program, the team began presenting reported metrics from high-performing organizations for comparison with Lucite’s operations. This data painted a clear picture for management of what was versus what could be. The team also showed real-world examples, including a primary reactor motor bearing failure that caused a plant shutdown. Technicians illustrated for management that with a robust vibration program, the shutdown could have been avoided entirely, saving tens of thousands of dollars.

About the Author: Victor Foster

Victor Foster, CMRP, is a reliability engineer at Lucite International. He has an extensive maintenance and reliability engineering background across many industries. Foster has extensive training in root-cause failure analysis, reliability strategy, best practices, and FMEA and holds a certificate from the Reliability and Maintainability Center at the University of Tennessee Knoxville. He is certified as a machinery lubrication technician (MLT I) and for ISO Vibration Analysis I. Contact him at [email protected].

Document what you fix

Once our program began, it became essential to maintain comprehensive documentation of the program itself to justify the investment and to enable stakeholders such as operations and maintenance to keep performing reliability tasks.

Today, as technicians complete any repairs in the plant, failure code details are added into work orders using drop-down selection boxes. If a technician adds a failed belt drive, the entry will also contain the correct failure mode (e.g., belt slip), cause (e.g., insufficient tension), and effect (e.g., belt noise). This allows us to maintain comprehensive documentation of what our teams are finding and what caused the defect.
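One way to picture a drop-down-driven failure entry is as a record whose fields must all come from a fixed list of choices. The sketch below is an assumption about how such a structure could look, not a description of our CMMS. The names `FAILURE_CODES`, `FailureEntry`, and `make_entry` are hypothetical, and the only values in the table are the belt drive examples given above.

```python
from dataclasses import dataclass

# Hypothetical drop-down contents: component -> mode -> allowed causes and effects.
FAILURE_CODES = {
    "belt drive": {
        "belt slip": {
            "cause": ["insufficient tension"],
            "effect": ["belt noise"],
        },
    },
}


@dataclass(frozen=True)
class FailureEntry:
    component: str
    mode: str
    cause: str
    effect: str


def make_entry(component, mode, cause, effect):
    """Build an entry only if every field is a valid drop-down choice."""
    modes = FAILURE_CODES.get(component)
    if modes is None or mode not in modes:
        raise ValueError(f"unknown component/mode: {component!r}/{mode!r}")
    options = modes[mode]
    if cause not in options["cause"] or effect not in options["effect"]:
        raise ValueError(f"invalid cause or effect for {mode!r}")
    return FailureEntry(component, mode, cause, effect)


entry = make_entry("belt drive", "belt slip", "insufficient tension", "belt noise")
print(entry)
```

Restricting entries to known choices is what makes later trending possible, since free-text descriptions of the same failure rarely match each other.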

With trending data recorded, the reliability team can adjust maintenance strategies to reflect key trends. Coupled with recording maintenance activities, the CMMS has been overhauled to ensure that every piece of equipment accurately displays critical data such as oil type, points of vibration, type of grease, subclass of equipment, and failure modes. With all this data at our fingertips, the team is building an automated system to identify FMEAs and use those FMEAs to drive decisions for reliability-centered maintenance.
