
Why RCFA needs root cause case management to succeed

Nov. 11, 2020
Equipment failure is not inevitable when it can be prevented with systematic analysis.

Too many organizations drop the ball when it comes to root cause failure analysis (RCFA). Well-intended efforts to find the origins of equipment failures often lose momentum once the corrective actions are identified and delegated.

What is lacking is the systematic ability to track who is responsible and accountable for each corrective action, monitor progress as the actions are carried out, validate whether the corrections were effective, and only then close out the case as fully completed. Diagnosing root causes without following through means that the issues are likely to recur. Structured root cause case management provides the essential visibility and follow-through capability.

Besides significant cost and risk reduction benefits, being able to compile and share RCFA knowledge from a central location is one of the driving forces for using root cause case management. Documenting root causes and corrective actions helps newer engineers who encounter problems to build on the tribal knowledge of highly experienced engineers and technicians. It also helps engineers using similar equipment in other locations to leverage the learnings of their peers.

This paper examines why RCFA case management is needed in addition to RCFA, and how companies such as a major U.S. metals manufacturer and a large U.S. pulp and paper mill are improving equipment reliability with an improved RCFA case management approach.

Challenges to realizing RCFA’s known value


Reliability engineers rely on RCFA to solve critical – and sometimes routine – problems with equipment, processes, and systems. Identifying the underlying origins of a problem, making corrections, validating their effectiveness, and applying the corrections to other similar machines helps to prevent the problem from ever occurring again.

However, it is surprisingly common for reliability organizations to do the hard work of RCFA and issue emails delegating corrective actions, but then forget about it until the problem arises again. Organizations that successfully find root causes but fail to follow through in applying appropriate corrective actions will routinely come up short in their RCFA goals. The following are commonly missed opportunities:

Commit to performing RCFA. Many plants take an informal approach to conducting RCFAs. They will wait for something major and catastrophic to happen before the prospect is even considered.

Identify RCFA triggers. Few plants establish a formalized set of triggers for conducting RCFA. Those that do typically prioritize them based on failure type, severity, repetition, equipment costs, downtime costs, or safety risks. When trigger types are not cooperatively planned, valuable opportunities for RCFA may be missed.

Establish a plan for case ownership. Plants that assign responsibility for failures on the fly do not always make the best choices. Advance planning for how to designate RCFA case ownership and ensure complete follow-through involves understanding the resources, authority, capabilities, and assigned areas of each personnel group.

Centralize case management. RCFA aids such as 5 Whys, Cause Mapping, and TapRooT help with analysis, but they lack comprehensive case tracking and follow-through capabilities. Without a centralized mechanism to enter, communicate, and track the progress of RCFA cases from start to finish, they are easily lost and forgotten.

Manage case progress. Visibility into responsibilities and due dates is lost when cases are tracked in personal spreadsheets or file folders and email is the communication method of choice. Accountability and oversight are sacrificed when there is no centralized, searchable case management tool that systematically produces notifications of responsibility.

Verify corrective actions are completed. Great analysis is worthless if the corrective actions are never applied, whether it’s changing the purchase order for a deficient component, changing the preventive maintenance (PM) practices on a particular machine train, or otherwise. This lack of follow-through occurs with surprising frequency when unstructured RCFA case management approaches are used.

Verify corrective actions are effective. Applied corrections and adjustments must be validated at some later point to ensure they have the intended effects. Without timely reminders to ensure completion of this step, the problems may recur, or unintended consequences may arise.

Apply effective corrective actions to similar equipment. Validated corrective actions may be beneficial for similar equipment at the same site or other plant locations. Identifying those assets and applying the corrections proactively is rarely done, but it is necessary to optimize the return on the RCFA investment.
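The trigger criteria mentioned above (severity, repetition, equipment costs, downtime costs, safety risks) lend themselves to an explicit check. The following is a minimal sketch only: the threshold values, field names, and the `triggers_met` function are hypothetical illustrations, not part of any particular case management product, and each plant would set its own limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds; every plant would agree on its own trigger limits.
DOWNTIME_HOURS_LIMIT = 4.0
REPAIR_COST_LIMIT = 10_000.0
REPEAT_FAILURE_LIMIT = 2  # total failures of one asset within the review window


@dataclass
class FailureEvent:
    asset_id: str
    downtime_hours: float
    repair_cost: float
    safety_incident: bool
    prior_failures: int  # earlier failures of this asset in the review window


def triggers_met(event: FailureEvent) -> list[str]:
    """Return the name of every RCFA trigger this failure event meets."""
    met = []
    if event.safety_incident:
        met.append("safety")
    if event.downtime_hours >= DOWNTIME_HOURS_LIMIT:
        met.append("downtime")
    if event.repair_cost >= REPAIR_COST_LIMIT:
        met.append("cost")
    # Count the current failure together with the earlier ones.
    if event.prior_failures + 1 >= REPEAT_FAILURE_LIMIT:
        met.append("repetition")
    return met


event = FailureEvent("PUMP-101", downtime_hours=5.0, repair_cost=2_500.0,
                     safety_incident=False, prior_failures=1)
print(triggers_met(event))  # ['downtime', 'repetition']
```

Writing the triggers down in one place, rather than deciding case by case, is what lets a repeat failure that looks minor on its own still open an RCFA case.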

RCFA case management success stories


The metals producer and pulp and paper mill are among the organizations that have upgraded to a structured RCFA case management approach. Both are very large plants with thousands of components, and in both cases, the initiative was driven by managers who strongly support eliminating root causes of failure and increasing RCFA accountability and oversight.

In their efforts to transition from a reactive culture to one that learns from breakdowns and takes action to prevent them from happening again, they both chose a root cause case management solution offered as part of a comprehensive reliability information management system from 24/7 Systems.

Metals Manufacturing Plant. A U.S. metals plant expanded its root cause case management capabilities by formally introducing the RCFA program in October 2019. While it does not have a reliability team, the predictive maintenance (PdM) group collaborates with maintenance engineers who are responsible for thousands of components. Collectively, they have opened more than 50 RCFA cases since the program began.

To initiate its RCFA program using the new case management tool, the plant maintenance engineers were trained on how to use the software so that their follow-up actions to major breakdowns could be more efficiently tracked. When a failure occurs, the maintenance engineer responsible for the affected equipment will enter the case and take ownership of preventing recurrences.

Prior to rolling out the RCFA case management software, follow-up on equipment breakdowns by plant engineers was not formally documented. Corrections to preventive maintenance (PM) work orders remain their primary focus; for example, if a failure is attributed to incorrect greasing frequency, the PM work order will be adjusted accordingly.

The PdM group would separately evaluate major breakdowns from the standpoint of determining if sufficient prediction was provided to engineers on the failure. Both the line and PdM engineers’ breakdown analyses were captured in personal files that used to be hidden in digital folders. Now, the breakdown follow-up documents are attached to the corresponding root cause case.

From a management perspective, the goals of visibility and accountability are already being achieved. Some of the metals producer’s RCFA cases have had action items completed, but none has yet made it through the validation stage, in which the action items are verified to have prevented the breakdown from recurring. Not enough time has passed since the program began in the fall of 2019.

Pulp and Paper Mill. The U.S. pulp and paper mill has more experience with RCFA, having conducted the process since 2012. It has about 35 open root cause cases at any given time and closes between 30 and 60 cases per year. Before switching to the new case management solution in February 2018, the mill used an Access database to enter RCFA cases and actions but lacked tracking and follow-up capabilities.

The mill’s reliability manager and reliability engineer strongly believe in the importance of tracking corrective actions to completion. That was the primary driver for implementing the case management tool. With the Access tool, the risk of repeat failures of equipment, incorrect parts set up and/or installed, and lack of precision maintenance was always present.

For example, because the mill is large and old, its bills of materials (BOMs) are not always correct. There were many instances where the Access database showed that BOM updates were needed, but they never got done. In one case, an incorrect pump was specified for an area with caustic chemicals. Mill personnel spent four to five hours installing the pump, realized the error, and intended to update the BOM for next time. The new pump failed three days later, before the BOM update was made, so another four to five hours were spent installing another incorrect pump. By the third time around, the BOM correction had been made.

The mill is also using the software’s vibration analysis capabilities. With reliability data now accessible in aggregate, the mill is solving failures that previously would have kept recurring because the problems did not meet the individual triggers. For instance, RCFA of trending jackshaft failures revealed the manufacturer had started using inferior off-brand bearings with incorrect hardness.

“With the new reporting and ability to hold people accountable, we are saving hours upon hours of time. It gives us traction with the assigned actions because no one can pretend it’s the first time they’ve seen the assignment,” says the reliability engineer. “You can’t manage what you can’t track, and now we can track it so we can manage it. I can pull up the digital record on my phone in two seconds if someone asks me: ‘Didn’t this happen before?’ We couldn’t do that with the Access database.”

The new root cause case management and dashboard tool provides the mill with a control mechanism to ensure follow-through of corrective actions, adds the reliability engineer: “The solution follows a defined path but is flexible and easy to use. We are able to assign a due date, enter comments of what was done throughout RCFA, add files to the case, and close it out after completion and validation.”

Neither the metals producer nor the pulp and paper mill could have come this far without solid management support. Now that the companies have centralized their root cause case management, the full case backlog and all the details are systematically captured and managed. Responsible parties are automatically notified at case and corrective action entry and closure, focusing attention when and where it is needed. Additionally, reliability leaders can easily oversee the entire process from the dashboard and apply filters for selected decision-support data, such as all corrective actions for a case, or all corrective actions assigned to an individual.
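The dashboard filters described above amount to simple queries over the corrective action records. As a rough sketch, assuming a hypothetical `CorrectiveAction` record and `filter_actions` helper (the case IDs, assignee names, and dates are invented for illustration and do not reflect the vendor's data model):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    case_id: str
    description: str
    assignee: str
    due: date
    completed: bool = False


def filter_actions(actions, case_id: Optional[str] = None,
                   assignee: Optional[str] = None,
                   overdue_on: Optional[date] = None):
    """Return the actions that match every filter that was supplied."""
    result = []
    for action in actions:
        if case_id is not None and action.case_id != case_id:
            continue
        if assignee is not None and action.assignee != assignee:
            continue
        # Overdue means still open with a due date before the given day.
        if overdue_on is not None and (action.completed or action.due >= overdue_on):
            continue
        result.append(action)
    return result


actions = [
    CorrectiveAction("RC-001", "Update BOM with correct pump", "j.smith", date(2020, 3, 1)),
    CorrectiveAction("RC-001", "Adjust PM greasing frequency", "a.lee", date(2020, 4, 15), completed=True),
    CorrectiveAction("RC-002", "Order better motor seals", "j.smith", date(2020, 5, 1)),
]

for action in filter_actions(actions, assignee="j.smith", overdue_on=date(2020, 4, 1)):
    print(action.case_id, action.description)  # RC-001 Update BOM with correct pump
```

Combining filters this way answers the two questions the article raises: which actions belong to a case, and which open actions a given person still owes.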

Today, authorized users can see the RCFA process from end to end, including:

  • when a component failure that meets predefined parameters triggers a new root cause case
  • who is assigned to the case and when
  • the problem’s severity, downtime, faults, and repairs
  • whether root causes or corrective actions have been identified
  • individuals assigned to the corrective actions
  • due date and work order # for the corrective actions
  • progress on the actions and when they are checked off as completed, and whether any actions are suspended
  • who is assigned to validate implemented corrective actions
  • the due date and status of validation
  • when the case is closed.

RCFA case management best practices


Consolidating all RCFA activity on a single case management tool provides a better way to hold people accountable for finishing root cause investigations, coming up with corrective actions, implementing them, and verifying their effectiveness. Moreover, it causes plant personnel to think twice about failures and transition from a reactive to a proactive mindset. Failure is not inevitable if it can be prevented next time.

The ideal sequence of events is supported with comprehensive case management capabilities:

  1. Identify RCFA candidates. Components exceeding established limits or triggers, equipment breaking down and causing downtime in the line, or having an inline spare replace equipment with a major failure are examples of RCFA candidates. Having a centralized, searchable repository of all PdM findings, repair history, and RCFA history facilitates recognition of when triggers are met and which machines have experienced failure, because the data is at the user’s fingertips.
  2. Capture RCFA case information systematically. The case owner opening the case will specify the equipment, location, problem, severity, frequency, faults, component and production downtime, and more. There will be a problem summary, such as a certain motor tripped, and a detailed description, such as smoke was billowing out, the breaker reset, the line stopped multiple times, etc. A dashboard view will show the new case with indicators that no root causes or corrective actions have been identified yet.
  3. Assign/notify the responsible party to conduct the RCFA. Assigning roles and responsibilities ensures accountability. A reliability or maintenance leader will typically make the assignments, and email notification is automatic.
  4. Isolate and document the root causes. This portion of the process includes identifying the primary failures, failure progression, and root causes. If it is known a motor’s windings shorted, the root cause may be that there was water in the windings. It is also possible that multiple causes will be determined for a case. The dashboard view will show the root cause is identified but no corrective actions are determined at this point.
  5. Track RCFA progress. Managers or others overseeing RCFA can use the dashboard and filters to monitor how long cases are open, who is responsible for them, and follow up to keep the analysis process moving.
  6. Identify and enter recommended corrective actions. For each root cause, one or more corrective actions will be determined. To prevent water in the windings, the corrective action may be to order better seals for the motor, switch to a washdown duty motor, or build a barrier around the motor to avoid water impingement on the windings. Often, both physical and procedural actions will be required, which is why multiple corrective actions for a case are common.
  7. Assign/notify the responsible party to implement the corrective actions. As soon as a corrective action and its due date are assigned, the responsible party automatically receives email notification and a work order is created. Each action is checked off upon completion. The case owner is automatically notified when a corrective action is checked off as completed.
  8. Track completion of the corrective actions. Managers or others overseeing RCFA can use a dashboard and filters to monitor the progress of corrections and follow up where needed. It is also possible to suspend corrective actions if it turns out they are not feasible or necessary.
  9. Support or refute the effectiveness and close the case. Several months after checkoff, the responsible party is expected to validate the corrective actions and close the case. The standard operating procedures (SOPs) should include a reminder to verify that all corrective actions were completed and that the new procedures are working as intended. Cases with effective resolutions can be closed. Cases requiring further work will be updated accordingly. The case owner is automatically notified when a case is closed.
  10. Track closure of RCFA cases. Managers or others overseeing RCFA can use a dashboard and filters to monitor and follow up on corrective action validation and case closure.
  11. Link case documents. Many documents are produced during RCFA investigation and correction, including shop reports, consultant reports, photographs, initial RCFA test documents, and more. Attaching them to the case record makes them available for future reference.
  12. Apply beneficial corrections to similar equipment. RCFA’s value is extended when effective corrections are applied to equivalent equipment. If a series of three pumps are all inline, side by side, a PM adjustment for one pump may be applied to all three. If that same pump series is installed across the plant or at another site and the operating conditions are similar, those PMs, too, may be adjusted accordingly.
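The sequence above implies that a case cannot be closed until its root causes are documented, every corrective action is either completed or suspended, and the results are validated. A minimal sketch of that gating logic follows; the class names, status strings, and `close` method are hypothetical and only illustrate the idea, not how any specific tool implements it.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionState(Enum):
    OPEN = "open"
    COMPLETED = "completed"
    SUSPENDED = "suspended"  # judged not feasible or not necessary


@dataclass
class Action:
    description: str
    state: ActionState = ActionState.OPEN


@dataclass
class Case:
    problem: str
    root_causes: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    validated: bool = False
    closed: bool = False

    def status(self) -> str:
        """Derive the dashboard status from what has been recorded so far."""
        if self.closed:
            return "closed"
        if not self.root_causes:
            return "awaiting root cause"
        if not self.actions:
            return "awaiting corrective actions"
        if any(a.state is ActionState.OPEN for a in self.actions):
            return "actions in progress"
        if not self.validated:
            return "awaiting validation"
        return "ready to close"

    def close(self) -> None:
        """Close the case only after validation; otherwise refuse."""
        if self.status() != "ready to close":
            raise ValueError(f"cannot close case: {self.status()}")
        self.closed = True


case = Case("Motor tripped; windings shorted")
case.root_causes.append("Water in the windings")
case.actions.append(Action("Switch to a washdown duty motor"))
case.actions.append(Action("Build a barrier around the motor"))
print(case.status())  # actions in progress

case.actions[0].state = ActionState.COMPLETED
case.actions[1].state = ActionState.SUSPENDED
print(case.status())  # awaiting validation

case.validated = True
case.close()
print(case.status())  # closed
```

Deriving the status from the recorded data, instead of letting users set it by hand, is one way to keep a case from being marked closed while validation is still outstanding.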

Successful root cause elimination is a process, not just an analysis. This suite of capabilities helps reliability teams achieve RCFA goals that were previously out of reach, from improving accountability and oversight to significantly increasing reliability, uptime, and cost savings. Without a consolidated approach, the process may be shortchanged or fall apart, hindering the ability to prevent costly future failures.

Critical to RCFA success are strong management buy-in, communication, and a good support system to make the program run effectively. However, case management success requires a systematically incremental approach. Starting work from procedures and practices already in place and documenting and formalizing them in a case management tool will ensure step-by-step success and overall program effectiveness.

About the Author: Forrest Pardue

Forrest Pardue is president and founder of 24/7 Systems Inc. After earning a BSEE at North Carolina State and then an MBA, Forrest has worked in the field of vibration analysis and PdM for more than 40 years. As one of the founding members of Computational Systems, Inc. (CSI), he was actively involved in the technical and market development of modern condition monitoring technologies. Following Emerson Electric’s acquisition of CSI in 1998, Forrest co-founded 24/7 Systems.
