How to develop an effective root cause failure analysis process

Adopt a reliability-centric organization structure to improve reliability, maintenance, and operations.

By Craig Cotter, PE, CMRP


A reliability-centric organization makes reliability the focus of the maintenance and operations departments. The company must have a strong, independent reliability leader who not only looks at traditional reliability improvements, but also influences and leads the organization’s operations, maintenance, capital, and turnaround functions to improve overall corporate performance, with the motto: “Engineer it right, keep it running, and repair it right.” This is the final installment of a multi-part series.


Developing a root cause failure analysis (RCFA) process and culture is the primary value the reliability department brings to the asset. Once the organization understands the value of driving out poorly performing equipment, incorrect behaviors, and outdated processes, the organization will improve in more areas than just equipment mean time between repairs (MTBR).

The first step in designing an effective RCFA culture is to develop effective procedures for when and how to conduct these analyses and a process to identify when equipment is reaching the end of its useful life (Figure 1). Additionally, the reliability department must have a process to organize the asset into an effective defect-eliminating organization.

Figure 1. Preliminary work is required before starting an RCFA program.

RCFAs must be performed on the equipment and processes that are adversely impacting the asset’s reliability. However, where do you begin? Do you perform them on all equipment failures? One method is to develop a set of definitions for your asset’s top 10 worst actors, other bad actors, and repetitive failures. For example, the reliability department can develop the following lists:

  • top 10 list of defective equipment — based on maintenance and lost opportunities
  • bad actors — equipment with three or more failures in a two-year period
  • repeat offenders — equipment with repeated failures in the past six months.

The number of failures and time frames in the latter two definitions can be modified to limit the lists to workable numbers. Without developing these target lists, the organization will not know where to focus its efforts.
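
The bad-actor and repeat-offender definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool described in the article: the function name `classify_equipment`, the record layout (equipment tag plus failure date), the sample tags, and the threshold of two failures for "repeated" are all hypothetical. The three-failures-in-two-years rule and the six-month window come from the lists above and are exposed as parameters so they can be tuned to keep the lists workable.

```python
from collections import defaultdict
from datetime import date, timedelta


def classify_equipment(failures, today,
                       bad_actor_min=3, bad_actor_days=730,
                       repeat_min=2, repeat_days=182):
    """Return (bad_actors, repeat_offenders) as sorted lists of equipment tags.

    failures: iterable of (equipment_tag, failure_date) pairs (assumed layout).
    repeat_min=2 is an assumption; the article only says "repeated failures".
    """
    # Group failure dates by equipment tag.
    by_tag = defaultdict(list)
    for tag, failed_on in failures:
        by_tag[tag].append(failed_on)

    bad_cutoff = today - timedelta(days=bad_actor_days)
    repeat_cutoff = today - timedelta(days=repeat_days)

    bad_actors, repeat_offenders = [], []
    for tag, dates in by_tag.items():
        # Count failures that fall inside each look-back window.
        if sum(d >= bad_cutoff for d in dates) >= bad_actor_min:
            bad_actors.append(tag)
        if sum(d >= repeat_cutoff for d in dates) >= repeat_min:
            repeat_offenders.append(tag)
    return sorted(bad_actors), sorted(repeat_offenders)


# Made-up failure history, for illustration only.
history = [
    ("P-101", date(2023, 3, 1)),
    ("P-101", date(2023, 11, 15)),
    ("P-101", date(2024, 5, 20)),
    ("C-200", date(2024, 4, 2)),
    ("C-200", date(2024, 6, 10)),
    ("E-300", date(2021, 1, 5)),
]
bad, repeat = classify_equipment(history, today=date(2024, 7, 1))
print("Bad actors:", bad)
print("Repeat offenders:", repeat)
```

Raising `bad_actor_min` or shortening the windows shrinks the lists, which is one way to apply the adjustment described above.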

From experience, the most effective approach is to go after the top 10 equipment items first, followed by the bad actors, and later the repeat offenders. One plant study indicated that the highest returns on investment are achieved by addressing the top 10 list.

The plant’s RCFA process must be aggressive; it is essential for addressing the bad actors and repeat offenders. The chart shows the results of driving out the bad actors (Figure 2). In this chart, only one of the original bad actors remained on the list after 3.5 years, and it was slated for replacement during a turnaround.

Figure 2. Bad actors tracked over a period of three years. Only one of the original bad actors remained, and it was slated for replacement during a turnaround.

These results show that by finding solutions to all of the bad actors and implementing those solutions, maintenance was not fixing the same equipment over and over again. However, the chart also shows that new bad actors came into the mix. Without solving the existing bad actor problems, the list would have increased, thus increasing the organization’s unnecessary, repetitive work.

One company set a goal for its reliability engineers to perform one to two RCFAs per month in their assigned units. This may sound rather low, but the engineers didn’t think so, and they struggled to meet that modest goal, along with all of their other goals and activities. However, a system was put in place to review all work notifications against the lists of top 10, bad actors, and repeat offenders, and RCFAs were assigned to all applicable notifications. RCFAs were not completed on all of them, but reliability personnel focused on the most important equipment. The MTBR improved over time and that enabled the engineers to begin work on other reliability initiatives.
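
The review step described above, checking each work notification against the three target lists and assigning an RCFA where one applies, could look roughly like the following Python sketch. Everything named here is hypothetical: the function `triage_notifications`, the `(notification_id, equipment_tag)` record layout, and the sample tags. The priority order (top 10, then bad actors, then repeat offenders) follows the approach recommended earlier in the article.

```python
def triage_notifications(notifications, top_10, bad_actors, repeat_offenders):
    """Return (notification_id, equipment_tag, list_name) for every
    notification whose equipment is on a target list, highest priority first.

    notifications: iterable of (notification_id, equipment_tag) pairs
    (assumed layout). Equipment on no list is skipped.
    """
    # Lists are checked in priority order; the first match wins.
    target_lists = [
        ("top 10", set(top_10)),
        ("bad actor", set(bad_actors)),
        ("repeat offender", set(repeat_offenders)),
    ]
    assigned = []
    for note_id, tag in notifications:
        for rank, (name, members) in enumerate(target_lists):
            if tag in members:
                assigned.append((rank, note_id, tag, name))
                break
    # Sort by list priority, then by notification id.
    assigned.sort()
    return [(note_id, tag, name) for _, note_id, tag, name in assigned]


# Made-up notifications and lists, for illustration only.
queue = triage_notifications(
    notifications=[(1001, "C-200"), (1002, "T-400"),
                   (1003, "P-101"), (1004, "K-500")],
    top_10=["K-500"],
    bad_actors=["P-101"],
    repeat_offenders=["C-200"],
)
for note_id, tag, list_name in queue:
    print(f"Notification {note_id}: assign RCFA for {tag} ({list_name})")
```

Sorting the result by list priority reflects the point that not every assigned RCFA will be completed, so the engineers' limited time goes to the most important equipment first.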

Performing RCFAs is essential, but having a system to implement the RCFA recommendations is even more important. The reliability department cannot implement the recommendations alone; it must collaborate and cooperate with the operations, maintenance, and capital/turnaround departments. A cross-functional reliability improvement team (RIT) process is an excellent method for developing and solving top 10 reliability issues. An RIT process also solves other non-machinery reliability issues, integrates the operations, maintenance, and capital/turnaround departments into the process, and gets their support and buy-in. This process should define the team structure, the team members’ roles and responsibilities, objectives, top 10 list workflow, budget requirements, key performance indicators, and reporting responsibilities. Of all the processes in the plant concerning reliability, this is the most important, as it will ensure the asset’s resources are continuously focused on the reliability issues and on driving the recommendations from the RCFA to completion. Lessons learned from these teams should also be reviewed to determine if the findings could be applied in other parts of the company (Figure 3).
