Tie condition monitoring into RCM and RCA

Use information to assess machine health.

By Bill Hillman, CMRP, Ludeca

In brief:

  • Where does CM come into play in the RCM process?
  • RCM draws clear and accurate boundaries around a physical asset.
  • CM is not only easy to incorporate into both RCM and RCA but is essential in achieving RCM and RCA success.

A successful reliability-centered maintenance (RCM) program or root cause analysis (RCA) is difficult to imagine without condition monitoring (CM). RCM is a process that has been around for a number of years and has proven successful because of the sheer logic of the process.

Just what is RCM? We will forgo any formal definitions and answer the question by looking at how RCM works. Succinctly, RCM draws clear and accurate boundaries around a physical asset. All things contained within these boundaries are then subjected to the RCM analysis. All of the functions of that asset are then identified. In other words, what does the asset do? Then all of the ways that those functions can fail are identified and analyzed. Once this is done, tasks are developed to prevent or minimize the consequences of the failures that are likely to occur and would have a negative consequence. Simply stated, an analysis is performed and then something is done (tasks) to keep the asset functioning to a required level. It is difficult to envision a process more logical than RCM for maintaining physical assets (machinery).

Features of RCM

  • Identify and preserve system function.
  • Identify failure modes that could produce functional failures.
  • Prioritize the importance of failure modes (risk assessment).
  • Select effective tasks to prevent high priority failure modes.
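
As a rough illustration of the prioritization feature above, the sketch below ranks failure modes by a simple risk score. The 1-to-5 severity and likelihood scales, the multiplication rule, and the example failure modes are all assumptions made for illustration; the article does not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (minor) to 5 (severe)
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)

    @property
    def risk(self) -> int:
        # Assumed scoring rule: risk = severity x likelihood
        return self.severity * self.likelihood


def prioritize(modes):
    """Return failure modes sorted from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical failure modes for a motor-driven pump
modes = [
    FailureMode("Seal leak", severity=2, likelihood=2),
    FailureMode("Bearing seizure", severity=5, likelihood=3),
    FailureMode("Coupling misalignment", severity=3, likelihood=4),
]

for mode in prioritize(modes):
    print(f"{mode.risk:>2}  {mode.name}")
```

Tasks would then be selected for the modes at the top of the list first.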

Steps in performing RCM

  • Step 1: System Selection and Information Collection
  • Step 2: System Boundary Definition
  • Step 3: System Description and Functional Block Diagram
  • Step 4: Identifying System Functions and Functional Failures
  • Step 5: Failure Mode and Effect Analysis
  • Step 6: Logic Decision Tree Analysis
  • Step 7: Task Selection

Results of task analysis

  • perform a task or a combination of tasks
  • no maintenance or run to failure (RTF)
  • take some other action such as redesign or modify the process.

Reasons for selecting maintenance tasks

  • reduce or eliminate the consequence of failure
  • reduce the frequency of failure
  • detect onset of a failure
  • discover a hidden failure
  • do nothing, because of valid limitations.

Where does CM come into play in the RCM process? When all the RCM analysis is complete, something needs to be done. That something is the set of tasks selected for the reasons stated above. There is a preferred order in which tasks are selected. To prevent or minimize the consequence of failure, the first type of task considered is a condition-directed or predictive task (CM). Why would condition-directed or predictive tasks be considered first? Simply because CM tasks are usually the easiest to perform and, more importantly, the least invasive. A large portion of equipment failures are self-induced. In “RCM--Gateway to World Class Maintenance,” Anthony M. Smith and Glenn R. Hinchcliffe state that as many as 50% of machine failures are due to human error in maintenance work. Some studies put this number as high as 70%. In other words, we break things in our attempts to make them better.

“Equipment failure has played a major part in some of the worst accidents and environmental incidents in industrial history,” said RCM guru John Moubray. “As a result the processes by which these failures occur and what must be done to manage them are rapidly becoming very high priorities indeed. It becomes steadily more apparent just how many of these failures are caused by the very activities which are supposed to prevent them.”

One of the greatest values of CM is that it helps us work only on the things truly in need of work. CM is about taking measurements. It is much less invasive than using wrenches and can be done while the machines are running, reducing unnecessary downtime. It is easy to tie condition monitoring into RCM because condition monitoring is already an integral part of RCM, as is evident in the task selection requirements. RCM also provides a method for logically applying CM tools. Because failure modes are targeted, CM is not applied helter-skelter, but rather for specific purposes.

RCM uses four categories of tasks, listed below in selection order, to address failure modes:

Proactive

     1.  condition-directed or predictive
     2.  time-directed (scheduled restoration and scheduled discard), also known as preventive maintenance

Reactive

     3.  failure finding (functional checks)
     4.  redesign and run to fail.
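
The preferred selection order above can be sketched in a few lines of code. This is a minimal illustration, not a formal RCM decision tree: the `applicable` and `effective` flags and the example tasks are hypothetical stand-ins for the judgments an analysis team would make for each failure mode.

```python
from dataclasses import dataclass
from enum import IntEnum


class TaskCategory(IntEnum):
    """The four task categories, numbered in the order they are considered."""
    CONDITION_DIRECTED = 1  # predictive (CM)
    TIME_DIRECTED = 2       # scheduled restoration or discard (PM)
    FAILURE_FINDING = 3     # functional checks
    REDESIGN_OR_RTF = 4     # redesign or run to fail


@dataclass
class CandidateTask:
    description: str
    category: TaskCategory
    applicable: bool  # hypothetical flag: can the task address this failure mode?
    effective: bool   # hypothetical flag: is the task worth doing?


def select_task(candidates):
    """Return the first applicable, effective task in category order, or None."""
    for task in sorted(candidates, key=lambda t: t.category):
        if task.applicable and task.effective:
            return task
    return None


# Hypothetical candidate tasks for a motor bearing failure mode
candidates = [
    CandidateTask("Replace bearing on a fixed schedule",
                  TaskCategory.TIME_DIRECTED, True, True),
    CandidateTask("Route-based vibration readings on motor bearings",
                  TaskCategory.CONDITION_DIRECTED, True, True),
    CandidateTask("Run to failure",
                  TaskCategory.REDESIGN_OR_RTF, True, True),
]

chosen = select_task(candidates)
print(chosen.description)
```

Because the condition-directed task is both applicable and effective here, it is chosen ahead of the time-directed replacement, which mirrors the preference for the least invasive task.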

RCA is any structured approach to identifying the factors that resulted in the harmful outcomes (consequences) of one or more past events, in order to identify which behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes. There are many methods for doing an RCA, both informal and formal. The best methods have well-defined processes that leave very little to chance.

Theoretically, we could take every root cause back to the big bang. However, this would accomplish nothing even if it were possible. A root cause is rarely the initiating cause of the causal chain that leads to an outcome or effect of interest. Commonly, the term is applied, loosely, to the depth in the causal chain where an intervention could reasonably be implemented to change performance and prevent an undesirable outcome. In other words, we need only progress back to an event where something can be identified that allows us to control or prevent the failure. How do you know when the root cause is found? A root cause has three identifying characteristics.

  1. It is clearly a (or the) major cause of the symptoms.
  2. It has no worthwhile deeper cause. This allows you to stop asking “why” at some appropriate point in root cause analysis.
  3. It can be resolved. Sometimes it is useful to include unchangeable root causes in your model, both for greater understanding and so that no one tries to resolve them without realizing they cannot be changed. These have only the first two characteristics.
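
The three characteristics above amount to a simple test that can be applied to each candidate cause. The sketch below is one possible way to express that test; the field names and return labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateCause:
    description: str
    is_major_cause: bool             # characteristic 1
    has_worthwhile_deeper_cause: bool  # negation of characteristic 2
    can_be_resolved: bool            # characteristic 3


def classify(cause: CandidateCause) -> str:
    """Apply the three characteristics in order and label the candidate."""
    if not cause.is_major_cause:
        return "not a root cause"
    if cause.has_worthwhile_deeper_cause:
        return "keep asking why"
    if cause.can_be_resolved:
        return "root cause"
    # Only the first two characteristics hold
    return "unchangeable root cause"


print(classify(CandidateCause(
    "Personnel not trained on equipment grounding",
    is_major_cause=True,
    has_worthwhile_deeper_cause=False,
    can_be_resolved=True,
)))
```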

Basically, RCA is about gathering data or information. We use this information to decide how to prevent the failure or minimize its consequence. CM is an excellent method for collecting information. Let’s use bearing failures to illustrate. Without a good CM process, bearing failures will usually progress to the point where catastrophic damage occurs. Not only is the bearing damaged, but the rotor supported in the bearings may be damaged as well.

Bill Hillman, CMRP, is a technical contributor for Ludeca. He is an infrared thermographer certified by the Society of Tribologists and Lubrication Engineers, a trained reliability-centered-maintenance facilitator, and a past chairman of the International Council for Machinery Lubrication (www.lubecouncil.org). Contact him at (903) 407-9488 or billcmrp@yahoo.com.

When catastrophic damage occurs, it can be difficult to view the damage and determine the cause of the failure. Excessive heat may have plasticized the bearing, distorting its components to the point that the bearing is now a blob of grey metal and slag. In other words, the crime scene has been contaminated to the point that the evidence is lost. If an RCA is initiated in this scenario, finding the root cause may prove difficult because of the lost evidence.

Had the bearing been monitored with ongoing vibration analysis, the impending failure could have been detected at its onset and the bearing taken out of service before any collateral damage occurred. Let’s suppose an RCA is now initiated to determine why the bearing failed. We begin our RCA process by asking “why” questions or by initiating whatever RCA process we have selected. As previously stated, RCA is mainly about gathering information, and in this scenario we have an intact bearing containing evidence instead of the grey blob of metal and slag from the previous scenario.
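
To make the idea of detecting a failure at its onset concrete, the sketch below flags the first vibration reading that rises well above a baseline. It is a deliberately simple trend check under stated assumptions: the readings are made-up numbers, and the baseline window and alarm factor are illustrative values, not limits from any standard or from the article.

```python
from statistics import mean


def first_alarm_index(readings, baseline_count=4, alarm_factor=2.0):
    """Return the index of the first reading above alarm_factor x baseline.

    The baseline is the mean of the first `baseline_count` readings.
    Returns None if no reading exceeds the alarm level.
    """
    if len(readings) <= baseline_count:
        return None
    baseline = mean(readings[:baseline_count])
    for i, value in enumerate(readings[baseline_count:], start=baseline_count):
        if value > alarm_factor * baseline:
            return i
    return None


# Hypothetical overall vibration readings from successive routes
readings = [1.0, 1.1, 0.9, 1.0, 1.2, 1.6, 2.3, 3.8]

index = first_alarm_index(readings)
if index is not None:
    print(f"Alarm at reading {index}: {readings[index]}")
```

In this made-up trend the alarm is raised two readings before the highest value, which is the window in which the bearing could be pulled with its evidence intact.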

The bearing is inspected and electrical fluting is visible on the outer race. Why is there electrical fluting? The bearing is a motor bearing, and the motor is grounded to the building. A building ground to the ground grid is found to be open. Why is the ground not connected? No one remembers why or when it was disconnected. Apparently the ground has not been connected in years. The plant personnel knew the ground was disconnected but did not think its function was important. Why did personnel not think the ground was important? They did not understand the function of the ground and had never been trained in its importance. Root cause: plant personnel not trained on proper equipment grounding. Solution: train plant personnel and update RCM along with standard maintenance procedures. And, of course, reconnect the disconnected ground circuit.
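
The “why” chain in this example can be recorded as simple structured data, which makes the reasoning easy to review later. The sketch below paraphrases the questions and answers from the scenario above; the list-of-pairs layout and the `root_cause` helper are illustrative assumptions, not part of any particular RCA method.

```python
# Each entry pairs a "why" question with the evidence-based answer.
why_chain = [
    ("Why did the bearing fail?",
     "Electrical fluting is visible on the outer race."),
    ("Why is there electrical fluting?",
     "A building ground to the ground grid is open."),
    ("Why is the ground not connected?",
     "Personnel knew it was disconnected but did not think it was important."),
    ("Why did personnel not think the ground was important?",
     "They were never trained in the importance of equipment grounding."),
]

corrective_actions = [
    "Train plant personnel on proper equipment grounding.",
    "Update RCM and standard maintenance procedures.",
    "Reconnect the disconnected ground circuit.",
]


def root_cause(chain):
    """Return the answer to the last 'why', where the questioning stopped."""
    return chain[-1][1]


for depth, (question, answer) in enumerate(why_chain, start=1):
    print(f"{depth}. {question} {answer}")
print("Root cause:", root_cause(why_chain))
```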

The scenario discussed above shows how valuable CM can be when conducting RCA. In fact, when equipment failures are involved, CM can provide information for RCA more often than not.

In summary, CM is not only easy to incorporate into both RCM and RCA but is essential in achieving RCM and RCA success.