Learn the basics of failure analysis techniques and risk mitigation

Asset Management Expert David Berger illustrates some of the more basic failure analysis techniques and simple strategies to mitigate risk and exploit improvement opportunities.

By David Berger, P.Eng.

As CMMS vendors search for ways to differentiate their products and services in light of increased competition, many have enhanced their offerings to include a variety of powerful failure analysis techniques and risk mitigation strategies. Of course, this trend is a response to increased demand from maintenance professionals who want a range of tools to improve asset reliability, availability, performance and quality of output, while decreasing the asset lifecycle cost.

Although some techniques, such as reliability-centered maintenance (RCM), have a reputation for being quite complex and time-consuming, others, such as simple Pareto analysis, can support significant improvements. This month, I’ll show you some of the more basic techniques and simple strategies to mitigate risk and exploit improvement opportunities. Next month, you’ll read about more advanced techniques such as RCM, failure mode and effects analysis (FMEA) and root cause analysis (RCA) that build on the base principles.

For many years now, almost every CMMS package has offered simple, powerful analytical tools for maintenance. Some look backward and use historical data to determine improvement opportunities. Others are more forward-looking or predictive. Both kinds of tools are valuable, as is knowing what to do about what they reveal.

Looking back


Even the simplest CMMS provides users with at least three coded fields to be used when identifying, troubleshooting and completing maintenance work. While vendors might argue about what to call these codes, you can be generic and refer to them as problem, cause and action codes.

The problem or symptom code, first identified by whoever raises the work request, might say something like “dripping water.” When a maintenance technician is dispatched to the scene to make a diagnosis, a cause or failure code, such as “leaky valve,” can be entered on the work order. Finally, whatever action was taken to correct the problem, such as “replaced valve,” is recorded as an action or remedy code.

Once you accumulate a significant historical record, use the data as a diagnostic tool for troubleshooting. In the example above, the next technician dispatched to investigate a “dripping water” problem should know the possible causes after examining the equipment history database. The technician might conclude that if the valve is from a given manufacturer, 75% of the time it was leaky and 90% of those times it had to be replaced.
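A minimal sketch of that lookup is shown here in Python, assuming the equipment history can be exported as simple (problem, cause, action) records. The sample records and the function name likely_causes are hypothetical and don't come from any particular CMMS.

```python
from collections import Counter

# Hypothetical work-order history: (problem code, cause code, action code).
history = [
    ("dripping water", "leaky valve", "replaced valve"),
    ("dripping water", "leaky valve", "replaced valve"),
    ("dripping water", "leaky valve", "tightened packing"),
    ("dripping water", "cracked pipe", "replaced pipe section"),
]

def likely_causes(records, problem):
    """Return (cause, share of occurrences) for a problem code, most frequent first."""
    causes = Counter(cause for prob, cause, _ in records if prob == problem)
    total = sum(causes.values())
    return [(cause, count / total) for cause, count in causes.most_common()]

for cause, share in likely_causes(history, "dripping water"):
    print(f"{cause}: {share:.0%}")
```

With these sample records, "leaky valve" accounts for three of the four "dripping water" work orders. The same counting approach could be applied to action codes to see how often each remedy was used for a given cause.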

Looking forward


A number of CMMS-based tools can also help either prevent or predict failure. One of the most basic is preventive maintenance functionality, which is use-based maintenance triggered at regular intervals of time or meter readings. An example is an engine oil change every six months or 7,500 miles.
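The oil-change rule can be sketched as a simple check, assuming "whichever comes first" logic and treating six months as roughly 182 days. The function name pm_due and its parameters are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

def pm_due(last_done, last_meter, today, meter_now,
           interval_days=182, interval_miles=7500):
    """Return True when either the time interval or the meter interval has elapsed."""
    time_due = (today - last_done) >= timedelta(days=interval_days)
    meter_due = (meter_now - last_meter) >= interval_miles
    return time_due or meter_due

# Last oil change on Jan. 1 at 10,000 miles; two months later the odometer reads 18,000.
print(pm_due(date(2024, 1, 1), 10_000, date(2024, 3, 1), 18_000))
```

In this example the mileage interval is exceeded long before the six months are up, so the work would come due on the meter reading alone.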

CMMS packages also assist with predictive technologies such as condition-based maintenance (CBM), which monitors the health of an asset or its components. Condition readings (pressure, voltage, wear, etc.) that exhibit a trend with a certain pattern or pass through an upper or lower control limit initiate actions. Such a reading might trigger a control loop to shut a valve, activate an alarm, send an e-mail or generate a work order. The aim is to anticipate failure and deal with it “just in time.”
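To make the idea concrete, here is a rough sketch of a control-limit and trend check. The limits, the five-reading run used to define a trend, and the action strings are all assumptions for illustration; a real CBM module would let you configure its own rules.

```python
def limit_breach(value, lower, upper):
    """Describe a control-limit breach, or return None if the value is in range."""
    if value > upper:
        return "above upper limit"
    if value < lower:
        return "below lower limit"
    return None

def rising_trend(readings, run_length=5):
    """True if the last run_length readings are strictly increasing."""
    recent = readings[-run_length:]
    if len(recent) < run_length:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))

def evaluate(readings, lower, upper):
    """Return the list of actions the latest readings would initiate."""
    breach = limit_breach(readings[-1], lower, upper)
    if breach:
        return [f"generate work order: reading {breach}"]
    if rising_trend(readings):
        return ["send e-mail: sustained upward trend"]
    return []

print(evaluate([3.0, 3.5, 4.0, 4.5, 5.0], lower=2.0, upper=7.0))
print(evaluate([4.0, 4.1, 4.2, 7.5], lower=2.0, upper=7.0))
```

The first call stays inside the limits but shows five rising readings, so it produces only a warning. The second passes through the upper limit and produces a work order.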

Risk mitigation options

To determine the best maintenance approach using the strengths of the methodologies previously described, consider three basic questions you should ask for each asset or component, starting with your critical assets.

  • What does this asset or component do? Determine the function of a given piece of hardware.
  • What happens if it fails? Determine the potential effect and risk of different failure modes in terms of health, safety, environment, regulatory compliance, cost, revenue and so on. Consequences range from negligible to catastrophic.
  • What’s the most cost-effective approach to maintenance? Given your responses to the two previous questions, determine which approach to maintenance mitigates the risk in the most economic manner.

In some cases, a run-to-failure approach is sufficient because the consequences of failure are negligible, such as when a light bulb burns out. In other cases, preventive maintenance is the most cost-effective approach, such as regular inspections of fire extinguishers. Finally, predictive maintenance might be the optimum approach if the effect of a failure outweighs the added cost of monitoring a given asset or component. An example of predictive maintenance is monitoring vibration on critical rotating equipment. Usually, a mix of approaches is relevant for a given piece of equipment, depending on the types and consequences of failures possible.
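One simple way to frame the third question is to compare the expected annual cost of each approach, as in the sketch below. It assumes you can estimate an annual program cost and an expected number of failures per year for each approach; every figure here is hypothetical. It also reduces consequences to dollars, which won't be appropriate where health, safety or regulatory effects dominate.

```python
def cheapest_approach(failure_cost, options):
    """Pick the approach with the lowest expected annual cost.

    options maps an approach name to a tuple of
    (annual program cost, expected failures per year).
    """
    totals = {
        name: program_cost + failures * failure_cost
        for name, (program_cost, failures) in options.items()
    }
    best = min(totals, key=totals.get)
    return best, totals

# Hypothetical estimates for a light bulb and for critical rotating equipment.
light_bulb = {
    "run-to-failure": (0, 1.0),
    "preventive": (20, 0.2),
    "predictive": (200, 0.05),
}
rotating_equipment = {
    "run-to-failure": (0, 0.5),
    "preventive": (8_000, 0.2),
    "predictive": (15_000, 0.05),
}

print(cheapest_approach(5, light_bulb)[0])
print(cheapest_approach(100_000, rotating_equipment)[0])
```

With these made-up estimates, the light bulb comes out as run-to-failure and the rotating equipment as predictive, which matches the reasoning above. The value of the exercise lies less in the arithmetic than in forcing explicit estimates of failure cost and frequency.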

Improvement options

Even the most basic CMMS implementation can do better than simply analyzing equipment history or establishing a cost-effective preventive and predictive maintenance program. Consider still other variables in identifying your improvement options.

  • Equipment design: Work with the OEM, an alternate vendor or your own engineering staff to change the design of the equipment, components or parts to reduce the total asset life-cycle cost.
  • Equipment manufacturing: Work with the OEM to identify and eliminate equipment defects, those material variances from design specifications that lead to warranty problems.
  • Equipment installation and setup: Work with internal or contract resources to install equipment properly, perhaps using reinforced floors or precision equipment balancing.
  • Equipment operation: Improve equipment operation by training operators to adjust the equipment properly under different conditions.
  • Equipment maintenance: Improve equipment maintenance by training technicians to “fix it right the first time” or use CBM.
  • Process design: Adjust the production process to reduce maintenance costs, such as by using gravity feeds instead of conveyors.
  • Product design: Redesign the product to reduce maintenance costs, such as by using less abrasive surface material to reduce equipment wear.
  • Environmental conditions: Change the standard operating conditions, such as keeping the room at a constant low temperature to reduce equipment overheating.

Over time you can track and analyze problem, cause and action codes to reveal important information about assets and these eight variables. For example, suppose there’s a recurring problem with leaky valves. Use the CMMS to determine the problem frequency and the cost of fixing it. A more detailed analysis might correlate the problem with a specific valve manufacturer or brand. This allows the engineering and purchasing departments to specify alternative parts or source equivalent parts elsewhere.

Pareto analysis is another easy-to-use but critical tool for analyzing historical data. Use the problem, cause and action codes for each asset or asset category to determine the three most costly plant problems during the past year, or the most common root cause of downtime on a given compressor during its lifetime. This allows maintenance management to focus on the most costly problems, causes and actions and determine the most appropriate improvement opportunities in the eight areas described above.
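A bare-bones Pareto ranking might look like the following sketch, assuming work orders can be exported as (cause code, cost) pairs. The sample data and the function name pareto are hypothetical.

```python
from collections import defaultdict

# Hypothetical work orders from the past year: (cause code, cost in dollars).
work_orders = [
    ("leaky valve", 4000),
    ("bearing failure", 3000),
    ("leaky valve", 2000),
    ("belt slip", 700),
    ("blown fuse", 300),
]

def pareto(records, top_n=3):
    """Return the top_n codes by total cost with their cumulative share of all cost."""
    totals = defaultdict(float)
    for code, cost in records:
        totals[code] += cost
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for code, cost in ranked[:top_n]:
        running += cost
        rows.append((code, cost, running / grand_total))
    return rows

for code, cost, cumulative in pareto(work_orders):
    print(f"{code}: ${cost:,.0f} (cumulative {cumulative:.0%})")
```

In this sample, two causes account for 90% of the year's cost, which is the kind of concentration a Pareto analysis is meant to expose.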

E-mail Contributing Editor David Berger, P.Eng., at david@wmc.on.ca
