Understanding asset failure

David Berger says to implement maintenance practices that predict, prevent or react to failure.

By David Berger, P.Eng., contributing editor

Your CMMS can be a powerful tool to optimize asset performance and reliability and manage the total cost of ownership. But like any software tool, a CMMS is only as good as the data being entered and the user’s skill in analyzing and acting on the information. Thus, properly managing your physical assets requires a basic understanding of how each asset and its components behave throughout the lifecycle. This ensures you collect the right data at the right time for the right purpose.

Understanding your assets means knowing what they and their components are used for, the nature of their failures and which maintenance policies should be put in place to predict, prevent or react to a failure in the most cost-effective manner.

Definition: Asset failure

Most people would agree that asset failure simply means a breakdown or the inability to use the asset. Although the definition of asset failure appears to be obvious, in my experience it can be misleading at times.

Suppose, for example, your windshield wiper blades are worn to the point that they can’t clear water from the glass. Would this be considered asset failure? One test as to whether the asset failed is whether the vehicle can be operated in a safe manner without environmental or financial consequences. If there are five days of sunshine ahead, has the asset really failed? Perhaps the asset has failed, but there are no consequences to the failure until the need to use the wipers arises.

Some might argue that only a component (wiper blade) failed, and not the asset (vehicle). If the wiper leaves a thin streak and there’s still some visibility, has it really failed? For example, what if the streak isn’t fully in your line of sight?

It’s clear that failure can be defined only after there’s an answer to the question: “What does this asset/component do?” When an asset or component no longer fulfills its function, it’s said to have experienced a functional failure. Let’s say the vehicle is to be used to transport people between buildings during rainstorms. Any time an operator can’t fulfill this purpose in a cost-effective, safe and environmentally sound manner, failure, by definition, will have occurred. Of course, cost-effectiveness, safety and environmental soundness must still be defined as precisely as possible. In some cases, regulatory bodies provide guidelines as to what is considered safe, healthy or environmentally acceptable.

Failure analysis

By analyzing what can fail and why, you can ultimately get more out of assets throughout their lifecycles, while minimizing the total cost of ownership. In turn, this fosters a cost-effective maintenance program that prevents failure, predicts failure through condition monitoring or simply allows running to failure if consequences are minimal.

Failure modes and effects analysis (FMEA): This technique involves determining the different ways an asset or component might fail (failure modes), and what the consequences might be (failure effects). It also determines the probability of each failure mode occurring, as well as the potential severity of consequences. The key objective of FMEA is to make changes to the product, process, environmental conditions or asset itself to reduce the probability and severity of potentially high-impact failures.
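
As a rough illustration of the ranking step, the sketch below scores each failure mode as probability times severity and sorts the results so the highest-impact modes surface first. The components, probabilities, severity scale and the scoring rule itself are hypothetical assumptions made for this example; a real FMEA would use your organization's own scales and criteria.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    probability: float  # assumed likelihood per year, between 0 and 1
    severity: int       # assumed scale: 1 (minor) to 10 (catastrophic)

    @property
    def risk(self) -> float:
        # Simple illustrative score: likelihood multiplied by severity.
        return self.probability * self.severity


def rank_by_risk(modes):
    """Return failure modes sorted from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical failure modes for a vehicle; all numbers are made up.
modes = [
    FailureMode("brake line", "leak", 0.02, 10),
    FailureMode("wiper blade", "streaking", 0.60, 3),
    FailureMode("wiper motor", "seized", 0.05, 8),
]

ranked = rank_by_risk(modes)
for m in ranked:
    print(f"{m.component:12s} {m.mode:10s} risk={m.risk:.2f}")
```

A single multiplied score is easy to sort, but it can hide a rare, catastrophic failure mode behind a frequent, minor one, so it is worth reviewing the severity column on its own as well.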

Historical data: Information collected from the CMMS for like assets and components identifies costly recurring problems. Techniques such as Pareto analysis can determine which problem, cause and action codes have the highest frequency. Historical data also can determine if correlations exist between frequent, high-impact problem or cause codes and, say, time of day/year, weather conditions, operator on duty, technician who last worked on the asset, OEM or brand of parts used.
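
A Pareto analysis of problem codes can be sketched in a few lines. The example below assumes the problem codes have already been exported from the CMMS as a plain list of strings; the code names, the counts and the 80% cutoff are hypothetical, and a real CMMS will have its own export format.

```python
from collections import Counter


def pareto(codes, threshold=0.8):
    """Return (code, count, cumulative_share) rows, most frequent first,
    stopping once the cumulative share reaches the threshold."""
    counts = Counter(codes).most_common()
    total = sum(n for _, n in counts)
    rows = []
    cumulative = 0
    for code, n in counts:
        cumulative += n
        rows.append((code, n, cumulative / total))
        if cumulative / total >= threshold:
            break
    return rows


# Hypothetical problem codes taken from closed work orders.
work_orders = ["BEARING"] * 5 + ["SEAL"] * 3 + ["BELT"] + ["FUSE"]

rows = pareto(work_orders)
for code, n, share in rows:
    print(f"{code:8s} {n:3d} {share:6.0%}")
```

The same function can be applied to cause or action codes, or to one subset of work orders at a time (for example, a single shift or parts brand) as a first look for the correlations described above.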

Experimentation: Another way to analyze failure is to run tests on the shop floor, in the field or in a lab. Experimentation provides an opportunity to test what-if scenarios for determining the root cause of a problem, understanding the severity of failure, ascertaining the conditions that predict various failure modes and establishing the most cost-effective approach to minimizing the effect of failure. In some cases, experimentation is done in partnership with the OEM or a supplier of relevant specialized services.

Root cause analysis (RCA): Rather than simply addressing the symptoms of a high-impact problem repeatedly, RCA provides formalized tools and techniques for determining the root cause. Sometimes this involves an iterative trial-and-error approach, and in other cases, detailed data can be collected and analyzed. Operators and technicians typically are asked to attach pictures and record more detailed information on CMMS work orders to permit a study of the conditions before and after failure. This provides more clues in exploring the root cause.

Failure patterns

Before the 1960s, it was commonly believed that, for most assets and components, failure probability rises with age after a relatively stable period of random failures. This failure pattern is depicted in Figure 1 as “Type A,” showing likelihood of failure versus time. Extensive studies by the airline industry and the military found, surprisingly, that age-related failures accounted for only about 20% of all failures. These include failure pattern types A, B and C. The more prominent failure patterns for assets and components were instead found to be random in nature, accounting for roughly 80% of failures. These include types D, E and F. All failure pattern types can be summarized as follows:

Figure 1. These plots of probability-versus-time represent common failure mode types.

Type A: As an asset or component’s age approaches its expected life, and after an initial period of random failure, there’s a rapid increase in the likelihood of failure.

Type B: Commonly known as the “bathtub curve,” this failure pattern is particularly relevant to electronic equipment. There’s an initial period during which there’s a higher likelihood of failure, but this gradually decreases and the curve then follows the Type A pattern until the asset or component’s end of life.

Type C: This pattern shows a steady increase in failure likelihood as the asset or component ages. This pattern might be caused by constant fatiguing.

Type D: Other than an initial break-in period during which the probability of failure is relatively low, this failure pattern shows an equal likelihood of failure at any point in the asset or component’s life.

Type E: The Type E failure pattern has an equal likelihood of failure, regardless of the age of the asset or component.

Type F: As with the initial period of the bathtub curve, the Type F failure pattern starts with a relatively high likelihood of failure. After the initial period, this type then mimics the two other random patterns. The Type F failure pattern is often called infant mortality.
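
One way to build intuition for these shapes is the Weibull hazard function, a model widely used in reliability engineering. It is an assumption added here for illustration and is not part of the studies cited above. Its shape parameter decides whether the failure rate falls, stays flat or rises with age, which corresponds only loosely to the early portion of Type F, to Type E and to the age-related patterns such as Type C. The parameter values below are arbitrary.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate at age t for a Weibull model.

    shape < 1: rate falls with age (infant-mortality behavior)
    shape = 1: rate is constant (random failures)
    shape > 1: rate rises with age (wear-out behavior)
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


ages = [0.5, 1.0, 2.0]  # arbitrary ages, in units of the scale parameter
cases = [
    ("falling rate", 0.5),
    ("constant rate", 1.0),
    ("rising rate", 2.0),
]

for label, shape in cases:
    rates = [weibull_hazard(t, shape) for t in ages]
    print(label, [round(r, 3) for r in rates])
```

A single Weibull curve cannot reproduce the compound patterns such as the bathtub curve (Type B). Those are usually approximated by combining an early falling segment, a flat segment and a late rising one.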

Email Contributing Editor David Berger, P.Eng., partner, Western Management Consultants, at david@wmc.on.ca.
