Measuring risk versus reward in CMMS

In my view, the changes taking place in maintenance are driven by enabling technology, such as improved computerized maintenance management system (CMMS) packages and automated condition monitoring. More importantly, management is demanding a maintenance strategy that balances the risk of a run-to-failure approach against the cost of a comprehensive preventive or predictive maintenance program. This is the maintenance management dilemma.

Achieving a more reliable asset base requires more preventive maintenance (PM) inspections and predictive maintenance (PdM), which, in turn, drives up costs. In addition, more PM shutdowns for scheduled maintenance reduce asset availability and, therefore, decrease performance or production throughput. However, less preventive and predictive maintenance means more forced outages, greater repair costs, increased loss of production and diminished control. So how can management achieve the optimal mix of failure-based, use-based and condition-based maintenance?

A classic myth
Some companies are actively searching for an optimal solution to this maintenance management dilemma, but most feel they are still struggling with the basics. This leads to what I call the classic myth: the belief that the more PM or even PdM we do, the better off we’ll be. I think many people believe in the continuum shown in Figure 1. At one extreme is the worst-case scenario, namely constant fire-fighting through a 100% run-to-failure maintenance strategy. At the other end of the spectrum is nirvana, represented by 100% preventive or planned maintenance.

There’s no question that it’s a positive step to move away from a policy of 100% failure-based maintenance for every asset. Doing so reduces the huge risks associated with loss of production output and the cost of repairing catastrophic failures. However, it may be equally wise to back away from a policy of 100% preventive maintenance across all assets because of the excessive cost. Thus, the optimal solution lies somewhere between the extremes, and it may differ for each component, asset or system. Moreover, the optimal position along the continuum for a given asset depends on which variables or metrics you are optimizing for.

Key metrics
In other words, optimization in terms of cost may yield very different results than optimizing for reliability or asset performance. In fact, the two options may have opposite implications. A simple example can be used to illustrate.

Suppose that swapping out a $5 part in about four hours every two months, and overhauling the asset once a year in about 10 hours, increases the reliability of a given asset to a mean time between failures (MTBF) of 10 years. Each part swap costs $5 for materials, plus 4 hours x $50/hour = $200 in labor, plus a potential 4 hours x $2,000/hour = $8,000 of production loss if the line is at capacity, for a total of $8,205 per swap, or 6 x $8,205 = $49,230 annually. The annual overhaul costs, say, $500 for materials, plus 10 hours x $50/hour = $500 in labor, plus 10 hours x $2,000/hour = $20,000 of production loss, for a total of $21,000. If the asset life is 10 years, the total cost of this use-based approach to maintaining the asset is 10 years x ($49,230 + $21,000) = $702,300, or approximately $0.7 million.

Alternatively, suppose a run-to-failure approach means conducting an eight-hour repair of the entire asset every four months. Each repair costs, say, $7,000 for materials for an extensive repair and parts replacement, plus 8 hours x $50/hour = $400 in labor, plus a potential 8 hours x $2,000/hour = $16,000 of production loss if the line is at capacity, for a total of $23,400 per repair. With three repairs a year over a 10-year asset life, this approach brings the total cost of maintaining the asset to 30 x $23,400 = $702,000, or approximately $0.7 million.

So which approach is optimal? If you are reliability-driven and it’s critical to maximize MTBF for safety reasons, then the obvious choice is to go with a use-based policy. If you want to maximize asset performance, then a failure-based approach is superior, as it results in 24 hours of downtime per year (3 repairs x 8 hours) versus 34 hours with use-based (6 swaps x 4 hours, plus a 10-hour overhaul), or 10 more hours of production output per year (assuming production at capacity). Finally, if life-cycle cost is the key driver, then either approach will do, as they have essentially the same total cost of ownership.
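The arithmetic behind this comparison can be reproduced in a few lines. The following is a minimal sketch that uses only the figures quoted in the example; the names `Task`, `life_cycle_cost` and `annual_downtime` are illustrative and do not come from any CMMS package.

```python
from dataclasses import dataclass

# Figures taken from the worked example in the text.
LABOR_RATE = 50                # $/hour
PRODUCTION_LOSS_RATE = 2_000   # $/hour, assuming the line is at capacity
ASSET_LIFE_YEARS = 10


@dataclass(frozen=True)
class Task:
    """One recurring maintenance task (hypothetical helper type)."""
    name: str
    materials: int        # $ per event
    hours: int            # downtime hours per event
    times_per_year: int

    def cost_per_event(self) -> int:
        # Materials + labor + lost production for a single event.
        return self.materials + self.hours * (LABOR_RATE + PRODUCTION_LOSS_RATE)


def life_cycle_cost(tasks) -> int:
    """Total cost over the asset life for a list of tasks."""
    annual = sum(t.times_per_year * t.cost_per_event() for t in tasks)
    return ASSET_LIFE_YEARS * annual


def annual_downtime(tasks) -> int:
    """Hours of maintenance downtime per year."""
    return sum(t.times_per_year * t.hours for t in tasks)


use_based = [
    Task("part swap", materials=5, hours=4, times_per_year=6),
    Task("overhaul", materials=500, hours=10, times_per_year=1),
]
failure_based = [
    Task("repair after failure", materials=7_000, hours=8, times_per_year=3),
]

for label, tasks in (("use-based", use_based), ("failure-based", failure_based)):
    print(f"{label}: ${life_cycle_cost(tasks):,} over {ASSET_LIFE_YEARS} years, "
          f"{annual_downtime(tasks)} hours of downtime per year")
```

With these inputs, the use-based policy comes to $702,300 and 34 hours of downtime per year, and the failure-based policy to $702,000 and 24 hours. The two life-cycle costs differ by only $300 over 10 years, which is why they can be treated as equivalent.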

Two other metrics that you should consider when comparing approaches are as follows:

Asset utilization (an asset might be available for production but not utilized).
Product quality (such as scrap rate or thermal efficiency).

Thus, in the example above, if the run-to-failure approach resulted in higher scrap levels over the life of the asset, and use-based maintenance could be planned for off-hours because the asset wasn’t producing at capacity, then the run-to-failure approach loses any advantage.

Although it may be easy to calculate the superior approach in this example, is it really optimal? Suppose you adjusted the time interval that triggers the use-based maintenance from every two months to every three months? What if you used a different trigger, such as a meter reading, or even multiple triggers? What if you considered condition-based maintenance? What’s the effect of changing the skill level of the people conducting the maintenance?
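To get a feel for the first of these questions, here is a rough sketch of how the annual cost and downtime of the use-based policy change with the swap interval. It makes one big assumption that the example does not: that stretching the interval has no effect on reliability. In practice, the added failure risk is exactly the unknown an optimization model would need to estimate, so treat the output as the cost side of the trade-off only. The function names are hypothetical.

```python
# Per-event figures from the worked example in the text.
SWAP_HOURS = 4
SWAP_COST = 5 + SWAP_HOURS * 50 + SWAP_HOURS * 2_000    # $8,205 per part swap
OVERHAUL_HOURS = 10
OVERHAUL_COST = 500 + OVERHAUL_HOURS * 50 + OVERHAUL_HOURS * 2_000  # $21,000


def annual_use_based_cost(interval_months: int) -> float:
    """Annual cost if the part is swapped every `interval_months` months.

    ASSUMPTION: reliability is unaffected by the interval, so no failure
    costs are added. This is a simplification, not a claim from the text.
    """
    swaps_per_year = 12 / interval_months
    return swaps_per_year * SWAP_COST + OVERHAUL_COST


def annual_use_based_downtime(interval_months: int) -> float:
    """Annual planned downtime in hours under the same assumption."""
    return 12 / interval_months * SWAP_HOURS + OVERHAUL_HOURS


for months in (1, 2, 3, 4, 6):
    print(f"every {months} months: "
          f"${annual_use_based_cost(months):,.0f} per year, "
          f"{annual_use_based_downtime(months):.0f} hours of downtime")
```

Moving from a two-month to a three-month interval drops the planned cost from $70,230 to $53,820 per year. Whether that saving survives depends on how many extra failures the longer interval causes, which this sketch deliberately leaves out.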

Policy and program options
Before exploring the notion of optimization in any more detail, let’s define our policy options for maintenance initiation. The list of perceived options is long and confusing, including:

Run to failure.
Reactive.
Corrective.
Elective.
Proactive.
Preventive.
Planned.
Predictive.
Scheduled.

Every company defines these terms differently and there’s rarely a standard definition within a company. Nevertheless, the terms can be replaced with the following three policy options for maintenance initiation:

Failure-based (initiated after a random failure occurs).
Use-based (initiated on the basis of time or a meter reading).
Condition-based (initiated on the basis of a current physical state or trend).
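
One way to see how the three options differ is to write each as a rule for deciding when work is initiated. The sketch below is illustrative only: the field names, the thresholds (60 days, 1,000 meter units, a condition limit of 7.0) and the function `should_initiate` are assumptions made for the example, not definitions from any standard or CMMS product.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    FAILURE_BASED = "failure-based"
    USE_BASED = "use-based"
    CONDITION_BASED = "condition-based"


@dataclass
class AssetStatus:
    has_failed: bool
    days_since_service: int
    meter_since_service: float   # e.g. run hours or cycles (hypothetical)
    condition_reading: float     # e.g. a vibration level (hypothetical)


@dataclass
class Triggers:
    # All thresholds are made-up values for illustration.
    max_days: int = 60
    max_meter: float = 1_000.0
    condition_limit: float = 7.0


def should_initiate(policy: Policy, status: AssetStatus, triggers: Triggers) -> bool:
    """Return True if maintenance should be initiated under the given policy."""
    # A failure forces a repair no matter which policy is in place.
    if status.has_failed:
        return True
    if policy is Policy.FAILURE_BASED:
        # Nothing is done until the asset fails.
        return False
    if policy is Policy.USE_BASED:
        # Initiated on the basis of time or a meter reading.
        return (status.days_since_service >= triggers.max_days
                or status.meter_since_service >= triggers.max_meter)
    # Condition-based: initiated on the basis of the current physical state.
    return status.condition_reading >= triggers.condition_limit


status = AssetStatus(has_failed=False, days_since_service=61,
                     meter_since_service=200.0, condition_reading=3.0)
for policy in Policy:
    print(policy.value, should_initiate(policy, status, Triggers()))
```

For the sample status, only the use-based rule fires, because the asset is past its 60-day interval but is still running and still in good condition.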

This simplified list avoids the confusion associated with terms such as planned maintenance, which could be condition-based or use-based. Even preventive maintenance is confusing as some would define it as only use-based maintenance, while others would lump in condition-based maintenance as well.

Optimization parameters
A true optimization model could determine not only the appropriate approach to initiating maintenance, but also the optimal mix within a given policy. For example, for a condition-based maintenance policy, one consideration is the current state of deterioration, which ranges from brand new to complete failure, with various definable intermediate stages. Depending on the deterioration stage, a critical parameter to consider is the time interval between the regular inspections of the asset’s condition that determine what, if any, maintenance is required. Another key variable is the type of maintenance needed for a given condition, such as minor maintenance (replacing a component part) or major maintenance (a complete overhaul).
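
To make these parameters concrete, the sketch below lays them out as a lookup table: each deterioration stage maps to a maintenance type and to the interval before the next inspection. The stage names, actions and intervals are invented for illustration. An optimization model would be solving for these values, not taking them as given.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Deterioration stages from brand new to complete failure (hypothetical)."""
    NEW = "new"
    EARLY_WEAR = "early wear"
    ADVANCED_WEAR = "advanced wear"
    NEAR_FAILURE = "near failure"
    FAILED = "failed"


@dataclass(frozen=True)
class Response:
    action: str                 # "none", "minor" or "major"
    next_inspection_days: int   # interval until the next condition inspection


# Decision variables of a condition-based policy. All values are placeholders.
RESPONSE_BY_STAGE = {
    Stage.NEW: Response("none", 90),
    Stage.EARLY_WEAR: Response("none", 45),      # inspect more often as wear appears
    Stage.ADVANCED_WEAR: Response("minor", 30),  # replace a component part
    Stage.NEAR_FAILURE: Response("major", 90),   # complete overhaul, then reset
    Stage.FAILED: Response("major", 90),
}


def respond(stage: Stage) -> Response:
    """Look up the maintenance type and next inspection interval for a stage."""
    return RESPONSE_BY_STAGE[stage]


for stage in Stage:
    r = respond(stage)
    print(f"{stage.value}: {r.action} maintenance, "
          f"next inspection in {r.next_inspection_days} days")
```

Each entry in the table is a knob that an optimizer could adjust, which is why the search space grows so quickly once condition-based maintenance is on the table.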

Part two of this two-part series will appear next month.

E-mail Contributing Editor David Berger at [email protected]