Reliability ROI: How to implement a zero-sum maintenance strategy

So you’ve bought into the reliability-based maintenance concept lock, stock, and barrel and have developed a program that includes all of the key elements: planning and scheduling, predictive maintenance, and root-cause failure analysis. But that mountain of cash that all of the leading reliability pundits promised has yet to materialize. Worse, you’re spending more money than when you first started down the road to reliability.

Many organizations, disillusioned by the lack of progress toward their financial targets, end up cutting their fledgling reliability programs and go back to the reactive maintenance model. This ultimately keeps them from reaching best-in-class status in their marketplace. If you find your program in this situation, it might be because you made one crucial mistake: You didn’t follow the money.

Many reliability programs are what I’ll call technology-driven rather than money-driven. Reliability leaders focus a great deal of their time and energy searching for the latest breakthrough technology and spend less time crunching numbers. We’ve all been guilty of looking for the “better-built mousetrap.” The problem with this approach is that no matter how technologically advanced the mousetrap is, you still have to place the trap in the right place to catch the mouse. And the mouse in our story is money.

The deck is stacked against the reliability leader, as an ever-growing number of better-built mousetraps are entering the marketplace. An ever-growing army of salespeople is working hard to convince you that their company has the answer to your problem. I’ve got a news flash for you: I have yet to meet an executive who cared one iota about the latest technology, but all of them seem to care a whole lot about money. So if you want to get the type of support that will make your program successful, show them the money.

Don’t get me wrong; there are certain program elements such as planning and scheduling and full utilization of your CMMS that are foundational to successful reliability programs. The payback for these program elements usually lags the initial investment period by anywhere from several months to years. And you need to invest in these foundational elements to be successful. But many other program elements can give you immediate returns on your investment, bringing much-needed cash (i.e., credibility) to your effort. And you can find these hidden gems by following the money and using what I call a zero-sum maintenance strategy.

The principle of employing a zero-sum maintenance strategy is quite simple. It relies on the reliability leader’s ability to recognize that time and money are limited resources. The most successful leaders learn to focus on the opportunities that present the greatest potential return on their investment. Understanding the cost in both time and money, calculating the potential return on that investment, and then pursuing only the program elements that promise a reasonably high return will ensure that your efforts generate rather than consume cash.

How to get there? The following principles are a starting point.

1. Learn to do the math.
2. All maintenance tasks should address a specific failure mode. Use the least-expensive and most-effective task to do the job.
3. Start with centrifugal pumps.
4. Use life-cycle cost (LCC) analysis to make decisions.
5. Look at your PMs.
6. Don’t forget to explore initiatives that have little or no maintenance payback.

Learn to do the math

First and foremost, teach yourself to always think like an accountant instead of a maintenance professional. This is difficult for most of us; we are hard-wired to solve problems, not crunch numbers. But how many of your decisions actually generate a hard dollar payback? Take your vibration program. Most everyone would agree that a vibration analysis program is essential to any reliability excellence strategy. But ask yourself the following: How much money does your vibration program save? What does it cost to operate? How much equipment should you cover with this technology to maximize the return on investment (ROI)?

I have worked in the corn-milling industry my whole career. A well-known predictive maintenance company developed a benchmark study for the chemical industry that compared profitability with the percentage of equipment covered by a number of the most widely used predictive maintenance technologies. Early in my career, the company shared its study with me and suggested that the chemical industry was a good benchmark for the corn wet-milling industry. The service provider recommended that best-practice companies employ first-quartile equipment coverage levels. Figure 1 shows the percentage of equipment coverage levels by quartile for vibration analysis.

I’m not here to suggest the company wasn’t well-intentioned, but its recommended coverage level did not produce the best ROI for our facility. I can’t share our actual coverage percentage, but I can tell you it is significantly below first-quartile levels.

We developed our routes using the following criteria:

Only include rotating equipment whose failure modes can be readily detected with vibration analysis.
Use criticality analysis and an understanding of the cost of lost production to establish a threshold for equipment covered.
Review all equipment that fell below the cutoff. Review the 10-year history of the cost of maintaining the equipment. Estimate cost reduction that might be realized by covering the equipment with the technology.
Compare the cost reduction with the cost of adding the machine to the route. If the target ROI is achievable, add the machine to the route.
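
The last two criteria amount to a simple screening calculation. Below is a minimal sketch of it, assuming a per-machine annual route cost and an estimated fraction of historical maintenance cost that monitoring would avoid. The `Machine` fields, the equipment names, the dollar figures, and the target ROI of 3.0 are all hypothetical placeholders, not values from our plant.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    annual_maintenance_cost: float      # average from the 10-year history, $/yr
    expected_reduction_fraction: float  # estimated share of that cost avoided (0 to 1)
    annual_route_cost: float            # cost of covering the machine on the route, $/yr


def route_roi(m: Machine) -> float:
    """Estimated annual savings divided by the annual cost of coverage."""
    if m.annual_route_cost <= 0:
        raise ValueError("annual_route_cost must be positive")
    savings = m.annual_maintenance_cost * m.expected_reduction_fraction
    return savings / m.annual_route_cost


def should_add_to_route(m: Machine, target_roi: float) -> bool:
    """Add the machine only if the estimated ROI meets the target."""
    return route_roi(m) >= target_roi


# Hypothetical machines that fell below the criticality cutoff.
candidates = [
    Machine("transfer pump", 12_000.0, 0.25, 600.0),
    Machine("exhaust fan", 1_500.0, 0.20, 600.0),
]

for m in candidates:
    print(m.name, round(route_roi(m), 2), should_add_to_route(m, target_roi=3.0))
```

The hard part is not the arithmetic but the estimate of the reduction fraction, which should come from the repair history rather than from a vendor benchmark.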

All maintenance tasks should address a specific failure mode

All maintenance departments have a tribal history. Have you ever asked someone why a task was being performed a certain way and been told “because we’ve always done it that way”? When you start looking at all maintenance tasks and ask yourself which failure modes are being addressed by completing the task, you’ll be surprised by the inefficiencies that exist.

I would encourage you to develop a culture in which all work requests address specific failure modes. This is where reliability-centered maintenance (RCM) can be extremely beneficial. But RCM studies can take a great deal of time to complete, and time is a resource that’s always in short supply.

I have discovered a unique approach that can streamline the process. We use what I call RCM object-type templates. Take a common equipment class, such as centrifugal pumps, and complete a generic failure modes and effects analysis (FMEA) without identifying equipment-specific operating context or failure consequences.

This approach will let you quickly understand the general failure modes associated with the equipment class. Once you understand the general failure modes, look at the variety of tasks and technologies that you can use to identify and correct each failure mode. Ask yourself two questions:

Will the inspection task or technology address the failure mode?
If so, is it the most cost-effective method at my disposal?
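
Those two questions can be applied mechanically once a template exists. The sketch below assumes a template is just a list of failure modes plus a list of candidate tasks, each with an annual cost and the set of failure modes it can detect. The failure modes, tasks, and costs shown are hypothetical illustrations, not engineering guidance for any specific pump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    annual_cost: float   # $/yr to perform the task on one asset
    detects: frozenset   # failure modes the task can address


def cheapest_task_per_failure_mode(failure_modes, tasks):
    """For each failure mode, pick the lowest-cost task that addresses it.

    A value of None means no listed task addresses that failure mode.
    """
    plan = {}
    for mode in failure_modes:
        capable = [t for t in tasks if mode in t.detects]
        plan[mode] = min(capable, key=lambda t: t.annual_cost) if capable else None
    return plan


# Hypothetical object-type template for a centrifugal pump.
failure_modes = ["bearing wear", "seal leak", "impeller erosion", "casing corrosion"]
tasks = [
    Task("operator visual round", 200.0, frozenset({"seal leak"})),
    Task("vibration route", 600.0, frozenset({"bearing wear", "imbalance"})),
    Task("annual teardown inspection", 4000.0,
         frozenset({"bearing wear", "seal leak", "impeller erosion"})),
]

plan = cheapest_task_per_failure_mode(failure_modes, tasks)
for mode, task in plan.items():
    print(mode, "->", task.name if task else "no task in template")
```

A failure mode that maps to no task, or only to an expensive one, is a prompt to look for a better technology or to accept run-to-failure for that mode.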

Start with centrifugal pumps

If I had to pick one place to set mousetraps in your facility, I’d set them around your centrifugal pumps. Why start there? Centrifugal pumps are the most common type of equipment found in most facilities; reliability principles for this asset class are well understood; and many pumps are not designed or installed to best-in-class reliability standards. In addition, the payback for pump improvements is usually twofold, yielding both energy and maintenance savings.

Review your list of bad-actor pumps. Unless the primary failure modes are related to the material of construction or to mechanical seal fluid film quality or consistency, chances are the pump is operating far from its best efficiency point (BEP). Many times when this is the case, the potential energy savings will be 2.5 to 3 times greater than the maintenance savings. Do not forget to tap into energy-efficiency rebates; they can be a great source of payback when you complete your pump improvement project.

We use a data-driven approach to determine whether the proposed solution will provide a meaningful return on our investment. Our assessment process is as follows:

1. Understand the primary failure mode(s).
2. Model the process operating requirements using a system curve rather than a single duty-point approach. You will be surprised how seldom pumps are modeled using a system curve. If you doubt this, ask your local pump supplier.
3. Perform a machine assessment, looking for mechanical root causes of failure. Examples include pipe strain or mechanical looseness resulting from base deterioration.
4. If the primary failure mode is related to mechanical seal leaks and the analysis in steps 2 or 3 does not identify the root cause, chances are that the root cause is related to seal fluid film quality, pressure, or fluid film stability. We have found that modifying API Plan 54 arrangements to our best-practice standard or installing a “seal pot” API Plan 53A usually resolves the issue.
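
To illustrate the system-curve step, here is a minimal sketch that finds where a pump curve crosses a system curve and compares that flow with the BEP flow. It assumes both curves can be approximated as quadratics: pump head = shutoff head - a·Q², and system head = static head + k·Q². Real pump curves come from the manufacturer and rarely fit a pure quadratic, and every number below (heads in feet, flows in gpm, the BEP flow) is a hypothetical placeholder.

```python
import math


def operating_flow(shutoff_head, pump_coeff, static_head, friction_coeff):
    """Flow at which the pump curve meets the system curve.

    Pump curve (assumed):   H = shutoff_head - pump_coeff * Q**2
    System curve (assumed): H = static_head + friction_coeff * Q**2
    Setting the two equal and solving for Q gives the operating point.
    """
    if shutoff_head <= static_head:
        raise ValueError("pump cannot overcome the static head")
    return math.sqrt((shutoff_head - static_head) / (pump_coeff + friction_coeff))


def system_head(flow, static_head, friction_coeff):
    """Head required by the system at a given flow."""
    return static_head + friction_coeff * flow ** 2


# Hypothetical pump and system.
q_op = operating_flow(shutoff_head=200.0, pump_coeff=1e-4,
                      static_head=50.0, friction_coeff=5e-5)
h_op = system_head(q_op, static_head=50.0, friction_coeff=5e-5)
bep_flow = 1400.0  # hypothetical BEP flow from the pump data sheet, gpm

print(f"operating point: {q_op:.0f} gpm at {h_op:.0f} ft")
print(f"operating at {q_op / bep_flow:.0%} of BEP flow")
```

Repeating the calculation for each operating scenario (different static heads or valve positions) shows the range of flows the pump actually sees, which a single duty point hides.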

Make sure you include the cost of evaporating water for inboard seal leaks, and the cost of waste treatment for outboard seal leaks or for any water not recycled to a collection tank. Again, knowing the cost of these variables is crucial to making a wise investment. This approach has yielded great returns on our invested time and money.

To quickly model system performance, one must know what size impeller is in the pump. This sounds simple, but when we started our journey toward world-class pump reliability we did not know the answer to this question for the majority of pumps in our plant. Engineering records were not always readily available, and process changes were not always documented.

If this is a problem at your facility, I’d suggest that, wherever the impeller size is not known, the job plan for a pump change-out include a request for the craftsperson to measure and record the impeller size. Include that information in your CMMS. Because we do not stock all impeller trim sizes, choosing instead to stock only full and the most common trim sizes, we include the information as a text line rather than a material code.

Use life-cycle cost (LCC) analysis to make decisions

I’ve heard many people talk about life-cycle cost (LCC) analysis, but I don’t believe it’s commonly used for problem-solving/decision-making. It’s not as difficult as many would make it out to be, but it does require discipline. The reason I’ve heard most often for not using LCC analysis is that it’s too difficult to calculate maintenance and operating costs. But if you have modeled your process and know your annual production capacity requirements and the motor and driven-equipment nameplate efficiency, energy costs will be easy to calculate.

And if you have maintenance strategies and repair histories for like equipment in a similar operating context, maintenance costs can be estimated. To become proficient with this or any other tool, however, you have to use it. As you complete more and more LCC analyses, your estimating techniques will improve.
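
The energy side of the calculation can be sketched in a few lines. The example below uses the standard hydraulic power relation (density × gravity × flow × head) and divides by pump and motor efficiency to get electrical demand. The flow, head, efficiencies, operating hours, and electricity price are hypothetical placeholders, and the sketch assumes the pump runs at a single operating point all year.

```python
GRAVITY = 9.81  # m/s^2


def hydraulic_power_kw(flow_m3_per_s, head_m, density_kg_per_m3=1000.0):
    """Power delivered to the fluid, in kW."""
    return density_kg_per_m3 * GRAVITY * flow_m3_per_s * head_m / 1000.0


def annual_energy_cost(flow_m3_per_s, head_m, pump_efficiency, motor_efficiency,
                       hours_per_year, price_per_kwh, density_kg_per_m3=1000.0):
    """Annual electricity cost for a pump running at one operating point."""
    if not (0 < pump_efficiency <= 1 and 0 < motor_efficiency <= 1):
        raise ValueError("efficiencies must be in the range (0, 1]")
    electrical_kw = (hydraulic_power_kw(flow_m3_per_s, head_m, density_kg_per_m3)
                     / (pump_efficiency * motor_efficiency))
    return electrical_kw * hours_per_year * price_per_kwh


# Hypothetical water pump: 0.05 m^3/s at 30 m of head.
cost = annual_energy_cost(flow_m3_per_s=0.05, head_m=30.0,
                          pump_efficiency=0.75, motor_efficiency=0.93,
                          hours_per_year=8000, price_per_kwh=0.07)
print(f"estimated annual energy cost: ${cost:,.0f}")
```

Note that pump efficiency at the actual operating point can be well below the nameplate figure when the pump runs far from BEP, which is exactly why the system-curve model matters.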

Figure 2 shows a sample LCC analysis for one of our bad-actor pumps. We completed the analysis and chose option 3 because it had the lowest total cost of ownership over the course of the pump’s life cycle.

You will notice that option 1 tells us to “do nothing.” As previously stated, we are hard-wired to solve problems, so living with the problem does not feel like a solution to the reliability professional. But if the LCC analysis indicates that doing nothing produces the lowest total cost of ownership and you have the time to manage the problem, I’d recommend that this be your course of action.
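
A comparison of this kind can be sketched as a present-value calculation. The example below assumes constant annual energy and maintenance costs and a single discount rate. The three options and all of their costs are hypothetical placeholders and are not taken from Figure 2.

```python
def life_cycle_cost(initial_cost, annual_energy, annual_maintenance,
                    years, discount_rate):
    """Initial cost plus the present value of annual operating costs."""
    annual = annual_energy + annual_maintenance
    present_value = sum(annual / (1 + discount_rate) ** t
                        for t in range(1, years + 1))
    return initial_cost + present_value


# Hypothetical options: (initial cost, annual energy, annual maintenance), in dollars.
options = {
    "1: do nothing": (0.0, 30_000.0, 14_000.0),
    "2: rebuild in kind": (8_000.0, 30_000.0, 9_000.0),
    "3: resize pump": (45_000.0, 18_000.0, 1_000.0),
}

results = {
    name: life_cycle_cost(initial, energy, maint, years=15, discount_rate=0.08)
    for name, (initial, energy, maint) in options.items()
}

for name, lcc in results.items():
    print(f"{name}: ${lcc:,.0f}")

best = min(results, key=results.get)
print("lowest life-cycle cost:", best)
```

With different inputs, "do nothing" can come out lowest; the calculation, not the instinct to fix things, should make the call.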

The project was completed and the new pump commissioned 27 months ago. I’m happy to report that the maintenance costs since startup for this example have been only $225. Compare this with an average annual maintenance cost of $14,500 for the three-year period before the modification, and you can readily see the cash flow that can be generated by attacking your bad-actor pumps.
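
To put the two figures on the same basis, the 27-month total can be annualized before comparing it with the earlier annual average. The short sketch below does only that arithmetic; it assumes costs accrue evenly over the period, and 27 months is a short window relative to a pump’s full life cycle.

```python
def annualized_cost(total_cost, months):
    """Scale a cost observed over some number of months to a 12-month rate."""
    if months <= 0:
        raise ValueError("months must be positive")
    return total_cost * 12 / months


before = 14_500.0                      # average annual maintenance cost before the change
after = annualized_cost(225.0, 27)     # annualized cost since startup
savings = before - after

print(f"annualized cost since startup: ${after:,.0f}")
print(f"annual maintenance savings:    ${savings:,.0f}")
print(f"reduction: {savings / before:.1%}")
```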

Look at your PMs

PM optimization is another fertile field to look to for cost savings. If you have a preventive maintenance program and you haven’t performed PM optimization (PMO), chances are you have too many PMs. PMs should be used only to address failure modes that are time-based or are hidden from detection without performing the inspection. RCM studies indicate that only 10%–12% of failure modes exhibit these types of failure patterns. As a first cut, you should review existing PMs and, recalling principle No. 2, ask the following questions:

Which failure mode is the PM supposed to prevent?
Is there a more efficient task you can use to avoid or address the failure?

Remember, the act of starting and stopping equipment induces failure modes, so use PMs only where condition-based tasks will not do. When we employed our first cut, we were able to eliminate $157,000 worth of PMs without increasing the number of failures.

But this first cut is only the starting point of PMO. PM frequency can be optimized by analyzing the cost avoidance (both production impact and the failure maintenance that occurs between PM intervals) and comparing it with the cost of performing the PM.

Figure 3 illustrates this principle. Plotting the cost of the PM against the combined cost avoidance (production impact plus failure maintenance) across a range of PM frequencies reveals the frequency that produces the lowest total cost of ownership.
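
One way to put numbers on this trade-off is the textbook age-replacement model, sketched below. It is offered as an illustration under stated assumptions and is not necessarily the method behind Figure 3. It assumes failures follow a Weibull distribution, that a PM restores the equipment to as-new condition, and that the failure cost includes lost production. The Weibull parameters and both costs are hypothetical placeholders.

```python
import math


def reliability(t, eta, beta):
    """Weibull survival probability at age t (eta = scale, beta = shape)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(interval, pm_cost, failure_cost, eta, beta, steps=1000):
    """Expected cost per unit time when PM is done at a fixed age `interval`.

    Expected cost per cycle divided by expected cycle length, where the
    cycle length is the integral of the reliability function (trapezoid rule).
    """
    dt = interval / steps
    expected_length = sum(
        0.5 * (reliability(i * dt, eta, beta)
               + reliability((i + 1) * dt, eta, beta)) * dt
        for i in range(steps)
    )
    r_end = reliability(interval, eta, beta)
    expected_cost = pm_cost * r_end + failure_cost * (1.0 - r_end)
    return expected_cost / expected_length


def best_interval(candidates, pm_cost, failure_cost, eta, beta):
    """Candidate interval with the lowest expected cost per unit time."""
    return min(candidates,
               key=lambda t: cost_rate(t, pm_cost, failure_cost, eta, beta))


months = range(1, 37)  # candidate PM intervals, in months

# Wear-out failure pattern (beta > 1): an intermediate interval is cheapest.
wear_out = best_interval(months, pm_cost=500.0, failure_cost=10_000.0,
                         eta=24.0, beta=3.0)
# Random failure pattern (beta = 1): the longest interval is always cheapest.
random_pattern = best_interval(months, pm_cost=500.0, failure_cost=10_000.0,
                               eta=24.0, beta=1.0)

print("wear-out pattern, best interval (months):", wear_out)
print("random pattern, best interval (months):", random_pattern)
```

The second case makes the earlier point in numbers: when failures are not time-based, no PM interval pays for itself, and the model simply pushes the interval as far out as it is allowed to go.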

Don’t forget to explore initiatives that have little or no maintenance payback

Finally, do not forget activities that produce good ROI but have little or no favorable impact on the maintenance budget. There are tried-and-true maintenance program elements that, while increasing maintenance costs, produce great returns on time and money invested.

Consider, for instance, an ultrasonic (UE) leak detection program for steam traps. The UE steam trap program hits maintenance budgets twice. First, the maintenance department has to pay for the resource to perform the survey. Then it has to pay for the steam traps to be replaced. But although our steam trap program increases maintenance costs, it has a total ROI of between 5:1 and 7:1.
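
The two views of such a program, the maintenance budget and the site as a whole, can be separated with a short calculation. The sketch below assumes the only benefit is avoided steam loss and the only costs are the survey and the trap replacements; all three dollar figures are hypothetical placeholders.

```python
def program_roi(annual_savings, survey_cost, replacement_cost):
    """Site-level return: annual savings per dollar of program cost."""
    total_cost = survey_cost + replacement_cost
    if total_cost <= 0:
        raise ValueError("program cost must be positive")
    return annual_savings / total_cost


def maintenance_budget_impact(survey_cost, replacement_cost):
    """Increase in maintenance spend; the energy savings land in another budget."""
    return survey_cost + replacement_cost


# Hypothetical steam trap program for one year.
survey = 10_000.0
replacements = 30_000.0
steam_savings = 240_000.0

print(f"maintenance budget impact: +${maintenance_budget_impact(survey, replacements):,.0f}")
print(f"site-level ROI: {program_roi(steam_savings, survey, replacements):.1f}:1")
```

Reporting both numbers side by side helps keep a program like this from being cut simply because its cost and its benefit show up in different budgets.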

If you want to get better financial results from your reliability program, stop chasing technology and follow the money. Get into the habit of being able to cost-justify each of your program elements. Do not get caught in the pitfall of constantly pursuing a better-built mousetrap. Sound reliability principles have not changed much over the years. Take the time to do the math and understand what the activities you pursue cost in time and money and the kind of return on investment you stand to gain. Think like an accountant and use the other five suggested zero-sum maintenance strategies and see how many mice (dollars) you end up catching with this bait.