Revisiting reliability and your CMMS: How to reduce costs with a reliability management program

David Berger explores how to use all of the tools your system offers to lift reliability and justify investments.

By David Berger

Reliability is one of the oldest concepts in asset management, but in recent years it has become a hot focus area for companies. To a large extent, this is because we have become so reliant on automated equipment, the internet, and information systems.

Many CMMS vendors have been enhancing reliability-related functionality, accelerating the trend. Below is a summary of basic CMMS features and functions that can help you reduce costs when you establish and use a reliability management program.

Criticality analysis: Most CMMS packages are able to calculate or at least record the criticality of each piece of your equipment and its component parts as a coded field on the equipment master file. Preventive and predictive tasks can then be defined to avoid failure of assets flagged as having a higher criticality. In addition, the user can record corrective tasks required in the event of mechanical breakdown. In some cases, redundancy or “mirroring” can be a relatively inexpensive way to minimize asset downtime.

Failure analysis: Coded fields on the CMMS simplify data collection and force consistent reporting of failures. Most CMMS packages also offer descriptive fields, allowing for input of more-detailed failure explanations.

A “problem” code refers to how a breakdown is reported. It is usually tied to a given class of assets – for example, motors, pumps, or rooms. In a facilities operation, a tenant might report that a room is excessively hot or cold.

A “cause” code is determined by the maintainer upon investigating the problem. The more-advanced CMMS packages tie a set of cause codes to a given problem code of a certain asset class, thereby creating a nested hierarchy of codes. In the example above, possible causes tied to asset = “room” and problem = “too cold” may be failed thermostat, blown circuit breaker, inoperative fan, and so on.

The “action” code records what work was done to fix the problem. Again in the example above, nested and hierarchical action codes might be “repaired fan,” “reset circuit breaker,” or “replaced thermostat.”

Finally, a “delay” code explains why operations have temporarily ceased – maybe there’s a wait for raw materials, or an operator break, or a product changeover. Identifying the most frequent and time-consuming reasons for the delay will provide valuable insight into the priority of problems that need to be addressed.
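The nested problem, cause, and action codes described above can be pictured as a lookup keyed first by asset class, then by problem code, then by cause code. The Python sketch below is only an illustration: the names `FAILURE_CODES`, `causes_for`, and `action_for` are hypothetical, and pairing each cause with a single action is an assumption made to keep the example small. A real CMMS would hold these codes in its own tables.

```python
# Hypothetical nested failure-code hierarchy: asset class -> problem -> cause -> action.
# The codes come from the room example in the text; pairing each cause with
# exactly one action is an assumption made for illustration.
FAILURE_CODES = {
    "room": {
        "too cold": {
            "failed thermostat": "replaced thermostat",
            "blown circuit breaker": "reset circuit breaker",
            "inoperative fan": "repaired fan",
        },
    },
}


def causes_for(asset_class, problem):
    """Return the cause codes valid for this asset class and problem code."""
    return sorted(FAILURE_CODES.get(asset_class, {}).get(problem, {}))


def action_for(asset_class, problem, cause):
    """Return the action code tied to a cause, or None if the combination is unknown."""
    return FAILURE_CODES.get(asset_class, {}).get(problem, {}).get(cause)


print(causes_for("room", "too cold"))
print(action_for("room", "too cold", "inoperative fan"))
```

Restricting the maintainer's choices to the causes listed under the reported problem is what keeps the coded history consistent enough to analyze later.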

Pareto analysis: Failures can be prioritized in terms of impact on safety, operational output, cost, and other factors. Use statistical analysis of equipment history to determine the high-frequency, high-impact problems, their underlying causes, and most cost-effective actions. Pareto analysis is one such tool. A Pareto chart is nothing more than a frequency distribution of problem codes that can be plotted on a simple spreadsheet. More-sophisticated CMMS packages can assist with this kind of analysis.
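For readers who want to try this outside a spreadsheet, here is a minimal Python sketch of the frequency distribution behind a Pareto chart. The list of problem codes is invented sample data, and `pareto_table` is a hypothetical helper name, not a function of any CMMS.

```python
from collections import Counter

# Hypothetical problem codes pulled from work-order history.
problem_codes = [
    "too cold", "too hot", "too cold", "leak", "too cold",
    "too hot", "noise", "too cold", "leak", "too cold",
]


def pareto_table(codes):
    """Return (code, count, cumulative percent) rows, most frequent first."""
    counts = Counter(codes).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for code, count in counts:
        running += count
        rows.append((code, count, round(100 * running / total, 1)))
    return rows


for code, count, cum_pct in pareto_table(problem_codes):
    print(f"{code:10s} {count:3d} {cum_pct:6.1f}%")
```

The cumulative-percent column shows how few problem codes account for most of the reported failures. Weighting each code by downtime or cost instead of a simple count would follow the same pattern.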

Diagnostic analysis: The most advanced CMMS software is moving away from simply reporting on coded history. Far more useful is a knowledge-based or rules-based troubleshooting database for identifying the best course of action for a given problem. If, for example, a motor fails in a given piece of equipment, the diagnostic tool determines the statistical likelihood of each cause code and suggests corresponding actions to consider. Additionally, correlations can be made with equipment or part vendors to determine whether there is a higher failure rate originating from a given vendor. This allows you to take preventive or predictive steps to minimize costly downtime and/or approach the vendor to fix the problem.
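A very rough way to picture the "statistical likelihood of each cause code" is as the share of past occurrences of a problem that were traced to each cause. The Python sketch below assumes a flat list of closed work orders; the records and the function name `cause_likelihoods` are hypothetical, and a commercial diagnostic tool would likely use far richer rules.

```python
from collections import Counter

# Hypothetical closed work orders: (asset class, problem code, cause code).
history = [
    ("motor", "will not start", "blown fuse"),
    ("motor", "will not start", "failed bearing"),
    ("motor", "will not start", "blown fuse"),
    ("motor", "overheating", "failed bearing"),
    ("motor", "will not start", "blown fuse"),
]


def cause_likelihoods(records, asset_class, problem):
    """Estimate the share of past occurrences attributed to each cause code."""
    causes = Counter(
        cause for a, p, cause in records if a == asset_class and p == problem
    )
    total = sum(causes.values())
    if total == 0:
        return {}
    return {cause: count / total for cause, count in causes.most_common()}


print(cause_likelihoods(history, "motor", "will not start"))
```

Adding a vendor field to each record and grouping on it in the same way would give the vendor-level failure comparison mentioned above.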

Asset performance analysis: One effective way to focus attention on asset care is to show the relationship between equipment reliability and operational productivity. This can be accomplished by tracking simple measures on the CMMS such as maintenance cost per unit of output or operations cost per minute of equipment downtime. More important than the actual value of each measure is the trend over time.
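As a small illustration of tracking such a measure over time, the Python sketch below computes maintenance cost per unit of output by month and checks whether it is falling. The monthly figures are made up, and `cost_per_unit` is a hypothetical helper.

```python
# Hypothetical monthly totals: (month, maintenance cost in dollars, units produced).
monthly = [
    ("2024-01", 12000.0, 4000),
    ("2024-02", 11000.0, 4000),
    ("2024-03", 10500.0, 4200),
]


def cost_per_unit(rows):
    """Return (month, maintenance cost per unit of output) for each month."""
    return [(month, cost / units) for month, cost, units in rows]


trend = cost_per_unit(monthly)
for month, value in trend:
    print(f"{month}: ${value:.2f} per unit")

# The direction of change matters more than any single value.
improving = all(later[1] < earlier[1] for earlier, later in zip(trend, trend[1:]))
print("improving" if improving else "not consistently improving")
```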

Analysis of other measures: CMMS packages offer other useful measures of reliability. Two such measures that are rising in importance are mean time between failures (MTBF) and mean time to repair (MTTR). By tracking MTBF and MTTR for each critical asset, companies can determine whether they’re making reliability progress.
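Definitions vary somewhat between organizations, so treat the following Python sketch as one common convention rather than the calculation any given CMMS performs: MTBF is taken as operating (up) time divided by the number of failures, and MTTR as total repair time divided by the number of repairs. The figures and the function names `mtbf` and `mttr` are hypothetical.

```python
# Hypothetical figures for one critical asset over a reporting period.
period_hours = 720.0            # e.g. a 30-day month of scheduled operation
repair_hours = [4.0, 6.0, 2.0]  # downtime for each failure in the period


def mtbf(period, repairs):
    """Mean time between failures: operating (up) time divided by failure count."""
    if not repairs:
        return None  # no failures recorded, so MTBF is undefined for the period
    return (period - sum(repairs)) / len(repairs)


def mttr(repairs):
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repairs:
        return None
    return sum(repairs) / len(repairs)


print(f"MTBF: {mtbf(period_hours, repair_hours):.1f} h")
print(f"MTTR: {mttr(repair_hours):.1f} h")
```

Running the same calculation for each reporting period gives the trend line that shows whether reliability is improving.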

Condition monitoring and analysis: This is an important feature of every CMMS. The simplest packages let users manually input data such as setpoints or equipment-use meter readings for triggering PM routines. The more sophisticated CMMS is connected to automated data collection sources via the internet of things, through integration with third-party software, or via a direct connection to the source. The software then analyzes incoming data to ensure that it trends within user-defined control limits. When data strays outside the control limits, users are alerted and/or an automatic action, such as issuing a work order, is taken.
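The control-limit check itself is simple to picture. The Python sketch below flags readings outside a user-defined band; the readings, the limits, and the function name `out_of_limits` are all hypothetical, and the print statement merely stands in for whatever alert or work-order call a particular CMMS provides.

```python
def out_of_limits(readings, lower, upper):
    """Return the (timestamp, value) readings that fall outside the control limits."""
    return [(ts, v) for ts, v in readings if v < lower or v > upper]


# Hypothetical bearing-temperature readings in degrees Celsius.
readings = [("08:00", 61.0), ("09:00", 64.5), ("10:00", 78.2), ("11:00", 66.0)]

for ts, value in out_of_limits(readings, lower=50.0, upper=75.0):
    # A real CMMS would raise an alert or issue a work order here;
    # printing stands in for that action.
    print(f"ALERT {ts}: {value} outside 50.0-75.0, work order requested")
```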

PM cost analysis: Everyone agrees that a planned environment is far superior to a reactive environment with its constant firefighting. What isn’t always clear is the break-even point between these two worlds. A technique sometimes referred to as Weibull analysis plots the decreasing cost of planned maintenance against the increasing cost of unplanned maintenance as the PM interval increases. The PM interval at which the two curves meet is generally taken as the point where cost is optimized.
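To make the two curves concrete, the Python sketch below uses a deliberately simple cost model, not a Weibull fit: planned cost per week falls in proportion to the PM interval, and unplanned cost per week rises in proportion to it. Both constants and the function name `weekly_costs` are assumptions for illustration only.

```python
# Assumed toy cost model (not fitted to real data):
#   planned cost per week   = PM_COST / interval      (falls as the interval grows)
#   unplanned cost per week = FAILURE_RATE * interval (rises as the interval grows)
PM_COST = 400.0      # hypothetical cost of one PM routine, in dollars
FAILURE_RATE = 25.0  # hypothetical breakdown cost per week, per week of interval


def weekly_costs(interval_weeks):
    """Return (planned, unplanned, total) cost per week for a PM interval."""
    planned = PM_COST / interval_weeks
    unplanned = FAILURE_RATE * interval_weeks
    return planned, unplanned, planned + unplanned


intervals = range(1, 13)
best = min(intervals, key=lambda t: weekly_costs(t)[2])

for t in intervals:
    planned, unplanned, total = weekly_costs(t)
    marker = "  <- lowest total" if t == best else ""
    print(f"{t:2d} wk  planned {planned:7.2f}  unplanned {unplanned:7.2f}  total {total:7.2f}{marker}")
```

In this particular model the lowest total cost falls exactly where the planned and unplanned curves cross. With other curve shapes the minimum of the total can sit elsewhere, so it is worth plotting the combined cost as well as the two components.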

Predictive maintenance analysis: A simple example of this is comparing the history of engine failures with the condition of the lubrication oil before the failure. It may then be possible to predict the need to replace the oil, the rings, and so on, given the trends in oil temperature, viscosity, and the amount and type of particulate in it.

Lifecycle analysis: One of the key decisions in any reliability management program is when to repair or replace a given asset. Suppose, in my earlier room example, that the problem “too cold” had been caused by a failed thermostat in, say, 80% of the cases reported in equipment history. The average cost of repairing the unit may have been $225 for parts and labor. Further analysis reveals that replacing all of the thermostats would cost only $125/unit. Moreover, preventing failure would ensure that tenants are not left in the cold, especially during extended cold spells. Thus, repair/replace decisions can be justified based on statistical analysis of equipment history and cost data.
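The arithmetic behind that comparison can be sketched in a few lines of Python. The $225 and $125 figures come from the example above; the building size, the expected number of failures, and the function name `compare` are hypothetical. The sketch also assumes that the $225 average applies to each thermostat-related repair and looks only at direct cost, ignoring tenant discomfort.

```python
REPAIR_COST = 225.0   # average reactive repair, parts and labor (from the text)
REPLACE_COST = 125.0  # planned thermostat replacement per unit (from the text)


def compare(units, expected_failures):
    """Return (reactive cost, planned cost) for a group of units."""
    reactive = REPAIR_COST * expected_failures
    planned = REPLACE_COST * units
    return reactive, planned


# Share of units that must be expected to fail before blanket replacement
# is cheaper on direct cost alone.
break_even_share = REPLACE_COST / REPAIR_COST

# Hypothetical building: 40 rooms, 30 of which are expected to need a repair.
reactive, planned = compare(units=40, expected_failures=30)
print(f"break-even failure share: {break_even_share:.1%}")
print(f"reactive ${reactive:,.0f} vs planned ${planned:,.0f}")
```

Under these assumptions, replacing every thermostat pays for itself once a little over half of the units would otherwise be expected to fail. Equipment history is what tells you whether that threshold is likely to be met.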