The risk management balancing act: How to recognize and navigate the tradeoffs

David Berger explores using your CMMS to maximize asset performance, availability, and reliability.

By David Berger

Risk is a key consideration in striking the right balance between minimizing the total cost of ownership of your assets and maximizing their performance, availability, and reliability. It is all about the tradeoffs you make when you try to optimize your asset design, work program, and work schedule. This column provides some basic guidelines on how to recognize and navigate these tradeoffs.

The very first tradeoff you make is in the design and selection of the assets you build or purchase. How do risk and upfront costs change when asset design specifications change? Examples of design changes are specifying a better-quality construction material, replacing an automated control loop with a manual one, or eliminating redundant component systems. During the design / build / purchase phase of acquiring an asset, weigh the change in risk against the savings achieved by adding, changing, or eliminating a design specification. Some considerations include the following:

  • Will degrading or eliminating a specific design specification change the total cost of ownership for the full asset lifecycle, not just the cost to build / purchase the asset? Put differently, will the cost to operate and maintain an asset increase if the quality of material used in its construction is lower?
  • Will degrading or eliminating a specific design specification significantly increase the probability of risk factors such as harm to the health and safety of employees or the public, environmental spills, unexpected asset failure, or adverse public or government reaction?
  • Will it result in a positive or negative change to the potential impact of each risk factor (e.g., the operational risk goes from negligible to catastrophic impact if you remove the automated control loop)?
  • Is the change in cost outweighed by the resultant change in risk?

For example, suppose removing a redundant component will save $1 million, but adds $200,000 in additional maintenance costs to detect and predict failure, for a net savings of $800,000. If the probability of a catastrophic failure increases by 3%, with an estimated $100 million in health and safety, environmental, operational, and reputational impact, then the added risk exposure is about $3 million (3% of $100 million). That is almost four times the savings, so removing the component is not recommended.
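The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration, not a feature of any CMMS: the class and field names are hypothetical, the "3%" is read as an added failure probability of 0.03, and risk is modeled as a simple expected loss (change in probability times impact).

```python
from dataclasses import dataclass


@dataclass
class DesignChange:
    """Hypothetical record of one proposed design change."""
    upfront_savings: float       # build / purchase cost avoided
    added_lifecycle_cost: float  # extra cost to operate and maintain
    probability_increase: float  # added failure probability, as a fraction
    failure_impact: float        # estimated cost if the failure occurs

    def net_savings(self) -> float:
        return self.upfront_savings - self.added_lifecycle_cost

    def added_risk(self) -> float:
        # Assumed model: expected loss = change in probability x impact.
        return self.probability_increase * self.failure_impact

    def is_recommended(self) -> bool:
        # Recommend the change only if the savings exceed the added risk.
        return self.net_savings() > self.added_risk()


# Figures from the redundant-component example above.
change = DesignChange(
    upfront_savings=1_000_000,
    added_lifecycle_cost=200_000,
    probability_increase=0.03,
    failure_impact=100_000_000,
)

ratio = change.added_risk() / change.net_savings()
print(f"Net savings: ${change.net_savings():,.0f}")
print(f"Added risk:  ${change.added_risk():,.0f}")
print(f"Risk is {ratio:.2f} times the savings")
print("Recommended" if change.is_recommended() else "Not recommended")
```

In practice, the probability and impact estimates carry far more uncertainty than the cost figures, so the comparison is best treated as a screening tool rather than a precise answer.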

Once the asset is in production, the CMMS should be used to record the final risk score based on the probability and impact of the risk factors mentioned previously. This score is referred to as the “asset criticality”, which is a field available in most CMMS packages. Some CMMS packages provide a more sophisticated risk scoring algorithm to derive the asset criticality, but most packages have only a static field. Assets with scores above a user-defined threshold would be deemed “critical assets” and flagged as high risk / priority.
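As a rough sketch of how such a score might be derived, the Python below rates each risk factor for probability and impact and takes the worst product as the asset criticality. The 1-to-5 scales, the use of the maximum rather than a sum or weighted average, and the threshold of 15 are all assumptions made for illustration; each CMMS package and each organization will define its own.

```python
from dataclasses import dataclass


@dataclass
class RiskFactor:
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (catastrophic)

    def score(self) -> int:
        return self.probability * self.impact


def asset_criticality(factors: list[RiskFactor]) -> int:
    """Assumed rule: criticality is the worst single risk-factor score."""
    return max(f.score() for f in factors)


def is_critical(criticality: int, threshold: int = 15) -> bool:
    """Flag assets whose score is above the user-defined threshold."""
    return criticality > threshold


# Hypothetical ratings for one asset.
pump_factors = [
    RiskFactor("health and safety", probability=2, impact=5),
    RiskFactor("environmental spill", probability=1, impact=4),
    RiskFactor("unexpected failure", probability=4, impact=4),
]

pump_criticality = asset_criticality(pump_factors)
print(pump_criticality, is_critical(pump_criticality))
```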

Once an asset is operational, it should be maintained based on an optimized work program. The work program begins with determining optimal maintenance policies for each asset or its components, starting with the most critical assets. For example, how often should a given piece of equipment be unavailable to the operations team, in order to inspect / replace a filter, and under which of the following maintenance policies?

  • Fail-based maintenance (FBM) – let the asset run to failure and then replace the filter
  • Condition-based maintenance (CBM) – change the filter when periodic inspection reveals that a certain trigger point has been reached (degraded flow rate, color of filter, level of particulate in flow)
  • Use-based maintenance (UBM) – change the filter at a given time interval, meter reading, or event, regardless of its condition

Your choice of maintenance policy will ultimately affect three fiscal areas: the cost of maintaining the equipment (asset reliability and lifecycle cost), the cost of operating the equipment (degradations in asset performance and quality of output), and the revenue stream derived from operating the equipment (asset availability). Thus, selection of the optimal maintenance policies is based on a reasonable tradeoff between the cost of maintenance and the probability and impact of asset failure. More critical equipment that has a high risk of catastrophic failure would justify a more expensive approach to maintenance, such as a major overhaul each year and/or condition-based maintenance with online, real-time inspection involving multiple measures.
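One simple way to picture this tradeoff is to compare the expected annual cost of each policy: the cost of the maintenance itself plus the probability of failure times its impact. The Python sketch below does exactly that. The cost and probability figures are invented for illustration, and the single-number cost model is an assumption; a real analysis would also account for degraded performance and lost revenue.

```python
from dataclasses import dataclass


@dataclass
class PolicyOption:
    name: str
    annual_maintenance_cost: float     # hypothetical cost of the policy
    annual_failure_probability: float  # hypothetical, as a fraction


def expected_annual_cost(option: PolicyOption, failure_impact: float) -> float:
    # Assumed model: maintenance cost plus expected cost of failure.
    return (option.annual_maintenance_cost
            + option.annual_failure_probability * failure_impact)


def cheapest_policy(options: list[PolicyOption], failure_impact: float) -> PolicyOption:
    return min(options, key=lambda o: expected_annual_cost(o, failure_impact))


# Invented figures for the filter example.
options = [
    PolicyOption("FBM", annual_maintenance_cost=500, annual_failure_probability=0.90),
    PolicyOption("CBM", annual_maintenance_cost=3_000, annual_failure_probability=0.05),
    PolicyOption("UBM", annual_maintenance_cost=2_000, annual_failure_probability=0.15),
]

# The same options evaluated for a critical and a non-critical asset.
best_for_critical = cheapest_policy(options, failure_impact=20_000)
best_for_minor = cheapest_policy(options, failure_impact=1_000)
print(best_for_critical.name, best_for_minor.name)
```

With these made-up numbers, the more expensive condition-based policy wins when failure is costly, while running to failure is cheapest when the impact of failure is small.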

Another aspect of risk management under the work program is embedded in the job plans. Each job plan should be risk-assessed in order to establish its priority. The same risk factors mentioned above can be used to evaluate the work priority. For example, which jobs are critical to complete on time because of regulatory requirements or high health and safety risks? Each job plan can be risk scored accordingly. Any job plans whose scores are above a defined threshold can be considered critical work, sometimes referred to as “critical PMs”. The CMMS is used to document the work program, including all maintenance policies, triggers, intervals, job plans, risk scores, detailed procedures, drawings, and quality and performance standards.

David Berger, a Certified Management Consultant (C.M.C.) registered in Ontario, Canada, is a Principal of Western Management Consultants, based in the Toronto office. David has written more than 200 articles on a variety of topics such as maintenance management, operations management, information technology, e-commerce, organizational design, and strategy. In Plant Services magazine, he has written a monthly column on maintenance management in the United States, as well as three very extensive reviews of maintenance management systems available in North America. David has done extensive work in the areas of strategy, information technology and business process re-engineering. He can be reached at

Even if you take great pains to provide a risk score for all assets, and a risk score for all planned work within an optimized work program, there is still no guarantee that the schedule will also be optimized. This is because the scheduling of work on assets involves another round of tradeoffs. What if the work backlog is far greater than the resources available to complete it all on time? Furthermore, what if the operations team is behind in production, and is therefore reluctant to allow scheduled downtime of the asset so that the maintenance team can do its work?

The scheduling process must prioritize based on the risk scores of both the assets and the work. Some CMMS packages multiply the two together to create what some CMMS vendors refer to as the RIME number (Ranking Index for Maintenance Expenditures = asset criticality × work priority). Additionally, risk scores may change over time. For example, a particular job may have a drop-dead date that is fast approaching, with serious regulatory consequences if the date is missed. Similarly, replacing a belt or changing the oil may become more critical if left for too long. On the other end of the scale, sometimes if you wait long enough, work may become redundant, superfluous, or simply less urgent because of changes in senior management strategy, legislation, product mix, and/or economic conditions.
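The RIME calculation is easy to sketch. The Python below multiplies asset criticality by work priority and sorts a backlog so that the highest RIME numbers come first. The work orders, the 1-to-10 scales, and the class and field names are hypothetical; only the formula itself comes from the definition above.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    description: str
    asset_criticality: int  # assumed scale: 1 (low) to 10 (high)
    work_priority: int      # assumed scale: 1 (low) to 10 (high)

    @property
    def rime(self) -> int:
        # RIME = asset criticality x work priority
        return self.asset_criticality * self.work_priority


# Hypothetical backlog.
backlog = [
    WorkOrder("Replace conveyor belt", asset_criticality=6, work_priority=5),
    WorkOrder("Regulatory pressure vessel inspection", asset_criticality=9, work_priority=10),
    WorkOrder("Repaint guard rails", asset_criticality=3, work_priority=2),
    WorkOrder("Change compressor oil", asset_criticality=8, work_priority=4),
]

# Highest RIME number first.
schedule = sorted(backlog, key=lambda w: w.rime, reverse=True)
for order in schedule:
    print(f"{order.rime:>3}  {order.description}")
```

Because the scores are static in this sketch, it does not capture risk that escalates over time; the ranking would need to be recalculated whenever either score is revised.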

Some CMMS packages have features that can help users automatically change risk scores over time, based on user-definable algorithms. Other CMMS packages use a simple time-based calculation called the “critical percent” to help identify escalating risk. If the critical percent = 0, the work was just executed. If the critical percent = 100, then the work is just coming due. If the critical percent = 200, then the work is overdue by one complete cycle, such as a monthly job plan that is one full month overdue. As the critical percent climbs, the more advanced CMMS packages can be configured to automatically notify management. For example, when the critical percent reaches 125, notify the maintenance manager. At 150, notify the plant manager, and above 200, notify the President.
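A minimal Python sketch of the critical percent follows. It assumes the value is simply the time elapsed since the work was last completed, expressed as a percentage of the job plan interval, which is consistent with the 0, 100, and 200 reference points above. The function names and the fixed 30-day "monthly" interval are illustrative assumptions, and the escalation levels mirror the example rather than any particular CMMS configuration.

```python
from datetime import date
from typing import Optional


def critical_percent(last_completed: date, interval_days: int, today: date) -> float:
    """Elapsed time since last completion as a percentage of the interval."""
    elapsed_days = (today - last_completed).days
    return 100 * elapsed_days / interval_days


def who_to_notify(percent: float) -> Optional[str]:
    """Return the most senior role to notify, or None if no escalation."""
    if percent > 200:
        return "President"
    if percent >= 150:
        return "plant manager"
    if percent >= 125:
        return "maintenance manager"
    return None


# A job plan with an assumed 30-day interval, last completed January 1, 2024.
last_done = date(2024, 1, 1)
for today in (date(2024, 1, 1), date(2024, 1, 31), date(2024, 2, 10), date(2024, 3, 1)):
    pct = critical_percent(last_done, 30, today)
    print(today, f"{pct:.0f}%", who_to_notify(pct))
```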

Read David Berger's monthly column, Asset Manager.