Reach for reliability

What it takes to make the climb from reactive to RCM

By Ricky Smith, CMRP

It’s great to talk and write about best practices, reliability-centered maintenance (RCM) and how important they are to a plant’s ability to compete in our global economy, and we’ve been doing it for years. We know many facilities have embraced the principles and some have even implemented them. We also know many have not.

We decided to take a measure of the degree of implementation, the prevailing attitude towards RCM, and the differences between the RCM haves and have-nots. We invited maintenance professionals of all stripes to tell us via a Web-based survey not only where their plant’s actual practices are on the scale from reactive to reliability, but also how well their departments cooperate, their management’s attitude towards maintenance, and some key factors that drive the necessary culture changes.

Some 272 industry professionals participated in the survey, and more than two-thirds are maintenance and reliability managers. A significant number of senior managers as well as plant and production managers participated.

The results may be surprising to many and, I hope, eye-opening to those who have them closed. Bear in mind that responses are voluntary, and people are more willing to volunteer good news than bad.

Waving the flag

Starting off on a high note, every single respondent stated that asset reliability is a significant concern to them (“Survey Results” sidebar, question 1). I’m not surprised. Most reliability professionals I know constantly ask themselves whether they have optimal asset reliability at optimal cost, and this concern resonates with senior management.

But is reliability really under control — and is it sustainable for the future? No matter how good a grip you think you have, never underestimate the need to keep looking for ways to get better by ensuring the reliability of your capital assets, measuring reliability and continuously improving. The impact of asset reliability on asset utilization and performance dictates that we pay constant attention to this critical process.

Does management understand?

In most plants surveyed, senior management seems to understand the significance of reliability (question 2). One of the questions I constantly hear is, “If senior management understands the significance of reliability, why don’t they support a reliability initiative?”

Most senior managers cannot and will not accept a reliability initiative that is not supported by a business case, points out Jack Nicholas, a world-renowned reliability expert. In the business case, senior management wants to see:

  1. The value a reliability initiative will bring in hard dollars through:
    a. Increased capacity, asset availability.
    b. Reduced maintenance cost.
  2. Other outputs (not typically captured in hard dollars).
    a. Decreased risk of environmental incidents.
    b. Decreased asset life-cycle cost.
    c. Decreased capital maintenance (replacing equipment because it is “worn out” or “old”).
  3. The time to value (from when the initiative starts to when the company will start realizing results).
  4. The cost of the initiative (nothing is free).
  5. Amount of internal and external resources required.
  6. A plan with a timeline.
  7. Key performance indicators (KPIs) that will be used to manage the initiative (leading and lagging KPIs).
  8. Length of time for total return on investment (must be validated by the company’s financial expert).
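Items 3, 4 and 8 in the list above can be roughed out with simple arithmetic before the financial expert validates them. The sketch below is only an illustration: the function name, the dollar figures and the assumption of a constant monthly benefit are all hypothetical and do not come from the survey.

```python
def payback_months(initiative_cost, monthly_benefit, months_to_first_value=0):
    """Months from kickoff until cumulative hard-dollar benefit covers the cost.

    Assumes (hypothetically) that the benefit is constant each month once
    the initiative starts delivering value.
    """
    if monthly_benefit <= 0:
        raise ValueError("monthly_benefit must be positive")
    return months_to_first_value + initiative_cost / monthly_benefit


# Illustrative numbers only, not survey data.
cost = 600_000                      # cost of the initiative (item 4)
monthly_benefit = 40_000 + 20_000   # added capacity + reduced maintenance cost (item 1)
months = payback_months(cost, monthly_benefit, months_to_first_value=3)
print(f"Payback in about {months:.0f} months")
```

A real business case would also account for the ramp-up of benefits and the cost of internal resources, which is why the finance person needs to validate the result.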

If these items can be delivered in a professional manner, it’s hard for management not to accept and support the initiative. In fact, we want senior management to be the sponsors of any reliability initiative. Top leadership has control over the destiny of a plant. In particular, if a plant is at risk of closure, projects such as reliability improvement initiatives can be game-changers.

Speaking of closure, 28% of respondents report their plant or operation is at risk of being downsized or shut down (question 3). Numerous government reports say that in the next three to five years, 25% to 30% of companies will be downsized or shut down. For example, Ford and General Motors are closing plants and laying off thousands of employees.

“Business conditions that used to change every seven to nine years now change every seven to nine months,” Andy Harshaw, vice president, Dofasco Steel, was recently quoted as saying. “Companies must be flexible to change or face the fact that they may shut their doors.” Harshaw went on to say that managing asset reliability is important to Dofasco’s strategic goals and to the company’s survival.

Who owns reliability?

More than 46% of respondents say the maintenance department owns the reliability of their plant/facility. From numerous discussions and my own experience as a maintenance manager, I know most companies blame asset reliability issues on maintenance. I say, “In the best companies in the world, everyone owns reliability.” Not surprisingly, only 20% of respondents gave what I consider the best answer (question 4). Until production accepts a partnership with maintenance to care for assets and keep them reliable, the plant will probably never reach the level of optimized reliability at optimal cost that is required for the company to reach its business goals.

Production/Operations should be the number one believer and driver of an effective preventive maintenance (PM) program. If they don’t own the reliability of the assets, the PM program probably won’t be effective. Almost 70% of survey respondents stated they had an effective PM program, but 44% indicate that equipment breakdowns are the norm (question 5). A preventive maintenance program cannot be effective if equipment breakdowns are the norm.

An interesting correlation is that 44% of respondents say breakdowns are the norm, and about the same percentage rank the reliability of their assets between 1 and 5 on a scale where 1 is “real bad” and 10 is “world-class.” I conclude without surprise that an effective Operations-driven preventive maintenance program improves equipment reliability, reduces reactive maintenance and adds value to a company.

Informal versus formal PM programs

Looking at the situation more closely, I must ask, “How are companies developing PM programs?” In my experience, PM programs are typically developed informally, based on manufacturers’ suggestions, work requests (largely reactive), or simply on work that has always been done that way. When your PM program isn’t technically based and isn’t connected to a reliability-based maintenance strategy, typically more than 80% of the work you are executing is reactive, creating the defects we know as equipment failures. Progressive environments use a formal, technically sound process where work orders can be traced back to the failure analysis that found the problem and created the task (Figure 1).

Only 34% of respondents say they use a formal methodology for analyzing failures to determine the maintenance strategy that will prevent and predict them (question 6). A similar percentage (35%) rank the reliability of their assets between 8 and 10 on the scale where 1 is real bad and 10 is world-class. I can assume the 35% of companies that have high reliability also use some type of failure analysis methodology to develop their maintenance strategy. The analysis they perform is most likely RCM, failure modes and effects analysis (FMEA), maintenance task analysis (MTA), or some other proven methodology.

To measure is to manage

Most people have heard of Dr. W. Edwards Deming and his manufacturing philosophies. Perhaps his most famous quote, which all successful companies believe (and unsuccessful ones tend to forget), is, “You cannot manage something you cannot measure.”

The survey results on measurements point to some interesting findings. Fully 41% of respondents say they manage using leading KPIs (question 7). Leading KPIs are the only effective way an organization can manage its reliability process. “Leading KPIs lead to results,” says Ron Thomas, a reliability leader at Dofasco Steel.

The results are tracked by lagging KPIs such as cost, asset downtime, number of failures, etc. Some 46% of respondents say they don’t manage with leading indicators, so at best, we assume they try to manage with lagging indicators. But decisions need to be made based on problems in the asset reliability process before they impact results. An example may be that schedule compliance (a leading KPI) is off-target. If this situation isn’t corrected, the result could be higher production cost because maintenance work isn’t being accomplished on time with the right amount of resources, which causes excessive equipment downtime.
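To make the schedule-compliance example concrete, here is a minimal sketch of how that leading KPI might be tracked week by week. The definition used (scheduled jobs completed on time divided by jobs scheduled), the 90% target and the weekly figures are assumptions for illustration; your plant may define and target this KPI differently.

```python
def schedule_compliance(scheduled_jobs, completed_on_time):
    """Fraction of scheduled maintenance jobs completed as scheduled."""
    if scheduled_jobs <= 0:
        raise ValueError("scheduled_jobs must be positive")
    return completed_on_time / scheduled_jobs


def weeks_off_target(weekly_counts, target=0.90):
    """Return the weeks whose compliance falls below the target.

    weekly_counts maps a week label to (scheduled_jobs, completed_on_time).
    The 0.90 default target is a hypothetical value.
    """
    return [
        week
        for week, (scheduled, done) in weekly_counts.items()
        if schedule_compliance(scheduled, done) < target
    ]


# Made-up weekly figures, not survey data.
weekly = {"wk1": (50, 47), "wk2": (48, 40), "wk3": (52, 49)}
print(weeks_off_target(weekly))
```

Flagging the off-target week as soon as it happens is the point of a leading KPI: the problem is visible before it shows up in downtime or cost.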

Figure 1: Progressive plants use a formal process where work orders can be traced back to the failure analysis that found the problem and created the task.

 

Many people ask me, “What is the first step to develop leading and lagging KPIs for my organization?” Most really don’t want to hear my answer, because everyone is looking for the silver bullet or a quick fix. If you want to effectively manage an asset reliability process, you must have the process elements (such as work identification, planning, scheduling, work execution, etc.) mapped and defined with tasks, roles and responsibilities; leading and lagging KPIs; etc. In the survey, just 23% of respondents say they have mapped and defined their reliability process. Figure 2 shows an example of a process map, in this case for procured materials and services.

Figure 2: Understand how to apply leading and lagging indicators by mapping the process. Then see where measurements can spot performance problems before they affect reliability.

Using the right KPI is critical to knowing where you are in a process. When we asked, “With what metrics do you measure the reliability of your assets?” only 23% of respondents state they use mean time between failures (MTBF). MTBF is simply the operating time divided by the number of asset failures in that time. For example, if you have three functional failures in 24 hours, the MTBF is 24 divided by three, or eight hours. MTBF is one of the most fundamental measures of reliability. Other measurements may be affected by reliability, but MTBF focuses solely on measuring asset reliability. If you would like a copy of the “MTBF Users Guide” I developed, send me an e-mail at the address at the end of the article.
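The MTBF arithmetic is easy to automate once failures are recorded consistently. The sketch below reproduces the 24-hour example; the work-order records and their field names are hypothetical stand-ins for whatever your CMMS/EAM exports.

```python
def mtbf(operating_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    if operating_hours < 0 or failure_count < 0:
        raise ValueError("inputs must be non-negative")
    if failure_count == 0:
        return None  # no failures in the period, so MTBF is undefined
    return operating_hours / failure_count


# Hypothetical work-order records; the field names are assumptions.
work_orders = [
    {"asset": "P-101", "functional_failure": True},
    {"asset": "P-101", "functional_failure": False},  # PM task, not a failure
    {"asset": "P-101", "functional_failure": True},
    {"asset": "P-101", "functional_failure": True},
]
failures = sum(1 for wo in work_orders if wo["functional_failure"])
print(mtbf(24, failures))  # 24 hours / 3 failures = 8.0
```

The calculation is only as good as the failure count, so every functional failure needs a work order behind it.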

MTBF becomes less important as reliability increases, so then a company may begin focusing on, say, the number of potential functional failures identified in a specified period of time. In the survey, 9% indicate they are currently using this metric to measure asset reliability. These are probably the plants you would want to visit to learn how they do it.

Interestingly, even though only 23% of companies measure MTBF to manage reliability, 43% say their CMMS/EAM can provide this information. The real problem is that most companies cannot measure reliability of their assets because they currently don’t collect the data in a manner that would make this KPI valid.

Only 34% of respondents say that a work order is written close to 100% of the time for a functional failure or breakdown. Almost 30% say they either don’t write a work order, or write one less than 50% of the time. I believe that you cannot improve something you cannot measure, and all successful managers agree with this philosophy. Another principle is that managing with bad data leads to bad decisions.

Depth of understanding

The great Winston Churchill said, “I am always willing to learn, however I do not always like to be taught.” This is true in the world of reliability. Most managers are willing to learn, but they aren’t always willing to be taught something new so they can understand the basics of reliability.

More than 90% of managers are intimidated by the word reliability because they do not understand reliability, says Terrence O’Hanlon, CMRP, of ReliabilityWeb.com and Reliability and Uptime magazines. The survey shows a serious gap between what people think they know about reliability and their actual knowledge of reliability fundamentals. Most managers neither understand nor apply the basic principles of reliability.

For example, only 11% of respondents say their company applies the principles of the P-F Interval, and 46% state they don’t use this basic concept at all (question 8). The P-F Interval is one of the foundational principles of asset reliability, which focuses on detecting failures far enough in advance that a proactive task can be implemented to mitigate the failure. This is the foundation of an effective preventive and predictive maintenance program. I always say, “It isn’t what you know that will kill you — it is the things you don’t know.” This is definitely true in the world of asset reliability.
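As a rough illustration of how the P-F Interval drives inspection frequency, the sketch below uses a common rule of thumb, which is an assumption here and not something the survey measured: inspect at no more than half the P-F interval, then check that the worst-case warning time still leaves room to plan and carry out the proactive task. The function name and the day counts are hypothetical.

```python
def inspection_plan(pf_interval_days, response_time_days, fraction=0.5):
    """Suggest an inspection interval from the P-F interval.

    pf_interval_days: time from a detectable potential failure (P)
        to functional failure (F).
    response_time_days: time needed to plan and execute the proactive task.
    fraction: share of the P-F interval used as the inspection interval
        (0.5 is a rule-of-thumb assumption).

    Returns (inspection_interval_days, enough_warning).
    """
    if pf_interval_days <= 0 or not 0 < fraction < 1:
        raise ValueError("need a positive P-F interval and 0 < fraction < 1")
    interval = pf_interval_days * fraction
    # Worst case: the defect becomes detectable right after an inspection,
    # so it is found one full inspection interval later.
    worst_case_warning = pf_interval_days - interval
    return interval, worst_case_warning >= response_time_days


# Made-up example: a bearing defect detectable 60 days before failure,
# with 14 days needed to plan and schedule the repair.
print(inspection_plan(60, 14))
```

If the second value comes back False, either the inspection must be more frequent or the response must be faster.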

Question 9 asked how well respondents understand the definitions of failure modes, equipment functions, total functional failure, and partial functional failure. These are some of the most important foundational elements of reliability, and must be understood to develop a proactive maintenance strategy. Only 24% of respondents say they understand these fundamental elements, while 30% know either nothing or very little about them.

Malcolm Forbes said, “The goal of education is to replace an empty mind with an open mind.” Once a manager is educated in the basic principles of reliability, their world will change. They will feel like they have suddenly seen the sunlight after having lived under a mushroom all their life.

Indicated actions

This survey helped identify serious gaps in many companies’ relationship to reliability. At the same time, it indicates a path to understanding how we can optimize asset reliability. A reliability initiative will be supported and can be successful if you have the business case — essentially a financial improvement plan for your company.

I have seen many companies try some type of initiative to improve reliability. Usually it either didn’t provide the value expected, or took too long to show gains. Most reliability improvement initiatives deliver some return, but to make a quick impact on the bottom line, achieving what I call performance breakthrough and a rapid payback, we need sustainable change (Figure 3). That change can only occur when managers and floor-level personnel see success and participate.

Figure 3

The survey found that 33% of companies have a “successful” reliability improvement initiative currently in place, and 36% of those companies say the initiative will pay for itself in one year or less (question 10). All told, 82% say the initiative will pay for itself in less than three years.

More than 37% say the reliability initiative will last forever. It’s important to understand that a true proactive asset reliability initiative is a continuous improvement process that lasts forever. As assets age, as the company experiences equipment failures, and as its business changes, reliability must be continually optimized. Continuous improvement must be embedded into the maintenance and reliability process.

The maintenance and reliability model in Figure 4 is a perfect example of how continuous improvement becomes part of the maintenance and reliability process. This model is known as the “Proactive Asset Reliability Process” and is used by some of the most successful companies in the world.

Figure 4: The Proactive Asset Reliability Process shows the role of continuous improvement, and is used by some of the most successful companies in the world.

Lessons are simple

A few simple lessons must be learned if you want a successful reliability improvement initiative. These aren’t options, but principles which must be followed or reliability will be at risk.
  • Executive sponsorship is required. A company needs a committed champion at the executive level to take ownership and responsibility of the initiative.
  • Floor-level operators and maintainers must be part of the design and share in the success of the new maintenance and reliability strategy.
  • Everyone from the floor level to the boardroom must have some level of education in reliability. For change to occur, people need to understand why they need to change. If you need to educate everyone in reliability, contact me and I will provide resources to help you develop and execute effective reliability training.
  • Develop a balanced scorecard for all levels of the operation, from the floor to the board room. Establish targets and goals for most KPIs on this scorecard. People want to know their score in the game.
  • Be successful by developing a plan and following it. With respect to meeting financial targets and deadlines associated with the plan, remember the saying, “under-sell, over-deliver.”

Finally, here are the steps, based on best practices, to implement a successful asset reliability process:

Step 1: Develop a business case to identify the financial opportunity. The business case must identify the projected financial outcome in hard dollars. The financial outcome may come from increased capacity, reduced maintenance labor and material cost, increased asset utilization and more. The plant management team developing the business case must include a finance person (comptroller, chief financial officer, etc.).

Step 2: Assets should be ranked based on risk to the business and their condition. Knowing your critical assets is essential to the success of this initiative. More than 48% of survey respondents have ranked their critical assets. You will need to execute this initiative one asset at a time and focus first on the asset that provides the quickest payback. People will only change if they see change occur and believe in it. Taking the right step at the right time is just as important to a successful reliability initiative.
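A simple way to start the ranking in Step 2 is to score each asset and sort. The sketch below is one possible scheme, not a prescribed method: the 1-to-10 scales, the multiplication of risk by condition, and the asset names and scores are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    business_risk: int  # 1 = negligible impact if it fails, 10 = severe
    condition: int      # 1 = like new, 10 = very poor

    @property
    def priority(self):
        # Hypothetical scoring rule: higher risk and worse condition
        # both push an asset up the list.
        return self.business_risk * self.condition


def rank_assets(assets):
    """Return assets sorted from highest to lowest priority score."""
    return sorted(assets, key=lambda a: a.priority, reverse=True)


# Made-up assets and scores for illustration.
assets = [
    Asset("Air compressor", business_risk=8, condition=3),
    Asset("Boiler feed pump", business_risk=9, condition=7),
    Asset("Conveyor 3", business_risk=5, condition=8),
]
for asset in rank_assets(assets):
    print(f"{asset.name}: {asset.priority}")
```

The top of the list is a candidate for the first analysis, though the quickest payback may still favor a different asset.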

Step 3: An RCM methodology (RCM, FMEA, or MTA) must be applied to the asset by a joint team of operators and maintainers working together to design a proactive maintenance strategy for the asset. In the survey, the number of respondents who say they have a successful reliability initiative and the number who say they use an RCM methodology are the same, which is a big hint.

Step 4: Use reliability software to assist in managing the health data of your assets. Reliability will now be managed based on the health of the assets, not breakdowns. Fewer than 28% of respondents say they have a successful reliability initiative in place, and only 20% use reliability software to collect and disseminate health data from their assets. In a typical plant, you could be managing as many as 60,000 to 80,000 data points coming from visual inspections, PLCs, predictive maintenance tools such as vibration monitors, and other sources. It’s also very important that reliability software be linked to a CMMS/EAM to reduce human error and integrate continuous improvement into your reliability initiative.

Step 5: Continue the process throughout the plant, at least on critical assets, and template the results on like equipment wherever possible.

Step 6: Establish leading and lagging KPIs to manage the process.

To be successful when improving and optimizing reliability at optimal cost, you need four things in harmony with each other: practices, processes, technology and people.

A proactive asset reliability process must be followed. Best practices must be adopted, applied and followed for each element of the maintenance and reliability process. An example of a best practice noted in this survey is that successful companies must identify the proactive work that will improve and sustain reliability.

Methodologies such as RCM, MTA and FMEA should be used. Technologies such as a CMMS/EAM system, reliability software and PdM tools are the enablers.

Of course, people are the heart of all initiatives, no matter what the domain. We need proactive senior management that will sponsor and drive reliability projects. We need middle management that will champion projects and support employees in the midst of cultural change from reactive to proactive. Finally, we need Maintenance, Operations and Engineering employees empowered to care for their assets to optimize reliability and embrace the change, because it really does mean a better way of life.

Survey results

1. Do you consider Reliability of your assets to be a significant concern to you?

Yes 100%
No 0%

2. Management’s attitude
Does senior management understand the significance of reliability in your facility/plant?

Yes 78%
No 22%

3. Is your plant at risk of being shut down or downsized due to cost or other factors?

Yes 28%
No 72% 

4. Who owns reliability in your facility/plant?

Maintenance department 46%
Production/Operations department 6.6%
Both 25%
Senior management 1.6%
Everyone 20%

5. Are breakdowns the norm (reactive maintenance is in full effect)?

Yes 44%
No 56%

6. Has your facility/plant used RCM or some other type of failure analysis methodology to determine what must be done to prevent or predict failures, at least for critical assets?

Yes 34%
No 59%
Uncertain as to what you are talking about 6.7%

7. Do you manage your maintenance and reliability process using leading KPIs?

Yes 41%
No 46%
Uncertain as to what you are talking about 13%

8. Does your company apply principles of the PF Interval?

Yes 11%
No 46%
Not sure 43% 

9. How well would you say you understand the definition of failure modes, equipment functions, total functional failures, and partial functional failure? (1 = unknown, 10 = expert)

Not at all (1) 9.1%
A little (3) 21%
Somewhat (6) 45%
Yes, very well (10) 24%

10. If you have a successful reliability initiative in place, when will it pay for itself?

0-12 months 36%
1-3 years 46%
More than 3 years 18%


If you have comments or questions please send me an email at ricky-smith@comcast.net