The secret to improving asset reliability

The single most important factor was learned from decades of experience.

By Sheila Kennedy, contributing editor

The origins of reliability-centered maintenance (RCM) date back to 1960, when the Federal Aviation Agency (FAA) formed an industry task force to investigate the capabilities of preventive maintenance in anticipation of jumbo jets coming online, according to Jack Nicholas, who gave the keynote presentation at the 2014 Reliable Asset World and Ultrasound World X co-located conferences hosted by UE Systems in Clearwater Beach, Florida.

Nicholas, a pioneer in reliability-centered maintenance, described to a room full of reliability professionals how that action ultimately led to modern-day asset reliability best practices.

“At that time, the FAA recognized that their scheduled tasks were not effective, they were mostly intrusive, and they were creating problems,” said Nicholas. In the decades that followed, reliability practices have greatly improved. “Today, airplanes almost never fail due to maintenance. It’s usually pilot error or another cause.”

“The single most important thing you can do to assure reliability is to focus on continuous improvement in your operating and maintenance procedures.” — Jack Nicholas, Reliability-Centered Maintenance Pioneer

In the 1960s, studies showed that the bathtub curve, which marks the asset life cycle from infant failure to the end of useful life, represented a very small percentage of all failures, so attention turned instead to the conditional probability of failure. Wear-out had also been believed to be the dominant cause of failure, but when tests proved otherwise, infant mortality failure became the target. Infant failures occur early in the operating cycle following maintenance.

Economics was the driving force for RCM in commercial and military aircraft, and it has proven to be the driving force for every industry that has adopted the practices since. “The airlines and military needed to make the aircraft economically viable, less prone to failure, and more prone to early detection of degradation without intrusion,” said Nicholas.

Beginning in the early 1970s, prior maintenance practices gave way to a maintenance steering group (MSG) approach. The results in the Navy were tremendous, with the interval between shipyard overhauls extended from five years to more than 11 years. “Savings of $1.7 billion were projected by 1988, representing an 11% reduction in the life-cycle maintenance cost of Navy ships in operation. Their available life cycle increased from 25 to 33 years,” said Nicholas.

Streamlined RCM began to be applied to the attack submarine fleet. “The streamlined RCM approach is very close to perfect for the kind of work needed on just about anything, when full RCM is not economically viable,” said Nicholas.

By the late 1970s, the MSG strategy had come to be called RCM, and it wasn’t long before utilities, manufacturing, and mining facilities, along with NASA, the DOE, and the DOD, began adopting the practices. “NASA used to tear down and rebuild entire electrical distribution systems every five years,” said Nicholas. “That introduced significant infant failure risk that could not be sustained.”

Nicholas cautioned that those who implement RCM will lose talent at some point, and re-education will be necessary. “People will revert to the old way of doing business, because the natural inclination is to break equipment apart, inspect it, fix it, and put it back together.”

The most costly and rigorous form of RCM is super-classical RCM, although variants and derivatives exist. Economics often drives the decision-making. Full-blown RCM is best for the bad actors, which generally represent 20% of all equipment. For the remaining 80%, the better-behaving assets, some variant of RCM is suitable. For all forms of RCM, include as many non-intrusive tasks as possible, such as ultrasonics, to reduce the chances of failure caused by maintenance.

Risk threshold investigations (RTIs) should be performed on the 80% by operations and maintenance personnel. RTI asks what the probability is of failure modes that could occur and produce a serious or catastrophic event, even if the probability is remote. “Look at a number of risk criteria, come up with the show-stoppers, and decide what can be done to mitigate that risk,” said Nicholas. “It is a qualitative methodology. It is something you can do while waiting for money for more rigorous RCM approaches.”

Root cause analysis (RCA) should be performed on the entire 100%. “Use any RCA approach you like and take action on it. That’s the biggest problem: the lack of action,” cautioned Nicholas. “Don’t limit the time to do RCA. Keep doing it until you get it right, and then track the action to ensure the root cause is solved.”

The newest RCM approach is defect elimination (DE). It focuses on the worst offenders, or the 1%, when there is no money for RCM or experience-centered maintenance, and you’ve done RCA.
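
The tiering described above can be pictured with a short Python sketch. It is an illustration only, not a method Nicholas presented: the `Asset` class, the `failures_per_year` ranking metric, and the `assign_strategies` function are hypothetical names, and ranking by failure count is an assumed stand-in for however a plant identifies its bad actors. Defect elimination is left out because the summary of the talk ties its 1% to work orders rather than to assets.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    failures_per_year: float  # hypothetical ranking metric, not from the talk


def assign_strategies(assets, full_rcm_share=0.20):
    """Map each asset name to a list of suggested approaches.

    Assumption: the worst-ranked 20% of assets are the "bad actors".
    """
    ranked = sorted(assets, key=lambda a: a.failures_per_year, reverse=True)
    n = len(ranked)
    # Keep at least one asset in the full-RCM tier when the list is non-empty.
    n_full = max(1, round(n * full_rcm_share)) if n else 0

    plan = {}
    for rank, asset in enumerate(ranked):
        approaches = ["root cause analysis"]  # performed on the entire 100%
        if rank < n_full:
            approaches.append("full RCM")  # bad actors
        else:
            # The better-behaving remainder: a variant plus an interim RTI.
            approaches += ["RCM variant", "risk threshold investigation"]
        plan[asset.name] = approaches
    return plan


if __name__ == "__main__":
    fleet = [Asset(f"pump-{i}", failures_per_year=float(i)) for i in range(10)]
    for name, approaches in assign_strategies(fleet).items():
        print(name, "->", ", ".join(approaches))
```

In practice the split would come from criticality and cost data rather than a single failure count, and the 20% figure is a rule of thumb, so `full_rcm_share` is left as a parameter.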

Sheila Kennedy is a professional freelance writer specializing in industrial and technical topics. She established Additive Communications in 2003 to serve software, technology, and service providers in industries such as manufacturing and utilities, and became a contributing editor and Technology Toolbox columnist for Plant Services in 2004. Prior to Additive Communications, she had 11 years of experience implementing industrial information systems. Kennedy earned her B.S. at Purdue University and her MBA at the University of Phoenix. She can be reached at

Nicholas also presented several charts that illustrate the relationship between safety and reliability. He showed how injury rates go down as asset utilization and overall equipment effectiveness go up. Plants that require less corrective and reactive work are safer. Plants that are more proactive with preventive and predictive maintenance, and are more disciplined in their maintenance scheduling, have lower injury rates.

He also shared the environmental and economic advantages of RCM. The rate of environmental incidents goes down as asset utilization and overall equipment effectiveness go up, and equipment that is more reliable has a lower production cost per unit.

Finally, Nicholas shared the single most important thing you can do to assure reliability: continuous improvement of procedures. “Drive [down] the percentage of infant failures by having a living procedures process for both operations and maintenance, particularly operations, and continuously improve your procedures. If you are a procedure-based organization, it is far easier to avoid human error, improve reliability, and therefore be safer.”

Nicholas summarized his presentation with the following key points:

    • RCM (classical or super-classical) is still the most effective methodology for determining what maintenance to perform on critical systems, and it is probably applicable to about 20% of systems.
    • RCM variants or derivatives are good for the other 80%.
    • Both of the above must focus on nonintrusive tasks.
    • Implementation, not analysis, continues to be the biggest problem with RCM.
    • Risk threshold investigation is a good interim measure while waiting for RCM analysis resources.
    • Defect elimination on 1% of condition-based maintenance work orders will bring the earliest payback and biggest payoff.
    • Use root cause analysis when unsure of what caused a failure.
    • The single most important thing you can do to assure reliability is to focus on continuous improvement in your operating and maintenance procedures.
