Demand reliable equipment

June 13, 2007
How does one demand reliable equipment? Most industrial managers don’t know the answer to that question, and too often, they’re part of the problem. The end of excessive downtime begins with a simple spec.

Management policies of the past shape the future, for better or worse. Too often, management tells purchasing to buy at the lowest cost. Engineering is told to meet a deadline no matter what it takes. Vendors are ordered to deliver parts and equipment on time and at the lowest cost, with no requirement for a specific level of reliability. Maintenance is urged to “Hurry! Let’s get the production line up and running.”

What these great policies have in common is that their goal is to save money and time. But as they take effect, companies find that asset reliability either becomes less controllable or costs large amounts of money to preserve.

What’s needed is optimal reliability at optimal cost at all times. A company must spend only what is required for a specific amount of reliability, and no more. I call it, “reliability on demand.”

Because most companies don’t understand this process, major reliability problems begin when new equipment is introduced or a rebuild is completed. The immediate effect is lost money and often, ultimately, lost jobs.

What you don’t know is killing you

What most people don’t know about asset reliability is that most failure modes fall into the “infant mortality” category (Figure 1). Infant mortality is the term used when new or overhauled equipment breaks down shortly after startup. The term originated in the health care world, where a medical dictionary defines it as, “when a child dies under the age of one.”

In the world of asset reliability, we define infant mortality as the failure of an asset during startup, within a short period of time after new equipment is installed, or soon after the equipment has been overhauled to a “previous state” condition. This short time between when equipment is started up and fails could be minutes, hours, days or months. Reliability studies typically show that about 68% of known failure modes are a result of infant mortality.

So, most failures are likely to occur soon after new or overhauled equipment is started up. As most equipment is operated, it becomes less likely to fail. At some point, the probability of failure levels out to a plateau known as random failure, which allows failures to be detected through a proactive maintenance strategy. Infant mortality is difficult to identify and detect before failure occurs.
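The falling failure rate described above can be illustrated with a small sketch. The article doesn't name a failure model, so this one assumes a Weibull hazard function, a common choice in reliability work, with a shape parameter below 1 to produce the infant-mortality pattern. The shape and scale values are made up for illustration and don't come from Figure 1.

```python
def weibull_hazard(t_hours: float, shape: float, scale_hours: float) -> float:
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if t_hours <= 0 or shape <= 0 or scale_hours <= 0:
        raise ValueError("all arguments must be positive")
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1)


# Assumed parameters, for illustration only.
SHAPE = 0.5            # below 1: failure rate falls as the equipment runs
SCALE_HOURS = 1000.0   # characteristic life in operating hours

for t in (1, 10, 100, 1000):
    rate = weibull_hazard(t, SHAPE, SCALE_HOURS)
    print(f"{t:>5} h after startup: {rate:.5f} failures per hour")
```

In this model, a shape of exactly 1 gives a constant failure rate, which corresponds to the random-failure plateau. A shape below 1 never quite flattens, so treat the sketch as a picture of the early part of the curve only.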

On first encounter with Figure 1, most people don’t believe it. The first time I saw it, I thought, “No way most failures come as a result of infant mortality.” I was wrong. Let’s think about this for a minute. I recommend you perform a basic experiment to see if infant mortality is prevalent in your operation:

Step 1: Measure the mean-time-between-failures (MTBF) for a production line or process area during some time interval (t) while it’s running in a steady state. MTBF = t divided by the number of emergency work orders. For example, at steady state, 160 hours divided by 10 emergency work orders gives an MTBF of 16 hours.

Step 2: After the next shutdown, when equipment has been overhauled or new equipment has been added, measure MTBF again for the same time period: 160 hours divided by 20 emergency work orders gives an MTBF of eight hours.

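Step 3: Compare the data. In this example, MTBF drops from 16 hours to eight hours after the shutdown, a sign that infant mortality is at work.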
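The three steps above can be sketched as a small calculation. The function name is hypothetical. The figures are the ones used in the example: 160 hours with 10 emergency work orders at steady state, then 160 hours with 20 after the shutdown.

```python
def mtbf_hours(interval_hours: float, emergency_work_orders: int) -> float:
    """MTBF as defined in Step 1: time interval divided by emergency work orders."""
    if emergency_work_orders <= 0:
        raise ValueError("need at least one emergency work order to compute MTBF")
    return interval_hours / emergency_work_orders


steady_state = mtbf_hours(160, 10)     # Step 1: steady-state running
after_shutdown = mtbf_hours(160, 20)   # Step 2: same interval after the shutdown

# Step 3: compare. A negative change means failures became more frequent.
change = (after_shutdown - steady_state) / steady_state
print(f"steady state: {steady_state:.0f} h, "
      f"after shutdown: {after_shutdown:.0f} h, change: {change:.0%}")
```

In practice, the work order counts would come from your CMMS/EAM for the same line over two intervals of equal length.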

The first step in resolving any problem is to acknowledge you have a problem. Once this has been achieved, a company is ready to begin the journey to reliability on demand.

Why equipment dies


Causes of infant mortality can be broken down into a few categories (Table 1).  To truly reduce the likelihood of infant mortality and improve reliability, those issues (and more) must be addressed and prioritized by risk.

Table 1: Causes of Infant Mortality

1. New/overhauled equipment

  • Equipment doesn’t meet requirements
  • Poor design
  • Lack of quality in manufacturing
  • Equipment installed incorrectly

2. Maintenance issues

  • Unmet or unknown lubrication specifications
  • Specifications not followed during repairs, rebuild or installation
  • Preventive maintenance inspections not performed to standard
  • Equipment being opened up and inspected too frequently
  • CMMS/EAM not used to calculate MTBF

3. Production issues

  • Operator not starting up equipment according to standard operating procedure
  • Production constantly stopping and starting equipment (lunches, breaks, shift changes, etc.)

4. Other

  • Power surges
  • Unstable floor for new equipment
  • Purchasing the cheapest parts or equipment

The U.S. Department of Defense (DoD) is very much aware of infant mortality. It has completed research to identify how to make equipment more reliable so that once it is commissioned or overhauled, it will have a high probability of operating failure-free for a specific time. Extensive DoD standards and procedures have been developed to optimize reliability. The DoD reliability standards, process and more can be found at www.enre.umd.edu/publications/rs&h.htm.

Make your demands known


Companies that continue to experience reliability problems need to think hard about how to stop or, at the bare minimum, decrease the number of infant mortality failures. More than 98% of companies I’ve visited experience infant mortality at a high rate, and are unaware of what they can do about it. Simply put, demand reliability from everyone.

Design and purchase new equipment that has the highest probability of meeting the reliability needs of the business. Reduce equipment problems and increase profitability by tracking each asset and being sure it is meeting the specified life expectancy. Focus the process with a few simple questions:

  1. What is the overall expected downtime of the new production line or process? People are either unprepared to ask this question or afraid the answer can’t be determined. It must be answered – otherwise, new equipment or a process is started up and reliability becomes a problem for years. The company ends up replacing management, hiring a consultant and trying all kinds of possible solutions to the problem. In the meantime, it and its shareholders are losing millions of dollars. So the total downtime (scheduled and unscheduled) a company is willing to accept while meeting the production line or process goals must be known before equipment can be researched and purchased.

    In 1984, I was assigned to develop a process for equipment life cycle (cradle-to-grave) planning of all new production equipment and production lines for a large international corporation.
    Immediately after developing the process, a new production line was proposed to be designed and installed in one of our U.S. plants. At an executive meeting, I asked how much scheduled and unscheduled downtime would be allowed for us to meet our production and cost goals for this new production line.

    Everyone looked at me like I had a communicable disease or something. Then our senior vice-president eyed me with determination and said, “I don’t know, but I think we can find the number.”

    I discovered that the easy part was determining how much downtime could be allowed – we knew the production rates. The hard part was determining how much of the total maintenance downtime would be scheduled and how much unscheduled. We established that 90% would be scheduled and 10% unscheduled. It was an interesting experience, and the process shaped our corporation for many years to come.

    The next question is really the next step in the quest for reliability on demand.

  2. What is the required MTBF for new equipment to meet the business goals of the new production process or production line? If companies would research and purchase motors, gear reducers, programmable logic controllers and even specialized equipment for which the known failure rate has been determined, the reliability of production lines would have a much higher probability of meeting the goals set by their corporation.

    There are organizations in the United States, Canada and Europe that have determined reliability rates and set standards. The primary such U.S. organization is the American National Standards Institute (ANSI), www.ansi.org. MTBF data on specific equipment types and manufacturers may be found on the ANSI Web site. If a manufacturer states that its equipment conforms to ANSI specifications, you can be assured the reliability data conforms to a specific standard (the data is reliable). For example, the standards the Hydraulic Institute (www.pumps.org) has established for mean-time-between-repairs (MTBR) are for pumps and other equipment meeting ANSI standards.

    Researching and purchasing reliable equipment is 80% of the solution, so why do so few companies actually research MTBF before purchasing equipment? If an equipment manufacturer doesn’t have MTBF data on the products it sells, you must consider whether you’re willing to take a risk on reliability by purchasing them.

    During the early 1990s, I was asked to visit a facility where a large gear reducer failure on a critical asset had shut the plant down. When I arrived, the plant manager immediately escorted me to the crime scene. It was a terrible mess, with broken helical gears and bearing parts lying everywhere. I asked a few probing questions, as any good investigator would. How long had this gearbox been in operation? The answer was, “More than 20 years.” I then asked whether the gearbox had previously failed. The maintenance manager, who had worked at the plant for more than 16 years, answered with a clear, “No.”

    The plant manager was anxious. He asked what I thought, so after dissecting a large amount of data (just kidding), I told him the gearbox was simply worn out. A local vendor had recommended a “better” gearbox they had on the shelf. I told him, in my professional opinion, I would purchase the same type of gearbox that had failed and stay away from the “better” gearbox. Sometimes common sense must rule.

    Some people just don’t get it. Having the right data at the right time could help a company save millions of dollars. A company must demand reliability, and not allow a salesperson to make technical decisions that affect its business. It must set policy that directs purchasing to provide MTBF data to engineering or maintenance before a purchasing decision is made on a specific type of equipment.

  3. How does a company manage the possibly hundreds of failure rates so it won’t exceed the goal set for a production line or process? Once the amount of downtime allowed is known and MTBF standards are established, the difficult work begins. When managing new equipment failure rates, always remember MTBF is the average time between failures, so individual items will fail both sooner and later than that figure. The best way of managing failure and risk to meet a specific business goal is using reliability-centered maintenance (RCM). This process assists management in determining a proactive maintenance strategy to mitigate risk and financial losses.

    RCM methodologies were developed by U.S. Navy reliability engineers in the 1940s and 1950s, and in the 1960s by the airline industry. These reliability engineers understood that a technically sound methodology must be developed to control risks and consequences. In the past 30 to 40 years, most industries have used RCM methodologies in one form or another to control risk and cost.

    Knowing the expected or allowable downtime of new equipment or processes, knowing the MTBF, and managing failures aren’t optional if a company is to have reliability on demand. If one of these steps is skipped or performed to a substandard level, the consequences will be unpredictable but the outcome won’t. Many people have told me this process has to be difficult, and I tell them it is difficult, but often it’s the difference between a company making money and closing its doors. Nothing that provides such great rewards is ever easy. Next time you buy new equipment, demand reliability.
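The three questions above can be tied together in a rough sketch. It makes several assumptions the article doesn't state: failure rates are constant and independent, any single component failure stops the line, and all hours and MTBF values are hypothetical. Only the 90/10 split between scheduled and unscheduled downtime comes from the 1984 example.

```python
import math


def downtime_budget(period_hours, required_uptime_hours, scheduled_share=0.90):
    """Question 1: split the allowable downtime into scheduled and unscheduled hours."""
    total = period_hours - required_uptime_hours
    if total < 0:
        raise ValueError("required uptime exceeds the period")
    scheduled = total * scheduled_share
    return total, scheduled, total - scheduled


def line_mtbf(component_mtbfs_hours):
    """Questions 2 and 3: MTBF of a line that stops when any one component fails.

    Assumes constant, independent failure rates, so the rates (1 / MTBF) add up.
    """
    return 1.0 / sum(1.0 / m for m in component_mtbfs_hours)


def survival_probability(run_hours, mtbf_hours):
    """Chance of running run_hours failure-free at a constant failure rate."""
    return math.exp(-run_hours / mtbf_hours)


# Hypothetical figures, for illustration only.
total, scheduled, unscheduled = downtime_budget(8760, 8000)
component_mtbfs = {"motor": 40_000, "gear reducer": 60_000, "PLC": 120_000}
mtbf = line_mtbf(component_mtbfs.values())

print(f"allowable downtime: {total:.0f} h "
      f"({scheduled:.0f} h scheduled, {unscheduled:.0f} h unscheduled)")
print(f"line MTBF: {mtbf:.0f} h")
print(f"chance of reaching the line MTBF without a failure: "
      f"{survival_probability(mtbf, mtbf):.0%}")
```

Two points stand out under these assumptions. The line MTBF is lower than that of its weakest component, because every added component adds its own failure rate. And since MTBF is an average, the chance of running a full MTBF interval without a failure is well below even.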

If you have questions concerning any part of this article or would like some friendly advice, e-mail me at [email protected].
