Demand reliable equipment

How does one demand reliable equipment? Most industrial managers don’t know the answer to that question, and too often, they’re part of the problem. The end of excessive downtime begins with a simple spec.

By Ricky Smith



Management policies of the past shape the future, for good or bad. Many times, management develops policies that tell purchasing to buy at the lowest cost. Engineering is told to meet a deadline no matter what it takes. Vendors are ordered to deliver parts and equipment on time and at the lowest cost, with no requirement for a specific level of reliability. Maintenance is urged, “Hurry! Let’s get the production line up and running.”

What these great policies have in common is that their goal is to save money and time. But as they begin to take effect, companies find either that asset reliability is becoming less controllable or that they’re starting to spend large amounts of money to preserve it.

What’s needed is optimal reliability at optimal cost at all times. A company must spend only what is required for a specific amount of reliability, and no more. I call it, “reliability on demand.”

Because most companies don’t understand this process, major reliability problems begin when new equipment is introduced or a rebuild is completed. The immediate effect is lost money and often, ultimately, lost jobs.

What you don’t know is killing you

What most people don’t know about asset reliability is that most failure modes fall into the “infant mortality” category (Figure 1). Infant mortality is the term used when new or overhauled equipment breaks down shortly after startup. The term originated in the health care world, where a medical dictionary defines it as “when a child dies under the age of one.”

Figure 1

In the world of asset reliability, we define infant mortality as the failure of an asset during startup, within a short period of time after new equipment is installed, or soon after the equipment has been overhauled to a “previous state” condition. The interval between startup and failure could be minutes, hours, days or months. Reliability studies typically show that about 68% of known failure modes are a result of infant mortality.

So, most failures are likely to occur soon after new or overhauled equipment is started up. As most equipment continues to operate, it becomes less likely to fail. At some point, the probability of failure levels out to a plateau known as random failure, where failures can be detected through a proactive maintenance strategy. Infant mortality, by contrast, is difficult to identify and detect before failure occurs.
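The shape described above, a failure rate that starts high and settles to a plateau, can be sketched numerically. The following is only an illustration under stated assumptions: it models the early-failure portion with a Weibull hazard whose shape parameter is below one and adds a constant random-failure rate. The function name and every parameter value are hypothetical. None of them come from Figure 1 or from the reliability studies mentioned above.

```python
def failure_rate(t_hours, shape=0.5, scale=100.0, random_rate=0.01):
    """Illustrative failures-per-hour rate at t_hours after startup.

    Assumption: early failures follow a Weibull hazard with shape < 1
    (a decreasing rate), added to a constant random-failure floor.
    All default values are made up for illustration.
    """
    if t_hours <= 0:
        raise ValueError("t_hours must be positive")
    early = (shape / scale) * (t_hours / scale) ** (shape - 1.0)
    return early + random_rate


# The rate falls as operating hours accumulate, then flattens
# toward the constant random-failure plateau.
for t in (1, 10, 100, 1000, 10000):
    print(f"{t:>6} h after startup: {failure_rate(t):.4f} failures/hour")
```

With these made-up parameters, the rate shortly after startup is several times the plateau value, which is the pattern the text describes.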

On first encounter with Figure 1, most people don’t believe it. The first time I saw it, I thought, “No way most failures come as a result of infant mortality.” I was wrong. Let’s think about this for a minute. I recommend you perform a basic experiment to see if infant mortality is prevalent in your operation:

Step 1: Measure the mean time between failures (MTBF) for a production line or process area during some time interval (t) while it’s running in a steady state. MTBF = t divided by the number of emergency work orders. For example, at steady state, 160 hours divided by 10 emergency work orders gives an MTBF of 16 hours.

Step 2: After the next shutdown, when equipment has been overhauled or new equipment has been added, measure MTBF again for the same length of time. For example, 160 hours divided by 20 emergency work orders gives an MTBF of eight hours.

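Step 3: Compare the data. If MTBF after the shutdown is markedly lower than it was at steady state, as in the example where it fell from 16 hours to eight, infant mortality is at work in your operation.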
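The arithmetic in the three steps can be sketched in a few lines of Python. This is a minimal sketch using the example figures from Steps 1 and 2. The function names are hypothetical, and the emergency work order counts would come from your own CMMS/EAM records.

```python
def mtbf(interval_hours, emergency_work_orders):
    """MTBF = time interval divided by the number of emergency work orders."""
    if emergency_work_orders <= 0:
        raise ValueError("need at least one emergency work order")
    return interval_hours / emergency_work_orders


def mtbf_change(steady_state, after_shutdown):
    """Fractional change in MTBF after the shutdown (negative means worse)."""
    return (after_shutdown - steady_state) / steady_state


# Step 1: steady state, 160 hours and 10 emergency work orders.
steady = mtbf(160, 10)    # 16.0 hours

# Step 2: same interval after the shutdown, 20 emergency work orders.
post = mtbf(160, 20)      # 8.0 hours

# Step 3: compare the data.
change = mtbf_change(steady, post)    # -0.5, MTBF was cut in half
print(f"Steady state: {steady} h, after shutdown: {post} h, change: {change:.0%}")
```

A clearly negative change after a shutdown is the signal to look for. How large a drop counts as significant is a judgment call that depends on how much your MTBF normally varies from one interval to the next.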

The first step in resolving any problem is to acknowledge you have a problem. Once this has been achieved, a company is ready to begin the journey to reliability on demand.

Why equipment dies

Causes of infant mortality can be broken down into a few categories (Table 1). To truly reduce the likelihood of infant mortality and improve reliability, those issues (and more) must be addressed and prioritized by risk.

Table 1: Causes of Infant Mortality

1. New/overhauled equipment

  • Equipment doesn’t meet requirements
  • Poor design
  • Lack of quality in manufacturing
  • Equipment installed incorrectly

2. Maintenance issues

  • Unmet or unknown lubrication specifications
  • Specifications not followed during repairs, rebuild or installation
  • Preventive maintenance inspections not performed to standard
  • Equipment being opened up and inspected too frequently
  • CMMS/EAM not used to calculate MTBF

3. Production issues

  • Operator not starting up equipment according to standard operating procedure
  • Production constantly stopping and starting equipment (lunches, breaks, shift changes, etc.)

4. Other

  • Power surges
  • Unstable floor for new equipment
  • Purchasing the cheapest parts or equipment

The U.S. Department of Defense (DoD) is very much aware of infant mortality. It has completed research to identify how to make equipment more reliable so that once it is commissioned or overhauled, it will have a high probability of operating failure-free for a specific time. Extensive DoD standards and procedures have been developed to optimize reliability. The DoD reliability standards, process and more can be found at
