You have to be committed to maintain complete control-system reliability

In order to maintain complete control-system reliability, you have to be committed. Here are four often-overlooked flaws that can creep into modern automation and control systems while you're not watching.

By Arthur Zatarain, P.E.


Are you committed to control-system reliability, or are you merely involved? Ruthless oilman J.R. Ewing, of TV's Dallas fame, once explained the difference by saying, “It's like ham and eggs at breakfast. A chicken was certainly involved, but that pig was committed.”

The folks who count on you to design and maintain a reliable control system won't expect quite such commitment from you. But they do expect a commitment to a control system that meets their expectations for performance, safety and economy. Although you know your systems inside and out, and often check their pulses, small problems can fly below the radar, decreasing reliability and dramatically increasing downtime.

My experience in field service and forensic engineering has shown that ordinarily dependable systems can be jinxed by relatively minor oversights. Smooth-running systems might be the most vulnerable because they are rarely analyzed. The four situations discussed below range from obvious to obscure, but all of them may be closer than you think.


I've never seen a relay that stumped me, but I've certainly worked with my share of perplexing PLCs and other programmable devices. One certainty about computerized equipment is that every element must be 100% compatible: hardware, operating software, application programming and even interconnecting cables. Every aspect must fit perfectly to produce a reliable system.

With modern controls, even a small PLC can contain multiple processors, as well as several I/O and communications boards, each driven by on-board microprocessors. Those devices must have the proper operating codes (either ROM or firmware) to function properly with the particular variant of your application code. It's not always obvious that yesterday's program might be incompatible with today's apparently identical spare processor. Therefore, control-system reliability is quietly threatened by the continually evolving software provided on replacement parts.

I tripped over this anomaly while analyzing an accident on a circuit-board manufacturing machine. The used equipment had been damaged in shipment, and the PLC processor had been replaced with a part assumed to be compatible (same brand, same part number) but with a later version of firmware. The machine operated normally for weeks before a safety switch failed to prevent automatic operation. The problem was traced to a “trick” in the ladder logic that had been used to work around a flaw in the firmware of earlier controllers. Although the flaw was later corrected in the controller firmware, the earlier work-around was incompatible with the later firmware. The manufacturer's literature explaining the situation had never made its way to the factory floor.

Merely installing a spare card, therefore, might introduce unanticipated operation that goes unnoticed until a particular situation arises. Even a known incompatibility can result in expensive downtime until a suitable guru can be located to coordinate the mismatched parts and programs. Even the best guru can have his hands tied until a vintage DOS-based computer can be found to run the outdated software needed to update the equipment. The bottom line here is that although you have lots of spare parts on hand, they must be 100% compatible. Will you be able to determine compatibility in real time while costly production slams to a halt? If not, then now's the time to work out a better plan.
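One way to make that compatibility question answerable before a breakdown is to keep a record of which firmware revisions each application program has actually been tested against, and to check every spare against it. The following is only a minimal sketch of that idea: the program name, part number, firmware strings and the `VALIDATED` table are all hypothetical, and a real plant would fill such a record from its own test history and the manufacturer's release notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A physical spare, identified the way a parts label would identify it."""
    part_number: str
    firmware: str


# Hypothetical validation record: for each application program, the firmware
# revisions of each part number that the program has actually been tested with.
VALIDATED = {
    "press_line_rev_c": {"CPU-A": {"1.4", "1.5"}},
}


def is_validated(program: str, spare: Module, record=VALIDATED) -> bool:
    """Return True only if this exact part/firmware pair was tested with the program."""
    approved = record.get(program, {}).get(spare.part_number, set())
    return spare.firmware in approved


old_spare = Module("CPU-A", "1.5")
new_spare = Module("CPU-A", "2.0")  # same part number, later firmware

print(is_validated("press_line_rev_c", old_spare))
print(is_validated("press_line_rev_c", new_spare))
```

The check deliberately treats an unknown combination as not validated, so a spare with the right part number but an untested firmware revision is flagged rather than assumed compatible.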


Although ESTOP and LOTO sound like space-age cartoon characters, they’re better known as emergency stop and lockout-tagout, two essential components in any reliable control system. Both relate to halting an industrial machine or process effectively. However, ESTOP and LOTO serve significantly different purposes that can inadvertently work against each other if improperly coordinated. Careful consideration during initial design and later modification is essential for reliable control-system operation and maintenance.

ESTOP is aimed at producing a safe and immediate shutdown during any phase of operation. Its primary goal is minimizing harm to people, property, production and the planet. The details of performing an ESTOP vary greatly among situations, ranging from very simple (as in a drill press) to very complex (as in a roller coaster). Regardless of any particular design, every ESTOP system shares the common feature of simple and latched activation: once you push the red mushroom, it activates a safe mode, and nothing short of a manual reset will restart it.
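The latched behavior described above can be illustrated with a small state model. This is a sketch for explanation only, not a safety-rated design: the class and method names are invented here, and the rule that a reset is refused while the button is still held in is an assumption added for the example.

```python
class EStopLatch:
    """Toy model of a latched emergency stop."""

    def __init__(self) -> None:
        self.button_pressed = False
        self.latched = False

    def press(self) -> None:
        # Pushing the red mushroom sets the latch immediately.
        self.button_pressed = True
        self.latched = True

    def release(self) -> None:
        # Releasing the button does NOT clear the latch.
        self.button_pressed = False

    def manual_reset(self) -> bool:
        # Assumption: a reset is honored only once the button has been released.
        if not self.button_pressed:
            self.latched = False
        return not self.latched

    def run_permitted(self) -> bool:
        return not self.latched
```

In this model, pressing and then releasing the button still leaves `run_permitted()` false; only a deliberate `manual_reset()` after the release makes the machine eligible to restart.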

LOTO, on the other hand, addresses nonoperational maintenance concerns that are generally handled by specifically trained maintenance staff. LOTO usually occurs via a preplanned process rather than as a single action, and applies primarily to offline equipment. Although some aspects of ESTOP and LOTO often overlap, a major philosophical difference is that ESTOP doesn’t demand relief of stored energy. In fact, stored energy is often required during ESTOP to prevent undesired motion.

In contrast, LOTO requires removal of all energy sources, such as electrical, pneumatic, hydraulic and mechanical. LOTO also requires a means to test its effectiveness before maintenance begins. That particular aspect is one area that can affect control-system reliability when ESTOP and LOTO functions overlap.

I once analyzed an accident involving mixed ESTOP and LOTO that occurred on a hydraulically powered foundry conveyor. The incident began when a maintenance worker used ESTOP to halt the power source and disable the actuator controls before servicing a high-pressure hose. Although the machine appeared dormant while the controls were tested, the energy trapped within the system was enough to remove his arm when the hose was disconnected.

One flaw in that LOTO process was the incorrect use of ESTOP to disable the solenoid valves. Subsequent testing after the ESTOP suggested that the conveyor was inert, but only the controls were dormant. The use of ESTOP, rather than a separate LOTO control mode, blocked the means to relieve the trapped pressure and properly check for stored energy.
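The distinction between dormant controls and a truly de-energized machine can be made concrete with a simple check that looks at each energy source separately. The sketch below is illustrative only: the `EnergySource` fields, the zero threshold and the residual reading of 1500.0 are made-up values, not data from the incident, and a real LOTO procedure relies on physical isolation and verification rather than software.

```python
from dataclasses import dataclass


@dataclass
class EnergySource:
    name: str
    isolated: bool    # supply locked out at its isolation point
    residual: float   # measured stored energy (e.g. line pressure), sensor units


def blockers(sources, threshold: float = 0.0) -> list:
    """Names of sources that are not isolated or still hold stored energy."""
    return [s.name for s in sources if not s.isolated or s.residual > threshold]


# Hypothetical state after an ESTOP: supplies are off, but pressure is trapped.
conveyor = [
    EnergySource("electrical", isolated=True, residual=0.0),
    EnergySource("hydraulic", isolated=True, residual=1500.0),
]

print(blockers(conveyor))
```

Here the hydraulic circuit is reported as a blocker even though its supply is isolated, which is the condition that a test of the controls alone would not reveal.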
