As a wise singer once crooned, you have to “know when to hold ’em and know when to fold ’em.” But Kenny “The Gambler” Rogers merely had to beat long-shot odds to win at his game. Outside the casino, designers of industrial control systems don’t have the luxury of being right only 51% of the time. For many manufacturing and process systems, a control system failure — even for a second — simply isn’t an option. Hence, it’s important that control systems deliver safe and reliable performance, even when things go wrong.
Also important is the need to maintain production uptime. Additional control devices help prevent accidents, but they also reduce uptime by increasing the opportunity for nuisance trips (shutdowns triggered by a faulty device rather than by a real hazard). You need to find that delicate balance between safety, production reliability and overall cost when designing, operating and upgrading production control systems.
The established concepts of safety and reliability for industrial controls are detailed in ANSI/ISA-84.01, Application of Safety Instrumented Systems for the Process Industries. This ANSI/ISA standard applies within the United States and is equivalent to the IEC 61511 standard in Europe and other areas. These standards reveal that statistical analysis of safety instrumented systems is a science in itself. Fortunately, only a few basic concepts are required to appreciate the simplified discussion presented in this article.
Safety in numbers
Although using only a single control device often is appropriate, much of safety instrumented system (SIS) design incorporates multiple devices to perform a single control function. The multiple units are cleverly arranged to accommodate the anticipated failure of any single device. Although formal terms such as replicated, complementary or diverse aptly apply to the various arrangements, the catchall term “redundant” is normally used to describe any flavor of multiple-device configuration.
The SIS concept uses “M out of N” terminology to describe device configuration; reliability is based on M properly functioning components out of a total of N. This concept often is written as MooN (spoken as “M out of N”). For example, 1oo2 (“one out of two”) might represent an arrangement of two relays in series; depending on context, this arrangement can safely shut down a process with only one of the two devices, or it can continue safe operation with only one of the two. The terminology for each context is the same, but the applications are quite different. Further examples of typical SIS architectures include:
- 1oo1: A single fuse or rupture disk that limits an over-current or over-pressure malfunction in a nearly infallible manner.
- 1oo2: Two power supplies connected in parallel to accommodate shutdown of either one. Only “one out of two” is required for continued safe operation.
- 2oo2: Two high-level sensors connected in series that permit a tank inlet valve to open. “Two out of two” devices, both indicating there is no high level, are required to safely open the valve.
- 2oo3: Triple modular redundant (TMR) pressure transmitters configured in a voting system. “Two out of three” devices must agree to continue safe production should one of the three transmitters fail in any manner.
Figure 1. These relay contact motor control schemes show how the degree of reliability desired determines the degree of complexity needed in the control system.
Note that each of these examples addresses a specific malfunction of a control device. This important concept will be explored a bit more later on. Figure 1 illustrates four examples of increasingly complex SIS architectures; all are based on simple relay contact motor control.
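The M-out-of-N arrangements listed above all reduce to the same counting rule, which can be sketched in a few lines of Python. This is only an illustration: the function name `moon_vote` and the idea of representing each device as a simple true/false signal are assumptions made here, not part of any standard or vendor product.

```python
def moon_vote(signals, m):
    """Return True when at least m of the device signals are True.

    signals: one boolean per device (hypothetical representation),
             where True means the device reports "OK to proceed".
    m:       how many devices must agree (the M in MooN).
    """
    if not 1 <= m <= len(signals):
        raise ValueError("m must be between 1 and the number of devices")
    return sum(bool(s) for s in signals) >= m


# 2oo3: one transmitter disagrees, but two still agree.
print(moon_vote([True, True, False], m=2))   # True
# 2oo2: both level sensors must give the permissive.
print(moon_vote([True, False], m=2))         # False
# 1oo2: either power supply alone keeps the system running.
print(moon_vote([False, True], m=1))         # True
```

Note that the same rule serves both contexts mentioned earlier: whether a True vote means "keep running" or "shut down" depends entirely on what the signals represent.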
Demanding reliability
A key SIS concept used to evaluate reliability is called probability of failure on demand, or PFD. Its calculation is complex, and often controversial, but is simplified here to denote the percentage of time that a device is expected to not perform its control function properly. As with golf, the goal with PFD is a low score.
Different levels of PFD might apply to the same device based on its role in the overall system. For example, a pressure sensor might have a 4% probability of causing a nuisance trip, but only a 2% probability of causing an unsafe situation. Because these probabilities are calculated on a per-year basis and accumulate over time, a device with a 4% PFD is estimated to malfunction, on average, about once every 25 years (4% failure/year x 25 years = 100% failure). And because the PFD is estimated for each device, the net reliability of a total system rapidly decreases if multiple devices affect a single control function. Therefore, low PFD values for each device are prime design criteria.
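The arithmetic behind these statements can be sketched as follows. This is a rough illustration in the same simplified spirit as the article, not the calculation method of the standards: it assumes each device fails independently of the others, ignores common-cause failures and proof-test intervals, and uses function names invented here for the example.

```python
from math import prod


def years_between_failures(annual_failure_probability):
    """Simplified estimate: a 4%/year device fails about once in 25 years."""
    return 1.0 / annual_failure_probability


def any_device_fails(probabilities):
    """Chance that at least one of several independent devices fails.

    Applies when every device must work for the control function to work.
    """
    return 1.0 - prod(1.0 - p for p in probabilities)


def all_devices_fail(probabilities):
    """Chance that every one of several independent devices fails.

    Applies when any single working device is enough (for example, 1oo2).
    """
    return prod(probabilities)


print(round(years_between_failures(0.04), 1))         # 25.0
print(round(any_device_fails([0.04, 0.04, 0.04]), 4)) # 0.1153
print(round(all_devices_fail([0.04, 0.04]), 4))       # 0.0016
```

Under these assumptions, three 4% devices that must all work give roughly an 11.5% chance of a failure, which is why the net reliability drops so quickly as devices are added to a single function. Two redundant 4% devices, by contrast, both fail only about 0.16% of the time.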