Behavior-based reliability and the second failure
One of the many services our production equipment will perform for us, if we let it, is self-scheduling maintenance. If we fail to pay adequate attention to the operating condition of our equipment, the equipment will notify us of the need for repairs. The repairs and lost production will cost something like five times what planned maintenance would have, and the notification will come with zero advance warning, but otherwise the service is free of charge.
Sometimes run-to-failure is a good policy, but not for most equipment. If the failing asset is at all important, planning the backup capacity and procedures takes as much effort as providing preventive maintenance (PM) or predictive maintenance (PdM) support to prevent the failure would.
The behavioral issue here is that for any piece of equipment to run to failure, one or more humans have to allow it. An equipment failure normally signals that two failures have occurred: the equipment failure is, of course, one. The other, which normally occurs first, is the failure of the organization to predict and prevent the equipment failure.
There generally aren’t very many types of behavioral failure, and the types that do exist are pretty understandable. In some cases top management may have elected to “save” money on reliability support. Sometimes the preventive or predictive maintenance tools that are needed haven't yet been developed and put into place. It may be that operators have observed the developing failure and failed to respond, due either to lack of ownership or lack of training. It is often true that a work order to prevent the equipment failure is in the system, waiting to achieve the priority it needs to be carried out. In this case resources or priorities are mismatched to the needs of the plant.
Behavioral issues and omissions are not difficult to identify. An RCA approach based on “the five whys” will usually be enough to help a team determine what kind of organizational behavior led to the failure. Once that determination is made, it isn't hard to figure out what kind of equipment monitoring, operator training, maintenance scheduling, or other correction will be required to prevent a recurrence of the failure. As long as management does not allow the RCA to degenerate into a "blame game," good, actionable information will come from the reviews.
If every emergency work order is subject to an "equipment failure" RCA to identify technical causes and an "organization failure" RCA to spot behavioral causes, prevention will become feasible. The terms "equipment failure" and "organization failure" are keys to the process, because the acknowledgement and correction of the behavioral issues is essential.
All this assumes that the amount of emergency work in the plant is tracked and reported to the organization. If it isn't, then management should start with an RCA to find out "Why don’t we understand the causes of surprise failures that cost several times what they should to fix?" Note that the blame game must be avoided here as well. The first question shouldn’t begin "Who doesn’t care enough to find out . . .?" That might be a good question to ask yourself, though.
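As an illustration of what "tracked and reported" might look like, here is a minimal sketch, assuming a simple work-order log where each order is flagged as emergency or planned and records labor hours. The `WorkOrder` record, its fields, and the two helper functions are hypothetical, not part of any particular CMMS. The sketch reports the share of labor hours spent on emergency work and lists emergency orders still missing either the "equipment failure" or the "organization failure" RCA.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkOrder:
    # Hypothetical work-order record; field names are illustrative only.
    order_id: str
    emergency: bool                          # unplanned work triggered by a failure
    hours: float                             # labor hours charged to the order
    equipment_rca: Optional[str] = None      # technical root cause, once reviewed
    organization_rca: Optional[str] = None   # behavioral root cause, once reviewed


def emergency_share(orders: List[WorkOrder]) -> float:
    """Fraction of total labor hours spent on emergency work (0.0 if no hours)."""
    total = sum(o.hours for o in orders)
    if total == 0:
        return 0.0
    return sum(o.hours for o in orders if o.emergency) / total


def missing_reviews(orders: List[WorkOrder]) -> List[str]:
    """IDs of emergency orders lacking either the equipment or the organization RCA."""
    return [
        o.order_id
        for o in orders
        if o.emergency and (o.equipment_rca is None or o.organization_rca is None)
    ]


# Hypothetical sample data for illustration.
orders = [
    WorkOrder("WO-101", emergency=False, hours=6.0),
    WorkOrder("WO-102", emergency=True, hours=3.0,
              equipment_rca="bearing fatigue",
              organization_rca="no PM interval was ever set"),
    WorkOrder("WO-103", emergency=True, hours=1.0,
              equipment_rca="seal leak"),
]

print(f"Emergency share of labor: {emergency_share(orders):.0%}")  # 40%
print("Awaiting a full RCA:", missing_reviews(orders))              # ['WO-103']
```

Measuring by labor hours is only one choice; a plant might prefer order counts or cost, and the same two questions (how much surprise work, and which surprises have not been explained on both the technical and the behavioral side) still apply.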
Please remember – self-scheduling maintenance is a luxury none of us can afford. The "Amen!" you just heard came from your customers.