Many organizations struggle with a high level of uncertainty when it comes to asset reliability. Whether equipment will be available when it’s needed to meet customer demands may be anyone’s guess. Loss of availability increases human and monetary costs and often can jeopardize compliance with safety or environmental regulations.
When digging deeper to determine the root causes of unreliability, you’ll often find the MRO storeroom not functioning well, the PM program poorly designed (at best) and without use of effective condition-based approaches, a lack of maintenance planning, and a minimal weekly maintenance schedule. Add to that limited partnerships with other stakeholders, such as production teams, which often don’t make the equipment available for maintenance. Production personnel often lack standardized work practices themselves and induce failures while operating the equipment.
To establish a solid foundation for reliability, the organization must first address the basics. There are eight steps to accomplish this, presented here in no particular order.
1. Align the organization for reliability
In conducting this alignment, there are three questions to address:
- Which job positions are affected?
- What are the role expectations?
- How is position performance measured?
I once taught a maintenance planner-scheduler course for a large facility. During my time there, I spent a few hours each day outside of class on a smaller-scale effort to educate the production supervisors. I asked a group of 10 production supervisors, each of whom had responsibility for separate lines of production equipment, “When downtime strikes, whose line is most important?” They all raised their hands.
Most organizations lack formal work processes that come in the form of graphical process workflows, RACI or RASI charts, and definitions documents. While these may seem trivial to many and extra work with little return, the reality is that many organizations don’t have clearly defined roles and responsibilities. They may have job descriptions when hiring, but the activities are often different when in the field. Efforts overlap and are often duplicated in the name of getting the job done. The onboarding process for new team members takes longer than necessary. These work processes define the roles required as well as their responsibilities.
In most organizations, I look for the positions identified in Table 1, along with the spans of control where applicable. When looking at the spans of control, consider them with respect to organizational size and process maturity. The spans of control are guidelines, and the numbers are intended for multiples of a position, meaning that if 55 technicians exist, then two to three planner-schedulers are required.
On process maturity, if the organization is new to or not robust in its maintenance planning and scheduling activities, it’s better to err toward the lower number until the processes are established. For example, with a maintenance planner-scheduler, use multiples of 20 technicians. In smaller organizations, don’t assume that you don’t need a role because the number of technicians falls below the lower number on the span of control. You can easily justify a full-time planner-scheduler for 8–10 technicians. The position of maintenance or reliability engineer often is overlooked in smaller organizations. This position is all about continuous improvement. I encourage organizations to dedicate some level of resourcing to the function, even if it is a technician devoting four hours per week to looking at problems and equipment history in an effort to improve.
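The span-of-control arithmetic can be sketched in a few lines. This is only an illustration: Table 1 is not reproduced here, so the span of 20 to 30 technicians per planner-scheduler is an assumption chosen to match the 55-technician example above, and the function name is hypothetical.

```python
import math

def planner_range(technicians, span_low=20, span_high=30):
    """Return (fewest, most) planner-schedulers for a technician head count.

    span_low and span_high are ASSUMED technicians-per-planner guidelines;
    substitute the values from your own span-of-control table.
    """
    if technicians <= 0:
        return (0, 0)
    # The wider span gives the fewest planners, the narrower span the most.
    fewest = math.ceil(technicians / span_high)
    most = math.ceil(technicians / span_low)
    return (fewest, most)

print(planner_range(55))  # (2, 3)
print(planner_range(9))   # (1, 1)
```

Rounding up rather than down reflects the advice above: a small crew still gets one full planner-scheduler, and an organization with immature processes would staff toward the higher number.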
2. Determine your maintenance strategies
I have found on site visit after site visit that organizations are doing too much time-based preventive maintenance. In many cases, those same organizations are issuing multiple PMs on the same asset for the same period (e.g., weekly or monthly). Many of these PMs are the result of knee-jerk reactions to past failures and are not generated from a root-cause perspective. I have seen as many as 10 separate PMs with the same weekly frequency on the same machine.
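One way to spot this kind of overlap is to group the PM list exported from the CMMS by asset and frequency. The sketch below assumes a simple list of records; the field names (pm_id, asset, frequency) and the sample data are hypothetical and will differ from any real CMMS export.

```python
from collections import defaultdict

def find_overlapping_pms(pms):
    """Return {(asset, frequency): [pm_ids]} for groups holding more than one PM."""
    groups = defaultdict(list)
    for pm in pms:
        groups[(pm["asset"], pm["frequency"])].append(pm["pm_id"])
    # Keep only the asset/frequency pairs with multiple PMs issued.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical sample records for illustration only.
pms = [
    {"pm_id": "PM-001", "asset": "Filler 3", "frequency": "weekly"},
    {"pm_id": "PM-002", "asset": "Filler 3", "frequency": "weekly"},
    {"pm_id": "PM-003", "asset": "Filler 3", "frequency": "monthly"},
    {"pm_id": "PM-004", "asset": "Capper 1", "frequency": "weekly"},
]
print(find_overlapping_pms(pms))
# {('Filler 3', 'weekly'): ['PM-001', 'PM-002']}
```

A flagged group is a candidate for review, not automatically a mistake; the point is to find PMs that could be consolidated or retired after checking which failure modes they address.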
Sadly, most of these PM tasks (40%–60% from RCM2 studies) fail to address any likely failure modes. Most organizations in the top percentile of performance do only about 20% of their physical maintenance based on time. Moreover, intrusive maintenance introduces failure to an otherwise stable system at the rate of 70% or more. Winston Ledet, in his book “Don’t Just Fix It, Improve It,” says that 84% of failures are due to poor work behaviors.
The essential point is that to be effective, there must be a basis such as reliability-centered maintenance (RCM2, RCM3, or FMEA) that couples proven methods and your equipment experience to define a maintenance strategy. These strategies will be a combination of condition-based, time-based, and predictive technologies. As part of the analyses, you will also determine failure modes that must be addressed from the perspective of training, standardized procedures, or re-engineering. Once these strategies are defined, they are implemented in the CMMS and triggered for execution.
When failures occur, a root-cause process (RCA or RCFA) should attempt to identify the root causes. The maintenance tasks and strategies should then be reviewed to confirm that they are likely to prevent the failure or mitigate its consequences. If changes are required, follow-through must occur to ensure implementation.
3. The MRO storeroom
A vital component to ensuring effective work execution and improved reliability is a well-managed MRO storeroom or materials management process. The reality is that most storerooms are either models of excellence or very poorly executed. There does not seem to be much middle ground when it comes to an organization’s storeroom practices.
When storerooms are poorly managed, it is common to find that more than 50% of the materials are obsolete. Old, removed (i.e., worn-out) parts litter the storeroom shelves, waiting for reuse, only to fail quickly when they’re installed. Drive belts hanging from pegs on the wall are cracked and dry-rotted. Conditions like these are counterproductive to ensuring asset reliability. When poor practices exist, the storeroom is or becomes a cost burden instead of a profit center.
Materials should be identified and acquired in advance for planned work. These materials should be assembled in kits and staged in secure areas for the forthcoming work. The storeroom should have a PM program for spare rotating equipment, and the following practices and processes should be implemented:
- First in, first out (FIFO) by using date stamping practices to address shelf life
- Obsolescence management with “where used” for all stored items
- Bills of materials maintained in the CMMS
- Accurate nameplate data and item masters for both stock and nonstock items
- An effective storeroom layout based on ABC principles
- Adequate security and item transaction processes
- Minimum/maximum and safety stocking levels.
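Two of these practices, FIFO by date stamp and minimum/maximum stocking, reduce to simple rules. The sketch below is a minimal illustration under stated assumptions: the lot records and function names are hypothetical, and the reorder rule (order up to the maximum once on-hand stock reaches the minimum) is one common convention, not necessarily the policy your storeroom uses.

```python
from datetime import date

def pick_fifo(lots):
    """Return the lot with the oldest received date (first in, first out)."""
    return min(lots, key=lambda lot: lot["received"])

def reorder_quantity(on_hand, minimum, maximum):
    """Return units to order to refill to maximum, or 0 while above minimum."""
    if on_hand <= minimum:
        return maximum - on_hand
    return 0

# Hypothetical date-stamped lots of the same stock item.
lots = [
    {"lot": "A", "received": date(2024, 3, 10)},
    {"lot": "B", "received": date(2023, 11, 2)},
    {"lot": "C", "received": date(2024, 1, 15)},
]
print(pick_fifo(lots)["lot"])      # B
print(reorder_quantity(2, 3, 10))  # 8
print(reorder_quantity(5, 3, 10))  # 0
```

Safety stock is not modeled separately here; in practice it is usually built into how the minimum level is set.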
4. Identifying and prioritizing the work
From a best-practices perspective, 90% of all work should be planned and scheduled. However, many reactive organizations engage in 60%–90% unplanned work. Every hour spent planning the work saves three to five hours in execution. But before work can be planned, it must be identified in the CMMS. Identifying the work, whether it’s corrective, emergency, or urgent, also helps build an equipment history. With that history, we understand how long an asset is down, how reliable the asset is, and where we are spending our maintenance dollars. The maintenance or reliability engineer then can use this equipment history to improve asset reliability and reduce overall costs.
However, for several reasons in the reactive organization, it is common to find that work, especially emergency or urgent work, is undocumented in the CMMS. Some sites do a little better by requiring technicians to complete a work order to charge storeroom parts. Maintenance resources are finite on a given shift as a rule, so you need to deploy them based on asset criticality (risk) and priority. To reach the 90% planned level, you can’t do everything as emergency or urgent work, because you only have so many resources in each time frame or shift.
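Both figures are easy to compute once work orders are captured in the CMMS. The following sketch assumes each work order carries labor hours and a planned flag; measuring the percentage by hours rather than by work order count is an assumption, and the field and function names are hypothetical.

```python
def planned_work_percent(work_orders):
    """Return the share of labor hours spent on planned work, as a percentage."""
    total = sum(wo["hours"] for wo in work_orders)
    if total == 0:
        return 0.0
    planned = sum(wo["hours"] for wo in work_orders if wo["planned"])
    return 100.0 * planned / total

def execution_hours_saved(planning_hours, low=3, high=5):
    """Apply the three-to-five-hours rule of thumb to a number of planning hours."""
    return (planning_hours * low, planning_hours * high)

# Hypothetical work order history for illustration only.
work_orders = [
    {"hours": 8, "planned": True},
    {"hours": 4, "planned": True},
    {"hours": 12, "planned": False},
]
print(planned_work_percent(work_orders))  # 50.0
print(execution_hours_saved(10))          # (30, 50)
```

Tracked period over period, the first number shows how far the site is from the 90% target; it is only as trustworthy as the work order history behind it.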
One way to do this is to establish a priority matrix for execution (see Table 2). Notice that there are three planned work priorities beyond PMs. This approach allows us to segment our planned work. It also ensures that we work on all priorities rather than just a single routine work class or priority code, so that less-critical work doesn’t fall off the radar screen, frustrating those who requested it. When this happens, the response often is to re-enter the work request as “safety” in the hope that the work will get completed.
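As a rough illustration of deploying finite shift hours by priority and asset criticality, the sketch below ranks a backlog and fills the available hours. It is not a reproduction of Table 2: the numeric priority codes (lower means more urgent), the criticality scores (higher means more critical), the field names, and the greedy fill rule are all assumptions made for the example.

```python
def schedule_shift(backlog, available_hours):
    """Greedy fill: take work in (priority, criticality) order until hours run out.

    ASSUMED conventions: lower priority number = more urgent,
    higher criticality number = more critical asset.
    """
    ranked = sorted(backlog, key=lambda wo: (wo["priority"], -wo["criticality"]))
    scheduled, remaining = [], available_hours
    for wo in ranked:
        # Skip jobs that no longer fit; smaller jobs further down may still fit.
        if wo["hours"] <= remaining:
            scheduled.append(wo["id"])
            remaining -= wo["hours"]
    return scheduled, remaining

# Hypothetical backlog for a shift with 16 labor hours available.
backlog = [
    {"id": "WO-1", "priority": 1, "criticality": 5, "hours": 6},
    {"id": "WO-2", "priority": 3, "criticality": 9, "hours": 4},
    {"id": "WO-3", "priority": 2, "criticality": 7, "hours": 8},
    {"id": "WO-4", "priority": 2, "criticality": 9, "hours": 5},
    {"id": "WO-5", "priority": 4, "criticality": 3, "hours": 2},
]
print(schedule_shift(backlog, 16))  # (['WO-1', 'WO-4', 'WO-2'], 1)
```

A pure ranking like this always favors the top codes, so a real schedule would also reserve some hours for each planned priority, which is how lower-priority requests avoid falling off the radar.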