You would think that, with the recent rise in popularity of the internet of things (IoT) and the veritable explosion of smart devices in all walks of life, an article entitled SMART PMs would deal exclusively with technology. While smart-enabled devices and machine connectivity are revolutionizing the way our factories work, we cannot lose sight of the fact that our maintenance strategies have to be as smart as the technology-laden assets we hope to maintain.
Smart-enabled devices are essential for maintaining our factories’ competitive advantage, but if you focus as much energy on developing smart maintenance strategies, your organization will run like a well-oiled machine and develop a reputation with its customers as the reliable supplier of choice.
SMART is an acronym generally defined by the five elements which comprise its name: Specific, Measurable, Achievable, Relevant, and Timely (the A and R are also commonly given as Attainable and Realistic, and both variants appear below). A practitioner can use SMART criteria to develop predictive maintenance (PdM) strategies and preventive maintenance tasks (PMs), and to review existing PMs to optimize their effectiveness (PMO).
If reliability professionals are serious about developing SMART maintenance strategies, I can’t think of a better way to start than by completing Failure Modes and Effects Analysis (FMEA) studies for the primary asset classes found within their manufacturing facilities. FMEA is the linchpin on which SMART maintenance strategies can be quickly built. And quick is the name of the game in our ever increasingly fast-paced, results-oriented world. Don’t believe me? When was the last time your boss extended a due date for you to complete a task?
Performing classical FMEAs can be both labor intensive and exhausting for your personnel, and most facilities will not invest the time or resources to complete studies of such a comprehensive nature. Traditional FMEA studies will quickly burn your people out, and when this happens, the initiative will fall apart.
At my facility, we use a streamlined approach which I call FMEA Object Type Templates to capture the most common failure modes and cause codes for a given asset class. In this way, the primary failure modes/cause codes can be identified for the entire asset class holistically, so SMART maintenance strategies can be quickly established and implemented factory wide. Figure 1 shows an excerpt of the FMEA Object Type Template completed for the rotary steam tube dryers for our site.
Notice columns four and five in Figure 1. Column four identifies the detection method or proactive maintenance strategy used to address the failure mode/cause code.
Figure 1. Example of Partial FMEA - Rotary Steam Tube Dryer
You will see that both condition-based and time-based maintenance strategies are identified, with time-based strategies being assigned to both operations and maintenance personnel. Column five defines the specific task which will be assigned to address the failure mode. Once the appropriate strategy and executable tasks have been chosen to best address the failure mode, one can apply SMART criteria to ensure the inspection and executable tasks produce meaningful results (i.e., costs are reduced and downtime is decreased).
Smarten up your PdM strategy
Figure 2 illustrates the questions one might ask when applying SMART criteria to your PdM program. Let’s dive into the details of what a SMART enabled PdM strategy might look like, using vibration analysis as the example technology.
Figure 2. SMART Criteria for PdM Coverage
The S for Specific asks whether the technology can detect the failure mode with enough certainty to make the investment of time and energy worthwhile. Carefully matching the correct technology to a specific failure mode is an important first step in creating an effective maintenance strategy. By using the FMEA study findings and applying multiple technologies to specific failure modes, you will ensure that your PdM strategy will detect faults early enough that the maintenance department can plan, schedule, and execute the work before collateral damage and full functional failure occur.
Once the technology has been selected, there must be adequate Measurements in place to ensure the fault will be detected at its earliest stage. In the case of vibration analysis, the forcing frequencies should be calculated for the entire machine train so band alarms can be set up for the various faults which may occur. Integrating this mindset of holistic asset care into your CapEx process is an essential element in this approach.
As an example, whenever you purchase a gearbox on a new capital project, the requisitioner should require that the supplier submit all of the bearing and gear information to the site reliability engineer. In this way, forcing frequencies such as the gear mesh frequency can be calculated prior to commissioning the equipment. And one should always commission the equipment throughout the entire range of operation at startup, so that baseline data can be documented and stored in the database. Vibration commissioning data can also be used as acceptance criteria for whether precision installation practices were followed.
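To make the idea of calculating forcing frequencies from supplier data concrete, here is a minimal Python sketch. It uses the standard textbook relationships for gear mesh frequency and for the four rolling-element bearing defect frequencies, assuming the outer race is stationary and the inner race turns with the shaft. The tooth count, shaft speed, and bearing geometry are made-up illustrative values, not data from any real gearbox, and the function names are my own.

```python
import math

def gear_mesh_frequency(teeth: int, shaft_hz: float) -> float:
    """Gear mesh frequency in Hz: number of teeth times shaft speed in Hz."""
    return teeth * shaft_hz

def bearing_defect_frequencies(n_elements: int, element_dia: float,
                               pitch_dia: float, shaft_hz: float,
                               contact_angle_deg: float = 0.0) -> dict:
    """Classic defect frequencies (Hz) for a bearing with a stationary outer race.

    element_dia and pitch_dia must be in the same length unit.
    """
    ratio = (element_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1 - ratio),                 # cage (fundamental train)
        "BPFO": 0.5 * n_elements * shaft_hz * (1 - ratio),   # outer race defect
        "BPFI": 0.5 * n_elements * shaft_hz * (1 + ratio),   # inner race defect
        "BSF": (pitch_dia / (2 * element_dia)) * shaft_hz * (1 - ratio ** 2),  # element spin
    }

# Hypothetical input shaft: 1,780 rpm with a 23-tooth pinion (illustrative only)
shaft_hz = 1780 / 60.0
gmf = gear_mesh_frequency(23, shaft_hz)
freqs = bearing_defect_frequencies(n_elements=9, element_dia=0.3125,
                                   pitch_dia=1.516, shaft_hz=shaft_hz)

print(f"Gear mesh frequency: {gmf:.1f} Hz")
for name, hz in freqs.items():
    print(f"{name}: {hz:.1f} Hz")
```

Frequencies like these are what the band alarms would be centered on, and having them in hand before startup lets the baseline data be checked against them during commissioning.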
To illustrate the A for Attainable in the SMART acronym, we might ask if the resource is truly dedicated to the task at hand. Sometimes the resources used for PdM work wear many hats; it is almost as if their predictive maintenance assignments are an afterthought, and not their primary focus.
If you want your PdM program to produce “best practice” results, then the resources must be dedicated to this role. If you use internal resources for PdM, and every time emergent work comes up and you’re short people, you pull the vibration technician off of assigned route activities, then you will miss faults and possibly suffer the consequences. This is the point where the organization must decide whether to use internal resources or to outsource PdM. Both approaches have pros and cons, and ultimately it is up to the site to determine which is the best approach.
The R for Realistic might involve the level of training the resources are given to ensure they are adequately prepared to detect and analyze faults. For vibration analysis, I would suggest, at minimum, the lead technician be trained to ANSI CAT III; if the site has a number of complex machines or turbomachinery, then the lead technician should also have a fundamental understanding of methods such as Modal Analysis and Operating Deflection Shape (ODS) analysis. Again, whether you build these capabilities in house or outsource this work is up to site discretion.
Suffice it to say that if the internal reliability technician does not spend a reasonable amount of time using the technology or specialized method of fault detection afield, they will never gain the expertise necessary to become proficient. I’m an avid pheasant hunter and have had a number of German Shorthaired Pointers in my lifetime. In my prime, these dogs saw more action in one year than many dogs do in a lifetime. Like my bird dogs, the more different situations your analyst sees afield, the better they will get at outsmarting their quarry. In the case of my dogs, it was gamebirds; for the analyst, it is machine faults.
Finally, that brings us to T for Timely. Is the inspection interval frequent enough to detect the majority of machine faults? If you are having emergency failures for the specific failure mode between inspection intervals, then you might have to adjust your inspection interval. This brings me to the subject of route-based inspections versus 24/7 monitoring. Recent technological advancements have made 24/7 condition monitoring both affordable and available, and it should be evaluated for its suitability for your more critical assets.
Engage the operators
SMART criteria can also be used to develop an effective operator driven reliability (ODR) program. I do not believe you can build a “best in class” asset management strategy without fully engaging operations. If you believe that anywhere from 25-40% of equipment related failures are due to the way production operates equipment, then you have to find a way to engage the front-line operators in your war against defects.
Building a successful ODR program involves more than just asking operators to inspect their equipment. Many operators are not adequately trained on basic maintenance troubleshooting techniques. They do not understand the cause and effect relationship between the symptoms (warning signs) and the consequences (the resulting functional failure). Remember, their primary job is to get product out the door.
When a machine starts showing the first warning signs of functional failure, it still has the ability to perform its basic function. A pump with a leaking mechanical seal is still capable of moving product from Point A to Point B. Operators, in a culture where reactive running mode is the norm, learn to ignore the warning signs, make do, and wait until full functional failure occurs before writing notifications. “If it’s not broke, don’t fix it” rules the day.
If you want operator-led inspections to improve your overall equipment reliability, I suggest you develop inspection PMs using SMART criteria. Figure 3 illustrates the questions one might ask when developing a SMART ODR program. Again, let’s dive in a little deeper and go behind the questions.
Figure 3. SMART Criteria for ODR Program
I cannot stress enough the importance of having inspection tasks which are both Specific and Measurable. The inspection task should be “point specific” and not generic in nature. The point specific task should directly address a known failure mode. For example, if your PM says Check pump, the inspection comments you receive back very well might say Pump OK. Write your inspection criteria as specifically as possible for improved results.
As another example, take the following inspection task: Check whether a rotary dryer trunnion bearing is receiving adequate lubrication. If the task states “inspect lubrication on trunnion bearing” you might get several different answers, and any notification that is written will be more opinion than fact. But if your inspection task is point specific, and instructs the operator to count the number of drops of oil dripping on the trunnion in a given time interval, you will be making a specific request and receive a specific answer. The task may further ask the operator to determine whether the oil is being evenly distributed across the trunnion’s surface, and provide an explanation illustrated by two pictures – one where the oil film was uniformly colored across the surface, indicating an acceptable condition; and one where a tell-tale darkening or discoloration occurred on one end of the surface, indicating a potential misalignment issue.
Keep your operator-led inspections simple. Once again, remember that the operator’s primary responsibility is to get product out the door. Your operator basic care program should serve three purposes: First and foremost, you are trying to help the operator gain an appreciation for the impact their role plays in equipment reliability. As long as maintenance is seen as the maintenance department’s responsibility, your asset management strategy will never be best practice. Equipment care is everyone’s responsibility, just as servicing the customer is not solely the responsibility of operations. When silos are broken down, reliability excellence takes root and flourishes. Remember, as noted earlier, anywhere from 25-40% of equipment related failures may be due to the way the equipment is operated.
Second, the simple corrective actions that operators can perform when they gain an appreciation for how important these tasks are will greatly extend equipment lifetime. I know it sounds simple but it is worth repeating: keep heat and dirt away from the machine, keep the machine clean, and keep it from vibrating.
Last but not least, when done correctly, operator inspections can identify specific failure modes so notifications can be written that are meaningful and relevant. Operator-led inspections are not meant to diagnose the root cause of failure; they are meant to identify the symptom of machine failure with enough advance notice so the issue can be effectively troubleshot by a highly skilled craftsman or your planner.
Many facilities have hundreds, if not thousands of assets, and operators can become additional sets of eyes and ears which can act as a first filter to identify opportunities so limited maintenance resources can focus on corrective actions. The trick is to write inspection criteria in a manner so failure modes can be identified early enough to allow successful troubleshooting, work planning and scheduling before collateral damage and full functional failure occur.
Create SMARTer PMs
SMART criteria can also be developed to create new PMs as well as to complete a PM optimization study. Most organizations have written numerous PMs without looking at them from a strategic point of view. Many PMs are created from manufacturers’ recommendations, or because someone in operations requests that one be set up. The problem with this approach is that you are assuming the recommendation is coming from someone who thinks about maintenance the same way you do!
PMs are most effective when the equipment has a component or components which exhibit time-based wear, or where a hidden failure is present which cannot be easily detected without a visual inspection. Classical RCM teaches that the vast majority of equipment failure modes are condition-based, and therefore random in nature. Depending on your operating context, condition-based failures can comprise approximately 90% of your failure modes. When an organization uses FMEA to build its maintenance strategy, then subsequently develops SMART criteria to establish rules for PM creation, they will usually find a number of PMs that are “non-value added” and can be deleted when they complete a PMO study.
PM optimization (PMO) is a great tool to determine whether a PM is “value-added” or not. PMO looks at the failure modes a PM is designed to address to see how effective the PM will be once executed. Figure 4 shows an example of SMART criteria which can be used to establish new PMs as well as to perform the PMO study.
Figure 4. SMART Criteria for PM Creation and Optimization
The first step is ensuring that new or existing PMs actually address a specific failure mode. A good rule of thumb is, if the failure mode is not directly identified in the FMEA as one which will benefit from a time-based inspection, then the PM should not be created; and if the PM already exists, then it should be deleted.
Once you determine that a PM is actually needed, it should be written in such a manner that it has objective inspection criteria. If your PM says Check agitator, you are apt to get craft comments back that state Agitator OK. The PM should be written with quantitative “as found” and “acceptable condition” criteria.
Let’s look at our example for the rotary steam tube dryer. If a task in the PM was to measure the tire runout, then you would want to have “fill in the blank” placeholders for the measurements plus the acceptable limits (tolerances) stated right in the PM. Figure 5 illustrates this point. Avoid statements such as Tire runout within tolerance Y/N; require readings to be taken and recorded to avoid subjectivity.
Figure 5. Example of “As Found” and “Acceptable” Condition PM Criteria
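For sites that keep PM task lists in a digital form, the same “fill in the blank” idea can be sketched in a few lines of Python. This is only an illustration of the principle: the class name, field names, and the 0.060 in. runout limit are hypothetical placeholders, not values from our dryer PM. The point is that the task carries its own limits and cannot be closed out with a bare yes or no.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeasuredCheck:
    """One PM step with a recorded reading and stated acceptable limits."""
    description: str
    unit: str
    low_limit: float
    high_limit: float
    as_found: Optional[float] = None  # blank until the craftsperson records a reading

    def status(self) -> str:
        if self.as_found is None:
            return "NOT RECORDED"  # a blank is never treated as a pass
        if self.low_limit <= self.as_found <= self.high_limit:
            return "ACCEPTABLE"
        return "OUT OF TOLERANCE"

# Hypothetical limits for illustration only
check = MeasuredCheck("Tire radial runout", "in", low_limit=0.0, high_limit=0.060)
blank_status = check.status()      # nothing recorded yet

check.as_found = 0.075             # reading entered at the machine
recorded_status = check.status()

print(blank_status, "->", recorded_status)
```

Because the reading itself is stored, the as-found values can be trended from one PM execution to the next rather than lost behind a checkbox.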
One question that many people ask is, Should my inspection task include corrective actions for “as found” conditions out of tolerance? This is up to the site’s discretion, but here is the approach we take at my facility. If the corrective action can be done within the PM scheduling window, if the risk of full functional failure is imminent, or if delaying the corrective action may result in collateral damage (increased costs), then you should complete the corrective action during the inspection interval. If you routinely correct all issues found at the time of inspection, you might have a wide variability in the time it takes to complete the PM, thus making scheduling a difficult task. This variance will show up when you complete the PMO study.
When one is determining whether the task is relevant or not, it is helpful to consider the following.
- More often than not, does the PM task result in corrective actions or generate follow-up corrective action work orders (WOs)?
- Does the OEE improvement and failure maintenance cost avoidance justify the added expense of completing the task?
If the answer to these two questions is a resounding no, then chances are the PM is not worth doing.
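The two relevance questions can be turned into a quick screening calculation. The Python sketch below is one possible way to do it, assuming you can pull the number of PM executions, the number that led to corrective work, and rough annual dollar figures from your maintenance system. The function name, the 50% reading of “more often than not,” and the example numbers are my own assumptions.

```python
def pm_relevance(executions: int, executions_with_findings: int,
                 annual_benefit: float, annual_pm_cost: float) -> dict:
    """Screen a PM against the two relevance questions.

    annual_benefit: estimated OEE improvement value plus failure cost avoidance.
    annual_pm_cost: labor and materials spent executing the PM.
    """
    finding_rate = executions_with_findings / executions if executions else 0.0
    finds_work = finding_rate > 0.5            # "more often than not"
    pays_for_itself = annual_benefit > annual_pm_cost
    return {
        "finding_rate": finding_rate,
        "finds_work_more_often_than_not": finds_work,
        "benefit_exceeds_cost": pays_for_itself,
        "likely_not_worth_doing": not finds_work and not pays_for_itself,
    }

# Hypothetical monthly PM: 12 executions, 2 produced corrective work
result = pm_relevance(executions=12, executions_with_findings=2,
                      annual_benefit=4000.0, annual_pm_cost=9000.0)
print(result)
```

A flag from a screen like this is a prompt to look closer, not a verdict; compliance-driven PMs, for instance, may be worth keeping regardless of the numbers.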
Once a PM has been completed several times, it is worth going back and reviewing the maintenance history of the asset. If failure maintenance is occurring between inspection intervals for the specific failure modes the PM addresses, either the PM was not written well, is not being completed correctly, or quite possibly the inspection interval is too long and should be shortened. It is at this point that you should also consider whether lengthening the interval is appropriate if corrective actions are not routinely being identified.
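The history review described above follows a simple decision pattern, which can be sketched in Python as shown here. The function name, the 10% threshold for “not routinely finding anything,” and the dates are assumptions for illustration; the failure dates are meant to include only failures of the specific modes the PM addresses.

```python
from datetime import date

def review_interval(inspections, failures, inspections_with_findings,
                    low_finding_rate=0.10):
    """Suggest a direction for the PM interval from the asset's history."""
    if len(inspections) < 2:
        return "insufficient history"
    first, last = min(inspections), max(inspections)
    # Failures that slipped through between completed inspections
    missed = [f for f in failures if first < f < last]
    if missed:
        return "review PM content and execution, then consider shortening the interval"
    if inspections_with_findings / len(inspections) <= low_finding_rate:
        return "consider lengthening the interval"
    return "keep the current interval"

# Hypothetical quarterly PM with one failure between inspections
inspections = [date(2023, 1, 10), date(2023, 4, 10),
               date(2023, 7, 10), date(2023, 10, 10)]
failures = [date(2023, 5, 22)]
suggestion = review_interval(inspections, failures, inspections_with_findings=1)
print(suggestion)
```

Note that a missed failure points first to how the PM is written and executed, and only then to the interval, which mirrors the order of the questions in the paragraph above.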
PMO will help fine tune the facility’s preventive maintenance program. The combination of performing FMEA Object Type Templates to identify maintenance strategies for the primary failure modes of your most common equipment types coupled with PMO will reduce the vast majority of the “non-value added” work at your facility.
“Non-value added” work elimination can be greatly enhanced if the facility has a good criticality ranking system and accurate cost accounting. A quick review of the equipment list will uncover assets that have little to no impact on the business and low overall maintenance cost. If you have PMs on equipment that meet both criteria, and the PMs you have established are not being completed to meet a safety, environmental, or customer compliance issue, then it is highly likely that the PMs are “non-value added” and can be eliminated.
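If the equipment list, criticality ranking, and cost data can be exported, the screen described above is easy to automate. The following Python sketch assumes a criticality scale where 1 is the highest business impact and 5 the lowest, and an arbitrary annual cost threshold; both are placeholders to be replaced by your site’s own ranking system, and the record layout is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PMRecord:
    pm_id: str
    asset_criticality: int         # assumed scale: 1 = highest impact, 5 = lowest
    asset_annual_maint_cost: float
    compliance_driven: bool        # safety, environmental, or customer requirement

def elimination_candidates(pms, low_criticality_from=4, cost_threshold=1000.0):
    """Return IDs of PMs on low-impact, low-cost assets with no compliance driver."""
    return [
        pm.pm_id for pm in pms
        if pm.asset_criticality >= low_criticality_from
        and pm.asset_annual_maint_cost <= cost_threshold
        and not pm.compliance_driven
    ]

# Hypothetical PM list
pms = [
    PMRecord("PM-001", 1, 45000.0, False),  # critical asset: keep
    PMRecord("PM-002", 5, 300.0, False),    # low impact, low cost: candidate
    PMRecord("PM-003", 5, 250.0, True),     # compliance driven: keep
    PMRecord("PM-004", 4, 8000.0, False),   # low impact but costly: keep for review
]
candidates = elimination_candidates(pms)
print(candidates)
```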
Once the preventive maintenance plan has been scrubbed of “non-value added” maintenance plans, the facility should resource-level the remaining PMs. As PMs are created and initial call dates set, there will be peaks and valleys in the volume of work due to the arbitrary nature of the creation date, and this will cause capacity constraints. If the site does not evaluate the overall PM schedule by craft and calendar date, then inefficiencies will result.
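A simple way to see the peaks and valleys is to total PM hours by craft and calendar week and compare them against available capacity. The Python sketch below shows the idea, assuming each PM call can be exported as a craft, a week label, and planned hours; the crafts, week labels, hours, and capacities shown are invented for illustration.

```python
from collections import defaultdict

def weekly_load(pm_calls):
    """Total planned PM hours for each (craft, week) pair."""
    load = defaultdict(float)
    for craft, week, hours in pm_calls:
        load[(craft, week)] += hours
    return dict(load)

def overloaded_weeks(load, capacity_by_craft):
    """Return (craft, week) pairs whose PM hours exceed that craft's weekly capacity."""
    return sorted(
        key for key, hours in load.items()
        if hours > capacity_by_craft[key[0]]
    )

# Hypothetical PM calls: (craft, week, planned hours)
pm_calls = [
    ("mechanical", "2024-W10", 30.0),
    ("mechanical", "2024-W10", 25.0),
    ("mechanical", "2024-W11", 8.0),
    ("electrical", "2024-W10", 12.0),
]
capacity = {"mechanical": 40.0, "electrical": 20.0}  # PM hours available per week

load = weekly_load(pm_calls)
peaks = overloaded_weeks(load, capacity)
print(peaks)
```

In this made-up example the mechanical craft is overbooked in one week and nearly idle the next, which is exactly the kind of peak that shifting a PM’s call date can smooth out.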
The manufacturing world is rapidly changing with the advent of smart machines, and those who do not adapt will be left behind. But as our assets become increasingly complex, do not forget that people are still responsible for maintaining your equipment. The reliability professional who develops SMART maintenance strategies will gain a competitive advantage and propel their organization to best in class performance in this, our ever increasingly performance-driven marketplace.
If you would like to hear more about my philosophy on SMART maintenance strategies or asset management in general, I would love to hear from you.