Podcast: Boosting equipment reliability with smart maintenance scheduling strategies
Key takeaways
- Condition monitoring won't be perfect at first—expect a learning curve and commit for long-term gains.
- Avoid running time-based and condition-based maintenance systems together—it wastes resources.
- Pair condition monitoring with root cause analysis to drive real business impact.
- Protect condition monitoring resources from daily disruptions—require plant manager approval to reassign.
Joe Kuhn, CMRP, former plant manager, engineer, and global reliability consultant, is now president of Lean Driven Reliability LLC. He is the author of the book “Zero to Hero: How to Jumpstart Your Reliability Journey Given Today’s Business Challenges” and the creator of the Joe Kuhn YouTube Channel, which offers content on starting your reliability journey and achieving financial independence. In our monthly podcast miniseries, Ask a Plant Manager, Joe considers a commonplace scenario facing the industry and offers his advice, as well as actions that you can take to get on track tomorrow. This episode explains why running time-based and condition-based maintenance in parallel is a costly mistake.
Below is an edited excerpt from the podcast:
PS: Hello and welcome to Ask a Plant Manager, a special series from Great Question: A Manufacturing Podcast. I am Anna Townshend. I'm managing editor of Plant Services, and with me is my co-host, Joe Kuhn. He is a retired plant manager, industry consultant, author and YouTube influencer. Joe, as always, very happy to have you here with me. I know you always say you're mostly retired, and I pause for a moment to dream about what that would be like, and then back to reality. But we thank you for lending your free time to us, even though you say you have a lot of it in retirement. I know you do have a new grandchild now, so she's probably taking up a little bit of that time, but we're happy to have you here with us.
JK: Yeah, absolutely, a new grandchild. And my daughter and son-in-law have a house, so I'm actually the free handyman, and I'm keeping busier. They live just 10 minutes away. So I'm excited to be back for another question. Maybe I can help somebody with some of the lessons I learned over my 32-year career.
PS: Awesome. All right, so today we're going to talk very specifically about maintenance scheduling. So you know, there are different ways you can do this. Many facilities schedule their maintenance work on a time schedule. So every week, every month, every six months, depending on the asset or the care that it needs, and that maintenance is always done on that schedule, really, whether it needs it or not.
And of course, then we have condition-based maintenance, which uses tools and technologies to monitor equipment condition and then schedule maintenance based on that information that's collected. Generally, I'd say condition monitoring is probably going to be more accurate or efficient. In theory, I'm not sure that's always the case, but it's definitely a more proactive maintenance schedule.
However, it can be a big change for facilities to move from time-based to condition-based maintenance, and of course, there are always a lot of ways to screw that up along the way. So Joe, that's why we have you here, to highlight those mistakes that we might not be thinking about. Where does this transition from time-based to condition-based maintenance often go wrong, and how do operations and maintenance typically fail in this transition?
JK: Well, I actually have a lot of experience with this going wrong and how to get the ship right. And I've done a lot of consulting on it too, to help people not learn things through hard knocks and failures. When you're going through culture change, and going to condition monitoring from time-based maintenance is a culture change, a lot of people will want to hang on to the past, so you really want to get this right, to create some wins and some enthusiasm. But the first thing I want to say is, you can't expect condition monitoring to be perfect, and especially not perfect on the first day. You've got a vibration monitor out there, and you've got it on a route, or maybe even continuous monitoring, and then the equipment fails, and unbeknownst to you, it just failed. And everybody points the finger and says, ‘Why didn't you know this? I thought this was going to be the end all, be all,’ and it's not going to be, right out of the gate. So that's just something to go in with. It is a massive game changer for uptime and lowering your cost. I'm a massive fan of it, but don't expect it to be perfect right away.
The second thing I want to say about what can go wrong, and this is wrong from the get-go, is that the maintenance organization and the operations organization will want to continue to do time-based maintenance, and they'll do their predictive maintenance on top of their time-based maintenance. So you really have two systems running at the same time, and that's just going to cost more money. Just one example I had that was just crazy: we had a PM to change the oil out of this hydraulic machine every quarter, and we were also doing oil analysis every month. You don't need both of those. It just costs more money, which really turns into resources, and then you don't have enough resources to do everything, and so it just bogs the system down. The cost goes up. The naysayers will look at what you've done and say, ‘Well, our costs have gone up, and we're really not seeing any impact. Why are we even doing this?’ So you have to get the time-based PMs that are covered by condition monitoring out of the system. Don't run two systems. This is a problem, a big problem.
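As a rough illustration of the cleanup described above, the following Python sketch flags time-based PMs on an asset whose failure mode is already watched by a condition monitoring route. The record fields, the failure-mode labels, and the matching rule are assumptions made for this example, not a method described in the podcast; a real review would still need an engineer's judgment on each flagged task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeBasedPM:
    asset: str
    task: str
    failure_mode: str  # hypothetical label for what the PM guards against
    interval_days: int


@dataclass(frozen=True)
class ConditionRoute:
    asset: str
    technology: str
    failure_mode: str  # hypothetical label for what the route detects
    interval_days: int


def redundant_pms(pms, routes):
    """Return PMs whose (asset, failure mode) is already covered by a route."""
    covered = {(r.asset, r.failure_mode) for r in routes}
    return [pm for pm in pms if (pm.asset, pm.failure_mode) in covered]


# Illustrative data loosely based on the oil example in the transcript.
pms = [
    TimeBasedPM("hydraulic machine", "change oil", "oil degradation", 90),
    TimeBasedPM("hydraulic machine", "inspect guards", "guard damage", 30),
]
routes = [
    ConditionRoute("hydraulic machine", "oil analysis", "oil degradation", 30),
]

candidates = redundant_pms(pms, routes)
for pm in candidates:
    print(f"Review for removal: {pm.task} on {pm.asset} (every {pm.interval_days} days)")
```

Only the quarterly oil change is flagged here, because the monthly oil analysis covers the same failure mode; the guard inspection has no matching route and stays in the system.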
The third thing I want to say was a major learning for me, and I give credit to Ron Moore. He's got a book called “Making Common Sense Common Practice.” We were doing a lot of condition monitoring, and we weren't seeing the business impact. I'm talking about several months. I read his book three times, and on the third time, I picked up on something that I missed the first two. It's on page 220, and this was in 2003, so I remember the page number 22 years later. Page 220 of my edition says you’ve got to add problem solving. The gold mine for condition monitoring is that you're going to find anomalies, like a vibration anomaly, early on, and when you change out that bearing, or whatever the source of that vibration is, you have to learn from it what happened to cause it. Was it the type of bearing you bought, the application, how it is being used, how it is being maintained? What was the root cause of that? If you don't add root cause problem solving to your condition monitoring process, you will be disappointed. I promise you'll be disappointed. So that's really what I want you to remember. You have to put in a problem-solving system or it's just going to look like more work. Hopefully that makes sense.
Another thing: this is where operations gets involved. You find an anomaly. Say it's a heat anomaly. You’ve got infrared, and you're monitoring some switchgear, and you’ve got some heat. The first thing the operations guy or gal is going to say is, ‘How long can we run it? When's it going to fail?’ When you've got something like heat, it could go from 100 degrees to 400 degrees in 15 minutes. Or it could be in 15 days, or 15 weeks. You really don't know. What you know is you've got an anomaly there. Now, maybe over time, you can build up some information on how fast it advances. But production always wants to say, ‘Hey, it's only Monday. We've got an outage coming on Saturday. Can we make it to Saturday?’ And that's a guess. That's a guess.
What we started doing is trying to address it as soon as possible. And that may be, you're making a product change on Tuesday, and it's Monday, and you say, ‘Hey, let's take an extra hour and fix this. Let's take an extra two hours.’ You want to find those problems when they're really small, so you don't have the final failure. Also, finding them when they're really small goes back to that problem solving I just talked about. When you have a pile of ashes on the ground where something just completely burned up, it's hard to troubleshoot that. When you’ve got a very small anomaly, a heat anomaly, a vibration anomaly, water in your oil, you’ve got something going on, but it’s so much easier to troubleshoot and find the root cause before final failure.
Moving on, one of the biggest problems people have with transitioning is not marketing the change. It's going to look like more work. It's a change. People are going to be at best on the fence about this change. You'll have a few people that are super excited about it. Then you'll have the naysayers. When you find an anomaly, say it's vibration on a pump, and you change out that pump, and you find the root cause, you've got to communicate: ‘Hey, we fixed this in two hours in a planned way. Last time this pump failed, it went down for three days. It cost us $75,000.’ So make that little announcement to everybody. Everybody's going to know the failures that you have, but you've got to communicate the failures you didn't have, and that is a big job in sales.
If you don't do sales right, whenever the business cycle changes, and you need to make some changes, you need to cut some cost, people may look at condition monitoring and say, ‘Well, we added four people, we added some training to do this. We bought some equipment. We're not getting any results from this,’ and then they cancel it. You’ve got to market it hard. You’ve got to maintain your sponsorship.
The last thing I want to say, and all of these are big, I'm talking about all of them with passion because they were all hard lessons learned, is pulling the resources that are assigned to condition monitoring. Say you have four people doing routes. They're doing routes on IR, UE, vibration and lube. They're doing all these routes. Well, these people are typically highly trained. They're good problem solvers, and when there's an emergency of the day, they're an easy resource to say, ‘Let's just pull these guys into today's emergency.’ And this happens. I'm telling you, I can't believe it, this is almost 100% of the time. I'm going back a few years, but on paper you'll assign four people, 40 hours a week. That's 160 hours to condition monitoring that goes into planned work on your KPIs. But when you talk to the technicians, they'll say, ‘I was pulled into emergency work three days out of five, four days out of five, two days out of five.’ I've never gone to a plant where they were left alone 100%.
Okay, so that's a trap. How do you fix that trap? Very easy. The easiest thing in the world to do, and it costs no money, is to require plant manager approval to pull the PDM resources off PDM work. As the plant manager, I never got a phone call after making that rule, never, and I don't know a plant manager that has ever gotten the phone call. It's just the easy button for a supervisor to pull those people. You've got to commit to this. That's the point. Don't commit to it on paper. Commit to it with time and resources. So those are my real-world, hard-knocks scars from shifting to a condition monitoring culture. It is great when you get there, but these are missteps I've seen in almost every location.
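To make the gap between planned and actual hours concrete, here is a small Python sketch of the arithmetic. The 8-hour day and the number of days each technician was pulled away are assumptions chosen for illustration; only the four-person, 40-hour, 160-hour plan comes from the discussion above.

```python
HOURS_PER_DAY = 8  # assumption: 8-hour shifts, 5 days per week


def condition_monitoring_hours(days_pulled_per_tech, days_per_week=5):
    """Return (planned, actual) weekly condition monitoring hours.

    days_pulled_per_tech holds, for each technician, how many days that
    week were diverted to emergency work (hypothetical survey data).
    """
    planned = len(days_pulled_per_tech) * days_per_week * HOURS_PER_DAY
    lost = sum(days_pulled_per_tech) * HOURS_PER_DAY
    return planned, planned - lost


# Four technicians; days pulled into emergencies are illustrative values.
planned, actual = condition_monitoring_hours([3, 4, 2, 3])
print(f"Planned: {planned} h, actual: {actual} h ({actual / planned:.0%} of plan)")
```

With these made-up inputs, the KPI would report 160 planned hours while only 64 hours of route work actually happened, which is the kind of mismatch an approval rule is meant to prevent.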
PS: Great, lots of good stuff there. I would like to reiterate the good points you made about what condition monitoring is not. It is a major culture change. It's not going to be perfect right away. It's not going to fix everything right away. You’ve got to do that RCA on those assets and the anomalies you see, and problem solve. Do that root cause analysis to figure out what's wrong, to actually see those results. And like you said, you really have to commit to it before you're going to see the results. One follow-up question I want to ask you: you talked about this mistake of running two systems, where you’ve still got your time-based maintenance going on top of adding condition-based monitoring. What do you think the cause of that is? Is that a culture change thing where, say, you bring in a new engineer to do the condition-based monitoring, but you've still got engineers focused on time-based? Is that a personnel issue? What do you think is the cause of that generally?
JK: Well, I'll say one word. It's comfort. It's comfort. Time-based maintenance is something you know. You’ve got a reputation based on time-based maintenance, and you'd say, ‘Why don't we just add condition monitoring onto that, and that really reduces the risk of being wrong.’ So what's going to happen? I didn't expand on this, but what's going to happen is you may take away 100 time-based PMs, and you may be wrong on three of those. On three of them you had a failure, and you should have kept doing time-based maintenance on them. You pulled out 100 and you're wrong on three. What you’ve got to do as a leader is decide, are you okay with that? Many maintenance managers and operations managers aren't okay with that. To me, I look at that as, ‘Hey, I came up to bat 100 times, and I got 97 hits.’ If you're afraid to make a mistake, you're going to do this wrong. You're going to spend more money. You're not going to fully commit. You've got to be prepared to say this is the best practice. This is what the best minds say to do. This is what the best companies are doing. We may be slightly less than perfect in our application of this, and you’ve got to be okay with that. The comfortable thing to do is to keep the 100 PMs and add condition monitoring on top of that. You just added cost. You added cost, and you're confusing the organization on what you're trying to do. You've got to say, ‘Hey guys, this is where we're going. We're going to condition monitoring. We're not going to just rebuild the engine. We're going to rebuild the engine when the engine tells us to rebuild it.’ We're going to have more uptime, we're going to lower cost, and we're going to problem solve, and we're going to get better every day. So yeah, absolutely, it's comfort.
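The 97-out-of-100 trade-off can be framed as a simple break-even calculation, sketched below in Python. The cost per missed failure reuses the $75,000 pump figure mentioned earlier purely as a placeholder; applying it to every missed failure, and treating savings as uniform across PMs, are assumptions for illustration rather than figures from the podcast.

```python
def break_even_saving_per_pm(pms_removed, missed_failures, cost_per_failure):
    """Average saving each removed PM must deliver to pay for the failures let through."""
    return missed_failures * cost_per_failure / pms_removed


pms_removed = 100
missed_failures = 3
cost_per_failure = 75_000  # placeholder: reuses the pump failure cost as a stand-in

hit_rate = (pms_removed - missed_failures) / pms_removed
threshold = break_even_saving_per_pm(pms_removed, missed_failures, cost_per_failure)
print(f"Hit rate: {hit_rate:.0%}, break-even saving per removed PM: ${threshold:,.0f}")
```

Under these placeholder numbers, removing the PMs comes out ahead whenever the average removed PM was costing more than the printed threshold in labor, parts, and downtime; a plant would substitute its own failure and PM costs.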