Podcast: RCA 101 – When, how, and how often you should conduct root cause analysis

In this episode of Great Question: A Manufacturing Podcast, Shon Isenhour and Brian Hronchek from Eruditio explore the importance and function of root cause analysis.

Thomas Wilk

June 6, 2024

14 min read


Root cause analysis helps engineers analyze asset performance and identify the source of machine failure. But how many RCAs are enough for your maintenance program, and how can you use them to change behaviors instead of just fixing the assets? Shon Isenhour and Brian Hronchek of Eruditio join us for a discussion on how to optimize your time spent doing RCAs so they have maximum positive effect on your plant.

Below is an excerpt from the podcast:

PS: This podcast is a follow-up to a previous series of podcasts done with Brian on FMEAs. We thought we would take on a new topic that's related to FMEAs – root cause analysis, or RCA. What I’d like to talk about today is its primary function when it comes to building on a maintenance strategy, and then some of the tricks of getting through it: some of the pain points, some of the places where you have seen people in the field stumble, and what they've learned over time about how to finesse those issues.

BH: Thanks Tom, and you remember we talked the last time about the purpose for some of these tools, right? Some of them are very similar, where we're measuring risk. But the question is, what are we measuring risk for? Why are we measuring risk, or why are we measuring loss?

So remember criticality, it's that theoretical importance of an asset to the business. We're not necessarily focused so much on performance, we're focused on “what if.” What if something happens? And if that “what if” is bad enough, we're going to use a FMEA or FMECA to further evaluate that risk and come up with a good maintenance plan. But that maintenance plan won't be perfect, and we're going to have failure.

So, at some point, we use root cause analysis to analyze the actual performance, to come back with a plan to improve the maintenance plan. Then we go back into the FMEA or we go back into our asset strategy and we add just what we found during the RCA and make it a permanent part of how we maintain the assets. So maybe that puts it in context or ties all the tools together.

PS: It does, it does. How often would you perform an RCA on a single asset? Is it an annual thing, or is it as-needed?

SI: That's where I think it's a little different than the FMEA, where we may want to go in and update it on a recurring basis. We're typically going to use that RCA where we've experienced a failure. So where we've run into a downtime issue that meets certain triggers, or a loss of quality or a loss of throughput.

PS: When it comes to RCAs, an RCA will be useful on assets of the same asset class, but in different plants. So, you couldn't go to an RCA library exactly to pull down some of the results from one plant and compare them to another, or could you?

SI: You actually can. There are a lot of times that I'll go back and look at RCAs that we've done in other facilities to think about what's possible, what might have happened. You do have to be kind of careful because if you take an RCA from one plant and you bring it to another, it's not in the same operating context. It may not even be experiencing the same kind of problem. So you don't want it to steer you too far off to one side or the other. You really want to make sure that you're using it to supplement, not to build.

BH: Yeah, I'd say that's probably one of our biggest misses in industry, is that we don't necessarily share. So to Shon's point, that doesn't mean that you have to adopt the solution that comes out of an RCA on a similar asset from a different plant. But if you don't take the time to review it, you could end up with the same failure in the future, having had the opportunity to fix it before it ever happened. So it's always great to share those results, share the information across like assets in other plants, so that you at least have the opportunity to determine: is this a risk for us or is it not? Should we adopt this or should we just leave it?

PS: Well, for those to whom RCAs are new or relatively new, Brian, you blocked out for me the functional elements of the RCA. Could you run through those real quick for us?

BH: Shon has a video about this on YouTube, but when you traditionally look at RCA, root cause analysis, you start with the foundation, which is the Five Whys. It's the most basic tool; you ask: Why did that happen? Well, why did that happen? And you keep asking why. And what you can do if you do that is lead yourself in circles and never get any deeper into solving the problem.

But if you put some sort of structure to the purpose for asking that question, you can get deeper. So say there is an effect, a motor failed. Why? Well, there's a physical cause. There was a lack of grease. OK. Well, why? There was a human cause, because someone didn't grease it. And you ask why again? Well, Bob didn't grease it because there was no PM created for that, which is the systemic cause. And then you ask why again? Well, the reason we didn't create a PM is because leadership doesn't believe in proactive maintenance. So you get finally to the lowest, the deepest root, which is the latent cause or the leadership responsibility for why that did or didn't happen.

Asking “why” is to get as deep as you can into that structure, because the deeper you solve it, the more problems it solves. If I just grease that bearing, I prevent that bearing from failing next time. But if I go to leadership and train them on the importance of a lubrication program, I've fixed it for all of my assets.
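The cause levels in Brian's motor example can be sketched as a small data structure. This is only an illustration and not something presented in the podcast: the names `CauseLevel`, `Cause`, and `deepest_cause` are hypothetical, and the ordering simply follows the sequence he describes (physical, human, systemic, latent).

```python
from dataclasses import dataclass
from enum import IntEnum


class CauseLevel(IntEnum):
    """Cause levels from the motor example, ordered shallow to deep."""
    PHYSICAL = 1
    HUMAN = 2
    SYSTEMIC = 3
    LATENT = 4


@dataclass(frozen=True)
class Cause:
    """One answer to a 'why?' question (hypothetical structure)."""
    level: CauseLevel
    description: str


def deepest_cause(chain):
    """Return the deepest cause reached in a chain of 'why' answers."""
    return max(chain, key=lambda cause: cause.level)


# The chain from the transcript: effect = "a motor failed".
motor_failure = [
    Cause(CauseLevel.PHYSICAL, "lack of grease"),
    Cause(CauseLevel.HUMAN, "no one greased the motor"),
    Cause(CauseLevel.SYSTEMIC, "no PM was created for greasing"),
    Cause(CauseLevel.LATENT, "leadership does not believe in proactive maintenance"),
]

root = deepest_cause(motor_failure)
print(f"{root.level.name}: {root.description}")
```

An analysis that stops after the second "why" would report only the human cause, which is the situation the speakers warn about later when they discuss a culture of blame.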

SI: The only struggle is, it is typically harder to implement solutions in those systemic and latent levels, those lower levels, because they truly change people, and the people change side obviously becomes the harder part in a lot of cases.

PS: That makes sense. We heard some about that this morning from the keynote, where he talked about how the trick is getting engagement and buy-in from these teams to solve the deeper problems. Would that be the number one tip to look at when it comes to doing RCAs, to understand what the rollover effects would be once you get deep down into the analysis?

SI: I think probably one of the first tips I would start with is doing the right amount of RCA. I think a lot of people hear about root cause analysis as a way to solve problems, and then they go back and they ask for a root cause on everything.

The problem is it takes time, and that's probably the second issue. A lot of folks think that you can complete an RCA overnight and have it for the next morning's meeting. You don't have the evidence, you don't have the data. All you're doing is jumping to conclusions and trying your best to support them so that you can make that statement in the morning meeting. And once you make that statement, no one cares anymore. They're not going to ask again.

The first thing for me is making sure that you're doing the right amount of RCAs, and I have a rule of thumb, it's not perfect, but it's two good RCAs per month, per RCA practitioner, so per leader, if you will, of these events. The reason I say that is because that gives you enough time to truly do a full analysis, something more than a Five Whys and get down into the details, gather the data, potentially even send things off, and have them reviewed and sent back. Secondly, now I can get down into the systemic and latent roots, which will also take time to implement. You don't want to overwhelm the organization.

Now, on the flip side, you don't want to do too few either, because if you do too few, then people just start forgetting to even do them or they forget how to do them, and they become ineffective and inefficient at them. So that's probably my primary tip. The second one, I think, or the second or third depending on how you look at the list, is this idea of getting down into systemic and latent roots, and not just fixing parts.
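Shon's rule of thumb above (two good RCAs per month, per RCA practitioner) amounts to a simple capacity check. The sketch below is a hedged illustration only: the function names and the idea of comparing requested RCAs against capacity are assumptions, and Shon himself says the rule is not perfect.

```python
def monthly_rca_capacity(practitioners, rcas_per_practitioner=2):
    """RCAs a team can reasonably complete per month.

    The default of 2 is the rule of thumb from the transcript; treat it
    as a starting point, not a hard limit.
    """
    if practitioners < 0 or rcas_per_practitioner < 0:
        raise ValueError("counts must be non-negative")
    return practitioners * rcas_per_practitioner


def is_overloaded(requested_rcas, practitioners):
    """True when more RCAs are requested than the rule of thumb supports."""
    return requested_rcas > monthly_rca_capacity(practitioners)


# Hypothetical plant with three trained RCA leaders.
print(monthly_rca_capacity(3))   # capacity under the rule of thumb
print(is_overloaded(10, 3))      # ten requests in one month
```

If the second call reports an overload, that suggests tightening the triggers rather than rushing each analysis.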

PS: When it comes to getting to the systemic and latent roots, it would seem to me that it would take some courage for people to probe that deeply, for fear of upsetting whatever structure is in place to get maintenance done, fear of embarrassing some folks in the maintenance team. Do you find that happens?

SI: That's absolutely true, because every single problem that you face in your organization will have a human root, and a lot of people don't want to hear that.

For example, if you go back to “the gearbox seized,” well, the problem’s maybe that we lost 8 hours of production. So then you say, OK, what was the root cause? And somebody looks at you and says “the gearbox seized.” Well that's that very first level of the root cause. So if we now go underneath that level and look, what we're going to find is those human causes.

If you stop there, you create a culture of blame, and I have lived in that world in my past life, years and years ago in industry where they were really driving to whose fault it was, as opposed to what we could do to the systems and the leadership portion, which is those systemic and latent issues. Now, if we drill down to the system and blame the system and not the people, as W. Edwards Deming said, then we get away from blaming that individual who maybe made the gearbox fail. Now we start looking at what enabled him or her to make that gearbox fail or to introduce that defect.

But then we go even further, like Brian did, and now we're getting down into the latent roots of not having a precision maintenance program. That falls back on the leadership, and I think that's one of the things that really drives exactly what you're saying. There is some level of fear because if you drill deep enough, it's always a leadership issue. It's a culture that was created, and culture is created by leadership decisions.

PS: Someone made a choice on the line, or someone decided to go with the choices made years ago and stick with them, which is a choice in itself, and here we are today with the asset failure.

SI: Absolutely.

PS: Let me ask you a question, Brian. Are there asset classes that benefit more from RCAs than others? Or would you want to do a criticality analysis first and focus on just the critical assets? Or would disposable assets, say a motor 50 hp or under, which is probably going to be replaced rather than repaired, benefit from RCA as well?

BH: I think anything can benefit from an RCA, and the question, back to what Shon said, is less about “am I focusing on the right asset?” and more about “am I solving the right problem?” Whether business success is measured in downtime, or safety, or the environment, you're measuring something that drives the business to success. And when you have a problem that eats away at that success on a big enough scale, you say, “well, that's the problem I need to solve.”

If I have one catastrophic failure and that blows up a motor that costs a million and a half dollars and takes six months to replace, that's a really big problem. But if I also have 5,000 small motors that are failing every three months because of a lubrication practice, that might be just as impactful. It's to look at that trigger, like Shon said: what is the biggest problem to the business? Let me focus there.
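Brian's comparison can be made concrete with a break-even calculation. The sketch below uses only the figures he mentions ($1.5 million, 5,000 motors, a three-month failure interval) and rests on loud assumptions: every small motor fails once per interval, the six months of lost production for the large motor is ignored, and the function names are hypothetical.

```python
def failures_per_year(motor_count, months_between_failures):
    """Annual failure count, assuming each motor fails once per interval."""
    return motor_count * 12 / months_between_failures


def break_even_cost_per_failure(catastrophic_cost, annual_failures):
    """Cost per small failure at which the small failures together
    equal one catastrophic failure per year."""
    return catastrophic_cost / annual_failures


# Figures from the transcript; lost production is deliberately left out.
annual = failures_per_year(motor_count=5_000, months_between_failures=3)
break_even = break_even_cost_per_failure(1_500_000, annual)

print(f"Small-motor failures per year: {annual:,.0f}")
print(f"Break-even cost per failure: ${break_even:,.2f}")
```

Under these assumptions, the small motors match the replacement cost of the large one as soon as each failure costs more than the break-even figure, which is the sense in which many small failures can be as impactful as one large one.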

SI: One of the things I think about is, there are three things in my mind you need in order to be successful with an RCA. The first is an RCA process. How are you going to make sure that you do it when you're supposed to, but you don't do it when you shouldn't? And that's going to include triggers. It's also going to include things like following up to verify that the solution that you put in place actually fixed the problem. So that's your process from end to end for RCA.

The second thing that I believe you have to have is tools, and this is where I kind of upset some folks in the industry, I think, sometimes. But I don't believe you can do it with one tool. I don't believe I can solve the world's problems with Five Whys and fishbone. I think I've got to have more than that, and the way I equate that is, you know, I have an old '67 Lincoln convertible and it's very unreliable, and I would never get into that car with just an adjustable wrench. I'm going to take a toolbox with me because I know I'm going to need everything in that toolbox before it's over with, and I think that's really a way to think about RCA. You need a toolbox of tools that you can use effectively, depending on the situation and the problem.

We talk about three different kinds, the first being the tree tools. These are the branching tools like fault tree and logic tree, and of course Five Whys would be the simplest version of that. We also talk about time tools, like sequence of events. So there are certain situations where I actually need to see what happened and where it happened in that process, so those tools can be very effective when I have video data or when I have log data, those sorts of things. And then we also talk about a third set of tools which we call the transparency tools, and that's actually where we reach back to the podcast that Brian did a few weeks ago, and we grab the FMEA and the FMECA, and we're using that to analyze the way something can fail.

Now I told you there are three, and that's two – the first was you need a process, the second was you need a set of tools. The last thing that I feel like you need is the people side, and that's people to help you do the RCA. I hear these stories of engineers getting in a room by themselves and doing an RCA, and they miss a lot of the components that really lead to it, so they end up with a recurring failure, and quite frankly, they look a little silly. On the flip side, we have got to have people involved in understanding the problem. We've also got to have people involved in implementing that solution on the other side.


About the Podcast
Great Question: A Manufacturing Podcast offers news and information for the people who make, store and move things and those who manage and maintain the facilities where that work gets done. Manufacturers from chemical producers to automakers to machine shops can listen for critical insights into the technologies, economic conditions and best practices that can influence how to best run facilities to reach operational excellence.


About the Author

Thomas Wilk

editor in chief

Thomas Wilk joined Plant Services as editor in chief in 2014. Previously, Wilk was content strategist / mobile media manager at Panduit. Prior to Panduit, Tom was lead editor for Battelle Memorial Institute's Environmental Restoration team, and taught business and technical writing at Ohio State University for eight years. Tom holds a BA from the University of Illinois and an MA from Ohio State University.
