Use data to turn problem assets into reliable production systems

Sept. 10, 2012
Stanton McGroarty says gather a core team and perform RCA for perennial equipment problems.

Maintenance is truly data driven when the data are guiding us to the most significant maintenance problems in our plant and showing us how to solve them. This is the second in a three-installment series, “Ready, Aim, and Fire,” named after the time-honored sequence for permanent solutions.

We agreed in Part I to use data to guide us to a couple of the biggest, most costly recurring maintenance headaches in the plant. Further, we said we’re ready to solve these problems when we have identified them and quantified what they are costing the organization. The cost information should be adequate to fuel the discussion that will inspire the small, cross-functional team we will need to set the two remaining steps in motion.

In the aim step, we will gather a core team and perform a special root cause analysis (RCA) for perennial equipment problems. The core team will determine which corrections we must implement to convert our problem assets into reliable production systems. When large, perennial problems are examined, it is almost always true that they are really bundles of related problems with multiple causes. The multiple causes usually require multiple solutions. The multiple solutions usually require help from a variety of functions.

These solutions will require the usual maintenance work orders, but they won’t stop there. Solving big perennial problems may also call for a mix of improved production processes, new PdM procedures, engineering changes, training support, new gages or inspection procedures, and new shop floor discipline.

Management support is essential. A change management program and a communication plan may also be needed, depending on the size of the required effort. Occasionally, input from marketing or customer-contact people may be helpful. Clearly, a cross-functional team is needed to create and organize the work list for this kind of problem solving. The aim step is designed to bring the need into technical and financial focus and to propose the next steps to design and install the fix.

When building your core team, start with the owners of the data that will bring each failure mode into focus. Using the ready exercise as preparatory work to determine which information each one has, create and convene a team to develop the story of the problem equipment. If a fully populated CMMS is available, the story is simply a printout. More likely you will want to gather the people who maintain the key data files on the equipment under investigation. This will include the production supervisors, perhaps with clerks, who can place production on a calendar, along with the reasons for lack of output on slow days. The group should also include maintenance planners or leaders who are responsible for corrective action in the area under discussion. If metering, controls, or electrical people are from a different group, you’ll need to have them represented as well. Each group should be able to position its data on a calendar or timeline so that area performance problems and the corrective action relating to them can be tied together and placed on the calendar. Whoever orders maintenance parts should also have a chair at this table.
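For teams without a fully populated CMMS, the calendar exercise above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the record layouts, field names, and sample values are hypothetical, and each list stands in for whatever file its data owner actually keeps.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; field names and values are illustrative only.
production_log = [
    {"day": date(2012, 1, 12), "hours_lost": 5.0, "reason": "main pump down"},
    {"day": date(2012, 2, 3), "hours_lost": 2.5, "reason": "slow cycle"},
]
work_orders = [
    {"day": date(2012, 1, 12), "wo": "WO-1001", "action": "replace main pump"},
]
parts_issues = [
    {"day": date(2012, 1, 12), "part": "pump assembly", "qty": 1},
]

def build_timeline(**sources):
    """Group the records from every data owner by calendar day."""
    timeline = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            timeline[record["day"]][name].append(record)
    # Return the days in calendar order.
    return dict(sorted(timeline.items()))

timeline = build_timeline(
    production=production_log,
    work_orders=work_orders,
    parts=parts_issues,
)

# Days with lost production but no matching corrective action are
# good questions to put to the group.
unexplained = [
    day for day, entry in timeline.items()
    if entry["production"] and not entry["work_orders"]
]
print(unexplained)
```

Keying everything to the calendar day mirrors the way the group will talk through the history. A real plant would probably also key on the asset and the shift.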

Core team report contents

The core team’s report to the organization should include the elements of a business case to attract interest in the proposed corrections. It must also include a complete, though not fully detailed, list of the corrections that must occur. For each failure, or group of identical failures, the report should include:

  • number of failures
  • time and/or production lost
  • cost of lost production
  • cost of other losses, including non-financial cost of safety, ecological issues, customer relations, or other impact
  • cost to repair equipment, splitting labor and material, if appropriate
  • root cause of failure or group of failures (25 words or less)
  • succinct description and cost, where available, of improvements required to solve root cause (equipment modification; process and/or engineering changes; training, including list of groups to be trained; other improvements as appropriate)
  • description of next steps toward building a solution package for the group of root causes.

This information is deliberately not rolled up into a project because much of the needed information has not yet been developed. A broader group than the core team will be needed to develop and execute the full improvement project. That will be the fire portion of data-driven maintenance.
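One way to keep the report entries consistent is to give each failure group the same record layout. The sketch below is a hypothetical structure, not a prescribed format: the class name, field names, and sample figures are assumptions made for illustration. It enforces only the one hard limit in the list above, the 25-word root cause statement, and it deliberately does not roll the entries up into a project.

```python
from dataclasses import dataclass, field

@dataclass
class FailureGroupEntry:
    """One report entry for a failure or a group of identical failures."""
    description: str
    failure_count: int
    hours_lost: float
    lost_production_cost: float
    other_losses: str            # safety, ecological, customer impact, in words
    repair_labor_cost: float
    repair_material_cost: float
    root_cause: str              # 25 words or less
    improvements: list = field(default_factory=list)
    next_steps: str = ""

    def __post_init__(self):
        # Hold the root cause statement to the length the report calls for.
        if len(self.root_cause.split()) > 25:
            raise ValueError("root cause statement should be 25 words or less")

    @property
    def repair_cost(self):
        """Labor and material combined, for readers who want one figure."""
        return self.repair_labor_cost + self.repair_material_cost

# Illustrative values only.
entry = FailureGroupEntry(
    description="Main pump bearing failures",
    failure_count=2,
    hours_lost=10.0,
    lost_production_cost=20000.0,
    other_losses="Late shipment to one customer",
    repair_labor_cost=600.0,
    repair_material_cost=1800.0,
    root_cause="No vibration or ultrasound route covers the main pump.",
    improvements=["Add pump to PdM route", "Train techs on vibration readings"],
    next_steps="Ask the PdM group to scope the route change",
)
print(entry.repair_cost)
```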

In your organization there may be other players who should be present for discussions of failures. Add them. What you’re looking for is a group, facilitated by you or someone else you choose, who will go through the past year or two of your problem machine’s downtime history with a discussion that starts something like this:

“What happened on Jan. 12? We lost five hours.”

“The main pump spun a bearing and we had to pull one off the shelf to replace it. The rebuild was a big job; we couldn’t stay down for it.”

“Why did it spin a bearing? Isn’t there a PdM program to check ultrasound or vibration on a key piece of equipment like that?”

And the RCA starts. A 5-whys approach may be thorough enough to determine most failure causes, but watch for the subtle ones. They may call for whatever more precise and thorough reliability method your organization uses.

Your group should have the data and be the right people to support a discussion of the exact kind of repairs and their costs. This should include the cost of lost production. Bring in someone to report on the overall cost of expediting and premium freight to see if these are large enough to be added to the analysis. They often are. The same may be true of maintenance overtime, including the cost of the work that doesn’t get done because a key player was called in the night before.
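The arithmetic behind one event’s cost is simple enough to sketch. The function name, its parameters, and every dollar figure below are assumptions for illustration; the point is only that expediting, premium freight, and overtime sit alongside lost production and repair cost when they are large enough to matter.

```python
def event_cost(hours_lost, production_value_per_hour,
               repair_labor, repair_material,
               expediting=0.0, overtime=0.0):
    """Break one downtime event into its cost components and a total."""
    lost_production = hours_lost * production_value_per_hour
    components = {
        "lost_production": lost_production,
        "repair_labor": repair_labor,
        "repair_material": repair_material,
        "expediting": expediting,   # includes premium freight
        "overtime": overtime,
    }
    components["total"] = sum(components.values())
    return components

# Hypothetical figures for a five-hour outage.
cost = event_cost(
    hours_lost=5.0,
    production_value_per_hour=2000.0,
    repair_labor=600.0,
    repair_material=1800.0,
    expediting=450.0,
    overtime=300.0,
)
print(cost["total"])
```

The cost of work that doesn’t get done because a key player was called in the night before is harder to price, so the sketch leaves it out.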

My favorite reviews of this kind had multiple screens going with the foremen’s log, production data, work order histories, and repair material transactions all on display. On the wall was everything the company knew about the events. Each screen was being driven by the person who owned the data. The techs who fixed the problems were also in the room. We could usually drive straight to answers. A handy byproduct of these sessions was that the participants all came away understanding the value of a fully populated CMMS.

J. Stanton McGroarty, CMfgE, CMRP, is senior technical editor of Plant Services. He was formerly consulting manager for Strategic Asset Management International (SAMI), where he focused on project management and training for manufacturing, maintenance and reliability engineering. He has more than 30 years of manufacturing and maintenance experience in the automotive, defense, consumer products and process manufacturing industries. He holds a bachelor of science degree in mechanical engineering from the Detroit Institute of Technology and a master’s degree in management from Central Michigan University. He can be reached at [email protected].

Early in the process, but probably not right at the beginning, it makes sense for the group to search for repetitive failures. They must be sure not to assume that two similar failures are identical, but, if they are, it is efficient and informative to process the RCAs for them simultaneously. Usually a quick check of the spares used will help determine whether two failures are identical. Some of my South African colleagues call this step “piling the corpses.”
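The quick check of spares can be sketched as a grouping step. The failure records and part names here are hypothetical, and a matching set of spares only nominates two failures as candidates; the group still has to confirm that they really are identical before processing their RCAs together.

```python
from collections import defaultdict

# Hypothetical failure records; dates and part names are illustrative only.
failures = [
    {"day": "2012-01-12", "spares": {"pump assembly"}},
    {"day": "2012-03-02", "spares": {"pump assembly"}},
    {"day": "2012-04-19", "spares": {"drive belt", "idler pulley"}},
]

def pile_by_spares(failures):
    """Group failure dates by the exact set of spares each repair used."""
    piles = defaultdict(list)
    for failure in failures:
        piles[frozenset(failure["spares"])].append(failure["day"])
    # Keep only the piles with more than one failure: the repeat candidates.
    return {spares: days for spares, days in piles.items() if len(days) > 1}

candidates = pile_by_spares(failures)
for spares, days in candidates.items():
    print(sorted(spares), days)
```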

Depending on the frequency of failures and the time period covered by the review, this RCA process may run for several days. This is a good investment of time when the prize is a solution of one of the plant’s most serious maintenance and reliability headaches. It is too big a project to run against every troublesome machine, but if the organization has been doing a good job of populating the CMMS, then a much larger percentage of machines can be processed this thoroughly.

At the end of the aim analysis, the team will build a report listing a year or two’s failures on the equipment under review. For each failure they will list one or more root causes, the cost of the event in lost production, and the cost of repairs. They will also outline the type of correction that will be needed to make the process whole and reliable. It is at this point that the team, with management support, will need to identify all the corrective projects that must be conducted to really fix the problems that drove all the root causes.

The team conducting the review is called the core team because it forms the core of the task force for correction, but it is not the whole team. At this point the core team will identify issues like training needs or improved production processes that must be solved by skill groups that are not yet present on the team.

The team will probably be able to draw up maintenance notifications or work order requests that are complete and specific enough for use by maintenance. For other kinds of corrections, those requiring help from outside the core team, detailed specification of the action to be taken is not possible. Even if it were possible, the corrections would not have the needed support from the groups that will have to do the work. This is one of the important reasons why the fire step will have to include some recruiting of new members to fill out the team and some well-developed explanation of the situation that must be corrected.

It is unfortunately true that most RCA efforts stop at this point. After all, the core team members understand the problem, and they have issued maintenance work orders. What else could possibly be required? This is the topic we will address next in the fire phase.