What a reliability office can do for you: Part 2

Get measurable savings and fix longstanding problems with a holistic approach.

By Jeffrey Ng, Kimberly-Clark Corp.

It’s well-known that performing maintenance on a condition-based or reliability-centered basis is the preferred method for plant maintenance operation. This is the safest and most cost-effective method, and it’s the objective for most organizations. But how does one begin the journey?

Why a reliability office?

Read "What a reliability office can do for you: Part 1"

Infrared thermography

The infrared thermography program previously consisted of a yearly survey of all the motor control centers that was driven by corporate insurance requirements and focused on power distribution. This work was always outsourced, and every year several critical and severe issues were found.

After the team completed Level I Thermography training, routes were created to perform surveys of all the motor control centers, critical drives, and PLC panels on a semi-annual basis. These routes and surveys were more thorough than the yearly insurance survey and were focused on reliability. The routes were configured based on cabinet locations, rather than on which equipment was being powered. This increased the efficiency of the inspections and reduced waste. The routes were leveled across the calendar so that the technicians performed thermography routes every month.
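As an illustration of what leveling the routes can look like, the sketch below spreads a set of semi-annual routes across the six months of a cycle so that the monthly workload is roughly even. It is a minimal sketch, not the mill's actual scheduling method: the route names, the hour estimates, and the greedy "longest route to the least-loaded month" rule are all assumptions made for the example.

```python
import heapq


def level_routes(route_hours, periods=6):
    """Assign each route to one period so total hours per period stay roughly even.

    Greedy rule (an assumption for this sketch): take routes from longest to
    shortest and give each one to the period with the least work so far.
    """
    # Min-heap of (hours already assigned, period index).
    heap = [(0.0, p) for p in range(periods)]
    heapq.heapify(heap)
    schedule = {p: [] for p in range(periods)}

    for name, hours in sorted(route_hours.items(), key=lambda kv: (-kv[1], kv[0])):
        load, period = heapq.heappop(heap)
        schedule[period].append(name)
        heapq.heappush(heap, (load + hours, period))
    return schedule


def period_loads(schedule, route_hours):
    """Total estimated hours assigned to each period, in period order."""
    return [sum(route_hours[r] for r in schedule[p]) for p in sorted(schedule)]


# Hypothetical routes and estimated hours per survey.
routes = {
    "MCC-A": 4.0, "MCC-B": 3.5, "MCC-C": 3.0,
    "Drives-1": 2.5, "Drives-2": 2.0,
    "PLC-1": 1.5, "PLC-2": 1.5, "PLC-3": 1.0,
}

schedule = level_routes(routes, periods=6)
loads = period_loads(schedule, routes)
for month, names in schedule.items():
    print(f"Month {month + 1}: {names} ({loads[month]} h)")
```

Each route still runs once per six-month cycle, but no single month carries the whole survey.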

When the infrared routes began, a high number of reliability issues were found in many panels. Some of the issues were loose wires, unlubricated switches, broken fuse holders, and unbalanced loads. In the beginning of the program, many of the faults were deemed critical or severe. After the first 18 months, the number of reliability issues within the panels had decreased significantly, and issues are now found in panels at a rate of one to three per quarter.

Working with open power cabinets exposes the technicians to arc flash hazards. To eliminate the safety risk, infrared windows were installed on cabinet doors to allow the inspections to occur without opening the cabinet doors.

Thermography surveys were not limited to electrical inspections. Rounds were set up to inspect the mill steam traps on a bimonthly basis. The first year of inspections yielded the repair or replacement of almost all of the steam traps in the mill. Many bypass circuits and valves were found to be faulty or in the incorrect position. Ultrasound was introduced about six months into the steam trap survey program. The ultrasound would confirm steam trap faults and would at times reveal faults that went undetected with thermography.

Thermography was further expanded to mechanical inspections of the asset. Routes were established to survey areas of the asset. These routes identified numerous faults, including bad bearings, failed check valves, and air leaks. This technology was also employed in tissue machine hood surveys. Leaks could be detected from a safe distance, limiting the technician’s exposure to the heat from the hoods.

Precision alignment

As the reliability office became proficient with the initial three technologies of lubrication, vibration, and thermography, team members focused on the development of their skills with precision alignment. Formal training was conducted for not just the reliability office members, but for other mechanics and all of the mechanical engineers in the mill.

Precision alignment consisted of shaft alignment and soft foot correction. Before the reliability office, precision alignment focused on shaft alignment only and ignored soft foot. Best practices were developed and documented for precision alignment, base plate design, and foundation and grouting.

Precision alignment was easily tied to engineering practices. Many of the equipment design and installation practices that improve shaft alignment and minimize the effect of soft foot can be addressed at the design level. This promoted design for reliability, instead of attempts to improve the reliability of the equipment after installation and startup.

Root cause failure analysis

Before the reliability office was instituted, the mill repaired the equipment to operating condition, but most of the equipment would fail again soon after. Failed components would be thrown away without investigating what may have caused the failure. When failed components were inspected, no documentation or sharing of the learnings occurred. The mill’s reaction to failures was to create a preventive maintenance inspection or replacement based on a time interval. This resulted in repetitive failures of the same or similar equipment, and a proliferation of preventive maintenance work that could not be completed within the work capacity of the mill or the planned downtime of the asset.

The reliability office began to investigate failures on major equipment. Root causes were identified and documented in root-cause failure analysis reports. The reports were shared with everyone in maintenance, engineering, and operations. Because the mill is one of six in the sector with similar assets producing similar products, the reports were also shared with the maintenance and reliability leaders at the other mills. The reliability office developed and authored the sector root cause failure analysis framework, which included the report structure and communication upon completion.

The purpose of the investigations was to find fixes that would prevent a repeat failure on similar equipment elsewhere in the mill. Countermeasures were developed for each failure, ranging from the development of new assembly standards and the application of precision maintenance techniques to new equipment designs.

Root cause failure analysis was employed to investigate repetitive felt roll failures on the tissue machine caused by fretting corrosion. The bearings on three of the rolls failed after only a few months in service, against an expected service life in excess of five years. The investigation found that although the bearing housings appeared to be dimensionally within specification when the inner diameter was measured, the housings were actually out of round and thus out of specification. The out-of-round bearing housings caused fretting corrosion of the bearings, leading to premature failure. New bearing housings were ordered with new specifications for roundness.
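The sketch below shows how a housing can pass a diameter check and still be out of round. It compares bore diameters taken at several angular positions against a diameter tolerance and, separately, against a limit on the spread between the largest and smallest reading. Every number here (nominal size, tolerances, readings) is hypothetical and chosen only for illustration. The diameter spread is also a simplified stand-in for roundness, since two-point diameter readings mainly reveal ovality and can miss other out-of-round shapes.

```python
from dataclasses import dataclass


@dataclass
class BoreCheck:
    diameter_ok: bool   # every reading within the diameter tolerance
    round_ok: bool      # spread between readings within the roundness limit
    variation: float    # largest reading minus smallest reading, in mm


def check_bore(diameters_mm, nominal_mm, diameter_tol_mm, roundness_limit_mm):
    """Check diameter readings taken at several angles around one bore."""
    diameter_ok = all(abs(d - nominal_mm) <= diameter_tol_mm for d in diameters_mm)
    variation = max(diameters_mm) - min(diameters_mm)
    return BoreCheck(diameter_ok, variation <= roundness_limit_mm, variation)


# Hypothetical readings at 0, 45, 90, and 135 degrees around the bore.
readings = [150.018, 150.002, 149.984, 150.001]

result = check_bore(
    readings,
    nominal_mm=150.000,        # hypothetical nominal bore
    diameter_tol_mm=0.020,     # hypothetical diameter tolerance (plus or minus)
    roundness_limit_mm=0.010,  # hypothetical limit on diameter spread
)
print(f"diameter ok: {result.diameter_ok}, round ok: {result.round_ok}, "
      f"variation: {result.variation:.3f} mm")
```

In this example each individual reading is inside the diameter tolerance, so a single measurement at any one angle would pass, yet the spread between readings exceeds the roundness limit.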

Team synergy

The reliability office provided the mill with a team of condition monitoring technicians who interacted on a daily basis. The technicians and engineers worked together to confirm equipment faults and develop corrective actions using multiple condition monitoring technologies. For example, if an elevated temperature or vibration was detected, ultrasound would be used to distinguish a bearing fault from a lack of lubrication or misalignment. The combined use of technologies would help drive the proper corrective action.

Having the team members trained on each of the technologies enabled better communication, better questions, and ease of interaction between the members. In one case, the lubrication attendant found metal shavings in the oil filter of the recirculating oil system. Unfortunately, there were three separate pieces of equipment in operation on the recirculating system, servicing a total of fifteen bearings. Which piece of equipment was failing? The vibration technician used the vibration data to narrow it down to a single piece of equipment with two bearings, each of which could be changed independently. One bearing would require two days to change, and the other would require five days. Bearing faults could be seen in the vibration signatures of both bearings.

Ultrasound revealed that only one bearing had faults. A regular scheduled shutdown of the asset was planned a few days later. The reliability office team performed a bore scope on the suspect bearing during the planned downtime and confirmed the failing bearing.

Results and sustaining progress

The total delay on a single tissue machine has been reduced 32% over the past three years, with unplanned mechanical and electrical delay reduced from 7.4% to 5.3% from 2014 through 2015. A significant portion of the delay in 2014 was attributed to process delay.

The mean time between failures (MTBF) of the tissue machines has increased by about 35% over the past three years, from 26.5 to 35.7 hours. A failure was defined as any event that caused the machine to stop making paper.
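For readers who want to reproduce the arithmetic, here is a minimal sketch of the MTBF calculation and the percentage change between the two reported values. The function names are illustrative, and the operating hours and failure count passed to `mtbf` are hypothetical; only the 26.5-hour and 35.7-hour figures come from the article.

```python
def mtbf(operating_hours, failures):
    """Mean time between failures: operating time divided by number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Hypothetical inputs: 1,000 operating hours with 40 stops.
example_mtbf = mtbf(operating_hours=1000.0, failures=40)
print(f"Example MTBF: {example_mtbf:.1f} hours")

# Reported values: MTBF rose from 26.5 to 35.7 hours.
increase = percent_change(26.5, 35.7)
print(f"MTBF increase: {increase:.1f}%")  # 34.7%, reported as about 35%
```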

There is little sense in building a program that is not sustainable. Sustainability cannot depend upon a single person or champion; it has to come from within the group. Building a culture that will nurture and support the reliability office is crucial. With the right culture, the reliability office can be a springboard to better morale, improved safety, and precision maintenance, ensuring a manufacturing site’s continued viability.

To share information and knowledge, we created a condition monitoring network, in which technicians from different sites meet via monthly conference calls to discuss learnings from failures and the use of the various monitoring technologies. The calls also address and help resolve common application issues across the sector. Most important, the network lets sites work together to build technicians’ skills and support condition monitoring’s value across the sector.

Culture alone will not make the gains of the reliability office sustainable. Team members will change over time, so how do knowledge and practice transfer from the existing team to the new members? Written procedures and best practices will help. The documents provide a means to audit actual practices. If the written procedures are followed, the data collected will be more consistent from one technician to the next.

Documentation of each of the technology systems can identify areas that need improvement. System documentation helps with troubleshooting of the technology and processes, especially as the technology changes and improves. These documents also provide models for new systems as the site adds technologies to its program.

Verification of corrective actions must be conducted. Collecting data after the equipment has been repaired and dissecting the parts replaced or repaired proves that the corrective action and analysis were correct. This will build confidence in the skill and knowledge of the reliability office.

Marketing of the reliability office skills and successes will show the value of the reliability office’s work and that the mill views its work as important to the longevity of the site.

The true sustainability of the reliability office will occur when its practices are firmly entrenched in the site and in the overall maintenance culture – when all failures are known before they occur and effective countermeasures are employed after a failure’s root cause has been identified.