Seven ways plants solve problems, and why the best reliability leaders understand all of them

Factories are more than collections of machines: they are decision systems made up of people interpreting signals under uncertainty.

Key Highlights

  • Strong reliability leaders match the problem-solving method to the issue instead of forcing one approach on every failure.
  • Plants cut repeat failures by diagnosing root causes, not just repairing symptoms or replacing equipment.
  • Risk management prevents costly incidents by addressing hazards before failures escalate into shutdowns or safety events.
  • Long-term reliability depends on strategy, design, and workforce knowledge—not just maintenance execution.

People often assume that reliability engineering is purely technical work. They picture vibration analysis, thermography, oil analysis, root cause investigations, or statistical modeling. Those things certainly matter. 

My background in chemistry and engineering shaped much of how I approach industrial systems, but over time I realized something else was quietly influencing how I think about failures, decisions, and organizations. That influence came from my degree in philosophy.

Philosophy teaches you to examine assumptions, question causality, recognize flawed reasoning, and understand how people arrive at conclusions. In industry, that matters more than many realize because factories are not simply collections of machines. They are decision systems made up of people interpreting signals under uncertainty.

Once you begin looking closely, you discover that nearly every major discipline inside a plant follows a similar pattern of thought. Maintenance troubleshooting, executive planning, risk management, medical diagnosis, engineering design, and scientific experimentation all operate through slightly different versions of the same cognitive framework:

Observe the situation, analyze the information, make a decision, execute an action, and review the outcome.

The problem is that many organizations attempt to solve every issue using only one method. They apply maintenance logic to strategic problems, management logic to engineering problems, or operational urgency to scientific uncertainty. That mismatch often creates poor decisions, wasted capital, and recurring failures.

Understanding the seven major problem-solving approaches commonly found in industrial environments helps leaders determine not only how to solve problems, but which thinking model should be used in the first place.

1. The Scientific Method: When the Plant Does Not Yet Know the Answer

The scientific method is appropriate when the organization genuinely does not understand the mechanism driving a problem. This approach is common in laboratories, R&D environments, process chemistry, contamination studies, and advanced troubleshooting situations where assumptions may be dangerous.

The scientific method begins with a question, followed by research, hypothesis development, experimentation, and analysis. The key distinction is that the outcome is not predetermined. The organization must be willing to accept that its original assumptions may be wrong.

In many plants, teams skip directly to corrective actions without first validating whether they understand the failure mechanism. That creates “solution chasing,” where actions multiply while understanding remains poor.

Example: I worked with a chemical processing facility that experienced repeated varnish formation in its turbine oil systems despite maintaining what appeared to be acceptable operating temperatures and filtration levels. The plant tried several expensive corrective actions, none of which resolved the issue.

We had the plant approach the problem scientifically instead of operationally, using controlled sampling, oxidation studies, electrostatic discharge measurements, and additive depletion analysis. These techniques revealed that localized micro-dieseling inside the pumps was generating thermal stress at temperatures far above the bulk oil temperature. The eventual solution was not additional filtration, but pump redesign and flow modification.

The scientific method matters when the plant is facing uncertainty rather than merely inconvenience.

2. The Problem-Solving Model: When the Failure Is Known but Requires Structured Correction

Most maintenance and reliability departments operate primarily within the classic problem-solving model. The issue is identified, root causes are investigated, alternatives are considered, corrective actions are implemented, and results are evaluated.

This model works best when the system itself is already reasonably understood and the primary objective is restoring reliable operation. The danger comes when organizations confuse symptoms with causes. Plants often become excellent at repairing recurring failures while never truly eliminating them.

Example: I was at a packaging facility that experienced chronic motor failures on a conveyor system every six to eight months. Maintenance teams replaced motors quickly and efficiently, but failures continued. We conducted a structured root cause analysis (RCA), which eventually revealed that the motors themselves were not defective. The actual issue was excessive belt tension combined with misalignment introduced during hurried production changeovers. Once alignment procedures and tension standards were corrected, motor life extended from months to several years.

The lesson was simple: efficient repair is not the same as effective problem solving.

3. The Board Strategy Process: When Leadership Must Shape the Future

Strategic decision-making differs significantly from operational troubleshooting because the time horizon is longer and uncertainty is greater. Boards and executive leadership teams are not simply solving today’s problem. They are deciding what kind of organization will exist years from now.

This process requires environmental scanning, trend analysis, strategic prioritization, execution oversight, and continual review. Many industrial failures begin not on the plant floor, but in conference rooms where strategic decisions unintentionally create operational fragility.

Example: I once worked for a manufacturer that aggressively reduced maintenance staffing and eliminated apprenticeship development to improve quarterly financial performance. For two years the decision appeared successful because labor costs dropped while production remained stable. However, the organization gradually lost troubleshooting depth, institutional knowledge, and precision maintenance capability. Failures began increasing, response times worsened, and contractor dependency exploded.

Within five years, reliability costs exceeded the original labor savings several times over. The board had optimized accounting metrics while unintentionally degrading operational resilience.

Strategic processes matter because the future reliability of a plant is often determined long before the equipment actually fails.

4. The Risk Management Process: When the Goal Is Preventing Catastrophe

Risk management differs from ordinary problem solving because the event being managed may not have happened yet. The purpose is prevention rather than correction. This process involves identifying hazards, assessing likelihood and severity, selecting controls, monitoring performance, and evaluating whether safeguards remain effective. High-performing facilities understand that low-frequency events can still produce catastrophic consequences.
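The likelihood-and-severity assessment described above can be sketched as a small risk ranking. This is only an illustration under stated assumptions: the 1-to-5 scales, the `severity_floor` parameter, and the three hazards with their ratings are hypothetical and do not come from any plant discussed in this article. The sketch shows one way to keep a rare but catastrophic hazard from being buried beneath frequent nuisance failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    severity: int    # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def score(self) -> int:
        # Simple likelihood x severity product, as in a basic risk matrix.
        return self.likelihood * self.severity


def rank_hazards(hazards, severity_floor=5):
    """Rank hazards by score, but always list any hazard whose severity
    reaches severity_floor ahead of the rest, however unlikely it is."""
    return sorted(
        hazards,
        key=lambda h: (h.severity >= severity_floor, h.score),
        reverse=True,
    )


# Hypothetical hazards and ratings, for illustration only.
hazards = [
    Hazard("conveyor jam", likelihood=4, severity=2),
    Hazard("delayed alarm response at startup", likelihood=1, severity=5),
    Hazard("seal leak", likelihood=3, severity=3),
]

ranked = rank_hazards(hazards)
for h in ranked:
    print(f"{h.name}: score={h.score}, severity={h.severity}")
```

Ranked on score alone, the delayed alarm response would come last of the three. The severity floor moves it to the top, which is the behavior the paragraph above argues for.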

Example: At a refinery I visited a while ago, I found that operators occasionally bypassed alarm response procedures during startup because production pressure encouraged faster transitions. No major incident had occurred, but risk analysis showed that under the wrong combination of process conditions, delayed response could result in overpressure events. This is not a situation you want at a refinery! The facility implemented procedural redesign, interlocks, simulator training, and revised startup governance. Months later, a process upset occurred that likely would have escalated under the previous operating culture. Instead, the safeguards functioned properly and the event was contained.

Risk management often feels expensive because organizations are investing in failures that have not yet occurred. The challenge is that once catastrophe arrives, prevention always appears inexpensive in hindsight.

5. The Management Decision Process: When Tradeoffs Must Be Balanced

Management decisions frequently involve competing priorities rather than purely technical answers. Cost, staffing, downtime, production pressure, safety, and customer expectations all interact simultaneously.

This process requires evaluating current conditions, considering alternatives, selecting a path, supervising execution, and reviewing outcomes. Many poor industrial decisions are not irrational. They are incomplete.

Example: I worked with a plant manager who faced a decision regarding an aging air compressor system with escalating repair costs. Engineering recommended replacement. Finance preferred continued repair because the capital budget was constrained. Operations feared downtime during installation.

Management ultimately reviewed lifecycle costs, outage risks, energy consumption, maintenance labor, and production impacts together rather than evaluating only purchase price. Based on that review, the plant replaced the system during a scheduled outage, reducing energy costs significantly while eliminating recurring reliability disruptions.
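The difference between comparing purchase price and comparing lifecycle cost can be shown with a few lines of arithmetic. The sketch below is a simplified, undiscounted total over a fixed horizon. Every figure in it, the 10-year horizon, and the cost categories are hypothetical placeholders, not data from the plant described above.

```python
def lifecycle_cost(purchase, annual_costs, years):
    """Undiscounted total cost of ownership: purchase price plus the sum
    of all recurring annual costs over the evaluation horizon."""
    return purchase + years * sum(annual_costs.values())


YEARS = 10  # assumed evaluation horizon

# Hypothetical option A: keep repairing the existing compressor system.
keep_repairing = lifecycle_cost(
    purchase=0,
    annual_costs={"repairs": 60_000, "energy": 120_000, "downtime": 40_000},
    years=YEARS,
)

# Hypothetical option B: replace the system during a scheduled outage.
replace_system = lifecycle_cost(
    purchase=400_000,
    annual_costs={"repairs": 10_000, "energy": 90_000, "downtime": 5_000},
    years=YEARS,
)

print(f"Keep repairing: {keep_repairing:,}")
print(f"Replace:        {replace_system:,}")
print(f"Difference:     {keep_repairing - replace_system:,}")
```

With these invented numbers, the option with no purchase price is the more expensive one over the horizon. A fuller analysis would typically discount future costs and attach a cost to outage risk, but the comparison has the same structure.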

Good management decisions require systems thinking because local optimization often creates enterprise-wide inefficiency.

6. The Health Treatment Process: When Diagnosis Must Precede Action

Medicine and reliability engineering share remarkable similarities. In both disciplines, symptoms can mislead investigators if underlying causes are misunderstood. 

The treatment model involves gathering history, examining conditions, diagnosing the issue, selecting treatment, applying intervention, and monitoring outcomes. Experienced reliability professionals often think diagnostically rather than mechanically.

Example: Not long ago I spent time at a power generation facility that was experiencing elevated vibration on a large pump. Several teams initially recommended balancing because vibration amplitudes were high. Working with a senior analyst, we approached the issue like a physician rather than a mechanic. He reviewed maintenance history, operating changes, lubrication trends, and process conditions before recommending corrective action. The investigation ultimately revealed cavitation caused by upstream flow restriction, not imbalance. Balancing the rotor would have treated the symptom rather than the disease.

Plants that diagnose before acting usually spend less money and achieve better long-term outcomes.

7. The Design Process: When the Existing System Itself Is the Problem

Some industrial problems cannot be maintained away because the underlying design is flawed. In these situations, the organization must move beyond correction into redesign.

The design process identifies the need, generates concepts, develops solutions, builds prototypes or systems, tests performance, and improves iteratively. This model is essential when recurring failures indicate that the process itself is fundamentally unstable or poorly engineered.

Example: I recently visited a wastewater treatment operation that struggled continuously with plugging, inconsistent chemistry performance, and excessive sludge generation. Maintenance efforts were extensive, but failures persisted because the original system had been designed without adequate equalization capacity for its highly variable influent conditions. Working with the engineers, we eventually redesigned the pretreatment process with improved flow balancing and staged treatment controls. The redesign reduced chemical consumption, stabilized operations, and dramatically lowered maintenance frequency.

Sometimes the most expensive decision in industry is refusing to redesign a flawed system.

The Hidden Pattern Across All Seven Models

At first glance, these approaches appear unrelated. One belongs to science, another to medicine, another to governance, and another to engineering. Yet beneath the terminology, all seven follow nearly the same intellectual structure.

  • They observe conditions.
  • They gather information.
  • They analyze relationships.
  • They make decisions.
  • They execute actions.
  • They evaluate outcomes.

What changes is not the underlying logic, but the environment in which the logic is applied. This matters because modern plants are becoming increasingly complex. Reliability professionals today are expected to think simultaneously like engineers, diagnosticians, strategists, risk analysts, and systems designers. Technical knowledge alone is no longer enough. Organizations increasingly succeed or fail based on the quality of their collective reasoning.

The best plants are not simply better at fixing equipment. They are better at thinking.

About the Author

Michael D. Holloway

5th Order Industry

Michael D. Holloway is President of 5th Order Industry, which provides training, failure analysis, and designed experiments. He has 40 years of industry experience, beginning with research and product development for Olin Chemical, WR Grace, Rohm & Haas, and GE Plastics, followed by reliability engineering and analysis for NCH, ALS, and SGS. He is a subject matter expert in tribology, oil and failure analysis, reliability engineering, and designed experiments for science and engineering. He holds 16 professional certifications, a patent, an MS in Polymer Engineering, a BS in Chemistry, and a BA in Philosophy. He has authored 12 books, contributed to several others, and been cited in over 1,000 manuscripts and several hundred master’s theses and doctoral dissertations.
