As an engineer in a maintenance group within a multinational pharmaceutical manufacturer, I see many conditions to which most plant professionals can relate. We’re under pressure to reduce both costs and staff while increasing reliability. Maintenance people, like most of the production support staff, have an extra twist added to our challenge. We don’t directly add value to the product our company sells. At best, what we deliver to our business management team is reduced costs and increased reliability. Ironically, the management team adds no direct value to the product, either. So a big question looms: How can maintenance people find a way to convey their value?
Philosophy
Instinct and feeling are personal assets that experienced maintenance people cultivate over years but use every day. In the name of reliability, maintenance technicians and supervisors need to know more about individual system performance data that many either take for granted or aggregate into larger groups of data.
A critical strength of any system health metric is that it reveals whether a system is stealing money. Energy costs aren’t going to stop rising. There’s no point spending finite resources to detect a system failure if the ever-increasing energy bills have already put us out of business. Fortunately, diminishing efficiency often is a warning about equipment failure, so looking upstream in process time makes a lot of sense. Stockholders and upper management sometimes appreciate this sort of attention to the way money is spent.
Data are everywhere, and more arrive every day. Fortunately, the kind of data that you’ll need to generate a health metric is available from any system that controls equipment, distributes power or monitors a building. And these data already have been gathered, delivered, validated and paid for.
Also, our global neighbors in the European Union, India and China are in the same boat as far as this analytical challenge is concerned. Everyone must abide by the same laws of physics. Nor is this challenge an instance in which inexpensive labor can provide any advantage.
Experiment and data
That’s enough “thinking globally.” What about “acting locally”? Several years ago, a happy set of circumstances allowed development of a modest experiment to explore some actual tools that might lead toward such a “health metric.”
The experiment is now a project that explores what it might take to develop a useful refrigeration system efficiency or “health” measurement. Some maintenance people might recognize this approach under the name “condition monitoring.” Marrying it to a valid statistical framework would allow business managers to make better decisions about refrigeration systems.
I established an experimental platform using a household freezer from Best Buy and attached to it a data-collection system cobbled together from surplus parts. This freezer has been subjected to many of the same conditions that its larger cousins would encounter in industry.
Table 1 shows some interim data from this platform. It lists a variety of conditions under study and the load placed in the freezer. Varying the load was simple. It consisted of adding or subtracting half-gallon jugs of antifreeze. An EPA license allowed me to transfer the R134a refrigerant in or out of the system using a recovery pump and a tank as a reservoir.

The quantity of R134a refrigerant was varied from the 129-gram nominal charge. The middle column shows the duty cycle associated with each of the controlled conditions. Duty cycle is the fraction of the total possible run time that the compressor must operate to handle the imposed load. Capturing the current draw during each duty cycle permits forecasting the corresponding annual energy consumption.
The first piece of information to be gleaned from these data is that even near the nominal factory-standard refrigerant charge, there’s a lot of noise. By this I mean that repeated runs at similar or identical conditions produced somewhat variable results. One reason is that the system is sensitive to ambient conditions (Table 1). This is true of all refrigeration systems.
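To make the duty-cycle arithmetic concrete, here is a minimal sketch in Python. The function names, the 120 V supply, the unity power factor and the sample readings are all assumptions chosen for illustration. None of them comes from Table 1 or from the experimental platform.

```python
HOURS_PER_YEAR = 24 * 365


def duty_cycle(run_minutes, total_minutes):
    """Fraction of the observation window during which the compressor ran."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return run_minutes / total_minutes


def annual_energy_kwh(duty, amps, volts=120.0, power_factor=1.0):
    """Rough annual energy estimate from duty cycle and running current.

    volts and power_factor are assumed values, not measurements.
    """
    running_watts = volts * amps * power_factor
    return running_watts * duty * HOURS_PER_YEAR / 1000.0


# Invented readings: compressor ran 18 minutes of a 60-minute window at 1.5 A.
duty = duty_cycle(run_minutes=18.0, total_minutes=60.0)
energy = annual_energy_kwh(duty, amps=1.5)
print(f"duty cycle = {duty:.2f}, estimated energy = {energy:.0f} kWh/year")
```

A real forecast would use the measured voltage and power factor of the compressor motor, and would average over many cycles rather than a single hour.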
Had these data followed a normal distribution, regression analysis would help remove the effect of ambient temperature. It would then be straightforward to compare different charges and the corresponding system response. Many researchers are attacking this aspect of the problem to solve other challenges.
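For readers who want to see what that regression step would look like, here is a hedged sketch of an ordinary least-squares line fit in plain Python. The ambient temperatures and duty cycles are invented for illustration, and the sketch presumes the data meet the usual regression assumptions, which the freezer data did not.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def remove_ambient_effect(ambient, duty):
    """Return residual duty cycles after subtracting the fitted ambient trend."""
    slope, intercept = fit_line(ambient, duty)
    return [d - (slope * a + intercept) for a, d in zip(ambient, duty)]


# Invented example values, not taken from Table 1.
ambient_c = [18.0, 20.0, 22.0, 24.0, 26.0]
duty_obs = [0.26, 0.29, 0.31, 0.35, 0.37]

slope, intercept = fit_line(ambient_c, duty_obs)
residuals = remove_ambient_effect(ambient_c, duty_obs)
print(f"duty cycle rises about {slope:.3f} per degree C of ambient temperature")
```

The residuals are what remains once the ambient trend is subtracted, and they are the values you would compare across different refrigerant charges.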
Data analysis
The big question remains: How can a maintenance department move most directly and effectively toward achieving this goal?
The approach I’ve chosen to explore is data analysis coupled closely to statistical validity. There are three valuable tools that you might find useful: principal component analysis, wavelets and PAST, the latter being an acronym for a particular type of analytical software originally designed for paleontological applications and available at http://folk.uio.no/ohammer/past.
PCA works
Principal component analysis (PCA) is statistical in nature and is a powerful tool for exploring the characteristics of data taken from a machine or process. It’s most useful when we begin looking for ways to improve maintenance process knowledge. PCA helps you decide whether the data that you’ve already collected is suitable for the problem you’re trying to solve.
PCA ignores any arbitrary assignment of variables or groupings present in a data set. Instead, it assesses the total variability present in the entire range of the data. PCA identifies whatever element or similar group of elements is responsible for the largest amount of variability. This variable or group of similar variables becomes the first principal “component.” PCA then performs the same operation on the remainder of the variability that wasn’t accounted for during the first pass. This means that the next component identified is as independent as possible from the previous one. PCA repeats this assessment as many times as there are variables. If the process being modeled really has only two independent variables, PCA won’t be fooled, even if you collect data that purport to contain seven assigned variables. PCA will come back with two independent components having high scores and five components with scores that indicate they are merely statistical noise.
By way of illustration, suppose you track some measurement in both metric and English units. PCA won’t be fooled. It will define them as statistically identical and treat them as a single variable. Next, PCA ranks the combined X and Y components in order of their ability to produce the observed data range variability.
It ranks each component in terms of its power to affect the data set. If a variable contributes a lot to the way the process behaves, its score, called its eigenvalue, is high. PCA can help us make the critical decisions about whether to measure something or ignore it. Table 2 is an example taken from the refrigeration study data.
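The metric-versus-English case is small enough to work by hand. For two standardized variables with correlation r, the correlation matrix is [[1, r], [r, 1]] and its eigenvalues are 1 + |r| and 1 - |r|. The sketch below, with invented temperature readings and hypothetical function names, shows the second component collapsing to essentially zero when the two columns are the same measurement in different units.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def pca_two_variables(x, y):
    """Eigenvalues of the 2x2 correlation matrix, largest first."""
    r = pearson_r(x, y)
    return 1 + abs(r), 1 - abs(r)


# Invented freezer temperatures, recorded in Celsius and again in Fahrenheit.
temps_c = [-20.0, -18.5, -17.0, -19.2, -16.4, -21.1]
temps_f = [c * 9 / 5 + 32 for c in temps_c]

first, second = pca_two_variables(temps_c, temps_f)
print(f"first eigenvalue = {first:.3f}, second eigenvalue = {second:.3f}")
```

The first component carries all the variability of both columns and the second carries none, which is PCA's way of reporting that only one real variable is present. With more than two variables you would hand the job to a package such as PAST rather than work it by hand.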

It represents the results of applying PCA to 11 variables assumed to affect the freezer model. A few of the variables were ambient temperature, system “superheat,” current load, current refrigerant charge, compressor amperage draw, and the freezer skin temperature.
Any component with an eigenvalue of less than 1 is to be discarded. With 11 variables, each component would be responsible for about 9% (1/11) of the variability if all contributed equally. If a component takes more than its 9% pro rata share, it’s a more important variable. The scores for other components must drop proportionally.
So, Table 2 tells us that only three variables (four at most) were important. The way to link components back to the variables with which you started is to eliminate a few variables that seem weak in their ability to drive things, then rerun the analysis and check that the low-scoring components disappear along with them. You won’t break anything, so keep trying.
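The discard rule and the pro rata share are simple to compute once the eigenvalues are in hand. The following sketch uses an invented list of 11 eigenvalues that sum to 11, as they do when PCA is run on a correlation matrix. These numbers are hypothetical and are not the Table 2 results.

```python
def summarize_components(eigenvalues, cutoff=1.0):
    """Return (rank, eigenvalue, share of variability, keep?) for each component."""
    total = sum(eigenvalues)
    ranked = sorted(eigenvalues, reverse=True)
    return [
        (rank, value, value / total, value >= cutoff)
        for rank, value in enumerate(ranked, start=1)
    ]


# Hypothetical eigenvalues for an 11-variable study (illustration only).
example_eigenvalues = [4.2, 2.6, 1.5, 0.9, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05]

summary = summarize_components(example_eigenvalues)
fair_share = 1 / len(example_eigenvalues)
kept = [row for row in summary if row[3]]

print(f"pro rata share per component: {fair_share:.1%}")
for rank, value, share, keep in summary:
    label = "keep" if keep else "discard"
    print(f"component {rank:2d}: eigenvalue {value:5.2f}, share {share:6.1%}, {label}")
```

An eigenvalue of exactly 1 corresponds to the 1/11 pro rata share, so the cutoff of 1 and the 9% threshold are two views of the same test.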
Free software
The best way to learn and use PCA is by downloading software called PAST. It’s free, fast, easy, well documented and well supported. PAST makes it easy to apply PCA or several other statistical techniques to data stored in an Excel spreadsheet.
Check your wavelets
The next gem is the “wavelet.” These are members of a special class of mathematical functions and represent the general mathematical case of which Fourier analysis was the first and, for 150 years, the only known particular example. Alfred Haar discovered his infinite general mathematical case in 1909, as part of his doctoral thesis. Haar proved that there is an infinite group of functions — not just sines and cosines — that can be superposed to produce or duplicate any waveform or signal. Unfortunately, just about everyone but his review committee ignored his work for almost 60 years. Wavelets were resurrected and named in the 1970s to wring more value out of the data spikes that resulted from detonating underground charges in the search for oil.
Wavelets have an amazingly wide range of applications. They’re especially appropriate for analyzing irregular or nonrepeating events in a data set. Their use with time series data is powerful and mathematically valid. Time series data must always be considered in the order in which they occurred. Such data are often found around machinery or engineered systems that we want to run better, faster or cheaper. One example of time series data is vibration monitoring. There are a lot of things we can do to these data, but we must preserve their time location, or their meaning is lost to us. If all you could do is average your vibration data, any ability they had to warn of a change would be compromised.

Analysts use wavelets for “lossless” compression of signal data to avoid sacrificing any possible value that it might contain. Normally, when we shrink a data set to make it more manageable, there’s always a probability that we’ll toss out evidence of the thing we were looking for in the first place.
Lossless data compression eliminates this risk. Wavelets also support thresholding for noise control. Thresholding reduces the size of a data set by discarding any coefficient whose magnitude falls below some cutoff, which is typically where the noise lives. It’s worthwhile for data gatherers and analysts to investigate wavelets.
I hope that you find this information helpful and that it prompts you to test-drive some of these tools. If we can provide useful data, we’re helping ourselves and others.
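A single step of the Haar transform, the simplest wavelet, is enough to illustrate both ideas. The sketch below uses an unnormalized form (pairwise averages and differences) and invented readings containing one spike. The function names and the 0.5 cutoff are assumptions for illustration only.

```python
def haar_step(signal):
    """One level of an unnormalized Haar transform: pairwise averages and details."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original samples from averages and details."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal


def threshold(details, cutoff):
    """Zero out detail coefficients whose magnitude is below the cutoff."""
    return [d if abs(d) >= cutoff else 0.0 for d in details]


# Invented readings: small noise around 2.0 with one spike at position 5.
readings = [2.1, 1.9, 2.0, 2.2, 2.0, 6.0, 1.9, 2.1]

averages, details = haar_step(readings)
restored = haar_inverse(averages, details)   # transform alone loses nothing
cleaned = threshold(details, cutoff=0.5)     # keeps only the spike's coefficient

print("details:", [round(d, 2) for d in details])
print("after thresholding:", [round(d, 2) for d in cleaned])
```

The round trip through haar_step and haar_inverse recovers the readings, which is the lossless part. Thresholding is a separate, deliberate choice to drop small coefficients, and the one large coefficient that survives still marks where in time the spike occurred.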
Before using statistics, though, frame the problem statement so it requires what the tool was designed to deliver. Go forth and analyze.
Stephen Puryear is a quality engineer in the Facilities Operations group of Novartis Vaccines and Diagnostics in Emeryville, Calif.