# How to evaluate alarm limits for lubricating oils

## Use statistical evaluation to check alarm limits for machine lubricants.

*In brief:*

*Properly used statistical techniques are powerful tools for validating and improving the alarm limits applied during evaluation of oil samples taken periodically from operating machinery.*

*Alarm limit sets may be improved by using the cumulative distribution technique to evaluate alarm limit settings in nearly all measurement populations, whether normally distributed or skewed by one or more root causes.*

Properly used statistical techniques are powerful tools for validating and improving the **alarm limits** applied during evaluation of oil samples taken periodically from operating machinery. Limitations in the application of **statistical process control (SPC)** may make it advantageous to use the cumulative distribution technique described in ASTM D7720 from the American Society for Testing and Materials (www.astm.org). An actual case where a serious fault was detected, trended, and corrected reveals how effective cumulative distribution can be.

## Figure 1. Joey Frank tests oil samples per ASTM D7416 at his CSI minilab at the TVA Gallatin Steam Plant.

Statistical techniques, including SPC and cumulative distribution, for evaluating alarm limits for **lubricating oils** in steam turbines and coal pulverizers, are defined in **ASTM D7720**, “Standard Guide for Statistically Evaluating Measure and Alarm Limits When Using Oil Analysis to Monitor Equipment and Oil for Fitness and Contamination.”

More than 1,700 coal-pulverizer oil samples and more than 2,300 steam-turbine oil samples were collected and analyzed between 2002 and 2012 by Joey Frank and Stan Sparkman of the Tennessee Valley Authority (TVA) Gallatin Steam Plant (Figure 1). The **maintenance** and **reliability** department at Gallatin Steam Plant handles lubricating oil data in a consistent and proactive manner in order to implement predictive maintenance strategies, avoid unexpected shutdowns, and extend equipment life.

#### Statistical alarms

Two primary kinds of statistical evaluations for alarm limits are described in ASTM D7720: one for **normal data** and one for causal data. In normally distributed populations, data plotted from low to high create a bell curve in which the average value is almost the same as the median, or middle, value, with similar tails on the left and right. **Causal data** distributions are typically skewed so that the average value is much higher than the median value. In the latter case, something causes a portion of the measurements to increase, which makes SPC unsuitable for evaluating alarm limits.

According to D7720, SPC can be used only when data are in “control,” in which case the data must be normally distributed. The alternative statistical technique, **cumulative distribution**, can be used with causal data, which are skewed, typically toward moderate to extremely high values. In fact, much of the data produced through machinery monitoring are causal. For example, when measuring the amount of water or iron particles in oil, the intent is to identify and correct root causes, not to control a process. The cumulative distribution technique is well suited to such cases.

To best demonstrate the principle of cumulative distribution, data from more than 1,500 separate measurements have been employed (Tables 1 and 2). However, modest amounts of data can be used just as effectively.

ASTM D7720 states the following with respect to data population size:

- 6.1.1.1 For SPC techniques using a normal distribution, caution should be used for data sets with fewer than 30 members. Tentative limits can be set from as little as 10 samples although the quality of the limits will improve with larger populations. Larger populations (for example, in the hundreds) can provide best alarm limits. However, the data needs to be representative of the equipment population.
- 6.1.1.2 For cumulative distribution techniques regardless of the form of distribution, caution should be used for data sets with fewer than 100 members. Tentative limits can be set from as little as 50 samples although the quality of the limits will improve with larger populations. Larger populations (for example, 1000 plus) can provide best alarm limits. However the data needs to be representative of the equipment population.
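The two clauses quoted above can be read as a simple lookup. The sketch below is only one possible reading of that guidance: the function name, the technique labels, and the wording of the returned messages are hypothetical, and treating populations below the "tentative" minimum as insufficient is an inference rather than something the quoted text states.

```python
# Thresholds taken from the quoted clauses 6.1.1.1 (SPC) and 6.1.1.2 (cumulative).
# Each entry is (minimum for tentative limits, size below which caution applies).
SIZE_GUIDANCE = {"spc": (10, 30), "cumulative": (50, 100)}


def sample_size_guidance(technique: str, n: int) -> str:
    """Return a rough verdict on population size for the chosen technique."""
    tentative_min, caution_below = SIZE_GUIDANCE[technique]
    if n < tentative_min:
        # Assumption: below the quoted minimum, not even tentative limits are set.
        return "too few samples for even tentative limits"
    if n < caution_below:
        return "tentative limits only; use caution"
    return "adequate; larger populations improve the limits"


# The pulverizer population in this article has 1,754 samples.
print(sample_size_guidance("cumulative", 1754))
```

With populations in the thousands, both Gallatin data sets sit comfortably in the range the guide associates with the best alarm limits.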

| Parameter | Count | Average | Median | Distribution |
| --- | --- | --- | --- | --- |
| % Dielectric Change | 2,319 | 0.8 | 0.4 | Causal |
| Viscosity @ 40 °C | 2,304 | 31.5 | 31.8 | Normal |
| Ferrous Index | 2,304 | 43.1 | 0.0 | Causal |
| PPM Water | 2,304 | 125 | 23 | Causal |
| ISO >4 | 2,300 | 15 | 15 | Discrete |
| ISO >6 | 2,300 | 14 | 14 | Discrete |
| ISO >14 | 2,300 | 11 | 11 | Discrete |

Table 1. Turbine Oil Data

| Parameter | Count | Average | Median | Distribution |
| --- | --- | --- | --- | --- |
| % Dielectric Change | 1,754 | 0.4 | 0.0 | Causal |
| Viscosity @ 40 °C | 1,754 | 201 | 199 | Normal |
| Ferrous Index | 1,754 | 14.3 | 1.6 | Causal |
| PPM Water | 1,754 | 10 | 0 | Causal |

Table 2. Pulverizer Oil Data

The two populations of data employed for this demonstration were accumulated at the Gallatin Steam Plant as alarm limit sets within **Emerson’s AMS Suite: Machinery Health Manager**. The turbine oil population includes roughly 2,300 different in-service sample sets, whereas about 1,700 different oil sample sets were collected from coal pulverizer gearboxes. Statistics were automatically generated using the Export Statistics feature within the OilView tab.

Tables 1 and 2 present the count, or number of measurand values, along with the average and median values for each set of measurements. For several measurement parameters, the average value is substantially higher than the median value. This is generally true for **zero-based measurements** such as percent dielectric change (often called “chemical index”), ferrous index, and PPM water. These are all causal measurements, meaning they can reveal the cause of an evolving issue: lubricant degradation, freshly generated machine wear, or water contamination, respectively. Note that a median value of 0.0 indicates that at least 50% of all measurements are zero, which is not unusual for measurements that specifically target potentially abnormal conditions. Because these measurements are all causal, they are suitable for evaluation using cumulative distribution but not for evaluation using SPC.

On the other hand, the average and median values for viscosity at 40 °C are essentially the same. This is a good indication that viscosity measurements are in control with a well-behaved, bell-shaped parametric distribution. These measurements are suitable for evaluation using either SPC or cumulative distribution techniques as described in ASTM D7720.
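The average-versus-median comparison described here can be turned into a quick screening check. The following is a minimal sketch, not a procedure from ASTM D7720: the function name and the 10% relative tolerance are assumptions chosen for illustration, and the check says nothing about discrete data such as ISO codes.

```python
def looks_causal(average: float, median: float, tolerance: float = 0.10) -> bool:
    """Flag a population as likely skewed (causal) when the average and
    median differ by more than `tolerance`, relative to the larger of the two.
    The 10% default is an illustrative assumption, not a value from D7720."""
    scale = max(abs(average), abs(median))
    if scale == 0:
        return False  # both summaries are zero, so there is nothing to compare
    return abs(average - median) / scale > tolerance


# (average, median) pairs copied from Tables 1 and 2
summaries = {
    "turbine viscosity @ 40 C": (31.5, 31.8),
    "turbine ferrous index": (43.1, 0.0),
    "turbine PPM water": (125, 23),
    "pulverizer viscosity @ 40 C": (201, 199),
    "pulverizer ferrous index": (14.3, 1.6),
}

for name, (average, median) in summaries.items():
    verdict = "causal (skewed)" if looks_causal(average, median) else "roughly symmetric"
    print(f"{name}: {verdict}")
```

Under this assumed tolerance, only the two viscosity populations come out as roughly symmetric, which matches the Distribution column in the tables.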

#### Plots of cumulative distributions

Figures 2 and 3 compare **cumulative distribution plots** for the **ferrous-index data** presented in Tables 1 and 2, obtained from samples of in-service turbine oils and coal pulverizer gearbox oils. The ferrous index is a measure of freshly generated iron wear debris, which is typically due to abrasion, adhesion, or fatigue wear mechanisms. You will see that 80% of the turbine oil samples and 35% of the pulverizer gearbox oil samples show a ferrous index of zero, with the numbers escalating from that point. For turbine oils, the 90th percentile corresponds to a ferrous index of 2, the 95th percentile to 3, the 97th percentile to 6, and the 99th percentile to 73. For pulverizer oils, the 90th percentile corresponds to a ferrous index of 20, the 95th percentile to 45, the 97th percentile to 95, and the 99th percentile to 320. Depending on the application, such threshold percentiles can be used to evaluate alarm-limit settings corresponding directly to low alert, high alert, low fault, and high fault, respectively.
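As a rough illustration of how such threshold percentiles might be read off a population, the sketch below uses the nearest-rank percentile definition. That definition is an assumption made here for simplicity; ASTM D7720 and the plant software may calculate percentiles differently. The function names are hypothetical and the sample data are synthetic, not Gallatin measurements.

```python
# Percentile-to-alarm mapping as described in the text.
ALARM_PERCENTILES = {"low alert": 90, "high alert": 95, "low fault": 97, "high fault": 99}


def nearest_rank_percentile(sorted_values, pct: int):
    """Smallest value that has at least pct% of the data at or below it."""
    n = len(sorted_values)
    rank = (pct * n + 99) // 100  # integer ceiling of pct * n / 100
    return sorted_values[max(rank, 1) - 1]


def alarm_limits(values):
    """Map each alarm level to the corresponding percentile of `values`."""
    ordered = sorted(values)
    return {name: nearest_rank_percentile(ordered, pct)
            for name, pct in ALARM_PERCENTILES.items()}


# Synthetic population: 80 zero readings followed by a rising tail of 1..20.
sample = [0] * 80 + list(range(1, 21))
print(alarm_limits(sample))
# {'low alert': 10, 'high alert': 15, 'low fault': 17, 'high fault': 19}
```

Because the method ranks the data rather than assuming a bell curve, the large block of zero readings does not distort the limits.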

## Figure 2. Eighty percent of the turbine oil samples show a ferrous index of zero with the numbers escalating from that point.

## Figure 3. Thirty-five percent of the pulverizer gearbox oil samples show a ferrous index of zero with the numbers escalating from that point.

Figures 4 and 5 show **cumulative distribution data** for **particle count of ISO 11171 code values** measured on approximately 2,300 in-service turbine oil samples. ISO code values are only reported in integers where each step between one integer and the next represents roughly a doubling of measured particle counts per milliliter. Therefore, the plot shows steps in what is called a discrete cumulative distribution. For ISO >6 measured on turbine oils, the 90th percentile corresponds to 16, the 95th percentile to 17, the 97th percentile to 19, and the 99th percentile to 21. For ISO >14 measured on turbine oils, the 90th percentile corresponds to 13, the 95th percentile to 14, the 97th percentile to 16, and the 99th percentile to 18.
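Because each code step represents roughly a doubling, the spread between two percentile codes translates into an approximate ratio of particle counts. The helper below is a hypothetical illustration of that arithmetic only; it treats every step as exactly a factor of two, which is a simplification.

```python
def approx_count_ratio(code_low: int, code_high: int) -> float:
    """Approximate ratio of particle counts per milliliter between two ISO
    code values, assuming each integer step exactly doubles the count."""
    return 2.0 ** (code_high - code_low)


# ISO >6 on turbine oils: 90th percentile code 16, 99th percentile code 21
print(approx_count_ratio(16, 21))  # 32.0
# ISO >14 on turbine oils: 90th percentile code 13, 99th percentile code 18
print(approx_count_ratio(13, 18))  # 32.0
```

In both size channels the five-code gap between the 90th and 99th percentiles corresponds to roughly a 32-fold difference in particle count.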

## Figure 4. For ISO >6 measured on turbine oils, the 90th percentile corresponds to 16, the 95th percentile to 17, the 97th percentile to 19, and the 99th percentile to 21.

## Figure 5. For ISO >14 measured on turbine oils, the 90th percentile corresponds to 13, the 95th percentile to 14, the 97th percentile to 16, and the 99th percentile to 18.

#### Preventing a pulverizer gearbox failure

In a recent case, **statistical analysis** of oil samples saved a pulverizer **gearbox** from catastrophic failure. All measurements of oil chemistry and lubrication-system contamination (dielectric 2.21, water 0.0021%, viscosity 171 cSt) were satisfactory. However, the wear-indication data climbed sharply between July 2012 and October 2012, as shown in Table 3. The ferrous index measures iron alloy particulate matter of 5 microns and larger in oil samples per ASTM D7416.

| Sample Date | Ferrous Index |
| --- | --- |
| November 2010 | 0.0 |
| January 2011 | 4.8 |
| April 2012 | 7.0 |
| July 2012 | 8.8 |
| October 2012 | 124.0 |

Table 3. Pulverizer Gearbox Wear Data

This pulverizer gearbox was approaching a **high fault condition**, with serious wear indicated by the high ferrous index. Subsequent microscopic wear debris analysis revealed brass particles, and analysis of vibration data confirmed that a bearing failure was in progress. This information led to a decision to replace the **bearing** immediately. The pulverizer had to be taken out of service for 10 hours, but a costly outage of approximately two weeks was avoided. If the bearing problem had not been identified and corrected, catastrophic damage to the pulverizer could have occurred.
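To see where the Table 3 readings fall against the pulverizer percentiles quoted earlier (20, 45, 95, and 320), a small classification sketch can help. The function name, the "normal" label, and the choice to treat a reading equal to a limit as being in alarm are assumptions for illustration; the plant's actual alarm configuration is not given in this article.

```python
# Pulverizer ferrous-index percentiles quoted in the text, highest level first:
# 99th = 320, 97th = 95, 95th = 45, 90th = 20.
PULVERIZER_LIMITS = [
    ("high fault", 320),
    ("low fault", 95),
    ("high alert", 45),
    ("low alert", 20),
]


def alarm_state(ferrous_index: float) -> str:
    """Return the highest alarm level whose limit the reading meets or exceeds."""
    for name, limit in PULVERIZER_LIMITS:
        if ferrous_index >= limit:  # assumption: a reading equal to the limit alarms
            return name
    return "normal"


# Readings copied from Table 3
history = [
    ("November 2010", 0.0),
    ("January 2011", 4.8),
    ("April 2012", 7.0),
    ("July 2012", 8.8),
    ("October 2012", 124.0),
]

for date, value in history:
    print(f"{date}: ferrous index {value} -> {alarm_state(value)}")
```

Under these assumptions, the first four samples stay below the low alert limit, while the October 2012 reading of 124.0 lands in the low fault band, between the 97th and 99th percentiles.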

Ray Garvey is engineer, machinery health management & CSI technology, asset optimization, at Emerson Process Management. Contact him at ray.garvey@emerson.com. Joey Frank and Stan Sparkman of the Tennessee Valley Authority Gallatin Steam Plant also contributed to this article. Contact Frank at jlfrank@tva.gov.

In this case, SPC limits based on multiples of the standard deviation (standard deviation = 63) would be grossly overstated because the data population is not parametric. The ferrous-index data population is better suited to cumulative distribution calculations.
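A short calculation illustrates the mismatch. It assumes that the quoted standard deviation of 63 belongs to the pulverizer ferrous-index population whose average is 14.3 in Table 2; the article does not state that pairing explicitly. The helper name is hypothetical, and the percentile values are those quoted earlier for pulverizer oils.

```python
from statistics import NormalDist

MEAN, STD_DEV = 14.3, 63.0  # assumed pairing: Table 2 average, quoted std. deviation
PERCENTILE_LIMITS = {90: 20, 95: 45, 97: 95, 99: 320}  # pulverizer ferrous index


def highest_percentile_below(limit: float):
    """Highest tabulated percentile whose value is at or below `limit`."""
    below = [pct for pct, value in PERCENTILE_LIMITS.items() if value <= limit]
    return max(below) if below else None


for k in (1, 2, 3):
    limit = MEAN + k * STD_DEV
    expected = 100 * NormalDist().cdf(k)  # percentile a k-sigma limit marks on a bell curve
    observed = highest_percentile_below(limit)
    print(f"mean + {k} sigma = {limit:.1f}: bell curve expects the "
          f"{expected:.1f}th percentile; observed data put it above the {observed}th")
```

On a bell curve, a one-sigma limit would flag roughly 16% of samples. Here the one-sigma limit of about 77 already sits above the 95th-percentile value of 45, so it would flag fewer than 5% of this population. The sigma multiples therefore do not mark the fractions that SPC assumes.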

#### Summary

Alarm limit sets may be improved by using the cumulative distribution technique to evaluate alarm limit settings in nearly all measurement populations, whether normally distributed or skewed by one or more root causes. In one actual case, a serious bearing fault was detected, trended, and corrected, and the corresponding measurement data compared favorably with the cumulative distribution information.