CMMS data sampling at Amtrak

Track equipment reliability through analysis of operations and maintenance data.

By Sheila Kennedy, Contributing Editor

Amtrak’s maintenance organization is responsible for 5,334 units across the United States, spanning 20 equipment types, including auto carrier cars, commuter cars, diesel locomotives, electric locomotives, high-speed train sets, and private owner equipment. Some of the equipment travels great distances, and all of it is subject to heavy wear.

Alex Gotera is the reliability engineer for Amtrak and a member of the three-person Amtrak RCM Team. “We did some reliability centered maintenance (RCM) analysis of the new electric locomotive,” said Gotera in his presentation at the Reliable Asset World conference hosted by UE Systems in Clearwater Beach, Florida. Gotera described how Amtrak uses data sampling to optimize operational performance and maintenance reliability, and to fill the information gaps in its computerized maintenance management system (CMMS).

“The people who work at Amtrak make a big effort to provide good service and to please the passengers by making sure they travel on time and comfortably. It’s not a very easy task,” said Gotera. “Our main performance measurement is endpoint on-time performance. We are shooting for 85%.” Certain service types have better performance rates than others. The long-haul trains have the lowest endpoint on-time performance at 71.4% for the fiscal year to date, while the best performers are the Acela train sets at 90.3%.
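The endpoint on-time performance metric can be sketched as the share of trips that arrive at their final station within tolerance. A minimal illustration follows; the trip records and the per-trip `tolerance` field are assumptions for illustration, not Amtrak’s actual reporting rules.

```python
# Minimal sketch of an endpoint on-time performance (OTP) calculation:
# the percentage of trips arriving at the endpoint station on time.
def endpoint_otp(trips):
    """Percentage of trips whose endpoint delay is within tolerance."""
    on_time = sum(1 for t in trips if t["delay_minutes"] <= t["tolerance"])
    return 100 * on_time / len(trips)

# Illustrative trip records (not real Amtrak data).
trips = [
    {"delay_minutes": 0, "tolerance": 10},
    {"delay_minutes": 5, "tolerance": 10},
    {"delay_minutes": 45, "tolerance": 10},
    {"delay_minutes": 12, "tolerance": 10},
]
print(f"{endpoint_otp(trips):.1f}%")  # 2 of 4 trips on time
```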

To track equipment reliability, Amtrak produces reports called Incident Work Orders Per System, which summarize the number of work orders for the top 10 systems in a fleet. In the example he shared, locomotives have a higher incidence of automatic train control work orders. “Automatic train control seems to have the most opportunity, because it appears to produce the most delays and is the lowest performer,” said Gotera. “However, we are missing important information. First, there is no performance reference established. Do 42 incidents represent an acceptable performance? Is it possible to get to zero, or is 42 great because it could have been 100? Secondly, we don’t know the cause of the incidents. What events occurred that resulted in writing the work orders?”
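The roll-up behind such a report is a straightforward count of work orders by system. A hypothetical sketch, assuming work-order records with a `system` field (the records below are illustrative, not Amtrak data):

```python
from collections import Counter

# Hypothetical sketch of an "Incident Work Orders Per System" roll-up:
# count incident work orders by system and keep the top 10 for a fleet.
def top_systems(work_orders, n=10):
    """Systems with the most incident work orders, most frequent first."""
    return Counter(wo["system"] for wo in work_orders).most_common(n)

# Illustrative records only.
work_orders = [
    {"id": 101, "system": "automatic train control"},
    {"id": 102, "system": "automatic train control"},
    {"id": 103, "system": "brakes"},
    {"id": 104, "system": "HVAC"},
]
print(top_systems(work_orders))
```

As Gotera notes, a count like this gives the “what” but not the “why”: it establishes neither a performance reference nor causation.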

Those two key pieces of information are not being entered in the CMMS. To find the answers, Gotera considered several options. He could benchmark performance against a similar operation, such as a European train company with similar trains. He could implement a commercially available failure database system, but software is only as good as its data. He could develop a formal reliability model of the operation, but that’s typically very expensive.

Instead, he chose a sampling and judging approach to determine the incident causes. “We designed a process to pull from the full universe of fuzzy CMMS data a representative sample of events and records, and go in one by one and judge what the cause of each incident was,” explained Gotera. “It’s like survey polling. You get a feel for the conclusions from a sample population.”
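The sampling step of this approach can be sketched as drawing a reproducible random sample of dates and then manually judging only the records written on those dates. The following is a minimal sketch under the assumption that work orders are keyed by date; the 338-day universe and 57-date sample sizes come from the example Gotera described.

```python
import random

# Sketch of the "survey polling" sampling step: draw a reproducible
# random sample of dates, then review only the work orders written on
# those dates one by one.
def sample_dates(all_dates, k, seed=0):
    """Return a sorted random sample of k dates."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-pulled
    return sorted(rng.sample(sorted(all_dates), k))

dates = list(range(1, 339))        # e.g. 338 consecutive study days
picked = sample_dates(dates, 57)   # e.g. a 57-date sample
print(len(picked), picked[:5])
```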

Gotera shared an example of this approach. The target data set included incident work orders during 338 days for all rolling stock. The querying was for cause categories, including equipment failure, debris damage, operator error, maintenance error, inspection error, nuisance trip, and other. The control set consisted of delays in the same period by division, and the sample size was expressed as number of dates. There were two control parameters: percentage of total delays by division and percentage of total delays by responsibility code in each division.
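The control parameters above amount to checking that the sample reproduces the population’s delay mix. A hedged sketch of that check for the division parameter follows; the field names and the 5% tolerance are assumptions for illustration.

```python
from collections import Counter

# Sketch of a control-parameter check: does the sampled set reproduce
# each division's share of total delays within a tolerance?
def delay_shares(delays):
    """Fraction of total delays attributed to each division."""
    counts = Counter(d["division"] for d in delays)
    total = sum(counts.values())
    return {div: n / total for div, n in counts.items()}

def is_representative(population, sample, tol=0.05):
    """True if every division's sample share is within tol of its population share."""
    pop, sam = delay_shares(population), delay_shares(sample)
    return all(abs(share - sam.get(div, 0.0)) <= tol
               for div, share in pop.items())
```

The same comparison could be repeated with a responsibility-code field to cover the second control parameter.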

Sheila Kennedy is a professional freelance writer specializing in industrial and technical topics. She established Additive Communications in 2003 to serve software, technology, and service providers in industries such as manufacturing and utilities, and became a contributing editor and Technology Toolbox columnist for Plant Services in 2004. Prior to Additive Communications, she had 11 years of experience implementing industrial information systems. Kennedy earned her B.S. at Purdue University and her MBA at the University of Phoenix.

A representative sample of 57 dates was chosen, and 532 incident work orders were written on those 57 days. Up to 10 minutes were spent judging the cause category of each work order before labeling it “undetermined.” The causes of the equipment incidents in this exercise were:

  • 40% equipment failure
  • 15% maintenance error
  • 11% nuisance trip
  • 9% inspection error
  • 7% debris damage
  • 4% undetermined
  • 2% operator error
  • 11% other
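Rolling the judged records up into a breakdown like the one above is a simple tally. A sketch, assuming each judged work order carries one cause label (any record not resolved within the time cap would be labeled “undetermined”):

```python
from collections import Counter

# Sketch of rolling judged work orders up into a percentage breakdown
# by cause category, most common cause first.
def cause_breakdown(causes):
    """Percentage of judged work orders per cause label."""
    counts = Counter(causes)
    total = len(causes)
    return {cause: round(100 * n / total)
            for cause, n in counts.most_common()}

# Illustrative labels only, not the actual judged records.
print(cause_breakdown(["equipment failure", "equipment failure",
                       "maintenance error", "undetermined"]))
```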

“We were excited about this,” said Gotera. “It showed that maybe there is a way to deal with the data gaps in our CMMS. We can cost-effectively determine both performance reference and causation and then work on what’s contributing to the errors, whether it’s insufficient training, lack of coordination, or some other cause,” he added. “The CMMS was not written for maintenance workers to provide feedback; it was written for managing work. You can learn a lot of stuff with a CMMS, such as the number of work orders by priority, but this is a more complicated question to solve.”

Gotera is now looking at using the sampling and judging technique to create a failure database for Amtrak’s equipment components, rather than processing 40 years of historical data. “This is not a mathematically rigorous approach; it’s just experimenting with samples,” said Gotera. “This approach is likely to be effective for tracking equipment reliability through analysis of operations and maintenance data not conducive to straight statistical processing.”