Supercomputer facility makes flexibility, scalability and reliability operational priorities

A supercomputing facility responsible for ensuring the safety of the nation’s nuclear weapons stockpile seeks assistance with power assurance and reliability.

Lawrence Livermore National Laboratory (LLNL), a preeminent scientific resource for U.S. defense, science and industry, applies advanced interdisciplinary science and technology to ensure that the nation’s nuclear weapons remain safe, secure and reliable. This work is part of the Stockpile Stewardship Program, which entails certification of the U.S. nuclear weapons stockpile (a responsibility of LLNL as part of the National Nuclear Security Administration) in support of the Comprehensive Nuclear-Test-Ban Treaty.

Scientists and engineers at LLNL use supercomputers to certify weapon performance via simulation rather than actual testing. Because these simulations involve many trillions of computations performed at ultra-high speeds, a mission was undertaken in the mid-1990s to upgrade these supercomputers to “monster” systems that perform at terascale levels (trillions of calculations per second). The mission resulted in LLNL’s recently operational Terascale Simulation Facility (TSF).

Timing of the TSF project was influenced by a resurgence in supercomputing science. LLNL also wanted to supplement simulation results with the knowledge of the few remaining scientists, soon to retire, who have hands-on experience in atmospheric and subterranean nuclear testing.

The TSF consists of two 24,000-square-foot computer rooms housing tens of thousands of processors in hundreds of cabinets. The simulation system is capable of exceeding 100 teraflops (trillions of floating-point operations per second) peak performance, with aggregate memory of up to 50 terabytes and archival storage capable of handling petabytes (thousands of trillions of bytes) of data.

 

Ensuring that computation on this scale completes without interruption requires not only reliable monitoring and protection of the electric power system, but also a scheme that enables the TSF to transfer between power sources should there be a problem with the primary source.

“The facility requirements far exceed those of conventional data centers,” says Anna Maria Bailey, P.E., Livermore Computing program facility manager, who was the design and construction manager for the TSF. “The facility requires very high levels of power, as well as cooling, unencumbered floor space and a large communications infrastructure.”

The TSF has an electrical capacity of 25 MW to support the computers, and its robust mechanical system includes a large air-handling system with cooling towers, as well as fire protection and alarm systems.

Bailey explains that flexibility, scalability and reliability are among the operational priorities of the TSF. Reliability would depend heavily on power system protection and the ability to switch power sources if necessary. Power protection and source transfer, as well as the communications technologies supporting them, would have to be advanced, simple to operate and, above all, reliable.

“This was one of the first projects I’ve been involved with where the electrical system was one of the first design considerations,” Bailey says. “In many instances, the requirements for the electric power distribution are determined at the end, but it was critical for this project. We had to make certain that the availability of power was a priority.”

The main concerns were that an upstream glitch might cause a fault and that there would be no safe way to shut down if cooling were lost at the 24/7 facility. “We were very concerned that if we have a glitch, how do we safely shut down the chillers? The computers will usually ride through a glitch, but a chiller takes 20 minutes to restart, and the computational calculations are at risk. So then there is redundancy built into the mechanical system as well as the electrical,” says Bailey. To further ensure the quality of computations, mechanical loads were separated from computer loads.

TSF’s large mechanical infrastructure includes thirty 80,000-cfm air handlers, a 10-MW cooling tower, four 1,200-ton chillers, and one 675-ton chiller. The electrical infrastructure includes a 25-MW switching station, a 3-mile duct bank system, and elaborate fire alarm and communications systems.
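As a rough check on the scale of these figures, the nameplate numbers above can be totaled. The short Python sketch below assumes the standard conversion of one ton of refrigeration to about 3.517 kW and simply sums all five chillers and all 30 air handlers. The variable names are illustrative, and the totals are nameplate capacity, not measured operating load.

```python
# One ton of refrigeration is 12,000 BTU/h, roughly 3.517 kW of heat removal.
KW_PER_TON = 3.517

# Equipment counts and sizes as quoted in the article.
chiller_tons = [1200, 1200, 1200, 1200, 675]
air_handler_count = 30
cfm_per_air_handler = 80_000

# Nameplate totals (assumes every unit is counted, with none held in reserve).
total_tons = sum(chiller_tons)
total_cooling_mw = total_tons * KW_PER_TON / 1000
total_airflow_cfm = air_handler_count * cfm_per_air_handler

print(f"Chiller capacity: {total_tons} tons ({total_cooling_mw:.2f} MW)")
print(f"Air handler capacity: {total_airflow_cfm:,} cfm")
```

On these assumptions the chillers total 5,475 tons, or about 19.3 MW of heat removal, and the air handlers move 2.4 million cfm.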

To further support the overall power system, Bailey wanted an automatic transfer scheme that would seamlessly switch between two 13.8-kV primary sources should there be a loss of power to an incoming feeder or any undervoltage condition.

“We didn’t have the budget to provide uninterruptible power,” Bailey says, “and with a total projected load of 23 MW, there would be no way for us to do that.” The Schweitzer Engineering Laboratories, Inc. (SEL) solution met the budgetary and operational reliability requirements.

The TSF requirements for power system monitoring, protection, communications and source transfer, outlined in the specifications, led to the installation of multiple SEL-351S microprocessor-based relays, which provide the protection and control technology needed to deliver the mandated flexibility, scalability and reliability.

Bailey says, “We had used a lot of individual SEL relays at various locations, and they had a good track record. But this was the first integrated project where all the relays are SEL. They offered the best combination of product and technology for what we wanted to accomplish. When it came to relay-to-relay digital communications [SEL Mirrored Bits communications], we were impressed by the speed of operations.”

The specification of SEL-351S multifunction relays called for an array of advanced capabilities and features, such as the Sequential Events Recorder (SER) and oscillographic event reports, an interface with the SEL-2030 Communications Processor, a link to SCADA, engineering access, and programmable logic.

Automatic power source transfer is handled through SEL’s proprietary Mirrored Bits communications between the relays on the main breakers, which act to close the tie breaker with voltage and synchronism-check supervision.
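The decision logic behind a transfer of this kind can be sketched in a few lines of Python. This is only an illustration of a generic two-source scheme with a tie breaker, not the relay settings or logic used at the TSF: the class and function names are hypothetical, the thresholds are assumed values, and the relay-to-relay communications are collapsed into a single function call.

```python
from dataclasses import dataclass

# All thresholds are illustrative assumptions, not values from the TSF settings.
UNDERVOLTAGE_PU = 0.85   # below this, a source is treated as lost
DEAD_BUS_PU = 0.25       # below this, a bus is treated as de-energized
SYNC_WINDOW_DEG = 20.0   # maximum angle difference for a live-bus close


@dataclass
class Feeder:
    voltage_pu: float          # source voltage, per unit of 13.8 kV nominal
    angle_deg: float           # source voltage angle
    main_closed: bool = True   # state of this feeder's main breaker

    def healthy(self) -> bool:
        return self.voltage_pu >= UNDERVOLTAGE_PU


def tie_close_permitted(good: Feeder, bus_voltage_pu: float, bus_angle_deg: float) -> bool:
    """Voltage and synchronism-check supervision for closing the tie breaker."""
    if not good.healthy():
        return False
    if bus_voltage_pu < DEAD_BUS_PU:
        return True  # dead bus: nothing to synchronize with
    # Wrap the angle difference into the range [-180, 180) before comparing.
    angle_diff = abs((good.angle_deg - bus_angle_deg + 180.0) % 360.0 - 180.0)
    return bus_voltage_pu >= UNDERVOLTAGE_PU and angle_diff <= SYNC_WINDOW_DEG


def evaluate_transfer(a: Feeder, b: Feeder, tie_closed: bool) -> bool:
    """Run one pass of the scheme and return the new tie-breaker state."""
    if tie_closed or a.healthy() == b.healthy():
        return tie_closed  # already transferred, both sources good, or both lost
    lost, good = (a, b) if not a.healthy() else (b, a)
    lost.main_closed = False  # isolate the failed source first
    # Simplification: the stranded bus is assumed to sit at the lost source's
    # voltage; a real bus with motor load would decay over time.
    return tie_close_permitted(good, lost.voltage_pu, lost.angle_deg)
```

For example, `evaluate_transfer(Feeder(0.0, 0.0), Feeder(1.0, 0.0), tie_closed=False)` opens the first feeder's main breaker and permits the tie breaker to close, because the stranded bus is dead and the remaining source is healthy. Retransfer to the original source once it recovers is left out of the sketch.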

“Schweitzer systems support is also important to us,” Bailey says. The SEL Systems and Services Division in Pullman, Wash., was contracted to implement the initial settings for the relays. “I consider the educational support important to program management,” adds Bailey. “Robin Jenkins, an SEL integration engineer who specializes in SCADA-type applications, came to the site and provided training on the communications processors.” In addition to onsite training, several of the LLNL engineers and technicians attended SEL University courses for additional training.
