In 2005 engineers at the U.S. Department of Energy's (DoE) Oak Ridge National Laboratory unveiled Jaguar, a system that would later be upgraded into a world-beating supercomputer. By 2011 it had grown to a room-size system that drew seven megawatts of power, ran nearly 225,000 processor cores and had a peak performance of 2.3 petaflops, or 2.3 quadrillion calculations per second. Topping Jaguar would not be easy, but it was necessary to deliver ever more complex modeling of sophisticated energy challenges.
Simply adding more CPUs, or central processing units, to scale Jaguar to 20 petaflops would require enough energy to power 60,000 homes. To best their own record, the Oak Ridge engineers instead turned to video games—or more precisely, to the graphics processors used in Microsoft Xboxes, Nintendo Wiis and other video game systems.
As of Monday Jaguar becomes Titan, a supercomputer that combines CPUs with GPU (graphics processing unit) accelerators to deliver 10 times the performance of Jaguar with more than five times the power efficiency.
The key to Titan's speed and efficiency is a design that pairs more than 18,500 NVIDIA GPUs with nearly 300,000 CPU cores, the type of processor that typically forms the foundation of high-performance computers. The GPUs account for about 90 percent of the system's computational performance and enable Titan to remain roughly the same size as Jaguar.
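As a rough back-of-envelope check, the sketch below combines three figures quoted in this article: Jaguar's 2.3-petaflop peak, the tenfold performance gain and the roughly 90 percent GPU share. It assumes that "10 times the performance" refers to peak petaflops and treats the rounded values as exact, neither of which the article confirms, so the results are illustrative estimates rather than official specifications. The variable names are our own.

```python
# Back-of-envelope estimate using only figures quoted in the article.
# Assumption: "10 times the performance" refers to peak petaflops.
JAGUAR_PEAK_PFLOPS = 2.3  # Jaguar's peak performance in 2011
TITAN_GAIN = 10           # "10 times the performance of Jaguar" (rounded)
GPU_SHARE = 0.90          # GPUs supply "about 90 percent" of performance

# Estimated total performance and how it splits between GPUs and CPUs.
titan_pflops = JAGUAR_PEAK_PFLOPS * TITAN_GAIN
gpu_pflops = titan_pflops * GPU_SHARE
cpu_pflops = titan_pflops - gpu_pflops

print(f"Estimated Titan performance: {titan_pflops:.1f} petaflops")
print(f"  from GPUs: {gpu_pflops:.1f} petaflops")
print(f"  from CPUs: {cpu_pflops:.1f} petaflops")
```

Under these assumptions the CPU portion alone comes to about 2.3 petaflops, comparable to all of Jaguar, with the GPUs supplying the remainder.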
When fully brought up to speed, Titan (a Cray XK7 system) promises to be the world's most powerful open-science supercomputer, even more powerful than the DoE's Sequoia, a 16.3-petaflop IBM Blue Gene/Q system crowned the world's fastest supercomputer in June. Sequoia differs from Jaguar/Titan, which placed sixth on the list, in that Sequoia is used exclusively by the DoE's National Nuclear Security Administration to monitor U.S. nuclear weapons. Titan, by contrast, will be open to researchers working on a wide range of projects.
Titan will initially support a handful of key projects at Oak Ridge, including Denovo, simulation software that models the behavior of neutrons in a nuclear power reactor. Oak Ridge's engineers designed Denovo for Jaguar as a way to help extend the lives of the U.S.'s aging nuclear power plants, which provide about a fifth of the country's electricity. Running Denovo, Titan will take 13 hours to model the complete state of a reactor core at a specified point in time, a job that took Jaguar 60 hours to perform.
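The size of that improvement can be worked out directly from the two run times quoted above. The short sketch below does the arithmetic; the function and variable names are hypothetical, chosen only for illustration, and the inputs are the article's rounded figures.

```python
def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if old_hours <= 0 or new_hours <= 0:
        raise ValueError("run times must be positive")
    return old_hours / new_hours


# Run times for one Denovo reactor-core simulation, as quoted in the article.
JAGUAR_HOURS = 60.0
TITAN_HOURS = 13.0

factor = speedup(JAGUAR_HOURS, TITAN_HOURS)
time_saved = 1.0 - TITAN_HOURS / JAGUAR_HOURS  # fraction of run time eliminated

print(f"Speedup on Titan: {factor:.1f}x")
print(f"Run time reduced by {time_saved:.0%}")
```

By this measure Titan runs the job roughly 4.6 times faster, cutting the run time by a little under 80 percent. That is a smaller gain than the tenfold figure quoted for the machine as a whole, a reminder that the benefit to any one application depends on how well its code makes use of the GPUs.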
"The ability to burn nuclear fuel uniformly is very much dependent on knowing and being able to predict the distribution of neutrons in the core," says Tom Evans, a computational scientist at Oak Ridge's Consortium for the Advanced Simulation of Light Water Reactors (CASL), which created Denovo. Titan will enable much more precise simulations.
Titan calculations will also be used to provide nanoscale analysis of materials used to build electric motors and generators and to model the burning of a variety of fuels in internal combustion engines. Still another application will simulate long-term global climate. A sizable amount of Titan's capacity in the coming year will be devoted to the DoE's Innovative and Novel Computational Impact on Theory and Experiment program (INCITE), which invites researchers from academia, government and industry to apply for access to the supercomputer for their projects.
The performance comes at a price, however. Because Jaguar used only CPUs, its computer architecture was simpler, which in turn made it easier to write its software. "The algorithmic complexity to write that code for a machine like Titan is momentous," Evans says. "For us, first and foremost is getting the CPUs and GPUs to work together."
Titan may be cutting edge, but Evans already has a hankering for more computational power. Ideally, Evans and his team want to do complete, high-fidelity 3-D simulations over a full reactor depletion cycle, which requires calculations at many reaction state points—not just a single point in time. Despite its horsepower, even Titan may not be able to achieve this. "We want to push the envelope, but the reality is that Titan's not going to get us there yet," Evans says. The computing resources required to do this are significant, and Evans points out that he and his team don't have Titan all to themselves. It looks like the engineers had better get cracking on Titan's successor.