Next Wave of U.S. Supercomputers Could Break Up Race for Fastest

National labs are now collaborating, not competing, to acquire the fastest supercomputers, which should enable new types of science for researchers who model everything from climate change to materials science to nuclear-weapons performance

By Alexandra Witze & Nature magazine

Once locked in an arms race with each other for the fastest supercomputers, US national laboratories are now banding together to buy their next-generation machines.

On November 14, the Oak Ridge National Laboratory (ORNL) in Tennessee and the Lawrence Livermore National Laboratory in California announced that they will each acquire a next-generation IBM supercomputer that will run at up to 150 petaflops. That means that the machines can perform 150 million billion floating-point operations per second, at least five times as fast as the current leading US supercomputer, the Titan system at the ORNL.
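As a rough check on these figures, the short Python sketch below converts 150 petaflops into operations per second and compares it with Titan. The variable names are illustrative only, and Titan's 27-petaflop figure is the one cited in the ranking later in this article.

```python
# One petaflop is 10^15 floating-point operations per second.
PETA = 10**15

new_machine_pflops = 150  # peak figure quoted for each new IBM system
titan_pflops = 27         # Titan's figure, as cited in the ranking in this article

# Convert the peak figure to raw operations per second.
new_machine_flops = new_machine_pflops * PETA

# Ratio of the new peak speed to Titan's speed.
speedup = new_machine_pflops / titan_pflops

print(f"{new_machine_flops:.2e} operations per second")
print(f"about {speedup:.1f} times Titan")
```

The ratio is a little over five, which is consistent with the "at least five times as fast" description, assuming the 150-petaflop peak is actually reached.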

The new supercomputers, which together will cost $325 million, should enable new types of science for thousands of researchers who model everything from climate change to materials science to nuclear-weapons performance.

“There is a real importance of having the larger systems, and not just to do the same problems over and over again in greater detail,” says Julia White, manager of a grant program that awards supercomputing time at the ORNL and Argonne National Laboratory in Illinois. “You can actually take science to the next level.” For instance, climate modellers could use the faster machines to link together ocean and atmospheric-circulation patterns in a regional simulation to get a much more accurate picture of how hurricanes form.

A learning experience
Building the most powerful supercomputers is a never-ending race. Almost as soon as one machine is purchased and installed, lab managers begin soliciting bids for the next one. Vendors such as IBM and Cray use these competitions to develop the next generation of processor chips and architectures, which shapes the field of computing more generally.

In the past, the US national labs pursued separate paths to these acquisitions. Hoping to streamline the process and save money, clusters of labs have now joined together to put out a shared call — even those that perform classified research, such as Livermore. “Our missions differ, but we share a lot of commonalities,” says Arthur Bland, who heads the ORNL computing facility.

In June, after the first such coordinated bid, Cray agreed to supply one machine to a consortium from the Los Alamos and Sandia national labs in New Mexico, and another to the National Energy Research Scientific Computing (NERSC) Center at the Lawrence Berkeley National Laboratory in Berkeley, California. Similarly, the ORNL and Livermore have banded together with Argonne.

The joint bids have been a learning experience, says Thuc Hoang, programme manager for high-performance supercomputing research and operations with the National Nuclear Security Administration in Washington DC, which manages Los Alamos, Sandia and Livermore. “We thought it was worth a try,” she says. “It requires a lot of meetings about which requirements are coming from which labs and where we can make compromises.”

At the moment, the world’s most powerful supercomputer is the 55-petaflop Tianhe-2 machine at the National Super Computer Center in Guangzhou, China. Titan is second, at 27 petaflops. An updated ranking of the top 500 supercomputers will be announced on November 18 at the 2014 Supercomputing Conference in New Orleans, Louisiana.

When the new ORNL and Livermore supercomputers come online in 2018, they will almost certainly vault to near the top of the list, says Barbara Helland, facilities-division director of the advanced scientific computing research program at the Department of Energy (DOE) office of science in Washington DC.

But more important than rankings is whether scientists can get more performance out of the new machines, says Sudip Dosanjh, director of the NERSC. “They’re all being inundated with data,” he says. “People have a desperate need to analyse that.”

A better metric than pure calculating speed, Dosanjh says, is how much better computing codes perform on a new machine. That is why the latest machines were selected not on total speed but on how well they will meet specific computing benchmarks.

Dual paths
The new supercomputers, to be called Summit and Sierra, will be structurally similar to the existing Titan supercomputer. They will combine two types of processor chip: central processing units, or CPUs, which handle the bulk of everyday calculations, and graphics processing units, or GPUs, which were originally designed for rendering graphics and excel at highly parallel computations. Combining the two means that a supercomputer can direct the heavy work to GPUs and operate more efficiently overall. And because the ORNL and Livermore will have similar machines, computer managers should be able to share lessons learned and ways to improve performance, Helland says.

Still, the DOE wants to preserve a little variety. The third lab of the trio, Argonne, will be making its announcement in the coming months, Helland says, but it will use a different architecture from the combined CPU–GPU approach. It will almost certainly be like Argonne's current IBM machine, which uses a lot of small but identical processors networked together. The latter approach has been popular for biological simulations, Helland says, and so “we want to keep the two different paths open”.

Ultimately, the DOE is pushing towards supercomputers that could work at the exascale, or 1,000 times more powerful than the current petascale. Those are expected around 2023. But the more power the DOE labs acquire, the more scientists seem to want, says Katie Antypas, head of the services department at the NERSC.

“There are entire fields that didn’t used to have a computational component to them,” such as genomics and bioimaging, she says. “And now they are coming to us asking for help.”

This article is reproduced with permission and was first published on November 14, 2014.
