
TITANIC: When fully brought up to speed Titan will be capable of more than 20 quadrillion calculations per second, or 20 petaflops. One of Titan's roles will be to help Oak Ridge researchers visualize reactor core simulations.
Image: Courtesy of Oak Ridge National Laboratory
In 2005 engineers at the U.S. Department of Energy's (DoE) Oak Ridge National Laboratory unveiled Jaguar, a system that would later be upgraded into a world-beating supercomputer. By 2011 it had grown to a room-size system that drew seven megawatts of power, ran nearly 225,000 processor cores and had a peak performance of 2.3 petaflops, or 2.3 quadrillion calculations per second. Topping Jaguar would not be easy, but it was necessary to deliver ever more complex modeling of sophisticated energy challenges.
Simply adding more CPUs, or central processing units, to scale Jaguar to 20 petaflops would require enough energy to power 60,000 homes. To best their own record, the Oak Ridge engineers instead turned to video games—or more precisely, to the graphics processors used in Microsoft Xboxes, Nintendo Wiis and other video game systems.
As of Monday Jaguar becomes Titan, a supercomputer that pairs CPUs with GPU (graphics processing unit) accelerators to deliver 10 times the performance of Jaguar with more than five times the power efficiency.
The key to Titan's speed and efficiency is a design that uses more than 18,500 NVIDIA GPUs, along with nearly 300,000 CPU cores, which typically form the foundation of high-performance computers. The GPUs account for about 90 percent of the system's computational performance and enable Titan to remain roughly the same size as Jaguar.
When fully brought up to speed, Titan (a Cray XK7 system) promises to be the world's most powerful open-science supercomputer, even more powerful than the DoE's Sequoia, a 16.3-petaflop IBM Blue Gene/Q system crowned the world's fastest supercomputer in June. Sequoia differs from Jaguar/Titan, which placed sixth on the list, in that Sequoia is used exclusively by the DoE's National Nuclear Security Administration to monitor U.S. nukes. Titan will be used by a variety of researchers for a variety of projects.
Titan will initially support a handful of key projects at Oak Ridge, including Denovo, simulation software that models the behavior of neutrons in a nuclear power reactor. Oak Ridge's engineers designed Denovo for Jaguar as a way to help extend the lives of the U.S.'s aging nuclear power plants, which provide about a fifth of the country's electricity. Running Denovo, Titan will take 13 hours to model the complete state of a reactor core at a specified point in time, a job that took Jaguar 60 hours to perform.
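The Denovo figures above imply a speedup that is easy to check by hand. The short Python sketch below works it out from the two run times quoted in the article (60 hours on Jaguar, 13 hours on Titan). The function names and the one-week, whole-machine budget are illustrative assumptions, not anything Oak Ridge has published.

```python
# Run times quoted in the article for modeling one reactor-core state point.
JAGUAR_HOURS = 60.0
TITAN_HOURS = 13.0


def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run is than the old one."""
    if old_hours <= 0 or new_hours <= 0:
        raise ValueError("run times must be positive")
    return old_hours / new_hours


def state_points_per_week(hours_per_point: float,
                          hours_available: float = 7 * 24) -> int:
    """Count whole state points that fit in the time budget.

    Assumption for illustration only: the job has the entire machine
    for the full budget, which the article says is not the case.
    """
    return int(hours_available // hours_per_point)


if __name__ == "__main__":
    print(f"Denovo speedup: {speedup(JAGUAR_HOURS, TITAN_HOURS):.1f}x")
    print("State points per week on Jaguar:", state_points_per_week(JAGUAR_HOURS))
    print("State points per week on Titan:", state_points_per_week(TITAN_HOURS))
```

Under these assumptions the speedup is roughly 4.6 times, and a hypothetical uninterrupted week would cover about a dozen state points on Titan versus two on Jaguar.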
"The ability to burn nuclear fuel uniformly is very much dependent on knowing and being able to predict the distribution of neutrons in the core," says Tom Evans, a computational scientist at Oak Ridge's Consortium for the Advanced Simulation of Light Water Reactors (CASL), which created Denovo. Titan will enable much more precise simulations.
Titan calculations will also be used to provide nanoscale analysis of materials used to build electric motors and generators as well as model the burning of a variety of fuels in internal combustion engines. Still another application will simulate long-term global climate. A sizable amount of Titan's capacity in the coming year will be devoted to the DoE's Innovative and Novel Computational Impact on Theory and Experiment program (INCITE), which invites academia, government researchers and industry to apply for access to the supercomputer for their various projects.
The performance comes at a price, however. Because Jaguar used only CPUs, its computer architecture was simpler, which in turn made it easier to write its software. "The algorithmic complexity to write that code for a machine like Titan is momentous," Evans says. "For us, first and foremost is getting the CPUs and GPUs to work together."
Titan may be cutting edge, but Evans already has a hankering for more computational power. Ideally, Evans and his team want to do complete, high-fidelity 3-D simulations over a full reactor depletion cycle, which requires calculations at many reaction state points—not just a single point in time. Despite its horsepower, even Titan may not be able to achieve this. "We want to push the envelope, but the reality is that Titan's not going to get us there yet," Evans says. The computing resources required to do this are significant, and Evans points out that he and his team don't have Titan all to themselves. It looks like the engineers had better get cracking on Titan's successor.




14 Comments
<obligatory joke> But will it run Crysis? </Obligatory Joke>
No mention of the cost of constructing or operating this behemoth. I certainly hope our tax dollars are being spent on necessary requirements - why do Oak Ridge researchers need to (better) visualize reactor core simulations - to what benefit? Why isn't this something that vendors of nuclear reactors should be providing? These are the questions I'd have liked this article to have answered!
The government wastes our money shamelessly. I am much more pleased to see it spent on a new supercomputer than to see it given to our enemies or merely used to buy votes. Who knows? There may be some useful work done with that new computer. Aren't the DOE supercomputers used to model our nuclear warheads so physical testing is not necessary anymore? So the computer may fall into the same category as my guns and back-up generator. Hoping to never need any of them but wanting them to be good quality and in good working shape if I do.
Regardless of what it is intended to be used for, there seem to be untried software techniques required to use the GPUs in conjunction with CPUs. Not all announcements of enormous computer system development projects result in successful systems that actually achieve their intended objectives, although for obvious reasons most report success...
What is the real benefit of being able to perform a simulation of "the complete state of a reactor core at a specified point in time" in 13 hours instead of 60 hours, when what they really want to do is simulate the "complete, high-fidelity 3-D simulations over a full reactor depletion cycle, which requires calculations at many reaction state points—not just a single point in time."
It sounds like these systems may not be capable of achieving the results actually required of them...
Although I think the government could spend money on better things, I find this to be a worthy investment in our advancement in technology. More precise calculations of atoms with this supercomputer could give us more insight into our universe.
To jtdwyer: what incentive would a nuclear reactor vendor have to build such a computing system? Their business is building reactors.
This project provides knowledge of:
building large scale parallel computational machines (computer science)
large scale modelling using GPUs (physical sciences)
Then once working, runtime will provide for research on
weather modelling
chemical modelling
aerodynamics modelling
disease modelling (cellular)
disease transmission modelling
and more.
The INTERNET, which allowed you to find out about this so easily, came out of similarly funded government research. Any research funding is bound to occasionally fund some real losers, but this one seems like a big win in many ways.
Well, given that it has 18,500 Nvidia GPUs, in theory it would run Crysis quite well. In practice, Crysis runs on Windows and DirectX, which I don't think would work on a supercomputer cluster like this.
Distributed computing still beats supercomputers like Titan for some tasks. For example, the bitcoin network runs 15 times faster (300 petaflops) verifying the transaction history. I would expect that some of the processes that Google runs also amount to more processing capacity than Titan. Google does not publish many details about its data centers, but we are talking about an estimated million servers across many data centers.
Your question is exactly the one I ask myself about the DoE: what incentive would the DoE have to build such a computing system? Their business is operating reactors.
As I understand, the applications that will run on this system pertain to the operation of nuclear reactors built by vendors of nuclear reactors. I seriously doubt that DoE will magnanimously run any of the other applications you mention.
The internet was developed by the Dept. of Defense Advanced Research Projects Agency as a messaging system for universities contracted by the DoD to perform research (incl. weapons development). Eventually it was adopted by universities for other, unauthorized projects - as simply done as communicating with professors at other universities about the weather rather than weapons.
I expect it will be more difficult to schedule non-DoE applications to run on this computer system without making special (compensated) arrangements.
What does the statement "the bitcoin network runs 15 times faster (300 petaflops) verifying the transaction history" mean? Networks do not run petaflops - I don't know much about bitcoin, but I seriously doubt that their transactions execute floating point instructions (floating point operations per second - FLOPS) at all. Most of the computational instructions executed by distributed PCs and servers are fixed point arithmetic instructions...
I expect that the processing capacity employed by Google is to oranges what the capacity provided by supercomputers, which execute single array computation functions partitioned among many individual processors, is to apples. Data servers generally perform many transaction requests by balancing the load across many (often replicated database) servers.
I don’t know if you consider this relevant, but Titan was the name of the computer in the University of Cambridge, UK, Computer Laboratory in the ‘70s.
Perhaps you missed this line: "A sizable amount of Titan's capacity in the coming year will be devoted to the DoE's Innovative and Novel Computational Impact on Theory and Experiment program (INCITE), which invites academia, government researchers and industry to apply for access to the supercomputer for their various projects."
I certainly did miss that - thanks very much for pointing it out to me!
I'm still not sure this is a valid undertaking for the DoE, using our hard earned tax 'revenue', I suppose it depends on what other useful research they can support. On the other hand, it does appear that they've made this (I presume enormous) investment without sufficient requirements justification. Maybe I should apply for some system time - I could open up a timesharing business!
Ridiculous.