A deluge of high-energy physics data is headed toward servers in Geneva, Switzerland, later this month. That's because the European Organization for Nuclear Research (CERN) now says it plans to restart its Large Hadron Collider (LHC) soon for a run that could last as long as two years at a collision energy of seven TeV (tera–electron volts, 3.5 TeV per beam). As CERN ramps up the world's most powerful particle accelerator to operate well beyond its previous best performance, the lab's computer systems must likewise be tuned so they can properly capture and analyze all of this new output.
Rather than adding a platoon of new computers, and possibly overextending the information technology infrastructure's power and cooling capacity, CERN is testing a virtualized server environment that it hopes to have in place by the end of the year. Server virtualization, which involves using software to segment a server machine's processing and storage capacity, has become a popular technique in recent years to make better use of underutilized machines and drive up efficiency in data centers.
CERN plans to divide its 4,000 servers (which run about 32,500 processors) into about 35,000 virtual servers by the end of the year and manage the subsequent workflow with the help of software from Platform Computing Corp., in Markham, Ontario. Over the next two or three years the lab could further slice up its servers into as many as 80,000 virtual servers.
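To put those figures in proportion, the short calculation below works out the implied ratios. It is only a back-of-the-envelope sketch using the numbers quoted above. The constant and function names are illustrative, and it assumes virtual servers are spread evenly across the physical machines, which the article does not state.

```python
# Figures quoted in the article.
PHYSICAL_SERVERS = 4_000
PROCESSORS = 32_500
VIRTUAL_SERVERS_THIS_YEAR = 35_000
VIRTUAL_SERVERS_LATER = 80_000


def per_unit(total: int, units: int) -> float:
    """Average number of `total` items per unit, assuming an even spread."""
    return total / units


# Assumption: an even distribution across machines (not stated in the article).
processors_per_server = per_unit(PROCESSORS, PHYSICAL_SERVERS)
vms_per_server_this_year = per_unit(VIRTUAL_SERVERS_THIS_YEAR, PHYSICAL_SERVERS)
vms_per_processor_this_year = per_unit(VIRTUAL_SERVERS_THIS_YEAR, PROCESSORS)
vms_per_server_later = per_unit(VIRTUAL_SERVERS_LATER, PHYSICAL_SERVERS)

print(f"Processors per physical server:   {processors_per_server:.2f}")
print(f"Virtual servers per machine now:  {vms_per_server_this_year:.2f}")
print(f"Virtual servers per processor:    {vms_per_processor_this_year:.2f}")
print(f"Virtual servers per machine later: {vms_per_server_later:.2f}")
```

On these numbers, the first target works out to roughly one virtual server per processor, while the 80,000 figure would mean about 20 virtual servers per physical machine.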
Simply adding more server machines to CERN's data centers was not an option. "We are limited in the amount of power and cooling available," says Tony Cass, group leader of the Fabric Infrastructure and Operations group at CERN. "We want to wring every last drop out of the resources we have to do physics. Even 10 percent more capacity means that much more toward improving the physics."
In addition to the particle accelerator itself, the LHC has several particle detectors on site, including ATLAS and CMS, that capture data produced during the collisions. "To produce physics results, scientists at CERN first have to turn the 1s and 0s produced by the detectors into meaningful pictures showing the tracks of the different quantum particles produced in the collisions," Cass says. "Then they need to analyze these images to understand what they mean."
Detector data capture and analysis require tremendous computing power, but not always in equal amounts. Sometimes more crunching is needed to create the images; at other times it is needed for the analysis. To allocate and reallocate processor resources, Cass and his team have to determine how many servers would be required to perform a certain task (analysis, for example). If they find that more computing resources are needed elsewhere, they have to stop the batch-processing work being done on the servers, reconfigure them, and then start them again. Virtual servers, however, can be allocated and reallocated dynamically using Platform Computing's software, without the need to interrupt processing work already in progress.
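The idea of shifting capacity between tasks without interrupting running work can be illustrated with a toy model. The sketch below is not Platform Computing's software or its API. The `VirtualServer` class, the role names, and the `reallocate` function are all hypothetical, and it assumes that only idle virtual servers are reassigned.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    name: str
    role: str           # hypothetical roles: "reconstruction" or "analysis"
    busy: bool = False  # True while a batch job is running on this server


def reallocate(pool, from_role, to_role, count):
    """Move up to `count` idle virtual servers from one role to another.

    Busy servers are skipped, so jobs already in progress keep running.
    Returns the number of servers actually moved.
    """
    moved = 0
    for vm in pool:
        if moved == count:
            break
        if vm.role == from_role and not vm.busy:
            vm.role = to_role
            moved += 1
    return moved


def count_by_role(pool):
    """Tally how many virtual servers are assigned to each role."""
    counts = {}
    for vm in pool:
        counts[vm.role] = counts.get(vm.role, 0) + 1
    return counts


# Six servers building images (two of them mid-job) and two doing analysis.
pool = [VirtualServer(f"vm{i}", "reconstruction", busy=(i < 2)) for i in range(6)]
pool += [VirtualServer(f"vm{i}", "analysis") for i in range(6, 8)]

# Analysis needs five more servers, but only idle ones may be moved.
moved = reallocate(pool, "reconstruction", "analysis", 5)
print(moved, count_by_role(pool))
```

In this example the request for five servers is only partly met: the two servers that are in the middle of a job stay where they are, which is the property the article describes. A real scheduler would also have to handle queued jobs, priorities, and the migration of running work.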
The LHC, which first went online in September 2008, is designed to accelerate bunches of protons to the highest energies ever generated by a machine, colliding them head-on 30 million times a second, with each collision generating thousands of particles at nearly light-speed. If successful, the LHC could help physicists answer questions about the subatomic composition of matter and energy in the universe.
Unfortunately, the LHC's first run lasted little more than a week before it had to be halted due to a problem with two superconducting magnets. The device was brought online again briefly in November to conduct a few experiments but has been down since December while CERN upgrades the equipment. More than 10,000 researchers in 85 countries plan to use the world's most powerful particle accelerator to test different predictions concerning high-energy physics.
When the LHC comes back online in a few weeks, it is expected to run continuously through mid- to late 2011, the longest phase of accelerator operation in CERN's history, which dates back to 1954. "The computers dedicated to LHC, which run around 120,000 computing jobs per day, will need to run at maximum efficiency to ensure the flood of data from the detectors can be turned quickly into physics results," Cass says. A "job" in this context is a request to process a certain amount of data: turning the information produced by the detector in a given time period into the images, say, or scanning over some large number of these images for analysis.
"The pressure on the computing teams will increase once real data is here and physicists are competing to produce papers for journals and conferences, especially if there is any hint of a discovery," Cass says. "It's a critical moment for the LHC detectors."