In the 1990s, even as he installed the landmark camera that would capture the first convincing evidence of dark energy, Tony Tyson, an experimental cosmologist now at the University of California, Davis, knew it could be better. The camera’s power lay in its ability to collect more data than any other. But digital image sensors and computer processors were progressing so rapidly that the amount of data they could collect and store would soon be limited only by the size of the telescopes delivering light to them, and those were growing too. Confident that engineering trends would hold, Tyson envisioned a telescope project on a truly grand scale, one that could survey hundreds of attributes of billions of cosmological objects as they changed over time.
It would record, Tyson said, “a digital, color movie of the universe.”
Tyson’s vision has come to life as the Large Synoptic Survey Telescope (LSST) project, a joint endeavor of more than 40 research institutions and national laboratories that has been ranked by the National Academy of Sciences as its top priority for the next ground-based astronomical facility. Set on a Chilean mountaintop, and slated for completion by the early 2020s, the 8.4-meter LSST will be equipped with a 3.2-billion-pixel digital camera that will scan 20 billion cosmological objects 800 times apiece over the course of a decade. That will generate well over 100 petabytes of data that anyone in the United States or Chile will be able to peruse at will. Displaying just one of the LSST’s full-sky images would require 1,500 high-definition TV screens.
The LSST epitomizes the new era of big data in physics and astronomy. Less than 20 years ago, Tyson’s cutting-edge digital camera filled 5 gigabytes of disk space per night with revelatory information about the cosmos. When the LSST begins its work, it will collect that amount every few seconds — literally more data than scientists know what to do with.
“The data volumes we [will get] out of LSST are so large that the limitation on our ability to do science isn’t the ability to collect the data, it’s the ability to understand the systematic uncertainties in the data,” said Andrew Connolly, an astronomer at the University of Washington.
Typical of today’s costly scientific endeavors, hundreds of scientists from different fields are involved in designing and developing the LSST, with Tyson as chief scientist. “It’s sort of like a federation,” said Kirk Borne, an astrophysicist and data scientist at George Mason University. The group comprises nearly 700 astronomers, cosmologists, physicists, engineers and data scientists.
Much of the scientists’ time and about one-half of the $1 billion cost of the project are being spent on developing software rather than hardware, reflecting the exponential growth of data since the astronomy projects of the 1990s. For the telescope to be useful, the scientists must answer a single question. As Borne put it: “How do you turn petabytes of data into scientific knowledge?”
Physics has been grappling with huge databases longer than any other field of science because of its reliance on high-energy machines and enormous telescopes to probe beyond the known laws of nature. That history has given researchers a steady succession of models upon which to structure and organize each successive big project, along with a starter kit of computational tools that must be modified for use with ever larger and more complex data sets.