In many ways the Internet is the ultimate virtual laboratory. Social media and news sites tell the casual observer much about our priorities and interests, whether it's the grave prognosis of the U.S.'s ongoing "fiscal cliff" political negotiations or elation over England's royal pregnancy. Social scientists believe that, beyond such superficial revelations, the Internet can also be a tool for conducting expansive, yet inexpensive research experiments at unprecedented speed.

Duncan Watts has been studying the Internet’s impact on social behavior, and vice versa, for more than a decade. In 2001 Watts and fellow Columbia University sociologists published the results of their Small World Project, an e-mail version of sociologist Stanley Milgram’s famous 1967 "six degrees of separation" experiment that used snail mail to test the theory that every person on the planet is separated from everyone else by a chain of about six people. In 2006 Watts worked with a team of researchers on Music Lab, an online experiment that illustrated the difficulty of predicting a song's popularity among a diverse group of listeners.

Now a principal researcher at Microsoft Research's New York City offices, Watts is focusing on improving Internet-based research methods and finding new ways to more effectively leverage the Web as a tool for crowd-sourced knowledge. Superstorm Sandy, which hammered New York City and the U.S. Northeast little more than a month ago, is fresh in his mind, as are the ways in which the Web successfully and not so successfully coordinated emergency response and disaster relief efforts.

The Web could have a profound impact on social science because it offers unprecedented access to people willing to participate in experiments, Watts says. One such experiment, he adds, could be testing the value and accuracy of crowd-sourced information during emergencies as well as instructing Web users on how best to coordinate their resources when disasters strike.

Scientific American recently spoke with Watts about the Web's ability to revolutionize social science, why paying online test participants more money doesn't guarantee more accurate data, and how to make the most of crowd-sourcing to assess what's happening on the ground during a crisis.

[An edited transcript of the interview follows.]

Why are you so interested in the Web as a tool for conducting social science experiments?
The Web offers new opportunities for social science because it dramatically changes the cost structure for running experiments, the scale and speed at which those experiments can be run, and the diversity of the people you can include in your subject pool.

What did Small World and Music Lab teach you about conducting Web-based social science research?
Small World and Music Lab were successful, but in some ways they highlighted the difficulty of doing experiments online. One advantage was the ability to recruit tens of thousands of people to participate. It would be prohibitively expensive to pay that many participants, so, in effect, we had to "gamify" our experiments to make them appealing. This approach led to tradeoffs. On the one hand, by making the research fun and engaging for participants, we ran some very large experiments at very low cost. On the other hand, the most interesting research questions don’t necessarily lend themselves to fun, engaging games, and conversely most fun games are too complicated to lend themselves to the kind of clean hypotheses that come from theory. Running experiments online also raises certain methodological problems to do with sampling and measurement. I really think we're in the middle of all that with respect to virtual lab-style experiments.

How has Amazon’s Mechanical Turk digital labor marketplace—introduced in 2005—impacted online research?
The great thing about Mechanical Turk is, most of the tasks are incredibly boring, so we don’t have to worry so much about making our experiments fun, or like games, because maybe we could pay participants after all, even if we only have to pay them a little bit.

A few years ago, working with my [former] Yahoo colleague Winter Mason, we demonstrated how to use Mechanical Turk to conduct behavioral research and to make it easier for researchers to benefit from Amazon's platform. We looked at the effect of financial incentives on participant performance. If you pay people more money to do a particular job, how does it affect their performance? [The task in question asked participants to sort a set of images taken from a traffic camera at two-second intervals into chronological order.] We found that increasing payment will increase the amount of work they'll do, but it does not improve the quality of their performance at all. Mostly what it does, in fact, is increase how much they think they should be paid!

It sounds as though the Web is tailor-made to be used as a research environment. Where is it lacking?
We keep finding that the biggest challenge in running experiments online isn't the database or the user interface or the algorithms for designing networks. All of that is pretty straightforward. The hard part is actually recruiting people in a reliable way. A lot of our thinking moving forward is how to build a better infrastructure for recruiting and keeping track of people in a way that is transparent and subject to the usual principles of informed consent.

That sounds like a challenge you would have with more traditional social science experiments as well. Doesn't the Web make it easier to recruit participants?
For the last few experiments, we've recruited 100 people and used maybe 20 or 30 of them at a time for different studies. But six months or a year later, when we run the next experiment, all of the people from the past experiment have moved on because there's a lot of churn in Mechanical Turk land.
And actually that's typical in the history of experimental psychology and behavioral economics—you recruit a bunch of people for your experiment, they come in and participate and then they leave and you never hear from them again.

How can you use Mechanical Turk and other Web tools to improve that situation?
We'd like to build up much more persistent subject pools, or what we call panels, of online research subjects. One idea would be to pay them retainers so they're available when we need them for experiments. Then you could send them a request to participate at any point in time, although they're not obligated to. The advantage of doing it that way is—rather than grabbing whoever happens to be around, which is what we do currently—we could specify in advance that we’re interested in a particular type of person. It seems like an obvious thing to do but it requires building a bit of an infrastructure so that you can ask these sorts of questions. Once you’ve created these online panels, and they’ve participated in a number of experiments, you could create customized samples based not just on their demographic information but also on how they’ve behaved in past experiments.
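
[Editor's illustration: the panel idea described above can be sketched in a few lines of Python. Everything here is hypothetical, including the `Panelist` fields, the `select_sample` helper and the thresholds; it is not the researchers' infrastructure, only one way such a query might look.]

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Panelist:
    """One member of a persistent subject pool (all fields are assumed)."""
    panelist_id: str
    age: int
    invited: int    # past experiments this person was invited to
    completed: int  # past experiments this person actually finished


def select_sample(panel: list[Panelist], min_age: int, max_age: int,
                  min_completion_rate: float) -> list[Panelist]:
    """Pick panelists by a demographic trait and by past behavior."""
    return [
        p for p in panel
        if min_age <= p.age <= max_age
        and p.invited > 0  # no history means no behavioral evidence
        and p.completed / p.invited >= min_completion_rate
    ]
```

[A call such as `select_sample(panel, 18, 45, 0.8)` would keep only panelists aged 18 to 45 who finished at least 80 percent of the experiments they were invited to.]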

In addition to improving research methods, how might the Web be used to deliver timely, meaningful research results?
I'm interested in collective problem solving, that is, how groups of people, or even groups of organizations, solve complex problems. One example that I’m particularly interested in is the response to crisis situations, such as natural disasters. This is timely, of course, because in the aftermath of Sandy in New York you have a whole suite of organizations (local agencies like the New York Police Department and the fire department as well as national agencies like FEMA and the American Red Cross) swarming in to try to help. Immediately after a disaster you have this massive problem of uncertainty about what's happening on the ground. First responders are generally highly motivated and well meaning, and may even have a lot of experience with previous disasters, but crisis situations also have a way of differing from the past in unanticipated ways, so invariably you have a situation where nobody knows exactly who needs what, where the relevant resources are located, or how to coordinate the relief effort.

Recently, a handful of volunteer “crisis mapping” organizations such as The Standby Task Force [SBTF] have begun to make a difference in crisis situations by performing real-time monitoring of information sources such as Facebook, Twitter and other social media, news reports and so on and then superposing these reports on a map interface, which then can be used by relief agencies and affected populations alike to improve their understanding of the situation. Their efforts are truly inspiring, and they have learned a lot from experience. We want to build off that real-world model through Web-based crisis-response drills that test the best ways to communicate and coordinate resources during and after a disaster.

How might you improve upon existing crisis-mapping efforts?
The efforts of these crisis mappers are truly inspiring, and groups like the SBTF have learned a lot about how to operate more effectively, mostly from hard-won experience. At the same time, they’ve encountered some limitations to their model, which depends critically on a relatively small number of dedicated individuals, who can easily get overwhelmed or burned out. We’d like to help them by trying to understand in a more scientific manner how to scale up information processing organizations like the SBTF without overloading any part of the system.

How would you do this in the kind of virtual lab environment you’ve been describing?
The basic idea is to put groups of subjects into simulated crisis-mapping drills, systematically vary different ways of organizing them, and measure how quickly and accurately they collectively process the corresponding information. So for any given drill, the organizer would create a particular disaster scenario, including downed power lines, fallen trees, fires and flooded streets and homes. The simulation would then generate a flow of information, like a live tweet stream that resembles the kind of on-the-ground reporting that occurs in real events, but in a controllable way.

As a participant in this drill, imagine you’re monitoring a Twitter feed, or some other stream of reports, and that your job is to try to accurately recreate the organizer’s disaster map based on what you’re reading. So for example, you’re looking at Twitter feeds for everything during Hurricane Sandy that has “#sandy” associated with it. From that information, you want to build a map of New York and the tri-state region that shows everywhere there’s been lost power, everywhere there’s a downed tree, everywhere there’s a fire.

You could of course try to do this on your own, but as the rate of information flow increased, any one person would get overwhelmed; so it would be necessary to have a group of people working on it together. But depending on how the group is organized, you could imagine that they’d do a better or worse job, collectively. The goal of the experiment then would be to measure the performance of different types of organizations—say with different divisions of labor or different hierarchies of management—and discover which work better as a function of the complexity of the scenario you’ve presented and the rate of information being generated. This is something that we’re trying to build right now.
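
[Editor's illustration: one simple way to measure how accurately a group recreates the organizer's map is to compare the two sets of incidents. The sketch below is an assumption, not the researchers' design: the `Incident` type, the grid-cell locations and the choice of precision and recall as accuracy measures are all hypothetical.]

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A single map entry (hypothetical representation)."""
    kind: str              # e.g. "power_outage", "downed_tree", "fire"
    cell: tuple[int, int]  # grid cell on the map


def score_map(truth: set[Incident], reported: set[Incident]) -> dict[str, float]:
    """Compare a group's map with the organizer's ground-truth map."""
    correct = truth & reported
    # Precision: share of reported incidents that are really on the map.
    precision = len(correct) / len(reported) if reported else 0.0
    # Recall: share of real incidents that the group managed to find.
    recall = len(correct) / len(truth) if truth else 0.0
    return {"precision": precision, "recall": recall}


truth = {
    Incident("power_outage", (0, 0)),
    Incident("downed_tree", (1, 2)),
    Incident("fire", (3, 1)),
    Incident("flood", (2, 2)),
}
reported = {
    Incident("power_outage", (0, 0)),
    Incident("fire", (3, 1)),
    Incident("fire", (4, 4)),  # a false report
}
scores = score_map(truth, reported)  # precision 2/3, recall 0.5
```

[Running the same scoring for groups organized in different ways, at different rates of incoming reports, would give the kind of comparison described above; speed would have to be measured separately.]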

What's the time frame for implementing such crowd-sourced disaster mapping drills?
We’re months away from doing something like this. We still need to set up the logistics and are talking to a colleague who works as a crisis mapper to get a better understanding of how they do things so that we can design the experiment in a way that is motivated by a real problem.

How will you know when your experiments have created something valuable for better managing disaster responses?
There’s no theory that says, here’s the best way to organize n people to process the maximum amount of information reliably. So ideally we would like to design an experiment that is close enough to realistic crisis-mapping scenarios that it could yield some actionable insights. But the experiment would also need to be sufficiently simple and abstract so that we learn something about how groups of people process information that generalizes beyond the very specific case of crisis mapping.

As a scientist, I want to identify causal mechanisms in a nice, clean way and reduce the problem to its essence. But as someone who cares about making a difference in the real world, I would also like to be able to go back to my friend who’s a crisis mapper and say we did the experiment, and here’s what the science says you should do to be more effective.