Web sites such as Amazon, TripAdvisor and Yelp have long depended on customers to rate books, hotels and restaurants. The philosophy behind this so-called crowdsourcing strategy holds that the truest and most accurate evaluations will come from aggregating the opinions of a large and diverse group of people. Yet a closer look reveals that the wisdom of crowds may be neither wise nor necessarily the work of a crowd. Its judgments are inaccurate at best, fraudulent at worst.
According to Eric K. Clemons, a professor of operations and systems management at the Wharton School of the University of Pennsylvania, online ranking systems suffer from a number of inherent biases. The first is deceptively obvious: people who rate purchases have already made the purchase. Therefore, they are disposed to like the product. “I happen to love Larry Niven novels,” Clemons says. “So whenever Larry Niven has a novel out, I buy it. Other fans do, too, and so the initial reviews are very high—five stars.” The high ratings draw people who would never have considered a science-fiction novel. And if they hate it, their spite could lead to an overcorrection, with a spate of one-star ratings.
Such negativity exposes another, more pernicious bias: people tend not to review things they find merely satisfactory. They evangelize what they love and trash things they hate. These feelings lead to a lot of one- and five-star reviews of the same product.
A controlled offline survey of some of these supposedly polarizing products revealed that individuals’ true opinions fit a bell-shaped curve—ratings cluster around three or four, with fewer scores of two and almost no ones and fives. Self-selected online voting creates an artificial judgment gap; as in modern politics, only the loudest voices at the furthest ends of the spectrum seem to get heard.
This self-selection process manifests itself in other ways. In a 2009 study of more than 20,000 items on Amazon, Vassilis Kostakos, a computer scientist at the University of Madeira in Portugal, found that a small percentage of users accounted for a huge majority of the reviews. These super-reviewers—often celebrated with “Top Reviewer” badges and ranked against one another to encourage their participation—each contribute thousands of reviews, ultimately drowning out the voices of more typical users (95 percent of Amazon reviewers have rated fewer than eight products). “There is nothing to say that these people are good at what they do,” Kostakos says. “They just do a lot of it.” What appears to be a wise crowd is just an oligarchy of the enthusiastic.
The existence of super-reviewers has one unassailable advantage, though: they are rarely shills. The deliberate manipulation of review sites by people directly involved with a product—the author of the book, say—is one of the oldest and most difficult problems for online-rating communities to solve.
Some sites attempt to remove suspect posts using automated filters that search for extremely positive or negative language, especially when the review comes from someone with a short résumé. But because these filters operate out of public view, they can breed mistrust, or worse.
Consider the case of the local-business review site Yelp, which filters out suspect reviews. Its CEO and co-founder Jeremy Stoppelman defends the practice by pointing to classified advertisements placed by business owners offering payment for positive reviews. Yet some businesses suspect more sinister forces at work. Earlier this year a coalition of local business owners sued Yelp, accusing the company of running what amounted to a digital extortion racket. The lawsuit claims that sales representatives from Yelp would call businesses and make a simple offer: advertise with us, and we’ll make negative reviews disappear.
The company vigorously denies the allegations and claims that any removal of reviews is automated and coincidental. Still, Yelp has refused to divulge how its filters operate, lest unscrupulous users employ that information to game the system. This lack of transparency has led to the perception that the company itself might be tilting the playing field.
The system is not beyond repair, however. Clemons points to RateBeer.com, which has attracted some 3,000 members who have rated at least 100 beers each; all but the most obscure beers have been evaluated hundreds or thousands of times. The voluminous data set is virtually manipulation-proof, and the site’s passionate users tend to post on all beers they try—not just ones they love or hate.
Of course, reviewing 1,000 beers is easier (and cheaper) than rating the same number of restaurants or hotel rooms. Until other sites amass the same amount of quality data, an old truism could be consumers’ best advice: buyer beware.