June 20, 2013

Spam: A Shadow History of the Internet [Excerpt, Part 3]

Porous barriers to various forms of electronic intrusion are traversed by either menial human labor or automated botnets. Read about this carnival of oddities in the third excerpt of a chapter on the riveting history of spam.

If the Internet survives a nuclear conflagration, messages exchanged to check on the survival of friends and relatives will probably be interspersed with spam missives as soon as power returns to servers once the shock waves subside. The infinite variety and persistence of junk content makes it the equivalent of an electronic microbial population that reproduces at an exponential rate. Witness the mass production of content farms—purveying tips on the best way to wear sweater vests along with reviews of deodorant containers—a flood of inanely irrelevant human-penned word dumps that blur the borderline between spam and actual content. The 19th-century satanic mill quality of the content farms contrasts with the insensate machines that rid spamming of the human element. Botnets, it can be argued, are the ultimate spam—machines that take what they want (a compromised computer can be had for less than a nickel) rather than asking whether you want to buy undesired goods. Follow all this in our third installment of a chapter from Finn Brunton’s remarkable spam opus.

TABLE OF CONTENTS

Filtering: Scientists and Hackers [Excerpt Part 1] Part 1 of the Spam book excerpt series
Poisoning: The Reinvention of Spam [Excerpt Part 2] Part 2 of the Spam book excerpt series
The Quantified Audience Content farms represent a “back to basics” approach to spamming reminiscent of 19th-century sweat shops
The Botnets Meet the spambot ActiveAgent that crawled Web pages seeking out addresses to e-mail them preprogrammed text
The Marketplace Enter the flourishing global casbah for spam supplies and malware


“NEW TWIST IN AFFECT”: CONTENT FARMS AND SOCIAL SPAM

THE QUANTIFIED AUDIENCE

Google, as representative a company of our era as Ford was of the 1910s, is not in the business of search but the business of advertising—its ad services provide 97 percent of its revenue. These ads take the form of little squibs of text or images, often displayed in response to particular search keywords. If a site owner puts some of these ads on his or her web page, they can receive some amount of revenue, generally very small, on a per-impression basis (that is, every time a page with the ad is loaded in a browser) or a per-click basis (a viewer actually clicking on the ad to visit the advertiser’s page). Google gets a cut of this revenue as well, and all those served ads on blogs and web pages, sponsored links in search results, and ads accompanying the conversations in Google’s Gmail service accumulate into the company’s income, and this pays for nearly everything else. (From this fund also come the oceans of free content whose hosting is paid for out of an individual’s share of this money in return for running ads on their site.) So if ads are the business, and content merely the enticement—that is, the ornament on the engine—why not optimize for advertising?

Hence the splogs and spam sites containing post after post and page after page of text automatically gathered and generated to best fit Google’s search engine algorithms and filled to the last pixel with advertising, so that every page view and clickthrough is maximized as a source of revenue. The ads on a spam page may be entirely served through Google’s affiliate advertising program—in other words, they can be a significant source of revenue for Google. What this means is that search engine spammers running their vast stables of spam blogs and sites are not anomalous. They are making the greatest possible use of the technologies and economies available, constructing a system in which all the extraneous matter of people and conversation has been pruned away in favor of the automation of content production, search results, clicks, and ads served. (The “Enterprise” package of one of the many businesses in this field will mass produce up to 1,000 blogs for the subscriber, turning out 10,000 posts a day around the 150 keywords of the subscriber’s choice—a daily volume of text that quantitatively dwarfs that of entire literate cultures and historical epochs.) This system in turn puts Google in the contradictory position of having to analyze and expel many of its most dedicated customers: those who deliberately overexploit, and accidentally overexpose, the financial and attention economies and technologies that underlie the contemporary web.

The mass production systems for human-authored text known as “content farms” exacerbate this contradictory role. Demand Media, an exemplary case, commissions content from human writers (who are willing to meet very low standards at high speeds for very little money) on the basis of an algorithm that determines ad revenue over the lifetime of any given article. It then posts this content through dozens of domains such as eHow.com and Livestrong.com. Generating, at peak, thousands of articles a day, Demand Media can create a simulacrum of knowledge convincing enough to attract both search engine returns and the clicks of actual humans (despite producing a kind of nonsensical poetry of uselessness, the correlative of spam’s machine-mangled posthuman semantics, with articles such as “How to wear a sweater vest” and lengthy reviews of deodorant containers). As C. W. Anderson observes, content farms are engaged in the attraction and manipulation of a “quantified audience,” a strategy that marks a nebulous border space between more reputable and legitimate media production and spam as such. After all, these are very precisely targeted articles written by people for people; at what point do they cross over from the space of a merely frivolous or attention-grabbing article that a newspaper would run and into the domain of network misbehavior? When does algorithmic quantification part ways with the canny editor who knows that sex, serial killers, and how-to stories sell?

Throughout this history, spam has produced definitional problems. Easy as it might be to identify a canonical example—one of those laughably awful filter-beating projects described earlier, for instance, with Cialis links interpolated into a lexical pulp made from the Federalist Papers—it’s the edge cases that are problematic. Whether we’re talking about free speech on Usenet, the policy questions of legitimate marketing and commercial activity conducted over email, or the desirable but spam-ish messages that trip the filters and disappear, there is always friction not around the most egregious case (no one argues for Leo Kuvayev’s “\/1@gR/-\” messages) but at the blurry places where spam threatens to blend into acceptable use, and fighting one might have a deleterious effect on the other. The realm of “social spam” and the quantified audience is the blurriest of these—where fairly acceptable and established methods of getting attention and audience management may begin to shade into spam.

“The algorithm is fed inputs from three sources: Search terms (popular terms from more than 100 sources comprising 2 billion searches a day), the ad market (a snapshot of which keywords are sought after and how much they are fetching), and the competition (what’s online already and where a term ranks in search results).” This statement could be describing an extremely well-run empire of splogs, but it is journalist Daniel Roth’s description of Demand Media’s operation. The algorithm outputs what’s going to be profitable, the jobs are posted to a separate site to find labor, and then a person writes the entry. “It is a database of human needs,” Roth adds, but that isn’t exactly true. It is a constantly updated collection of those queries whose results can consistently make some money over time; “needs” is rather too grand. Anderson dissects the normative commitments this kind of “algorithmic journalism” makes—because it does indeed make commitments and reflect beliefs, which we should not dismiss too quickly, facile and self-serving as those beliefs may seem. They are where different constituencies stake out territory and make their arguments in the technological drama.
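Roth names the algorithm’s three inputs but not how they are combined. The sketch below is a hypothetical illustration only, not Demand Media’s formula: it assumes an article’s lifetime ad revenue rises with search volume and keyword price and falls with the number of pages already competing for the term, then ranks candidate topics accordingly. The field names, the 24-month lifetime, the 1 percent click rate, and the sample figures are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TopicSignal:
    term: str
    monthly_searches: int   # popularity of the search term (hypothetical input)
    price_per_click: float  # what advertisers pay for the keyword, in USD (hypothetical)
    competing_pages: int    # pages already ranking for the term (hypothetical)


def lifetime_revenue(sig: TopicSignal, months: int = 24, click_rate: float = 0.01) -> float:
    """Hypothetical estimate of an article's ad revenue over its lifetime.

    Assumes the new article captures an equal share of searches with
    every page already competing for the term.
    """
    capture_share = 1 / (1 + sig.competing_pages)
    clicks = sig.monthly_searches * months * capture_share * click_rate
    return clicks * sig.price_per_click


def commission_list(signals, min_revenue: float):
    """Topics worth assigning to a writer, most profitable first."""
    ranked = sorted(signals, key=lifetime_revenue, reverse=True)
    return [s.term for s in ranked if lifetime_revenue(s) >= min_revenue]


topics = [
    TopicSignal("how to wear a sweater vest", 10_000, 0.50, 4),
    TopicSignal("deodorant container review", 2_000, 0.20, 0),
    TopicSignal("history of the oryx", 100, 0.10, 9),
]
print(commission_list(topics, min_revenue=50))
# ['how to wear a sweater vest', 'deodorant container review']
```

Whatever the real weighting, the shape is the same: the output is a work queue ordered by expected money, and the writer enters only after the ranking is done.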

Anderson identifies five commitments, in which we can find a pronounced echo of the lineage of spam from which content farms and other types of algorithmic journalism partake. It’s built around “big data,” and it thus features the blurring we’ve seen before between human and machine input and judgment. It takes as cardinal the idea of “consumer choice”—after all, it’s “demand” media that can claim to be giving those query-typing people exactly what they want, to a very high degree of mathematical precision, and therefore without any pretense of paternalistically filtering or “curating” the information for their benefit. Finally, it is future-oriented, because it is predictive—rather than reporting the news of the immediate past, it can look to what’s trending and have content produced to that apogee, like a Wienerian cybernetic gun shooting to where the aircraft will be when the round arrives. Looking at these five beliefs embodied in the content farm project, we can see again the capture of relevance, in a more refined form than before. Rather than send a million emails in hope of a handful of responses, make a million articles that will be perceived as relevant enough by the search engines to get top billing for a handful of searches and by a person to click through and contribute ad revenue.

For a slightly different approach to this project, we can turn to AOL, where the historical ironies become almost rich. The company’s walled-garden approach brought huge numbers of new users to networked computing in the United States. Not coincidentally, it also brought the enormous waves of inexperienced users that kept breaking the mores of “netiquette” and provided a source of profitable chum for the early spammers. It is now reinventing itself as a frantically SEO-gaming content empire. A leaked internal memo concerning “The AOL Way” revealed a fascinating project to use a tightly coordinated human staff to generate an enormous amount of text against which to serve advertising. The number of articles produced is to jump from 33,000 a month to 55,000—five to ten articles produced per staffer per day—built around a system of metrics based on “Traffic Potential,” “Revenue/Profit,” “Turnaround Time,” and “Editorial Integrity,” with point-by-point questions such as “What CPM will this content earn?” (“CPM” means cost per mille, the price per thousand [M being the Roman numeral for a thousand] times that an ad is loaded—that is, the anticipated revenue to AOL.) “Is this story SEO-winning for in-demand terms?” appears in the checklist: how many vitally popular keywords can be used here to net the greatest number of searches? In this light, AOL’s purchase of the massive content producing and aggregating site the Huffington Post fits the model of algorithmic journalism Anderson describes. AOL is not buying a popular and potentially problematic cultural property, like Sony purchasing a movie studio or Condé Nast losing money on the New Yorker. It is buying a factory—an assembly line system with a proven track record and a well-managed, if grueling, schedule, that can produce or aggregate material appropriate to trending topics reliably enough to generate page views with the latest in top-ten lists and the travails of reality TV’s stable.
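The CPM question on AOL’s checklist reduces to simple arithmetic: because CPM prices a thousand ad loads, revenue is the number of impressions divided by 1,000, times the CPM. A minimal sketch, using a made-up $2.00 CPM and page-view count rather than any figure from the memo:

```python
import math


def cpm_revenue(impressions: int, cpm: float) -> float:
    """Revenue from an ad sold at `cpm` dollars per thousand impressions."""
    return impressions / 1000 * cpm


def impressions_needed(target_revenue: float, cpm: float) -> int:
    """Ad loads required to earn `target_revenue` at a given CPM, rounded up."""
    return math.ceil(target_revenue / cpm * 1000)


# Hypothetical figures: an article loaded 250,000 times at a $2.00 CPM.
print(cpm_revenue(250_000, 2.00))       # 500.0
print(impressions_needed(500.0, 2.00))  # 250000
```

Read in reverse, the second function shows the pressure behind a production quota: the lower the CPM a topic commands, the more page views each article must draw to pay for itself.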

Is this spam? Not precisely, though that term is often applied—but “spam” has never been precise. There is something in the disposable and opportunistic nature of the material produced, and the mingling of automated and human infrastructure used to produce it, that seems similar—a cynical project to monopolize the conversation and commandeer the space of relevant information. Linkbait is a related term, one that originated in the SEO community to describe the strategy of producing relevant, highly “linkable” content in hopes of drawing traffic, and therefore advertising revenue, from “link-savvy bloggers and web content creators” and the “hundreds of sheep-like content creators” who follow them (to quote one of the earliest appearances of the term in the fall of 2005). Having originated as a positive phrase for an exploitative strategy built on the kind of lightweight trend-based content long popular in the magazine industry, “linkbait” was soon adopted as a negative descriptor that covered the same content. From the perspective of readers looking for something of depth, it was the perfect term for the vast algal blooms of linked content with catchy titles, top-ten lists about trending topics, wild claims, and needlessly contrarian stances, all delivered with only a few hundred words per article. The term has now spread from its SEO roots to describe other cultural phenomena perceived to sacrifice argument and evidence in favor of attracting notice.

Consider yet another version of this idea, as applied to individual self-promotion: personality spamming, a term coined by writer Merlin Mann as a slightly bitter joke about the use of microblogging service Twitter. Personality spamming is the work of arrogating attention for oneself, using social media to build an audience—often a very carefully quantified audience of “followers” and “rebloggers”—rather than a network of friends, as was the initial, notional promise. It is a witty condemnation of the socially acceptable but aggressively eyeball-hungry work of those who would be, or act like, celebrities, “influencers,” or “thought leaders.” The top reason for unfriending on Facebook is “frequent, unimportant posts,” and many computer-based clients for Twitter have a “mute” feature so that you can ignore messages from some users without having to unfollow them and then refollow them later (which would let them know you had turned them off for a time)—indicators of personality spam as a feature of daily life. As Anderson suggests with algorithmic journalism, these practices reflect something genuinely new, and as yet not clearly theorized, distinct equally from Habermasian communal conversation-as-deliberation as from the blandly managerial product, shaped by layers of human talent for the broadest possible distribution, of Adorno and Horkheimer’s Kulturindustrie. There is a reformulation under way in which the question of acceptable modes of social expression and self-promotion is being weighed. Drawing on Alice Marwick’s research, we can see some of these new modes being forged by individuals who turn themselves into the epigones of major advertising and marketing firms: the brand is you, the goal is to accumulate relevance for certain terms or ideas so that you can become, in some nebulous sense, “influential.” (From that state will come the book contract, the speaking fees, and the TV deal, presumably.) Thus the method is to treat every platform, gathering, and interaction as a marketing opportunity to configure oneself and one’s activities to suit the search algorithms.

IN YOUR OWN WORDS: SPAMMING AND HUMAN-MACHINE COLLABORATIONS

The gradual predominance of the algorithm in the project of spamming appears in the filters and the spam created in response to them, in search engines and their manipulators, and, as will be shown, in the grand global project of the botnets. However, it is most eerily seen in those places where algorithmic initiatives and human labor intersect. Content farming is a great instance of this combination, but there are other examples, some still more intimate, where human and machine production are meshed to beat the automated security of antispam systems. Mechanical Turk, for instance, is a truly strange and contemporary thing: a marketplace for crowdsourcing small units of work that can be done by a person on a computer. Under the rubric of “artificial artificial intelligence,” it’s a venue in which a “requester” (in the Mechanical Turk terminology) with a task can break it up into fragments called human intelligence tasks (HITs), offer a price per task, and then see if any of the cloud of “providers”—workers looking to pick up some small quantity of micropayment labor, akin to the “content producers” waiting for new jobs from Demand Media—will take them up. Amazon’s system coordinates the workers, the task fragments, and the payments. (If you had a forty-five-minute interview in an mp3 file, you could break the audio up into two- or three-minute segments, upload them to Mechanical Turk, offer a dollar for each transcribed segment, go have lunch, and return to find much of the work done.) The service is estimated to have 100,000 workers in 100 countries, with the majority in India and the United States. It gets used for transcription work, as in our example, as well as in database projects, surveys, image tagging, and more recondite activities. 
It features a range of HITs for rewriting texts of various lengths, many of which appear to be for services that provide plagiarized or “pre-written” essays and papers to paying students—the rewriting of the texts (“in your own words”) makes them harder for a teacher to identify with a Google search.
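The transcription example shows how a requester decomposes one job into many HITs. The sketch below splits a recording into fixed-length segments and prices the batch; the function names are hypothetical, it works with time offsets rather than real audio, and it ignores any fee Amazon charges requesters.

```python
def split_into_hits(total_seconds: int, segment_seconds: int) -> list[tuple[int, int]]:
    """(start, end) offsets in seconds for each transcription task."""
    return [
        (start, min(start + segment_seconds, total_seconds))
        for start in range(0, total_seconds, segment_seconds)
    ]


def batch_cost(total_seconds: int, segment_seconds: int, reward_per_hit: float) -> float:
    """Total paid to workers, one fixed reward per segment (platform fees excluded)."""
    return len(split_into_hits(total_seconds, segment_seconds)) * reward_per_hit


interview = 45 * 60  # the forty-five-minute interview, in seconds
print(batch_cost(interview, 3 * 60, 1.00))  # 15 three-minute HITs -> 15.0
print(batch_cost(interview, 2 * 60, 1.00))  # 23 HITs, the last only 60 s long -> 23.0
```

At a dollar a segment, the whole interview costs the requester somewhere between $15 and $23 in worker pay, depending on segment length.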

Simply creating HITs to send out spam email would be pointlessly difficult and expensive compared to easy and massively automated processes such as botnets. But the Mechanical Turk system is ideal for engaging in social network spam. (“Social networks”: of course, all networks are already social, regardless of whether they want to be.) Many sites now come with built-in models of user action and selection, from voting to public bookmarking to collaborative filtering, providing different ways for the group to assign salience and value. Aside from the direct benefits of traffic from one of these sites, as users see an interesting link and click through on it, getting linked on a major social networking site is a good way to boost one’s PageRank and get better search returns. The now decades-long quest of the search engine spammers to move up in the search rankings has thus migrated into the new territory of social recommendation systems. “Could you please bookmark my site / Using one of the following sites: http://www.del.icio.us/ http://www.stumbleupon.com/ http://www.furl.com,” says one Mechanical Turk requester, offering a rate of $1.75 per bookmarking. Suddenly, in the eyes of the algorithms of the social networking sites and search engines, there is a rise of interest on the part of reputable and high-value real-human social site users in this ad-laden website about mortgage restructuring or celebrity sex tapes.

Craigslist, meanwhile, offers a very different challenge and reward for those who would spam social networks—a challenge that has led to a strange human-machine arms race. Craigslist is a free site for posting classified ads, from bicycles for sale to apartments for rent (and great volumes of personals and “missed connections,” a mass index of urban loneliness and yearning). It offers free space for ads on a site with hundreds of local city instances, sitting in ninth place in the number of page views served in the United States (as of this writing), up there with Google and its properties, Wikipedia, and Facebook. Craigslist therefore obviously needs to protect itself against spammers. One of the characteristics of spam is duplication of text—it’s one of the properties that Bayesian filtering seized on as a weak point—so Craigslist blocked multiple ad postings with the same text or from the same network address. They required a valid email address to post, emailing a confirmation demand to that address that had to be clicked before the ad would be posted. They used a CAPTCHA system—the deformed letters on weird backgrounds that only humans can read, in theory, to verify their nonbot status—to block automated posting tools. Finally, they allowed other users to flag an ad as spam so that the site’s moderators could delete it. The spammers, in return, developed tools such as CL Auto Posting Tool and Craigslist Bot Pro 1 (the banality of the business of spam: $67, Windows only, “allows you to automate your personal and business online advertising”) to sidestep each of Craigslist’s defenses. Textual polymorphism—individual variations in the language of a spam message—could defeat the duplicate message detector, just as it does in email. Proxies could be used to post ads from lots of different network addresses, with valid email addresses for confirmation messages stamped out like license plates by programs like Jiffy Gmail Creator. Captcha King can fill in the CAPTCHAs.
Monitors were developed to detect when an ad was flagged as spam so that it could be automatically resubmitted.
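The duplicate-posting defense, and why textual polymorphism beats it, fits in a few lines. This is a hypothetical filter, not Craigslist’s actual code: it fingerprints each ad after folding case and collapsing whitespace, and refuses a repeat of either the text or the posting address. Any change to the wording changes the fingerprint, so a spammer who varies the language and rotates proxy addresses gets through. The IP addresses are from documentation ranges.

```python
import hashlib
import re


def fingerprint(ad_text: str) -> str:
    """Hash of the ad text after folding case and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", ad_text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicatePostFilter:
    """Hypothetical gate that rejects repeated ad text or a repeated source address."""

    def __init__(self) -> None:
        self.seen_texts: set[str] = set()
        self.seen_addresses: set[str] = set()

    def allow(self, ad_text: str, source_ip: str) -> bool:
        fp = fingerprint(ad_text)
        if fp in self.seen_texts or source_ip in self.seen_addresses:
            return False
        # Only accepted posts are recorded; rejected ones leave no trace.
        self.seen_texts.add(fp)
        self.seen_addresses.add(source_ip)
        return True


f = DuplicatePostFilter()
print(f.allow("Road bike for sale, $200", "203.0.113.5"))      # True: first posting
print(f.allow("road bike  for sale, $200", "198.51.100.7"))    # False: same text once normalized
print(f.allow("Bike, lightly used, $200", "203.0.113.5"))      # False: same address
print(f.allow("Road bicycle for sale, $200", "198.51.100.8"))  # True: reworded, via a new proxy
```

The last call is the spammer’s win in miniature. A sturdier filter would compare texts by similarity (for example, overlapping word sequences) rather than by exact hash; the sketch leaves that out.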

Craigslist then turned to telephone verification. To post an ad in certain categories, you have to take an automated phone call or text message with your confirmation password before the ad will go up, with only one ad per phone number. The spammers tried using voice-over-Internet Protocol (VoIP) services such as Skype, which in some cases made it possible to generate new phone numbers. Craigslist blocked those. “My assumption is probably accurate that CL is looking at the national database that distinguishes which numbers are voip and which arn’t [sic],” wrote one spammer in an extensive technical discussion devoted to overcoming these new developments. The spammers turned to services that could allow them to register additional phone numbers for a small fee. Craigslist blocked those, too.

The spammers turned to other platforms: “why don’t you guys take a laptop and go to: truck stops airports bus stations you should find close to 100 pay phones there”—and use the phones and their numbers for the verification messages. Another spammer reported back: “I used to have 140 accounts all done by me at payphones. It took me about 3 days. It was not easy and it was boring.” A more ingenious, almost Mechanical Turk–like distribution of labor project followed as the culmination of these efforts: “some are creating pages of ringtones [for mobile phones], if a person wants a ringtone All you have to do is to receive an sms (craigslist) in her cell and received this code placed on the website and can automatically download your ringtone.” In other words, people with mobile phones in search of free ringtones will act as the distributed phone verification system to compensate for Craigslist’s antispam initiative: a random voluntary population, organized remotely by machine, helping advertisers to swamp the community platform without ever realizing that they’re doing it.

CAPTCHAs, that border between the human and the robot-readable used by Craigslist among many other sites and platforms, have long dogged spam production, making it harder to start new Blogger blogs or open more free email accounts, and spammers have been working assiduously on different fronts to overcome them. In May 2008, a truly odd breakthrough took place. Security company Websense documented a series of attacks on the account-creation process of email services. Many requests for accounts kept hitting the CAPTCHA stage, and most, but not all, failed. There was Russian-language evidence of offers to pay small sums for the solutions to CAPTCHAs, but the pace (replies in six seconds) and the failure rate (nine to one) suggested that computers were doing the solving. (“We still believe there is human involvement,” said the company’s statement.) Later, Websense also documented a significantly improved CAPTCHA-cracker, one to which the spammer’s computers could pass their CAPTCHA problems as they made new email accounts. This program could take the image of distorted text and return a result within twenty to twenty-five seconds, with a much better success rate of one in five to eight tries, or between 12 percent and 20 percent—not bad at all. Botnets, with all their spare computing power, are ideal for brute-force attacks on the computationally onerous processing required to analyze CAPTCHAs.
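Websense’s figures can be turned into an expected time per solved CAPTCHA. If each attempt takes about the same time and succeeds independently with a fixed probability p, the mean number of attempts per success is 1/p, so one success in five to eight tries at twenty to twenty-five seconds each works out to roughly 100 to 200 seconds per solved CAPTCHA. The 1,000-machine botnet below is an arbitrary illustration, not a reported figure.

```python
def seconds_per_solve(seconds_per_try: float, tries_per_success: float) -> float:
    """Expected time per solved CAPTCHA.

    With independent tries that each succeed with probability p, the mean
    number of tries until a success is 1/p (a geometric distribution);
    `tries_per_success` is that 1/p.
    """
    return seconds_per_try * tries_per_success


def solves_per_hour(machines: int, seconds_per_try: float, tries_per_success: float) -> float:
    """Throughput when many compromised machines attack CAPTCHAs in parallel."""
    return machines * 3600 / seconds_per_solve(seconds_per_try, tries_per_success)


print(seconds_per_solve(20, 5))       # best reported case: 100 seconds
print(seconds_per_solve(25, 8))       # worst reported case: 200 seconds
print(solves_per_hour(1_000, 25, 8))  # hypothetical 1,000-machine botnet: 18000.0
```

Because throughput scales linearly with the number of machines, a slow and error-prone solver becomes practical once it runs on idle computers the spammer does not pay for.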

At the same time, the opposite tack is being taken by services such as Captcha King, mentioned earlier in the Craigslist-spammer arms race, that advertise a series of aristocracy-themed payment plans (Royal, Imperial, and Emperor) for CAPTCHA solutions sold in batches of thousands. Their method, which integrates with spamming software like automated Craigslist posting engines, Jiffy Gmail Creator, and MySpace bots, retrieves the CAPTCHA images “for manual entry.” An outsourced staff sits there all day banging out CAPTCHAs, with a guaranteed “success rate of 95% with a response time of less than 90 seconds.” Those poor souls, whose work makes regular data entry look exceedingly pleasant by comparison, are essentially being paid to be human, that is, to exhibit a theoretically solely human characteristic. (Another service along the same lines, KolotiBablo, tells us with its pay rates that “bare humanity” isn’t worth much in itself: between US$0.35 and US$1.00 for every thousand CAPTCHAs solved—meaning, at the top rate, a bit under $3 a day for eight solid hours of typing in CAPTCHA texts six times a minute.) In their work, and in the statement “We still believe there is human involvement,” we can hear the echo of Alan Turing’s clattering teletype in the parlor playing the Imitation Game. Are the CAPTCHAs being solved by a distributed workforce of deeply bored humans or by increasingly sophisticated optical character recognition programs running on a network of compromised machines? Some details can help distinguish them, but the fact that the two can be intermingled and difficult to identify—who’s on the other end of the line?—calls up the essential problem of Turing’s thought experiment. As Kevin Kelly put it, “What if spammers come up with an artificial intelligence before Google does?”
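Checking KolotiBablo’s arithmetic: six CAPTCHAs a minute for eight hours is 2,880 solutions, which pays about $1.01 at the lowest quoted rate and $2.88 at the highest. A minimal sketch, with the shift length and pace as adjustable parameters:

```python
def daily_pay(rate_per_thousand: float, per_minute: int = 6, hours: int = 8) -> float:
    """Dollars earned in one shift at a given rate per thousand CAPTCHAs solved."""
    solved = per_minute * 60 * hours  # 2,880 with the defaults
    return solved / 1000 * rate_per_thousand


for rate in (0.35, 1.00):
    print(f"${rate:.2f} per thousand -> ${daily_pay(rate):.2f} a day")
# $0.35 per thousand -> $1.01 a day
# $1.00 per thousand -> $2.88 a day
```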

In response, the kind of technology used to tell computers and humans apart is also being pushed to greater sophistication—yet another arms race. Work is now being done on presenting moving pictures (such as a galloping horse) made of animated blotches against a blotchy background. It’s the sort of thing that a human can identify but a computer would find exceedingly difficult, at least so far. There is an inventive world of vernacular bot-stopping solutions on personal websites: an email address ending with “oryx,” with a note to remove the “genus of antelope” before sending; a very simple joke for which you must choose the obviously correct punchline; a photograph you must briefly describe (“am I in the house or on the beach?”) before you can send a message—the kind of tasks that are trivial for humans but require inference impossible for the crude programs currently being sent to gather addresses and post comment spam. Intriguingly, one of the CAPTCHA-busting sweatshops described earlier, a Russian service called Antigate, keeps Westerners at bay by requiring visitors to enter the name of the current Russian prime minister using the Cyrillic alphabet, a “culturally restricted CAPTCHA,” meant not simply to fend off bots but to sort out groups of humans. The territory of what is uniquely and reliably human (and can be automatically tested at scale, over different kinds of interfaces) is one of the interesting zones left for future technologists to explore—if only to keep the botnets at bay.

THE BOTNETS

“By now I don’t know exactly what there is in the worm,” announces the protagonist. “More bits are being added automatically as it works its way to places I never dared guess existed.” He continues, “And—no, it can’t be killed. It’s indefinitely self-perpetuating so long as the net exists. Even if one segment of it is inactivated, a counterpart of the missing portion will remain in store at some other station and the worm will automatically subdivide and send a duplicate head to collect the spare groups and restore them to their proper place.” This passage from John Brunner’s 1975 science fiction novel The Shockwave Rider is quoted at the beginning of John Shoch and Jon Hupp’s remarkable 1982 paper “The ‘Worm’ Programs—Early Experience with a Distributed Computation.” It is through them, and their work at Xerox PARC, that the worm makes its conceptual, etymological way from Brunner’s novel to email spamming’s mutations in the new millennium.

Shoch and Hupp were envisioning something quite inventive, particularly for the time: a “distributed computation,” that is, a single program operating across many machines and taking advantage of idle processing power to do its work. This “worm” is the first monster from which the others spring with the same essential DNA, the worm that grows at night (“affinity for nighttime exploration led one researcher to describe these as ‘vampire programs’”) as it segments individual underused machines for a collective purpose. The essential project remains the same, from Brunner’s novel through the lab in 1982 to the present day: turning all the little boxes into one big machine. “Instead of viewing this environment as 100 independent machines connected to a network, we thought of it as a 100-element multiprocessor, in search of a program to run.” Worms since then have had a long and storied history in legitimate computer science, but the idea of a worm program as articulated by Brunner, Shoch, and Hupp has also found an extraordinary life in botnets and their spam-financed operations.

Imagine an office cubicle in a big building somewhere in the world—it could be the United States, Taiwan, Germany, or Brazil. The fluorescent lights hum overhead in the drop ceiling. An employee is away from his desk. His computer is playing a screensaver of family photographs. The computer—a standard-issue black Chinese-manufactured clone machine running Windows XP—is idle, but still engaging in automatic behavior over its broadband connection. It checks for new email at the server every few minutes, for example. A small but regular trickle of requests and replies moves over its always-on connection to the network.

At some point in the past, perhaps while the computer’s user was visiting a malicious web page, downloading and installing a program, or opening an ecard from a stranger, this computer was infected with a bit of malware, a program designed to exploit computers. In this case, the malware was a worm, a much-developed heir to the Shoch and Hupp worm concept in the form of a parasitic program capable of operating on its own. (This behavior distinguishes it from a virus, which needs to operate inside another program already present on the computer.) Far below any level our employee would ever notice, somewhere in the recesses of disk space, the worm uses spare processing power on the computer and the extra bandwidth of the always-on connection to do its work, turning the computer into a remotely controlled tool for the worm’s programmer—and, automatically, into a tool for spreading the worm to other computers. This malware’s point of infection can be exceedingly simple and subtle. Perhaps the employee received an email from a coworker’s address, warning of a failed message, giving the innocuous and puzzling explanation “The message contains Unicode characters and has been sent as a binary attachment.” He downloaded and opened the attachment to see only a page of meaningless symbols. He closed the page, perhaps sent a reply to his coworker—“Had trouble with your last message?”—or ignored the whole event as a computer mystery.

When he opens that attachment, the employee launches the worm to do its covert work. Having installed itself on the computer, it begins searching the host’s files for email addresses, to which it sends versions of the infection message, randomly drawing the header, body text, and attachment name from a small collection, all equally puzzling and dull. It looks for the popular file-sharing program Kazaa (one in the group of popular peer-to-peer programs for sharing media files that included Napster, Gnutella, and Morpheus); if it finds it, it copies a version of itself to the directory of shared files under one of several names such as strip-girl-2.0bdcom_patches.bat, office_crack.exe, or winamp5. Now, in the huge mesh of file-sharing computers, someone browsing this user’s files—or searching for a “cracked” (free, unprotected) copy of Microsoft Office or a stripping girl—will find one of these files, download it, launch it, see the page of meaningless symbols or an error message, and be similarly, quietly infected. But the worm has much more to do beyond replicating itself.

It also opens a “backdoor” to the infected computer that allows it to communicate with its controller and execute commands on its behalf, turning the computer into a “bot” or “zombie” machine. It begins quietly shipping information back and forth over the available capacity of its connection to the Internet. It checks in with the “command-and-control” channel, on which it receives its instructions from the botmaster. (This channel is often set up using an ancient, robust chat protocol called Internet Relay Chat [IRC].) The instructions given to it are generally along these lines: take this text (“Your Online Banking is Blocked! / We recently reviewed your account, and suspect that your Bank Of America account may have been accessed by an unauthorized third party”) and send it as an email to this list of addresses. The computer on the desk in the office cubicle has become a spam-distribution machine and has the capacity to do much more. It has joined the botnet.

Why the “bot” in botnet? Bots are simply programs that can do what they’re programmed to without constant human involvement. They can correlate data, hang out in a chat channel providing the rules of conduct when anyone asks, or search the web for email addresses while their programmers are occupied elsewhere. These abilities make them ideal for an enormous variety of computer tasks—and among them is spamming. Far back in the history of online socializing, “floodbots” would join a channel and fill it “with garbage text, endlessly repeated insults, or random billowing storm clouds of data,” killing the normal conversation.82 In 1996, with spam as a targeted marketing model taking off and NANAE forming, a company called GlobalMedia Design released RoverBot, one of the early address-harvesting bots, which would take keywords, find related pages, and search those pages for email addresses so that you could generate address lists related to “real estate” or “manga.” And, portending the rise of increasingly autonomous spam operations, there was the spambot ActiveAgent, a little nightmare that crawled web pages looking for addresses and emailing them with preprogrammed text; the author, “Robert Returned,” would sell the code for ActiveAgent for $100 to anyone who asked. Of course, there were already more efficient means of amassing and mailing to addresses being developed—methods that would culminate in the botnet.

Our fictional employee’s desktop computer is hosting a real worm: first released in early 2004, it was referred to in the security community as Mydoom, and it had good archetypal characteristics for explaining the basics of a botnet. In particular, it has the sting in its tail that brings botnets into conversation with the military. “On the first of February 2004,” the worm tells the infected computer, “request the website of SCO Inc., http://www.sco.com, every millisecond, and continue until the twelfth of the month.” You request a site when you type “www.sco.com” into your browser’s address bar and hit return or click on a link to sco.com: the request is sent to the server at that address, and data from the server is received and displayed on your screen. This is the normal business of servers, and they are built and configured to handle a certain number of requests for a certain amount of data from a certain number of users, depending on resources and anticipated use. If too many requests arrive in too short a time, the server cannot deal with the new requests and the site cannot be accessed—it becomes unusably slow or entirely fails to respond, leaving the user with an error page (“The server may be unavailable,” “The server has timed out,” and so on). This is called a denial of service (DoS). A DoS is often the result of a sudden burst of popularity, when a personal site that normally receives a few hundred visitors a day appears on a major blog or social news site and then suddenly receives tens of thousands of visitors and gets overwhelmed. Such an event can also be undertaken maliciously. It is what the charivari of outraged Usenet denizens did to Portal and Internet Direct as vengeance, swamping the servers with furious mail and big, capacity-consuming image files.
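The capacity argument can be made concrete with a toy model. The sketch below is an illustration only, assuming a hypothetical server that can answer a fixed number of requests per tick (`capacity`) and hold a limited number of waiting requests (`max_backlog`); the function name, parameters, and numbers are invented for the example and do not describe SCO's or any real server. Ordinary traffic is absorbed, while a sudden flood, whether a burst of popularity or a deliberate attack, fills the queue and most visitors are turned away.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TickResult:
    served: int    # requests answered this tick
    rejected: int  # requests turned away (the visitor sees an error page)
    backlog: int   # requests still waiting at the end of the tick


def simulate(arrivals: List[int], capacity: int, max_backlog: int) -> List[TickResult]:
    """Toy fixed-capacity server with hypothetical parameters.

    Each tick, arriving requests join a waiting queue; anything beyond
    `max_backlog` waiting requests is rejected. The server then answers
    up to `capacity` queued requests.
    """
    backlog = 0
    results = []
    for arriving in arrivals:
        room = max_backlog - backlog          # free slots in the queue
        accepted = min(arriving, room)
        rejected = arriving - accepted
        backlog += accepted
        served = min(backlog, capacity)       # work done this tick
        backlog -= served
        results.append(TickResult(served, rejected, backlog))
    return results


# Ordinary traffic versus a flood, against the same hypothetical server.
normal = simulate([80, 90, 100], capacity=100, max_backlog=200)
surge = simulate([10_000] * 3, capacity=100, max_backlog=200)
print([r.rejected for r in normal])
print([r.rejected for r in surge])
```

The server keeps working at full capacity throughout the surge; the denial of service lies in the thousands of requests it has no room for, which is why legitimate users see error pages.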

What this command issued by the Mydoom worm meant to do was create a vast phantom population of users requesting the site again and again and again from many thousands of computers all over the world, effectively knocking the site offline for twelve full days, rendering SCO unable to do business and dealing a devastating blow to its reputation as a company that provides secure servers for enterprise clients. A coordinated action from a botnet, a global network of machines, to take down a website or a server is called a distributed denial of service (DDoS) attack. Such an attack can be used to extort money from online companies (such as casinos) by preventing customers from reaching them, to eliminate security firms or other enemies, and to attack civil and governmental Internet infrastructure: it’s a transition from tool to weapon, with spam becoming a mere platform for further developments.

The Mydoom worm contained a poignant message embedded in the code: “(s y n c—1 . * * o * 0 1 ; a n d y * I ‘ m j u s t d o i n g m y k * * * * o b, n o t h * p e r s o n a l * * * * * } r r y) B G @”, usually transliterated as “(sync-1.01; andy; I’m just doing my job, nothing personal, sorry).” The author, or authors, of Mydoom have never been caught; the “job” and “Andy” remain mysteries, known only to a small group of collaborators, competitors, enemies, and friends. This private message from one person to another embedded in the code creates a dizzying sense of parallax in the context of the scale of the botnet—a system that makes spam production literally the size of the planet. All those individual desktop and laptop computers in homes, businesses, dorm rooms, and Internet cafés can be seen as a single resource, part of one continuous landscape, and a huge untapped well of spare system cycles, bandwidth, and sensitive information. Once you have the distributed power of many infected computers that are autonomously infecting others in turn, new projects and possibilities arise. A botnet becomes a platform, with spam just one “program” among others that runs on the platform alongside things such as key cracking (breaking passwords and encryption), clickfraud (automated “clicking” on ads to increase revenue to the ad host), identity theft of all kinds, and DDoS attacks—and potentially much more. It is the beginning of a new scale of operations.

THE MARKETPLACE

Life as an apprentice botmaster: the worm you wrote, or more likely bought or stole from a more skilled programmer, has succeeded, proliferating steadily for several days. You now have ten or fifteen thousand compromised computers under your notional control. Their number varies from day to day: perhaps a new infection boom has added a few thousand more, or a patch has been released that fixes the security flaw you were taking advantage of (but only several hundred of the users of your infected machines know to install it, so you do not lose that many bots). People go on vacation, leaving their computers off for a week or two; companies upgrade, and the old machines—your machines—go out to the recycling bin to be palletized and shipped to Accra or Guiyu. Other worm writers and botmasters create programs designed to take over machines and knock off the infections already present, like yours. From day to day, users of infected machines all over the world power them up or down on cycles of nights, weekends, and lunch breaks. The bot population is shifting and unreliable, and you face the very real problem of making use of all of this distributed computing power you have accumulated. You have what security analysts call a “victim cloud” with which to make money generating spam, among other jobs. How do you control it?

On the most abstract level, your method is this: you use the archaic but reliable protocol for real-time messaging online, good old IRC. IRC has a long history of automated interactions: chatbots were responding to commands and relaying messages on it long before the arrival of more sophisticated technologies. All of your infected computers subscribe to your IRC channel, referred to as the command-and-control (C&C) channel, and you can easily send out instructions to their population, such as message text and address “target lists” for spam campaigns. This relatively simple arrangement creates another problem, however: now your network of compromised machines has a single point of control, that channel, and thus is vulnerable to attack and seizure, whether by law enforcement and “white hat” good-guy hackers or other botmasters, who could commandeer your channel and use it to make your machines work for them. (Other botmasters trying to take over your network is the biggest ongoing problem you face.) There are ways to make your C&C channel more secure. Perhaps you have managed to obfuscate or encrypt some of the critical traffic and code, such as the authentication passwords you use to control your bots. This trick will keep the other botmasters at bay, for now. The next critical question: how are you going to make money?

As with the development of spam itself, this is all about taking advantage of new affordances: you are on these computers now, and you control them. First you snoop, scrubbing the compromised computers for usernames, passwords, email contacts, financial information, secrets—and you monitor their network traffic for similarly useful material, like anything associated with the keywords “paypal” and “paypal.com,” which might have a password attached. (When the security company Finjan seized a server being used to store botnet data, they found 1.4 gigabytes of material from compromised machines in the United States, the European Union, India, Canada, and Turkey, including patient data from health-care providers as well as the usual acres of business databases and email logs.) It’s possible to monetize many of these resources yourself, but it’s also often time-consuming and potentially dangerous without the right skills—and getting money out of bank accounts and credit cards safely is a very different matter from simply getting credit card numbers and account login information.

Instead, you bring your data into the thriving underground economy that has formed around online crime. You join yet another IRC channel: a screen of names, or “nicks,” working out deals in the typo-ridden lowercase that acts as the argot of the marketplace. “i need 1 mastercard i give 1 linux hacked root” pops up; “i have verified paypal accounts with good balance . . . and i can cashout paypals.” Trustworthy users who have proved their reliability to the channel’s administrators have a +v symbol at the end of their nick, so you know you can do business with them— they are not thieves, “rippers”—at least not among their own. (“report ripperz to @s -Trade OPEN rippers are not alowed [sic] here . . . if u find one show the log.”) There are several different ways you can make money at this point. You can sell the data you have stolen from the infected computers under your control to a “cashier,” someone who knows how to convert financial authentication information into money. Your cashier may themselves have to work with a “confirmer,” who can pose as the sender in a money transfer using a stolen account. (Because cashiers often need to be country and gender specific—a bank will not clean out an account belonging to a female name in Texas if a male voice with a Slavic accent is on the line—an odd economy in, say, “fml CA US UK cashout” cashiers has developed.) You can also try cutting a deal with the cashier to keep more of the profit.

You can sell your botnet as a whole for a smaller but quick profit: the going rate is between four cents and a dime per compromised computer. They pay you the total, you send them the passwords and other information for the C&C channel for the bots—the keys to the spam factory. You can also rent time and capacity on your botnet for all the services it can provide: hosting cracked software for download, hosting fake sites for phishing messages (where people can input their passwords, in response to an email, under the impression that it belongs to Facebook or their bank), delivering DDoS attacks, and running spam campaigns. The channel is a great place to get set for your own spam projects, as well, with databases of email accounts, including “targeted” collections—for example, those professionals with bank accounts more likely to fall for a bank-message phishing scam— there for purchase and trade. You can get lists of netblocks (ranges of Internet addresses) that are notably vulnerable or heavily monitored or that belong to certain organizations that you might want to take advantage of or avoid. Finally, you can barter for all of these things, transacting any one for any other: time on your machines in return for a list of addresses, some credit card data in return for a few thousand more machines for your network. After a good spam campaign, with a mix of pharmaceutical messages for a client, paid for in batches of a million and sent to a cheap, inferior list of addresses—and phishing messages for your personal profit, sent to a more precise, targeted list—you can come back to the market with more data to sell, and more money with which to buy work and data from the others.
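To put the quick-sale option in numbers: at the quoted rate of four cents to a dime per machine, the ten or fifteen thousand bots of the apprentice botmaster's network fetch only a few hundred to fifteen hundred dollars. A minimal sketch of that arithmetic, working in whole cents to avoid rounding (the function name is hypothetical; the rates and bot counts are the ones given in the text):

```python
from typing import Tuple


def sale_price_range_cents(bots: int, low_cents: int = 4, high_cents: int = 10) -> Tuple[int, int]:
    """Whole-botnet sale price range in cents, using the per-machine
    rates quoted in the text (4 to 10 cents per compromised computer)."""
    return bots * low_cents, bots * high_cents


# The "ten or fifteen thousand compromised computers" of the scenario.
for bots in (10_000, 15_000):
    low, high = sale_price_range_cents(bots)
    print(f"{bots:,} bots: ${low / 100:,.2f} to ${high / 100:,.2f}")
```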

The market is transnationally hopping—though it looks, like so much of your working life as a global criminal, like a window on your screen with text in it. People take advantage of the primitive text/background color choices to make their offers stand out in a visual shouting match of green text on brown or white on electric blue. A variety of typo-ridden languages are in use. On an average afternoon, somebody with the nick “TOrPedO`” tries to drum up business: “CA (DOB + mmn + SIN + ATM PIN + Paypal with email access + Drivers License) = 12 $—AU (DOB + mmn + Paypal with email access + Drivers License + Medicare card number + ATM PIN) = 10 $—Also EU fulls selected countries could be spammed on Request. . . . SELLING cvv2s Available for Sale: Cvv2’s US bundle of 20 for 60$—EU countries bundle of 20 for 75$ ... SELLING MAIL LISTS 1Available for Sale: US, UK, CA, AU, European: IT, ES, GR, FR, GY. Bundle of 5mb = 40$—PM me now.” “PM” is “private message”: step out of the public space and get the deal done.

If TOrPedO` is you in this scenario, you can move spam “on Request,” you have lists of addresses targeted by country for sale to other spammers, you have all the identity theft basics at $12 a pop, as well as bundles of CVV2s—the three-digit Card Verification Values used to confirm credit card transactions when the card is not physically present—priced to move. Some of the data you have accumulated needs to be turned into money, and the nick PhuckedUp is looking for clients: “Legit PinCashier, Looking for Supliers, i cashout FCU, CU, Small Banks, with limit of 3k ! msg me only serious supliers !”—FCU and CU being “credit unions,” that is, smaller banking operations. You have a lot of competitors in this business. zgfrik posts: “selling abbey [Abbey banking] account with 23k on it,price 1000$—msg me if interested.” As in markets everywhere, trust is a problem, and the warnings fly: “BOSNIAN RIPPERS Ognjen Miric AND Ervin Residbegovic—BOTH LIVES IN Bosnia And Herzegowina! Sarajevo! ZIP: 71000 DONT BUY FROM ANYONE FROM BOSNIA // Sarajevo! YOU WILL LOSE YOUR MONEY 110%!”

You post your notice: “=(REAL BANK LOGINS SPAM SUPPLYS)=(SELL BANK LOGINS\PRICE DEPENDS ON BALANCE 10% FROM IT)=(BIG BASE!)=(ADD ME>,” followed by a chat name and email address. Later, as you meet others in this world, you will move on to covert password-protected channels where more serious action happens. You have joined the twenty-first-century spam economy.

It’s not a bad living, as documented by analyses of Russian forums devoted to malware, spamming, and credit card theft deals. A million spam messages sent on behalf of a client costs the equivalent of a hundred U.S. dollars—and there’s a bulk discount, of course. A million addresses for $120, more if you want them sorted by country. Fifteen dollars for an hour of denial of service attacks; more for a more sustained attack, which requires more cunning to outwit the blocking strategies the target might employ as they catch on. Given how much it can cost a target to be down during the attack, it’s a great way to make money by extortion. You can sell a malware program called “Pinch” that searches for banking data and passwords from infiltrated computers, and you can also sell the raw data you acquire—$10 a megabyte, for others to pan through in search of profitable information and to go to the additional trouble of actually extracting the money. (The transactions between parties in the business are done through services like Yandex and WebMoney, services akin to PayPal but with greater market penetration in Russia and Eastern Europe.) If you buy a hundred “good” credit card numbers (verified, with CVV and all the ID information, with high spending limits) for $10.66 apiece, of which perhaps half can actually be used to buy and ship goods to Russia for resale or fencing before you set off the card issuers’ antifraud detection systems, that can still produce a few hundred dollars’ worth of value per card, for a profit of $13,000. Not bad at all.
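The credit card figure can be checked with simple arithmetic. The text gives the cost per card, the fraction assumed usable, and the profit, but not an exact value per usable card, so the sketch below backs that value out (all amounts in cents; the variable and function names are illustrative). It comes to about $281 per usable card, in line with “a few hundred dollars’ worth of value per card.”

```python
# Figures quoted in the text, in cents to keep the arithmetic exact.
CARDS = 100
COST_PER_CARD = 1_066        # $10.66 apiece
USABLE = CARDS // 2          # "perhaps half" can actually be used
TARGET_PROFIT = 1_300_000    # $13,000


def profit(cards: int, cost_per_card: int, usable: int, value_per_usable: int) -> int:
    """Value extracted from the usable cards minus the cost of the whole batch."""
    return usable * value_per_usable - cards * cost_per_card


batch_cost = CARDS * COST_PER_CARD  # the whole batch is paid for, usable or not
# Value each usable card must yield for the batch to clear the quoted profit.
value_needed, remainder = divmod(TARGET_PROFIT + batch_cost, USABLE)

print(f"batch cost: ${batch_cost / 100:,.2f}")
print(f"value needed per usable card: ${value_needed / 100:,.2f}")
```

The dead half of the batch only adds its $533 purchase price to the cost, which is small next to the resale value of the working cards.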

Still better are advance-fee fraud messages—the “Nigerian spam” described earlier—at a cost of $20 for 200,000 messages (they’re more expensive because they have to be more targeted in the sending and tailored in the writing, themed with recent news and somewhat plausible details), with a response rate of 2 or 3 percent and an average take of $1,922.99 per victim. Even if the spammers don’t net a really big fish, they can ultimately expect to clear about $200,000 in profit, though it is a lot more work. There may not be honor among thieves, but there is good customer service. The interdependent parts of this economy include agreed-upon systems for testing product (a sector of a botnet to confirm the available bandwidth, a few credit cards from a batch to make sure they’re real and to check the balances), money-back guarantees, nicely designed interfaces, partnership programs, and, charmingly, free champagne for closing a deal together. As Holt argues, it makes sense in the short term to lease a botnet rather than build one of your own—you can send spam and do attacks with a somewhat higher profit margin and no maintenance. But what if you are a truly gifted and visionary programmer? What if you want to build a better botnet?
