POISONING: THE REINVENTION OF SPAM
The machines in the shop roar so wildly that often I forget in the roar that I am; I am lost in the terrible tumult, my ego disappears, I am a machine. I work, and work, and work without end; I am busy, and busy, and busy at all time. For what? and for whom? I know not, I ask not! How should a machine ever come to think?
— Morris Rosenfeld, “In the Sweat-Shop” (Songs from the Ghetto, trans. Leo Wiener, Norbert Wiener’s father)
Even as the filters were being installed, the first messages began to trickle in, like this one: “Took her last look of the phantom in the glass, only the year before, however, there had been stands a china rose go buy a box of betel, dearest of the larger carnivorous dinosaurs would meet it will be handy? Does it feel better now?” Or like this: “rosily pixie sir.chalet, healer partly .fanned media viva.jests, wheat skier.given rammed bath.weeded divas boxers.” Messages by the hundreds and thousands, sometimes with a link, mostly without. It was as though some huge Dada machine, the Tzara-Bot 9000, had just come online. This was litspam, cut-up literary texts statistically reassembled to take advantage of flaws in the design and deployment of Bayesian filters.
Bayesian filters destroyed email spam as a reputable business model in three ways, each of which became a springboard for spam’s transformation. First, filtering killed off conventional spam language, the genre of the respectable sales pitch, with its textual architecture of come-on rhetoric inherited from generations of Dale Carnegie books, direct-mail letters, cold calls, and carnival barkers (“Hundreds of millions of people surf the Web today. The internet is absolutely exploding everywhere in the world. Ask yourself: ‘Am I going to profit from this?’ Give me the chance to share with you this exiting business opportunity.”). This kind of material became a liability; its elements were too likely to be caught by the statistical analysis of words that the filters performed. Second, filtering made it a lot harder to make money through sales—if far fewer of your messages were getting through, you needed much better return on a successful message, not just a small profit on a bottle of generic pharmaceuticals, to make spam a viable business. Finally, filtering enormously increased the failure rate for spammy messages. If the filter caught the vast majority of your messages, you needed to send out a lot more, and be much more creative in their construction, to turn that tiny percentage or part of a percentage into a business. It was assumed that the message-sending capacity was a reliable limit, a fixed ceiling on spam operations: in the “Plan for Spam FAQ,” Paul Graham [a filtering expert] answers the question “If filters catch most spam, won’t spammers just send more to compensate?” with “Spammers already operate at capacity.” These three developments fed each other. If the filters attacked regularity in language, noting the presence of words with a high probability of appearing in spam messages, you had to be much more creative in the spam messages you wrote, putting more effort into each attempt than before.
You would see very little profit in return for your increased effort, because fewer of your messages would get through, and you would have to put more of that profit into your infrastructure, because you would need to hugely increase the amount of spam you could send.
They also contained three points for spam’s transformation into a new and different trade, of which litspam was a harbinger. All three points of transformation hinged on the success of Graham’s ideas: the enforcement of new laws combined with filtering had eliminated the mere profit-seeking carpetbaggers and left the business to the criminals. Filters made conventional sales language and legal disclaimers into liabilities, which meant that those willing to be entirely duplicitous could move into wholly different genres of message to get past the filters and make the user take action. Hence the recommended link from a half-remembered friend (or friendly stranger), the announcement of breaking news, and, most extraordinarily, the fractured textual experiment of litspam. If filtering made it much harder to make money per number of messages, spam messages could become much more individually lucrative: rather than sales pitches for goods or sites, they could be used for phishing, identity theft, credit card scams, and infecting the recipient’s computer with viruses, worms, adware, and other forms of dangerous and crooked malware. A successful spam message could net many thousands of dollars, rather than $5 or $10 plus whatever the spammer might make selling off their good addresses to other spammers. Finally, if the new filters meant the messages failed much more often, spammers could develop radically new methods for sending out spam that cost much less and enabled them to send much, much more—methods such as botnets, which we’ll come to shortly.
In the context of the transformations that spam was going through as it became much more criminal, experimental, and massively automated, litspam provides a striking example of the move into a new kind of computationally inventive spam production. Somewhere, an algorithmic bot with a pile of text files and a mailing list made a Joycean gesture announcing spam’s modernism.
To explain litspam, recall the problem of false positives: legitimate messages misfiled as spam. You cannot make the filter too strict. You need to give it some statistical latitude, because missing a legitimate message could cost the user far more than the average of 4.4 seconds it takes to identify and discard a spam message that gets through the filter. The success or failure of a filter depends on its rate of false positives; one important message lost may be too many, and Graham argued that the reason Bayesian filters did not take off in their first appearance was false positive rates like Patel and Lin’s 1.16 percent, rather than his 0.03 percent. Implicit in his argument was the promise that other people could reproduce or at least closely approximate his percentage. A person could indeed reproduce Graham’s near-perfect rate of false positives if they were very diligent, particularly in checking the marked-as-spam folder early in the filter’s life to correct misclassifications. It also helped to receive a lot of email with a particular vocabulary, a notable lexical profile, to act as the “negative,” legitimate nonspam words. These things were true of Graham. Building this filter was a serious project of his for which he was willing to read a lot of spam messages, do quite a bit of programming, and become a public advocate; it follows that his personal filter would be very carefully maintained. Graham had a distinctive corpus to work with on the initial filter: his personal messages, with all the vocabulary specific to a programmer and specialized venture capitalist—“Lisp [the programming language] . . . is effectively a kind of password for sending email to me,” he wrote in the original “A Plan for Spam” document.
His array of legitimate words, at the opposite side of the axis from the words that signal spam (such as “madam,” “guarantee,” and “republic”), includes words like “perl [another programming language],” “scripting,” “morris,” “quite,” and “continuation.”
Other individual users, however, may have a slightly higher rate of false positives because they have a characteristic vocabulary that overlaps more with the vocabulary of spam than Graham’s does, or because their vocabulary is aggregated with that of others behind a single larger filter belonging to an organization or institution, or simply because they are lazier in their classifications or unaware that they can classify spam rather than deleting it. (The problem of characteristic vocabulary was even worse for blog comment spam messages—the kind with a link to boost Google search rankings or bring in a few customers—because the spammers, or their automated programs, can copy and cut up the words in the post itself for the spam message, making evaluation much trickier.) Users are thus not perfect, and filters can be poorly implemented and maintained, and so must be a little tolerant of the borderline messages. In this imprecision, a two-pointed strategy for email spamming took shape:
1. In theory, you could sway a filter by including a lot of neutral or acceptable words in a message along with the spammier language, edging the probability for the message over into legitimacy. Linkless gibberish messages were the test probes for this idea, sent out in countless variations to see what was bounced and what got through: “I although / actually gorged with food, were always trumpets of the war / to sound! This stillness doth dart with snake venom on itwell, / I’d have laughed.”
2. After a spam message got through, the recipient was faced with a dilemma. If the recipient deleted the message, rather than flagging it as spam, the filter would read it as legitimate, and similar messages would get through in the future. If he or she flagged it as spam, the filter, always learning, would add some more marbles to the bags of probabilities represented by significant words, slightly reweighing innocent words such as “stillness,” “wheat,” “laughed,” and so on toward the probability of spam, cumulatively increasing the likelihood of false positives. These broadcasts from Borges’s Library of Babel would be, in effect, a way of taking words hostage. “Either the spam continues to move, or say goodbye to ‘laughed.’”
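The first prong can be made concrete with a minimal sketch. It assumes the combining rule Graham describes in “A Plan for Spam” (multiply the spam probabilities of the most telling words and weigh that product against the product of their complements); the word table, its values, and the name spam_score are hypothetical, chosen only for illustration, and no deployed filter is being reproduced here.

```python
from math import prod

# Hypothetical per-word spam probabilities; illustrative values only,
# not taken from any real filter or corpus.
WORD_SPAM_PROB = {
    "guarantee": 0.99, "sexy": 0.99,
    "stillness": 0.05, "wheat": 0.05, "laughed": 0.05, "trumpets": 0.05,
}
UNKNOWN = 0.5  # assumption: a word with no prior evidence sits at even odds


def spam_score(words, table=WORD_SPAM_PROB, top_n=15):
    """Combine per-word probabilities into one message-level spam probability."""
    probs = [table.get(w, UNKNOWN) for w in words]
    # Keep only the most telling words: those farthest from the neutral 0.5.
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


pitch = ["guarantee", "sexy"]
padded = pitch + ["stillness", "wheat", "laughed", "trumpets"]

print(round(spam_score(pitch), 4))   # the bare pitch
print(round(spam_score(padded), 4))  # the same pitch wrapped in innocent words
```

Under these assumed numbers the bare pitch scores as near-certain spam, while the same pitch padded with four innocent words drops below the halfway mark. The second prong is the same table seen from the other side: each time a padded message is flagged, the values attached to “stillness” or “wheat” creep upward.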
But why use literature? Early messages show that the first experiments along these lines were built of words randomly drawn from the dictionary. This approach does not work very well, because we actually use most words very seldom. The most frequently used word in English, “the,” occurs twice as often as the second most frequent, and three times as often as the third most frequent, and so on, with the great bulk of language falling far down the curve in a very long tail. From the perspective of the filter, all those words farther out on the curve of language—“abjure,” “chimera,” “folly”—are like the bag of marbles after that first sunset, with one black marble and one white; with no prior evidence, those unused words are at fifty/fifty odds and make no difference, and one “sexy” will still flag the message as spam. What the spammer needs is natural language, alive and in use up at the front of the curve.
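The shape of that curve can be sketched in a few lines. The example assumes an idealized version of the rank-frequency relation just described, in which the word at rank r is used in proportion to 1/r, and a hypothetical vocabulary of 50,000 words; real corpora follow the curve only approximately, and the function names are invented for the illustration.

```python
def zipf_weight(rank):
    """Idealized relative frequency of the word at a given rank (1 = most common)."""
    return 1.0 / rank


def usage_share(top_k, vocab_size):
    """Fraction of all word use accounted for by the top_k most common words."""
    total = sum(zipf_weight(r) for r in range(1, vocab_size + 1))
    head = sum(zipf_weight(r) for r in range(1, top_k + 1))
    return head / total


VOCAB = 50_000  # hypothetical dictionary size
TOP = 1_000     # the words "alive and in use" at the front of the curve

print(zipf_weight(1) / zipf_weight(2))  # the first word against the second
print(usage_share(TOP, VOCAB))          # share of actual usage held by the top words
print(TOP / VOCAB)                      # chance a random dictionary pick is a top word
```

On these assumptions the thousand commonest words carry roughly two thirds of everything written, yet a word drawn blindly from the dictionary has only a one-in-fifty chance of being among them. Nearly every random pick lands in the tail, where the filter has no evidence either way.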
A huge portion of the literature in the public domain is available online as plain text files, the format most convenient to programmers: thousands and thousands and thousands of books and stories and poems. These can be algorithmically fed into the maw of a program, chopped up and reassembled and dumped into spam messages to tip the needle just over into the negative, nonspam category. Hence the bizarre stop-start rhythm of many litspam messages, with flashes of lucidity in the midst of a fugue state, like disparate strips of film haphazardly spliced together. Their sources include all the canonical texts and work in the public domain available on sites like Project Gutenberg, as well as more recondite materials. Many authors in the science fiction vein are popular with hackers, who sometimes pay them the dubious honor of scanning their books with optical character recognition software to turn the printed words into text files that can be circulated online. Neal Stephenson’s encryption thriller Cryptonomicon is one of these books, available as a full text file through a variety of sources and intermittently in the form of chunky excerpts in spam messages over the course of years. “This is a curious form of literary immortality,” Stephenson observed. “E-mail messages are preserved, haphazardly but potentially forever, and so in theory some person in the distant future could reconstruct the novel by gathering together all of these spam messages and stitching them back together. On the other hand, e-mail filters learn from their mistakes. When the Cryptonomicon spam was sent out, it must have generated an immune response in the world's spam filtering systems, inoculating them against my literary style. So this could actually cause my writing to disappear from the Internet.”
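What such a program might do can be sketched briefly. Nothing is known here about the spammers’ actual code, so this is a guess at the general technique and nothing more: the function cut_up, its parameters, and the four one-sentence stand-ins for whole text files are all hypothetical.

```python
import random

# Stand-ins for full public-domain text files; a real run would read whole books.
SOURCES = [
    "she took her last look of the phantom in the glass and turned away",
    "only the year before, however, there had been a good deal of talk",
    "the largest of the larger carnivorous dinosaurs would meet it there",
    "put it in your pocket, it will be handy. Does it feel better now?",
]


def cut_up(sources, n_fragments=4, min_words=3, max_words=8, seed=None):
    """Splice short runs of consecutive words, each lifted from a random source."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    pieces = []
    for _ in range(n_fragments):
        words = rng.choice(sources).split()
        # Fragment length, capped by the length of the chosen source.
        length = min(len(words), rng.randint(min_words, max_words))
        start = rng.randint(0, len(words) - length)
        pieces.append(" ".join(words[start:start + length]))
    return " ".join(pieces)


print(cut_up(SOURCES, seed=1))
```

Because every fragment is a run of consecutive words, each stretch reads as living language until the splice, which is one plausible account of the stop-start rhythm.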
The deep strangeness of litspam is best illustrated by breaking a piece of it down, dissecting one of these flowers of mechanized language. The sample that opened this section, drawn at random from one of my spam-collecting addresses, is two sentences and forty-five words and was assembled from no fewer than four interpolated sources: “Took her last look of the phantom in the glass, only the year before, however, there had been stands a china rose go buy a box of betel, dearest of the larger carnivorous dinosaurs would meet it will be handy? Does it feel better now?” “Took her last look of the phantom in the glass” is from “The Shadows,” a fairy tale by Aberdonian fantasist George MacDonald. “Only the year before, however,” and “of the larger carnivorous dinosaurs would meet” are from chapters 15 and 11 of Arthur Conan Doyle’s adventure novel The Lost World. “Stands a china rose go buy a box of betel, dearest” is from Song IV of the “Epic of Bidasari” as translated in the Orientalist Chauncey Starkweather’s Malayan Literature. And “it will be handy? Does it feel better now?” is from Sinclair Lewis’s Main Street, chapter 20. Each of these fragments is subtly distorted in different ways—punctuation is dropped and the casing of letters changed—but left otherwise unedited. It’s a completely disinterested dispatch from an automated avant-garde that spammers and their recipients built largely by accident. “I began to learn, gentlemen,” as the ape says in Kafka’s “Report to an Academy,” another awkward speaker learning language as a means of escape: “Oh yes, one learns when one has to; one learns if one wants a way out; one learns relentlessly.”
Litspam obviously does not work for human readers, aside from its occasional interesting resemblance to stochastic knockoffs of the work of Tzara or Burroughs (with a hint of Louis Zukofsky’s quotation poems, or Bern Porter’s “Founds” assembled from NASA rocket documentation). If anything, its fractured lines and phrasal salad are a sign that something’s suspiciously wrong and the message should be discarded. As with the biface, robot-readable text of web pages that tell search engine spiders one thing and human visitors another, litspam is to be read differently by different actors: the humans, with their languages, and the filters with their probabilities, like the flower whose color and fragrance we enjoy, and whose splotched ultraviolet target the bee homes in on. Litspam cuts to the heart of spam’s strange expertise. It delivers its words at the point where our experience of words, the Gricean implicature that the things said are connected in some way to other things said or to the situation at hand, bruisingly intersects the affordances of digital text. Like a negative version of the Turing test, you think you will be chatting with a person over teletype (“Will X please tell me the length of his or her hair?” as Turing suggests) and instead end up with racks of vacuum tubes or, rather, a Java program with most of English-language literature in memory: “that when some members rouen, folio 1667.anglonorman antiquities p. Had concluded his speech to the king.” We look for sense, for pattern and meaning, whether in the Kuleshov Effect—the essence of montage, with different meanings attributed to the same strip of film depending on what it’s intercut with—or the power of prophetic signals, like a spread of Tarot cards, whose rich symbolic content is full of hooks we can connect with our own current preoccupations, fears, memories, and desires. 
If there’s a spammy core to the message—a recognizable pitch, link, or come-on—we might pick out that most salient portion (perhaps clicking on this will explain this bizarre message!) and the spam will still have done its job.
Let us return to Turing, briefly, and introduce the fascinating Imitation Game, before we leave litspam and the world of robot-read/writable text. The idea of a quantifiable, machine-mediated method of describing qualities of human affect recurs in the literature of a variety of fields, including criminology, psychology, artificial intelligence, and computer science. Its applications often provide insight into the criteria by which different human states are determined—as described, for example, in Ken Alder’s fascinating work on polygraphs, or in the still understudied history of the “fruit machine,” a device that (allegedly) measured pupillary, pulse, and other response to pornographic images, developed and deployed during the 1950s for the purpose of identifying homosexuals in the Canadian military and the Royal Canadian Mounted Police (RCMP) in order to eliminate them from service. (It is like a sexually normative nightmare version of the replicant-catching Voight-Kampff machine in Blade Runner.) Within this search for human criteria, the most famous statement—and certainly the one that has generated the most consequent literature—is the so-called Turing Test. The goal of Turing’s 1950 thought experiment (which bears repeating, as it’s widely misunderstood today) was to “replace the question [of ‘Can machines think?’] by another, which is closely related to it and is expressed in relatively unambiguous words.” Turing considered the question of machines “thinking” or not to be “too meaningless to deserve discussion,” and, quite brilliantly, turned the question around to whether people think—or rather how we can be convinced that other people think. This project took the form of a parlor game: A and B, a man and a woman, communicate with an “interrogator,” C, by some intermediary such as a messenger or a teleprinter. C knows the two only as “X” and “Y”; after communicating with them, C is to render a verdict as to which is male and which female.
A is tasked with convincing C that he, A, is female and B is male; B’s task is the same. “We now ask the question,” Turing continues, “‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’”
What litspam has produced, remarkably, is a kind of parodic imitation game in which one set of algorithms is constantly trying to convince the other of their acceptable degree of salience—of being of interest and value to the humans. As Charles Stross puts it, “We have one faction that is attempting to write software that can generate messages that can pass a Turing test, and another faction that is attempting to write software that can administer an ad hoc Turing test.” In other words, what we are seeing is the product of algorithmic writers producing text for algorithmic readers to parse and block, with the end product providing a fascinatingly fractured and inorganic kind of discourse, far off even from the combinatorial literature of avant-garde movements such as the Oulipo, the “Workshop of Potential Literature.” The particular economics of spamming reward sheer volume rather than message quality, and the great technical innovations lie on the production side, building systems that act with the profligacy of a redwood, which may produce a billion seeds over the course of its lifetime, of which one may grow into a mature tree. The messages don’t improve from their lower bound unless they have to, so the result doesn’t get “better” from a human perspective—that is, more convincing or plausibly human—just stranger.
Surrealist automatic writing has its particular associative rhythm, and the Burroughsian Cut-Up depends strongly on the taste for jarring juxtapositions favored by its authors (an article from Life, a sequence from The Waste Land, one of Burroughs’s “routines” in which mandrills from Venus kill Eisenhower). Litspam text, along with early comment spam and the strange spam blogs described in the next section, is the expression of an entirely different intentionality without the connotative structure produced by a human writer. The results returned by a probabilistically manipulated search engine, or the poisoned Bayesian spew of bot-generated spam, or the jumble of links given by a bad choice filtering algorithm act in a different way than any montage. They are more akin to flipping through channels on a television, with very sharp semantic transitions between spaces—from poetry to porn, a wiki to a closed corporate page, a reputable site to a spear-phishing mock-up. (If it has a cultural parallel, aside from John Cage’s Imaginary Landscape No. 4—in which two musicians manipulate a radio’s frequency, amplitude, and timbre according to a preestablished score, with no control over what’s being broadcast—it would be Stanley Kubrick’s speculative art form of the future, which he described as “mode jerking”: sudden, severe, jolting transitions between different environments.) Consider this message from “AKfour seven,” writing via a Brazilian domain hosted on an ISP in Scranton, Pennsylvania:
I stand here today humbled by the task before [url=http://www.bawwgt.com] dofus kamas[/url], grateful for the trust you have bestowed, mindful of the sacrifices borne by our [url=http://www.bawwgt.com]cheap dofus kamas[/url]. I thank President [url=http://www.bawwgt.com]dofus power leveling[/url] for his service to [url=http://www.bawwgt.com]buy dofus kamas[/url], as well as the generosity and cooperation he has shown throughout this transition.
It’s President Obama’s inaugural address, intercut with links to a site whose business it is to sell currency (“Kamas”) and other desirables for the French online role-playing game Dofus, which features various tree people, archers, and gambling cats—and a substantial gray market for in-game currency sold for real money. It is not simply that the smallest is spliced into the biggest, major with minor, now with then. It is the use of written words under the condition of pure arbitrary utility. As digitized, searchable, copy-and-paste-ready text, it is all one continuous matter—almost shockingly atemporal and best analogically compared not to a library or a conversation but to the “polymer goo” that Harrison White uses to describe social structures, full of complex striations and from which many different shapes can be extruded depending on need.
Finally, a note on this idea of “atemporality” to close this section on litspam. The concept of atemporal media has been discussed recently in terms of digital aesthetics and music. The digitization of media moves them into a kind of continuous present of use, the way all recorded music can now occupy a single, shuffled state of immanence from wildly different points of creation. An mp3 of the antediluvian old-time musician Dock Boggs, as recorded in the Norton Hotel with a borrowed banjo in 1927, segues into the synthesizer layers of Oneohtrix Point Never, who creates music in 2010 that could pass for electrocosmic voyages on vinyl from the 1970s. Historicity becomes another stylistic element, like timbre. As Brian Eno has put it, it’s all “current” now, which brings the aesthetics of recording itself to the fore as a stylistic choice with its own content, as all sounds coexist in a permanent digital noon. Litspam, setting aside its ultimate purpose of slipping through or damaging filters to sell more porn site logins or discontinued toys, is an extraordinary form of digital atemporality. Histories and myths, poetry, instructions for pleaching the lime trees of an ornamental garden, religious exegesis, and online tax guides constitute one shape, of which a given litspam message is a probability-guided surface. “In which gravitation is a consequence of the curvature of spacetime which governs the motion of inertial objects. The South Park episode Conjoined Fetus Lady and Season 1 of Freaks and Geeks depict dodgeball as a potentially violent sport. August Anheuser Busch IV (born June 15, 1964) is the great-great-grandson of Anheuser-Busch founder Adolphus Busch, the son of former chairman, president and CEO August Busch III. Many of these are produced by hurricanes or tropical storms along the coastal plain.”
Litspam was only one new form of postfilter spam among many, of course. Graham predicted that “the spam of the future will probably look something like this: ‘Hey there. Thought you should check out the following: http://www.27meg.com/foo.’” It squeaks by the filter on neutral language but gets caught with its suspicious URL, and we have indeed seen quite a bit of that, along with the variety of old-fashioned spam that makes it past imperfectly installed and trained filters. (Filtering also created a genius for euphemism on the part of spammers. A few of many, many terms in recent spam messages for promises about the male anatomy, which almost approach poetic allusion: “your engine in pants”/“Drilling machine”/“‘in-work-condition’ tool”/“Crazy penetrator!”/“Meatstick-champion!”/“Your nighttime failure”/“Make your volcano erupt over lion”/“the thing as you deserve it” and many others.) Litspam remains something remarkable and special as an unintended consequence within the unintended consequence of spam: a loop of mechanical readers and mechanical writers generating texts from within the uncanny valley identified by Masahiro Mori. It is the chance meeting of Ulysses and a telephone interchange, as strange to our eyes as a pedantic speech from an ape, a tale told by a robot.