“NEW TWIST IN AFFECT”: SPLOGGING
THE POPULAR VOTE
Screams of frightened women, choked Sobs, truly communicative Tears, little brusque Laughs . . . Howls, Chokings, Encore!, Recalls, silent Tears, Threats, Recalls with additional Howls, Pounding of approbation, uttered Opinions, Wreaths, Principles, Convictions, moral Tendencies, epileptic Attacks, Childbirth, Insults, Suicides, Noises of discussions (Art-for-art’s-sake, Form and Idea), etc.
—Villiers de l’Isle-Adam, explaining some of the settings of his automatic theatrical public, “La machine à gloire,” Contes cruels, 1874
Terra’s blog is titled “Tyler tyler honored with rd Jonas E,” and subtitled “Nine State Regulators Investigating Auction Bonds, Group Says.The City of Tyler Traffic Engineering Department installed the City.” One of her posts on July 16, 2008, titled “Tyler State Trial Law Litigation Lawyer Attorney Robert M.,” opens:
Our web servers cannot find the page or file you asked for. Best choice of the month: _blood pressure.
Button to return to the previous page. Astronomers on verge of finding Earth’s twin. The cost of having a product registered is now estimated to be around million. Century, the public may demand that the federal _Bar board massachusetts overseer_ register products that are effective against bed bugs.
My mother lives in _Affect metoprolol_ side housing and has been dealing with this for about a year now.
This continues for another 1,300 words, and Terra posted three times on July 16 alone. In June, she posted 160 entries, about five a day, each from several hundred to a few thousand words. This is not her only blog, either; according to her Blogger profile, she has eleven others, with titles like “S first try boosted the team as Biko” and “Only two USB fujitsu three is always.” The bizarre stop-start rhythm of her posts makes them difficult to stop quoting. Their language has no heritage in oral speech and lacks the syntactic edges that imply beginning and ending. As with litspam messages, the jolting movement from paragraph to paragraph feels much closer to channel-surfing cable television than to any literary medium: “Oprah ends three weeks of vegan eating. Astronomers on verge of finding Earth’s twin. Seeing more people living out of their cars.” Then comes a sudden transition into the diaristic, with “I” sentences, opinions, and the rhythmic clauses characteristic of online thinking-aloud: “I don’t think it’s a numbers game, but I think whatever view you end up with, it does not have to be a majority point of view, that reasons have weight, not just adding up whoever agrees with you.” Her posts are full of links, most of which go to similar blogs: vollybllgrl’s blog “It was a powerline that brought down a Black Hawk black last night in northeast Alabama,” for example, or a post on the blog “Default title” by manning6029 with the oddly Ballardian phrase “Picture of blonde girl in Morocco is new twist in Affect.”
Terra is, of course, a robot, as are vollybllgrl, manning6029, Geriut of “The most dazed part of our democracy,” etylycigob of “The Triad Lady Knights cross country team had a kylee season,” and countless more. They are producing splogs, or spam blogs—one of the forms search engine spam took in response to the PageRank strategy of Google and its third-generation search imitators. Splogs now account for more than half of all blogs. (In comparison, second-generation nonblog spam pages, stuffed with keywords and links, form roughly 8 percent of all web pages being indexed.) The patterns of data from splogs and spings (spam pings—a link signal sent by a blog, presented on the linked blog like a comment, and theoretically driving traffic and PageRank) map with striking accuracy onto the patterns of email spam, with similar spikes—around the holidays, for example—and mysterious troughs, during which some waning of the moon causes output to die down for a few days or weeks. How do they work?
As the PageRank system became more widely known and understood, Google gathered market share and other search engines began to follow its more refined ranking model. (Google’s ranking system is considerably more elaborate than the bare bones of PageRank, of course, and continues to grow in parameters and complexity to this day, but the basic outline is what search engine spammers were responding to—and that suffices to understand their methods.) A variety of strategies developed as websites with a high PageRank became kingmakers. A link from them could move a page into the top ten or top three returns of the different search sites, boosting attention and revenue. The theoretical notion of a “reputation economy” was being put thoroughly into practice. Link trading began as a second-generation approach, along with requests for a positive mention and a clickable link, and third-generation search amplified these methods. Sites issued “Best of the Web” awards, “Top 100 Sites” awards, and so forth; awards included a badge, a little image, and a snippet of code to be copied into the winning site—a snippet that included a link to the award-giving site. The human user saw a little badge image, but the search engine spider saw an outgoing link, that is, an endorsement. New habits of use and etiquette appeared among ordinary users of the web: comments on blog posts included the commenters’ websites along with their names, to rack up another link. Posting something without including a “via” link to the person you got it from—the “via” being an additional outbound link as a kind of thanks for using their discovery—became increasingly rude, the sign of an uncouth person.
These techniques only scratched the surface of the transformation in spam practices. PageRank tried to solve the problem of relevance and the problem of spam in one stroke by incorporating the expression of social relationships, communities, and human choices. In theory, social structures are much harder to game for spam purposes, but their robot-readable expression online is not. The award badges, a holdover from second-generation practice, were one strategy among many. Domain flooding, for example, meant creating tens or hundreds of sites that redirected to the target site. Link farms or “mutual admiration societies” arose: huge, link-heavy blocs of sites, each page linking to many of the others, with their accumulated “votes” rented out. They charged for outbound links from the farm, like penurious aristocrats charging to have their renowned ancient name and reputation associated with some unknown member of the nouveau riche. In the third generation, spam began to move into the creation of its own social graph, producing the appearance, if not the reality, of its own society.
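The arithmetic behind rented “votes” can be made concrete with a toy model. What follows is a minimal sketch in Python, assuming the simplified, textbook form of PageRank (power iteration with a damping factor of 0.85) rather than anything Google actually runs; the page names and the `add_link_farm` helper are hypothetical. Two pages, `target` and `rival`, start in identical positions in a small honest web; a farm of cross-linked pages then endorses only `target`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (a simplification, not Google's system).

    `links` maps each page to the list of pages it links to. A page with no
    outbound links spreads its score evenly across every page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            recipients = targets if targets else pages
            share = damping * rank[p] / len(recipients)
            for t in recipients:
                new_rank[t] += share  # each outbound link acts as a "vote"
        rank = new_rank
    return rank


def add_link_farm(links, target, size):
    """Hypothetical helper: add `size` farm pages that all link to `target`
    and to one another, the "mutual admiration society" pattern."""
    farmed = {p: list(ts) for p, ts in links.items()}
    farm = [f"farm{i}" for i in range(size)]
    for f in farm:
        farmed[f] = [target] + [g for g in farm if g != f]
    return farmed


# `target` and `rival` occupy identical positions in a small honest web.
web = {
    "a": ["b", "target", "rival"],
    "b": ["a"],
    "target": ["a"],
    "rival": ["a"],
}
before = pagerank(web)
after = pagerank(add_link_farm(web, "target", 20))
print(f"before: target={before['target']:.4f} rival={before['rival']:.4f}")
print(f"after:  target={after['target']:.4f} rival={after['rival']:.4f}")
```

In this toy model the farm pages end up holding most of the total score among themselves, and each passes only a twentieth of its score to `target`; that is still enough to lift `target` above a rival it exactly matched before.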
Generating the expression of a nonexistent social phenomenon required the creation of much more content than previous search-spamming projects, while avoiding certain telltale signs of robots at work. Old-fashioned attempts to artificially alter the link graph had signature patterns. The bulky shape of heavy cross-linking within a group of sites, all with only a few inbound links (because spam pages are lonesome), created little islands of intense self-endorsement with no outside involvement. To the right analytic tools, it’s a pattern as obvious as the newspaper ads taken out by vanity publishing houses for their new releases, with blurbs solely from friends and other writers in the same situation. Search engines could correct for these islands with modifications to the algorithm. Also, although complete web pages could be almost entirely automatically generated, they still required the purchase and maintenance of a stable of domain names and hosting plans with a service provider, which could get expensive. What was needed for third-generation search spam was a way to very rapidly generate new content, seeded with links, scattered across a wide variety of locations online, as in a genuine community.
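The telltale island shape can be expressed as a simple measurement. The sketch below, in Python, assumes a hypothetical heuristic rather than any search engine’s real algorithm: for a candidate group of pages, it counts how many of the links pointing into the group come from inside the group itself. A farm of mutually linking pages with few outside admirers scores close to 1.

```python
def self_endorsement_ratio(links, group):
    """Fraction of links into `group` that originate inside `group`.

    `links` maps each page to the pages it links to. Returns 0.0 when no
    page in the group receives any links at all.
    """
    group = set(group)
    internal = external = 0
    for source, targets in links.items():
        for t in targets:
            if t not in group:
                continue
            if source in group:
                internal += 1
            else:
                external += 1
    total = internal + external
    return internal / total if total else 0.0


# Hypothetical link graph: a five-page farm plus a small ordinary cluster.
farm = [f"farm{i}" for i in range(5)]
links = {f: [g for g in farm if g != f] for f in farm}  # 5 x 4 = 20 internal links
links["outsider"] = ["farm0"]                           # one inbound link from outside
links["news"] = ["blog1", "blog2"]
links["blog1"] = ["blog2", "news"]
links["blog2"] = ["news", "wiki"]
links["wiki"] = ["news", "blog1"]

print(round(self_endorsement_ratio(links, farm), 3))              # 0.952
print(round(self_endorsement_ratio(links, ["blog1", "blog2"]), 3))  # 0.25
```

A fixed threshold on a ratio like this would also misfire on tightly knit genuine communities, which is one reason a working ranking system would need many more signals than this one.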
In 1999, a company called Pyra Labs launched a service called Blogger. The concept of a weblog—a series of dated entries displayed in reverse chronological order, newest first—is pleasantly intuitive and diarylike; the concept of Blogger, as of so many related systems from Flickr to Wikipedia, was to provide people with an equally intuitive means for publishing their content. It was remotely hosted, so you did not have to own a website domain name or pay for hosting; many of its processes were automated, so you did not have to design it or do any coding behind the scenes; and it had a useful and increasingly sophisticated Application Programming Interface (API). An API is a set of requests that a web service can support from other programs—tools that programs can use to engage with the service. An API makes it easier to automate the publishing process, and on a platform like Blogger (which was acquired by Google in 2003), automated publishing makes it possible to manage a great deal of content with very little effort. You can delegate the account creation process, the choice of settings, the ratio of outbound links to content, and the posting frequency to a program. The missing piece here is the words for the blog, but words come ready-made in the form of RSS feeds.
RSS (the acronym originally stood for RDF Site Summary but has changed to mean the more explanatory Really Simple Syndication) is a format closely associated with the development of blogs; it makes new posts or other changes on a site available in forms that are easy to use. Feed readers can gather the latest entries from RSS-enabled sites, material can be forwarded to mobile devices, and a page can feature the headlines or recent posts from other sites. From the perspective of a spam blogger, this feature is like a faucet for words. Samuel Beckett once said of the Cut-Up technique of William Burroughs and Brion Gysin, “That’s not writing, it’s plumbing”—a prescient remark now that we have a means of writing that really is like plumbing: lay the pipes, the tank, the cut-off valves, and then open the taps and leave the room. A splog production system will pull in RSS feeds from other blogs and news sources, chop them up and remix them according to rules, insert relevant links, and post the resulting material, hour after hour and day after day, with minimal human supervision. Terra put up a new post while this section was being written, titled “After it became an tyler city”: “A witness reports a nun went crazy upon realizing that the man next to her in line was the Epinephrine frontman. Ghost Town Poster Disappoints Me, Gervais. Some things, like gravity, must also be close to.” And so on, and on, and on.
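The “plumbing” can be sketched in a few lines. The Python below is a hypothetical illustration, not the code of any real splog system: it assumes the feed entries have already been fetched as plain strings, splits them into rough sentences, shuffles the pieces, and wraps the first mention of an assigned keyword in each chosen sentence in a link to a placeholder URL. The function name `cut_up_post` and the URL are invented for the example.

```python
import random
import re


def cut_up_post(feed_items, keyword, link_url, sentences_per_post=6, seed=None):
    """Hypothetical splog step: remix feed text into a post seeded with links."""
    rng = random.Random(seed)
    # Chop every feed item into rough sentences at terminal punctuation.
    fragments = []
    for item in feed_items:
        for sentence in re.split(r"(?<=[.!?])\s+", item):
            if sentence.strip():
                fragments.append(sentence.strip())
    # Reorder the fragments at random, discarding the original sequence.
    rng.shuffle(fragments)
    chosen = fragments[:sentences_per_post]
    # Link the first occurrence of the keyword (any case) in each fragment.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    linked = [
        pattern.sub(lambda m: f'<a href="{link_url}">{m.group(0)}</a>', s, count=1)
        for s in chosen
    ]
    return " ".join(linked)


feeds = [
    "Tyler wins the regional final. Rain is expected today!",
    "Astronomers on verge of finding Earth's twin. Tyler city council meets.",
]
print(cut_up_post(feeds, "tyler", "http://example.com/", sentences_per_post=4, seed=1))
```

Run on a schedule against a changing set of feeds, a loop like this produces the stop-start, channel-surfing texture of Terra’s posts without any human in the room.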
Not all splogs sound alike: some are built on the “excerpt” model, with fragments of about 350 characters taken whole from other blogs. These fragments are chosen from posts that are performing particularly well in Google, with good keyword metrics. The goal of these splogs is to make money through contextual advertising, in which page views and the occasional clickthrough are the best that can be hoped for. They form parasitic relationships with the authors they excerpt. One of the Internet’s many, many interchangeable product-review bloggers notes that being excerpted by splogs is a sign that you are choosing the right topics and words, because the splogs are copying you; if you want to attract more of them, because they offer backlinks to your site with their excerpts, “create posts with a popular keyword, like iPhone for example.” The behavior of excerpt splogs is straightforward: they are drawn to the right language like ants to honey.
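The excerpt model is simple enough to sketch. The Python below assumes the roughly 350-character figure given above and a hypothetical source URL; it cuts a post at the last word boundary before the limit and appends the backlink that the product-review blogger counts as the benefit of being excerpted. The function `make_excerpt` is an illustrative invention, not taken from any real splog software.

```python
def make_excerpt(post_text, source_url, limit=350):
    """Hypothetical excerpt-splog step: about `limit` characters plus a backlink."""
    text = " ".join(post_text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        snippet = text
    else:
        cut = text[:limit]
        space = cut.rfind(" ")
        # Back up to the last space so no word is split mid-way.
        snippet = (cut[:space] if space > 0 else cut) + "..."
    return f'{snippet} <a href="{source_url}">Read more</a>'


post = "The new iPhone ships next week with a faster chip and a better camera. " * 10
print(make_excerpt(post, "http://example.com/iphone-review"))
```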
Splogs built on a full content model, as Terra’s is, play a larger and more subtle game, cross-linking in their hundreds and thousands to distort the shape of the web. Each splog is assigned a set of keywords and feeds from which to pull related text. This is why Terra’s blog sounds like the product of someone with a feverish, pathological obsession with Tylers. It pulls a set of RSS feeds and headlines based around “Tyler” as the keyword, with a few others for variation; thus post after post reports from a strange universe where several cities and schools named Tyler, the director Tyler Perry, the economist Tyler Cowen (who blogs), and posts and news articles that mention Tylers are all of equal significance. With experience, one begins to see the patterns. “The TV adaptation of the big screen movie features Nicole Ari Parker, Vanessa Williams and Malinda Williams” refers to one of Perry’s projects; “The sociologist Max Weber introduced a distinction between consumer” is a broken fragment from Cowen. Interspersed with this Tyler compulsion is the jarring appearance of the functional language of web design, as in “Button to return to the previous page,” used within paragraphs of first-person sentences.
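The Tyler effect follows from matching on strings rather than meanings. A minimal sketch, assuming a hypothetical per-splog configuration and whole-word keyword matching: every headline that contains the assigned keyword is kept, so the city, the film director, and the economist all pass the same test. The configuration dictionary and `select_items` are invented for illustration.

```python
import re

# Hypothetical configuration for a single splog.
SPLOG_CONFIG = {"name": "Terra", "keywords": ["Tyler"]}


def select_items(headlines, keywords):
    """Keep headlines containing any keyword as a whole word, ignoring case."""
    patterns = [re.compile(rf"\b{re.escape(k)}\b", re.IGNORECASE) for k in keywords]
    return [h for h in headlines if any(p.search(h) for p in patterns)]


headlines = [
    "Tyler Perry announces a new television series",
    "Tyler Cowen on a distinction drawn by Max Weber",
    "City of Tyler traffic engineering update",
    "Oprah ends three weeks of vegan eating",
]
for h in select_items(headlines, SPLOG_CONFIG["keywords"]):
    print(h)
```

Nothing in the filter distinguishes one Tyler from another, which is exactly why they all arrive in the posts with equal weight.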
How far removed language is at this point from anything meant for humans! Terra’s blog links to other splogs, which link to still more, forming an insular community on a huge range of sites—a kind of PageRank greenhouse that is not in itself meant to be read by people. A human seeing a splog post immediately knows that something is wrong and can flag the splog to be taken down by the network administrator. Splogs of Terra’s type are not meant to interact with humans at all; they are created solely for the benefit of search engine spiders. They do not imitate individual humans—they are not the computational equivalent of “George Kaplan,” the nonexistent secret agent whose train tickets and hotel rooms in North by Northwest are meant to convey a particular human life. They only work from a distance, appearing to be groups of people, with the language and links functioning in aggregate. If splogs are like any previous technological artifact, they are akin to the “QL” sites constructed during the Second World War to mislead night bombing runs: rickety structures of pipe, wooden frames, wire netting, and lights that, seen from far enough away, look like a town, with railway signals, lamps, and open doors. Taken in statistical aggregate and subjected to algorithmic analysis, splogs resemble the patterns of a thriving community. Their posts are pitched at precisely the level of complexity the spider requires to accept their input as human, and they adapt human text for other machines to read and act on. Influence on humans is a second-order result.