So I admit I looked for my name online, an egosurfing trip to see how well upstart search engine Bing compared with reigning champ Google. That's when I discovered my evil twin.
Apparently an Internet bot stole my identity and used it to set up a blog and post spam under my name—with entries, at times, supposedly postmarked from the future. Quick, where's Philip K. Dick when you need him?
Here's one of my purported posts from the future, as it appeared in late August on free Web site host Tripod.com (the site has since been taken down):
Nov 23, 2009 by by charles q choi
Apprentice would be if you arent. Hardly use the front or hardly. Repair shops that the shops that carry out inspections and able.
Spam blogs such as this one started becoming a problem in late 2005, "when it became clear their numbers exceeded the number of real blogs," says David White, vice president of publishing for Technorati, the first blog search engine. Of the roughly two million pings Technorati gets per hour—messages that blogs send out so search engines can learn of newly published posts—more than 90 percent are from these spam blogs, or "splogs." (This high rate is a result of spam blogs updating more frequently than real blogs do. Media services firm Universal McCann estimates that, all told, 184 million blogs currently exist worldwide.)
Splogs are typically created automatically by commercially available software. Judging by certain similarities between these blogs, a large fraction are probably created and maintained by a small number of active spammers, "maybe a couple of dozen," explains computer scientist Tim Finin at the University of Maryland, Baltimore County, in Catonsville.
As with most spam, the purpose behind these blogs is greed. Spammers often create these splogs and populate them with advertisements in the hope that some hapless user clicks on them and sends revenue the spammer's way. Spam blogs can also boost the prominence of other pages in search engine returns by linking to them, a service that spammers can sell.
"One of the fastest way to get content onto the Web is from blogs, so it's not surprising that spam targets blogs," Finin says. "If I make a post on a blog, it gets indexed into Google's blog search in about five minutes, and then gets pushed out into Google's regular results 10 minutes later."
The high cost of splogs
As a result, spam blogs can waste valuable disk space and bandwidth and harm search engines by degrading their results. Malicious links on these blogs can also steal data or exploit vulnerabilities on a computer if users click on them. Market analyst firm Ferris Research of San Francisco estimates that spam will cost the world $130 billion this year in lost productivity and anti-spam measures, with $42 billion of that in the U.S. alone.
As to why spam blogs are often filled with nonsense, "a page will get penalized in terms of their ranking by search engines if they are duplicating content wholly from somewhere else," White explains. "So spammers might be avoiding this problem by making spam blog text unique through random combinations of words and terms."
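To see why shuffled words could slip past a duplicate-content check, here is a minimal sketch of one common way to compare two texts: break each into overlapping three-word runs ("shingles") and measure how many they share. This is an illustration only; the article does not say how any search engine actually detects duplicates, and the function names `shingles` and `jaccard` are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word runs of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle count as identical
    return len(sa & sb) / len(sa | sb)


original = "repair shops carry out inspections every year"
copied = "repair shops carry out inspections every year"
scrambled = "year shops every inspections out repair carry"

print(jaccard(original, copied))     # a wholesale copy shares every shingle
print(jaccard(original, scrambled))  # the same words reshuffled share none
```

Under this assumed measure, a straight copy is trivially flagged, while the same vocabulary in random order looks like entirely new text.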
Nowadays, Finin adds, "spam blogs often capture text from the Web. It's really easy to get a program to plagiarize other blogs, which often offer their content up as RSS feeds." Neither Finin nor White could explain why some posts seemed to originate from the future, but Finin hazards a guess: "I think many spam blogs are set up [by] people who are not extremely competent."
So a bot probably just chose my name at random for the blog? That's a relief.
"Actually, I'm pretty sure it was intentional," White explains. "We do get reports of bloggers finding another blog impersonating them. They steal content from legitimate sites to make their own more relevant to searches."
Well, so much for curbing my paranoia.
"It's probably not a threat to your career," Finin consoles. "You should be more concerned that your material is being misused for someone else's gain and potentially associated with things that you don't want to be linked with. Our research blog was compromised by code injected into it—the version seen by search engines but not humans had all these terms associated with gay sex products and services."
Ironically, when it comes to fighting spam bots, the answer might be more bots—more specifically, artificially intelligent bots. "The main technique today to identify these blogs is machine learning, an artificial intelligence technique," Finin explains. "The trick is to identify the fingerprints of these robot blogs—what words they use, their patterns of updating, the ads they host."
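As a rough illustration of the word-based fingerprinting Finin describes, the sketch below trains a tiny naive Bayes classifier on a handful of labeled posts and then guesses whether new text looks like a splog. It is a toy under stated assumptions, not the researchers' system: the training sentences are invented, the class name `NaiveBayes` is hypothetical, and it looks only at words, ignoring update patterns and ads.

```python
import math
from collections import Counter

LABELS = ("splog", "blog")


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Toy word-count classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text, label):
        # Vocabulary is every word seen in training, across both labels.
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total = sum(self.word_counts[label].values())
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            log_prob += math.log((count + 1) / (total + len(vocab)))
        return log_prob

    def classify(self, text):
        return max(LABELS, key=lambda label: self.score(text, label))


# Invented training examples, for illustration only.
model = NaiveBayes()
model.train("cheap repair shops cheap inspections buy now", "splog")
model.train("buy cheap pills now click here", "splog")
model.train("today I wrote about my garden and the weather", "blog")
model.train("our research group published a new paper today", "blog")

print(model.classify("buy cheap repair now"))
print(model.classify("my garden and the weather today"))
```

A real detector would need far more training data and, as Finin notes, more kinds of evidence than vocabulary alone.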
Of course, spammers constantly refine their techniques, "and so your anti-spam program might fall behind," Finin adds. To make sure their bots stay on top, Finin and his colleagues have devised a strategy where multiple bots check one another's results to make sure they are keeping up with spam.
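One simple way to picture bots checking one another is a majority vote that also reports which detector disagreed, since a detector that keeps getting outvoted may be the one falling behind. The sketch below is an assumption-laden illustration, not the strategy Finin's group built: the three detectors, their thresholds, and the `cross_check` function are all hypothetical.

```python
def by_words(blog):
    # Hypothetical rule: more than 20 percent of words are known spam terms.
    spam_terms = {"cheap", "buy", "click"}
    words = blog["text"].lower().split()
    hits = sum(word in spam_terms for word in words)
    return hits / max(len(words), 1) > 0.2


def by_update_rate(blog):
    # Hypothetical rule: no human posts more than 50 times a day.
    return blog["posts_per_day"] > 50


def by_ads(blog):
    # Hypothetical rule: more than 10 ads on a page is suspicious.
    return blog["ad_count"] > 10


DETECTORS = {"words": by_words, "update_rate": by_update_rate, "ads": by_ads}


def cross_check(blog, detectors=DETECTORS):
    """Return the majority verdict and the names of detectors that disagreed."""
    votes = {name: detect(blog) for name, detect in detectors.items()}
    verdict = sum(votes.values()) * 2 > len(votes)
    dissenters = sorted(name for name, vote in votes.items() if vote != verdict)
    return verdict, dissenters


suspect = {
    "text": "Apprentice would be if you arent hardly use the front",
    "posts_per_day": 200,
    "ad_count": 25,
}
print(cross_check(suspect))
```

Here the word-based detector misses the nonsense post while the other two catch it, which is the kind of disagreement that could signal a detector needs retraining.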
"There's probably always going to be an arms race between spammer and anti-spammer," Finin says. And the battle will likely persist. "I just see spam as human nature," he notes. "There are going to always be some people that are going to try and fool others for some selfish goal."
Human nature indeed—or, perhaps, inhuman nature.