“The man who wrote the note is a German. Do you note the peculiar construction of this sentence?” These were the words of Sherlock Holmes in “A Scandal in Bohemia,” analyzing a note from a client, unmasking the incognito King of Bohemia, and, incidentally, establishing himself as a brilliant literary analyst. It is impossible to keep a secret from the legendary Sherlock Holmes, who can infer an ocean from a drop of water. Just as the paper would have carried the marks of the royal fingers, to the skilled reader the writing carried the marks of the royal mind.
Fiction has recently become fact with the improving science of stylometry, the study of writing style. In 1964, Frederick Mosteller and David Wallace published a three-year study of the distribution of common words in the Federalist Papers and showed that the writing styles of Alexander Hamilton and James Madison differed in subtle ways. For example, only Madison used the word “whilst” (Hamilton used “while” instead). More subtly, while both Hamilton and Madison used the word “by,” Madison used it much more frequently, enough that you could guess who wrote which papers by looking at how frequently the word was used. Mosteller and Wallace took this work to its conclusion, and were able to show that certain “disputed” papers, claimed by both Hamilton and Madison, were overwhelmingly likely to have come from Madison’s pen. Today, computers can do this type of analysis in seconds, whether to uncover a case of murder-disguised-as-suicide, study an anonymous medieval poem, resolve disputes about authorial credit, or even help a refugee win political asylum. In the last case, for example, a critic of a repressive foreign government claimed asylum on the basis of articles he had written and published online. The problem, though, was that the articles had been published anonymously. Anonymity wouldn’t necessarily stop a repressive secret service, in a place where mere suspicion is enough for imprisonment. But this technology was able to convince the immigration judge that he had written the documents in question, and hence to let him stay.
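To make the word-frequency idea concrete, here is a minimal sketch in Python of how one might measure how often a few marker words appear per thousand words of text. It is not Mosteller and Wallace's method, which relied on a much more careful statistical analysis; the function name `rate_per_thousand`, the simple tokenizer, and the two sample sentences are all assumptions made up for illustration, and the sentences are not quotations from the Federalist Papers.

```python
import re
from collections import Counter


def rate_per_thousand(text, words):
    """Return occurrences per 1,000 tokens for each word in `words`.

    Hypothetical helper for illustration; tokenization is a crude
    lowercase split on letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {w: 1000 * counts[w] / total for w in words}


# Invented sentences, used only to show the mechanics.
sample_a = ("The measure was approved by the council and signed by "
            "the governor whilst the assembly sat.")
sample_b = ("The measure passed the council while the assembly sat "
            "and the governor signed it.")

markers = ["by", "while", "whilst"]
print(rate_per_thousand(sample_a, markers))
print(rate_per_thousand(sample_b, markers))
```

With real texts of realistic length, rates like these for many common words, compared across authors whose work is known, are the kind of evidence that can point toward one writer rather than another. Two sentences, of course, are far too little to conclude anything.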
Over the past decade, I have developed a computer program to do this sort of analysis of writing style, based on literally millions of different features. This program will take a sample of writing and determine, on the basis of similarity, who among a set of authors was most likely to have written that sample. In July, I received an email from a reporter for London’s Sunday Times asking if I could help solve a mystery. The reporter had received a tip that J. K. Rowling had secretly written a novel under a pen name: The Cuckoo’s Calling, by Robert Galbraith, who was described as a former member of the Royal Military Police, and whose novel had grown “directly out of his own experiences and those of his military friends.” The tip was at least plausible. Rowling and Galbraith had the same agent and editor. The book was unusually accomplished for a supposed first-time novelist. And Galbraith, a man who had ostensibly spent years in uniform, was surprisingly good at describing women’s clothing. But hard evidence was still lacking. The reporter wanted to know what the computer program could determine.