They call it culturomics: the obvious play on the word “genomics” looks at trends in human thought and culture. But scientists say culturomics has been hampered by a lack of quantitative data. So researchers at Harvard, along with Google, Encyclopedia Britannica, and the American Heritage Dictionary, have come up with a new tool.
It’s a database of 5.2 million books, published since the year 1500. That’s four percent of all the books ever published, with a total of 500 billion words. The focus is on English language culture, so three quarters of the books are in English.
Among the first findings of the research, published in the journal Science [Jean-Baptiste Michel et al., "Quantitative Analysis of Culture Using Millions of Digitized Books"]: about 8,500 new words enter the English language annually. But many of them don’t end up in dictionaries. And about fame: actors become famous around age 30, writers around 40, and politicians around 50. But the fame of politicians can eventually exceed that of actors.
A Google tool based on this data, called the Books Ngram Viewer, is available: users can track the usage and frequency of a word or phrase over the past few centuries. Thus, we can watch the fall and rise of Melville. And soon the rise and fall of Snooki.
[The above text is an exact transcript of this podcast.]