NO ONE DOUBTS that the words we write or speak are an expression of our inner thoughts and personalities. But beyond the meaningful content of language, a wealth of unique insights into an author’s mind is hidden in the style of a text, in such elements as how often certain words and word categories are used, regardless of context.

It is how an author expresses his or her thoughts that reveals character, asserts social psychologist James W. Pennebaker of the University of Texas at Austin. When people try to present themselves a certain way, they tend to select what they think are appropriate nouns and verbs, but they are unlikely to control their use of articles and pronouns. These small words create the style of a text, which is less subject to conscious manipulation.

Pennebaker’s statistical analyses have shown that these small words may hint at the healing progress of patients and give us insight into the personalities and changing ideals of public figures, from political candidates to terrorists. “Virtually no one in psychology has realized that low-level words can give clues to large-scale behaviors,” says Pennebaker, who, with colleagues, developed a computer program that analyzes text, called Linguistic Inquiry and Word Count (LIWC, pronounced “Luke”). The software has been used to examine other speech characteristics as well, tallying up nouns and verbs in hundreds of categories to expose buried patterns.

Character Count
Most recently, Pennebaker and his colleagues used LIWC to analyze the candidates’ speeches and interviews during last fall’s presidential election. The software counts how many times a speaker or author uses words in specific categories, such as emotion or perception, and words that indicate complex cognitive processes. It also tallies up so-called function words such as pronouns, articles, numerals and conjunctions. Within each of these major categories are subsets: Are there more mentions of sad or happy emotions? Does the speaker prefer “I” and “me” to “us” and “we”? LIWC answers these quantitative questions; psychologists must then figure out what the numbers mean. Before LIWC was developed in the mid-1990s, years of psychological research in which people counted words by hand established robust connections between word usage and psychological states or character traits.
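
The category counting described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not LIWC itself: the category names, the tiny word lists and the function names are all assumptions made up for this example, and the real program's dictionary is far larger.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the LIWC dictionary.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "first_person_plural": {"we", "us", "our", "ours", "ourselves"},
    "articles": {"a", "an", "the"},
    "exclusive": {"but", "except", "without"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text):
    """Return, for each category, the share of all words that fall in it."""
    words = tokenize(text)
    total = len(words)
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    # Report rates rather than raw counts so texts of different lengths
    # can be compared.
    return {name: (counts[name] / total if total else 0.0)
            for name in CATEGORIES}


# Invented sample sentence, not drawn from any analyzed speech.
sample = "I think we should go, but the weather worries me."
rates = category_rates(sample)
print(rates)
```

As in the article, the program only supplies the numbers; deciding what a high or low rate means is left to the researcher.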

The political candidates, for example, showed clear differences in their speaking styles. John McCain tended to speak directly and personally to his constituency, using a vocabulary that was both emotionally loaded and impulsive. Barack Obama, in contrast, made frequent use of causal relationships, which indicated more complex thought processes. He also tended to be more vague than his Republican rival. Pennebaker’s team has posted a far more in-depth breakdown, including analyses of the vice presidential candidates, at

Skeptics of LIWC’s usefulness point out that many of these characteristics of McCain’s and Obama’s speeches could be gleaned without the use of a computer program. When the subjects of analysis are not accessible, however, LIWC may provide a unique insight. Such was the case with Pennebaker’s study of al Qaeda communications. In 2007 he and several co-workers, under contract with the FBI, analyzed 58 texts by Osama bin Laden and Ayman al-Zawahiri, bin Laden’s second in command.

The comparison showed how much pronouns are able to disclose. For example, between 2004 and 2006 the frequency with which al-Zawahiri used the word “I” tripled, whereas it remained constant in bin Laden’s writings. “Normally, higher rates of ‘I’ words correspond with feelings of insecurity, threat and defensiveness. Closer inspection of his ‘I’ use in context tends to confirm this,” Pennebaker says.
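
A change such as a tripling of “I” use can be expressed as a ratio of rates between two periods. The sketch below shows one plausible way to compute it; the word list, the function names and both sample sentences are invented for illustration and have nothing to do with the texts the researchers analyzed.

```python
import re

# Assumed list of first-person singular words; an illustration only.
I_WORDS = {"i", "me", "my", "mine", "myself"}


def i_word_rate(texts):
    """Share of all words across the given texts that are 'I' words."""
    words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    if not words:
        return 0.0
    return sum(w in I_WORDS for w in words) / len(words)


def rate_ratio(earlier_texts, later_texts):
    """How many times higher the 'I' rate is in the later texts.

    Returns None when the earlier rate is zero, because the ratio
    is undefined in that case.
    """
    base = i_word_rate(earlier_texts)
    if base == 0:
        return None
    return i_word_rate(later_texts) / base


# Invented sample sentences standing in for two time periods.
earlier = ["I believe we will hold our ground and they will see."]
later = ["I said I would answer them, and my reply stands today."]
ratio = rate_ratio(earlier, later)
print(ratio)
```

Using rates instead of raw counts matters here: a speaker who simply writes more in a later period would otherwise appear to use “I” more often.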

Other studies have shown that words that are used to express balance or nuance (“except,” “but,” and so on) are associated with higher cognitive complexity, better grades and even the truthfulness with which facts are reported. For bin Laden, analysis showed that the thought processes in his texts had reached a higher level over the years, whereas those of his lieutenant had stagnated.

Healing Words
This power of statistical analysis to quantify a person’s changing language use over time is a key advantage of programs such as LIWC. In 2003 Pennebaker and statistician R. Sherlock Campbell, now at Yale University, used a statistical tool called latent semantic analysis (LSA) to study the diary entries of trauma patients from three earlier studies, looking for text characteristics that had changed in patients who were convalescing and met rarely with their physician. Again, the researchers showed that content was unimportant. The factor that was most clearly associated with recovery was the use of pronouns. Patients whose writings changed perspective from day to day were less likely to seek medical treatment during the follow-up period.

It may be that patients who describe their situation both from their own viewpoint and from the perspective of others recover more quickly from traumatic experiences—a variation on the already well-established idea that writing about negative experiences is therapeutic. Or perhaps the LSA simply detected the patients’ recovery as reflected by their writing but not brought about by it—in that case, programs such as LIWC could aid doctors in diagnosing illness and gauging treatment progression. Researchers are currently investigating many other patient groups, including those with cancer, mental illness and suicidal tendencies, using LIWC to uncover clues about their emotional well-being and their mental state.

Although the statistical study of language is relatively young, it is clear that analyzing patterns of word use and writing style can lead to insights that would otherwise remain hidden. Because these tools offer predictions based on probability, however, such insights will never be definitive. “In the final analysis, our situation is much like that of economists,” Pennebaker says. “It’s too early to come up with a standardized analysis. But at the end of the day, we all are making educated guesses, the same way economists can understand, explain and predict economic ups and downs.”

He Said, She Said
The way we write and speak can reveal volumes about our identity and character. Here is a sampling of the many variables that can be detected in our use of style-related words such as pronouns and articles:

  • Gender: In general, women tend to use more pronouns and references to other people. Men are more likely to use articles, prepositions and big words.
  • Age: As people get older, they typically refer to themselves less, use more positive-emotion words and fewer negative-emotion words, and use more future-tense verbs and fewer past-tense verbs.
  • Honesty: When telling the truth, people are more likely to use first-person singular pronouns such as “I.” They also use exclusive words such as “except” and “but.” These words may indicate that a person is making a distinction between what they did do and what they did not do—liars often do not deal well with such complex constructions.
  • Depression and suicide risk: Public figures and published poets use more first-person singular pronouns when they are depressed or suicidal, possibly indicating excessive self-absorption and social isolation.
  • Reaction to trauma: In the days and weeks after a cultural upheaval, people use “I” less and “we” more, suggesting a social bonding effect.

Note: This article was originally printed with the title, "You Are What You Say."