What if you were asked to describe images you saw in an inkblot or to invent a story for an ambiguous illustration--say, of a middle-aged man looking away from a woman who was grabbing his arm? To comply, you would draw on your own emotions, experiences, memories and imagination. You would, in short, project yourself into the images. Once you did that, many practicing psychologists would assert, trained evaluators could mine your musings to reach conclusions about your personality traits, unconscious needs and overall mental health.

But how correct would they be? The answer is important because psychologists frequently apply such "projective" instruments (which present people with ambiguous images, words or objects) as components of mental assessments, and the outcomes can profoundly affect the lives of the respondents. The tools often serve, for instance, as aids in diagnosing mental illness, in predicting whether convicts are likely to become violent after being paroled, in evaluating the mental stability of parents engaged in custody battles, and in discerning whether children have been sexually molested.

To find out, we have reviewed a large body of research into how well projective methods work, concentrating on three of the most extensively used and best-studied instruments. Overall, our findings are unsettling.

Butterflies or Bison?
The famous Rorschach inkblot test--which asks people to describe what they see in a series of 10 inkblots--is by far the most popular of the projective methods, given to hundreds of thousands, or perhaps millions, of people every year. The research discussed below refers to the modern, rehabilitated version, not to the original construction, introduced in the 1920s by Swiss psychiatrist Hermann Rorschach.

The initial tool came under severe attack in the 1950s and 1960s, in part because it lacked standardized procedures and a set of norms (averaged results from the general population). Standardization is important because seemingly trivial differences in the way an instrument is administered can affect a person's responses to it. Norms provide a reference point for determining when someone's responses fall outside an acceptable range.

In the 1970s John E. Exner, Jr., then at Long Island University, ostensibly corrected the problems in the early Rorschach test by introducing what he called the Comprehensive System. This set of instructions established detailed rules for delivering and scoring the inkblot exam and for interpreting the responses, and it provided norms for children and adults.

In spite of the Comprehensive System's current popularity, it generally falls short on two crucial criteria that were also problematic for the original Rorschach: scoring reliability and validity. A tool possessing scoring reliability yields similar results regardless of who grades and tabulates the responses. A valid technique measures what it aims to measure: its results are consistent with those produced by other trustworthy instruments or are able to predict behavior, or both.

To understand the Rorschach's scoring reliability defects, it helps to know something about how reactions to the inkblots are interpreted. First, a psychologist rates the collected reactions on more than 100 characteristics, or variables. The evaluator, for instance, records whether the person looked at whole blots or just parts, notes whether the detected images were unusual or typical of most test takers, and indicates which aspects of the inky swirls (such as form or color) most determined what the respondent reported seeing.

Then the examiner compiles the findings into a psychological profile of the individual. As part of that interpretative process, psychologists might conclude that focusing on minor details (such as stray splotches) in the blots, instead of on whole images, signals obsessiveness in a patient and that seeing things in the white spaces within the larger blots, instead of in the inked areas, reveals a negative, contrary streak.

For the scoring of any variable to be considered highly reliable, two different assessors should be very likely to produce similar ratings when examining any given person's responses. Recent investigations demonstrate, however, that many of the Rorschach scores weighted heavily by clinicians display unsatisfactory agreement. As a consequence, clinicians may often arrive at quite different interpretations of people's responses.
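To make "agreement between two assessors" concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, which corrects raw agreement for the matches two raters would produce by chance. It is an illustration only: the category labels and the two raters' codes are invented, and kappa is not necessarily the statistic used in the investigations discussed here.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of responses")
    n = len(rater_a)
    # Proportion of responses on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes for 10 responses (labels are placeholders, not real scoring rules).
scores_a = ["whole", "detail", "detail", "unusual", "whole",
            "detail", "whole", "unusual", "detail", "whole"]
scores_b = ["whole", "detail", "whole", "unusual", "whole",
            "detail", "detail", "detail", "detail", "whole"]

kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")  # raw agreement is 7 of 10, but kappa is about 0.52
```

In this made-up case the raters match on 70 percent of responses, yet the chance-corrected figure is only about 0.52, which shows why raw agreement alone can flatter a scoring system.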

Equally troubling, analyses of the Rorschach's validity indicate that it is poorly equipped to identify most psychiatric conditions--with the notable exceptions of schizophrenia and other disturbances marked by disordered thoughts, such as bipolar disorder (manic depression). Despite claims by some Rorschach proponents, the method does not consistently detect depression, anxiety disorders or antisocial personality (a condition characterized by dishonesty, callousness and lack of guilt).

Moreover, although psychologists frequently administer the Rorschach to assess propensities toward violence, impulsiveness and criminal behavior, most research suggests it is not valid for these purposes either. Similarly, no compelling evidence supports its use for helping to detect sexual abuse in children.

Other problems have surfaced as well. Some evidence indicates that the Rorschach norms meant to distinguish mental health from mental illness are unrepresentative of the U.S. population and mistakenly make many adults and children seem maladjusted. For instance, in a 1999 study of 123 adult volunteers at a California blood bank, one in six had scores supposedly indicative of schizophrenia.

The inkblot results may be even more misleading for minorities. Several investigations have shown that scores for African-Americans, Native Americans, Native Alaskans, Hispanics, and Central and South Americans differ markedly from the norms. Together the collected research raises serious doubts about the use of the Rorschach inkblots in the psychotherapy office and in the courtroom.

Doubts about TAT
Another projective tool--the Thematic Apperception Test (TAT)--may be as problematic as the Rorschach. This method asks respondents to formulate a story based on ambiguous scenes in drawings on cards. Among the 31 cards available to psychologists are ones depicting a boy contemplating a violin, a distraught woman clutching an open door, and a woman who is grabbing the arm of a man who is looking away. One card, the epitome of ambiguity, is totally blank.

The TAT has been called "a clinician's delight and a statistician's nightmare," in part because its administration is usually not standardized: different clinicians present different numbers and selections of cards to respondents. Also, most clinicians interpret people's stories intuitively instead of following a well-tested scoring procedure. Indeed, a recent survey of nearly 100 North American psychologists practicing in juvenile and family courts discovered that only 3 percent relied on a standardized TAT scoring system. Unfortunately, some evidence suggests that clinicians who interpret the TAT in an intuitive way are likely to overdiagnose psychological disturbance.

Many standardized scoring systems are available for the TAT, but some of the more popular ones display weak "test-retest" reliability: they tend to yield inconsistent scores from one picture-viewing session to the next. Their validity is frequently questionable as well; studies that find positive results are often contradicted by other investigations. For example, several scoring systems have proved unable to differentiate normal individuals from those who are psychotic or depressed.
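Test-retest reliability is often summarized as the correlation between the scores the same people receive on two occasions. The sketch below computes a Pearson correlation from scratch; the two lists of scores are invented for illustration and do not come from any TAT study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for six people on the same scale, two sessions apart.
session_one = [12, 7, 15, 9, 11, 5]
session_two = [8, 10, 9, 14, 6, 11]

retest_r = pearson_r(session_one, session_two)
print(f"test-retest r = {retest_r:.2f}")
```

A value near 1 would mean people keep roughly the same rank order from one session to the next; values near zero or below, as with scores this erratic, are what "weak test-retest reliability" looks like in numbers.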

A few standardized scoring systems for the TAT do appear to do a good job of discerning certain aspects of personality--notably the need to achieve and a person's perceptions of others (a property called "object relations"). But individuals who display a high need to achieve often do not score well on measures of actual achievement, so the ability of that variable to predict a person's behavior may be limited. These scoring systems currently lack adequate norms and so are not yet ready for application outside of research settings, but they merit further investigation for possible use in therapy.

Faults in the Figures
In contrast to the Rorschach and the TAT, which elicit reactions to existing images, a third projective approach asks the people being evaluated to draw the pictures. A number of these instruments, such as the frequently applied Draw-a-Person Test, have examinees depict a human being; others have them draw houses or trees as well. Clinicians commonly interpret the sketches by relating specific "signs"--such as features of the body or clothing--to facets of personality or to particular psychological disorders. They might associate large eyes with paranoia, long neckties with sexual aggression, missing facial features with depression, and so on.

As is true of the other methods, the research on drawing instruments gives reason for serious concern. In some studies, raters agree well on scoring outcomes, yet in others the agreement is poor. What is worse, no strong evidence supports the validity of the sign approach to interpretation; in other words, clinicians apparently have no grounds for linking specific signs to particular personality traits or psychiatric diagnoses. Nor is there consistent evidence that signs purportedly linked to child sexual abuse (such as tongues or genitalia) actually reveal a history of molestation.

The only positive result found repeatedly is that, as a group, people who draw human figures poorly have somewhat elevated rates of psychological disorders. On the other hand, studies show that clinicians are likely to attribute mental illness to many normal individuals who simply lack artistic ability.

Certain proponents argue that sign approaches can be valid in the hands of seasoned experts. Yet one group of researchers reported that experts who administered the Draw-a-Person Test were less accurate than graduate students at distinguishing psychological normality from abnormality.

A few global scoring systems, which are not based on the interpretation of signs, might be useful. Instead of assuming a one-to-one correspondence between a particular feature of a drawing and a personality trait, psychologists who apply such methods combine many aspects of the pictures to come up with a general impression of a person's adjustment. In a study of 52 children, a global scoring approach helped to distinguish normal individuals from those with mood or anxiety disorders. In another report, global interpretation correctly differentiated 54 normal children and adolescents from those who were overly aggressive or who were extremely disobedient. The global approach may work better than the sign approach because the act of aggregating information can cancel out "noise" from variables that provide misleading or incomplete information.
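The noise-canceling argument can be illustrated with a small simulation. This is a sketch under a strong, stated assumption: every simulated "sign" carries a little real information about a person's adjustment plus a lot of independent random error. All numbers (2,000 people, 20 signs, the noise level) are arbitrary choices, not estimates from the drawing literature.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_people, n_signs, noise_sd = 2000, 20, 3.0  # hypothetical settings

# Each person's "true" adjustment, which a clinician never observes directly.
true_adjustment = [rng.gauss(0, 1) for _ in range(n_people)]

# Assumption: each sign = true adjustment + large independent noise.
signs = [[t + rng.gauss(0, noise_sd) for _ in range(n_signs)]
         for t in true_adjustment]

single_sign = [row[0] for row in signs]                # one sign read on its own
global_score = [sum(row) / n_signs for row in signs]   # all signs averaged

single_r = pearson_r(single_sign, true_adjustment)
global_r = pearson_r(global_score, true_adjustment)
print(f"single sign r = {single_r:.2f}, global score r = {global_r:.2f}")
```

Under these assumptions the averaged score tracks true adjustment far more closely than any one sign does, because independent errors partly cancel. If the signs carried no real signal at all, averaging them would not help.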

Our literature review, then, indicates that, as usually administered, the Rorschach, TAT and human figure drawings are useful only in very limited circumstances. The same is true for many other projective techniques [see box on page 55].

We have also found that even when the methods assess the psychological traits they claim to measure, they tend to lack what psychologists call "incremental validity": they rarely add much to information that can be obtained in other, more practical ways, such as by conducting interviews or administering objective personality tests. (Objective tests seek answers to relatively clear-cut questions, such as "I frequently have thoughts of hurting myself--true or false?") The lack of added insight provided by projective tools makes their costs in money and time hard to justify.
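Incremental validity can be put in numbers with a standard result for two predictors: the variance in an outcome explained by both together depends on how strongly each predicts the outcome and on how much the two overlap. The sketch below applies that formula to invented correlations; the figures for an "interview" and a "projective score" are hypothetical and chosen only to show the arithmetic.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Variance in an outcome explained jointly by two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def incremental_validity(r_y1, r_y2, r_12):
    """Extra variance explained by predictor 2 beyond predictor 1 alone."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2

# Hypothetical case: an interview correlates 0.5 with the outcome, a projective
# score correlates 0.3 with it, and the two measures correlate 0.6 with each other.
gain_overlapping = incremental_validity(0.5, 0.3, 0.6)

# Same two measures, but now assumed to be completely unrelated to each other.
gain_independent = incremental_validity(0.5, 0.3, 0.0)

print(f"gain when measures overlap: {gain_overlapping:.2f}")
print(f"gain when measures are independent: {gain_independent:.2f}")
```

In the first made-up case the projective score predicts the outcome on its own yet adds essentially nothing once the interview is known, because what it captures is already contained in the interview. That is the pattern "lacking incremental validity" describes.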

What to Do?
Some mental health professionals disagree with our conclusions. They argue that projective tools have a long history of constructive use and, when administered and interpreted properly, can cut through the veneer of respondents' self-reports to provide a picture of the deepest recesses of the mind. Our critics have also asserted that we have emphasized negative findings to the exclusion of positive ones.

Yet we remain confident in our conclusions. In fact, as negative as our overall findings are, they may paint an overly rosy picture of projective techniques because of the so-called file drawer effect. As is well known, scientific journals are more likely to publish reports demonstrating that some procedure works than reports finding failure. Consequently, researchers often quietly file away their negative data, which may never again see the light of day.
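A toy simulation shows how the file drawer effect can inflate the published record. Every parameter below is an assumption made for illustration: a small true effect, a fixed amount of sampling error per study, and a rule under which clearly positive studies are always published while the rest reach print only one time in five.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.10       # hypothetical true validity of a technique
STANDARD_ERROR = 0.15    # hypothetical sampling error of each study
PUBLISH_IF_NULL = 0.2    # assumed chance that an unimpressive study is published

all_estimates = []
published = []
for _ in range(1000):
    # Each study estimates the true effect with random sampling error.
    estimate = rng.gauss(TRUE_EFFECT, STANDARD_ERROR)
    all_estimates.append(estimate)
    # Clearly positive results are always published; others usually are filed away.
    looks_positive = estimate / STANDARD_ERROR > 1.645
    if looks_positive or rng.random() < PUBLISH_IF_NULL:
        published.append(estimate)

all_mean = sum(all_estimates) / len(all_estimates)
published_mean = sum(published) / len(published)
print(f"mean effect across all studies:   {all_mean:.2f}")
print(f"mean effect in published studies: {published_mean:.2f}")
```

Under these assumptions the published studies report a noticeably larger average effect than the full set of studies, so a review limited to the published literature would overstate how well the technique works.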

We find it troubling that psychologists commonly administer projective instruments in situations for which their value has not been well established by multiple studies; too many people can suffer if erroneous diagnostic judgments influence therapy plans, custody rulings or criminal court decisions. Based on our findings, we strongly urge psychologists to curtail their use of most projective techniques and, when they do select such instruments, to limit themselves to scoring and interpreting the small number of variables that have been proved trustworthy.

Our results also offer a broader lesson for practicing clinicians, psychology students and the public at large: even seasoned professionals can be fooled by their intuitions and their faith in tools that lack strong evidence of effectiveness. When a substantial body of research demonstrates that old intuitions are wrong, it is time to adopt new ways of thinking.