Reprinted from Are We Getting Smarter? Rising IQ in the Twenty-First Century, by James R. Flynn. Copyright © 2012 James R. Flynn. Reprinted with the permission of Cambridge University Press.

The phenomenon of IQ gains has created unnecessary controversy because of conceptual confusion. Imagine an archaeologist from the distant future who excavates our civilization and finds a record of performances over time on measures of marksmanship. The test is always the same, that is, how many bullets you can put in a target 100 meters away in a minute. Records from 1865 (the U.S. Civil War) show the best score to be five bullets in the target, records from 1898 (Spanish-American War) show 10, and records from 1918 (World War I) show 50.

A group of "marksmanship-metricians" looks at these data. They find these data worthless for measuring marksmanship. They make two points. First, they distinguish between the measure and the trait being measured. The mere fact that performance on the test has risen in terms of "items correct" does not mean that marksmanship ability has increased. True, the test is unaltered, but all we know is that the test has gotten easier. Many things might account for that. Second, they stress that we have only relative and no absolute scales of measurement. We can rank soldiers against one another at each of the three times. But we have no measure that would bridge the transition from one shooting instrument to another. How could you rank the best shot with a sling against the best shot with a bow and arrow? At this point, the marksmanship-metrician either gives up or looks for something that would allow him to do his job. Perhaps some new data, such as eye tests or a measure of steady hands, would afford an absolute measure of marksmanship over time.

However, a group of military historians is also present, and it is at this point that they get excited. They want to know why the test got easier, irrespective of whether the answer aids or undermines the measurement of marksmanship over time. They ask the archaeologists to look further. Luckily, they discover battlefields specific to each time. The 1865 battlefields disclose the presence of primitive rifles, the 1898 ones repeating rifles, and the 1918 ones machine guns. Now we know why it was easier to get more bullets into the target over time, and we can confirm that this was no measure of enhanced marksmanship. But it is of enormous historical and social significance. Battle casualties, the industries needed to arm the troops, and so forth altered dramatically.

Confusion about the two roles has been dispelled. If the battlefields had been the artifacts first discovered, there would have been no confusion because no one uses battlefields as instruments for measuring marksmanship. It was the fact that the first artifacts were also instruments of measurement that put historians and metricians at cross-purposes. Now they see that different concepts dominate their two spheres: social evolution in weaponry—whose significance is that we have become much better at solving the problem of how to kill people quickly; marksmanship—whose significance is which people have the ability to kill more skillfully than other people can.

The historian has done nothing to undermine what the metrician does. At any given time, measuring marksmanship may be the most important thing you can do to predict the life histories of individuals. Imagine a society dominated by dueling. It may be that the lives of those who are poor shots are likely to be too brief to waste time sending them to university, hiring them, or marrying them. If a particular group or nation lacks the skill, it may be at the mercy of the better skilled. Nonetheless, this is no reason to ignore everything else in writing military history.

Some years ago, acting as an archaeologist, I amassed a large body of data showing that IQ tests had gotten easier. Over the twentieth century, the average person was getting many more items correct on tests like Raven's and Similarities. The response of intelligence- or g-metricians was twofold. First, they distinguished IQ tests as measuring instruments from the trait being measured, that is, from intelligence (or g if you will). Second, they noted that in the absence of an absolute scale of measurement, the mere fact that the tests had gotten easier told us nothing about whether the trait was being enhanced. IQ tests were only relative scales of measurement, ranking the members of a group against one another in terms of which items they found easy and which they found difficult. A radical shift in the ease/difficulty of items meant all bets were off. At this point, the g-metrician decides that he cannot do his job of measurement and begins to look for an absolute measure that would allow him to do so (perhaps reaction times or inspection times).

However, as a cognitive historian, this was where I began to get excited. Why had the items gotten so much easier over time? Where was the alteration in our mental weaponry that was analogous to the transition from the rifle to the machine gun? This meant returning to the role of archaeologist and finding battlefields of the mind that distinguished 1900 from the year 2000. I found evidence of a profound shift from an exclusively utilitarian attitude toward concrete reality to a new attitude. Increasingly, people felt it was important to classify concrete reality (the more abstract the terms, the better); and to take the hypothetical seriously (which freed logic to deal not only with imagined situations but also with symbols that had no concrete referents).

It was the initial artifacts that caused all the trouble. Because they were performances on IQ tests, and IQ tests are instruments of measurement, the roles of the cognitive historian and the g-metrician were confused. Finding the causes and developing the implications of a shift in habits of mind over time is simply not equivalent to a task of measurement, even the measurement of intelligence. Now all should see that different concepts dominate two spheres: society's demands—whose evolution from one generation to the next dominates the realm of cognitive history; and g—which measures individual differences in cognitive ability. And just as the g-metrician should not undervalue the non-measurement task of the historian, so the historian does nothing to devalue the measurement of which individuals are most likely to learn fastest and best when compared to one another.

I have used an analogy to break the steel chain of ideas that circumscribed our ability to see the light IQ gains shed on cognitive history. I hope it will convince psychometricians that my interpretation of the significance of IQ gains over time is not adversarial. No one is disputing their right to use whatever constructs are best to do their job: measuring cognitive skill differences between people.

But an analogy that clarifies one thing can introduce a new confusion. The reciprocal causation between developing new weapons and the physique of marksmen is a shadow of the interaction between developing new habits of mind and the brain.

The new weapons were a technological development of something outside ourselves that had minimal impact on biology. Perhaps our trigger fingers got slightly different exercise when we fired a machine gun rather than a musket. But the evolution from preoccupation with the concrete and the literal to the abstract and hypothetical was a profound change within our minds that involved new problem-solving activities.

Reciprocal causation between mind and brain entails that our brains may well be different from those of our ancestors. It is a matter of use and structure. If people switch from swimming to weight-lifting, the new exercise develops different muscles and the enhanced muscles make them better at the new activity. Everything we know about the brain suggests that it is similar to our muscles. Maguire et al. (2000) found that the brains of the best and most experienced London taxi-drivers had an enlarged hippocampus, which is the brain area used for navigating three-dimensional space. Here we see one area of the brain being developed without comparable development of other areas in response to a specialized cognitive activity. It may well be that when we do "Raven's-type" problems certain centers of our brain are active that used to get little exercise; or it may be that we increase the efficiency of synaptic connections throughout the brain. If we could scan the brains of people in 1900, who can say what differences we would see?

Do huge IQ gains mean we are more intelligent than our ancestors? If the question is "Do we have better brain potential at conception, or were our ancestors too stupid to deal with the concrete world of everyday life," the answer is no. If the question is "Do we live in a time that poses a wider range of cognitive problems than our ancestors encountered, and have we developed new cognitive skills and the kind of brain that can deal with these problems?," the answer is yes. Once we understand what has happened, we can communicate with one another even if some prefer the label "more intelligent" and others prefer "different." To care passionately about which label we use is to surrender to the tyranny of words. I suspect most readers ask the second question, and if so, they can say we are "smarter" than our ancestors. But it would probably be better to say that we are more modern, which is hardly surprising!

The theory of intelligence
The thesis that psychometrics and cognitive history actually complement one another, together with the remarks made about the brain, implies a new approach to the theory of intelligence. I believe we need a BIDS approach: one that treats the brain (B), individual differences (ID), and social trends (S) as three distinct levels, each having equal integrity. The three are interrelated and each has the right to propose hypotheses about what ought to happen on another level. It is our job to investigate them independently and then integrate what they tell us into a coherent whole.

The core of a BIDS approach is that each level has its own organizing concept and it is a mistake to impose the architectonic concept of one level on another. We have to realize that intelligence can act like a highly correlated set of abilities on one level (individual differences), like a set of functionally independent abilities on another level (cognitive trends over time), and like a mix on a third level (the brain), whose structure and operations underlie what people do on both of the other two levels. Let us look at the levels and their organizing concepts.

  • Individual differences: Performance differences between individuals on a wide variety of cognitive tasks are correlated primarily in terms of the cognitive complexity of the task (fluid g)—or the posited cognitive complexity of the path toward mastery (crystallized g). Information may not seem to differentiate between individuals in terms of intelligence, but if two people have the same opportunity, the better mind is likely to accumulate a wider range of information. I will call the appropriate organizing concept "General Intelligence" or g, without intending to foreclose improved measures that go beyond the limitations of "academic" intelligence (Heckman & Rubinstein, 2001; Heckman, Stixrud, & Urzua, 2006; Sternberg, 1988, 2006; Sternberg et al., 2000).
  • Society: Various real-world cognitive skills show different trends over time as a result of shifting social priorities. I will call this concept "Social Adaptation." As I have argued, the major confusion thus far has been as follows: either to insist on using the organizing concept of the individual differences level to assess cognitive evolution, and call IQ gains hollow if they are not g gains; or to insist on using the organizing concept of the societal level to discount the measurement of individual differences in intelligence (e.g. to deny that some individuals really do need better minds and brains to deal with the dominant cognitive demands of their time).
  • The brain: Localized neural clusters are developed differently as a result of specialized cognitive exercise. There are also important factors that affect all neural clusters, such as blood supply, dopamine (a substance that renders neurons receptive to registering experience), and the input of the stress-response system. Let us call its organizing concept "Neural Federalism." The brain is a system in which a certain degree of autonomy is limited by an overall organizational structure.

Researchers on this level should try to explain what occurs on both of the other two levels. The task of the brain physiologist is reductionist. Perfect knowledge of the brain's role would mean the following: given data on how cognition varies from person to person and from time to time, we can map what brain events underlie both social and life histories. To flesh this out, make the simplifying assumption that the mind performs only four operations when cognizing: classification or CL (of the Similarities sort); liberated logic or LL (of the Raven's sort); practical intelligence or PI (needed to manipulate the concrete world); and vocabulary and information acquisition or VI. And posit that the brain is neatly divided into four sectors active respectively when the mind performs the four mental operations; that is, it is divided into matching CL, LL, PI, and VI sectors.

Through magnetic resonance imaging (MRI) scans of the brain, we get "pictures" of these sectors. Somehow we have MRIs from 1900 that we can compare to MRIs of 2000. When we measure the neurons within the CL and LL sectors, we find that the later brains have "thickened" neurons. The extra thickness exactly predicts the century's enhanced performance on Similarities and Raven's.

As for individual differences, we have pictures of what is going on in the brains of two people in the VI sector as they enjoy the same exposure to new vocabulary. We note that the neurons (and connections between neurons) of one person are better nourished than those of the other because of optimal blood supply (we know just what the optimum is). We note that when the neurons are used to learn new vocabulary, the neurons of one person are sprayed with the optimum amount of dopamine and the neurons of the other are less adequately sprayed. And we can measure the exact amount of extra thickening of grey matter the first person enjoys compared to the second. This allows us to actually predict their different performances on the WISC Vocabulary subtest.

Given the above, brain physiology would have performed its reductionist task. Problem-solving differences between individuals and between generations will both have been reduced to brain functions. It will explain both the tendency of various cognitive skills to be correlated on the individual differences level, and their tendency to show functional autonomy on the societal level. That does not, of course, mean that explaining human cognition on the levels of individual differences or social demands has been abolished. Even if physiology can predict every right and wrong answer of people taking IQ tests, no one will understand why these tests exist without knowing that occupation is dependent on mastering certain cognitive skills (social level) and that parents want to know whether their children have those skills (individual differences).
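The dual behavior described above can be made concrete with a toy simulation. This sketch is my illustration, not Flynn's: the four subtest labels borrow his hypothetical CL/LL/PI/VI operations, and the per-decade gain sizes are invented purely for demonstration. It shows how the very same scores can look tightly correlated when we compare individuals within a generation, yet show sharply divergent trends when we compare generational means.

```python
# Toy sketch (illustrative only): scores that correlate at the
# individual-differences level while trending independently over time.
import random

random.seed(0)

SUBTESTS = ["CL", "LL", "PI", "VI"]  # Flynn's four hypothetical operations

def simulate_cohort(year, n=500):
    """Each person draws a general factor g; each subtest adds specific noise.
    Cohort-level gains (in SD units per decade, values assumed for the sketch)
    differ by subtest: large for CL/LL, near zero for PI, small for VI."""
    decade_gains = {"CL": 0.3, "LL": 0.3, "PI": 0.0, "VI": 0.05}
    decades = (year - 1900) / 10
    people = []
    for _ in range(n):
        g = random.gauss(0, 1)  # shared general factor
        scores = {s: 0.7 * g + 0.7 * random.gauss(0, 1) + decade_gains[s] * decades
                  for s in SUBTESTS}
        people.append(scores)
    return people

def corr(xs, ys):
    """Pearson correlation, computed from scratch to stay dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

cohort_1900 = simulate_cohort(1900)
cohort_2000 = simulate_cohort(2000)

# Individual-differences level: subtests correlate within a cohort (g at work).
r = corr([p["CL"] for p in cohort_2000], [p["VI"] for p in cohort_2000])

# Societal level: mean gains over the century differ sharply by subtest.
mean = lambda xs: sum(xs) / len(xs)
gains = {s: mean([p[s] for p in cohort_2000]) - mean([p[s] for p in cohort_1900])
         for s in SUBTESTS}

print("within-cohort r(CL, VI):", round(r, 2))
print("century gains by subtest:", {s: round(v, 2) for s, v in gains.items()})
```

Run it and the correlation comes out substantial while the century gains are large for CL and LL but near zero for PI: neither level's pattern contradicts the other, which is exactly the BIDS point that each level has its own organizing concept.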

Closing windows
IQ trends over time turn the pages of a great romance: the cognitive history of the twentieth century. I may have made mistakes in interpreting their significance, but I hope I have convinced you that they are significant. Those who differ about that must, in my opinion, assert one or both of two propositions. That since IQ tests measure g, they cannot possibly signal the ebb and flow of anything else. I doubt anyone will defend that proposition. That nothing save g, or the special factors that fit under the umbrella of g, interests them. I believe that some feel that way, which is sad. They will always view the history of cognition through one window.