Saturday, October 20, 2007

Cosma Shalizi on the statistical myth called 'g', the general factor of intelligence

We can use Jim Watson's us-and-them comments as an occasion to inform ourselves about intelligence and its heritability and malleability. We cannot do better than to start with this set of four posts by Cosma Shalizi.

In particular, the last post in the series demolishes g, the general factor of intelligence. Since g's very definition rests on statistical arguments, Cosma undermines it with convincing statistical counter-examples. (This also means you will need some -- only some! -- knowledge of statistics to follow the arguments.)

It's worth reproducing here the concluding paragraphs:

In primitive societies, or so Malinowski taught, myths serve as the legitimating charters of practices and institutions. Just so here: the myth of g legitimates a vast enterprise of intelligence testing and theorizing. There should be no dispute that, when we lack specialized and valid instruments, general IQ tests can be better than nothing. Claims that they are anything more than such stop-gaps — that they are triumphs of psychological science, illuminating the workings of the mind; keys to the fates of individuals and peoples; sources of harsh truths which only a courageous few have the strength to bear; etc., etc., — such claims are at present entirely unjustified, though not, perhaps, unmotivated. They are supported only by the myth, and acceptance of the myth itself rests on what I can only call an astonishing methodological backwardness.

The bottom line is: The sooner we stop paying attention to g, the sooner we can devote our energies to understanding the mind. [With bold emphasis added by me]

In the third post in that series, Cosma also tackles the issues of heritability (which is at the center of claims about race differences) and malleability. One of his examples is instructive: height, which is known to be highly heritable. He then shows that selection on this highly heritable trait can account for very little -- and environment for nearly all -- of the dramatic growth in height over the last century:

... height is heritable, and estimates for the population of developed countries put the heritability around 0.8. Moreover, tall people tend to be at something of a reproductive advantage. Applying the standard formulas for response to selection, we straightforwardly predict that average height should increase. If we select a population without a lot of immigration or emigration to mess this up, say 20th century Norway, we find that that's true: the average height of Norwegian men increased by about 10 centimeters over the century. But that's much more than selection can account for. Doing things by discrete generations, rather than in continuous time, height grew by 2.5 centimeters per generation. (The conclusion is not substantially altered by going to continuous time.) If the heritability of height is 0.8, for this change to be due entirely to selection, the average Norwegian parent must have been 3 centimeters taller than the average Norwegian. This, needless to say, was not how it happened; the change was almost entirely environmental. The moral is that highly heritable traits with an indubitable genetic basis can be highly responsive to changes in environment (such as nutrition, disease, environmental influences on hormone levels, etc.).
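Shalizi's arithmetic here is the standard breeder's equation, R = h^2 * S: the response to selection R equals the heritability h^2 times the selection differential S (how much the selected parents differ from the population average). A minimal Python sketch of the calculation, using only the figures from the quoted passage:

```python
# Breeder's equation: response to selection R = h^2 * S, where h^2 is
# the (narrow-sense) heritability and S is the selection differential:
# how much taller the average parent is than the population average.

def selection_differential(response, heritability):
    """S needed for selection alone to produce a response R."""
    return response / heritability

h2_height = 0.8           # heritability of height in developed countries
gain_per_century = 10.0   # cm gained by Norwegian men over the century
generations = 4           # 25-year generations per century
r_per_gen = gain_per_century / generations   # 2.5 cm per generation

s = selection_differential(r_per_gen, h2_height)
print(f"required selection differential: {s:.1f} cm per generation")
# ~3.1 cm: the average parent would have to be about 3 cm taller than
# the average Norwegian, which was not observed -- hence the change was
# almost entirely environmental.
```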

In contrast, the best estimate for heritability of IQ is far lower (at about 0.34) than that of height (about 0.8). And we know about the Flynn effect: a steady increase in average IQ with time -- about 2 to 3 points per decade. Here's Cosma:

The population average IQ rose monotonically, and pretty steadily, over the 20th century in every country for which we can find suitable records, including ones where we can definitely rule out immigration or emigration as significant contributory causes. (If it really is global, and I think we don't know enough yet to say either way, then the idea that it could be due to migration is — peculiar.) The magnitude of the gains are, as these things go, huge: two to three IQ points per decade. As I said in the earlier post, this puts the average 1900 IQ at 70 to 80 in 2000 terms. Let's check how intense natural selection would have to be to explain this. Over a twenty-five-year generation, we're looking at an IQ change of 5 to 7.5 IQ points. Sticking with the usual biometric model, and taking the best estimate of heritability within that model, namely 0.34, we'd have to see a reproductive differential of between 14 and 22 points, i.e., the average parent would have to have an IQ that much higher than the average person. (I am neglecting correcting for assortative mating and for continuous time, which don't change things much.) Since 15 IQ points is one standard deviation, this would imply a huge bias in reproductive rates towards those with higher IQs. Needless to say, nothing of the kind is observed in any of the countries where the Flynn Effect has been documented.
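The same breeder's-equation arithmetic, applied to the Flynn effect with the numbers from the quoted passage, yields the required reproductive differential directly:

```python
# Breeder's equation R = h^2 * S applied to the Flynn effect.

def selection_differential(response, heritability):
    return response / heritability

h2_iq = 0.34                    # heritability estimate used above
gain_per_decade = (2.0, 3.0)    # observed Flynn-effect gains, IQ points
generation_years = 25

# Response per 25-year generation: 5 to 7.5 IQ points.
r_low = gain_per_decade[0] * generation_years / 10
r_high = gain_per_decade[1] * generation_years / 10

s_low = selection_differential(r_low, h2_iq)
s_high = selection_differential(r_high, h2_iq)
print(f"required differential: {s_low:.1f} to {s_high:.1f} IQ points")
# ~14.7 to ~22.1 points: the average parent would need an IQ roughly
# one standard deviation (15 points) or more above the mean -- nothing
# like this is observed where the Flynn effect has been documented.
```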

[For more on the Flynn effect, see Tyler Cowen's posts; also see Andrew Gelman's posts.]

* * *

There is a lot more in those posts by Cosma, so I will have to ask you to spend some time on them. In the following posts, I'll link to some studies whose results are bad news for people who believe that racial and gender differences in IQ have a strong genetic basis.


  1. Ashutosh said...

    Jews are about 0.2% of the earth's population, yet about 20% of all Nobel Prize winners are Jews.

  2. Anonymous said...

    Here is a critique that also addresses Shalizi's argument and explains why he is missing the point:

    "Jake, this is a good review and I agree with many of your major conclusions. However, your summary of the literature on g has several problems.

    [The g-factor] is predicated on the notion that performance across different cognitive batteries tends to be positively correlated

    A quibble -- the positive correlation between performance on different test items is not just a notion but an empirical observation that has been supported by millions of data points over the last century. More on this below.

    Psychological tests for g-factor use principal component analysis -- a way of identifying different factors in data sets that involve mixtures of effects.

    Factor analysis, not PCA, is the method used by psychometricians. They are similar in principle but not in application.
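To make the distinction concrete, here is a small numpy sketch (my own illustration, not from the comment): under Spearman's one-factor model a test's loading can be estimated from correlations alone via his classic "triad" formula, because factor analysis models only the shared variance; PCA's first component also absorbs each test's unique variance, so its loadings come out larger.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate three test scores driven by one common factor g plus
# independent unique parts (the classic one-factor model).
g = rng.standard_normal(n)
loadings = np.array([0.8, 0.6, 0.5])       # true factor loadings
unique_sd = np.sqrt(1 - loadings**2)       # makes each test unit-variance
scores = np.outer(g, loadings) + rng.standard_normal((n, 3)) * unique_sd

r = np.corrcoef(scores, rowvar=False)

# Spearman's "triad" estimate of test 1's loading uses correlations only:
# r12 * r13 / r23 = (l1*l2)(l1*l3)/(l2*l3) = l1^2 under the model.
lam1 = np.sqrt(r[0, 1] * r[0, 2] / r[1, 2])

# PCA's first component, by contrast, also soaks up unique variance,
# so its loading for test 1 overshoots the true factor loading.
eigvals, eigvecs = np.linalg.eigh(r)       # eigenvalues in ascending order
pc1 = eigvecs[:, -1] * np.sqrt(eigvals[-1])
pc1 *= np.sign(pc1[0])                     # fix the sign convention

print(f"true loading {loadings[0]}, triad estimate {lam1:.2f}, "
      f"PCA loading {pc1[0]:.2f}")
```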

    g-factor is very controversial.

    Not among intelligence researchers.

    In this review, we emphasize intelligence in the sense of reasoning and novel problem-solving ability (BOX 1). Also called FLUID INTELLIGENCE (Gf), it is related to analytical intelligence1. Intelligence in this sense is not at all controversial...

    [These authors go on to explain that in their view Gf and g are one and the same.]

    From another review:

    Here (as in later sections) much of our discussion is devoted to the dominant psychometric approach, which has not only inspired the most research and attracted the most attention (up to this time) but is by far the most widely used in practical settings.

    This was published over a decade ago. The psychometric approach has continued to attract the most research and attention and is still by far the most widely used.

    The second and broader critique of this work is whether the tests that we have for "intelligence" measures something useful in the brain.

    There's wide agreement that the tests measure something useful about human behavior:

    In summary, intelligence test scores predict a wide range of social outcomes with varying degrees of success. Correlations are highest for school achievement, where they account for about a quarter of the variance. They are somewhat lower for job performance, and very low for negatively valued outcomes such as criminality. In general, intelligence tests measure only some of the many personal characteristics that are relevant to life in contemporary America. Those characteristics are never the only influence on outcomes, though in the case of school performance they may well be the strongest.

    A more standard criticism of g:

    while the g-based factor hierarchy is the most widely accepted current view of the structure of abilities, some theorists regard it as misleading (Ceci, 1990).
    That is:

    One view is that the general factor (g) is largely responsible for better performance on various measures40,85. A contrary view accepts the empirical, factor-analytic result, but interprets it as reflecting multiple abilities, each with corresponding mechanisms141. In principle, factor analysis cannot distinguish between these two theories, whereas biological methods potentially could10,22,36. Other perspectives recognize the voluminous evidence for positive correlations between tasks and subfactors, but hold that practical, creative142 and social or emotion-related73 abilities are also essential ingredients in successful adaptation that are not assessed in typical intelligence tests. Further, estimates of individual competence, as inferred from test performance, can be influenced by remarkably subtle situational factors, the power and pervasiveness of which are typically underestimated2,136,137,143.

    The concepts of IQ and g-factor have been questioned by several authors. Stephen Jay Gould actually wrote a whole book -- The Mismeasure of Man -- trying to debunk the assumption that intelligence can be measured in a single number. (For a more recent and excellent critique, I recommend this article by Cosma Shalizi.) The common theme among many of these critiques is that the tests for intelligence conflate numerous separable brain processes into a single number. As a consequence, 1) you aren't sure what you are measuring, 2) you can't associate what you are measuring with a particular region (the output may be the result of an emergent process of several regions), and 3) you may be eliding significant differences in performance across individuals that you would recognize with a better test.

    You give too much credit to Gould and Shalizi. Their primary criticisms are entirely less reasonable than the points you make.

    The main thrusts of their arguments are that test data do not statistically support a g-factor. Gould's argument is statistically incompetent (for a statistician's critique see Measuring intelligence: facts and fallacies by David J. Bartholomew, 2004). Shalizi's criticism is incredibly sophisticated, but likewise incorrect. In a nutshell, Shalizi is trying to argue around the positive correlations between test batteries. If those correlations didn't exist, his argument would be meaningful. However, as I noted above, these intercorrelations are one of the best documented patterns in the social sciences.

    significant differences in performance across individuals that you would recognize with a better test.

    It's possibly not well known that enormous efforts have gone into trying to make tests that have practical validity for life outcomes yet do not mostly measure g. See for example the works of Gardner and Sternberg. The current consensus is that their efforts have failed. A notable exception might be measures of personality.


    Ultimately, we need to use biological measures such as cortical volume to determine what g really is. One possible approach is to combine chronometric measurements (e.g. reaction time) with brain imaging studies. Genetically informed study designs have a role to play here too.


  3. jay parisi said...

    For the critic who said "You give too much credit to Gould and Shalizi. Their primary criticisms are entirely less reasonable than the points you make."

    You're entirely wrong. The philosophical relevance of Gould's argument (the clear trouble with arbitrarily defining what intelligence is) should not be undermined. It is impossible to craft an unbiased theory of intelligence, as every test is only a protocol. For example, there is no neutral justification for preferring the idea that intelligence is an 'average ability score' over the idea that our intelligence is reflected by the things we are 'best at' (even if that is a single task). Nor is there one for the idea that gaining deep, specific knowledge in a field doesn't make us more intelligent, when specific knowledge clearly allows us to solve real-world problems with novelty. To Gould, the problem with IQ begins with how we define 'intelligence'. Therefore, your criticism of Gould's statistical technique is beside the point, even if we charitably assume that the criticism is fair.

    You say: "Shalizi's criticism is incredibly sophisticated, but likewise incorrect. In a nutshell, Shalizi is trying to argue around the positive correlations between test batteries. If those correlations didn't exist, his argument would be meaningful. However, as I noted above, these inter-correlations are one of the best documented patterns in the social sciences."

    Actually, what Shalizi is saying is that there is simply no reason to prefer the unfounded assertion that a single biological factor (g) is responsible for sub-test variation over the idea that multiple factors are involved -- in much the same way that there is no reason to insist that a single factor (say, BMI) is responsible for superior 'athletic ability' when we know that several qualities are involved: lung capacity, bone structure, height, testosterone levels, hand-eye coordination, and even things like practice, confidence, and motivation.
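This point can be simulated directly. In Godfrey Thomson's classic "sampling" model -- a counter-example Shalizi leans on -- each test draws on a random subset of many independent abilities, so no general factor exists anywhere in the data-generating process; yet all the tests come out positively correlated and a dominant first factor appears. A small numpy sketch of the idea (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_abilities, n_tests = 20_000, 200, 8

# Many independent, uncorrelated elementary abilities -- no g anywhere.
abilities = rng.standard_normal((n_people, n_abilities))

# Each test samples a random half of the abilities (Thomson's model).
tests = np.empty((n_people, n_tests))
for j in range(n_tests):
    mask = rng.random(n_abilities) < 0.5
    tests[:, j] = abilities[:, mask].sum(axis=1)

r = np.corrcoef(tests, rowvar=False)
off_diag = r[~np.eye(n_tests, dtype=bool)]
eigvals = np.linalg.eigvalsh(r)   # ascending order

print(f"all pairwise correlations positive: {(off_diag > 0).all()}")
print(f"first 'factor' explains {eigvals[-1] / n_tests:.0%} of variance")
```

A "positive manifold" and a large first factor emerge even though the data were generated by hundreds of independent factors, so the intercorrelations alone cannot settle which model is true.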

    The obvious fact that tests are inter-correlated provides absolutely NOTHING towards a justification that g is the sole factor responsible for the variation we observe. If anything, there is an abundance of evidence -- from the phenomenon of sub-test scatter, from the scores of autistic subjects, and from those with traumatic brain injury -- suggesting that multiple factors account for the variance.