Arthur R. Jensen (2000) The Locus of the Modifiability of g is Mostly Biological. Psycoloquy: 11(012) Intelligence g Factor (27)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

Psycoloquy 11(012): The Locus of the Modifiability of g is Mostly Biological

THE LOCUS OF THE MODIFIABILITY OF g IS MOSTLY BIOLOGICAL
Reply to Hunt on Jensen on Intelligence-g-Factor

Arthur R. Jensen
Educational Psychology
School of Education
University of California
Berkeley, CA 94720-1670

nesnejanda@aol.com

Abstract

Hunt is hopeful that the level of g can be importantly raised by psychological-educational means for a large proportion of the population. It seems to me that our present, yet inadequate, evidence on the nature of g suggests that because of its strong biological causes and the high degree of randomness in its microenvironmental influences that account for most of the nongenetic variance, g is less apt to be modifiable by psychological and educational manipulations than by biological means. Much can be done, however, to improve the inculcation of specific useful knowledge and skills that could make schooling, employment, and quality of life more productive and rewarding for many people who are not presently well-served by our traditional system of public education because of failure to take proper account of the wide range of individual differences in the school population.

Keywords

behavior genetics, cognitive modelling, evoked potentials, evolutionary psychology, factor analysis, g factor, heritability, individual differences, intelligence, IQ, neurometrics, psychometrics, psychophysiology, skills, Spearman, statistics

1. Hunt (1999) brings up so many lively topics in his review that my replies will have to be quite telegraphic and selective if I am not going to write an overly long essay in response to them. And, as we would expect from Hunt, none of what he says is trivial. I find myself in agreement with most of the points he discusses, and since he didn't cite his own book (Hunt, 1995), I must advise readers that it more fully spells out his position expressed in his present review (Hunt, 1999) and is a book that I recommend for students to read in conjunction with my book on g (Jensen, 1998, 1999).

2. As congruent as are our views on most points in this field, there may be some points of difference in our philosophy of science. In mentioning my treatment of race differences in g, for example, Hunt (1999, Abstract) says, "speculation without evidence is unwise". First, any theory or hypothesis that has not yet been definitively proved could be called "speculation without evidence", and this applies to every nontrivial hypothesis in the history of science, many of which were formulated decades before they were fully proved and generally accepted. Second, I believe there is already enough evidence to warrant formulating the hypothesis I put forth. My "default hypothesis," which I'm sure will have to be revised in certain details in light of new evidence, conforms to Karl Popper's notion of "conjectures and refutations" in how a science progresses. It is also not a lone, rootless hypothesis coming from out of the blue, but is notably integrated with the broad Galton-Spearman paradigm for research on intelligence and individual and group differences.

3. This whole line of theorising and hypothesis testing characterises what the philosopher of science Imre Lakatos termed a 'progressive' research program, in contrast to a "degenerating" program, which can predict nothing but can only explain (or explain away) empirical findings after the fact by proliferating often mutually inconsistent ad hoc hypotheses (Urbach, 1974). I notice there are few complaints about "speculation without evidence" when racial differences in g or any other behavioural traits are attributed to purely social-cultural factors. Even the author of a recent book that discusses how biological or genetic factors may explain the observed Black-White differences in certain types of athletic performance has been accused of racism, called a Nazi, and the like, despite the fact that his findings are not at all judgmental and his research was assisted and praised by several African-American coaches in various sports (Entine, 2000). A hypothesis, whether proposed by saint or sinner, is objectively either right or wrong. Social egalitarianism or other political philosophies and ideologies are not part of the science of mental ability, and my book intentionally does not touch on them in the least, although that is not to imply that these topics are unworthy of serious discussion in their own right.

4. I agree with Hunt (1999, para 4) that Mackintosh (1998) has produced an excellent college-level textbook on intelligence, and I would use it as the basic text for an introductory course on intelligence if I were still teaching. I recommend it as the best prerequisite reading for my own book (Jensen, 1998, 1999), which was never conceived as a textbook. Instructors typically evaluate a textbook by its introductory but broad and eclectic coverage of the field, balancing every viewpoint with every other, every point with its counter-point, so the overall impression is neutral and uncontroversial. As Hunt remarks, my book is nothing like that, but was intended to make a case that I thought needed to be made. The origins of my aim are indicated in the Preface (Jensen, 1998).

5. Hunt touches on most of the salient points in my book, so I shall comment only on those isolated points that could give rise to some misunderstanding, with my comments referenced to his numbered paragraphs: There is not the least opposition (Hunt para 6) between g, or anything I say in "The g Factor," and the 'three-stratum model' of Carroll (1993), whose work I have drawn on extensively. In Carroll's model, g dominates the 'three-stratum' factor hierarchy. My book is much more specifically focused on g than is Carroll's book, which is easily the most comprehensive treatment of the factor analysis of the entire human abilities domain that we now have. Its treatment of g does not differ essentially from mine. I acknowledge all of the lower-order ability factors in Carroll's analyses.

6. In reply to Hunt (para 7), I should note that, logically and factually, g is no more an 'artifact' than any of the lower-order factors, or, for that matter, the actual performance measures from which the factors are derived. The notion that scientific constructs, such as factors or latent traits, are 'artifacts' would just be scientific nihilism. By the same token, one can easily list a huge number of 'artifacts' in the physical sciences -- gravitation, magnetic lines of force, potential energy, photons, evolution, atoms, electrons, and other subatomic particles, the curvature of space, black holes, singularity, etc., etc.

7. What can Hunt (para 8) mean by saying that variables are linked statistically but not causally? Aren't two correlated variables, say X and Y, that are correlated because they share one or more common causes although X is not a cause of Y (or vice versa) also causally correlated? The causal chain is not direct. And it may be trivial or important, depending on the investigator's hypothesis, which is a question of empirical analysis. In many places in my book I talk about the possible causes of g in the plural. I look to physiological variables and list a number of likely possibilities, because no molar behavioural variables have been demonstrated that are reliably uncorrelated with each other. Nor have inferred cognitive components fitted the bill. The g factor arises from correlations among lower-order factors, which arise from correlations among diverse tests, but going from the lower to the higher levels of the hierarchy is not an additive combining of tests or factors, but rather an abstracting operation -- a distillation, so to speak. At each higher level, the more specific sources of variance are screened out.

8. Hence if g at the apex of the hierarchy represents only the commonality, not the summation, of a number of lower-order factors whose uniquely defining characteristics do not enter into the higher-order g factor, and if g alone has as high or higher validity in predicting external 'real-life' criteria than the residualised lower-order factors, then g is not a 'hollow' factor or an 'artifact.' It is where the action is -- the chief active ingredient in the battery of tests when it comes to practical prediction and correlations with external or nonpsychometric variables. Of course, for predicting any specialised criterion, such as success in different educational programs or occupations, the inclusion of certain group factors in the prediction equation will naturally raise the predictive validity of the tests. But what is always so surprising is how relatively small their incremental validity usually is. The lower-order factors representing fluid (Gf) and crystallised (Gc) abilities in the Cattell-Horn system have different correlations across the age span because Gc reflects retention/retrieval of well-learned knowledge and cognitive skills, measures of which have a somewhat different trajectory across age than do the measures of Gf, which reflect on-the-spot learning of unfamiliar material, novel problem solving, inductive reasoning, and the like. But most lower-order factors have differential correlations, independent of g, with various nonpsychometric variables, such as sex and occupational skills.

9. So the fact that Gf and Gc differentially reflect mental decline associated with ageing (Gf suffering more than Gc) is not unlike the differential correlations of other factors with nonpsychometric variables. Also, the very high correlation between Gf and Gc (along with several other 2nd-order factors) gives rise to the 3rd-order factor, g, and the residualised Gf then is practically zero; that is, Gf = g. In fact, it is clear that Spearman himself attributed to g the properties commonly associated with Cattell's Gf. It is instructive, too, that even tests that are considered exemplary measures of Gc such as the General Information and Vocabulary subtests of the Wechsler Scales and the Armed Services Vocational Aptitude Battery (ASVAB) show substantial correlations with tests that are exemplars of Gf, such as Raven's matrices and even reaction time (RT) measures on elementary cognitive tasks (ECTs). The distinction between Gc and Gf is not at all clear-cut, and the reason is largely that Spearman's g resides in both of them: Gf = g and Gc reflects the influence of g in the individuals' past acquisition of knowledge and skills.

10. Hunt (para 8) notes that the secular increase in IQ over the past 70 years (known as the 'Flynn Effect') is greater on tests of Gf than on tests of Gc. Although the causes of the Flynn Effect are not all understood, one can just as well argue that the increase in scores of Gf tests was caused by improvements in the class of nonpsychological environmental factors (nutrition, health care, and inoculations for childhood diseases) that have biological effects on aspects of mental growth that are reflected in g. The secular increase in IQ notably parallels increases in children's growth rates, adult height, brain size, and longevity over the same period of time and even to a comparable degree, as measured by standardised scales. Gc, which largely reflects scholastic achievement, has perhaps not kept pace with the biological enhancement of mental growth, and, residualised from g, it may even have declined at the upper end of the bell curve due to the 'dumbing down' of the curriculum in American public education in recent generations. The Flynn Effect is most noticeable at the lower end of the curve, possibly because of the increase in compensatory programs and special services for poorly achieving pupils, and because the lower socioeconomic groups have gained more from general improvements in nutrition and medical advances than the higher SES groups.

11. Hunt claims (para 11) that the correlations between ECTs are rather low compared to correlations between conventional tests, and that they should not be low if they actually reflect some more causal physiological variables that affect response time (RT). First, it should be noted that a given ECT is a single, simple task (usually taking less than one second to perform) and to gain acceptable psychometric reliability it has to be repeated over many trials. The ECT trials are like an extremely homogeneous test, or much as if we repeatedly presented a single IQ test item to a group of persons (assuming they had Korsakoff's syndrome and hence no memory of previous presentations). The scores on such a test could not be highly correlated with another such test composed of a single repeated item, because a single test item measures very little g (the 'signal') and lots of specificity (the 'noise').

12. But by aggregating many different items, each containing a small signal (g), we can completely reverse the sizes of the total variance due to signal and noise, because the item covariances, reflecting g, constitute an increasing proportion of the variance as the number of different items is increased. Very high reliability can be attained for the RT measures in an ECT by increasing the number of test trials. (Interestingly, we have found that the increases in reliability as a function of increasing the number of trials conform to the Spearman-Brown prophecy formula even better than do most ordinary psychometric tests.) But even with high reliability, we find in factor analyses of a variety of ECTs that each one measures a large amount of specificity, i.e., variance peculiar to a particular ECT.
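The aggregation argument in the paragraph above can be illustrated with a minimal sketch. It assumes standardised items (or trials) that all share the same average intercorrelation, which is an idealisation; the function names and the value 0.05 for a single item's correlation are hypothetical and chosen only for illustration. The first function is the standard Spearman-Brown prophecy formula; the second gives the share of a composite's variance that comes from inter-item covariances under the same assumptions.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability when a measure is lengthened k-fold.

    r_single is the reliability (or mean intercorrelation) of one unit,
    e.g. one trial or one item. Standard Spearman-Brown formula.
    """
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def covariance_share(r_mean: float, k: int) -> float:
    """Share of the variance of a sum of k standardised items that is due
    to inter-item covariances, assuming every pair correlates r_mean.

    Var(sum) = k + k*(k - 1)*r_mean, of which k*(k - 1)*r_mean is covariance.
    """
    return (k - 1) * r_mean / (1.0 + (k - 1) * r_mean)


# Hypothetical example: each item carries only a small common signal.
for k in (1, 10, 50, 200):
    print(k, round(spearman_brown(0.05, k), 3), round(covariance_share(0.05, k), 3))
```

With a single item the covariance share is zero, and it rises toward 1 as items are added, which is the reversal of signal and noise described above. The sketch says nothing about how much of the aggregated common variance is g rather than some other shared factor.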

13. There is also a non-g and non-psychometric RT factor that most ECTs have in common. (Psychomotor ability?) The specificity and non-g RT factor usually account for more than about two-thirds of the variance in the ECT measures. But there is also a part of their variance that is g, which ECTs have in common with untimed psychometric tests that bear little or no resemblance to any of the ECTs. Any narrow common factors due to content similarity or particular skills in the ECTs and psychometric tests would be residualised in the first-order factors and thus would not contribute to a hierarchical g factor.

14. Individual differences in RT are remarkably stable across trials. Experiments that have used hundreds or thousands of trials still show significant correlations between RT and IQ, stabilising at an asymptotic level. Attempts to explain the RT-IQ correlations as the result of higher and lower IQ persons selecting different strategies do not lend any support to the strategy hypothesis (see Jensen, 1998, pp. 248-245). But one of the important points we have learned from the research with ECTs is that when the speed of processing measures derived from a number of different ECTs are aggregated, the aggregate score provides a measure of g (albeit alloyed with some non-g components) that is not dependent on prior learned knowledge and skills, which is all that some psychologists believe is measured by conventional IQ tests (Jensen, 1984; 1998, pp. 107-108).

15. Hunt's comment (para 13) on the heritability of IQ as a function of age does not quite fit with my own reading of the evidence, which, I believe, shows that heritability increases from early childhood (about .40) to later maturity (about .80). Some of the strongest evidence for heritability comes from studies of monozygotic (MZ) twins who were separated early in life and reared apart, and the great majority of subjects in these studies (e.g., Newman et al., 1937; Bouchard et al., 1990) are young adults and middle-aged adults.

16. Hunt (para 18) agrees that we still probably don't know how to increase the level of g by any purely psychological or educational means. But I think there is some reason to believe that a moderate and significant increase may be achieved in some proportion of the population through environmental factors that have biological effects on brain development, such as nutrition (particularly certain vitamins) and the avoidance of the infectious diseases of childhood. Perhaps more attention will be given to these factors in view of the meagre effects of manipulating just the educational environment. In the far-off future, we will probably have a sufficient understanding of brain chemistry and physiology to design specific biological interventions that would directly affect those aspects of brain chemistry that are involved in g. Although our present knowledge of the heritability of g only allows statistical prediction of the probable effects of genetic selection, the identification of the specific genes that physically affect crucial aspects of mental development may one day allow direct interventions that would benefit individuals and the whole society. The efficacy of Galton's humane concept of eugenics, achieved through means he himself could not have envisaged, will thereby be recognised as just another advance in the application of science to human welfare. The true, and Galtonian, meaning of eugenics -- that ominous word for so many people today -- will probably be taken for granted long before the next millennium.

17. I think Hunt (para 22) loses sight of what I have termed 'Spearman's hypothesis,' or the fact that the average Black-White difference in cognitive achievements, whatever its cause, is essentially a difference in g. This was recognised by the panel of experts appointed by the American Psychological Association to review the present state of knowledge about intelligence and make a pronouncement on the "knowns and unknowns" (Neisser et al., 1996). G is not itself a cognitive skill, but a major source of individual differences in the speed and ease of acquisition of cognitive skills. Skills typically have a surprisingly narrow transfer gradient. Correlations between measures of skills that fall outside the 'transfer surface' are attributable to g rather than to common elements of information content or skill shared by the different tasks. If everyone were equally able to acquire the most important cognitive skills in our society and the only sources of variance were differences in opportunity to do so, the problem would be vastly different from what it actually is. Certain cognitive achievements, of course, are greatly influenced by schooling and other kinds of experience. And it is important to note that the racial "gap" differs greatly for different kinds of cognitive achievement. But it is larger for those achievement measures that are the most g loaded. For example, scholastic achievement tests of arithmetic calculation, arithmetic applications, and arithmetic concepts, range along both the dimensions of increasing g loadings and increasing mean White-Black differences. The same is true for spelling, reading comprehension, and written composition.

18. Hunt (paras 23 and 24) states that the Black-White 'gap' has been decreasing in recent decades. I assume he is referring to differences in scholastic achievement, rather than in g levels. Scholastic achievement has shown a desirable trend in the last decade or so, although its cause is uncertain, as there are also shifting trends in birth rates in different segments of the population that are correlated with IQ. I have found no bona fide evidence of any increase in g per se. This could be demonstrated, for example, by finding that the changes in mean Black-White differences on various kinds of test scores are related to differences in the tests' g loadings. But even in terms of IQ per se, or in standard deviation (SD) units, the best available evidence I know of shows the same average White-Black difference of about 1.2 SD in 1980 (the last truly representative national survey, based on the Armed Services Vocational Aptitude Battery), as is seen in the Armed Forces draft data obtained in the 1960s. A cogent case has not yet been made for a significant decrease in the White-Black IQ difference.

19. Probably the most interesting point in Hunt's commentary is his reference to Spearman's 'law of diminishing returns' (para 27), which states that g is a larger part of the total variance in tests measuring various mental abilities in low IQ groups than in high IQ groups. Ipso facto, Blacks' g should have larger g loadings on various tests than Whites have. (Note that tests' g loadings are not intrinsically related to the means or difficulty level of the tests.) This is a good hypothesis; but there is a huge problem in attempting to test it. The predicted effect is small and has only been demonstrated when completely non-overlapping groups within the full IQ range have been compared -- in fact, there is usually a gap of at least several IQ points between the highest score in the lower IQ group and the lowest score in the higher IQ group. In the Black and White IQ distributions, however, the total overlap of IQs is about 60 percent, i.e., while the full range of IQs exists in each group, 60 percent of the IQ scores obtained by Blacks are matched by Whites, and vice versa (Jensen, 1980, Chapter 4).

20. Although the tests of this hypothesis have not compared the amounts of g variance in the high and low groups, but rather the average correlation among tests in each of the groups, it is tantamount to testing the g difference, because the overall average correlation in a matrix is best estimated from the eigenvalue of the matrix's first principal component (PC1), a reasonable measure of g, as follows: Average r = (PC1 - 1)/(n - 1), where n is the number of variables (e.g., tests) in the matrix (Kaiser, 1968). So, to test the interesting hypothesis suggested by Hunt, it would take very large samples -- perhaps all the standardisation samples of many of the commonly used intelligence and aptitude test batteries. Even the massive ASVAB data (N = 12,000) from the National Longitudinal study shows the mean g loadings of Whites and Blacks as .727 and .743, respectively, indeed a nonsignificant difference (F=1.04).
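The relation Average r = (PC1 - 1)/(n - 1) quoted above can be checked numerically with a small sketch. It is written in plain Python, obtains the first eigenvalue by power iteration (a convenience chosen here, not a method taken from the sources cited), and uses a made-up 4 x 4 correlation matrix; all function names and matrix values are hypothetical. When every off-diagonal correlation is equal the relation is exact, and otherwise it comes out at or slightly above the arithmetic mean of the off-diagonal correlations.

```python
def first_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric matrix with positive entries,
    approximated by power iteration and a final Rayleigh quotient."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in prod) ** 0.5
        vec = [x / norm for x in prod]
    prod = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(vec[i] * prod[i] for i in range(n))


def kaiser_average_r(matrix):
    """Average intercorrelation estimated as (PC1 - 1) / (n - 1)."""
    n = len(matrix)
    return (first_eigenvalue(matrix) - 1.0) / (n - 1)


def mean_offdiagonal(matrix):
    """Plain arithmetic mean of the off-diagonal correlations."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


# Hypothetical correlation matrix for four tests (illustrative values only).
example = [
    [1.00, 0.50, 0.40, 0.30],
    [0.50, 1.00, 0.35, 0.25],
    [0.40, 0.35, 1.00, 0.20],
    [0.30, 0.25, 0.20, 1.00],
]
print(kaiser_average_r(example), mean_offdiagonal(example))
```

Comparing the two printed values for the correlation matrices of a high and a low group is one way to see how a difference in average correlation maps onto a difference in the first eigenvalue, which is the sense in which the two tests are described above as tantamount to each other.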

21. Let us assume that eventually we can specifically identify the dozen or so genes that contribute the most genetic variance to the g variance in the White population. Hunt (para 28) overlooks at least one possible technical problem in using such information to determine whether the mean White-Black g difference involves a genetic component. That is, populations that have been relatively isolated genetically for many millennia differ in the frequencies of many genes, but this does not rule out the possibility that different genes (alleles), originating through mutations and maintained by natural selection, could have equivalent effects on those features of mental development that are manifested as g variance. Therefore 'gene counting' would not be definitive, but at best could provide only one more item of evidence that is either consistent, or inconsistent, or indeterminate with respect to the hypothesis in question.

22. Hunt's suggestion (para 34) that we don't know how much effort it would take to give all people, everywhere, an equal opportunity to realise their potential is certainly true. But it seems unlikely that any society could provide more equal environments and opportunities than are enjoyed by full siblings reared together. Yet such siblings differ by an average of 12 IQ points, and there are usually even larger differences in special talents. Beethoven and his two brothers all took music lessons from the same teacher, and the latter two turned out to be musical duds. We can thank God (or genetics) for Ludwig.

REFERENCES

Bouchard, T.J., Jr., Lykken, D.T., Tellegen, A., & McGue, M. (1990). Sources of human psychological difference: The Minnesota study of twins reared apart. Science, 250, 223-228.

Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, UK: Cambridge University Press.

Entine, J. (2000). The taboo: Why black athletes dominate sports and why we are afraid to talk about it. New York: Public Affairs.

Hunt, E. (1995). Will we be smart enough? A cognitive analysis of the coming workforce. New York: Russell Sage Foundation.

Hunt, E. (1999). The modifiability of intelligence. PSYCOLOQUY 10(072) ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1999.volume.10/psyc.99.10.072.intelligence-g-factor.14.hunt http://www.cogsci.soton.ac.uk/cgi/psyc/newspy?10.072

Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.

Jensen, A.R. (1984). Test validity: g versus the specificity doctrine. Journal of Social and Biological Structures, 7, 93-118.

Jensen, A.R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Jensen, A.R. (1999). Precis of: "The g Factor: The Science of Mental Ability" PSYCOLOQUY 10(023). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1999.volume.10/psyc.99.10.023.intelligence-g-factor.1.jensen http://www.cogsci.soton.ac.uk/cgi/psyc/newpsy?10.023

Kaiser, H.F. (1968). A measure of the average intercorrelation. Educational and Psychological Measurement, 28, 245-247.

Mackintosh, N. (1998). IQ and human intelligence. Cambridge, UK: Cambridge University Press.

Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J., Halpern, D.F., Loehlin, J.C., Perloff, R., Sternberg, R.J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101.

Newman, H.H., Freeman, F.N., & Holzinger, K.J. (1937). Twins: A study of heredity and environment. Chicago: University of Chicago Press.

Urbach, P. (1974). Progress and degeneration in the IQ debate. British Journal for the Philosophy of Science, 25, 99-135 & 235-259.