Arthur R. Jensen (2000) IQ Tests, Psychometric and Chronometric g, and Achievement. Psycoloquy: 11(014) Intelligence g Factor (29)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 11(014): IQ Tests, Psychometric and Chronometric g, and Achievement

Reply to Kush on Jensen on Intelligence-g-Factor

Arthur R. Jensen
Educational Psychology
School of Education
University of California
Berkeley, CA 94720-1670


Kush raises a number of legitimate and useful issues, of concern to school psychology and other fields of applied psychometrics, about the relationship of the theory of intelligence to the practical uses of "intelligence" tests.


behavior genetics, cognitive modelling, evoked potentials, evolutionary psychology, factor analysis, g factor, heritability, individual differences, intelligence, IQ, neurometrics, psychometrics, psychophysiology, skills, Spearman, statistics
1. Kush's (1999) thoughts and the questions they raise about "intelligence" testing are what one would hope for from a practical perspective. It is good to hear from this side of the issues. I myself have not done clinical testing since my clinical internship at the University of Maryland's Psychiatric Institute 45 years ago, although I have conducted many research studies with school populations since then, and a good many school psychologists have taken my graduate courses. But I haven't been in that particular firing line that school psychologists must routinely face, and cannot consider myself an expert on the practical problems they have to deal with. Hence my book (Jensen, 1998; 1999) was not intended as a manual or guide for the practical uses of mental ability tests. But I am concerned that clinicians, especially those working in school psychology, should know more about the theory of just what it is that the tests they use are actually measuring. School psychologists should be completely disabused of what I have called the "specificity doctrine," which, as explained in my book (Jensen, 1998, pp.107-114), is a superficial and completely discredited view of mental tests. Yet, I've been told, it is still being taught on some psychology courses, if only implicitly by default because of a totally non-theoretical notion of psychological testing as mere technology. In this regard, I am grateful to Kush (1999) for asking some of the right questions. I can't answer them all here, but will indicate my own theoretical stance on some of the key issues he raises.

2. First, I should point out that I have long thought that the mass use of IQ tests in schools, by whatever alias, such as testing the IQ of every child in every classroom, has little justification. It is important, of course, that all children's progress in school should be well monitored, but this can be done entirely through brief and frequent achievement tests after each unit of instruction, which should indicate whether a pupil is learning whatever is actually being taught in class. There will be individual differences, of course. Only if severe and persistent problems are revealed by such tests should IQ and other diagnostic tests be individually administered and interpreted by a qualified professional. Indeed, in such cases it should be absolutely mandatory for the school system. The various tests and observations, then, serve as an objective aid to clinical judgement about the probable cause(s) of the pupil's scholastic problem and help decide how best the school and the child's parents can deal with the child's problem. Nothing but the child's individual performance and characteristics should enter into such assessment and recommendations. The idea that these should differ in any way according to the pupil's racial, ethnic, social-class, or religious background, or sex, seems to me both unnecessary and unjustifiable from any scientific standpoint. Statistical differences in the score distributions of various demographic groups do not and should not play any part in the individual assessment or recommendations based thereon. Such statistics are a different issue altogether, certainly in need of scientific understanding, but not relevant to individual assessment. In my view, a difference of X points in g level between any two individuals has the same meaning whether those individuals are of the same or of different racial backgrounds. If this constitutes a radical view, I would like to see sensible evidence that contradicts it.

3. With regard to individual ability testing, Kush asks how many types of ability should be tested. Certainly the level of g should be determined, and this is best done with the administration of one or two individual full-scale IQ tests that are appropriate to the person's language background. The Full Scale IQ, always based on a battery of tests, usually gives a fair assessment of g level, provided the testee makes normal effort, which can be observed by the examiner. But that is about all the conventional IQ tests are good for. Subtest scores of an IQ battery are virtually useless for individual diagnosis. The various subtests, residualised from g, have little independent interpretive value unless they show extreme deviations from the overall average.

4. What we actually need is a number of batteries, each composed of about a dozen tests, which measure the well-established group factors and special talents. These batteries, which focus on specific factors, could be "age-normed" in the relevant population, and a separate set of norms could be provided to show regressed factor scores that are independent of g. This is needed because the scores on all such specialised tests have considerable g loadings, and it is diagnostically useful to know an individual's standing on each factor test, independently of the individual's standing on g. I suspect that most of the variance in the "multiple intelligences" indicated by Gardner (1983) could be captured by such a set of factor-derived batteries. Scholastic performance, that is, the traditional academic curriculum including the 3 Rs, English composition, history, maths and science, is best predicted by g, with validity coefficients in the .70 to .80 range in the total school population. Scholastic achievement tests at one age generally predict later achievement even better than most IQ tests, because they reflect g as well as specific knowledge and skills that transfer to later scholastic learning, and the interest or motivation for such learning, which is only moderately correlated with g or IQ.
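As a rough illustration of what "regressed factor scores that are independent of g" could mean in practice, the following Python sketch removes the linear g component from a group-factor score by ordinary least-squares regression and rescales the residual to z-scores. The function name `residualize_on_g`, the "spatial" factor, and the simulated scores with their weights are assumptions made purely for illustration; none of them comes from the text or from any real test battery, and an actual norming procedure would be more elaborate.

```python
import numpy as np


def residualize_on_g(factor_scores, g_scores):
    """Return factor scores with their linear g component removed, as z-scores.

    Hypothetical helper: fits factor = a + b * g by least squares and
    standardizes the residuals.
    """
    x = np.asarray(g_scores, dtype=float)
    y = np.asarray(factor_scores, dtype=float)
    # np.polyfit returns coefficients from highest degree down: slope, intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    residual = y - (intercept + slope * x)
    return (residual - residual.mean()) / residual.std(ddof=1)


# Simulated (not real) scores: a "spatial" factor test that is partly g-loaded.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
spatial = 0.6 * g + 0.8 * rng.standard_normal(1000)

spatial_free_of_g = residualize_on_g(spatial, g)

print("correlation of raw spatial score with g:", np.corrcoef(spatial, g)[0, 1])
print("correlation of residualized score with g:", np.corrcoef(spatial_free_of_g, g)[0, 1])
```

The raw score correlates substantially with g, whereas the residualized score is uncorrelated with g by construction, so it reflects only the individual's standing on the group factor relative to others at the same g level.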

5. The SAT, for example, predicts college performance (both grades and persistence to graduation), because it measures both g and acquired academic knowledge and skills that are prerequisite for success in most college courses. Most students already know to some extent their particular scholastic strengths and interests by the time they are of college age. For those in doubt, a profile of special abilities and interests would be most useful. I believe it is better for an individual's proclivities and ambitions to be shaped by their perception of their performance in relation to that of others, rather than by test scores, since these can miss some of the important personal traits other than cognitive abilities that make for success in a particular niche. All the successful persons I know feel lucky to have found the niche in which they have succeeded, because they realise there are so many niches in which they would not have been as successful. The "crisis of youth" is largely this problem of finding the right niche for oneself. Psychological assessment techniques can offer guidance, but they should not dictate or override strong proclivities and ambitions, which in any case will be modified by the realities of competition.

6. The chronometric tests that have figured most prominently in research on speed of information processing as a component of g and other cognitive factors are not yet suitable for the clinical assessment of g. What will be needed is a specially designed battery of chronometric measures, the total score on which will "drown out", so to speak, the task specificity that constitutes such a large part of the variance on each of these tests. I predict that it will be possible to devise a practically useful battery that will correlate probably as much as .70 with IQ in a linguistically and culturally homogeneous population. Such an effort is underway, but a lot of refinement is still needed. An important value of such a test is that it could measure g without dependence on more than minimal acquired knowledge and skills.
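The way a total score can "drown out" task specificity can be sketched with a simple one-factor model. Assume, purely for illustration, that each standardized chronometric task equals a common loading times g plus independent task-specific variance. The correlation of the unit-weighted sum of k such tasks with g is then loading * sqrt(k) / sqrt(1 + (k - 1) * loading^2). The function name `composite_g_correlation` and the loading of 0.3 are hypothetical; the model concerns the correlation with g itself, not with IQ, and real tasks would not have equal loadings or fully independent specificities.

```python
import math


def composite_g_correlation(loading, n_tasks):
    """Correlation between g and the unit-weighted sum of n_tasks tasks.

    Assumes each standardized task = loading * g + independent specificity,
    with the same loading for every task (an idealization).
    """
    if not 0.0 < loading <= 1.0:
        raise ValueError("loading must be in (0, 1]")
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    return loading * math.sqrt(n_tasks) / math.sqrt(1.0 + (n_tasks - 1) * loading ** 2)


# Hypothetical loading of 0.3 per task; compare batteries of different sizes.
for k in (1, 4, 12):
    print(k, round(composite_g_correlation(0.3, k), 2))
```

Under these assumptions a single task correlates only 0.3 with g, while a battery of a dozen such tasks correlates a little over 0.7, because the specific variances partly cancel in the sum while the common g variance accumulates.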

7. We have found, for example, that children of ages 12 to 13 who are enrolled in selective universities and are succeeding remarkably well in their course work perform on the reaction time (RT) tests (which have no academic content) on a par with their university classmates, who average seven years older. But these gifted students have an average RT that is markedly faster than that of their age-mates, who are in the 7th or 8th grade. These academically gifted children also perform on a par with their university classmates on the SAT, vocabulary, and general knowledge, indicating that, because of their speed in information processing, they have acquired in 13 years a level of knowledge and skills which is commensurate with that acquired in 20 years by ordinary university students (whose IQs average about 120). What the chronometric tests tell us that ordinary IQ tests do not is that these gifted children differ from the average not only in the kinds of knowledge and skills typically associated with an advantaged environment, but also in some basic cognitive processes that favorably affect their powers of learning, retention, and comprehension of complex academic subject matter. Such chronometric tests would be especially useful with individuals for whom conventional IQ tests may be of doubtful validity because of their atypical background.

8. Kush (para 5) is right in noting that tests of so-called crystallised ability (Gc) reflect achievement to a considerable extent. However, the more interesting aspect of Gc is that if Gc types of tests are sufficiently diverse, covering a great many areas of information (as do the so-called "trivial pursuit" games), the general factor extracted is largely the same as fluid intelligence (Gf), which is itself virtually the same as g. In fact, the reaction time measures derived from various achievement-free elementary cognitive tasks, which necessarily reflect only Gf=g plus task specificity, correlate significantly with obvious achievement tests, such as vocabulary, reading comprehension, arithmetic, and other specific items of information. Such wide-ranging achievement tests reflect Gf=g even more than they reflect Gc, because it is Gf=g that mostly affects the speed of knowledge and skill acquisition and consequently produces differences in the amount acquired, given equal time. Hence "mental age" differences are reflected both in content-free RT tests and in knowledge-based achievement tests. This applies to all kinds of knowledge, not just scholastic or academic knowledge. Of course, opportunity and interests govern the types of knowledge a person acquires, but g level is the stronger correlate of the amount of knowledge acquired.

9. The fact that an IQ test contains "culturally loaded" items does not necessarily mean that it is biased with respect to different groups in the population. Few if any culture-loaded tests in common use today are biased for any native-born, English-speaking groups in the United States. There are many psychometric indicators of test bias, but a group difference in test scores is not one of them (Jensen, 1980; summarised in Jensen, 1998, pp.360-369).

10. The broad-brush theories of Helms (i.e., on African-centred values) and Ogbu (voluntary versus involuntary immigrant minorities) as explanations for the average White-Black IQ difference are totally incapable of explaining the data reviewed in my book (1998; 1999), particularly the data that demonstrate Spearman's hypothesis that the White-Black difference is directly related to various tests' g loadings, whether the tests are group administered or individually administered, verbal or nonverbal, psychometric or chronometric, culture-loaded or culture-reduced. Nor can they even begin to account for, and are contradicted by, the evidence showing that, in school-age children, the age-matched White-Black differences in the fine-grained aspects of different psychometric tests can be perfectly simulated by the differences between older and younger age groups within each racial group. These socio-cultural theories mentioned by Kush are not supported by any data independent of the circular terminology in which the supposed explanatory variables are merely different code names for Black and White (involuntary versus voluntary immigrants, and African-centred versus European-centred thinking). No means of empirically testing these notions is suggested and probably none is possible. I find nothing resembling scientific arguments or theory testing in these armchair approaches to the study of group differences.

11. I say "bravo!" to Kush's (1999, paras 8 & 9) concluding statements.


Gardner, H. (1983). Frames of mind. New York: Basic Books.

Jensen, A.R. (1980). Bias in mental testing. New York: Free Press.

Jensen, A.R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Jensen, A.R. (1999). Precis of: The g Factor: The Science of Mental Ability. PSYCOLOQUY 10(023). psyc.99.10.023.intelligence-g-factor.1.jensen

Kush, J.C. (1999). The g factor: Implications for school psychologists. PSYCOLOQUY 10(067). psyc.99.10.067.intelligence-g-factor.13.kush
