Joachim Krueger (1998) Getting to the Core of the Data by Testing Against Alternative Hypotheses. Psycoloquy: 9(70) Social Bias (8)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).
Psycoloquy 9(70): Getting to the Core of the Data by Testing Against Alternative Hypotheses

Reply to Hamm on Krueger on Social-Bias

Joachim Krueger
Department of Psychology
Brown University, Box 1853
Providence, RI 02912


Hamm (1998) endorses and elaborates two of the suggested remedies for ritualistic null hypothesis testing, namely, (1) testing alternative substantive hypotheses and (2) attempting to account for individual differences in judgment. However, the empirical example offered as an illustration of these methods appears to confuse rather than clarify them. I offer a counterexample from research on social projection to illustrate both remedies and to underscore the point that they offer distinct benefits which can be observed within the same data set.


Keywords: hypothesis testing, Bayes' Rule, rationality, projection.
1. Rationality typically fails the standard null hypothesis test of significance, and irrationality (e.g., in the form of significant bias) is inferred. While not denying the reality of many important biases and shortcomings in social judgment, I argued earlier that null hypothesis significance testing (NHST) is a barren strategy because ANY specific point hypothesis can be rejected with sample measures that are sufficiently numerous and precise (Krueger 1998a).

2. Hamm (1998) directs attention to two problematic features of NHST. First, NHST in its pure form involves only one a priori hypothesis. Rejection of that hypothesis says nothing about what can be accepted instead. Pitting two hypotheses against each other alleviates this problem. The trick of course is to find an alternative a priori hypothesis. Second, NHST lumps error variance together with systematic individual differences. Rather than examining whether degrees of rationality may predict other variables of interest, NHST settles for conclusions regarding the rationality of groups of subjects. Computing scores within subjects to reflect degrees of bias alleviates this problem.

3. Hamm offers a Bayesian decision problem to illustrate that judgments can be classified according to the decision strategy they imply. If, for example, a subject expects the probability of a hypothesis given the data, p(H|D), to be the same as the probability of the data given the hypothesis, p(D|H), it seems that some version of the representativeness heuristic guided judgment. Other patterns of judgment are possible (Hamm counts 9). For example, subjects can rely exclusively on base rates, p(H), or they can combine the available probabilities in various normative or non-normative ways.

4. Hamm reports the percentages of subjects whose judgments revealed the various inference strategies. For example, the confusion of the inverse (equating p(H|D) with p(D|H)) appeared to be particularly common, but other strategies emerged as well. In short, the findings pointed to individual differences in judgmental output. What these analyses did NOT achieve, contrary to Hamm's claim, was a comparison among "a large number of hypotheses." As described, the study seemed to have no hypotheses: there were no predictions concerning the likely prevalence of the possible inference strategies. The presence of nine different strategies is not the same as having nine different hypotheses.

5. Nevertheless, the two matters of concern, (1) use of alternative hypotheses, and (2) examination of individual differences, deserve illustration by example. I offer some recently collected data on one of the social-perceptual phenomena with which I was concerned in the target article. The phenomenon is social projection, which is expressed by the correlation between respondents' self-ratings for a series of items (e.g., personality traits) and their estimates (in percent) of the prevalence of these items in a group, r(self-ratings,estimated prevalence) (Krueger 1998b).

6. By traditional NHST, the mean within-rater correlation is compared against zero. If M(r) > 0 at p < .05, projection is said to be present, and an egocentric bias is attributed to the population. If alternative hypotheses are pitted against each other, however, more can be learned. But what alternative hypothesis? According to the view that projection reflects rational inductive reasoning (Hoch 1987), the correlation representing projection is expected to be the same as the correlation between a respondent's self-ratings and the actual prevalence of the traits in the population, r(self-ratings,actual prevalence). The latter correlation reflects the degree to which a person is actually typical of the group, and thus indicates how much projection is warranted.
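The two within-rater correlations described above can be sketched for a single respondent as follows. This is only an illustration: the item values, the variable names, and the helper `pearson_r` are hypothetical and do not come from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for ONE respondent across six trait items (assumed values).
self_ratings = [1, 0, 1, 1, 0, 0]                 # 1 = trait describes me
estimated_prevalence = [70, 40, 60, 55, 30, 45]   # respondent's estimates, in percent
actual_prevalence = [65, 50, 45, 60, 35, 55]      # observed endorsement rates, in percent

# r(self-ratings, estimated prevalence): how strongly the respondent projects.
r_projection = pearson_r(self_ratings, estimated_prevalence)

# r(self-ratings, actual prevalence): how typical the respondent actually is.
r_typicality = pearson_r(self_ratings, actual_prevalence)

print(round(r_projection, 2), round(r_typicality, 2))
```

Under the induction view, a respondent's projection score should approximate his or her typicality score; computing both for every respondent provides the raw material for the group-level and individual-level analyses.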

7. Thus, H0 is that r(projection) = 0, and H1 is that r(projection) = r(typicality). By separate one-sample t-tests, the obtained r(projection) can be tested against each of the two hypotheses. The exact p values represent p(D|H0) and p(D|H1). Using Bayes' Rule, it is now possible to ask which of the two hypotheses is more compatible with the data. It is assumed that the two hypotheses are the only ones under consideration, and that they are mutually exclusive and exhaustive. If there are no reasons to think otherwise, one can also assume that they are equally likely a priori, i.e., p(H0) = p(H1) = .5. From this, the posterior probability of H0 follows as p(H0|D) = p(H0)p(D|H0)/p(D), where p(D) = .5p(D|H0) + .5p(D|H1). Finally, p(H1|D) = 1 - p(H0|D).
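The Bayesian step can be written out as a minimal sketch. It follows the assumptions stated in the paragraph (two exhaustive, mutually exclusive hypotheses and, by default, equal priors). The function name `posterior_h0` and the two p values in the usage lines are hypothetical and are not the values obtained in the study.

```python
def posterior_h0(p_d_given_h0, p_d_given_h1, prior_h0=0.5):
    """Posterior probability of H0 when H0 and H1 are the only two hypotheses.

    p_d_given_h0, p_d_given_h1: the exact p values from the two t-tests,
    treated here (as in the text) as p(D|H0) and p(D|H1).
    """
    prior_h1 = 1.0 - prior_h0
    # p(D) = p(H0)p(D|H0) + p(H1)p(D|H1)
    p_d = prior_h0 * p_d_given_h0 + prior_h1 * p_d_given_h1
    # Bayes' Rule: p(H0|D) = p(H0)p(D|H0) / p(D)
    return prior_h0 * p_d_given_h0 / p_d

# Hypothetical p values, chosen only for illustration.
p_h0 = posterior_h0(p_d_given_h0=0.001, p_d_given_h1=0.2)
p_h1 = 1.0 - p_h0  # the two posteriors sum to 1 by assumption
print(p_h0, p_h1)
```

With equal priors the result depends only on the ratio of the two p values, so the hypothesis under which the data are less surprising receives the larger posterior probability.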

8. When respondents made ratings for the population of all adults in the nation, the mean r(projection) was .25, and the mean r(typicality) was .38. Using the significance levels and the assumptions described above, the Bayesian analysis yielded posterior probabilities favoring the induction hypothesis (p(H1|D) = .9999 vs. p(H0|D) = .0001). When respondents made ratings for students at their university, the r(projection) was .41, and the r(typicality) was .50. In this condition, the posterior probability of the induction hypothesis (relative to that of the no projection hypothesis) was virtually identical to 1.

9. Because all respondents had their own projection and typicality scores, it was possible to examine individual differences. It so happened that the two scores were not correlated, which means that respondents projected at roughly appropriate levels 'as a group,' but also suggests that they were individually unaware of how typical they were of the group. Recently, Stanovich and West (1998) reported an extensive study intercorrelating multiple measures of bias (e.g., fallacies in deductive reasoning, overconfidence, hindsight bias, base-rate neglect, projection, etc.) with measures of intelligence and epistemic attitudes. In brief, the data showed that some biases are correlated with these predictors, whereas others are not. This correlational approach holds promise for breaking the lock of group-based analyses (NHST), and the wholesale condemnation of groups that tends to follow from such analyses.
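The individual-differences check amounts to correlating the two scores across respondents rather than across items. A rough sketch, with hypothetical per-respondent scores and a hypothetical helper name `between_subject_r`, might look like this:

```python
from statistics import mean, pstdev

def between_subject_r(scores_a, scores_b):
    """Correlation across respondents between two per-person scores."""
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    # Population covariance divided by the product of population SDs.
    cov = mean((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    return cov / (pstdev(scores_a) * pstdev(scores_b))

# Hypothetical scores: one r(projection) and one r(typicality) per respondent.
projection_scores = [0.30, 0.10, 0.45, 0.20, 0.35]
typicality_scores = [0.40, 0.35, 0.30, 0.50, 0.45]

r_across = between_subject_r(projection_scores, typicality_scores)
print(round(r_across, 2))
```

A value near zero would correspond to the pattern described above: mean projection can match mean typicality even though more typical respondents do not project more than less typical ones.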


Hamm, R. M. (1998) Characterizing individual strategies illuminates nonoptimal behavior. PSYCOLOQUY 9(49)

Hoch, S. J. (1987) Perceived consensus and predictive accuracy. Journal of Personality and Social Psychology, 53, 221-234.

Krueger, J. (1998a) The bet on bias: A foregone conclusion? PSYCOLOQUY 9(46)

Krueger, J. (1998b) On the perception of social consensus. Advances in Experimental Social Psychology, 30, 163-240.

Stanovich, K. E. & West, R. F. (1998) Individual differences in rational thought. Journal of Experimental Psychology: General, 127, 161-188.
