Siu L. Chow (1999) In Defence of Significance Tests. Psycoloquy: 10(006) Social Bias (15)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

IN DEFENCE OF SIGNIFICANCE TESTS
Commentary on Krueger on Social-Bias

Siu L. Chow
Department of Psychology
University of Regina
Regina, Saskatchewan
Canada S4S 0A2

Siu.Chow@uregina.ca

Abstract

Krueger argues that hypotheses in favour of human irrationality benefit from the shortcomings of using significance tests. This underscores the need to distinguish between (a) theory-corroboration and utilitarian experiments, (b) the substantive and statistical hypotheses, (c) corroborating explanatory theory and testing statistical hypothesis, and (d) two levels of abstraction.

Keywords

Bayes' rule, bias, hypothesis testing, individual differences, probability, rationality, significance testing, social cognition, statistical inference

I. A CASE AGAINST NULL-HYPOTHESIS TESTING

1. Krueger (1998) questions the validity of data that suggest false consensus, self-enhancement and over-attribution. He argues that social psychologists are misled by their practice of treating theories of rationality as the statistical null hypothesis (H0), for three main reasons. First, echoing the view that H0 is never true (Thompson, 1996), Krueger asserts that H0 cannot be proved (Paragraph 5). If a theory of rationality is identified with H0, this amounts to denying rationality before any data are collected. Second, there is an asymmetry in the null-hypothesis significance-test procedure (NHSTP) because an exactly zero difference is more difficult to obtain than a nonzero difference (Paragraph 4). Third, this asymmetry is amplified by improved methodology or greater statistical power (Paragraph 4; see also Meehl, 1967).
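Krueger's third point (see also Meehl, 1967) can be illustrated numerically. The Python sketch below is an illustration under stated assumptions, not anything proposed by Krueger or Chow: it uses a normal-approximation two-sided two-sample z test at alpha = .05, and a hypothetical true standardized difference of d = 0.05. The function name `power_two_sample_z` is invented for this example. With any fixed nonzero difference, however small, the probability of rejecting H0 climbs toward 1 as the sample size grows; with d = 0 it stays at alpha.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

def power_two_sample_z(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z test (hypothetical helper).

    d is the true standardized mean difference; under that assumption the
    test statistic is normal with mean d * sqrt(n/2) and unit variance.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    # Probability that |Z| exceeds z_crit when Z ~ Normal(shift, 1).
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)

d = 0.05  # a tiny nonzero difference (hypothetical value)
for n in (20, 200, 2_000, 20_000, 200_000):
    print(f"n per group = {n:>7}: power = {power_two_sample_z(d, n):.3f}")
```

The normal approximation is chosen only to keep the sketch within the standard library; an exact t-based power calculation would show the same trend.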

2. Krueger recommends, among other things, the "revise and resubmit" approach, in which the research hypothesis (H1) is located post hoc such that p(D|H1) is always .5, whereas p(D|H0) is always smaller than .5. The "revise and resubmit" defence of rationality is an application of the commonly held view about the shortcomings of NHSTP. It is unusual in that H0, as well as H1, may be identified with the substantive hypothesis. (Critics of NHSTP normally identify H1 with the substantive hypothesis.) It is argued in this commentary that the "revise and resubmit" argument is weakened to the extent that NHSTP can be defended.
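The arithmetic behind the post hoc location of H1 can be sketched as follows. Assumptions for illustration only: the statistic has a normal sampling distribution with a known standard error of 0.4, p(D|H) is read as the probability of a result at least as extreme as the observed one in its own direction, and the helper names are hypothetical. Placing H1 at the observed value always yields .5, while H0 (a zero difference) yields less than .5 for any nonzero observed difference.

```python
from statistics import NormalDist

def tail_beyond(observed, hypothesized_mean, se):
    """Probability of a result at least as extreme as `observed`, in the
    direction of `observed` (upper tail if observed >= 0, lower tail otherwise),
    when the statistic is Normal(hypothesized_mean, se)."""
    dist = NormalDist(hypothesized_mean, se)
    if observed >= 0:
        return 1 - dist.cdf(observed)
    return dist.cdf(observed)

def revise_and_resubmit(observed, se):
    """Return (p(D|H1), p(D|H0)) when H1 is placed post hoc at the observed value."""
    p_given_h1 = tail_beyond(observed, observed, se)  # H1 located at the data
    p_given_h0 = tail_beyond(observed, 0.0, se)       # H0: zero difference
    return p_given_h1, p_given_h0

for observed in (0.3, 1.2, -0.8):
    p1, p0 = revise_and_resubmit(observed, se=0.4)
    print(f"observed = {observed:+.1f}: p(D|H1) = {p1:.3f}, p(D|H0) = {p0:.3f}")
```

Because H1 is centred on the data, half of its sampling distribution always lies beyond the observed value, whatever the data happen to be.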

II. THEORY CORROBORATION VERSUS UTILITARIAN EXPERIMENTATION

3. A researcher may conduct experiments to test the interference theory of forgetting. The independent variable in this theory corroboration experiment is the number of items intervening between the initial presentation of the target item and its recall (e.g., 2 and 6). Four points are important. (1) The experimental manipulation is neither the short-term store nor the forgetting processes; that is, it is not the "to be explained" phenomenon. (2) The substantive hypothesis is the view that subsequent learning interferes with earlier learning. (3) The statistical hypothesis is the expectation that the mean of the six-item condition is smaller than that of the two-item condition. (4) The substantive and statistical hypotheses are different.

4. Another researcher may conduct experiments to ascertain the efficacy of Fertilizer F in order to make the utilitarian (practical) decision as to whether or not to replace Fertilizer C with F. The treatment variable in this utilitarian experiment is the type of fertilizer used, whose two levels are Fertilizers F and C. In contrast to the theory corroboration experiment, Fertilizer F is the "to be studied" phenomenon as well as part of the research manipulation. Moreover, the experimental and statistical hypotheses are indistinguishable on the surface, in that both state that the mean yield in the "Fertilizer F" condition is higher than that in the "Fertilizer C" condition. Nonetheless, the substantive and statistical hypotheses are conceptually different (see 2 above and 5 & 6 below).

III. SUBSTANTIVE VERSUS STATISTICAL HYPOTHESIS

5. The interference theory is qualitative and substantive in the sense that it describes some hypothetical properties of the memory system. In a similar vein, the theory about Fertilizer F at the conceptual level is also qualitative and substantive because it stipulates what chemicals are in the soil, as well as how they behave. In both cases, the explanatory theories are about specific phenomena in the world and they are specific to their respective "to be explained" phenomena.

6. In the case of the t test, H0 is μ1 = μ2. It is a general statement applicable to all experiments involving two levels of an independent variable, with the dependent variable measured at the interval scale or higher. This state of affairs should have alerted its users to the fact that it is not a statement about specific phenomena in the world. Instead, it is a statement about what should happen logically if data are collected with reference to a well-defined formal structure (viz., the inductive rule that underlies the experimental design). In the case of the utilitarian experiment, Fertilizer F is used as a level of the treatment variable, not as a substantive explanation. That is, H0 is not (and cannot be) a substantive hypothesis. Krueger's diagnosis of the difficulty with studies of irrationality could have been stronger had he pointed out that it is simply inappropriate to treat H0 as the substantive hypothesis.
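Because H0 mentions only population means, one computation serves any two-level design. The Python sketch below computes Student's pooled-variance t statistic; the function name `pooled_t` and all data values are hypothetical, invented for illustration. Nothing in the function refers to memory or to fertilizer, which is the sense in which H0 is not a statement about specific phenomena.

```python
from statistics import mean, variance

def pooled_t(sample_1, sample_2):
    """Student's pooled-variance t statistic and df for H0: mu1 = mu2."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    se_diff = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(sample_1) - mean(sample_2)) / se_diff, df

# Hypothetical data for the two kinds of experiment (values invented).
recall_two_items = [7, 8, 6, 9, 7, 8]   # theory corroboration: 2 intervening items
recall_six_items = [5, 6, 4, 6, 5, 5]   # 6 intervening items
yield_f = [52.1, 49.8, 55.0, 53.2]      # utilitarian: Fertilizer F
yield_c = [48.7, 50.2, 47.9, 49.5]      # Fertilizer C

for label, a, b in [("memory", recall_two_items, recall_six_items),
                    ("fertilizer", yield_f, yield_c)]:
    t, df = pooled_t(a, b)
    print(f"{label:>10}: t({df}) = {t:.2f}")
```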

IV. CORROBORATING EXPLANATORY THEORY VERSUS TESTING STATISTICAL HYPOTHESIS

7. Krueger's observation that the statistical meaning of H0 "hinges on the concept of chance" (Paragraph 8) may be used to highlight the difference between corroborating an explanatory theory and testing H0. To begin with, it should be emphasized that H0 is not a categorical proposition. It appears twice in statistical hypothesis testing, each time as part of a conditional proposition:

    (a) If chance factors are responsible for the data, H0 is true
    (e.g., μ1 = μ2).

    (b) If H0 is true, the sampling distribution of differences has a
    mean of zero and a standard error of difference of a well-defined
    value.
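A small simulation can illustrate statement 7b. Assuming, for illustration only, that both conditions sample the same normal population with mean 50 and standard deviation 10 (σ treated as known), and n = 10 per condition, the simulated differences between sample means should centre on zero with a standard deviation close to the theoretical standard error σ√(1/n1 + 1/n2) ≈ 4.47. The function name and all parameter values are hypothetical.

```python
import random
from statistics import fmean, stdev

def simulate_null_differences(n1, n2, mu=50.0, sigma=10.0, reps=20_000, seed=1):
    """Differences between two sample means when H0 (mu1 = mu2) is true.

    Both samples come from the same normal population, so any difference
    between their means reflects sampling variability (chance) alone.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        m1 = fmean(rng.gauss(mu, sigma) for _ in range(n1))
        m2 = fmean(rng.gauss(mu, sigma) for _ in range(n2))
        diffs.append(m1 - m2)
    return diffs

n1 = n2 = 10
sigma = 10.0
diffs = simulate_null_differences(n1, n2, sigma=sigma)
theoretical_se = sigma * (1 / n1 + 1 / n2) ** 0.5

print(f"mean of simulated differences: {fmean(diffs):.3f}")
print(f"SD of simulated differences:   {stdev(diffs):.3f}")
print(f"theoretical standard error:    {theoretical_se:.3f}")
```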

8. The critical t value used to decide whether or not the result is statistically significant is based on the sampling distribution mentioned in 7b. Note that the substantive hypothesis does not feature in the statistical decision.
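The point that the critical value comes from the H0 sampling distribution alone can be made concrete with a Monte Carlo sketch. Assumptions (not from the source): both conditions are drawn from a standard normal population when H0 is true, n = 10 per group, alpha = .05 one-tailed, and the recall scores are invented. Instead of a t table, the critical value is read off a simulated H0 distribution of the pooled-variance t, so it should land near the tabled value of about 1.73 for 18 degrees of freedom. The decision function receives only the observed t and the critical t; neither H1 nor the interference theory enters it. All function names are hypothetical.

```python
import random
from statistics import fmean

def sample_var(xs):
    """Unbiased sample variance (n - 1 denominator)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(a, b):
    """Student's pooled-variance t for the difference mean(a) - mean(b)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * sample_var(a) + (n2 - 1) * sample_var(b)) / (n1 + n2 - 2)
    return (fmean(a) - fmean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

def simulated_critical_t(n1, n2, alpha=0.05, reps=20_000, seed=7):
    """Upper-tail critical t read off a simulated H0 sampling distribution.

    Both groups come from the same standard normal population, so every
    simulated t reflects chance alone (H0 true).
    """
    rng = random.Random(seed)
    ts = sorted(
        pooled_t([rng.gauss(0, 1) for _ in range(n1)],
                 [rng.gauss(0, 1) for _ in range(n2)])
        for _ in range(reps)
    )
    return ts[int((1 - alpha) * reps)]

def significant(observed_t, critical_t):
    """Binary statistical decision: reject H0 iff observed t exceeds critical t."""
    return observed_t > critical_t

# Hypothetical recall scores; the statistical hypothesis is that the
# six-item mean is smaller than the two-item mean (one-tailed test).
two_items = [7, 8, 6, 9, 7, 8, 7, 6, 8, 9]
six_items = [5, 6, 4, 6, 5, 5, 6, 4, 5, 7]

t_obs = pooled_t(two_items, six_items)
t_crit = simulated_critical_t(len(two_items), len(six_items))
print(f"observed t = {t_obs:.2f}, simulated critical t = {t_crit:.2f}")
print("reject H0" if significant(t_obs, t_crit) else "do not reject H0")
```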

9. As the substantive theory is neither H0 nor H1, its corroboration must be different from the statistical decision itself. Corroborating an explanatory theory experimentally involves three embedding conditional syllogisms, whereas the binary decision about statistical significance involves a disjunctive syllogism (Chow, 1998).

V. TWO LEVELS OF ABSTRACTION

10. The post hoc H1 plays an important role in Krueger's "revise and resubmit" approach (Paragraphs 8, 12 & 13). The first difficulty is that H1 plays no role in NHSTP because only one sampling distribution is used in making the statistical decision (see 7b above). The second difficulty is that the sampling distribution in question is more abstract than the distribution of scores of the substantive population. In short, a lone sampling distribution of differences is used in NHSTP even though there are two statistical populations (viz., for the experimental and control conditions). Moreover, the sampling distribution and the statistical populations belong to two levels of abstraction. In saying, "According to the null hypothesis of random responding, 50% of the choices will be rational," Krueger (Paragraph 8) seems to have ignored the difference between the two levels of abstraction. Random responding is not the same as the chance influences envisaged in 7a and 7b above.
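One way to make the two levels concrete is a simulation under explicit assumptions: "random responding" is taken to mean that each choice is independently coded 1 ("rational") with probability .5, and samples contain 40 respondents. Both assumptions, and the function name, are hypothetical. Individual scores are 0s and 1s with a standard deviation of .5, whereas the sampling distribution of the sample proportion centres on .5 with a standard error of about √(.25/40) ≈ .079. The statistical decision in NHSTP is made with reference to the second distribution, not the first.

```python
import random
from statistics import fmean, pstdev

def simulate_two_levels(n=40, p=0.5, reps=10_000, seed=3):
    """Contrast individual responses with the sampling distribution of a statistic.

    Assumes 'random responding' means each choice is independently coded
    1 ("rational") with probability p and 0 otherwise.
    """
    rng = random.Random(seed)
    # Level of the scores: one sample of individual choices.
    one_sample = [1 if rng.random() < p else 0 for _ in range(n)]
    # Level of the statistic: proportions from many independent samples.
    proportions = [fmean(1 if rng.random() < p else 0 for _ in range(n))
                   for _ in range(reps)]
    return one_sample, proportions

one_sample, proportions = simulate_two_levels()
print("values taken by individual scores:", sorted(set(one_sample)))
print(f"SD of individual scores in one sample: {pstdev(one_sample):.3f}")
print(f"mean of sample proportions: {fmean(proportions):.3f}")
print(f"SD of sample proportions:   {pstdev(proportions):.3f} "
      f"(theory: {(0.5 * 0.5 / 40) ** 0.5:.3f})")
```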

VI. SUMMARY AND CONCLUSIONS

11. Krueger's account makes explicit the importance of talking properly about NHSTP. The three points raised in 1 assume a different complexion once neither H0 nor H1 is treated as the substantive hypothesis. Specifically, H0 should be true if (a) the theory that implies the experimental hypothesis is false, and (b) data are collected properly in accordance with the inductive rule that underlies the experimental design. There is no asymmetry because H1 plays no role in the statistical decision. Improving the design or method of the experiment enhances the inductive validity of the experimental conclusion. It has nothing to do with statistics.

12. Much of the current misunderstanding about NHSTP in general (and H0 in particular) is due to the fact that NHSTP is discussed almost exclusively in the context of utilitarian experiments. As the important differences between theory corroboration and utilitarian experiments are not made explicit, the distinction between the substantive and statistical hypotheses is overlooked. Consequently, statistical hypothesis testing is conflated with theory corroboration. Further confusion is introduced when NHSTP is discussed without taking into account the level of abstraction implicated by the sampling distribution of the test statistic. The state of affairs observed by Krueger confirms the need to respect some important conceptual distinctions.

VII. REFERENCES

Chow, S. L. (1998). Multiple book review of "Statistical Significance: Rationale, Validity and Utility." Behavioral and Brain Sciences, 21, 169-240. ftp://ftp.princeton.edu/pub/harnad/BBS/WWW/bbs.chow.html

Krueger, J. (1998). The bet on bias: A foregone conclusion? PSYCOLOQUY, 9(46). ftp://ftp.princeton.edu/pub/harnad/Psycoloquy/1998.volume.9/psyc.98.9.46.social-bias.1.krueger http://www.cogsci.soton.ac.uk/cgi/psyc/newpsy?9.46

Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103-115.

Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25, 26-30.

