Robert M. Hamm (1994) Underweighting of Base-rate Information Reflects Important Difficulties People Have With Probabilistic Inference. Psycoloquy: 5(03) Base Rate (7)

Commentary on Koehler on Base-Rate

Clinical Decision Making Program

Department of Family Medicine

University of Oklahoma Health Sciences Center

Oklahoma City OK 73190 USA

rob-hamm@uokhsc.edu

I argue that people do a poor job integrating informative base rates into their decision processes. This is shown by the results of two sorts of study. First, in probabilistic inference word problems, people's interpretations of conditional probabilities are confused. Second, in studies where subjects receive a series of pieces of information and update their probabilities after each, their probability updating is inaccurate, reflecting several error-producing processes, including overweighting of the most recent information, which is usually not the base-rate information. We should not ask how much this matters without considering that experts who make consequential decisions based on their hypotheses about the state of the world usually follow rule-like scripts, rather than explicitly revising probabilities.

1. Koehler (1993) correctly argues that people do not completely "neglect" base rate: when asked to judge the probability that an event occurred in a particular situation, given information about the base rate of the event along with fallible information pertaining to whether the event occurred, people make some use of the base rate. Other studies showing some use of base rate that were not mentioned by Koehler include Ofir (1988) and Hamm (1987).

2. Demonstrating that "base-rate neglect" has been oversold, however, does not prove that people accurately integrate informative base rates into their decision processes. In this commentary I argue that people are indeed inaccurate when asked to revise hypothesis probabilities on the basis of evidence (Sections II and III). This is true when experts make real decisions, as well as when novices make hypothetical decisions. How do people reason in realistic situations that demand probability revision? In Section IV I consider the implications for optimal decision making of the theory that people follow "mental scripts."

3. Koehler shows that "base-rate neglect" is not an empirically correct description of what people do when given probabilistic inference word problems because their responses are affected by the base rate. I would add that "neglect" is not the actual psychological process. The term implies a flaw in an attentional process, so that insufficient weight is given to base-rate information. However, the largest part of the error in these word problems is due to people's lack of understanding of the meaning of the conditional probabilities in the problem.

4. The problems offer a piece of evidence (e.g., in the Blue/Green Cab problem, a witness reported that the cab involved in the night accident was blue: evidence e = "blue"), a base rate (only 15% of the cabs in the city are blue: prior p(blue) = .15), and a measure of the fallibility or reliability of the evidence (the witness, in similar conditions, was right 80% of the time: p(e/h) = p("blue"/blue) = p("green"/green) = .80). Then the question is asked ("What is the probability that the cab involved in the accident was blue, p(h/e) or p(blue/'blue')?"). A subject who does not distinguish the conditional probabilities p(e/h) [the reliability or fallibility of the evidence] and p(h/e) [the desired answer] may offer p(e/h) as the response. Indeed, in Bar-Hillel's (1980) histogram of responses, the value for p(e/h), .80, was the most frequent answer. This was observed again with several word problems by Hamm (1987, 1989).
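The normative answer contrasts sharply with the modal response of .80. A minimal sketch of the Bayesian calculation for the cab problem's numbers (Python is used purely for illustration; it is not part of the original studies):

```python
# Posterior probability p(h|e) for the Blue/Green Cab problem via Bayes' Theorem.
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """p(h|e) = p(e|h)p(h) / [p(e|h)p(h) + p(e|not-h)p(not-h)]."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior_h))

# Base rate p(blue) = .15; witness reliability p("blue"|blue) = .80,
# hence p("blue"|green) = 1 - .80 = .20.
p = posterior(0.15, 0.80, 0.20)
print(round(p, 3))  # 0.414 -- well below the modal answer of .80
```

The correct answer, about .41, lies between the base rate and the witness's reliability; responding .80 amounts to reporting p(e/h) in place of p(h/e).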

5. People's difficulty interpreting these conditional probabilities has been offered as an explanation for their errors on probabilistic inference word problems by Dawes (1986) and Dawes, Mirels, Gold, and Donahue (1993). Hamm and Miller (1988) showed, with several word problems, including the Blue/Green Cab problem, that the pattern of responses differed little whether the text of the problem offered p(e/h) or p(h/e) as the information regarding the fallibility of the evidence. Analysis of subjects' verbal protocols showed little association between the specific conditional probability concept written in the word problem and the concept used in their thinking (Hamm and Miller, 1988).

6. Eddy (1982) noted that this same confusion occurs in the professional writings of medical doctors. A more recent example in the medical research literature is an error by Bernstein, Rudolph, Pinto, Viner, and Zuckerman (1990), who reversed the sensitivity p(Test/Disease) and the positive predictive value p(Disease/Test) in interpreting their own data table, putting misleading values into the literature for subsequent meta-analyses. Penney (1992) too has demonstrated the inability of students in statistics classes to interpret the conditional probabilities in the 2 by 2 table relating evidence and hypothesis.

7. A demonstration that people confuse these conditional probabilities does not fully explain how people think about probabilistic inference. It is not simply that they apply Bayes' Theorem correctly with the sole exception that they mistake the one conditional probability for the other (Pollatsek, Well, Konold, Hardiman, and Cobb, 1987; Hamm, 1987; Hamm, 1993).

8. Further evidence on what people do when given fallible evidence pertinent to a hypothesis comes from those studies on probability updating in which a sequence of information is given and the subject revises p(h) after each piece. Early work in this paradigm, reviewed by Edwards (1968), most often showed conservatism (overweighting of base rate) as Ayton (1993) noted.

9. For the base-rate neglect question, the important finding from these studies (see also Hogarth and Einhorn, 1992, and Robinson and Hastie, 1985) is that the order in which people get the information makes a difference. Although normatively the order in which information is received should make no difference, subjects usually put greater weight on the most recently received information (Adelman, Tolcott, and Bresnick, 1993, with military intelligence experts dealing with realistic military intelligence problems; Tubbs, Gaeth, Levin, and Van Osdol, 1993, with college students on everyday problems such as troubleshooting a stereo; Chapman, Bergus, Gjerde, and Elstein, 1993, with medical doctors on a realistic diagnosis problem). In more ambiguous situations the first impression had a lasting effect (Tolcott, Marvin, and Lehner, 1989).

10. Because every probability adjustment involves balancing prior probability or base rate with the implications of the new evidence, any inappropriate use of the most recent information implies, indirectly, that the base rate has been inappropriately used too. Hamm (1987) included the base-rate information in the sequence and found direct evidence that it had more influence upon subjects' final probability estimates when it was presented last. In sum, the results from this second paradigm show that people have a more fundamental problem with probabilistic inference than mere neglect of base rate or confusion of conditional probabilities.
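The normative point behind these order-effect findings can be made concrete: under Bayes' Theorem the posterior odds are the prior odds multiplied by the likelihood ratios of the evidence, so sequence cannot matter. A sketch, with entirely hypothetical likelihood ratios standing in for a sequence of cues:

```python
# Normative Bayesian updating is order-invariant: posterior odds equal prior
# odds times the product of the evidence likelihood ratios, in any sequence.
from functools import reduce

def update_odds(prior_odds, likelihood_ratios):
    # Apply each likelihood ratio to the running odds, one cue at a time.
    return reduce(lambda odds, lr: odds * lr, likelihood_ratios, prior_odds)

prior_odds = 0.15 / 0.85   # a base rate of .15 expressed as odds
lrs = [4.0, 0.5, 2.0]      # hypothetical likelihood ratios for three cues

forward = update_odds(prior_odds, lrs)
backward = update_odds(prior_odds, list(reversed(lrs)))
print(forward == backward)  # True: order cannot matter normatively
```

Any systematic recency (or primacy) effect in subjects' estimates is therefore a departure from the normative model, whichever piece of information, base rate included, happens to arrive last.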

11. Does it matter that people cannot accurately revise numerical probabilities (Christensen-Szalanski, 1986)? The deeper study of what people actually do, as called for by Koehler, can provide perspective. What do doctors do, for example, when ideally they should be forming hypotheses and revising hypothesis probabilities as they gather evidence?

12. It is not that they do a numerical integration more complex than Bayes' Theorem to revise probabilities (Gregson, 1993), as Hamm's (1987) explorations show. Doctors thinking aloud about cases don't even speak explicitly of probabilities (Kuipers, Moskowitz, and Kassirer, 1988), though when they are induced to do so it improves their decisions (Pozen, D'Agostino, Selker, Sytkowski, and Hood, 1984; Carter, Butler, Rogers, and Holloway, 1993).

13. Nor do doctors rely exclusively on learning probabilities from experience, like rats learning the contingencies on a lever (Spellman, 1993). While some of their knowledge is based on this kind of experience (Christensen-Szalanski and Beach, 1982; Christensen-Szalanski and Bushyhead, 1981), doctors have to know what to do with both the common diagnoses (8 out of 10) and the rare ones (1 in 10,000). In some situations, where people experience an event repeatedly, they can implicitly learn its base rate; in other situations, where people learn about an event only abstractly, they may still be able to take account of a base rate -- but if they cannot, the consequences may be important.

14. How, then, do doctors usually handle diagnostic problems? Experts generally organize their extensive knowledge into mental scripts (Schmidt, Norman, and Boshuizen, 1990), complex rules that function with the speed of recognition to provide responses for familiar and unfamiliar situations. Explicit calculation of Bayesian probabilities is not a strength of this type of rule (cf. Hamm, 1993). Instead, experts' accuracy may be a function of the recognition processes, which can bring ideas to mind optimally (Anderson and Milson, 1989). Or accuracy may be due to well-tuned judgment processes governing response choice (Chapter 8 of Abernathy and Hamm, 1994).

15. If doctors' scripts are used accurately, producing results similar to those that wise use of Bayes' theorem would produce, this is due not only to the feedback of experience but also to reflection and to others' criticism (Chapter 11 of Abernathy and Hamm, 1994). Any form of argument can be applied toward justifying a change in a script, including arguments based on probabilistic analysis.

16. For example, when the screening tests for HIV first came out, Meyer and Pauker (1987) warned against ignoring the base rate, i.e., against assuming that someone with no risk factors has AIDS if their screen is positive for AIDS. Guided by such explicit discussion of the probabilities, and by individual cases of people devastated by false positive HIV screens, doctors' shared scripts were adjusted; now they do not recommend that patients be screened unless there are risk factors. The "1993 script" produces behavior that is, for the most part, consistent with a Bayesian analysis. Individual doctors using the script need neither think about probabilities nor understand the Bayesian principles. They just think of the rules, or of cases in which the script is implicit (Riesbeck and Schank, 1989). Note, of course, that this scenario depends on there being someone who understands the probabilistic principles and can shape the script that everyone else will use.
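The probabilistic rationale behind the revised script can be sketched as a positive-predictive-value calculation. The test characteristics below are hypothetical, chosen only to illustrate the low-prevalence logic, not the actual performance of the 1987 HIV screens:

```python
# Why base rate dominates screening decisions: the positive predictive value
# (PPV) of a positive result collapses when prevalence is very low.
# All numbers here are hypothetical, for illustration only.
def ppv(prevalence, sensitivity, specificity):
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# A quite accurate test, applied to a no-risk-factor population
# (prevalence 1 in 10,000), yields mostly false positives:
print(round(ppv(0.0001, 0.99, 0.995), 3))  # 0.019

# The same test in a high-risk population (hypothetical prevalence .30):
print(round(ppv(0.3, 0.99, 0.995), 3))     # 0.988
```

Under these assumptions, a positive screen in the no-risk-factor group indicates infection only about 2% of the time, which is the Bayesian core of the recommendation to screen only when risk factors are present.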

Abernathy, C.M., and Hamm, R.M. (in press, 1994). Surgical Intuition. Philadelphia, PA: Hanley and Belfus.

Adelman, L., Tolcott, M.A., and Bresnick, T.A. (1993). Examining the Effect of Information Order on Expert Judgment. Organizational Behavior and Human Decision Processes, 56, 348-369.

Anderson, J.R., and Milson, R. (1989). Human Memory: an Adaptive Perspective. Psychological Review, 96, 703-719.

Ayton, P. (1993). Base Rate Neglect: an Inside View of Judgment? Commentary on Koehler on Base-rate. PSYCOLOQUY 4(63) base-rate.5.ayton.

Bar-Hillel, M. (1980). The Base-rate Fallacy in Probability Judgments. Acta Psychologica, 44, 211-233.

Bernstein, L.H., Rudolph, R.A., Pinto, M.M., Viner, N., and Zuckerman, H. (1990). Medically Significant Concentrations of Prostate-specific Antigen in Serum Assessed. Clinical Chemistry, 36, 515-518.

Carter, B.L., Butler, C.D., Rogers, J.C., and Holloway, R.L. (1993). Evaluation of Physician Decision Making With the Use of Prior Probabilities and a Decision-analysis Model. Archives of Family Medicine, 2, 529-534.

Chapman, G.B., Bergus, G.R., Gjerde, C., and Elstein, A.S. (1993). Sources of Error in Reasoning about a Clinical Case: Clinicians as Intuitive Statisticians (Meeting Abstract). Medical Decision Making, 13, 382.

Christensen-Szalanski, J.J.J. (1986). Improving the Practical Utility of Judgment Research. In B. Brehmer, H. Jungermann, P. Lourens, and G. Sevon (Eds.), New Directions in Research on Decision Making (pp. 383-410). North Holland: Elsevier Science Publishers B.V.

Christensen-Szalanski, J.J.J., and Beach, L.R. (1982). Experience and the Base-rate Fallacy. Organizational Behavior and Human Performance, 29, 270-278.

Christensen-Szalanski, J.J.J., and Bushyhead, J.B. (1981). Physicians' Use of Probabilistic Information in a Real Clinical Setting. Journal of Experimental Psychology: Human Perception and Performance, 7, 928-935.

Dawes, R.M. (1986). Representative Thinking in Clinical Judgment. Clinical Psychology Review, 6, 425-441.

Dawes, R.M., Mirels, H.L., Gold, E., and Donahue, E. (1993). Equating Inverse Probabilities in Implicit Personality Judgments. Psychological Science, 4, 396-400.

Eddy, D.M. (1982). Probabilistic Reasoning in Clinical Medicine: Problems and Opportunities. In D. Kahneman, P. Slovic, and A. Tversky (Eds.), Judgment Under Uncertainty: Heuristics and Biases (pp. 249-267). Cambridge: Cambridge University Press.

Edwards, W. (1968). Conservatism in Human Information Processing. In B. Kleinmuntz (Ed.), Formal Representation of Human Judgment (pp. 17-52). New York: Wiley.

Gregson, R.A.M. (1993). Which Bayesian Theorem Could Be Compared With Real Behavior? Commentary on Koehler on Base-rate. PSYCOLOQUY 4(50) base-rate.2.gregson.

Hamm, R.M. (1987). Diagnostic Inference: People's Use of Information in Incomplete Bayesian Word Problems. (Publication No. 87-11). Boulder, CO: Institute of Cognitive Science, University of Colorado.

Hamm, R.M. (1989). People Misinterpret Conditional Probabilities: Final Report of Project Using Protocol Analysis and Process Tracing Techniques to Investigate Probabilistic Inference (Publication No. 89-4). Boulder, CO: Institute of Cognitive Science, University of Colorado.

Hamm, R.M. (1993). Explanations for Common Responses to the Blue/Green Cab Probabilistic Inference Word Problem. Psychological Reports, 72, 219-242.

Hamm, R.M., and Miller, M.A. (1988). Interpretation of Conditional Probabilities in Probabilistic Inference Word Problems. (Publication No. 88-15). Boulder, CO: Institute of Cognitive Science, University of Colorado.

Hogarth, R.M., and Einhorn, H.J. (1992). Order Effects in Belief Updating: The Belief-adjustment Model. Cognitive Psychology, 24, 1-55.

Koehler, J.J. (1993). The Base Rate Fallacy Myth. PSYCOLOQUY 4(49) base-rate.1.koehler.

Kuipers, B., Moskowitz, A.J., and Kassirer, J.P. (1988). Critical Decisions Under Uncertainty: Representation and Structure. Cognitive Science, 12, 177-210.

Meyer, K.B., and Pauker, S.G. (1987). Screening for HIV: Can We Afford the False Positive Rate? New England Journal of Medicine, 317, 238-241.

Ofir, C. (1988). Pseudodiagnosticity in Judgment Under Uncertainty. Organizational Behavior and Human Decision Processes, 42, 343-363.

Penney, C.G. (1992). Why Can't My Students Understand Conditional Probability? Paper presented at annual meetings of Psychonomics Society, St. Louis.

Pollatsek, A., Well, A.D., Konold, C., Hardiman, P., and Cobb, G. (1987). Understanding Conditional Probabilities. Organizational Behavior and Human Decision Processes, 40, 255-269.

Pozen, M.W., D'Agostino, R.B., Selker, H.P., Sytkowski, P.A., and Hood, W.B., Jr. (1984). A Predictive Instrument to Improve Coronary-care-unit Admission Practices in Acute Ischemic Heart Disease: A Prospective Multicenter Clinical Trial. New England Journal of Medicine, 310, 1273-1278.

Riesbeck, C.K., and Schank, R.C. (1989). Inside Case-based Reasoning. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Robinson, L.B., and Hastie, R. (1985). Revision of Beliefs When a Hypothesis Is Eliminated From Consideration. Journal of Experimental Psychology: Human Perception and Performance, 11, 443-456.

Schmidt, H.G., Norman, G.R., and Boshuizen, H.P.A. (1990). A Cognitive Perspective on Medical Expertise: Theory and Implications. Academic Medicine, 65, 611-621.

Spellman, B.A. (1993). Implicit Learning of Base Rates: Commentary on Koehler on Base-rate. PSYCOLOQUY 4(61) base-rate.4.spellman.

Tolcott, M.A., Marvin, F.F., and Lehner, P.E. (1989). Expert Decision Making in Evolving Situations. IEEE Transactions on Systems, Man, and Cybernetics, 19, 606-615.

Tubbs, R.M., Gaeth, G.J., Levin, I.P., and Van Osdol, L.A. (1993). Order Effects in Belief Updating with Consistent and Inconsistent Evidence. Journal of Behavioral Decision Making, 6, 257-269.