Tom Stafford (2000) Stroop Interference: Methodological Problems and Contrary Data. Psycoloquy: 11(110) Stroop Differences (2)

PSYCOLOQUY (ISSN 1055-0143) is sponsored by the American Psychological Association (APA).

Commentary on Koch on Stroop-Differences

Tom Stafford
Adaptive Behaviour Research Group
Department of Psychology
University of Sheffield
S10 2TP


There are worries about the intra- and inter-reliability of the two studies presented by Koch et al. (1999), which use different basic methodologies. It is also argued that contrasting different forms of cluster analysis would make the functions of this technique more transparent. One conclusion of the target article is that higher short-term memory (STM) abilities lead to higher Stroop interference. This seems to contradict other recent results. However, if this conclusion holds then it is a new one, not previously investigated.


cluster analysis, individual differences, short-term memory, Stroop interference, visual reasoning


1. Methodological notes on Koch et al.'s (1999) two studies are discussed, and then the findings are put in the context of other results in the literature. It seems likely that the findings reported from the second study are due to some third factor, perhaps related to the age difference between the clusters, rather than revealing a genuine relationship between Stroop interference and individual differences in cognitive abilities.


2. Strangely, reaction time is defined as the time from stimulus offset until keypress. This seems counter-intuitive, since the response process begins as soon as the stimulus appears. An interaction between stimulus duration and reaction time is reported, namely that reaction time decreases as stimulus duration increases. However, this interaction disappears if reaction times are calculated from the onset of the stimulus. This can be seen from FIGURE 1.

    FIGURE 1. Data for Koch et al. (1999) expt. 1 as originally
    presented.  Colour congruent (CC), incongruent (CI) and
    neutral/control (N) data shown.

Reaction time remains constant across stimulus durations until durations of 500 ms or greater, at which point reaction time (for unclear reasons) increases rapidly with stimulus duration.
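The recalculation described in paragraph 2 amounts to adding the stimulus duration back onto each offset-timed reaction time. The following is a minimal Python sketch of that conversion. The function name and all the numbers are hypothetical and chosen only for illustration; they are not Koch et al.'s data.

```python
# Hypothetical illustration: the values below are invented,
# not taken from Koch et al. (1999).

def onset_rt(offset_rt_ms, duration_ms):
    """Convert an RT timed from stimulus offset into one timed from onset."""
    return offset_rt_ms + duration_ms

# Invented offset-timed RTs that fall as stimulus duration rises.
durations_ms = [100, 200, 300, 400]
offset_rts_ms = [700, 600, 500, 400]

onset_rts_ms = [onset_rt(rt, d) for rt, d in zip(offset_rts_ms, durations_ms)]
print(onset_rts_ms)  # [800, 800, 800, 800]
```

In this invented case the apparent decline in reaction time with duration is entirely an artefact of where the clock starts: once timed from onset, the values are flat.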

3. Inspection of the all-subjects means reveals that Stroop interference is not present for all durations of stimulus presentation (see FIGURE 2).

    FIGURE 2. Data for Koch et al. (1999) expt. 1. Stroop reaction
    times for colour naming in the congruent (CC), conflict (CI) and
    control (N) conditions.

Interference is the incongruent condition reaction time minus the neutral/control condition reaction time. In the 100 ms duration condition, averaging over all subjects, there is -200 ms of interference: a reverse Stroop effect. This is indicative of the amount of error in the procedure. Distinguishing a cluster that responds inconsistently may be unwarranted if the responses of the subjects are on average inconsistent.

4. Inspecting the mean reaction times for the two groups, it can be seen that for neither group is Stroop interference present for all stimulus durations (see FIGURE 3).

    FIGURE 3. Mean interference (incongruent RT - neutral RT) for
    "consistent" and "inconsistent" subjects.

Both the 'consistent' group and the 'inconsistent' group showed a reversal of the normal Stroop effect in four of the ten stimulus duration conditions, two of which were common to both groups. Particularly notable is the mean reaction time of the inconsistent group in the 1000 ms stimulus duration condition. This group had a mean reaction time of 365 ms (-122 ms interference) to the incongruent stimuli, which is fast for any Stroop stimuli, but especially for incongruent ones. This is most likely due in part to the methodological peculiarity of measuring reaction time from stimulus offset, but it still remains to be explained why this condition mean is lower than both the congruent and the neutral condition means. At the other end of the scale, the reaction time of the consistent subjects in the incongruent condition, 1210 ms, is notably slow. Why does it take these subjects over 1 second to respond, when in other conditions they are responding in half that time? The average amount of interference across all conditions and subjects is 25 ms. This is disappointing for the normally reliable Stroop effect (MacLeod, 1991), where we might expect at least 100 ms of interference. These observations raise further worries about the basic reliability of the method employed and about the validity of distinguishing 'consistent' and 'inconsistent' groups when the reaction times across all subjects and conditions are inconsistent.
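The arithmetic behind the 1000 ms figures can be checked with a short Python sketch. It assumes that interference is the incongruent mean minus the neutral mean, which is the reading under which a negative value is a reverse Stroop effect. The function name is hypothetical, and the implied neutral mean is derived here rather than quoted from the target article.

```python
def interference_ms(incongruent_rt_ms, neutral_rt_ms):
    """Stroop interference: incongruent RT minus neutral RT.

    A negative value means the incongruent condition was faster,
    i.e. a reverse Stroop effect.
    """
    return incongruent_rt_ms - neutral_rt_ms

# Figures quoted above for the 'inconsistent' group at 1000 ms duration.
incongruent_rt_ms = 365
reported_interference_ms = -122

# Neutral mean implied by those two figures (derived, not reported).
implied_neutral_rt_ms = incongruent_rt_ms - reported_interference_ms
print(implied_neutral_rt_ms)  # 487
print(interference_ms(incongruent_rt_ms, implied_neutral_rt_ms))  # -122
```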

5. The presence of a significant between-subjects effect was used as the justification for attempting cluster analysis. However, eliminating the inconsistent cluster from the data set did not remove the significant between-subjects effect. It may be that individual differences exist on the Stroop effect and that the cluster analysis picked out those individuals with the most erratic responses, rather than the erratic responses being due to some inherent difference between the groups. In this case the use of squared Euclidean distance causes outliers to be particularly influential on the division of cases into clusters. It should be noted that the second cluster (4 cases) has little intra-cluster similarity and is defined more by dissimilarity to the first cluster. For example, although a nearest neighbour amalgamation rule was used, case 1 only clusters with the other cases at a distance nearly three times the distance required to cluster the remaining cases. The use of cluster analysis in this case would be more informative if different varieties of cluster analysis were compared. For example, k-means clustering, which divides the cases into a pre-determined number of clusters, could be used to verify the clusters discovered using hierarchical/tree clustering.
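The kind of cross-check suggested in paragraph 5 can be sketched in plain Python. The code below is an illustration only. The data are invented one-dimensional scores, not Koch et al.'s. The hierarchical routine is a bare nearest-neighbour (single-linkage) merge on squared Euclidean distance. The k-means routine is basic Lloyd iteration, seeded arbitrarily with the first k cases. All function names are hypothetical.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def single_linkage(points, k):
    """Merge the two closest clusters (nearest-neighbour rule) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair of clusters whose closest members are nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                sq_dist(points[i], points[j])
                for i in clusters[pair[0]]
                for j in clusters[pair[1]]
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

def k_means(points, k, iterations=20):
    """Basic Lloyd iteration, seeded with the first k cases (arbitrary choice)."""
    centres = [list(p) for p in points[:k]]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            groups[nearest].append(i)
        for c, members in enumerate(groups):
            if members:  # leave an empty cluster's centre where it was
                centres[c] = [
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(len(points[0]))
                ]
    return sorted(sorted(g) for g in groups)

# Invented scores: two tight groups plus one extreme case (index 6).
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (30.0,)]

hierarchical = single_linkage(points, 2)
partition = k_means(points, 2)
print(hierarchical)  # [[0, 1, 2, 3, 4, 5], [6]]
print(partition)     # [[0, 1, 2], [3, 4, 5, 6]]
print(hierarchical == partition)  # False
```

On these invented scores the nearest-neighbour rule splits off the single extreme case as its own cluster, while k-means separates the low scores from the high ones. A disagreement of this kind is what would flag a two-cluster solution as driven by outliers.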


6. Again, there are worries about reliability. The study contained 120 subjects, tested by 60 school psychologists. The sample had an age range of 2 years to 22 years, not an ideal protocol for finding cognitive correlates of Stroop interference. The page version of the Stroop task was employed (Stroop, 1935). Unconventionally, this study used an error measure as its index of Stroop interference. Subjects were restricted to a maximum of 45 seconds on each page. The number of correct items named on a word-colour page was subtracted from the number of correct items named on a baseline colour-only page, and this difference was taken as the measure of Stroop interference.

7. Cluster analysis on interference scores using Ward's method produced two groups, of sizes 34 and 27. The remaining 59 subjects, nearly half the sample, clustered into neither group. Group one had low interference scores and a median age of 9 years. Group two had higher interference scores and a median age of 16 years. Mean ages for the two groups are not provided.

8. It is possible that interference was greater for the older group (group two) simply because they read more words. If both groups were getting the same proportion of colour words correct, but the second group read more colour items correctly in the allotted time, then that group would produce a greater interference score. This is feasible given that the mean number of correct items on the colour page was 54, and it would be expected that younger subjects (i.e. the members of group one) would score lower than this, and older subjects (i.e. group two) would score above this. An additional possibility is that the second group, because of their older age, imagined themselves patronised by the simple task and so rushed their performance, thus sacrificing accuracy for speed. This is impossible to verify, however, because no measure of speed was recorded. In short, it is possible that group two read more colour-word items correctly, and/or got a higher proportion correct than group one, but still, due to the way interference is defined, produced a higher mean interference score.
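The scaling problem described in paragraph 8 can be made concrete with a small worked example in Python. The item counts are invented for illustration, and the function and group names are hypothetical. Both groups are given the same proportional slow-down on the word-colour page, so they differ only in how many items they get through.

```python
def interference_score(colour_correct, word_colour_correct):
    """Difference score: baseline colour page minus word-colour page."""
    return colour_correct - word_colour_correct

# Invented counts: each group names half as many items on the
# word-colour page as on the colour-only page.
groups = {
    "younger": {"colour": 40, "word_colour": 20},
    "older": {"colour": 68, "word_colour": 34},
}

results = {}
for name, counts in groups.items():
    raw = interference_score(counts["colour"], counts["word_colour"])
    relative = raw / counts["colour"]  # interference as a fraction of baseline
    results[name] = (raw, relative)
    print(name, raw, relative)
# younger 20 0.5
# older 34 0.5
```

The raw difference score is larger for the faster group (34 against 20) even though the proportional cost is identical. A ratio measure of this kind would be one way to separate the two.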

9. The two groups also differed on perceptual reasoning and short-term memory tasks. This is not surprising given the discrepancy in age between the two groups. It is not reported whether the authors of the paper made a correction for the use of multiple statistical tests, which is a valid concern given the number of variables involved in the analysis.
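As an illustration of the multiple-testing concern, the sketch below applies a Bonferroni correction, which is only one of several possible corrections and is chosen here for simplicity. The p-values and the number of tests are invented, since the target article's values are not reproduced here. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def survives_correction(p_values, alpha=0.05):
    """Flag which p-values remain significant after correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Invented p-values for five hypothetical group comparisons.
p_values = [0.004, 0.020, 0.030, 0.045, 0.300]

print([p <= 0.05 for p in p_values])  # [True, True, True, True, False]
print(survives_correction(p_values))  # [True, False, False, False, False]
```

With five tests the per-test threshold drops from 0.05 to 0.01, so in this invented case only one of the four nominally significant differences survives.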


10. Only a correlation between STM and Stroop interference measured by reaction time, demonstrated within a homogeneous group, is really sufficient to establish the relationship between cognitive ability and Stroop performance. The current studies have too many methodological question marks hanging over them to warrant firm conclusions, especially in light of contradictory findings in other experiments.

11. Cluster analysis is certainly a powerful and informative tool, which Koch and co-authors used effectively in their target article. However, there are additional points worth making about the use of cluster analysis. The full details of its use should be made explicit and discussed. The basic type of cluster analysis used (e.g. hierarchical versus k-means) should be made clear (which indeed the authors of the target article do), and why that version is being used should be discussed. Similarly, the distance measure and the amalgamation rule used should be discussed and the effects of choosing alternatives made explicit. For example, using the squared Euclidean distance exaggerates the effect of outliers (because their distance is squared). If there is a concern over the effect of outliers, then a comparison between cluster analyses using squared and simple Euclidean distance measures could easily be presented.
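The effect of squaring on outliers can be shown with a brief Python sketch. The coordinates are invented so that the outlying case lies exactly three times as far from a reference point as a typical case does. The function names are hypothetical.

```python
import math

def squared_euclidean(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Simple Euclidean distance between two cases."""
    return math.sqrt(squared_euclidean(a, b))

# Invented coordinates: the outlier is three times as far from the
# reference point as the typical case is.
reference = (0.0, 0.0)
typical = (3.0, 4.0)    # simple distance 5
outlier = (9.0, 12.0)   # simple distance 15

ratio_simple = euclidean(reference, outlier) / euclidean(reference, typical)
ratio_squared = (
    squared_euclidean(reference, outlier) / squared_euclidean(reference, typical)
)
print(ratio_simple)   # 3.0
print(ratio_squared)  # 9.0
```

A case that is three times as distant on the simple measure counts as nine times as distant on the squared measure, which is why the choice of distance measure can change which cases are treated as a separate cluster.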

12. The finding that higher STM is related to increased Stroop interference (if that is really what the dependent variable in Study Two measured) is interesting when compared to a recent functional imaging paper (MacDonald et al., 2000; but see Rowe et al., 2000). This study found that increased activation of the dorsolateral prefrontal cortex (DLPFC), an area intimately implicated in STM, was associated with reduced interference, not increased. It was posited that DLPFC is involved in maintaining goals, such as how to respond to an ambiguous Stroop stimulus, and that therefore greater activation of DLPFC is indicative of greater control. Thus if we take DLPFC activation as a measure of STM, high DLPFC activation, according to MacDonald et al., correlates with low interference, not high interference. This contradicts the conclusions of Koch et al. However, if we take high DLPFC activation to be indicative of low efficiency in that region, we might expect high activation to be associated with low STM ability, not high. Under this efficiency hypothesis, it could be that high STM ability is correlated with low DLPFC activation and thus high Stroop interference in the MacDonald et al. study. This illustrates an important caveat about imaging studies: higher activation may not always be indicative of higher functioning. This low efficiency hypothesis is not contradicted by the finding that PFC lesions impair Stroop performance (see Engle, Kane, & Tuholski, 1999), since it could be argued that a lesion removes the function beyond the point where efficiency or inefficiency matters.

13. It is unclear why higher STM ability might cause greater Stroop interference. STM has been equated with the ability to represent goals, with the clear implication that a stronger representation (stronger STM) leads to less interference (Cohen & Servan-Schreiber, 1992; MacDonald et al., 2000; O'Reilly, Braver, & Cohen, 1999). The result that Stroop interference correlates positively with STM ability also contrasts with other (admittedly sparse) findings in the literature (Barch & Carter, 1998; Chen, 2000). Barch and Carter (1998) found that error interference in schizophrenics was negatively correlated with Speaking Span performance, a measure of verbal working memory. Another recent publication reports a finding of increased Stroop interference under dual-task memory loading conditions (Zhe Chen, personal communication, 2000). Presumably memory loading decreases working memory capacity and thus should decrease Stroop interference if Koch et al.'s finding generalised. The work of Engle and colleagues on individual differences in short-term memory and inhibition (Conway et al., 1999; Rosen & Engle, 1998) supports the view that superior memorial capacity is generally related to greater ability to suppress distracting information. Given these findings it is unclear, from a theoretical perspective, why higher STM should lead to more Stroop interference in Koch et al.'s task.


Barch, D. M., & Carter, C. S. (1998). Selective attention in schizophrenia: relationship to verbal working memory. Schizophrenia Research, 33(1-2), 53-61.

Chen, Z. (2000). The effects of attention and memory load on Stroop interference effect. Investigative Ophthalmology & Visual Science, 41(4), 216.

Cohen, J. D., & Servan-Schreiber, D. (1992). Context, cortex, and dopamine - a connectionist approach to behavior and biology in schizophrenia. Psychological Review, 99(1), 45-77.

Conway, A. R. A., Tuholski, S. W., Shisler, R. J., & Engle, R. W. (1999). The effect of memory load on negative priming: An individual differences investigation. Memory & Cognition, 27(6), 1042-1050.

Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual Differences in Working Memory Capacity and What They Tell Us About Controlled Attention, General Fluid Intelligence, and Functions of the Prefrontal Cortex. In A. Miyake & P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge University Press.

Koch, C., Gobell, J., & Roid, G. H. (1999). Exploring individual differences in Stroop processing with cluster analysis. PSYCOLOQUY 10(025) psyc.99.10.025.stroop-differences.1.koch

MacDonald, A. W., Cohen, J. D., Stenger, V. A., & Carter, C. S. (2000). Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science, 288(5472), 1835-1838.

MacLeod, C. M. (1991). Half a century of research on the Stroop effect - an integrative review. Psychological Bulletin, 109(2), 163-203.

O'Reilly, R., Braver, T. S., & Cohen, J. D. (1999). A Biologically Based Computational Model of Working Memory. In A. Miyake & P. Shah (Eds.), Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge University Press.

Rosen, V. M., & Engle, R. W. (1998). Working memory capacity and suppression. Journal of Memory and Language, 39(3), 418-436.

Rowe, J. B., Toni, I., Josephs, O., Frackowiak, R. S. J., & Passingham, R. E. (2000). The prefrontal cortex: Response selection or maintenance within working memory? Science, 288(5471), 1656-1660.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
