
Responsible behavioral science generalizations and applications require much more than non-WEIRD samples

Published online by Cambridge University Press:  15 June 2010

Vladimir J. Konečni
Affiliation:
Department of Psychology, University of California–San Diego, La Jolla, CA 92093-0109. vkonecni@ucsd.edu; http://psychology.ucsd.edu/people/faculty/vkonecni.php

Abstract

There are many methodological considerations – some intricately associated with the use of WEIRD samples – that adversely affect external validity as much as, or even more than, unrepresentative sampling does. Among suspect applications, especially worrisome is the incorporation of WEIRD-based findings regarding moral reasoning and retribution into normative expectations, such as might be held by international criminal tribunals in “cognitively distant” war-torn areas.

Type: Open Peer Commentary

Copyright © Cambridge University Press 2010

The article by Henrich et al. is a valuable contribution that goes beyond prior critiques of the deplorable lack of representativeness of a large proportion of participant samples that have been used in the behavioral sciences. The cogency of argumentation, and both the breadth and the detail of the empirical documentation that is provided, are impressive. Therefore, my commentary will not challenge the main thesis proposed by Henrich et al. Instead, its purpose is to supplement and increase the scope of their article's argument.

An important, although perhaps self-evident, observation is that the authors' thesis concerning WEIRD samples would be even more useful (perhaps considerably more so) had they at least mentioned and briefly outlined some other factors – often closely, and sometimes unavoidably, associated with the research designs using WEIRD samples – which may even more detrimentally affect the generalizability (external validity) of the results than does the lack of WEIRD samples' representativeness.

An abbreviated list of such factors will have to suffice here: unrepresentative sets of independent variables; artificiality of research settings; a limited number of tasks (often a single task) through which the independent variables are presented; and relying on a single data-collection method (such as questionnaires, surveys, or rating scales) – and therefore obtaining a single dependent measure (or an uninformatively correlated set of measures) that is often qualitatively different from the one to which generalization is sought in the “real world.” The mentioned factors are highly relevant for a more complete understanding of the issues in some of the areas discussed in the target article, especially fairness and cooperation, punishment of “excessive” cooperators, personal choice, “fundamental attribution error,” and moral reasoning.

Moreover, one must worry about the (statistical) interaction of the effect of WEIRD samples' uniqueness (extremity, non-modal character) with the effects of these additional factors (e.g., the frequently highly artificial tasks), such that the overall result (especially when interactions are of a multiplicative form) would be even more misleading with regard to some real-world criterion and domain of desired application than is the case on the basis of WEIRD samples' “differentness” alone. On the other hand, if, for example, a greater variety of tasks were used, the presently observed differences between WEIRD and various non-WEIRD samples might in some cases disappear. One simply cannot predict what would happen without doing the research.

The above family of methodological observations has its root in the pioneering work of Campbell and colleagues (e.g., Campbell & Stanley 1963; Webb et al. 1966). Among the subsequent empirical demonstrations of some of the underlying principles were the studies by Ebbesen and Konečni: for example, of decisions under risk (in automobile driving; e.g., Ebbesen et al. 1977; Konečni et al. 1976) and of key decisions by judges, prosecutors, and other participants in the criminal justice system (Konečni & Ebbesen 1982b). An important aspect of this work has been the mustering of the theoretical and empirical support for the idea of validated simulations in behavioral science (Konečni & Ebbesen 1992).

Among the judicial decisions studied in this research program were those of the setting of bail and, especially, the sentencing of felons (e.g., Ebbesen & Konečni 1975; Konečni & Ebbesen 1982a). This work utilized both WEIRD and non-WEIRD samples (as in the fourth "telescoping contrast" in Henrich et al.; see sect. 6) and supports the target article's skepticism. Moreover, a more general, but logical, extension is to question the applicability of WEIRD-based findings regarding aggressiveness, retribution, fairness and equity, and moral reasoning in general (cf. sect. 4.4) to international law. Here the most troubling possibility is the deliberate or unconscious incorporation of WEIRD-based findings into the normative expectations held by international bodies in "cognitively distant" war-torn areas – such as in Rwanda by the United Nations Assistance Mission for Rwanda and the International Criminal Tribunal for Rwanda. What must be very carefully taken into account are not only the enormous complexities of ancient tribal relations, but also those stemming from massive religious conversions by some of the warring parties under an external oppressor (as in Bosnia and Herzegovina, another internationally adjudicated conflict).

In sum, there is far more to external validity than the unrepresentativeness of samples. The only truly solid reason to trust an experimental simulation (especially one that potentially involves enormous human costs) is to have had it validated by means of careful successive approximations to the real world, each step moving closer to the actual real-world phenomenon – not just with different participant samples, but also guided by a multi-method × multi-dependent-measure matrix (Konečni & Ebbesen 1992).

Some additional observations are in order. Just as Nature Genetics requires all empirical papers to include data from two independent samples (target article, sect. 6.2, para. 3), the Journal of Personality and Social Psychology, for example, might begin to require not just the use of at least two different methods in the laboratory, but also both laboratory and field research – before researchers move away from psychology freshmen. If this were required, it seems likely that some "cute," supposedly counterintuitive, task-specific effects (including in the area of heuristics and biases) would not be replicated even with different WEIRD samples. I am not as favorably disposed as Henrich et al. apparently are to Mook's (1983) idea that the use of WEIRD samples is justified "when seeking existential proofs" (sect. 7.1.6, para. 1); nor to the authors' admittedly clever idea of setting up research facilities in bus terminals and airports to capture non-university participants (sect. 7.3, para. 6) – if the same old suspect methods, such as "reactive" questionnaires and games with trivial pay-offs, would continue to be used.

Henrich et al. believe that behavioral scientists' tendency to claim “universality” for data obtained with WEIRD participants may in part be due to so many researchers themselves being WEIRD (sect. 7.1.1, para. 8). This fact may also be partly responsible for researchers' relative reluctance to worry adequately about external validity and about the effects of complex higher-order interactions among type of participants, methods, and settings. A sustained interest in such interactions may require a contextual (“field-dependent”) worldview and a holistic reasoning style that is (according to Henrich et al.) less utilized by WEIRD people, who favor analytical reasoning.

References

Campbell, D. T. & Stanley, J. C. (1963) Experimental and quasi-experimental designs for research. Rand McNally.
Ebbesen, E. B. & Konečni, V. J. (1975) Decision making and information integration in the courts: The setting of bail. Journal of Personality and Social Psychology 32:805–21.
Ebbesen, E. B., Parker, S. & Konečni, V. J. (1977) Laboratory and field analyses of decisions involving risk. Journal of Experimental Psychology: Human Perception and Performance 3:576–89.
Konečni, V. J. & Ebbesen, E. B. (1982a) An analysis of the sentencing system. In: The criminal justice system: A social-psychological analysis, ed. Konečni, V. J. & Ebbesen, E. B., pp. 293–332. Freeman.
Konečni, V. J. & Ebbesen, E. B., eds. (1982b) The criminal justice system: A social-psychological analysis, pp. 413–23. Freeman.
Konečni, V. J. & Ebbesen, E. B. (1992) Methodological issues in research on legal decision-making, with special reference to experimental simulations. In: Psychology and law, ed. Lösel, F., Bender, D. & Bliesener, T. Walter de Gruyter.
Konečni, V. J., Ebbesen, E. B. & Konečni, D. K. (1976) Decision processes and risk taking in traffic: Driver response to the onset of yellow light. Journal of Applied Psychology 61:359–67.
Mook, D. G. (1983) In defense of external invalidity. American Psychologist 38(4):379–87.
Webb, E. J., Campbell, D. T., Schwartz, R. D. & Sechrest, L. (1966) Unobtrusive measures: Nonreactive research in the social sciences. Rand McNally.