1. Introduction
For more than half a century, experimental social psychologists have (1) demonstrated the many ways people are treated differently because of their race, age, sex, and other social categories and (2) used these findings to explain why group disparities exist in the real world. From racial disparities in fatal police shootings and school discipline, to sex disparities in science, technology, engineering, and mathematics (STEM) engagement and corporate leadership, social psychologists have overwhelmingly concluded that the stereotypes in the heads of decision-makers play a substantial role in causing group disparities, whether or not people agree with or even consciously acknowledge such stereotypes (Devine, 1989; Greenwald & Banaji, 1995). The logic among social psychologists has been the following: If we can show in an experiment that people are treated differently based on their outward appearances when we present them as equal in all other respects, then in the real world such differential treatment exists and is a major cause of why outcomes differ across groups (see, e.g., Greenwald & Krieger, 2006; Kang & Banaji, 2006). As just one example illustrating the way in which experimental demonstrations of decision-maker bias have been tied to disparate outcomes, Moss-Racusin, Dovidio, Brescoll, Graham, and Handelsman (2012) state clearly that their research "informs the debate on possible causes of the gender disparity in academic science by providing unique experimental evidence that science faculty of both genders exhibit bias against female undergraduates" (p. 16477).
The purpose of this article is to show that standard practice in experimental social psychology is fundamentally flawed, so much so that findings from these studies cannot be used to draw any substantive conclusions about the nature of real-world disparities – despite the ubiquitous practice of drawing exactly these conclusions. Three problems inherent in the current approach render it impotent for this purpose. First, critical pieces of information used by actual decision-makers are absent in experimental studies (missing information flaw). Second, effects of biased decision-making are rarely understood in the context of other important influences on group outcomes, such as the behaviors of targets themselves (missing forces flaw). Third, there is no systematic study of whether the contingencies required to produce experimental bias are present in actual decisions (missing contingencies flaw). These three flaws can lead researchers to vastly overestimate the role of stereotyping as a causal process, producing experimental stereotyping effects even when stereotyping plays no role in real decisions or in causing group disparities. Although current experimental studies can provide important information about stereotyping processes per se, they cannot and do not provide information about the nature of group disparities. That is, the contribution of stereotyping and bias research is misunderstood and misused.
I first describe the standard "research cycle" of stereotyping and bias studies in experimental social psychology. I then lay out the flaws inherent in this approach at the abstract level and apply the analysis to three research topics in social psychology: police officers' decision to use deadly force, implicit bias, and school disciplinary policy. Next, I describe what experimental studies of bias can tell us and how researchers generally misinterpret the nature of such studies, and I speculate on the ways this research tradition has skewed the understanding of the human mind that has been exported from our discipline to the culture at large. I then connect this critique to related critiques within psychology and to similar problems that have arisen in other fields. In the final section, I chart out an alternative path that might be more effective for studying group disparities.
Throughout this paper, I focus on the familiar social psychological demonstrations of categorical bias: experiments in which participants respond differently to targets from different social categories. Although I focus on studies that posit stereotype activation as the culprit for such differential responding (as this is a long-standing way of understanding such effects, e.g., Duncan, 1976), nearly all of the current analysis is applicable to bias caused by other sources. I focus on experimental social psychology because this area has had a considerable impact on the discussion of group disparities, but this is not to say that similar critiques cannot be leveled against other areas and disciplines. The current critique is also distinct from related critiques of mundane realism or external validity, which I discuss in more detail later. Instead, this critique is about how social psychologists are fundamentally misguided in how they approach the study of group disparities, which distorts the nature of the decision under study and leads to incorrect conclusions about the conditions under which decisions will be more or less biased. Although psychology has no shortage of problems to be addressed (e.g., Srivastava, 2016), I limit my discussion in this paper to the misuse of experimental social psychology in explaining group disparities.
Before getting into the details of the argument, it is important to provide two cautionary notes. First, I am addressing the question of whether decision-maker bias produces group disparities in the immediate outcomes of that decision (and whether experimental social psychology can inform this process). This is seen in the example of a police officer's decision to shoot and racial disparities in being shot by police, or of a search committee's hiring decision and sex disparities in STEM employment. The current analysis does not address or dispute the possibility that decision-maker bias may enter earlier in the chain of events that leads to the decision in question. For example, police officers may show bias in the decision to engage in discretionary stopping of Black citizens or high school teachers may show bias in discouraging female students from pursuing STEM careers.
Second, the current analysis relies substantially on the fact that the distributions of behaviors, personality, character, preferences, abilities, and so on are not equal across different demographic groups (and that this fact is not appropriately considered by experimental social psychologists). I make no claims about the origin of these group differences in terms of the degree to which they are caused by individual decision-makers, "structural" forces beyond individual actions, genetic factors, incentive structures because of government policies, and so on. The point here is not to claim that group differences are inherent to people (although they might very well be) or that there are no broader social influences on human behavior. There may be systematic bias that produces group differences in the distributions of important characteristics. For the purposes of the present argument, the distal causes of group differences are irrelevant because these causes are separable from the question of whether group disparities are because of biased decision-making for specific outcomes. For example, why men and women differ in their interest in things versus people is a separate question from whether faculty search committees are biased against women in hiring for STEM positions.
On both these points, there is the possibility that bias “earlier” in the causal chain eventually leads to disparities on a later outcome, even while decision-makers show no bias on that later outcome. Of course, claims of “earlier” bias also require evidence, and if the available evidence is merely more of the same demonstrations from experimental social psychology, then these studies suffer from the same flaws described here and are, therefore, not convincing evidence.
2. The standard approach
The standard research cycle begins with an observation that groups differ in their real-world outcomes and the desire to understand the causes of such disparities. Simply, we see that members of some groups get better or worse outcomes than members of other groups and we want to know why. It is, perhaps, natural that social psychologists would start with the assumption that stereotypes – categorical information stored in a decision-maker's mind – play a meaningful role in producing these group differences. To gather evidence in support of this possibility, researchers design experiments in which participants make judgments of targets who vary only with respect to the social categories to which they belong. For example, to study the role of race in police officers' decision to use deadly force, researchers show participants pictures of Black and White men who do not vary in how they are presented in any way other than their race (as in the First-Person Shooter Task [FPST]; Correll, Park, Judd, & Wittenbrink, 2002). If participants shoot unarmed Blacks more than unarmed Whites, one can be sure that the race of the target played a causal role in participants' decisions because the experimenter has presented the groups in an identical way on all other dimensions (such as their posture, the frequency of holding a gun, facial expressions, etc.). Making all groups exactly equal in how they are presented in an experiment allows the researcher to conclude that the decision-maker (and not differences in the behavior of targets themselves) is responsible for biased responses directed at targets from different groups. It would be difficult to overstate the ubiquity of this approach in experimental social psychology; it is the paragon of systematic design and is understood as the method for studying the biasing effects of categories.
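To make the logic of the design concrete, the bias score from such a task can be sketched in a few lines. The trial counts below are invented purely for illustration, and this is only one common way of scoring such tasks (comparing rates of shooting unarmed targets across race), not a description of any particular study's analysis:

```python
# Hypothetical trial counts from an FPST-style task: only unarmed-target
# trials matter for this bias score, because shooting an unarmed target
# is an error that cannot be justified by the target's armed status.
trials = {
    ("black", "unarmed"): {"shoot": 12, "dont_shoot": 38},
    ("white", "unarmed"): {"shoot": 6,  "dont_shoot": 44},
}

def shoot_rate(race):
    """Proportion of unarmed targets of a given race that were shot."""
    cell = trials[(race, "unarmed")]
    return cell["shoot"] / (cell["shoot"] + cell["dont_shoot"])

# Because all other features are equated by design, any nonzero difference
# is attributable to target race.
bias = shoot_rate("black") - shoot_rate("white")
```

The inference licensed by this arithmetic concerns the experiment itself; the article's point is that researchers then extend it to real-world shootings, which is where the trouble begins.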
Having established that social categories impact participants' decisions in an experiment, researchers return to the original real-world disparity and conclude that the same processes observed in the lab explain these disparities as well (see, e.g., Moss-Racusin et al., 2012, for a prototypical example). That is, if stereotypes cause people to treat targets differently when there are no real behavioral differences in experimental stimuli, then this same biased treatment on the part of decision-makers is at play in the real world and can account for a meaningful amount of the disparities we see across groups. Researchers then complete the circle by using their experimental findings as evidence for designing interventions intended to reduce the disparity of interest.
3. Critical flaws of the standard paradigm
The standard experimental approach in social psychology contains three fundamental flaws that prevent the findings of experimental studies from being directly applied to the study of group disparities: the flaw of missing information, the flaw of missing forces, and the flaw of missing contingencies. The first flaw is that the decision components used by real-world decision-makers are absent in our experiments; in other words, information that is available to and used by actual decision-makers is removed from our experimental studies. The second flaw is that other influences on group outcomes – such as actual behavioral differences across groups – are not integrated into our designs, analyses, or conclusions. The third flaw is the lack of systematic study of whether the contingencies required to produce experimental bias are present in real decisions; along with this is the understudied question of whether the experimental landscape changes the motivation and ability of decision-makers. These flaws are fatal in the sense that any one of them can reveal experimental stereotyping effects even when no such effects exist in real decisions. I first describe these flaws at the general level and then show how they are evident in three different research areas in experimental social psychology. Descriptions and examples of these flaws are summarized in Table 1.
Table 1. Three flaws inherent to experimental social psychology studies of bias
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220512135301077-0069:S0140525X21000017:S0140525X21000017_tab1.png?pub-status=live)
3.1 The missing information flaw
For reasons good and bad, experimental studies of categorical bias in social psychology are massive simplifications of real-world decision landscapes. The problem is that this simplification removes information that may play a strong or even critical role in real decisions. When this happens, three distortions may occur. First, the missing information may have more powerful effects than social category information and may overwhelm any categorical influence in real decisions; when such forces are removed, all that remains is the categorical influence, which is then revealed in the experiment. Second, removing these variables may leave experimental participants with no useful information to render a judgment other than the target's social category; although categorical information may be used minimally or not at all in real decisions, experimental participants now use it because of the absence of any other kind of diagnostic information. Third, the presence or absence of such information may change the underlying decision process itself, leaving researchers with a distorted understanding of the cognitive dynamics at play in real decisions. In all cases, researchers are at risk of incorrectly concluding that the reliable and replicable effects of categories observed in their experiments are present in the real world and have the same effects on outcomes. In short, social psychologists may fundamentally misunderstand the nature of a decision if their experimental methods strip away critical features present in real decision-makers' environments.
This first flaw reflects a fallacy in the justification for using experimental studies, which is to presume that any information that can affect outcomes in an experimental setting does have the same effect in the real world. Said differently, the fallacy is the unstated belief that adding additional information to the decision landscape will not change the nature or magnitude of an experimental effect and the missing information can therefore safely be ignored.
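This fallacy can be illustrated with a toy signal-integration model. Suppose a decision-maker combines a category-based prior with an individuating cue in a Bayesian fashion; the Bayesian form and all numbers here are assumptions chosen only for illustration, not a claim about actual cognition:

```python
# Toy model: a judge's posterior belief in "threat" given a category prior
# and, optionally, a diagnostic individuating cue (likelihood ratio).

def posterior_threat(prior, cue_likelihood_ratio=None):
    """Posterior probability of 'threat' after optionally updating on a cue."""
    odds = prior / (1 - prior)
    if cue_likelihood_ratio is not None:
        odds *= cue_likelihood_ratio
    return odds / (1 + odds)

# Category priors differ modestly across two groups (invented numbers).
p_a, p_b = 0.55, 0.45

# Experiment-like setting: the category is the ONLY information available,
# so judgments track the priors directly.
gap_no_info = posterior_threat(p_a) - posterior_threat(p_b)

# Real-decision-like setting: a strongly diagnostic cue (LR = 9) is present,
# and the same priors now contribute far less to the final judgment.
gap_with_cue = posterior_threat(p_a, 9) - posterior_threat(p_b, 9)
```

In this sketch the category gap in judgments is 0.10 when the category is all participants have to go on, but shrinks to under 0.04 once a diagnostic cue is restored, even though the decision-maker's prior never changed. Adding back missing information can thus change the apparent magnitude of categorical bias without any change in the decision-maker.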
3.2 The missing forces flaw
The second flaw of the current experimental approach is that researchers do not interpret experimental effects in light of the other causal forces which impact group outcomes in the real world. Primary among these forces are the behavior of the targets themselves and the cognitive, motivational, and behavioral differences that exist across groups. This flaw is important because if there are strong influences on group outcomes besides biased treatment, then it follows that experimental participants may show reliable decision-maker bias – even very strong bias – while such bias exerts no discernible effect on real outcomes. If true, social psychologists may be perpetually disappointed in the state of the world because their recommended interventions of removing decision-maker bias will not yield equal outcomes or even reduce group disparities. Indeed, depending on the strength of group differences, social psychologists may be diverting resources away from effective interventions and toward those that will have little effect on reducing disparities.
This flaw reflects the fallacy that researchers believe they can safely ignore the degree to which the stimuli used in experimental studies match the distributional properties of the real-world groups they represent. One reason for this disregard may be the belief that all groups have roughly identical distributions on important underlying causal characteristics. Yet this assumption is incorrect, as groups differ (and often markedly so) on important personality, motivational, and cognitive dimensions – in other words, on the interest and ability factors that relate to nearly all outcomes (see, e.g., ACT, 2017; Andreoni et al., 2019; Beaver et al., 2013; Benbow & Stanley, 1980; Benbow, Lubinski, Shea, & Eftekhari-Sanjani, 2000; Byrnes, Miller, & Schafer, 1999; Ceci & Williams, 2010; Cesario, Johnson, & Terrill, 2019; Diekman, Steinberg, Brown, Belanger, & Clark, 2017; Gottfredson, 1998; Halpern et al., 2007; Hsia, 1988; Hsin & Xie, 2014; Jussim, Cain, Crawford, Harber, & Cohen, 2009; Jussim, Crawford, Anglin, Chambers, et al., 2015a; Jussim, Crawford, & Rubinstein, 2015c; Lee & Ashton, 2020; Lippa, 1998; Lu, Nisbett, & Morris, 2020; Lubinski & Benbow, 1992; Lynn, 2004; Lynn & Irwing, 2004; McLanahan & Percheski, 2008; Roth, Bevier, Bobko, Switzer, & Tyler, 2001; Sowell, 2005, 2008; Su, Rounds, & Armstrong, 2009; Tregle, Nix, & Alpert, 2019; Wright, Morgan, Coyne, Beaver, & Barnes, 2014). In understanding the role of decision-maker bias in producing disparate outcomes, it is necessary to compare and interpret the size of categorical bias effects with the size of these behavioral differences across groups.
Methodologically, this flaw is guaranteed because target stimuli are presented as equal on all dimensions except for social category membership. Statistically, this flaw is guaranteed because analytic models either do not incorporate information about real-world behavioral differences or, when they do, treat such differences as control variables whose (often very strong) relationship to the outcome of interest is ignored. These decisions shift researchers' attention away from the role that causal forces beyond categorical bias may have on group disparities in the real world – or at least, allow researchers to relegate these forces to a brief mention in the Introduction of their papers. To the extent that groups differ in important ways, and such differences have strong effects on obtained outcomes, the role of perceiver bias and stereotyping will be overstated.
Although the first flaw concerns removing everything but categorical information from the experiment, this second flaw concerns failing to interpret those experimental categorical effects in light of other known forces on group outcomes. This failure can lead to overemphasizing the role of perceiver bias, as revealed by experimental methods and “statistically significant” model coefficients (while ignoring variance explained or effect sizes).
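The arithmetic of this second flaw can be made explicit with a toy model. All numbers below are invented purely for illustration and represent no real group or outcome; only the relative magnitudes matter:

```python
# Toy model: P(outcome) = P(behavior) * P(action | behavior), plus a small
# additive decision-maker bias applied against one group.

def outcome_rate(behavior_rate, action_given_behavior, bias=0.0):
    """Probability that a group member receives the outcome."""
    return behavior_rate * action_given_behavior + bias

# Suppose the behavior that triggers the outcome is twice as common in
# group A (0.20 vs. 0.10), and decision-makers also apply a small bias
# (+0.02) against group A.
rate_a_biased = outcome_rate(0.20, 0.50, bias=0.02)   # 0.12
rate_b = outcome_rate(0.10, 0.50)                     # 0.05

# Disparity with decision-maker bias present...
disparity_with_bias = rate_a_biased - rate_b

# ...and with that bias completely eliminated.
rate_a_unbiased = outcome_rate(0.20, 0.50)            # 0.10
disparity_without_bias = rate_a_unbiased - rate_b
```

Here a real, nonzero decision-maker bias exists, and an experiment would detect it, yet removing it entirely shrinks the hypothetical disparity only from 0.07 to 0.05, because the behavioral base-rate difference carries most of the gap. Demonstrating the bias by itself says nothing about its relative contribution.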
3.3 The missing contingencies flaw
The third critical flaw is the failure to study whether the precise contingencies needed to produce categorical bias in our experiments are realized in real-world decision situations. Whether the conditions required for experimental demonstrations of bias are present in the real world obviously informs the degree to which such demonstrations can explain group disparities. However, it is also important because such contingencies relate to the motivation and ability of decision-makers in experimental tasks, and it is known that motivation and ability are critical variables for categories to have biasing effects on judgments.
This flaw reflects another fallacy present in experimental studies of bias, which is to ignore the contrived nature of experiments and the total control experimenters exercise over all aspects of the participant's experience. Indeed, contingencies are "missing" in not one but two ways. First, social psychologists do not explore whether the conditions needed for experimental bias are present in the real world. But second, discussion of these conditions is missing when social psychologists advocate for their research outside of academic psychology, where "contingent" and "conditional" bias now becomes "widespread" and "pervasive" bias (e.g., Greenwald & Krieger, 2006; Kang & Banaji, 2006).
Specific contingencies are required for category information to bias a person's decisions; stereotype effects do not occur uniformly for all people or under all conditions. For categories to bias decisions, clear diagnostic or individuating information must be absent and perceivers must lack the ability or motivation to control the biasing influence of categories. When decision-makers have adequate ability and motivation to control the effects of categorical information, or when information is unambiguous (as with strong individuating information or applicability of a single concept; Higgins, 1996), categories have little to no biasing effect on judgments (e.g., Darley & Gross, 1983; Dovidio & Gaertner, 2000; Koch, D'Mello, & Sackett, 2015; Krueger & Rothbart, 1988; Locksley, Borgida, Brekke, & Hepburn, 1980; see Jussim, 2012b; Jussim et al., 2015c; Kunda & Spencer, 2003). As stated unequivocally in a summary by Kunda and Thagard (1996) over two decades ago, "It is clear … that the target's behavior has been shown to undermine the effects of stereotypes based on all the major social categories" (p. 292).
Given that some set of contingencies is required, researchers must outline the precise contingencies needed to give rise to bias in the lab and detail the degree to which those experimental contingencies are present in real decisions. Assuming researchers can, in fact, show that these necessary contingencies are reproduced with regularity in the real world, researchers are also responsible for keeping these contingencies front and center when discussing their work in applied contexts, so as not to overextend claims of bias.
Experimental contingencies are also important because they relate to the roles of ability and motivation in biased decision-making. Ability and motivation have been the twin variables in nearly every major model of impression formation, persuasion, and decision-making in social cognition for decades (Bargh, 1999; Devine, 1989; Fazio, 1990; Fiske & Neuberg, 1990; Petty & Wegener, 1999; Smith & DeCoster, 2000). Given this, it is surprising that social cognitive researchers have not systematically studied whether novice or experimental participants match expert or real-world decision-makers on these two dimensions.
First, regarding ability, it has long been known that experts use different information, and use the same information differently, relative to novices (see, e.g., Klein, 1998; Koch et al., 2015; Levine, Resnick, & Higgins, 1993; Logan, 2018; although not always, see Miller, 2019). In experimental studies of stereotyping, there are no serious attempts to train participants before having them render a judgment. If trained decision-makers use information in the decision landscape differently than do untrained participants, this represents an important difference in ability between the two groups. If experts attend to different decision components or use these components differently than novices, and this difference changes the effect of social categories on the ultimate decision, then the conclusion of widespread bias in real decisions based on findings from undergraduate participants will be unwarranted.
The experimental situation itself can also be understood as impacting participants' ability in important ways. As described above, the simplified experimental methods used in studies of bias remove important sources of information used by real decision-makers. Said differently, researchers change the nature of accuracy and bias in decision-makers when they fail to give participants information that is available in real decisions, information which can allow participants to make decisions in unbiased (or, at least, less biased) ways.
Besides the ability differences between expert decision-makers and naive experimental participants, there are surely important motivational differences as well. Some research has tried to increase participants' motivation to provide unbiased decisions by rewarding accurate decisions or increasing personal relevance, but whether such manipulations produce motivations similar to those found outside experimental contexts is unknown. And importantly, experimental participants simply do not bear the costs of their decisions in ways that are required of many real-world decision-makers, a fact which can change the link between intentions and behavior (e.g., Sowell, 2008; Tetlock, 1985). For naive participants making imaginary decisions about hypothetical targets, there is no effect on their lives once the experiment ends.
3.4 Summary
The three critical flaws of the experimental approach to the study of bias and group disparities can be summarized as follows. If the information used by actual decision-makers in real-world decision landscapes is absent in experimental studies of these decisions, one's understanding of the decision under study can be dramatically skewed. Merely demonstrating bias conveys nothing about the strength of that bias relative to other causal forces on group outcomes. Moreover, there is a failure to specify the required contingencies for experimental demonstrations of bias and explore whether such contingencies are present in real decisions. Finally, if actual decision-makers use information differently or have different motivations and abilities than experimental decision-makers, there is no guarantee that bias will be observed outside experimental contexts. For these reasons, claims of ubiquitous bias among real-world decision-makers may be overstated.
4. Experimental studies of bias: Three topics
Having identified the problems inherent to experimental studies of bias at the general level, I now turn to demonstrating how these problems appear in practice. I chose the three topics discussed next because they cover a range of characteristics. Shooter bias is a narrow topic with nearly two decades of research behind it and represents a prototypical social psychological research program. Implicit bias is a much broader topic but one that has had a major effect on the public's understanding of group disparities. School disciplinary policies are a relatively new topic, but an important one with broad interest beyond the discipline of psychology.
4.1 Shooter bias
For nearly two decades, researchers have studied the question of racial bias in police officers' decisions to use deadly force. Without question, the most common experimental task used is the FPST, in which participants are shown pictures of armed and unarmed Black or White men and asked to press buttons labeled "shoot" and "don't shoot" (Correll et al., in press; see Cesario & Carrillo, in press, for a summary). How does this research fare with respect to the three fundamental flaws of experimental social psychology? (Fig. 1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220512135301077-0069:S0140525X21000017:S0140525X21000017_fig1.png?pub-status=live)
Figure 1. Example trial of the First-Person Shooter Task, the most common experimental method for understanding police officers' decisions to shoot.
4.1.1 Shooter bias: The missing information flaw
With respect to the first flaw, every relevant piece of information used by police officers in the decision to shoot has been removed from the standard experimental task, except for the single variable of whether or not targets are holding guns (an effect that overwhelms all other effects in both real and experimental decisions). Although a small number of exceptions exist and are discussed below, this has been true of virtually all studies using the FPST (see Cesario & Carrillo, in press). These missing variables include: dispatch information about the citizen and why the officer has been called to the scene, neighborhood information, past encounters with the citizen, how the interaction has unfolded leading up to the decision point (e.g., has the citizen been compliant thus far?), the physical movements of the citizen at the moment of the decision point, the goal of the officer at the scene, whether other officers are present, whether non-lethal tactics have already been used, and so on.
Officers report that all these factors matter, and indeed officers are trained to attend to these factors and integrate them into their dynamic, continuously updating decision to use deadly force as the interaction with the citizen unfolds. Of course, whether and the extent to which any of these pieces of information actually affect officers' decisions are empirical questions. Yet by not including these features, researchers simply have no idea whether their experimental methods are adequately capturing officers' decision processes. Researchers may be fundamentally misunderstanding the underlying cognitive decision dynamics if factors that impact those dynamics in real decision-makers have no possibility of impacting experimental participants, simply because researchers have failed to include such factors in their studies. Thus, we can ask, what happens to racial bias – at both the behavioral and the cognitive process levels – when such information is introduced into the experiment?
As one example of how the conclusions from experimental studies can drastically change if we introduce information used by officers in real decisions, Johnson, Cesario, and Pleskac (2018) conducted a series of studies examining the role of dispatch information in the decision to shoot. Participants completed a standard FPST, but with an important modification: On some trials, participants were given dispatch information at the start of each trial that contained the race of the target, whether the target had a weapon (correct 75% of the time), or both pieces of information. As shown in Figure 2, although untrained undergraduates showed the standard race bias effect when no dispatch information was given, dispatch information of any type eliminated race bias in the decision to shoot. Thus, a single change to the standard, simplified experimental task to include the most important and relevant information that officers have in real shootings eliminated the biasing effects of race. This calls into question our ability to draw conclusions about real-world cases of police shootings from simplified experimental paradigms. More generally, it illustrates the importance of ensuring that the decision landscape for participants in experimental laboratory tasks contains those factors used by real-world decision-makers.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220512135301077-0069:S0140525X21000017:S0140525X21000017_fig2.png?pub-status=live)
Figure 2. In the standard First-Person Shooter Task (“No Info”), undergraduate participants showed racial bias in the decision to shoot. When provided with prior dispatch information about target race or presence of a weapon (“Prior Info”), participants showed no evidence of racial bias in the decision to shoot. Black and white bars refer to target race. Modified from Johnson et al. (Reference Johnson, Cesario and Pleskac2018).
This point is consistent with research by Correll and colleagues (Correll, Wittenbrink, Park, Judd, & Goyle, Reference Correll, Wittenbrink, Park, Judd and Goyle2011; but see Pleskac, Cesario, & Johnson, Reference Pleskac, Cesario and Johnson2018), who manipulated the neighborhood background in which targets in the FPST appeared. In nearly all uses of the FPST, targets are presented in neutral, uninformative backgrounds (office buildings, parks, etc.). These researchers manipulated whether targets appeared in neutral backgrounds or dangerous, urban backgrounds. Placing targets in the dangerous backgrounds completely eliminated racial bias in the decision to shoot. To the extent that real-world police shootings occur in dangerous neighborhoods or situations, this seriously calls into question the degree to which our experimental findings inform our understanding of police officer racial bias.
As another attempt to reintroduce those factors present in actual decisions but missing in experimental studies, at least three independent research groups have used some version of an immersive shooting simulator similar to those used for training by law enforcement (Cox, Devine, Plant, & Schwartz, Reference Cox, Devine, Plant and Schwartz2014; James, James, & Vila, Reference James, James and Vila2016; James, Vila, & Daratha, Reference James, Vila and Daratha2013; James, Klinger, & Vila, Reference James, Klinger and Vila2014; Pleskac, Johnson, Cesario, Terrill, & Gagnon, Reference Pleskac, Johnson, Cesario, Terrill and Gagnonunder review). As depicted in the right panel of Figure 3, participants in such studies stand in front of a projection screen and watch life-sized videos recorded from a first-person point of view. These videos are of policing scenarios similar to those encountered by law enforcement (e.g., traffic stops and domestic disturbances). Participants verbally interact with individuals in the videos as they unfold over time, during which participants must decide whether to use deadly force. This response is made using a modified handgun; when the trigger is pulled, cycling of the firearm occurs through a compressed air connection, which provides recoil and initiates the sound of a handgun firing through a set of speakers. Officers routinely report being highly involved with these scenarios and display strong emotional states, attesting to the realism of the method.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220512135301077-0069:S0140525X21000017:S0140525X21000017_fig3.png?pub-status=live)
Figure 3. Left panel: Participant completing the standard laboratory First-Person Shooter Task. Right panel: A participant-officer completing an immersive shooting simulator, with video from officer's perspective superimposed in lower right corner.
Importantly, such a methodological change is not merely about recreating surface-level similarity to the decision to shoot in terms of the participant's experience (e.g., pressing a button vs. holding a gun). This method allows researchers to introduce back into the decision landscape those factors which simplified tasks remove but which officers report as being important.
Cesario and Carrillo (Reference Cesario, Carrillo, Carlston, Johnson and Hugenbergin press) came to two main conclusions in their summary of the research on shooting simulator studies. First, among the studies that manipulated the scenarios to which officers responded, there was strong evidence of the importance of the scenario and the specific actors on officers' decisions – stronger than the effects of suspect race. In Pleskac et al. (Reference Pleskac, Johnson, Cesario, Terrill and Gagnon2019), for example, variance in officers' decisions was primarily explained by the different scenarios in the videos (e.g., serving a warrant for armed robbery vs. failure to pay for child support) and the behavior of the different actors in the videos – features that the standard FPST removes entirely from the decision landscape. The second main conclusion was that studies using shooting simulators do not provide strong evidence of anti-Black bias in officers' decisions. Indeed, in all possible tests of racial bias across such studies, only about 5% showed anti-Black bias in officers' decisions. In contrast, almost 40% of tests showed anti-White bias in officers' decisions.
Although these results are inconsistent with claims from experimental social psychologists regarding the overwhelming importance of racial stereotypes in decisions to shoot, these results are consistent with the many analyses of actual police shootings that have revealed the importance of context and suspect behavior (see, e.g., Cesario et al., Reference Cesario, Johnson and Terrill2019; Fryer, Reference Fryer2016; Fyfe, Reference Fyfe1980; Geller & Karales, Reference Geller and Karales1981; Inn, Wheeler, & Sparling, Reference Inn, Wheeler and Sparling1977; Klinger, Rosenfeld, Isom, & Deckard, Reference Klinger, Rosenfeld, Isom and Deckard2016; Loughlin & Flora, Reference Loughlin and Flora2017; Ma, Graves, & Alvarado, Reference Ma, Graves and Alvarado2019; Mentch, Reference Mentch2020; Ross, Winterhalder, & McElreath, Reference Ross, Winterhalder and McElreath2021; Shjarback & Nix, Reference Shjarback and Nix2020; Tregle et al., Reference Tregle, Nix and Alpert2019; Wheeler, Phillips, Worrall, & Bishopp, Reference Wheeler, Phillips, Worrall and Bishopp2017; Worrall, Bishopp, Zinser, Wheeler, & Phillips, Reference Worrall, Bishopp, Zinser, Wheeler and Phillips2018).
4.1.2 Shooter bias: The missing forces flaw
With respect to the second flaw, it is clear that experimental social psychologists have ignored the contexts of actual deadly force decisions and the multiple influences on group disparities in fatal shootings, including the behavior of citizens themselves and whether such behavior varies across groups. There have been almost no serious attempts to connect experimental research to systematic analyses of fatal police shootings from the criminal justice literature, with nothing more than superficial citations of such research and no substantive input on how studies are designed or how research is conducted. Indeed, nearly a decade passed from the first publication using the FPST before researchers thought to ask about the very basic variable of neighborhood dangerousness (Correll et al., Reference Correll, Wittenbrink, Park, Judd and Goyle2011), and 15 years passed before experimental social psychologists asked about whether different violent crime rates play a role in explaining racial disparities (Cesario et al., Reference Cesario, Johnson and Terrill2019; Scott, Ma, Sadler, & Correll, Reference Scott, Ma, Sadler and Correll2017).
In shooter bias studies, Black and White targets are shown holding guns with the same frequency; in other words, they are presented in equal proportions in those situations for which deadly force is relevant. The logic is that, if experimental participants are more likely to shoot Black targets in the FPST, then this same racial bias in the heads of police officers explains the per capita racial disparity in being shot. Yet for the results to apply, it must be the case that Black and White citizens are present in deadly force situations with equal likelihood in the real world; otherwise, factors such as differential exposure to the police may be sufficient to explain racial disparities.
In contrast to the underlying assumption in experimental studies, there is clear evidence that (1) the context of violent crime is an overwhelming influence on officers' decisions to shoot and (2) violent crime rates differ across racial groups (e.g., Barnes, Jorgensen, Beaver, Boutwell, & Wright, Reference Barnes, Jorgensen, Beaver, Boutwell and Wright2015; Cesario et al., Reference Cesario, Johnson and Terrill2019; Klinger et al., Reference Klinger, Rosenfeld, Isom and Deckard2016; Ma et al., Reference Ma, Graves and Alvarado2019; Miller et al., Reference Miller, Lawrence, Carlson, Hendrie, Randall, Rockett and Spicer2017; Nix, Campbell, Byers, & Alpert, Reference Nix, Campbell, Byers and Alpert2017; Tregle et al., Reference Tregle, Nix and Alpert2019; Wheeler et al., Reference Wheeler, Phillips, Worrall and Bishopp2017; Worrall et al., Reference Worrall, Bishopp, Zinser, Wheeler and Phillips2018). Police officers do not use deadly force equally across all policing situations. The modal police shooting is one in which officers have been called by dispatch to the scene of a possible crime and are confronted with an armed citizen posing a deadly threat to the officer or to other citizens. It is also the case that violent crime rates differ very starkly across racial groups. Indeed, recent research suggests that the different rates of exposure to police through violent crime situations greatly – if not entirely – account for the overall per capita disparities in being fatally shot by police (Cesario et al., Reference Cesario, Johnson and Terrill2019; Fryer, Reference Fryer2016; Mentch, Reference Mentch2020; Ross et al., Reference Ross, Winterhalder and McElreath2021; Tregle et al., Reference Tregle, Nix and Alpert2019).
Once fatal police shootings are understood from this angle, it becomes clear that social psychologists have misunderstood this topic in their experimental approaches. Rather than first studying the nature of police shootings and then building experimental investigations around that understanding, researchers instead first created experimental worlds in which all group members are equal, under the assumption that this matched the actual behavior of groups and that their experimental findings would shed light on the disparate outcomes of those group members.
When it comes to explaining group disparities, researchers clearly prioritize their experimental findings over other possible causal forces on group outcomes. For example, of 18 recently published papers on shooter bias from experimental social psychology, only two raise the possibility that different behaviors of Black and White citizens might play a role in Black citizens' overrepresentation in being shot by the police (a possibility dismissed in one paper with indirect evidence and dismissed in the other paper with reference to a single article). This was true even when authors recognized that behavioral differences might account for other disparities, such as how the greater aggressiveness and criminality of men account for why they are more likely to be shot than women (Plant, Goplen, & Kunstman, Reference Plant, Goplen and Kunstman2011).
An important point concerning “blaming the victim” needs to be raised here, and this applies not only to fatal shootings but to all disparities. It is necessary to keep causal analysis distinct from “blaming the victim,” or in Felson's (Reference Felson1991) terms, to not use a blame analysis framework where a causal analysis framework is needed. Whatever the causal factors that lead an individual to one or another outcome, such factors can be described without the language of blame and responsibility. To say that a proximate cause of police shootings is involvement in crime is not to cast blame on a person for their own shooting, and certainly such an explanation should not be misapplied to those cases where criminal involvement is not present. But neither should a person's behavior be off-limits as part of a causal analysis merely because that person belongs to a minority group.
4.1.3 Shooter bias: The missing contingencies flaw
Research on shooter bias clearly illustrates the third flaw, the lack of attention to experimental contingencies and whether there are differences in motivation and ability between experimental and real-world decision-makers. Evidence of racial bias is reliably obtained with untrained citizens completing the FPST (Mekawi & Bresin, Reference Mekawi and Bresin2015), but the task has specific parameters that are required for such bias to be realized. For example, in the FPST, the target appears on the screen holding an object and the participant must make a decision within a response window relative to target onset. Thus, target race and object are presented simultaneously and responses after, say, 650 ms are considered errors.
The important question is whether these contingencies match the nature of actual police shootings. They do not. Officers almost always have information about citizen race much, much sooner than when the decision to shoot is made (and certainly well outside the window for ruling out controlled processing), and officers almost always have some interaction with the citizen before deciding to shoot. As noted above, experimental FPST participants are also given zero information about the situation surrounding the decision, a fact that matches no police shooting.
More important, the FPST is a task about misidentifying harmless objects as weapons. However, evidence of racial bias in the FPST has been used to make claims about widespread police officer bias in the decision to shoot. What has not been questioned is the degree to which fatal police shootings are actually about misidentification of harmless objects. If police shootings rarely involve the misidentification of objects under neutral conditions (which is the focus of the FPST), then it might be misleading to apply findings from the FPST to explain racial bias in fatal shootings more broadly. In fact, we estimated that the number of fatal shootings in which officers misidentify harmless objects as weapons is around 30 incidents per year (Cesario et al., Reference Cesario, Johnson and Terrill2019). To the extent that error rates on the FPST are informative for understanding racial bias, the task may be applicable only to an extremely infrequent event within a much larger set of related events. Indeed, considering that there are over 75,000,000 police–citizen contacts per year (Davis, Whyde, & Langton, Reference Davis, Whyde and Langton2018), this suggests the error rate for officers misidentifying a harmless object as a weapon – the central question of the FPST – is on the order of less than one in a million.
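The back-of-the-envelope arithmetic behind this estimate can be made explicit. The sketch below uses only the two figures cited in the text (~30 misidentification shootings per year; ~75,000,000 police–citizen contacts per year):

```python
# Figures cited in the text: ~30 fatal shootings per year involving a
# misidentified harmless object (Cesario et al., 2019) and ~75,000,000
# police-citizen contacts per year (Davis, Whyde, & Langton, 2018).
misidentification_shootings_per_year = 30
police_citizen_contacts_per_year = 75_000_000

# Error rate per contact: the event the FPST is designed to study.
error_rate = misidentification_shootings_per_year / police_citizen_contacts_per_year
print(f"Error rate per contact: {error_rate:.1e}")  # 4.0e-07, i.e., less than one in a million
```

On these figures, the misidentification error rate is 0.4 per million contacts, which is the basis for the "less than one in a million" claim above.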
One could salvage the FPST by replying that the task still tells us something important about officers' decisions during these very infrequent events. Moreover, infrequent events can be tremendously important, and the tragic cases where an officer makes a clear error and shoots a citizen reaching for his wallet are the events that we as citizens should care the most about. However, two problems remain. First, the most reliable effect in the FPST is on response times and not on error rates; meta-analysis indicates that there is not a reliable effect of target race on shooting unarmed targets (Mekawi & Bresin, Reference Mekawi and Bresin2015). Second, such an argument requires ignoring the problems described above, which can change the applicability of such results to real-world cases. For example, the FPST assumes equal encounter rates with the police (as 50% of trials are White targets and 50% of trials are Black targets). However, if officers have differential contact with Black citizens (because of bias in discretionary stopping of citizens or simply because of different violation rates between Black and White citizens), then racial disparity in being shot while reaching for a wallet may exist while officers show no bias in the actual decision to shoot. A constant, race-blind error rate on the part of the police would still result in a greater proportion of Black Americans being shot while reaching for their wallets (see Cesario, Reference Cesario2021; Ross et al., Reference Ross, Winterhalder and McElreath2021).
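The base-rate logic in the preceding paragraph can be made concrete with a toy calculation. All numbers below are hypothetical, chosen only to illustrate how a constant, race-blind error rate combined with unequal per-capita contact rates still yields a per-capita disparity in errors:

```python
# Hypothetical illustration: identical decision-level error rate for every
# contact, but Group B has twice the per-capita contact rate of Group A.
error_rate = 1e-6  # same misidentification error rate applied to every contact
contacts = {"Group A": 10_000_000, "Group B": 4_000_000}      # 5% vs. 10% of population
population = {"Group A": 200_000_000, "Group B": 40_000_000}

per_capita = {}
for group in contacts:
    expected_errors = contacts[group] * error_rate
    per_capita[group] = expected_errors / population[group]
    print(f"{group}: {expected_errors:.0f} expected errors, "
          f"per-capita rate {per_capita[group]:.1e}")
```

Despite the decision-level error rate being identical for both groups, Group B's per-capita rate of being erroneously shot is twice Group A's, simply because its per-capita contact rate is twice as high. This is the sense in which a race-blind error rate can coexist with a per-capita racial disparity.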
What about the failure to consider possible motivation and ability differences between real-world and experimental decision-makers? Social psychologists have overwhelmingly used convenience samples of naive undergraduates to study the decision to shoot (see Cesario & Carrillo, Reference Cesario, Carrillo, Carlston, Johnson and Hugenbergin press), participants for whom the decision is inconsequential and who have no training in how to make such a decision. Yet police officers typically receive over 1,000 hours of use of force training (Morrison, Reference Morrison2006; Stickle, Reference Stickle2016). It would be surprising if the ability to detect and classify objects, and the cognitive processes underlying such performance, was similar for experienced officers and undergraduates who have never made a single such decision in their lives. Interestingly, Correll et al. (Reference Correll, Park, Judd and Wittenbrink2002) issued exactly this caution in the very first study on the FPST (“it is not yet clear that Shooter Bias actually exists among police officers … there is no reason to assume that this effect will generalize beyond [lay samples],” p. 1328). Yet despite this and later warnings (Cox & Devine, Reference Cox and Devine2016), researchers continued to apply studies from undergraduates to police officers, even as data came to light that police officers did not show the same bias (e.g., Correll et al., Reference Correll, Park, Judd, Wittenbrink, Sadler and Keesee2007, Reference Correll, Hudson, Guillermo and Ma2014).
The fact that trained officers may use information in the decision landscape differently than untrained undergraduates represents an important ability difference between the two groups. If experts attend to different decision components or use these components differently than novices, and this difference changes the effect of target race on the ultimate decision, then the conclusion of widespread race bias in officers' deadly force decisions based on findings from undergraduate participants will be unwarranted. Indeed, sworn officers typically show little to no bias in the behavioral decision to shoot with the standard FPST (e.g., Akinola, Reference Akinola2009; Correll et al., Reference Correll, Park, Judd, Wittenbrink, Sadler and Keesee2007; Johnson et al., Reference Johnson, Cesario and Pleskac2018; Ma & Correll, Reference Ma and Correll2011; Sim, Correll, & Sadler, Reference Sim, Correll and Sadler2013; Taylor, Reference Taylor2011), and this is especially true for studies using immersive shooting simulators such as the one described above (e.g., James et al., Reference James, Vila and Daratha2013, Reference James, Klinger and Vila2014, Reference James, James and Vila2016). Cesario and Carrillo (Reference Cesario, Carrillo, Carlston, Johnson and Hugenbergin press) summarized studies in which sworn officers completed the standard FPST and found that out of 64 possible tests for racial bias, only ~25% showed anti-Black bias whereas ~70% showed no bias on the part of officers in one direction or the other.
As a direct means of demonstrating the importance of collecting data with trained experts rather than naive undergraduates, Johnson et al. (Reference Johnson, Cesario and Pleskac2018) tested for differences between officers and students in the underlying cognitive dynamics of the decision to shoot. Was there evidence that trained versus untrained individuals were making the decision in a different way or using race differently during the decision process? Trained officers and untrained undergraduates completed the standard laboratory FPST. These researchers then modeled the data from each group using a drift diffusion model. This model describes the decision to shoot as a sequential sampling process in which people start with a prior bias to shoot or not and accumulate evidence over time until a threshold required for a decision is reached. The details regarding this modeling can be found elsewhere (see Johnson, Hopwood, Cesario, & Pleskac, Reference Johnson, Hopwood, Cesario and Pleskac2017; Pleskac et al., Reference Pleskac, Cesario and Johnson2018); for now, the important point is that the model allows for an understanding of how the cognitive processes underlying the decision to shoot might vary between untrained and trained participants.
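The sequential-sampling logic of the drift diffusion model can be sketched in a few lines. This is not the model-fitting code used by Johnson et al.; it is a minimal random-walk approximation with arbitrary parameter values, intended only to make the three components named in the text (start point, evidence accumulation, decision threshold) concrete:

```python
import random

def simulate_ddm_trial(drift, threshold, start=0.0, noise=1.0, dt=0.001, rng=random):
    """Minimal drift diffusion trial: evidence accumulates from `start` until
    it crosses +threshold ("shoot") or -threshold ("don't shoot").
    Returns (decision, response_time_in_seconds)."""
    evidence, t = start, 0.0
    while abs(evidence) < threshold:
        # Drift (systematic evidence) plus Gaussian noise, Euler-discretized.
        evidence += drift * dt + noise * rng.gauss(0.0, dt ** 0.5)
        t += dt
    return ("shoot" if evidence >= threshold else "don't shoot"), t

# A higher threshold (as observed for trained officers) means more evidence
# is accumulated before responding: slower but less error-prone decisions.
random.seed(1)
trials = [simulate_ddm_trial(drift=1.0, threshold=1.0) for _ in range(200)]
print("Mean RT (s):", sum(rt for _, rt in trials) / len(trials))
```

In this framework, a race effect on the accumulation process would appear as a race-dependent drift rate, and the officers' higher decision thresholds correspond to a larger `threshold` parameter; neither value here is estimated from data.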
In these data, trained officers showed no racial bias in their behavioral decisions, despite untrained undergraduates showing such bias. More important, cognitive modeling of the decision data revealed why officers did not show bias in their behavioral responses. Officers showed two major differences compared to untrained undergraduates in the underlying decision components. First, race did not affect the manner in which officers accumulated evidence about whether to shoot. For untrained undergraduates, their processing of the object held by the target was “contaminated” by the target's race: When a harmless object was held by a Black target, the processing of his race interfered with processing of the object being held, pushing participants toward a “shoot” decision (resulting in more false alarms). Officers showed no such effect of race. They were able to extract information about the object in the person's hand independent of the target's race. Second, officers set higher thresholds for making a decision, accumulating more evidence before making a decision. In combination, these two components eliminated the effect of race on officers' behavioral decisions, an effect robustly observed in untrained participants.
Among trained officers, then, the decision process operated differently and race did not have the same effects on the underlying decision components as it did on untrained participants. Failure to understand or appreciate these differences leads researchers to inappropriately apply the results from undergraduates – who have no training and have never had to make such a decision before entering the lab – to expert decision-makers.
4.2 Implicit bias and group disparities
It would be difficult to find a concept from experimental social psychology that has spread more quickly and widely outside academia than implicit bias. There is no question that implicit bias research (1) has been used to explain why groups in contemporary American society obtain unequal outcomes and (2) has relied almost exclusively on studies using indirect measures such as the Implicit Association Test (IAT).Footnote 3 Other writings have critiqued the theoretical and measurement aspects of implicit bias research (Arkes & Tetlock, Reference Arkes and Tetlock2004; Blanton & Jaccard, Reference Blanton and Jaccard2008; Blanton, Jaccard, Gonzales, & Christie, Reference Blanton, Jaccard, Gonzales and Christie2006; Blanton et al., Reference Blanton, Jaccard, Klick, Mellers, Mitchell and Tetlock2009; Blanton, Jaccard, Strauts, Mitchell, & Tetlock, Reference Blanton, Jaccard, Strauts, Mitchell and Tetlock2015; Corneille & Hütter, Reference Corneille and Hütter2020; Fiedler, Messner, & Bluemke, Reference Fiedler, Messner and Bluemke2006; Mitchell, Reference Mitchell, Crawford and Jussim2018; Oswald, Mitchell, Blanton, Jaccard, & Tetlock, Reference Oswald, Mitchell, Blanton, Jaccard and Tetlock2013; Schimmack, Reference Schimmack2020), so I restrict my discussion of this topic to those aspects most relevant to the question of explaining group disparities.
4.2.1 Implicit bias: The missing information flaw
In prototypical implicit bias research, as in studies using the IAT or other indirect measurement techniques (Fazio, Jackson, Dunton, & Williams, Reference Fazio, Jackson, Dunton and Williams1995; Greenwald, McGhee, & Schwartz, Reference Greenwald, McGhee and Schwartz1998), every possible source of information which could impact a person's judgment and behavior is stripped from the measurement of these unconscious or uncontrollable processes. In the best-case scenario, participants are shown cropped photos of faces belonging to different group members and make rapid categorizations of these faces; in the worst-case scenario, there are no group members whatsoever and group labels (e.g., “Black”) serve as target stimuli instead. No information other than category membership is available to participants and button-press differences on the order of a fraction of a second are the outcome of interest. Additionally, research has shown that implicit or indirect measures can be sensitive to context information (see, e.g., Barden, Maddux, Petty, & Brewer, Reference Barden, Maddux, Petty and Brewer2004; Blair, Reference Blair2002; Gawronski, Reference Gawronski2019; Gawronski & Sritharan, Reference Gawronski, Sritharan, Gawronski and Payne2010; Wittenbrink, Judd, & Park, Reference Wittenbrink, Judd and Park2001). The fact that humans exist and are perceived only in contexts, and not isolated against empty backgrounds, should prompt meaningful discussion about the degree to which such context-less implicit bias measures will predict bias in real decisions.
4.2.2 Implicit bias: The missing forces flaw
Implicit bias research also reflects the second flaw outlined in this paper, which is that the effects of implicit bias are not appropriately compared to other influences on group outcomes. Consider the example of sex differences in STEM participation. Women and men do not have identical profiles of ability and interest relevant to STEM performance, and much research has explored the implications of these factors (Benbow & Stanley, Reference Benbow and Stanley1980; Benbow et al., Reference Benbow, Lubinski, Shea and Eftekhari-Sanjani2000; Ceci & Williams, Reference Ceci and Williams2010; Reference Ceci and Williams2011; Ceci, Williams, & Barnett, Reference Ceci, Williams and Barnett2009; Cheng, Reference Cheng2020; Cortés & Pan, Reference Cortés and Pan2020; Hakim, Reference Hakim2006; Halpern et al., Reference Halpern, Benbow, Geary, Gur, Hyde and Gernsbacher2007; Kleven, Landais, & Søgaard, Reference Kleven, Landais and Søgaard2019; Kleven, Landais, Posch, Steinhauer, & Zweimüller, Reference Kleven, Landais, Posch, Steinhauer and Zweimüller2020; Lubinski & Benbow, Reference Lubinski and Benbow1992; Su & Rounds, Reference Su and Rounds2015; Su et al., Reference Su, Rounds and Armstrong2009; Valla & Ceci, Reference Valla and Ceci2014). It is questionable how millisecond differences in measured associations correspond to the factors that actually shape group disparities, given that such differences have not been integrated into the larger dynamics of STEM engagement and performance.
Although there is variation in the published literature, studies claiming to demonstrate the importance of implicit bias in explaining group outcomes often do not measure these other forces at all (e.g., Cvencek, Greenwald, & Meltzoff, Reference Cvencek, Greenwald and Meltzoff2011a; Cvencek, Meltzoff, & Greenwald, Reference Cvencek, Meltzoff and Greenwald2011b), do not compare the size of implicit bias effects to the size of these other forces (e.g., Nosek & Smyth, Reference Nosek and Smyth2011), treat these other forces as control variables without directly comparing the size of implicit bias effects to these variables (e.g., Kiefer & Sekaquaptewa, Reference Kiefer and Sekaquaptewa2007), or treat such forces as a predicted variable resulting from implicit bias rather than the reverse (e.g., Nosek, Banaji, & Greenwald, Reference Nosek, Banaji and Greenwald2002; Nosek et al., Reference Nosek, Smyth, Sriram, Lindner, Devos, Ayala and Greenwald2009).
4.2.3 Implicit bias: The missing contingencies flaw
As for the third flaw, there has been a striking failure to explore whether the precise experimental contingencies required to demonstrate implicit bias in the lab correspond in some reasonable way to the contingencies present during real-life decisions. These contingencies include the twin features of the lack of ability and motivation, as well as the specific experimental details needed to reveal bias on indirect measures.
Consider some of the necessary experimental contingencies required both for the measurement of implicit cognition and for observing the effects of implicit bias on decision-making and behavior. Perhaps the central defining feature of implicit cognition is awareness (Greenwald & Banaji, Reference Greenwald and Banaji1995), and as such implicit measures are supposed to “neither inform the subject of what is being assessed nor request self-report concerning it” (p. 5).Footnote 4 A first-order, foundational question then is whether people are aware of their biases, aware of what is being assessed during the measurement of these biases, or aware of the effects of their biases. After all, if one defines implicit bias as discrimination based on “unconscious” processes and argues that implicit bias is so important as to have implications for legal doctrine in the United States (Greenwald & Krieger, Reference Greenwald and Krieger2006; Kang & Banaji, Reference Kang and Banaji2006), then certainly the basic question of awareness must have been thoroughly settled by now. As Gawronski (Reference Gawronski2019) describes, however, there is currently no convincing evidence that people are uniquely unaware of their biases or the effects of their biases.Footnote 5 It is striking that the concept of implicit bias has been pushed into federal policy at the highest levels of the U.S. government without any convincing evidence concerning even basic questions about the measurement or the effects of implicit bias.
Indirect measurement techniques (as a means of assessing stereotype associations) require specific contingencies to reveal bias on the part of participants. Take for example the IAT (Greenwald et al., Reference Greenwald, McGhee and Schwartz1998). As with other control tasks, such as the Stroop task, no one shows bias in their decisions if given sufficient time to respond.Footnote 6 Thus, a speeded response is a required condition for measurement of implicit bias so that controlled cognitive processes will be prevented or attenuated from impacting responses. In this way, implicit measures can assess “implicit attitudes by measuring their underlying automatic evaluation” (Greenwald et al., Reference Greenwald, McGhee and Schwartz1998, p. 1464), as opposed to measuring a more controlled evaluation elicited by the stimulus.
In addition to measurement, there are also necessary conditions to demonstrate the effects of implicit bias on behavior and decision-making. Consider the central claim by implicit bias researchers that automatically activated associations influence us even when we don't want them to (e.g., "implicit biases are especially intriguing, and also especially problematic, because they can produce behavior that diverges from a person's avowed or endorsed beliefs or principles," Greenwald & Krieger, Reference Greenwald and Krieger2006, p. 951). Given this, people must be in a decision situation where controlled processes – that is, what we want – cannot play a role, conditions where we want to respond in unbiased ways but are unable to do so. This requires that a person lacks the ability to exercise controlled processes, as in a decision with a short response time window. Without this feature, the decision situation is no longer one in which we are unable to produce the desired, unbiased response. Good experimental practice and inference would then require that, in implicit bias research, both contingencies are in place: People do not want to be influenced by categorical information but are in decision situations where such controlled processes cannot influence responses. That none of the studies recently presented as strong evidence for the behavioral prediction of implicit bias ensured these contingencies were met suggests that this practice is not widespread (Jost, Reference Jost2019).
As a final, critical contingency, as noted earlier there is overwhelming evidence that categorical bias is overridden when decision-makers are provided with individuating information (e.g., Kunda & Thagard, Reference Kunda and Thagard1996). In the measurement of implicit bias, no individuating information is ever presented; yet it is common to apply laboratory findings of implicit bias to real decisions which contain strong individuating information, such as in hiring decisions or decisions about one's own career choice (where a person clearly has interest and ability information; e.g., Nosek & Smyth, Reference Nosek and Smyth2011).
In terms of explaining group disparities, it follows that the bulk of the underrepresentation for any group must be because of an underrepresentation of people who are ambiguous with respect to their performance at the task at hand, because it is only these people for whom decisions will be affected by implicit bias on the part of decision-makers. In the case of “gatekeepers” making biased decisions against potential STEM students (Nosek & Smyth, Reference Nosek and Smyth2011), the “A” student and the “F” student are both unaffected by implicit bias on the part of the guidance counselor (because there is unambiguous positive and negative individuating information, respectively). This means that the sex disparity must consist of “C” students who would have become successful in STEM careers had implicit bias not caused the guidance counselor to unintentionally steer those students out of a STEM track. It is the responsibility of implicit bias proponents to show this is the case.
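To see why the size of the ambiguous middle matters, consider a minimal numerical sketch (all quantities are invented for illustration, not estimates from any study): if unambiguous “A” and “F” cases are decided identically for both groups and bias operates only on ambiguous “C” cases, then the disparity that decision-maker bias can produce is capped by the share of ambiguous cases in the pool.

```python
# Hypothetical sketch: bias that operates only on ambiguous cases can produce
# a group disparity no larger than (share of ambiguous cases) x (size of bias).

def selection_rate(frac_a, frac_c, frac_f, c_select_prob):
    """Overall selection rate when all 'A' cases are selected, no 'F' cases
    are selected, and ambiguous 'C' cases are selected with some probability."""
    return frac_a * 1.0 + frac_c * c_select_prob + frac_f * 0.0

# Assumed pool composition: 30% clear selects, 40% ambiguous, 30% clear rejects.
FRAC_A, FRAC_C, FRAC_F = 0.3, 0.4, 0.3

# Ambiguous cases from the favored group are selected at 0.5; bias lowers
# this to 0.3 for the disfavored group (both numbers invented).
rate_favored = selection_rate(FRAC_A, FRAC_C, FRAC_F, c_select_prob=0.5)
rate_disfavored = selection_rate(FRAC_A, FRAC_C, FRAC_F, c_select_prob=0.3)

# Disparity = frac_c * (0.5 - 0.3); it shrinks toward zero as FRAC_C shrinks.
disparity = rate_favored - rate_disfavored
```

Under these assumptions the disparity is 0.4 × 0.2 = 0.08; if only 10% of cases were ambiguous, the same bias could produce at most a 0.02 disparity. The sketch is simply an accounting identity, but it makes explicit what the “C student” argument requires proponents to establish empirically.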
Implicit bias research, then, provides another example of the fundamental weaknesses of experimental social psychology when explaining group disparities. Without providing any relevant information to participants, researchers obtain evidence of the biasing effects of category information. Such associations as measured by millisecond response time differences – obtained under conditions completely discordant with the real world and which do not correspond to the presumed psychological constructs of interest in a straightforward way (see, e.g., Blanton, Jaccard, Christie, & Gonzales, Reference Blanton, Jaccard, Christie and Gonzales2007; Uhlmann, Brescoll, & Paluck, Reference Uhlmann, Brescoll and Paluck2006) – are proposed to explain complex and sizable group disparities. Little effort is made to integrate these differences into a detailed model which includes other, strong influences on outcomes or specification of the real-world performance parameters. These weaknesses are consistent with the poor performance of implicit bias measures in predicting discriminatory behavior (see, e.g., Blanton et al., Reference Blanton, Jaccard, Klick, Mellers, Mitchell and Tetlock2009; Oswald et al., Reference Oswald, Mitchell, Blanton, Jaccard and Tetlock2013, Reference Oswald, Mitchell, Blanton, Jaccard and Tetlock2015).Footnote 7
4.3 Racial disparities in school disciplinary outcomes
A final example is recent research in experimental social psychology on racial disparities in school disciplinary outcomes. There are well-known racial disparities in suspensions and expulsions, with Black schoolchildren more likely to receive such outcomes than White, Hispanic, or Asian schoolchildren (Lhamon & Samuels, Reference Lhamon and Samuels2014). At issue is why this per capita disparity exists and whether distorted interpretation of behavior because of racial stereotypes explains such disparities. That is, are schoolteachers interpreting the same behavior on the part of Black and White schoolchildren differently and, therefore, referring them for disciplinary action at different rates, even while the behavior of Black and White children is the same?
Experimental social psychologists have followed the familiar pattern of instructing participants to make punitive judgments of hypothetical schoolchildren from simple written scenarios, with targets who are presented as equal on every dimension other than their race. After observing an effect of target race on disciplinary decisions, researchers then loop back and claim that such findings can help explain why racial disparities in real classrooms exist (Jarvis & Okonofua, Reference Jarvis and Okonofua2020; Okonofua & Eberhardt, Reference Okonofua and Eberhardt2015).
An analysis of this research reveals the three flaws identified above. The information provided to participants in these experimental studies consists of impoverished descriptions of real teacher–child experiences, removing important information that real decision-makers could use, such as a child's history of behavior in the classroom, the other children involved, the teacher's current intentions and behavior, or even the general context surrounding the event. All the knowledge that the teacher has concerning the student's history and past behavior simply cannot play a role in their experimental judgments. This is important because the distribution of student disciplinary action is highly skewed and is principally tied to specific students; the question is not about the average, generic student but about specific students at the tail end of a distribution. For example, in one large survey of teacher referrals for disciplinary action, 93% of the 22,000 students recorded did not receive a single referral, 4% received only one referral, and six students received more than 20 referrals each (Rocque & Paternoster, Reference Rocque and Paternoster2011). Experiments are about group average effects (e.g., “Does a sample of participants show an average difference in disciplining unknown, nonspecific Black or White students?”), but the distribution of disciplinary actions suggests that this misses the nature of the actual topic under study.
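The point about skew can be made concrete with a toy example (the counts below are invented for a hypothetical school of 1,000 students, loosely patterned on the skew described above, not the Rocque & Paternoster data): when referrals are concentrated in a handful of students, the “average, generic student” an experiment asks about generates almost none of the actual disciplinary actions.

```python
# Toy illustration of a heavily skewed referral distribution.
# Each tuple is (number of students, referrals per student); numbers invented.
distribution = [(930, 0), (40, 1), (24, 3), (6, 25)]

total_students = sum(n for n, _ in distribution)        # 1,000 students
total_referrals = sum(n * r for n, r in distribution)   # 40 + 72 + 150 = 262

# The six highest-referral students (0.6% of the school) account for the
# majority of all referrals in this toy distribution.
top_six_share = (6 * 25) / total_referrals

# The mean per student, which is what a group-average experiment targets,
# is a fraction of one referral and describes almost no actual student.
mean_per_student = total_referrals / total_students
```

In this sketch, 0.6% of students generate roughly 57% of referrals, while the per-student mean is about 0.26 referrals, a value that describes neither the 93% with zero referrals nor the few students driving the totals. This is the sense in which an experiment about the average student can miss the phenomenon.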
If such information plays a strong role in real decisions, then by removing it researchers prevent teachers from making unbiased judgments and force participants to rely on the only diagnostic information given to them. For example, studies on race and classroom discipline give teachers a student's name (manipulated to be either a common Black or White name) and a one-paragraph description of an event (“You tell DeShawn to pick his head up and get to work. He only picks his head up”). Whether these vignettes contain the information used by teachers when making real disciplinary decisions is unknown.
These experimental designs also fail to consider the possible influence of other factors that may play a role in a child's behavior, such as socioeconomic status, family structure, cultural norms for the teacher–child relationship, parental expectations, interest in school, delay of gratification, and so on, all of which differ across racial groups and would reasonably be expected to relate to behavioral differences in the classroom (Andreoni et al., Reference Andreoni, Kuhn, List, Samek, Sokal and Sprenger2019; DeNavas-Walt, Proctor, & Smith, Reference DeNavas-Walt, Proctor and Smith2013; Heriot & Somin, Reference Heriot and Somin2018; Hsin & Xie, Reference Hsin and Xie2014; McLanahan & Percheski, Reference McLanahan and Percheski2008; Musu-Gillette et al., Reference Musu-Gillette, Zhang, Wang, Zhang, Kemp, Diliberti and Oudekerk2018; Price-Williams & Ramirez, Reference Price-Williams and Ramirez1974; Rocque & Paternoster, Reference Rocque and Paternoster2011; Wright et al., Reference Wright, Morgan, Coyne, Beaver and Barnes2014; Zytkoskee et al., Reference Zytkoskee, Strickland and Watson1971). Whatever the size of participants' racial bias in disciplining hypothetical Black versus White schoolchildren in an experimental situation, one cannot draw any conclusions about whether such categorical biases impact disciplinary outcomes in the real world because the experimental bias effect is not understood in relation to these other factors. An assumption justifying the design of such studies is the expectation that children who differ in myriad important ways should behave identically in the classroom.
As support for this claim, consider a recent paper on race and school suspensions by experimental social psychologists, which begins by stating that racial differences in school suspension are “not fully explained by racial differences in socioeconomic status or in student misbehavior” (Okonofua & Eberhardt, Reference Okonofua and Eberhardt2015, p. 617). No report is given of how much of the racial disparity is explained by these factors, just that some non-zero amount remains. As evidence for this claim, six citations are provided, but none of these citations measure student behavior and show that Black and White students are behaving similarly. Indeed, one of these citations states “The ideal test … would be to compare observed student behavior with school disciplinary data. Those data were not available for this study, nor are we aware of any other investigation that has directly observed student behaviors” (Skiba, Michael, Nardo, & Peterson, Reference Skiba, Michael, Nardo and Peterson2002, p. 325).Footnote 8 In contrast, Wright et al. (Reference Wright, Morgan, Coyne, Beaver and Barnes2014) did find that racial differences in school suspension rates were fully accounted for by prior behavioral problems of the student. The point is not to single out these researchers (as such claims are broadly made by nearly everyone doing similar research), but instead to illustrate an additional example of the problems identified above.
Moreover, using experimental social psychology to explain school suspensions and expulsions reflects the third flaw as well: A lack of attention to the actual contingencies needed to produce stereotyping effects in the lab and whether such contingencies resemble real-world situations. As noted earlier, stereotyping effects occur under conditions of ambiguity and are absent or small when perceivers have individuating information or are judging unambiguous behaviors. To the extent that teachers are misconstruing or misinterpreting students' behaviors because of stereotypes held about different racial groups, those effects are therefore predicted to occur in the absence of individuating information or for ambiguous behaviors. How categorical bias might reveal itself in long-term interactions such as teacher–student relationships, where plenty of individuating information is available, is not established.
Some research by leading scholars within social psychology on school disciplinary disparities has tried to take a more dynamic perspective. For example, Okonofua, Walton, and Eberhardt (Reference Okonofua, Walton and Eberhardt2016) propose that the teacher–student relationship can devolve over time and that initial stereotype effects can increase in strength as teachers' expectations and worries about minority students' behavior affect students' behavior in the classroom (see also Madon et al., Reference Madon, Jussim, Guyll, Nofziger, Salib, Willard and Scherr2018; Martell, Lane, & Emrich, Reference Martell, Lane and Emrich1996). Of course, whether initial teacher concerns about classroom management eventually lead Black students to enact those behaviors that would get them expelled, when they would not have otherwise done so absent such expectations, is unclear. Nor are the effects of such expectations set within the context and force of the other strong influences on students' outcomes listed earlier.
5. What do experimental studies of bias tell us?
To say that studies in experimental social psychology cannot tell us about real-world group disparities is not to say that such studies are worthless. These studies provide a wealth of information about the function and process of storing and using categorical information. However, if researchers want to know about real-world group disparities, such findings cannot provide them with the information they seek.
The standard way of interpreting experimental stereotyping findings has already been described: Experimental evidence that participants are biased against identical targets from different groups reflects the power of stereotypes to affect individual decision-makers. The assumption that the same processes operate in the real world means that removing decision-maker bias will result in groups obtaining roughly similar (or at least substantially more similar) outcomes.
Yet is this interpretation the correct one? An alternative interpretation of the results of experimental studies of bias starts with the understanding that people learn the conditional probabilities of the behavior of different groups as they navigate their social worlds. In other words, groups differ in their characteristics and people pick up on this, storing diagnostic information about relative group differences even if imperfectly so (Eagly, Wood, & Diekman, Reference Eagly, Wood, Diekman, Eckes and Trautner2000; Eagly, Nater, Miller, Kaufmann, & Sczesny, Reference Eagly, Nater, Miller, Kaufmann and Sczesny2020; Jussim et al., Reference Jussim, Cain, Crawford, Harber and Cohen2009, Reference Jussim, Crawford, Anglin, Chambers, Stevens and Cohen2015a, Reference Jussim, Crawford and Rubinstein2015c; Koenig & Eagly, Reference Koenig and Eagly2014; McCauley, Stitt, & Segal, Reference McCauley, Stitt and Segal1980).
Then, they enter a social psychology experiment on bias. They are asked to render a judgment about a target without being given diagnostic or distinguishing individuating information. Under such conditions, they end up using the information that they have come to learn as being probabilistically accurate in their daily lives, and categorical influence dominates.
Thus, through a kind of methodological trickery, the experimenter has created a world in which information that is probabilistically predictive in everyday life becomes completely inaccurate given the systematic design of our experiments. This interpretation is consistent with a view of stereotyping that describes perceivers as forming conditional probabilities and that emphasizes how categorical effects are most likely under conditions of ambiguity and uncertainty, when no strong individuating information is present (Krueger & Rothbart, Reference Krueger and Rothbart1988; Kunda & Thagard, Reference Kunda and Thagard1996; Lick, Alter, & Freeman, Reference Lick, Alter and Freeman2018; McCauley et al., Reference McCauley, Stitt and Segal1980). Given the design of most experiments, it is not surprising that there are decades of laboratory studies showing stereotyping effects. To be clear, this provides no information about whether this type of categorical influence leads to disparate outcomes across groups. It does reveal that experimenters are skilled at creating worlds whose landscapes do not match the real world in any way, and that participants fail to behave perfectly according to the standards of the experimenter when placed in such worlds.
In light of this reframing, what does the standard interpretation of experimental studies reveal about researchers' assumptions of how minds should and do operate? Throughout this paper, I have noted that the standard experimental design presents targets “who vary only with respect to the social categories to which they belong.” What do researchers intend when they design stimuli in this way? In doing this, researchers intend to make targets equal on all dimensions relevant to the decision at hand. For example, in the FPST, the single relevant piece of information in the decision to “shoot” is whether the target is holding a gun or not. If participants are influenced by anything other than the object in the target's hand, then researchers conclude that participants are making erroneous decisions – that is, they are showing bias. This includes cases when participants are influenced by factors related to a person's race that are probabilistically related to threat or handgun use, for example, having been previously arrested for a violent crime. Similarly, in studies of STEM hiring, the single relevant piece of information is the qualification of the applicant as revealed by the resume; being influenced by anything other than this information is treated as biased, erroneous decision-making.
What this illustrates is the researcher's belief that participants are wrong to use any information other than the information deemed relevant by the researcher. This includes information that the participant has learned prior to entering the experiment, information that may be probabilistically accurate in everyday life. In the mind of the researcher, participants should not use information within the experiment that may actually lead to more accurate decisions outside the experiment – not because such information is reliably incorrect, but because the experimenter has artificially made it incorrect. The researcher demands that participants be accurate as defined by the decision landscape of the experiment, no matter how disconnected this landscape is from the real world. Researchers thus require a kind of blank slate worldism of their participants in judging accuracy and bias, where information from one world must be erased when moving to the next. Such a demand on the part of social psychologists in fact violates a core tenet of good prediction, which is the use of priors in updating posterior prediction. Bayes' rule would require participants in social psychology experiments to include the target's categorical information in their judgments (though of course the effect of categorical information should depend on the strength of the data, as it does).
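The Bayesian point can be illustrated with a minimal sketch (all base rates and likelihood ratios below are hypothetical): a judge following Bayes' rule combines a category base rate (the prior) with individuating evidence (the likelihood), and the influence of the prior shrinks as the evidence strengthens, which is exactly the pattern the stereotyping literature reports.

```python
# Minimal Bayesian updating sketch with invented numbers.

def posterior(prior, likelihood_ratio):
    """P(trait | evidence) via odds form of Bayes' rule:
    posterior odds = prior odds x likelihood ratio of the evidence."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Two groups with different assumed base rates of some trait.
p_group_a, p_group_b = 0.10, 0.30

# Likelihood ratios for the same observed behavior: ambiguous individuating
# information is weakly diagnostic; unambiguous information is strongly so.
weak_evidence, strong_evidence = 2.0, 50.0

# Gap between groups' posteriors = residual influence of category membership.
gap_weak = posterior(p_group_b, weak_evidence) - posterior(p_group_a, weak_evidence)
gap_strong = posterior(p_group_b, strong_evidence) - posterior(p_group_a, strong_evidence)
# gap_strong < gap_weak: strong individuating data swamps the categorical prior.
```

Under these assumed numbers, the between-group gap in posterior judgments is roughly 0.28 with weak evidence but shrinks to roughly 0.11 with strong evidence. A normative Bayesian judge therefore uses category information most when individuating information is weakest, which is precisely the condition the standard experiment creates.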
6. Broader consequences
Beyond the specific conclusions about group disparities, experimental social psychology has had a significant – and potentially misleading – impact on broader questions about the human mind and human nature. This research has led directly to the widespread attention currently given to the topic of implicit bias. Originally, dual process models in social psychology supported a satisficer view of the human mind, one in which people did “good enough” (and were thus subject to bias) unless motivation and ability were high (Fazio, Reference Fazio1990; Fiske & Neuberg, Reference Fiske and Neuberg1990; Petty & Wegener, Reference Petty, Wegener, Chaiken and Trope1999; Smith & DeCoster, Reference Smith and DeCoster2000). Importantly, such models were explicit that biasing effects were conditional (Bargh, Reference Bargh, Uleman and Bargh1989); they were not present at all times and for all people.
As experimental studies of categorical bias proliferated and as demonstrations of bias became more attractive than demonstrations of accuracy (e.g., Higgins & Bargh, Reference Higgins and Bargh1987; Jussim, Reference Jussim2012b; Jussim et al., Reference Jussim, Cain, Crawford, Harber and Cohen2009), the published literature left one with the impression of widespread, inescapable error in decision-making, and the important point that bias occurs only under specific experimental conditions took a backseat to the more attractive story of widespread bias in real-world decisions (Greenwald & Krieger, Reference Greenwald and Krieger2006). Moreover, as social psychology moved further away from actual behavior and increasingly focused on millisecond reaction times, whether such differences mattered for actual decisions became increasingly unclear.
At the same time, demographic groups in the United States continued to obtain unequal outcomes despite little overt, official discrimination for several decades (and in places such as academia, preferential policies in favor of underrepresented groups), coupled with increasingly egalitarian attitudes. These disparities presented a puzzle. If groups were not being overtly barred from entry and decision-makers widely expressed egalitarian beliefs, what was causing persistent disparities?
Enter the concept of implicit bias, supported by experimental social psychology studies on categorical bias (Greenwald & Banaji, Reference Greenwald and Banaji1995). As this research was taken up by people outside the research community, the understanding of the human mind morphed from “under certain conditions, bias may emerge” to “unconscious bias is ever-present and impossible to control,” with a lack of attention to those studies showing individual variation in automatically activated concepts (e.g., Fazio et al., Reference Fazio, Jackson, Dunton and Williams1995). By now, this view is ubiquitous and claims of uncontrollable, unavoidable, pervasive, unconscious bias can be found anywhere one cares to look.
Such a view of the human mind, however, is in no way justified by the experimental studies on which it is built. There is so little overlap between our experimental parameters and the parameters of real-world decisions that the popular view of the human mind as swamped with uncontrollable bias is premature. It is troubling that researchers have not devoted serious research attention to exploring this gap.
At the same time that social psychologists have been using their findings to explain group disparities, people outside academia have enthusiastically adopted these claims. This has been true throughout popular culture, government organizations, the legal system, and the corporate world. In the case of police shootings, the claim that implicit bias is responsible for racial disparities is widely broadcast in newspaper accounts of fatal police shootings, with studies from experimental social psychology cited as evidence (e.g., Carey & Goode, Reference Carey and Goode2016; Dreifus, Reference Dreifus2015; Kristof, Reference Kristof2014; Lopez, Reference Lopez2017). In the case of school disciplinary disparities, President Obama's 2014 “Dear Colleague” letter on the “Nondiscriminatory Administration of School Discipline” was explicit in rejecting the idea that actual behavioral differences across racial groups contribute meaningfully to the corresponding disparities in school suspensions. It also named implicit bias training as a possible solution for ensuring that school personnel administer discipline in a non-discriminatory manner. It is difficult to overstate how widespread this belief has become in the last decade, driven primarily if not wholly by research from experimental social psychologists. Indeed, some researchers have actively pushed this agenda, appearing on televised news programs, holding press conferences, writing advocacy pieces, and testifying in court (as described in, e.g., Mitchell, Reference Mitchell, Crawford and Jussim2018).
7. Related critiques
Although I focus on social psychology experiments in this paper, related critiques have been made in other literatures. A brief review of these critiques, some of which are general methodological critiques and some of which are specific to group disparities, provides additional support to the current argument.
On the question of group disparities specifically, Heckman's (Reference Heckman1998) analysis of racial and gender disparities in employment supports the present argument. In typical “audit studies” (e.g., Bertrand & Mullainathan, Reference Bertrand and Mullainathan2004), a set of prospective employers are sent resumes that are identical except for the race of the applicant; research typically finds that Black applicants receive fewer callbacks for interviews than White applicants. Such findings are then used as evidence that actual racial disparities in employment are because of discrimination on the part of employers. Thus, the general format of experimental labor market studies is the same as the social psychology research described in the current paper: If we can show average levels of race-based differential treatment between hypothetical people who are otherwise presented as equal, then this same differential treatment is responsible for actual group disparities.
Heckman argued that average levels of market-wide discrimination cannot necessarily be applied to real people engaged in real transactions, because such transactions do not occur at the market-wide level. Employment transactions are between specific people and specific firms, and if the people and firms in experimental studies do not match the characteristics of real people and firms in the market, then experimental results are irrelevant for explaining real group disparities. Suppose an experimental audit study finds that employers at Goldman Sachs engage in discrimination against Black applicants. If it is the case that Black applicants do not apply to Goldman Sachs, or that actual Black applicants do not have the resumes that would make them competitive at Goldman Sachs, then whether employers at Goldman Sachs discriminate against artificial Black applicants tells us nothing about why Blacks may be under-employed there or anywhere else in the financial market.
There is the same problem in labor market studies as in studies in experimental social psychology: A lack of attention to the degree of overlap between the characteristics of real group members and the characteristics of our hypothetical experimental targets. And this failure, as in social psychology, distorts our understanding of the nature of group disparities. As Heckman summarized, “A careful reading of the entire body of available evidence confirms that most of the disparity in earnings between blacks and whites in the labor market of the 1990s is due to the differences in skills they bring to the market, and not to discrimination within the labor market” (p. 101; see also Neal & Johnson, Reference Neal and Johnson1996).
In terms of broad methodological critiques, similar concerns have been raised in the field of judgment and decision-making (JDM). Hogarth (Reference Hogarth1981), for example, highlighted the discrepancy between the discrete judgments used in experimental JDM research and the continuous, interactive judgments frequently found in the real world. He used this discrepancy to show how researchers' failure to incorporate the role of feedback in experimental decision tasks could lead to distorted conclusions. Specifically, he demonstrated that decisions characterized as “biased” in discrete judgments could be understood as functional when decisions were continuous. Similarly, a major thrust of Gigerenzer and colleagues' research program has been to show that the structure of the decision environment is a crucial consideration for a full understanding of accurate and inaccurate decisions. Failure to appreciate the relation between the organism and its environment can lead to misleading conclusions about the nature of human rationality and decision-making (Dhami, Hertwig, & Hoffrage, Reference Dhami, Hertwig and Hoffrage2004; Gigerenzer, Hoffrage, & Kleinbölting, Reference Gigerenzer, Hoffrage and Kleinbölting1991; Pleskac & Hertwig, Reference Pleskac and Hertwig2014). Tetlock (Reference Tetlock1985) also analyzed the nature of JDM research and noted how laboratory studies lacked accountability for decision-makers, a key component inherent to most real-world decisions and one which can change the nature of decisions. Thus, there is precedent for being concerned about social psychologists' lack of interest in the degree to which their experimental tasks reflect the decision landscape in which actual decisions are made or whether the characteristics of real decision-makers match those in our experimental settings.
Relatedly, Eagly and colleagues' research on gender differences in leadership style provides supportive evidence for the arguments advanced here (Eagly & Johannesen-Schmidt, Reference Eagly and Johannesen-Schmidt2001; Eagly & Johnson, Reference Eagly and Johnson1990). These researchers found that some gender differences in leadership style were larger in laboratory studies compared to studies conducted in actual organizational settings. This methodological difference can be understood in the terms described here: the failure to include real-world information in laboratory studies. Specifically, actual roles in organizational settings contain role requirements, which can exert powerful effects on behavior regardless of the person occupying the role. In laboratory studies, in contrast, this influence is absent, hence the greater potential for gender to exert an influence on leadership behavior in this context.
To be fair, within social psychology there are some lines of research on stereotyping and disparate outcomes that do consider group behavioral differences as an important part of the causal chain producing group disparities. For example, Diekman et al. (Reference Diekman, Steinberg, Brown, Belanger and Clark2017) have proposed a goal congruity model to help understand sex differences in STEM participation. In this model, the communal goals that people have, in combination with their beliefs about how different STEM and non-STEM careers can fulfill those needs, impact STEM engagement and ultimately career choice. Importantly, this model accounts for at least some of the sex disparities in STEM participation by taking seriously the sizable male–female difference in communal goals.
Finally, the current paper is most closely related to broad concerns in the experimental literature on external validity. Part of the current analysis raises multiple concerns regarding the external validity of experimental social psychology, and this is certainly not new. However, this paper goes beyond past treatments in several ways. First, it outlines which features of the typical experimental investigations are threats to external validity and analyzes how the fallacies and assumptions underlying researchers' approaches to the question of group disparities directly lead to choices that undermine external validity. Second, the current paper is not a broad indictment of the external validity of typical experimental social psychology. The standard experimental social psychology study can tell us much about how categorical information is formed and used, and I raise no issue with the external validity of those studies. Instead, the concern here is specifically with the use of these findings to explain real-world disparate outcomes. Finally, the current paper goes beyond typical external validity concerns because, even if the external validity of current studies were improved, the problems inherent to this approach are so fundamental that the findings still could not be applied to explain group disparities. For example, if distributional differences between men and women on STEM-related attributes are not taken into account when explaining group disparities in STEM participation, then irrespective of any changes to the experimental process researchers will still misunderstand the nature of this disparity. One way of thinking about the relationship between the current analysis and past critiques of external validity is that this paper uses those past critiques as a vehicle for a broader, more systematic dismantling of current experimental studies on bias.
On external validity, relevant data supporting the current argument come from Mitchell (Reference Mitchell2012), who compared effect sizes of laboratory studies to field studies. Although the relationship between the two was strong and positive, this varied by subfield in important ways. Social psychology not only had a lower correspondence between lab and field studies than some other subareas, but social psychology was also the subfield in which the sign of the effect reversed most often. Although the purpose of Mitchell's analysis was not to identify all the features that impact lab-field correlations, the relatively poor performance of social psychology can be understood within the current framework: to the extent that lab studies suffer from the three flaws outlined here, the correspondence of these experimental effects once behavior returns to the field will be low. Of course, not all the social psychology studies in Mitchell's analysis were of decision-maker bias, but other analyses have found similar, supportive effects (e.g., Eagly & Johnson, Reference Eagly and Johnson1990; Koch et al., Reference Koch, D'Mello and Sackett2015).
8. A new (or at least rehashed) approach
If the current approach to understanding group disparities is not just misguided but fundamentally flawed, what might be an alternative, more productive research cycle? Although it would be nice to claim a completely new approach to studying these important topics, what follows is largely a rehashing and reemphasizing of other, better recommendations that have already been made, for example, by Dasgupta and Stout (Reference Dasgupta and Stout2012) and Mortensen and Cialdini (Reference Mortensen and Cialdini2010), with some further elaboration and connection to other critiques from the past several decades. The major difference is that I begin by explicitly noting that in many (perhaps most) cases of studying group disparities, we may end up concluding that experimental social psychology cannot contribute, or at least will take a distant backseat to other approaches.
Studies of group disparities on any outcome should begin first and foremost with a task analysis of the decision itself as it exists outside the laboratory. This would involve detailed discussions with those individuals responsible for making such decisions, ideally including novice and expert decision-makers. Researchers might also meaningfully enhance the quality of their models by completing training protocols themselves, to learn how the decision is supposed to unfold (at least as formally instructed). In the case of police shootings, beginning at this step would likely have led to a drastically different methodology used by experimental social psychologists, one which incorporated actual features of deadly force decisions.
The second step in the process involves the study of members of groups who are obtaining disparate outcomes on the topic of interest (both more and less desirable outcomes), including behavioral, personality, or other individual differences relevant to the topic at hand. This can often be useful in confirming that the factors identified by decision-makers in step 1 are, in fact, relevant. This step is also important for placing any categorical bias effects in the context of the size of these performance-related differences. Beyond giving us a more accurate understanding of the nature of group disparities, this can also provide information about the strength of different interventions to reduce such disparities. The expectation about what the world will look like after eliminating all decision-maker bias is very different depending on whether there are no differences or large differences across groups.
In the case of shooter bias, an initial task analysis would have revealed that the context and behavior of the target citizen is critical and that the context of violent crime is a central part of the officer's decision to shoot. The second step would have led to the recognition that there are very sizable differences across groups in violent crime rates and to an appreciation that any biasing effects of race on an officer's decision must be placed in the context of these behavioral differences. The same is true, for example, of intellectual performance differences across groups, where sometimes average differences do not exist but differences are large at the extreme tails, and other times average group differences do exist and are sizable (Ceci & Williams, Reference Ceci and Williams2010; Fryer & Levitt, Reference Fryer and Levitt2010; Halpern et al., Reference Halpern, Benbow, Geary, Gur, Hyde and Gernsbacher2007; Hsin & Xie, Reference Hsin and Xie2014; Lubinski & Benbow, Reference Lubinski and Benbow1992). Demographic disparities in, for example, college grade point averages (GPAs), majors, and graduation rates must be understood in the context of these sizable incoming differences across racial and ethnic groups (e.g., ACT, 2017), and interventions that do not address these differences at the core are unlikely to stem the cascading and continuing differences over time.
Only after the first two non-experimental steps comes the third step of designing experiments informed by the data already obtained. This will almost always necessitate more involved and difficult studies with non-student samples; what follows would likely be a steep decline in both the number of studies conducted and the proportion of studies involving undergraduate convenience samples.
The final step in relating back to the real-world disparities of interest involves integrating the size of categorical effects from experimental tasks with the sizes of other effects on a group's outcomes, for example, behavioral and personality differences across groups. This is something that will be specific to the domain under study as it is unlikely that many of the same factors impact outcomes to the same extent across domains (but see Gottfredson, Reference Gottfredson1997, Reference Gottfredson1998, Reference Gottfredson2004).
This call for a new approach to research complements other, previous concerns about the approach of standard psychological science. Already noted are the proposals by Dasgupta and Stout (Reference Dasgupta and Stout2012) and Mortensen and Cialdini (Reference Mortensen and Cialdini2010). Other recent examples include Rozin's (Reference Rozin2009) assessment of how changes to the reward structure in psychology would improve the science. As he stated (emphasis added):
In such cases, as with the nth study (where n > 10) on a particular phenomenon or claim, it is appropriate to determine whether proper controls have been conducted, whether alternative accounts have been dealt with, and whether there are any errors in thinking or experimentation. But first, we have to find out what it is that we will be studying, what its properties are, and its generality outside of the laboratory and across cultures.
Aligned with Rozin's critique, the current study pushes back against a movement that gained momentum with the emergence of social cognition in the late 1970s and perspectives such as Mook's "In Defense of External Invalidity" (Mook, Reference Mook1983). These forces promoted the importance of systematic design and justified the measurement of small differences in highly impoverished experimental settings, without consideration of whether the decisions made in these studies related in clear ways to the actual decisions that, ultimately, we care so much about (see also Ring, Reference Ring1967). Another way of framing the problem is to suggest that social psychology has been more focused on publishing demonstrations of bias than on fully understanding the nature of group disparities through the pursuit of a "strong inference" model (Platt, Reference Platt1964).
9. Conclusion
What can experimental social psychology tell us about why different segments of society are not evenly represented across all outcomes? Experimental studies of categorical bias can and do tell us about the functions and processes of storing group-based information. However, the disconnect between the experimental parameters of these studies and the conditions surrounding real-world decisions makes our experiments irrelevant when it comes to understanding the complex dynamics of group disparities. Of course, there is individual-level bias and discrimination; tribalism and intergroup bias are features of all human minds. But if the goal is to study systematic categorical bias and its effects on group outcomes, a different approach is needed. I describe one possible new approach for experimental social psychology, one which begins not with the assumptions of academic researchers holding the goal of demonstrating bias but instead with an analysis of the actual decision itself. Such an approach would not only change the relevance of social psychology for understanding group disparities, but may also correct some of the misleading claims about the human mind that have extended out from academia in the last two decades.
Acknowledgments
I thank Michael Bailey, E. Tory Higgins, Lee Jussim, Calvin Lai, Richard Lucas, and three anonymous colleagues for productive discussions and feedback on earlier drafts of this study. Alice Eagly and two anonymous reviewers provided outstanding and critical comments that greatly increased the quality of this manuscript. This study also benefitted from discussions with friends and colleagues at the Duck Conference on Social Cognition (2017).
Financial support
This study is based on work supported by the National Science Foundation under Grants No. 1230281 and 1756092.
Conflict of interest
None.