Cesario's misrepresentation of the research designs of past studies, and overextension of his critique, risk irony given the topic of the article.
Missing contingencies flaw: addressed. Cesario criticizes experimental studies on bias by claiming that they use “novice or experimental participants” (e.g., undergrads) who are untrained decision-makers. Yet the experiments cited (Jarvis & Okonofua, 2020; Okonofua & Eberhardt, 2015) and other similar experiments (Okonofua, Paunesku, & Walton, 2016) have exclusively sampled hundreds of practicing K-12 teachers and principals. For example, Okonofua and Eberhardt (2015, Study 2) “recruited 204 K-12 teachers” (p. 4). In Table 3, Okonofua, Perez, and Darling-Hammond (2020) show how their sample of 243 teachers is overwhelmingly similar in demographic representation to the national K-12 teacher workforce.
Cesario also describes the information provided to study participants about student misbehavior as “impoverished descriptions of real teacher–child experiences” (sect. 4.3, para. 3). This claim also lacks factual merit. In the publications, the researchers describe specifically why the stimuli are representative descriptions of real teacher–child experiences. First, the stimuli presented describe the most common student misbehavior real teachers face (Losen & Martinez, 2013). Okonofua and Eberhardt (2015) write, “Minor infractions (e.g., for insubordination or class disruption) are the most frequently reported reasons for referring students to the principal's office” (p. 2). Second, the descriptions of the student misbehavior used in the study are taken directly from actual office-referral forms – using the precise words of a real K-12 teacher who referred a real student to a real principal's office for actual discipline. Okonofua and Eberhardt (2015) state: “[Teachers] then viewed a school record – adapted from actual office-referral records we collected from a public middle school in California” (p. 2). Far from being impoverished, the information participants read is the same information that is presented in the real world. Third, the cited research (Jarvis & Okonofua, 2020) asks in-service principals – real-world decision-makers – to make discipline decisions based on this information.
Missing information flaw: addressed. Cesario claims that we removed “important information that real decision-makers could use such as a child's history of behavior in the classroom” (sect. 4.3, para. 3). Ironically, the purpose of the experiments was specifically to examine how the history of a child's misbehavior influences educators' perceptions of the child and disciplinary decisions. As predicted, providing this history only increased bias; it in no way diminished it. This theoretical emphasis is present not only in the first two words of the publication's title “Two Strikes,” but was deliberately embedded in the repeated-measures design that randomly counterbalanced the order of incidents in the cited experiments (Jarvis & Okonofua, 2020; Okonofua & Eberhardt, 2015).
Missing forces flaw: discussed at length. Cesario claims that the researchers expect “children who differ in myriad important ways should behave identically” (sect. 4.3, para. 5). This is also false. We do not claim that children will always behave identically; to the contrary, our theory is designed to understand specifically how real differences in student behavior and disciplinary outcomes arise (Okonofua et al., 2016). In this article, we go to lengths (+1,527 words) to describe what might lead children from different backgrounds to come to behave in different ways. Nevertheless, group differences in student misbehavior cannot fully account for racial disparities in discipline. And one need not take a psychologist's word for it. Education researchers' review of the latest education research concludes that
Although low-income and minority students experience suspensions and expulsions at higher rates than their peers, these differences cannot be solely attributed to socioeconomic status or increased misbehavior. Instead, school and classroom occurrences that result from the policies, practices, and perspectives of teachers and principals appear to play an important role in explaining the disparities (Welsh & Little, 2018).
Using a tightly controlled experimental paradigm, in Okonofua and Eberhardt (2015), we show that even in cases where Black and White children do, indeed, behave the same, teachers do not treat them the same. In fact, differences in teacher treatment may be one factor (of many) that could lead to differences in student behavior down the road. Thus, our theory points to the self-fulfilling consequences of tying individual Black children to group-based stereotypes.
In the end, Cesario's article is as myopic and overextended as it accuses the field of bias research to be. This is manifestly apparent in its neglect of intervention field experiments – randomized placebo-controlled studies that draw directly on the insights and theory developed through laboratory experimentation and then use these to reduce real-world discipline problems. The criticized Okonofua et al. (2016) publication spends more than 2,000 words reviewing such studies. It is by understanding how bias and apprehensions about bias can undermine teacher–student relationships – through laboratory experiments and basic theory – that these studies find ways to improve trajectories and outcomes. For example, Okonofua et al. (2016) show that the same experimental paradigm can be used to determine if an “empathic-mindset,” a treatment to prioritize valuing students' perspectives when they misbehave, can reduce the likelihood a teacher will label a hypothetical Black student who misbehaves as a troublemaker (also see Okonofua et al., 2020). They then use this “empathic-mindset” approach in a field experiment with teachers who serve 1,682 actual students, which cut actual suspension rates over the academic year by 4.8 percentage points (also see Borman, Rozek, Pyne, & Hanselman, 2019; Goyer et al., 2019; Yeager et al., 2014).
The author's argument rests on a series of basic factual errors in describing controlled lab experiments on school discipline. The article does not acknowledge the contribution of controlled lab experiments to field experiments that have, in fact, dramatically reduced discipline in the real world. Instead of advancing theory or methodology, this article concludes by making a moral claim – that it is acceptable to judge individuals based on assumptions about social groups – that is contrary to public consensus and law, as though it were a scientific claim suitable for a science journal.
Acknowledgement
The author thanks Shoshana Jarvis, Gregory Walton, and Jennifer Eberhardt.
Financial support
This research received no specific grant from a funding agency, commercial, or not-for-profit sectors.
Conflict of interest
None.