Body
Yarkoni's critique focuses on overgeneralization from narrow sets of experimental stimuli, situations, or manipulations to very broad theoretical constructs. The issue of generalizability is not limited to academic and theoretical work but is also critically relevant for applied work that informs decisions made by organizations, institutions, and governments. Historically, applied psychology suffered from undergeneralization – a mistaken belief in “situational specificity,” the idea that effects were unique to specific contexts and never (or rarely) generalizable (Schmidt & Hunter, 1977). With the advent of meta-analysis, applied psychology learned that much of this apparent variability was due to statistical artifacts. However, the field may have overcorrected and now tends to overgeneralize. Frequently, models, measures, or interventions are “validated” in narrow settings, then applied in other contexts without careful consideration of generalizability. Where generalizability is tested, it is often tested in limited ways. In this commentary, we use two key generalizability challenges as illustrations.
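The meta-analytic insight mentioned above, that apparent cross-study variability can be largely a statistical artifact, can be illustrated with a small simulation in the spirit of Hunter and Schmidt's "bare-bones" procedure. All numbers here are hypothetical, chosen only to show the logic: many studies share a single true validity, yet their observed correlations scatter because of sampling error, and subtracting the expected sampling-error variance recovers a residual "true" variance near zero.

```python
import random
import statistics

random.seed(1)

# Hypothetical scenario: k studies all share ONE true validity (rho = 0.30).
# Observed correlations vary only because each study has a finite sample.
rho, k, n = 0.30, 50, 60
observed = []
for _ in range(k):
    # Approximate sampling distribution of r: normal, with standard error
    # (1 - rho^2) / sqrt(n - 1), as in bare-bones meta-analysis.
    se = (1 - rho**2) / (n - 1) ** 0.5
    observed.append(rho + random.gauss(0, se))

r_bar = statistics.mean(observed)
var_r = statistics.pvariance(observed)      # apparent variability across studies
var_e = (1 - r_bar**2) ** 2 / (n - 1)       # variance expected from sampling error alone
var_true = max(var_r - var_e, 0.0)          # residual variance after artifact correction

print(f"mean r = {r_bar:.3f}")
print(f"observed variance = {var_r:.4f}")
print(f"expected sampling-error variance = {var_e:.4f}")
print(f"residual true variance = {var_true:.4f}")
```

In this sketch, nearly all of the observed variance is accounted for by sampling error, which is the pattern that led applied psychology away from strict situational specificity.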
First, consider cross-cultural generalizability. Much applied psychology research occurs in the United States and Western Europe, but these models are commonly used to motivate research and inform operational practices (e.g., selection systems, assessments, interventions) in organizations around the world (Gelfand, Leslie, & Fehr, 2008). Researchers may allude to potential cross-cultural differences, but the core of Western models is generally assumed to apply across cultural contexts. Where generalizability across cultures is attended to, it is often addressed haphazardly. A study might compare results in samples drawn from only two countries, rather than from a wider range of countries (Ones et al., 2012). Often, these countries are described as varying on a single dimension of cultural characteristics (e.g., collectivistic vs. individualistic). Despite the narrow sampling of cultures in these studies, broad conclusions are often drawn about “collectivism” and assumed to apply to any culture that could be classified into these categories. Finally, samples in studies are frequently drawn from narrow subgroups within a culture (e.g., university students) without consideration of how these groups may differ from other cultural groups within a country. Several common overgeneralizations are apparent:
1. Individual countries are assumed to be exchangeable with others similarly classified.
2. The focal cultural characteristics (e.g., collectivism) are assumed to be the operative cause of cultural effects rather than other unmodeled factors (e.g., power distance, religiosity, history, economic environment).
3. The specific populations or subcultures sampled in a country are assumed to be representative of a country's cultural diversity.
These failures to consider the included countries, characteristics, and subgroups as samples from broader populations of these entities limit what conclusions are justifiable. If a researcher or organization observes that an intervention is effective in two contexts, they may conclude it will be effective more broadly. Conversely, if a study observes differences in relationships between two settings, researchers may overestimate how much the relationship varies across cultures more broadly.
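The statistical problem with two-country designs can be sketched in a toy simulation (all numbers are hypothetical). If each country has its own true effect drawn from some distribution, then a variability estimate based on only two countries is wildly unstable: depending on which pair is sampled, it can suggest anything from near-perfect consistency to variability far exceeding the true cross-country spread.

```python
import random
import statistics

random.seed(0)

# Hypothetical world: 50 countries, each with its own true effect size,
# drawn from a distribution with mean 0.20 and SD 0.10 (assumed values).
countries = [random.gauss(0.20, 0.10) for _ in range(50)]
true_sd = statistics.pstdev(countries)

# A "two-country study" estimates cross-country variability from n = 2 countries.
def two_country_sd() -> float:
    a, b = random.sample(countries, 2)
    return statistics.stdev([a, b])

estimates = [two_country_sd() for _ in range(10_000)]

print(f"true cross-country SD ≈ {true_sd:.3f}")
print(f"two-country estimates range from {min(estimates):.3f} to {max(estimates):.3f}")
```

Any single pair of countries can thus dramatically understate or overstate cross-cultural variability, which is why conclusions about "cultures in general" require sampling many countries, not two.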
To better justify cross-cultural generalizations, researchers and practitioners must consider how representative their samples of individuals, characteristics, and countries are of the broader populations of interest. One possible approach is to conduct studies that sample broadly from diverse cultures around the world (Ones et al., 2012). By robustly sampling from many cultures, researchers can more accurately gauge whether relationships are consistent or variable across contexts. An alternative approach is to extend conclusions about generalizability cautiously. Conclusions from narrow studies should be limited to the groups, countries, and contexts represented. In reports, investigators should conclude “X predicts Y in a sample of students in the United States” rather than making generalizations about broad cultural factors such as “collectivism.” Over time, as single-context samples accumulate, systematic reviews of this evidence can identify the patterns that generalize (Oh, 2009; van Aarde, Meiring, & Wiernik, 2017). Such an approach encourages appropriate caution and also encourages de-centering of Western perspectives, allowing researchers from diverse cultures to pose questions that are relevant to their own cultural contexts (Cheung, van de Vijver, & Leong, 2011; Gelfand et al., 2008).
Second, consider validation of interventions. Organizational interventions are frequently trialed in narrow populations (e.g., university students or employees from a few organizations), then deployed operationally without careful evaluation of their broader effectiveness. A recent popular example is implicit bias-based interventions to address issues of bias, racism, and inequity in organizations. Such interventions have become widespread, but evidence for their effectiveness in improving prejudice and inequity outcomes is sparse (FitzGerald, Martin, Berner, & Hurst, 2019; Onyeador, Hudson, & Lewis, 2021). In a systematic review of implicit bias interventions, Forscher et al. (2019) found that the large majority of studies were conducted with US university students, focused only on changes in implicit attitudes rather than broader outcomes, and reported small effects. Importantly, Forscher et al. observed substantial heterogeneity across studies, underscoring that broad generalizability of implicit bias effects should not be expected. In light of this review, the uptake of implicit bias interventions for operational use has outpaced the evidence supporting them. We identify three key areas of overgeneralization:
1. Generalization from studied populations (primarily university students) to operationally relevant populations (employees in specific industries and regions).
2. Generalization from studied outcomes (primarily short-term changes in implicit bias scores) to operationally relevant outcomes (prejudice and equity outcomes).
3. Generalization from small observed effects to assumed larger societal relevance. Despite observing small effects, studies frequently allude to potential large societal impacts (e.g., through accumulation across people or over time; cf. Oswald, Mitchell, Blanton, Jaccard, & Tetlock, 2015).
These overgeneralizations have consequences. Not only may these interventions be ineffective, but they appear to crowd out other actions that may better address systemic inequities (Pritlove, Juando-Prats, Ala-leppilampi, & Parsons, 2019). Organizational equity, diversity, and inclusion efforts should adapt to emphasize practices with stronger, more generalizable evidence bases, such as intergroup contact and systems that bypass individual prejudice (Onyeador et al., 2021).
We highlight these two examples as part of a broader pattern in applied psychology of overgeneralizing from narrow samples, contexts, and measures to broader constructs and populations. To ensure the effectiveness of organizational practices, we urge applied researchers and practitioners to make generalizations more cautiously. In particular, we urge organizations to await evidence on operationally relevant groups and measures (e.g., actual diversity and equity outcomes) before moving models and interventions into practice.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
None.