
Empirical Benchmarks for Interpreting Effect Size Variability in Meta-Analysis

Published online by Cambridge University Press: 30 August 2017

Brenton M. Wiernik*
Affiliation:
Department of Developmental, Personality and Social Psychology, Ghent University
Jack W. Kostal
Affiliation:
Department of Psychology, University of Minnesota
Michael P. Wilmot
Affiliation:
Department of Psychology, University of Minnesota
Stephan Dilchert
Affiliation:
Narendra Paul Loomba Department of Management, Baruch College, CUNY
Deniz S. Ones
Affiliation:
Department of Psychology, University of Minnesota
*
Correspondence concerning this article should be addressed to Brenton M. Wiernik, Department of Developmental, Personality and Social Psychology, Ghent University, Henri Dunantlaan 2, 9000 Gent, Belgium. Email: wiernik@workpsy.ch

Copyright © Society for Industrial and Organizational Psychology 2017

Generalization in meta-analyses is not a dichotomous decision (as it is typically framed in papers using the Q test for homogeneity, the 75% rule, or null hypothesis tests). Inattention to effect size variability in meta-analyses may stem from a lack of guidelines for interpreting credibility intervals. In this commentary, we describe two methods for making practical interpretations and determining whether a particular SDρ represents a meaningful level of variability.

Normative Interpretation

Typically, meta-analysts evaluate the magnitude of heterogeneity in effect sizes by focusing on the face value of the true variability of the effect size distribution (SDρ) or the width of its associated credibility interval (cf. “the .30-unit width of the interval”; Tett, Hundley, & Christiansen, 2017, p. 8). This approach needs refinement, however, because correct interpretation of a credibility interval depends not only on its width but also on the range of correlations it spans. Correlations in applied psychology are not uniformly distributed, and a number of reviews have summarized distributions of effect sizes (e.g., Bosco, Aguinis, Singh, Field, & Pierce, 2015; Gignac & Szodorai, 2016; Hemphill, 2003). Among them, the most comprehensive review is that of Paterson, Harms, Steel, and Credé (2016), which integrates 30 years of meta-analyses of micro-level variables in management to develop empirical benchmarks for observed and corrected correlations. An advantage of this approach is that, because the distributions are constructed from meta-analyses, the upward biasing effects of sampling error on effect size variability are controlled.

Paterson et al. (2016, p. 76) report a general distribution of corrected correlations with quartiles of .15, .25, and .39. According to these benchmarks, the distribution of correlations in applied psychology is positively skewed, with most correlations falling between .15 and .40. A consequence of this nonuniform distribution is that true variability is more substantively meaningful for some correlation ranges than for others. For example, a .20-unit-wide interval from .08 to .28 spans values ranging from comparatively negligible (<25th percentile of the empirical distribution) to moderate (between the 50th and 75th percentiles), whereas the endpoints of a .20-unit-wide interval from .55 to .75 would both be considered “very large” (>90th percentile). Variability is more meaningful, both for theoretical interpretations and practical applications (e.g., incremental validity), when it occurs near the center of the empirical distribution of effects.

An alternative to face-value interpretations of SDρ is comparing ρ and credibility interval endpoints to a relevant empirical distribution of effect sizes. Appropriate questions from this perspective include the following: What percentiles of the empirical distribution do the endpoints of the interval correspond to? Do endpoints reflect substantively different parts of the distribution, relative to the research question? What is the percentile distance between top and bottom bounds of the interval? If the credibility interval spans a wide range of the empirical distribution, then there may be value in searching for moderators. By comparison, when an interval covers a narrow slice of the empirical distribution, any potential moderators are unlikely to have a strong impact on theoretical interpretations or practical decisions.
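
To make this mapping concrete, the following is a minimal Python sketch of the normative approach, under stated assumptions: the quartiles (.15, .25, .39) are Paterson et al.'s (2016) values as cited above; the 0th and 100th percentile anchors and the piecewise-linear interpolation between quartiles are illustrative simplifications, so the resulting percentile figures will only approximate those in Tables 1 and 2 (which use the full empirical distribution); and the 80% interval (ρ ± 1.28 SDρ) is the conventional Hunter–Schmidt credibility interval.

```python
import numpy as np
from scipy.stats import norm

# Quartiles of the overall distribution of corrected correlations
# (Paterson et al., 2016). The 0th/100th percentile anchors are
# illustrative assumptions added to close the interpolation grid.
EMP_RHO = np.array([0.00, 0.15, 0.25, 0.39, 1.00])
EMP_PCTL = np.array([0.0, 25.0, 50.0, 75.0, 100.0])

def credibility_interval(rho, sd_rho, coverage=0.80):
    """Symmetric credibility interval rho +/- z * SD_rho;
    coverage=0.80 gives the conventional 80% interval (z ~ 1.28)."""
    z = norm.ppf(0.5 + coverage / 2)
    return rho - z * sd_rho, rho + z * sd_rho

def percentile_of(rho):
    """Rough percentile of a correlation in the empirical distribution,
    via linear interpolation between the reported quartiles."""
    return float(np.interp(rho, EMP_RHO, EMP_PCTL))

# Example: a median correlation with moderate true variability.
lo, hi = credibility_interval(rho=0.25, sd_rho=0.07)
print(f"80% credibility interval: [{lo:.2f}, {hi:.2f}]")
print(f"Endpoint percentiles: {percentile_of(lo):.0f} and {percentile_of(hi):.0f} "
      f"({percentile_of(hi) - percentile_of(lo):.0f}-point spread)")
```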

Table 1 illustrates hypothetical combinations of ρ and SDρ values, along with corresponding credibility intervals, percentile values from Paterson et al.'s (2016) overall empirical distribution for ρ, and the percentile distance between credibility interval endpoints. Several patterns are clear from these values. First, for ρ < .40, SDρ values between .07 and .11 translate to credibility intervals that cover major portions of the distribution of correlations found in industrial and organizational (I-O) psychology research. For larger correlations, however, even sizable SDρ values represent increasingly small slices of the distribution (e.g., the endpoints of a credibility interval for ρ = .55, SDρ = .09 are only 21 percentile points apart). For ρ ≥ .60, both bounds of all credibility intervals examined fall in the upper 25% of applied effects. For effects this large, even ostensibly large variability is unlikely to alter substantive conclusions about the strength of relations between constructs.
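
Rows in the style of Table 1 can be tabulated by looping the helpers from the sketch above over a grid of ρ and SDρ values; the grid here is illustrative, and the crude interpolation means the spreads will not exactly match the table's figures.

```python
# Continues the sketch above (reuses credibility_interval and percentile_of).
for rho in (0.10, 0.25, 0.40, 0.55):
    for sd in (0.03, 0.07, 0.11):
        lo, hi = credibility_interval(rho, sd)
        spread = percentile_of(hi) - percentile_of(lo)
        print(f"rho={rho:.2f}, SD_rho={sd:.2f}: "
              f"CI=[{lo:.2f}, {hi:.2f}], spread={spread:.0f} percentile points")
```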

Table 1. I-O Research Correlation Percentiles for Credibility Intervals

Note. Percentiles based on Paterson et al.'s (2016) overall distribution of corrected correlations; bars in percentile columns reflect the quarter of the correlation distribution in which values fall (white bars with border indicate that one bound of the interval crosses zero); percentile (%ile) distance range plots are scaled to a minimum of 0 and a maximum of 100; Cred. Int. = credibility interval; CV = credibility value.

Conversely, comparatively small variability can be substantively meaningful for some values of ρ. An SDρ = .03 is often regarded as negligible, and for ρ ≤ .15 or ρ ≥ .35 this variability indeed reflects a minor portion of the empirical distribution. For correlations between these values, however, this small SDρ corresponds to credibility intervals that span a wide section of the effect size distribution. Indeed, for a median correlation (ρ = .25), the credibility interval for SDρ = .03 spans 30% of empirical effects.

In sum, there is more substantive value in explaining variability and identifying potential moderators when the credibility interval covers a wide range of the empirical effect sizes. For the SDρ values examined here, the largest meaningful variability occurs for |ρ| = .10 to .45. By contrast, variability for larger correlations will not affect central conclusions.

Table 2 presents the percentile ranges of credibility intervals for a sample of published meta-analyses reporting relations with Conscientiousness (based on Wilmot, 2017). Consistent with the hypothetical credibility intervals in Table 1, the empirical credibility intervals cover the widest portion of the empirical effect size distribution (i.e., have the largest percentile difference) for correlations ranging from .10 to .30. Credibility intervals for larger correlations tend to cover less of the empirical distribution, even for relatively large SDρ values. For larger correlations, there is less value in exploring for moderators, as theoretical conclusions about the impact of Conscientiousness are unlikely to change across different values (e.g., whether Conscientiousness correlates ρ = .45 or .60 with Quality of Life, researchers would still conclude that this relationship is very large). An extreme example is the relation of Conscientiousness to Grit. The correlation has a relatively large SDρ (.07), but the mean effect (ρ = .84) is so large that the entire credibility interval is above the 99th percentile of applied effects (see Footnote 1). Thus, moderator searches will be more meaningful for criteria with ρ located in the center of the distribution.

Table 2. Percentile Interpretations of Selected Published Credibility Intervals for Conscientiousness

Note. Table columns and color-coding correspond to those in Table 1; †reverse-coded; OCB = organizational citizenship behavior; CWB = counterproductive work behavior; all correlations corrected for unreliability in both variables; tabulated data based on Wilmot (2017).

Objective Interpretation

When dependent variables are expressed in meaningful, objective units for individuals or organizations (e.g., salary, turnover rates, production costs), credibility intervals can also be interpreted in terms of their practical implications. For these criteria, variability can be translated into the real-world consequences of correlations at each end of the interval. From this perspective, important questions include the following: Would applied decisions (e.g., to use a test in employee selection) change for the highest versus lowest interval values? Would conclusions about the importance of a variable for a real-world outcome change across the range of these correlations?

As an example, using the correlation with salary in Table 2 and the U.S. Census Bureau–reported SD of annual income for adults working ≥ 40 hours per week ($33,647), the impact of a 1 SD increase in Conscientiousness on individual salary ranges from $7,066 at the upper bound to −$2,692 at the lower bound. This nearly $10,000 swing (and reversal of the effect's direction) clearly has practical implications for individuals and warrants investigation of moderators. Conversely, for turnover, assuming a base rate of 50% turnover and a selection ratio of .30, the lower and upper credibility values reflect a difference in reduced turnover rates of only 3%, which is less likely to substantively alter organizational decision making.
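
A minimal sketch of both translations follows, with assumptions flagged: the income SD ($33,647) is the Census Bureau figure cited above; the salary interval endpoints (−.08 and .21) are inferred from the dollar figures reported in the text; the turnover validities are hypothetical placeholders; and the Taylor–Russell calculation assumes a bivariate-normal predictor–criterion model.

```python
from scipy.stats import norm, multivariate_normal

SD_SALARY = 33_647  # SD of annual income (U.S. Census Bureau, per the text)

def outcome_impact(r, sd_outcome):
    """Expected change in an outcome for a 1 SD increase in the predictor."""
    return r * sd_outcome

# Interval endpoints inferred from the $7,066 and -$2,692 figures in the text.
for r in (-0.08, 0.21):
    print(f"rho = {r:+.2f}: {outcome_impact(r, SD_SALARY):+,.0f} USD")

def stay_rate_among_selected(validity, base_rate=0.50, selection_ratio=0.30):
    """Taylor-Russell: proportion of 'stayers' among selected applicants,
    assuming a standard bivariate-normal predictor-criterion model."""
    x_cut = norm.ppf(1 - selection_ratio)  # predictor cutoff
    y_cut = norm.ppf(1 - base_rate)        # criterion cutoff (0 at a 50% base rate)
    joint = multivariate_normal(mean=[0, 0], cov=[[1, validity], [validity, 1]])
    # P(X > x_cut, Y > y_cut) by inclusion-exclusion on the joint CDF
    p_both = 1 - norm.cdf(x_cut) - norm.cdf(y_cut) + joint.cdf([x_cut, y_cut])
    return p_both / selection_ratio

# Hypothetical lower and upper credibility values for a turnover validity.
for v in (0.05, 0.25):
    print(f"validity = {v:.2f}: turnover among selected = "
          f"{1 - stay_rate_among_selected(v):.1%}")
```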

Conclusions and Caveats

Dichotomous conclusions about homogeneity or generalizability are not useful or appropriate, just as dichotomous conclusions about effects based on statistical significance tests lead to erroneous conclusions (see Schmidt, 2008). Normative and objective interpretations of credibility intervals can help producers and consumers of meta-analyses to grow beyond reductionist descriptions of variability and simplistic interpretations of SDρ. Accounting for variability matters more when the credibility interval spans a sufficiently wide portion of the empirical distribution of effects. Moreover, the values of SDρ that translate into meaningful variability depend on the position of ρ in the empirical distribution. As a result, proper interpretations rely on accurate estimates of both ρ and SDρ. For the latter, precise estimates of SDρ require larger k values than are typical in I-O psychology meta-analyses (Schmidt, 2008; Steel, Kammeyer-Mueller, & Paterson, 2015), as well as the ability to fully correct for statistical artefacts using robust artefact estimates (see Footnote 2). To increase precision, future meta-analyses may consider placing confidence intervals around SDρ and/or using a Bayesian estimator with priors based on relevant variables. As meta-analytic research progresses and informs practice, accurately estimating true variability and properly characterizing its implications is paramount.
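
As one way to illustrate the idea of a confidence interval around SDρ, the sketch below pairs a bare-bones Hunter–Schmidt estimate of SDρ (no artefact corrections) with a percentile bootstrap that resamples whole studies. The input data are hypothetical, and the bootstrap is only one option; Steel, Kammeyer-Mueller, and Paterson (2015), cited above, develop an informed Bayesian alternative.

```python
import numpy as np

rng = np.random.default_rng(2017)

def sd_rho_bare_bones(r, n):
    """Bare-bones Hunter-Schmidt estimate of the true-effect SD:
    n-weighted observed variance minus expected sampling-error variance
    (one common variant; no artefact corrections)."""
    r, n = np.asarray(r, float), np.asarray(n, float)
    r_bar = np.average(r, weights=n)
    var_obs = np.average((r - r_bar) ** 2, weights=n)
    var_err = np.average((1 - r_bar**2) ** 2 / (n - 1), weights=n)
    return np.sqrt(max(var_obs - var_err, 0.0))

def bootstrap_ci(r, n, reps=5000, alpha=0.05):
    """Percentile bootstrap CI for SD_rho, resampling whole studies."""
    r, n = np.asarray(r, float), np.asarray(n, float)
    k = len(r)
    idx = rng.integers(0, k, size=(reps, k))
    ests = np.array([sd_rho_bare_bones(r[i], n[i]) for i in idx])
    return tuple(np.quantile(ests, [alpha / 2, 1 - alpha / 2]))

# Hypothetical meta-analytic database: k = 12 observed correlations.
r_obs = np.array([.12, .25, .18, .31, .22, .08, .27, .19, .35, .15, .24, .29])
n_obs = np.array([120, 85, 200, 64, 150, 310, 95, 175, 60, 240, 110, 80])
print("SD_rho =", round(sd_rho_bare_bones(r_obs, n_obs), 3))
print("95% bootstrap CI:", bootstrap_ci(r_obs, n_obs))
```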

Footnotes

1 These percentiles refer to the full distribution of correlations in management; it would alternatively be possible to estimate percentiles referenced to distributions for more specific construct categories (e.g., the distribution of personality trait–behavior correlations).

2 Examples of robust artefact estimates include artefact values from the individual studies included in a meta-analysis (including methods that account for sampling error in these estimates; e.g., Raju et al., 1991) and artefact distributions based on a large number of estimates or the results of reliability generalization analyses (e.g., Vacha-Haase et al., 2001).

References

Berry, C. M., Ones, D. S., & Sackett, P. R. (2007). Interpersonal deviance, organizational deviance, and their common correlates: A review and meta-analysis. Journal of Applied Psychology, 92(2), 410–424. https://doi.org/10.1037/0021-9010.92.2.410
Blume, B. D., Ford, J. K., Baldwin, T. T., & Huang, J. L. (2010). Transfer of training: A meta-analytic review. Journal of Management, 36(4), 1065–1105. https://doi.org/10/ftwgxk
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100(2), 431–449. https://doi.org/10/bnw8
Chang, C.-H., Ferris, D. L., Johnson, R. E., Rosen, C. C., & Tan, J. A. (2012). Core self-evaluations: A review and evaluation of the literature. Journal of Management, 38(1), 81–128. https://doi.org/10/dbd9nb
Chiaburu, D. S., Oh, I.-S., Berry, C. M., Li, N., & Gardner, R. G. (2011). The five-factor model of personality traits and organizational citizenship behaviors: A meta-analysis. Journal of Applied Psychology, 96(6), 1140–1166. https://doi.org/10.1037/a0024004
Christian, M. S., Garza, A. S., & Slaughter, J. E. (2011). Work engagement: A quantitative review and test of its relations with task and contextual performance. Personnel Psychology, 64(1), 89–136. https://doi.org/10/c6b58z
Credé, M., Tynan, M. C., & Harms, P. D. (2016). Much ado about grit: A meta-analytic synthesis of the grit literature. Journal of Personality and Social Psychology. Advance online publication. https://doi.org/10.1037/pspp0000102
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and Individual Differences, 102, 74–78. https://doi.org/10/f84bhv
Hemphill, J. F. (2003). Interpreting the magnitudes of correlation coefficients. American Psychologist, 58(1), 78–79. https://doi.org/10/fb38g8
Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job satisfaction: A meta-analysis. Journal of Applied Psychology, 87(3), 530–541. https://doi.org/10.1037/0021-9010.87.3.530
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical representations of the five-factor model of personality in predicting job performance: Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied Psychology, 98(6), 875–925. https://doi.org/10/bdbb
Ng, T. W. H., Eby, L. T., Sorensen, K. L., & Feldman, D. C. (2005). Predictors of objective and subjective career success: A meta-analysis. Personnel Psychology, 58(2), 367–408. https://doi.org/10/dw64z6
Paterson, T. A., Harms, P. D., Steel, P., & Credé, M. (2016). An assessment of the magnitude of effect sizes: Evidence from 30 years of meta-analysis in management. Journal of Leadership & Organizational Studies, 23(1), 66–81. https://doi.org/10/bjz9
Raju, N. S., Burke, M. J., Normand, J., & Langlois, G. M. (1991). A new meta-analytic approach. Journal of Applied Psychology, 76(3), 432–446. https://doi.org/10/dcrgkf
Schmidt, F. L. (2008). Meta-analysis: A constantly evolving research integration tool. Organizational Research Methods, 11(1), 96–113. https://doi.org/10/drwrb2
Steel, P. D. G. (2007). The nature of procrastination: A meta-analytic and theoretical review of quintessential self-regulatory failure. Psychological Bulletin, 133(1), 65–94. https://doi.org/10.1037/0033-2909.133.1.65
Steel, P., Schmidt, J., & Shultz, J. (2008). Refining the relationship between personality and subjective well-being. Psychological Bulletin, 134(1), 138–161.
Steel, P. D. G., Kammeyer-Mueller, J., & Paterson, T. A. (2015). Improving the meta-analytic assessment of effect size variance with an informed Bayesian prior. Journal of Management, 41(2), 718–743. https://doi.org/10/b6rc
Tett, R. P., Hundley, N. A., & Christiansen, N. D. (2017). Meta-analysis and the myth of generalizability. Industrial and Organizational Psychology: Perspectives on Science and Practice, 10(3), 421–456.
Thomas, J. P., Whitman, D. S., & Viswesvaran, C. (2010). Employee proactivity in organizations: A comparative meta-analysis of emergent proactive constructs. Journal of Occupational and Organizational Psychology, 83(2), 275–300. https://doi.org/10.1348/096317910x502359
Vacha-Haase, T., Tani, C. R., Kogan, L. R., Woodall, R. A., & Thompson, B. (2001). Reliability generalization: Exploring reliability variations on MMPI/MMPI-2 validity scale scores. Assessment, 8(4), 391–401. https://doi.org/10.1177/107319110100800404
Wilmot, M. P. (2017). Personality and its impacts across the behavioral sciences: A quantitative review of meta-analytic findings (Doctoral dissertation). University of Minnesota, Minneapolis, MN.
Zimmerman, R. D. (2008). Understanding the impact of personality traits on individuals' turnover decisions: A meta-analytic path model. Personnel Psychology, 61(2), 309–348. https://doi.org/10.1111/j.1744-6570.2008.00115.x