We agree with the importance of the question raised by Melson-Silimon, Harris, Shoenfelt, Miller, and Carter (2019): With the integration of the five-factor model of personality into the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association, 2013), does use of personality testing in personnel selection risk running afoul of the Americans with Disabilities Act (ADA)? We clarify their layout of general concerns by considering personality-based measures in operational employment decisions. Our commentary addresses three themes: (1) the psychometric properties of operational personality tests, (2) the applicability of neuroscience for informing us about job applicant pools, and (3) the degree to which the test user has support for the predictive hypothesis when using the test. We conclude that test users considering these themes as part of professional development and validation practice can avoid concerns raised in the focal article.
Psychometric considerations for operational personality tests
Construct validity
Typically, construct validity (per Cronbach & Meehl, 1955) is considered for the range of test scores as a whole; for example, a custom-developed measure of conscientiousness might be correlated with other measures of the same construct (e.g., NEO-PI-R or IPIP scales). To infer that extreme levels of work-related personality scales indicate clinical disorder, one must consider the construct validity of personnel selection personality tests at these extreme levels. Ideally, test developers maximize test score information near the cutoff score (if a cutoff is used) or near the mean. Taking an item response theory (IRT) perspective, the test information function would be maximized in these areas. Typically, test information functions for selection devices have their lowest levels at extreme ends of the scale. Maples, Guan, Carter, and Miller (2014) provide a test information function for one facet of the NEO-PI-R showing less precision at the high and low ends of the latent trait continuum. This implies that individual differences in personality construct-related behavior for those very high or low in the trait are less well measured by personality tests used in selection.
In contrast, personality tests designed for use in clinical populations (e.g., the MMPI) likely provide better precision at the level of behavioral extremes. It is also likely that the behaviors and cognitions measured by personality tests for clinical vs. nonclinical populations are different. Consider the suggestion by Melson-Silimon et al. (2019) that extreme conscientiousness is associated with obsessive-compulsive disorder. Looking at the item content of the Yale–Brown Obsessive Compulsive Scale (Feinstein, Fallon, Petkova, & Liebowitz, 2003; Goodman et al., 1989a, 1989b; International OCD Foundation, 2016), it is apparent that the behaviors and cognitions (e.g., obsessive thoughts, compulsive behaviors, fear of harming oneself or others, fear of stealing things, compulsive checking of locks, repeating routine activities) are intentionally quite different from the constructs measured in a work-oriented conscientiousness scale. Personality tests typically used in personnel selection lack sufficient measurement precision to draw meaningful inferences about the extreme behavior that would indicate an individual’s psychopathology.
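The precision argument in this section can be made concrete with a minimal sketch. It assumes a dichotomous two-parameter logistic (2PL) model and five hypothetical items located near the trait mean; the item parameters and function names are ours for illustration and are not taken from Maples et al. or from any operational test. Operational personality scales are usually polytomous, so this is a simplification rather than a model of any specific instrument.

```python
import math

# Hypothetical (discrimination a, location b) pairs for a five-item scale whose
# items are targeted near the mean of the latent trait (theta = 0).
ITEMS = [(1.2, -1.0), (1.5, -0.5), (1.8, 0.0), (1.5, 0.5), (1.2, 1.0)]


def endorsement_probability(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def scale_information(theta, items):
    """Test information: sum of 2PL item informations a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = endorsement_probability(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


def standard_error(theta, items):
    """Conditional standard error of measurement: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(scale_information(theta, items))


# Compare precision at the center of the trait continuum and at the extremes.
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    info = scale_information(theta, ITEMS)
    se = standard_error(theta, ITEMS)
    print(f"theta={theta:+.1f}  information={info:.3f}  SE={se:.3f}")
```

Under these assumed parameters, information peaks near the center of the continuum and the standard error grows toward either extreme, which is the pattern the paragraph above describes for selection-oriented scales.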
The nomological link between mental disorders and job performance
There is a potential tautological issue with mental disorders and job performance in general. Freud is often credited with stating that a psychologically healthy individual is one who has the ability to love and to work (“zu lieben und zu arbeiten”). This conceptualization continues today in the DSM-5, which states that mental disorders are often “associated with significant distress or disability in social, occupational, or other important activities.” The DSM-5 also states that mental disorders are “characterized by clinically significant disturbance in an individual’s cognition, emotion regulation, or behavior.” The DSM-IV had previously stated that a mental disorder is clinically significant when it “causes clinically significant distress or impairment in social, occupational, or other important areas of functioning” (p. 7). The DSM-IV included this wording in the criteria for most disorders. One of the hallmarks of mental disorders is impaired ability to perform a job as well as impairments in competencies that are related to job performance. From a construct validity perspective, many mental disorders are associated with impaired job performance. Thus, it should come as no surprise that individuals with mental disorders may be less successful on the job and on personnel selection instruments. To put it into a syllogism, some individuals with a disorder may perform poorly on a job-related personality test, but individuals performing poorly on the test do not necessarily have a disorder.
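The asymmetry in this syllogism can be illustrated with Bayes' rule. The sketch below is ours, and every input value is an arbitrary placeholder chosen only to show the arithmetic; none is an estimate from the focal article, the DSM, or any applicant sample.

```python
def probability_of_disorder_given_low_score(base_rate,
                                            p_low_given_disorder,
                                            p_low_given_no_disorder):
    """Bayes' rule: P(disorder | low test score)."""
    low_and_disorder = p_low_given_disorder * base_rate
    low_and_no_disorder = p_low_given_no_disorder * (1.0 - base_rate)
    return low_and_disorder / (low_and_disorder + low_and_no_disorder)


# Arbitrary illustrative inputs, NOT estimates from any study:
# 5% of applicants have a disorder, 60% of them score low on the test,
# and 30% of applicants without a disorder also score low.
posterior = probability_of_disorder_given_low_score(0.05, 0.60, 0.30)
print(f"P(disorder | low score) = {posterior:.3f}")
```

With these placeholder values, applicants with a disorder are twice as likely to score low, yet fewer than one in ten low scorers has a disorder. The conditional probability runs strongly in one direction without running in the other.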
Let us also recall the distinction between personality scale scores and personality disorders. DSM-5 criteria do not indicate specific tests or test scores for providing cutoffs for diagnosis. In fact, a clinician could make a diagnosis without administering a personality test at all. A personality test cannot be used solely to make a diagnosis; thus, an individual who has an extreme score on a personality test does not necessarily have a diagnosis. In fact, the DSM-5 states that “clinical training and experience” is needed to make a diagnosis. Thus, we disagree with Melson-Silimon et al.’s (2019) assertion that operational personality tests could provide medical information.
Criterion-related validity evidence, job relatedness, and legal defensibility
The fact that personnel selection instruments or even measures of job performance itself might disproportionately screen out individuals with mental disorders does not mean their use is illegal under the ADA. Section 12112 (b) (6) of the ADA states that employers are prohibited from “using qualification standards, employment tests or other selection criteria that screen out or tend to screen out an individual with a disability or a class of individuals with disabilities….” However, the section goes on to provide an exemption when “… the standard, test or other selection criteria, as used by the covered entity, is shown to be job-related for the position in question and is consistent with business necessity.” If an employer uses a personnel selection instrument (e.g., a personality test) that is job-related and consistent with business necessity, under the ADA it would not be illegal if applicants with a mental disorder failed the instrument at higher rates than mentally healthy applicants. We return to validity and the predictive hypothesis below.
Is there curvilinearity in the predictor-criterion relationship?
Melson-Silimon et al. (2019) suggested that research demonstrating curvilinear relationships between personality and performance provides evidence that extreme levels of a personality dimension have a negative effect on performance. However, there is conflicting evidence of curvilinearity in the prediction of job performance. We focus on studies examining nonspecialized on-the-job performance. Some have reported evidence of curvilinearity (Carter, Dalal, Boyce, O’Connell, Kung, & Delgado, 2014; LaHuis, Martin, & Avis, 2005; Le, Oh, Robbins, Ilies, Holland, & Westrick, 2011). On the basis of this type of evidence, Carter, Miller, and Widiger (2018) suggested that not only those with low scores but also those with “too high” scores on personality traits are likely to exhibit maladaptive behavioral tendencies.
However, some authors of these studies have failed to replicate these findings (Le et al., 2011, second sample) or found a different pattern (LaHuis et al., 2005, second sample), and other researchers have found no evidence for meaningful curvilinearity (Nickel, Roberts, & Chernyshenko, 2019; Robie & Ryan, 1999; Walmsley, Sackett, & Nichols, 2018; Whetzel, McDaniel, Yost, & Kim, 2010). The differences could be due to several varying features across studies, but here we highlight analytic approaches. For instance, Carter et al. (2014) used a graded unfolding item response theory (IRT) model whereas others used classical test theory approaches, although Nickel et al. (2019) also used ideal point models and found no evidence for curvilinearity. Nickel et al. discuss speculation of high-score maladaptation as a possible “misattribution hypothesis”; specifically, they say that “people observe what looks like a person being too conscientious, but what they are really observing is one or two other constructs that look pathological when combined with normal levels of conscientious” (pp. 309–310). There is not enough current evidence to make a universal suggestion that individuals at higher ends of conscientiousness, for example, have lower performance than those in the middle.
Most examinations of curvilinearity have used quadratic terms to model curvilinearity under the expectation of an inverted-U-shaped function. However, it could be the case that the true underlying relationship is asymptotic, whereby progressively higher scores are neither beneficial nor detrimental. With the exception of Nickel et al. (2019), we are unaware of published empirical personality research comparing asymptotic versus quadratic relationships. Regardless, future research needs to disentangle the conflicting findings before an inference of construct validity can be made and an understanding of implications for work behavior can be generalized.
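To illustrate why the quadratic-versus-asymptotic comparison matters, the following minimal sketch fits both forms to synthetic, noiseless data generated from a plateau curve. Everything here is an assumption made for illustration: the data are artificial, the asymptotic form a + b(1 - exp(-rate(x - min x))) is only one of several possible specifications, and the grid of candidate rates is arbitrary. It is not the procedure used by Nickel et al. or by any study cited above.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(columns, y):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * v for p, v in zip(columns[i], y)) for i in range(k)]
    return solve_linear(xtx, xty)


def sum_squared_error(columns, coefs, y):
    """Residual sum of squares for a fitted linear-in-parameters model."""
    preds = [sum(c * col[i] for c, col in zip(coefs, columns))
             for i in range(len(y))]
    return sum((v - p) ** 2 for v, p in zip(y, preds))


def fit_quadratic(x, y):
    """Fit y = b0 + b1*x + b2*x^2; returns (coefficients, SSE)."""
    cols = [[1.0] * len(x), list(x), [v * v for v in x]]
    coefs = least_squares(cols, y)
    return coefs, sum_squared_error(cols, coefs, y)


def fit_asymptotic(x, y, rates=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Fit y = a + b*(1 - exp(-rate*(x - min x))), rate chosen by grid search."""
    x0 = min(x)
    best = None
    for rate in rates:
        cols = [[1.0] * len(x),
                [1.0 - math.exp(-rate * (v - x0)) for v in x]]
        coefs = least_squares(cols, y)
        err = sum_squared_error(cols, coefs, y)
        if best is None or err < best[2]:
            best = (rate, coefs, err)
    return best


# Synthetic, noiseless scores: performance rises with the trait, then levels off.
x = [-2.0 + 0.25 * i for i in range(17)]
y = [1.0 - math.exp(-(v + 2.0)) for v in x]

quad_coefs, quad_sse = fit_quadratic(x, y)
rate, asym_coefs, asym_sse = fit_asymptotic(x, y)
print(f"quadratic:  squared-term coefficient={quad_coefs[2]:.4f}, SSE={quad_sse:.6f}")
print(f"asymptotic: rate={rate}, SSE={asym_sse:.6f}")
```

In this artificial case the squared-term coefficient is negative even though the generating curve never turns downward, so a negative quadratic term alone does not distinguish an inverted U from a plateau. Comparing the fit of the two forms, as sketched here with the residual sum of squares, is one way to separate them.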
Neuroscience and applicant pool considerations
We find that the focal article’s claims related to personality neuroscience are premature and overstated. Our reservation is based on four sets of evidence: the data used to support personality neuroscience, contradictory theory, evidence related to personality change, and the ADA’s preclusion of basing hiring decisions on stereotypes and assumptions.
Veracity of personality neuroscience research
In Table 1, we highlight several passages in the focal article relative to our understanding of the cited studies. The experimental designs, magnitude of effect sizes, and sample sizes in these studies are problematic from either a replication or meta-analytic perspective (Camerer et al., 2018; Schmidt & Oh, 2016; Vazire, 2018). Contrary to claims made in Melson-Silimon et al. (2019), these empirical studies do not yield evidence that personality-oriented neuroscience has implications for personnel selection.
Contradictory theory
From a theoretical perspective, we also do not find personality neuroscience to have compelling implications for personnel selection. Both Barlow, Sauer-Zavala, Carl, Bullis, and Ellard (2014) and Lahey, Krueger, Rathouz, Waldman, and Zald (2017) discuss psychopathology in terms of a discrete set of higher-order factors referred to as externalizing and internalizing. Regarding the focal article by Melson-Silimon et al. (2019), psychopathology related to externalizing would implicate extraversion and psychopathology related to internalizing would implicate neuroticism. As Barlow et al. (2014, p. 345) put it when paraphrasing Eysenck (1947), “[I]ndividuals with the diagnosis of neurosis occupied the pathological extreme of the personality trait of neuroticism.” However, not everyone who scores “high” in neuroticism meets the criteria for clinical psychopathology. Additionally, both Barlow et al. (2014) and Lahey et al. (2017) pointed to genetic and psychobiological influences on psychopathology, not problematic brain structure. In summary, models and theories of psychopathology incorporate two selection-oriented personality dimensions, extraversion and neuroticism, as nonclinical indicators that at an extreme level, for a minority of people, might indicate clinically significant problems.
Evidence of personality change
A related issue is change in personality dimensions such as neuroticism over the life span. Roberts, Walton, and Viechtbauer (2006) showed that personality scores increase over the life span, largely due to environmental factors and not genetic ones (p. 18). Roberts et al. (2017) found that short-term therapy can cause trait-level (as opposed to state-level) improvement in personality dimensions, specifically neuroticism, nearly equal to the total amount of anticipated lifespan change due to environmental factors alone. Personality trait changes were found in subjects with severe depression and generalized anxiety. It is worth reflecting upon whether significant trait-level change in neuroticism after brief therapy is consistent with change in brain structure, as in the strong case proposed by Melson-Silimon et al. (2019). Personality neuroscience deserves further research, but its currently underdeveloped state militates against blanket warnings about personality testing in organizations, at least with regard to hard-wired, brain structure–based impairments.
ADA precludes hiring based on stereotypes and assumptions
Our fourth and final reservation is about a missed opportunity by Melson-Silimon et al. (2019) to note that the ADA explicitly states (https://www.eeoc.gov/eeoc/history/ada25th/ada.cfm) that employers cannot base decisions about candidates or employees on stereotypes and assumptions. In terms of personality neuroscience as outlined in Melson-Silimon et al., people whose neuroticism (and/or other personality dimension) test score is so high that one might believe them to be impaired cannot legally be assumed to be diagnosable under the ADA. An employer can require that all candidates must be demonstrably able to perform the essential functions of the job and still be in compliance with the ADA. Melson-Silimon et al. clearly note that personality-oriented testing has been upheld in court challenges because behavioral tendencies related to getting along with others are routinely found in job analyses. Current evidence does not support speculation about whether extreme scores indicate pathology. Thus, based on the literature reviewed here and our own experience, we are skeptical that extreme scores on neuroticism (and other personality dimensions) among job applicants in competitive employment applicant pools are indicative of pathology at the level that would indicate a disability as defined in the ADA.
Evaluating the predictive hypothesis in operational measurement
For an employment personality test to yield valid inferences, score use and interpretation are based on understanding and prediction of an applicant’s likelihood of engaging in certain types of job behavior. In Binning and Barrett’s (1989) framework, this inference refers to the link between observed predictor data and the criterion construct. The basis for using the test in an employment setting is its relevance to the criterion construct(s) (Guion, 2011). Validity evidence supports the test to the extent that (a) the criterion domain is a legitimate representation of important behavior and (b) test scores provide a sample of this behavior or a sign of the individual attributes that determine behavior (Wernimont & Campbell, 1968).
This contrasts with a strategy of attempting to evaluate test use via links between observed test scores and the predictor construct domain without regard to a predictive hypothesis conditional on work performance. The example in the focal article is whether a test is intentionally or unintentionally diagnostic of disability or impairment. This is an incomplete analysis. If concerns of specific impairments are articulated as job-related, the analysis shifts from one solely predictor-focused to one involving this predictive hypothesis. This is why a measure like the MMPI can be used appropriately as a sign of behavior in some public safety selection systems (post-conditional offer), but can otherwise be inappropriate for use in the absence of credible relevance to the legitimate performance domain as a pre-conditional-offer predictor.
Employment decisions using test scores should be made on the basis of forecasts about legitimate aspects of productive or unproductive work behavior. Tests are used for description to the extent that this enhances understanding of the scores, and are useful when their predictions increase the base rate of successful employees. As Melson-Silimon et al. (2019) suggest, starting with job analysis promotes a focus on work behavior. Additionally, the predictive hypothesis should be evaluated with an appropriate validation strategy: As described elsewhere (Putka & Sackett, 2012), pertinent evidence may take a variety of forms, and may include validity generalization, which is not solely correlations from a predictive validity study. Personality-based employment tests, whether in the form of signs or samples, are based on evidence that they relate to an individual’s capacity to perform a job, not solely an analysis of the predictor space to which they apply. Let us focus on that when discussing whether work-oriented personality tests are fair for people with extreme scores.
Table 1. Comparisons between claims in Melson-Silimon et al. (2019) for personality neuroscience based on brain structure and personality neuroscience data
Contradictory theory
From a theoretical perspective, we also do not find personality neuroscience to have compelling implications for personnel selection. Both Barlow, Sauer-Zavala, Carl, Bullis, and Ellard (Reference Barlow, Sauer-Zavala, Carl, Bullis and Ellard2014) and Lahey, Krueger, Rathouz, Waldman, and Zald (Reference Lahey, Krueger, Rathouz, Waldman and Zald2017) discuss psychopathology in terms of a discrete set of higher-order factors referred to as externalizing and internalizing. Regarding the focal article by Melson-Silimon et al. (Reference Melson-Silimon, Harris, Shoenfelt, Miller and Carter2019), psychopathology related to externalizing would implicate extraversion and psychopathology related to internalizing would implicate neuroticism. As Barlow et al. (Reference Barlow, Sauer-Zavala, Carl, Bullis and Ellard2014, p. 345) put it when paraphrasing Eysenck (Reference Eysenck1947), “[I]ndividuals with the diagnosis of neurosis occupied the pathological extreme of the personality trait of neuroticism.” However, not everyone who scores “high” in neuroticism meets the criteria for clinical psychopathology. Additionally, both Barlow et al. (Reference Barlow, Sauer-Zavala, Carl, Bullis and Ellard2014) and Lahey et al. (Reference Lahey, Krueger, Rathouz, Waldman and Zald2017) pointed to genetic and psychobiological influences on psychopathology, not problematic brain structure. In summary, models and theories of psychopathology incorporate two selection-oriented personality dimensions, extraversion and neuroticism, as nonclinical indicators that at an extreme level, for a minority of people, might indicate clinically significant problems.
Evidence of personality change
A related issue is change in personality dimensions such as neuroticism over the life span. Roberts, Walton, and Viechtbauer (Reference Roberts, Walton and Viechtbauer2006) showed that personality scores increase over the life span, largely due to environmental factors and not genetic ones (p. 18). Roberts et al. (Reference Roberts, Luo, Briley, Chow, Su and Hill2017) found that short-term therapy can cause trait-level (as opposed to state-level) improvement in personality dimensions, specifically neuroticism, nearly equal to the total amount of anticipated lifespan change due to environmental factors alone. Personality trait changes were found in subjects with severe depression and generalized anxiety. It is worth reflecting upon whether significant trait-level change in neuroticism after brief therapy is consistent with change in brain structure, as in the strong case proposed by Melson-Silimon et al. (Reference Melson-Silimon, Harris, Shoenfelt, Miller and Carter2019). Personality neuroscience deserves further research, but its currently underdeveloped state militates against blanket warnings about personality testing in organizations, at least in regard to hard-wired brain structure–based impairments.
ADA precludes hiring based on stereotypes and assumptions
Our fourth and final reservation is about a missed opportunity by Melson-Silimon et al. (Reference Melson-Silimon, Harris, Shoenfelt, Miller and Carter2019) to note that the ADA explicitly states (https://www.eeoc.gov/eeoc/history/ada25th/ada.cfm) that employers cannot base decisions about candidates or employees on stereotypes and assumptions. In terms of personality neuroscience as outlined in Melson-Silimon et al., people whose neuroticism (and/or other personality dimension) test score is so high that one might believe them to be impaired cannot legally be assumed to be diagnosable under the ADA. An employer can require that all candidates be demonstrably able to perform the essential functions of the job and still be in compliance with the ADA. Melson-Silimon et al. clearly note that personality-oriented testing has been upheld in court challenges because behavioral tendencies related to getting along with others are routinely found in job analyses. Current evidence does not support speculation about whether extreme scores indicate pathology. Thus, based on the literature reviewed here and our own experience, we are skeptical that extreme scores on neuroticism (and other personality dimensions) among job applicants in competitive employment applicant pools are indicative of pathology at the level that would indicate a disability as defined in the ADA.
Evaluating the predictive hypothesis in operational measurement
For an employment personality test to yield valid inferences, score use and interpretation are based on understanding and prediction of an applicant’s likelihood of engaging in certain types of job behavior. In Binning and Barrett’s (Reference Binning and Barrett1989) framework, this inference refers to the link between observed predictor data and the criterion construct. The basis for using the test in an employment setting is its relevance to the criterion construct(s) (Guion, Reference Guion2011). Validity evidence supports the test to the extent that (a) the criterion domain is a legitimate representation of important behavior and (b) test scores provide a sample of this behavior or a sign of the individual attributes that determine behavior (Wernimont & Campbell, Reference Wernimont and Campbell1968).
This contrasts with a strategy of attempting to evaluate test use via links between observed test scores and the predictor construct domain without regard to a predictive hypothesis conditional on work performance. The example in the focal article is whether a test is intentionally or unintentionally diagnostic of disability or impairment. This is an incomplete analysis. If concerns of specific impairments are articulated as job-related, the analysis shifts from one solely predictor-focused to one involving this predictive hypothesis. This is why a measure like the MMPI can be used appropriately as a sign of behavior in some public safety selection systems (post-conditional offer), but can otherwise be inappropriate for use in the absence of credible relevance to the legitimate performance domain as a pre-conditional-offer predictor.
Employment decisions using test scores should be made on the basis of forecasts about legitimate aspects of productive or unproductive work behavior. Tests are used for description to the extent that this enhances understanding of the scores, and are useful when their predictions increase the base rate of successful employees. As Melson-Silimon et al. (Reference Melson-Silimon, Harris, Shoenfelt, Miller and Carter2019) suggest, starting with job analysis promotes a focus on work behavior. Additionally, the predictive hypothesis should be evaluated with an appropriate validation strategy: As described elsewhere (Putka & Sackett, Reference Putka, Sackett, Farr and Tippins2012), pertinent evidence may take a variety of forms, and may include validity generalization, which is not solely correlations from a predictive validity study. Personality-based employment tests, whether in the form of signs or samples, are based on evidence that they relate to an individual’s capacity to perform a job, not solely an analysis of the predictor space to which they apply. Let us focus on that when discussing whether work-oriented personality tests are fair for people with extreme scores.
Author ORCIDs
Jeffrey M. Cucina, 0000-0002-1309-0426; Theodore L. Hayes, 0000-0002-0576-4283