Estimates indicate that approximately 527,976,150 individuals speak Spanish worldwide (Simons & Fennig, 2017). Increasing numbers of Latin Americans are immigrating to the United States, and 37 million individuals residing in the United States speak Spanish at home (Flores, 2017). These numbers are increasingly relevant for mental health practitioners whose psychological evaluations may provide diagnostic clarification and guide treatment planning for Spanish-speaking clients. At present, the psychometric properties of only a few personality measures have been explored with English/Spanish bilinguals.
Personality Assessment of Spanish-speakers
American Psychological Association (APA) guidelines indicate that practitioners are expected to individualize test selection and administration according to clients’ unique characteristics (American Psychological Association, 2017). For example, individuals who primarily speak or prefer Spanish should be administered assessments written in Spanish and validated with Spanish-speakers. Fernandez, Boccaccini, and Noland (2007) recommend that practitioners consider which translated tests are available, identify relevant research regarding the tests’ administration and interpretation, evaluate the research relevant to the clients’ background, and assess their own level of confidence that the tests are appropriate for use with specific clients. In the United States, these guidelines are difficult to follow in practice because practitioners have limited options regarding multiscale inventories that are translated and validated with Spanish-speakers. Further, there is only a small body of research to reference on this topic (Fernandez et al., 2007; Weiss & Rosenfeld, 2012).
Personality and clinical assessment research shows some differences in score profiles between non-Hispanic and Hispanic populations (Estrada & Smith, 2017; Gurven, von Rueden, Massenkoff, Kaplan, & Lero Vie, 2013). Furthermore, researchers have found significant differences in personality assessment protocols across Latin Americans by country of origin. For instance, Fantoni-Salvador and Rogers (1997) identified significant differences in clinical scales using the Personality Assessment Inventory Spanish Edition (PAISE; Morey, 1991–2007) among Latin American, Mexican American, and Puerto Rican Spanish-speakers. This suggests that differences in country of origin could play a role in personality profiles as measured by standardized assessments. Others (Boscán et al., 2000) found significant differences in Minnesota Multiphasic Personality Inventory–2 (MMPI–2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) Spanish translation protocols according to country of origin in their sample of Mexican and Venezuelan college students and Colombian college students and community members. Thus, research indicates that country of origin in the context of personality assessment should be further explored.
The Personality Assessment Inventory (PAI; Morey, 1991–2007) is a popular personality assessment among U.S. practitioners (Wright et al., 2017). Collectively, its 22 scales are meant to assess a broad range of characteristics. It contains four validity scales (Inconsistency, Infrequency, Negative Impression, Positive Impression), which measure response style. The 11 clinical scales are intended to assess symptoms of psychopathology: Somatic Complaints, Anxiety, Anxiety-Related Disorders, Depression, Mania, Paranoia, Schizophrenia, Borderline Features, Antisocial Features, Alcohol Problems, and Drug Problems. Five treatment consideration scales focus on Aggression, Suicidal Ideation, Stress, Nonsupport, and Treatment Rejection. Two interpersonal functioning scales aim to assess Dominance and Warmth. Furthermore, there are 28 subscales within the clinical scales and three subscales within the treatment consideration Aggression scale. The interested reader may reference Morey’s PAI manual or Morey (2003) for detailed scale and subscale descriptions.
The PAI has a number of strengths: a relatively low (fourth grade) reading level requirement, brief administration time, and low long-term administration costs. It has demonstrated utility in a range of clinical contexts, such as providing diagnostic clarification (Edens & Ruiz, 2008); differentiating levels of care (Sinclair et al., 2015); and forensic referrals, such as assessing risk for violence (Gardner, Boccaccini, Bitting, & Edens, 2015). However, practitioners should not assume that the mounting research on the PAI with predominantly U.S.-born, English-speaking samples generalizes to Spanish-speakers. Below we summarize research on the two available Spanish versions of the PAI: the Personality Assessment Inventory-Spanish Edition (PAISE; Morey, 1992) and the Personality Assessment Inventory European-Spanish version (PAIE-S; Ortiz-Tallo, Santamaría, Cardenal, & Sánchez, 2011).
Personality Assessment Inventory Spanish Edition (PAISE) Research
The PAISE is a direct translation of the PAI from English to Spanish, and it has yet to be validated with a Spanish-speaking population, an important step per International Test Commission guidelines (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). There are some complications with this translation. First, direct English-to-Spanish test translation may not be a valid method for facilitating the psychodiagnostic assessment of English/Spanish bilinguals. English assessments that are validated with U.S. English-speakers may successfully measure constructs among this population. However, when directly translated and applied in other languages, test items may take on different meaning and tone, thereby creating confounds that may interfere with a test’s reliability, validity, and ultimate utility. Second, language, culture, and ethnicity may also influence psychometric properties. Individuals who share the same language do not necessarily share the same culture (Sue & Sue, 2012). Third, differences across Hispanic subcultures may impact PAI and PAISE measurement outcomes across these groups. Therefore, it is critical that researchers explore the PAI and its iterations among Hispanics and English/Spanish bilinguals in order to understand their generalizability across subcultures.
Three studies have explored aspects of the PAISE’s psychometric properties in the United States. Rogers, Flores, Ustad, and Sewell (1995) administered the PAISE to a sample of Mexican American Spanish-speakers (three-quarters monolingual) engaged in substance abuse treatment. They found that bilinguals’ PAISE and PAI protocols had adequate convergence. The PAISE had satisfactory test-retest reliability but low average scale internal consistency (α = .63). In addition, three of four PAISE validity scales had poor internal consistency, and protocols were invalid twice as often as normative PAI protocols. Conversely, Fernandez, Boccaccini, and Noland (2008) administered the PAI and the PAISE to English/Spanish bilinguals and found that Negative Impression Management and Positive Impression Management mean T scores for the PAISE were similar to those of the PAI’s normative samples. They also identified strong convergence between Negative and Positive Impression Management scales across the two measures and found that the scales were robust in detecting feigning. However, the authors did not consider clinical, treatment consideration, or interpersonal scales. Regarding concurrent validity, Fantoni-Salvador and Rogers (1997) found that the PAISE had an average correct classification hit rate of .72 when distinguishing Alcohol Dependence, Anxiety Disorders, Major Depression, and Schizophrenia among their clinical sample of Latin American, Mexican American, and Puerto Rican Spanish-speakers. Notably, the three Hispanic groups differed on the Anxiety, Schizophrenia, and Alcohol scales, highlighting the importance of examining country of origin. However, after incorporating participants’ number of reported symptoms, no differences in scale elevations across the three groups were significant.
While this research provides preliminary support for the psychometric properties of the PAISE, it does not constitute a comprehensive validation study, and the increased rate of invalid protocols found by Rogers et al. (1995) is particularly concerning. The lack of a thorough research base makes it difficult for practitioners in the United States to engage in informed test selection (Fernandez et al., 2007). It is worth exploring other forms of the PAI with Spanish-speaking populations while research on the PAISE continues to build.
Personality Assessment Inventory European-Spanish Edition (PAIE-S)
Ortiz-Tallo et al. (2011) adapted items per the International Test Commission’s guidelines when developing the PAIE-S. The authors found that some PAI and PAISE items were irrelevant or inapplicable to European Spanish-speakers and adapted them accordingly. The authors validated the measure with Spanish normative community and clinical samples that were stratified by sex and age and included individuals from across Spain. As reported by Ortiz-Tallo and colleagues, the PAIE-S sample scored higher on Anxiety, Paranoia, and Treatment Rejection scales and lower on the Warmth scale relative to PAI normative data (Morey, 1991–2007), which the authors took into account when standardizing the measure. They also found good test-retest reliability and convergence between parallel scales on the Minnesota Multiphasic Personality Inventory–2 Restructured Form (Ben-Porath & Tellegen, 2008) and Millon Clinical Multiaxial Inventory–III (Millon, Davis, & Millon, 1994). Internal consistencies ranged from .46 (Anxiety-Affective) to .89 (Anxiety), with Cronbach’s alphas generally over .70. Given its generally robust properties among its validation sample, the PAIE-S may be appropriate for use with Latin American Spanish-speakers as well.
However, there is significant heterogeneity across Hispanic subcultures, and the PAIE-S may not generalize to all Spanish-speaking groups. Recognizing this, the PAIE-S developers sought data from a South American sample and examined the PAIE-S among Chileans (Ortiz-Tallo, Cardenal, Ferragut, & Santamaría, 2015). Compared to the Spanish normative sample, Chilean participants had significantly higher T scores on approximately 72% of scales (13 of 18 scales; validity scales were not examined) and 61% of subscales (19 of 31 subscales), most notably Mania and Mania-Grandiosity. Internal consistencies for the Chilean sample ranged from .66 (Stress) to .86 (Somatic Complaints), with an average alpha of .77. Ortiz-Tallo et al. concluded that differences in test scores and internal consistencies across their Spanish and Chilean samples provided evidence that the PAIE-S should be standardized separately for the Chilean population.
Stover, Solano, and Liporace (2015) administered the PAIE-S to Argentinian Spanish-speakers. In their study, Spanish and Argentinian experts in psychopathology and psychometrics evaluated PAIE-S items, assessing item clarity and whether the items held the same meanings across Spain and Argentina. They opined that four items were not appropriate for use with Argentinian populations and adapted the PAIE-S phrasing for these items accordingly. Among the primary scales, they found that alphas ranged from .60 (Stress) to .86 (Anxiety), with an average alpha of .76. Hence, the authors found internal consistencies comparable to those of Ortiz-Tallo et al.’s (2015) Chilean sample. Still, it is important to remember that Stover et al. assessed the appropriateness of items and made minor item modifications, while Ortiz-Tallo et al. directly administered the PAIE-S to their Chilean sample with no adaptation. No other studies have examined the PAIE-S among Spanish-speakers. To date, the PAIE-S has been researched among Spanish, Chilean, and Argentinian populations. Further research is needed to examine whether these findings generalize across Latin American cultures.
The Current Study
Research indicates that the PAI’s psychometric properties can vary across groups (e.g., Fantoni-Salvador & Rogers, 1997). Appropriate psychological assessment of Spanish-speakers is critical. Given that the PAIE-S adaptation followed recommended guidelines by the International Test Commission and has shown some promise with South American Spanish-speakers, it may be appropriate with more diverse groups of Latin American Spanish-speakers as well. We sought to assess the internal consistency, protocol scores, and convergent validity of the PAI and PAIE-S among English/Spanish bilinguals of Latin American descent. We compared the PAI and the PAIE-S, as these measures have both been validated among their respective samples. Findings will inform theories of individual (cultural and linguistic) differences and psychodiagnostic assessment.
Method
Participants
The initial group of participants (N = 142) were undergraduate students at a Hispanic-serving, urban university in the northeast United States over a two-semester period. We invited bilingual students who were of Latin American descent and spoke Spanish as a first and/or primary language to participate in the study.
Measures
Reading Level Indicator Spanish-Companion (RLIS-C). The RLIS-C (Williams, 2000) is a 40-item multiple-choice reading screen used to estimate an individual’s ability to read and comprehend Spanish. The measure has been used in past PAI research with bilingual samples (Fernandez et al., 2008). Total raw ability scores are used to indicate participants’ reading level (second grade and beyond). The RLIS-C has good test-retest reliability (r = .90). To increase participation and minimize burden on participants, we did not administer an English proficiency exam given that participants were attending a four-year English-speaking college and were likely proficient in English.
Demographic Questionnaire. Participants completed a questionnaire (English language format) that queried demographic information such as age, sex, race, ethnicity, and place of birth. They also reported their linguistic experience: first language, age of English fluency, and primary language spoken at home.
Personality Assessment Inventory (PAI) and European-Spanish version (PAIE-S). The PAI and the PAIE-S are both self-report measures of personality and psychopathology that consist of 344 items. Individuals indicate whether items pertain to their personal experience (false, slightly true, mainly true, or very true). It should be noted that, while almost all scales and subscales have the same number of items on both measures, there is one scale on which the number of items differs. The PAI Inconsistency scale has only 20 items, comprising 10 contradictory item pairs. The PAIE-S developers found that the Inconsistency scale had better discriminant validity when they drew an additional 10 contradictory item pairs from the larger item pool. Consequently, the PAIE-S Inconsistency scale has 40 items. Scales and subscales are reported as T scores (M = 50 with SD = 10) and are considered to be at clinical elevations when T ≥ 70. The PAI has good overall internal consistency and reliability (mean alpha = .70 and test-retest correlations = .85). The PAIE-S has demonstrated strong psychometric properties with its normative sample (Ortiz-Tallo et al., 2011). Morey’s (2003) recommended validity cut scores for the PAI are Inconsistency ≥ 73 T, Infrequency ≥ 75 T, Negative Impression Management ≥ 92 T, and Positive Impression Management ≥ 68 T. For the PAIE-S, protocols are considered invalid if Inconsistency ≥ 75 T, Infrequency ≥ 75 T, Negative Impression Management ≥ 101 T, or Positive Impression Management ≥ 65 T (Ortiz-Tallo et al., 2011).
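The validity screen described above can be sketched in a few lines of Python. This is only an illustration of the decision rule, assuming a protocol is set aside when any one validity scale reaches its cut score; the dictionary layout, function names, and example respondent are hypothetical and are not part of either scoring program.

```python
# Validity cut scores (T scores) as listed in the Measures section.
PAI_CUTS = {"ICN": 73, "INF": 75, "NIM": 92, "PIM": 68}
PAIES_CUTS = {"ICN": 75, "INF": 75, "NIM": 101, "PIM": 65}


def invalid_scales(t_scores, cuts):
    """Return the validity scales whose T score is at or above its cut score."""
    return [scale for scale, cut in cuts.items() if t_scores[scale] >= cut]


def is_valid(t_scores, cuts):
    """Treat a protocol as valid only if no validity scale reaches its cut."""
    return not invalid_scales(t_scores, cuts)


# Hypothetical respondent (not study data).
example = {"ICN": 74, "INF": 60, "NIM": 55, "PIM": 50}
print(invalid_scales(example, PAI_CUTS))   # ['ICN']
print(is_valid(example, PAIES_CUTS))       # True
```

The same set of T scores can be invalid under one measure’s cut scores and valid under the other’s, which is one reason validity rates are not directly comparable across the two measures.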
Procedure
Research assistants, one of whom spoke Spanish, screened the initial 142 participants using the RLIS-C and excluded 12 participants who did not pass the measure at the fourth-grade level. We counterbalanced PAI and PAIE-S administration across two sessions spaced approximately two weeks apart. Participants received course credit upon study completion. Participants’ PAI and PAIE-S responses were entered into their respective scoring programs and exported as individual item scores and scale and subscale T scores.
Results
Participants were approximately 20 years old (M = 19.83, SD = 2.61). Most (n = 108, 85.0%) were female, and all participants identified as Hispanic. Over half were born in the United States (n = 71, 54.6%), followed by the Dominican Republic (n = 24, 18.9%), Ecuador (n = 11, 8.7%), Mexico (n = 7, 5.5%), and Colombia (n = 5, 3.9%), with the remaining participants (n = 9, 7.1%) originating from other Latin American countries. Participants who were born outside of the United States reported moving to the United States at a mean age of 11.27 years (SD = 5.37). Most participants (n = 69, 53.1%) said that they spoke Spanish in their household. During the second semester of data collection, we asked 88 participants about their preferred language. The majority cited English as their preferred language (n = 47, 56.0%), followed by Spanish (n = 22, 26.2%), and then both English and Spanish (n = 15, 17.9%). Spanish reading level was beyond recommended proficiency (RLIS-C M = 31.42, SD = 3.67).
Validity Analyses
One hundred and ten participants (84.6%) completed the PAI, 106 participants (89.2%) completed the PAIE-S, and 86 (66.2%) completed both measures. In looking at validity scales (Inconsistency, Infrequency, Negative Impression Management, and Positive Impression Management), we found 90 (81.8%) valid PAI protocols and 77 (72.6%) valid PAIE-S protocols. The Inconsistency scale was the most frequently invalid scale for the PAI (9.0% of all PAI protocols), while Infrequency was the most frequently invalid scale for the PAIE-S (18.6% of all PAIE-S protocols). Of the 86 participants who completed both the PAI and PAIE-S, 76 (88.4%) were valid on both measures for Inconsistency, 66 (76.7%) for Infrequency, 86 (100%) for Negative Impression Management, and 74 (86.0%) for Positive Impression Management (see Table 1).
Note. PAI validity cut scores: ICN ≥ 73 T; INF ≥ 75 T; NIM ≥ 92 T; and PIM ≥ 68 T.
PAIE-S validity cut scores: ICN ≥ 75 T; INF ≥ 75 T; NIM ≥ 101 T; and PIM ≥ 65 T.
However, an elevation at or beyond the aforementioned cut scores on only one validity scale was sufficient to render a protocol invalid. Among participants who took both measures, 53 (61.6% of 86) produced valid protocols for both measures. We proceeded with analyses using only valid protocols. We conducted t-tests to assess whether there were significant differences associated with order of PAI/PAIE-S administration. Given the number of comparisons, we used a Bonferroni correction of p < .00094 (p = .05/53 planned comparisons) to detect significance. We were missing counterbalancing data for five participants. Among the remaining n = 48, there were no significant differences on scale or subscale scores related to PAI/PAIE-S administration order.
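The corrected threshold follows directly from dividing the familywise alpha by the number of planned comparisons. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return family_alpha / n_comparisons


def is_significant(p_value, threshold):
    """A result counts as significant only if p falls below the threshold."""
    return p_value < threshold


threshold = bonferroni_threshold(0.05, 53)
print(round(threshold, 5))                # 0.00094
print(is_significant(0.0011, threshold))  # False
```

Under this rule a p value such as .0011, which would be significant at the conventional .05 level, does not reach the corrected threshold.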
Internal Consistency
For the PAI, 16 (72.72%) scales and 11 (35.48%) subscales had alphas above .70 (see Table 2). This indicates that among the valid protocols in our sample, approximately 27.28% of PAI scales and 64.52% of PAI subscales did not have adequate internal consistency. PAI alpha coefficients ranged from .10 (Infrequency) to .89 (Aggression), with an average alpha of .65. With respect to PAIE-S protocols, 11 (50.00%) scales and 8 (25.81%) subscales had alphas at or above .70, with an average alpha of .65. This indicates that 50.00% of PAIE-S scales and 74.19% of PAIE-S subscales did not have adequate internal consistency. PAIE-S scale and subscale alphas ranged from .27 (Mania-Activity Level) to .81 (Somatic Complaints), with an average alpha of .63.
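For readers less familiar with the statistic, the sketch below computes Cronbach’s alpha from item-level responses using the standard variance formula. It assumes complete data and sample variances; the function name and the small response matrix are hypothetical and are not drawn from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                  # number of items on the scale
    items = list(zip(*responses))          # regroup the scores by item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


# Hypothetical responses (4 respondents x 3 items, scored 0-3).
responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))   # 0.94
print(alpha >= 0.70)     # True
```

Alpha rises when items covary strongly relative to their individual variances, so a low value suggests that the items of a scale are not being answered as a coherent set.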
Note. † PAI ICN = 20 items; PAIE-S ICN = 40 items. For PAI clinical scales (bolded), SOM = Somatic Complaints (SOM-C = Conversion, SOM-S = Somatization, and SOM-H = Health Concerns); ANX = Anxiety (ANX-C = Cognitive, ANX-A = Affective, and ANX-P = Physiological); ARD = Anxiety-Related Disorders (ARD-O = Obsessive-Compulsive, ARD-P = Phobia, and ARD-T = Traumatic Stress); DEP = Depression (DEP-C = Cognitive, DEP-A = Affective, and DEP-P = Physiological); MAN = Mania (MAN-A = Activity Level, MAN-G = Grandiosity, and MAN-I = Irritability); PAR = Paranoia (PAR-H = Hypervigilance, PAR-P = Persecution, and PAR-R = Resentment); SCZ = Schizophrenia (SCZ-P = Psychotic Experiences, SCZ-S = Social Detachment, and SCZ-T = Thought Disorder); BOR = Borderline Features (BOR-A = Affective Instability, BOR-I = Identity Problems, BOR-N = Negative Relationships, and BOR-S = Self-Harm); ANT = Antisocial Features (ANT-A = Antisocial Behaviors, ANT-E = Egocentricity, and ANT-S = Stimulus Seeking); ALC = Alcohol Problems; and DRG = Drug Problems. For PAI treatment consideration scales (bolded), AGG = Aggression (AGG-A = Aggressive Attitude, AGG-V = Verbal Aggression, and AGG-P = Physical Aggression); SUI = Suicidal Ideation; STR = Stress; NON = Nonsupport; and RXR = Treatment Rejection. For PAI interpersonal scales (bolded), DOM = Dominance and WRM = Warmth. For PAI validity scales, ICN = Inconsistency; INF = Infrequency; PIM = Positive Impression; and NIM = Negative Impression.
On average, mean PAIE-S inter-item correlations were less robust, and more variable, compared to PAI mean inter-item correlations (Table 2). PAI scale and subscale mean inter-item correlations ranged between r = .02 (Infrequency) and r = .53 (Aggression-Aggressive Attitude), with an overall average of r = .21. PAIE-S scale and subscale inter-item correlations ranged from r = .05 (Mania-Activity Level) to r = .36 (Anxiety-Related Disorders-Traumatic Stress), with an overall average of r = .18.
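The mean inter-item correlation is the average of the Pearson correlations between every pair of items on a scale. The sketch below shows one way to compute it; the helper names and responses are hypothetical, and the code assumes complete data in which every item has some variance.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)   # undefined if an item has zero variance


def mean_inter_item_r(responses):
    """Average Pearson correlation across all pairs of items on a scale."""
    items = list(zip(*responses))            # regroup the scores by item
    pairs = list(combinations(items, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Hypothetical responses (4 respondents x 3 items, scored 0-3).
responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
print(round(mean_inter_item_r(responses), 2))   # 0.84
```

Unlike alpha, this index does not grow with the number of items, which makes it useful when comparing scales of different lengths.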
Scale and Subscale T-scores
Table 3 depicts the range of T scores for scales and subscales for the 53 valid protocols.
Note. † PAI ICN = 20 items; PAIE-S ICN = 40 items. Convergence = correlation between the Spanish and English PAIs. For PAI clinical scales (bolded), SOM = Somatic Complaints (SOM-C = Conversion, SOM-S = Somatization, and SOM-H = Health Concerns); ANX = Anxiety (ANX-C = Cognitive, ANX-A = Affective, and ANX-P = Physiological); ARD = Anxiety-Related Disorders (ARD-O = Obsessive-Compulsive, ARD-P = Phobia, and ARD-T = Traumatic Stress); DEP = Depression (DEP-C = Cognitive, DEP-A = Affective, and DEP-P = Physiological); MAN = Mania (MAN-A = Activity Level, MAN-G = Grandiosity, and MAN-I = Irritability); PAR = Paranoia (PAR-H = Hypervigilance, PAR-P = Persecution, and PAR-R = Resentment); SCZ = Schizophrenia (SCZ-P = Psychotic Experiences, SCZ-S = Social Detachment, and SCZ-T = Thought Disorder); BOR = Borderline Features (BOR-A = Affective Instability, BOR-I = Identity Problems, BOR-N = Negative Relationships, and BOR-S = Self-Harm); ANT = Antisocial Features (ANT-A = Antisocial Behaviors, ANT-E = Egocentricity, and ANT-S = Stimulus Seeking); ALC = Alcohol Problems; and DRG = Drug Problems. For PAI treatment consideration scales (bolded), AGG = Aggression (AGG-A = Aggressive Attitude, AGG-V = Verbal Aggression, and AGG-P = Physical Aggression); SUI = Suicidal Ideation; STR = Stress; NON = Nonsupport; and RXR = Treatment Rejection. For PAI interpersonal scales (bolded), DOM = Dominance and WRM = Warmth. For PAI validity scales, ICN = Inconsistency; INF = Infrequency; PIM = Positive Impression; and NIM = Negative Impression.
* p < .00094.
PAI mean scale and subscale T scores ranged from a low of 47.19 (SD = 7.45; Alcohol Problems) to a high of 61.15 (SD = 10.56; Paranoia-Hypervigilance). Participants’ average PAI T score across scales and subscales was approximately 50 (M = 52.13, SD = 9.12). PAIE-S average T scores ranged from 45.49 (SD = 10.55; Warmth) to 57.64 (SD = 8.92; Paranoia). Similar to the PAI, the mean scale and subscale T score for PAIE-S protocols was approximately 50 (M = 51.47, SD = 9.05). We again used a Bonferroni correction (p < .00094) to identify whether participants’ scale and subscale scores were significantly different across the two measures. We noted 10 such differences. On average, participants scored significantly higher on the PAI, compared to the PAIE-S, on the following: Anxiety, Anxiety-Cognitive, Anxiety-Physiological, Anxiety-Related Disorders-Traumatic Stress, Depression-Cognitive, Paranoia-Hypervigilance, Borderline Features, Borderline Features-Affective Instability, and Borderline Features-Negative Relationships. They scored significantly lower on the PAI Aggression-Aggressive Attitude subscale compared to the PAIE-S.
To explore why T scores varied across the two measures, we conducted post hoc analyses of PAI and PAIE-S raw score item endorsements using p < .00094 to detect statistical significance. We did not compare findings across Inconsistency scores given that the PAIE-S has twice as many Inconsistency items as the PAI. Results showed that mean T scores were significantly higher on the PAI relative to the PAIE-S but that mean raw scores were not significantly different for the Anxiety and Borderline Features scales and the Anxiety-Cognitive, Anxiety-Physiological, Anxiety-Related Disorders-Traumatic Stress, Depression-Cognitive, Paranoia-Hypervigilance, and Borderline Features-Affective Instability subscales. Mean Aggression-Aggressive Attitude T scores were significantly higher on the PAIE-S than on the PAI, but there was no significant difference across Aggression-Aggressive Attitude raw scores. Borderline Features-Negative Relationships subscale mean T and raw scores were both significantly higher on the PAI relative to the PAIE-S. The Warmth scale was the only scale that showed significant differences between mean raw scores, but not mean T scores, with participants typically endorsing more items on the PAI relative to the PAIE-S.
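A pattern in which T scores differ while raw scores do not is what one would expect when two measures are normed on different samples. The sketch below illustrates this, assuming a simple linear T transformation (M = 50, SD = 10); the normative means and standard deviations are invented for the example and are not the PAI or PAIE-S norms.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score (M = 50, SD = 10) against given norms."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


raw = 20  # the same hypothetical raw score on both measures

# Hypothetical norms: the second normative sample endorses more items.
print(linear_t_score(raw, norm_mean=15, norm_sd=5))   # 60.0
print(linear_t_score(raw, norm_mean=18, norm_sd=5))   # 54.0
```

An identical raw score yields a lower T score when the normative sample itself scores higher on the scale, so T-score differences across measures can reflect the norms rather than how respondents answered.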
Convergence
Table 3 also shows the convergence statistics for participants who completed valid protocols for both measures (n = 53). Nearly all scales converged at p < .00094. The Antisocial Features-Egocentricity subscales converged at p = .0011, which did not meet the corrected threshold. Inconsistency and Infrequency scales had the lowest convergence statistics, with r = .19 and r = .20, respectively. Neither relationship achieved statistical significance.
Discussion
Cultural and linguistic diversity is of growing importance in the assessment field. In particular, there is a pressing need to study measures among Latin Americans and Spanish-speakers given the anticipated population increase and professional assessment guidelines. We explored the internal consistency and convergent validity of the PAI and PAIE-S with Latin American Spanish-speakers. Our study is in the spirit of test development and multicultural guidelines by the APA (American Psychological Association, 2017) and the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014) and builds on the limited research in this area.
Similar to Rogers et al.’s (1995) study of the PAISE with Mexican Americans, we found an unusually high number of invalid protocols relative to Morey (1991–2007). Participants who produced invalid protocols most often scored above recommended validity cut scores on the PAI Inconsistency or PAIE-S Infrequency scales. We recognize that participants may not have fully attended to the measure, but we also consider linguistic issues as a factor. Morey (2003) noted that reading difficulties and confusion on the PAI are among the primary reasons for obtaining high Inconsistency and Infrequency scores. The interaction between scale content and cultural factors (e.g., differences in Latin American and Spanish samples) may also play a role. When looking across the measures for all participants who took both measures, Negative Impression Management had perfect agreement on scale validity. Inconsistency and Positive Impression Management scales also had good agreement, with Infrequency having the least.
With respect to internal consistency, approximately 72% of PAI scales and 35% of PAI subscales met our cutoff of .70 for acceptable internal consistency. Findings were not as encouraging for PAIE-S protocols: 50% of scales and 26% of subscales had alphas at or above .70. In general, mean inter-item correlations were largely similar to those of Morey’s (1991–2007) census sample and the PAIE-S normative sample. However, internal consistencies were considerably lower than those of the PAI and PAIE-S validation samples. This suggests that a substantial number of PAI and PAIE-S scales and subscales may not adequately measure unitary constructs and brings into question their appropriateness for use with Latin American bilinguals.
As validity is the first step in protocol interpretation, we paid special attention to the measures’ validity scales. We noted some differences in alphas across PAI and PAIE-S Inconsistency scales. The PAI’s relatively lower alpha could be partly attributed to the number of each scale’s test items, given that the PAI consists of 10 inconsistent pairs while the PAIE-S consists of 20, and internal consistency increases with the number of test items. The PAI Infrequency scale’s alpha was the lowest of all scale and subscale alpha values, and the PAIE-S alpha was relatively low as well. These findings could indicate that Inconsistency and Infrequency item endorsements were not as unusual among our sample as developers intended and that the scales do not capture particularly unusual item endorsement. These differences may be attributable to the sample’s unique characteristics relative to those of other research samples. Alternatively, and perhaps more likely, it may simply be that Inconsistency and Infrequency scales do not capture theoretically meaningful constructs and by definition would have low internal consistency (Morey, 1991–2007). Negative Impression Management had relatively low alphas. It may be that our sample’s expressions of negative self-evaluation and distress differ from those of the PAI and PAIE-S normative samples. If this is the case, it is unclear why its intended opposite scale, Positive Impression Management, had relatively strong and consistent alphas.
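The link between scale length and internal consistency can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that is introduced here only for illustration and was not part of our analyses. The sketch assumes the added items are parallel to the existing ones, and the starting reliability of .50 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a hypothetical scale whose reliability is .50.
print(round(spearman_brown(0.50, 2), 2))   # 0.67
```

Under these assumptions, doubling the number of items raises the expected reliability even when the items themselves are no more strongly related to one another.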
Although internal consistency was not ideal for a number of scales, several clinical scales stood out as particularly problematic, with α ≤ .40. Mania-Activity Level fell at or below this cutoff for both measures, as did Antisocial Features-Antisocial Behaviors for the PAIE-S and the Stress scale for the PAI. We suspect that there is a common theme among these scales: the active and risk-taking behaviors of early adulthood (the mean age of our sample), which, along with the stress of college and/or acculturation, could interfere with these scales’ intended constructs. It is unclear why internal consistency would be low for PAIE-S Antisocial Features-Antisocial Behaviors but not for its PAI counterpart, and why alpha for the PAI Stress scale would be so much lower than for the PAIE-S Stress scale.
Some of our findings that differ from past research may be explained by differences in sampling and methodology. We found lower internal consistency on average for PAI scales than Rogers et al. (Reference Rogers, Flores, Ustad and Sewell1995); however, their sample originated from Mexico, while our sample was more diverse, with only 5.50% of our sample identifying as of Mexican descent. The notable heterogeneity among Hispanic cultures (Sue & Sue, Reference Sue and Sue2012) may have been reflected in our sample’s item endorsement relative to Rogers et al.’s sample of Mexican participants. Unfortunately, sample size limited exploring PAI and PAIE-S scores as a function of country of origin in our study. Other possible explanations for differences in the PAI’s internal consistency compared to Rogers et al. (Reference Rogers, Flores, Ustad and Sewell1995) may be the age (younger), sex (largely female), education (college attendance), and context (college versus clinical setting) of our sample, as well as changing societal norms over the course of 20 years. Our sample was much younger than samples in other PAIE-S studies (Ortiz-Tallo et al., Reference Ortiz-Tallo, Cardenal, Ferragut and Santamaría2015; Stover et al., Reference Stover, Solano and Liporace2015).
Despite relatively low internal consistency, T scores for Infrequency, Mania-Activity Level, Antisocial Features-Antisocial Behaviors, and Stress did not significantly differ across measures, suggesting some overlap in these constructs. Findings for mean PAI Positive and Negative Impression Management T scores were consistent with those of Fernandez et al. (Reference Fernandez, Boccaccini and Noland2008), in that they were similar to the PAI’s normative sample. Also in line with past research, participants in this study tended to obtain consistently low T scores on the Warmth scale and high T scores on Paranoia subscales. These findings closely mirror those of others (Ortiz-Tallo et al., Reference Ortiz-Tallo, Cardenal, Ferragut and Santamaría2015; Ortiz-Tallo et al., Reference Ortiz-Tallo, Santamaría, Cardenal and Sánchez2011) and suggest that Latin American, South American, and European-Spanish subgroups may report relatively low Warmth characteristics and relatively high, but not necessarily clinically elevated, symptoms and experiences associated with paranoia. Indeed, some researchers have reported relatively elevated paranoia scores in non-U.S. samples (Groves & Engel, Reference Groves and Engel2007).
There were still a number of scales and subscales on which we observed significantly different profile scores. Specifically, we found that on average, participants obtained significantly higher T scores on PAI Anxiety, Anxiety-Cognitive, Anxiety-Physiological, Anxiety-related Disorders-Traumatic Stress, Depression-Cognitive, Paranoia-Hypervigilance, Borderline Features, Borderline Features-Affective Instability, and Borderline Features-Negative Relationships. They obtained significantly lower T scores on the PAI Aggression-Aggressive Attitude subscale compared to the PAIE-S. For each of these significant standardized T differences, we did not find remarkable discrepancies at the raw score item endorsement level. These findings imply that, for each of the significantly different T scores on the aforementioned scales and subscales, the underlying characteristics and/or experiences were more common in one normative sample than in the other. For example, there appears to be a theme of anxiety throughout the scale and subscale constructs on which participants scored significantly higher on the PAI. This suggests that anxiety-related themes are more common among our sample than in the PAI’s normative sample. This may mean that the PAI is either more sensitive to anxiety-related themes in our sample or is overpathologizing our participants. On the other hand, it may also mean that the PAIE-S is underpathologizing our sample or is more reflective of cultural norms. Their higher PAI scores may be attributed to characteristics of Hispanic cultures, such as culturally normative anxiety, potentially resulting from the effects of social marginalization, which is more salient on an English-language measure (Gamst et al., Reference Gamst, Dana, Der-Karabetian, Aragón, Arellano and Kramer2002; Hiott, Grzywacz, Arcury, & Quandt, Reference Hiott, Grzywacz, Arcury and Quandt2006; Sue & Sue, Reference Sue and Sue2012).
However, research indicates that bilinguals tend to experience emotion more strongly in native rather than foreign languages (Caldwell-Harris, Reference Caldwell-Harris2015). Then again, in some instances, Latin Americans tend to underreport anxiety and depression (Bell et al., Reference Bell, Franks, Duberstein, Epstein, Feldman, y Garcia and Kravitz2011; Leung, LaChapelle, Scinta, & Olvera, Reference Leung, LaChapelle, Scinta and Olvera2014). Taking these findings under consideration, it is unclear precisely why our bilingual participants tended to score higher on the PAI’s anxious themes. While it may be that the PAI overpathologized our sample in some respects, it may be also that the PAIE-S underpathologized the sample.
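To make concrete how similar raw scores can yield different T scores, the sketch below assumes a linear T-score transformation, T = 50 + 10 × (raw − normative mean) / normative SD, and applies it to one raw score under two sets of norms. All normative means and standard deviations here are hypothetical values chosen for illustration; they are not the published PAI or PAIE-S norms.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T score: mean 50 and SD 10 in the normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


RAW_SCORE = 20  # the same hypothetical raw scale score on both measures

# Hypothetical norms: the trait is less common in sample A than in sample B.
t_norms_a = linear_t_score(RAW_SCORE, norm_mean=15, norm_sd=8)
t_norms_b = linear_t_score(RAW_SCORE, norm_mean=19, norm_sd=8)

print(f"T under norms A: {t_norms_a:.2f}")
print(f"T under norms B: {t_norms_b:.2f}")
```

Under these assumptions the identical raw score is about half a standard deviation above the mean in one normative sample and close to the mean in the other, which is the pattern one would expect when item endorsement is similar but the reference groups differ.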
Virtually all scales (aside from Inconsistency and Infrequency) and subscales converged at our Bonferroni-corrected significance level, with the exception of Antisocial Features-Egocentricity. While Ortiz-Tallo and colleagues (2015) found that Chileans were significantly more likely to endorse items that produced higher PAIE-S Mania scores compared to their normative sample, this was not the case in our study. Instead, our participants tended to endorse PAI items broadly related to anxiety. Taken together, findings support the view that socially normative thoughts, feelings, and behaviors may vary across Hispanic subcultures, or at least across language format. It could be that these demographic groups differ in their experience of mental health symptoms, their likelihood of reporting psychiatric distress, or other unknown factors that may influence measurement outcomes. Our findings may also be impacted by sample size and participant attrition. We retained two-thirds of the small pool of eligible participants for our convergence analyses, and only two-thirds of that sample produced valid protocols.
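As a reminder of how a Bonferroni correction operates, the sketch below divides a family-wise alpha by the number of tests and flags which correlations remain significant. The scale names, p values, and the family-wise alpha of .05 are hypothetical placeholders rather than values from our analyses.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


def significant_after_bonferroni(p_values, family_alpha=0.05):
    """Map each test name to True if its p value falls below the corrected threshold."""
    threshold = bonferroni_threshold(family_alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p values for four PAI/PAIE-S convergence correlations.
p_values = {
    "Scale A": 0.0004,
    "Scale B": 0.0090,
    "Scale C": 0.0300,
    "Scale D": 0.0002,
}

results = significant_after_bonferroni(p_values)
for name, is_significant in results.items():
    print(f"{name}: significant after correction = {is_significant}")
```

In this illustration a p value of .03 would pass an uncorrected .05 criterion but not the corrected one, which shows why a scale can fail to converge under the correction even when its uncorrected correlation appears reliable.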
Implications for Practice and Research
Practitioners may use these results to inform their test selection and interpretation. Fernandez et al. (Reference Fernandez, Boccaccini and Noland2007) encourage practitioners to consider their testing options, the research regarding these tests, the tests’ applicability to clients, and their confidence in the testing process. Our findings suggest that practitioners should take extra care when assuming a test’s generalizability across cultural subgroups. Notably, participants in our sample were all enrolled in an English-speaking four-year college. In clinical practice, some practitioners may assume that English-speaking college students are entirely English proficient and that they may appropriately administer and interpret the PAI using its normative scores. We did not screen for English proficiency. It could be that some participants were not as strong in their English reading comprehension as we anticipated, which could have influenced our findings. We encourage practitioners to consider linguistic preferences and proficiencies in their test selection and to use tools to assess reading level before administering English-language tests to bilinguals. In addition, researchers may wish to explore whether high Paranoia scale and subscale scores are common among Hispanic individuals residing in the U.S. Perceived prejudice and discrimination could drive higher reported paranoia, as these elevations can indicate suspiciousness, mistrust, hypervigilance, and beliefs that one has been mistreated.
Our study, in conjunction with others (Ortiz-Tallo et al., Reference Ortiz-Tallo, Cardenal, Ferragut and Santamaría2015; Ortiz-Tallo et al., Reference Ortiz-Tallo, Santamaría, Cardenal and Sánchez2011; Rogers et al., Reference Rogers, Flores, Ustad and Sewell1995; Stover et al., Reference Stover, Solano and Liporace2015), suggests that the PAI and PAIE-S sometimes function differently across Spanish-speaking subgroups. Continued research is needed to expand our understanding of personality assessment as it pertains to diverse groups of Spanish-speakers. Future researchers, using larger sample sizes, should examine the factor structures of PAI and PAIE-S scales and subscales among English/Spanish bilinguals both in the community and in clinical contexts. In a small study of community-dwelling Latin American English/Spanish bilinguals, Obando, Pearson, Kois, and Chauhan (Reference Obando, Pearson, Kois and Chauhan2016, August) found PAI and PAIE-S psychometric properties similar to those identified here. Their sample was notably older (M = 50) and approximately half male and half female. As such, early evidence of generalizability has emerged and clinical replication research is warranted.
We encourage researchers to assess whether country of origin is significantly associated with differences in PAI/PAIE-S protocols. There are several characteristics of our data that, regrettably, limit this approach. Although a country-by-country analysis is ideal, we could not conduct these statistics given the small cell sizes. We would face sample size issues even if we collapsed Central and South American participants into one group and compared them to U.S.-born participants. Although 46% of the sample was born outside of the U.S., many of them moved to the U.S. as young children, whereas many others moved in adolescence or adulthood (M age at move to U.S. = 11.27, SD = 5.37, 95% CI [0.53, 22.01 years]). We felt that it was inappropriate to collapse the hypothetical participant who moved to the U.S. at the age of 1 with a participant who moved to the U.S. at age 20.
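The interval reported above for age at move to the U.S. matches what one obtains from M ± 2 SD, a range that describes the spread of individual participants rather than uncertainty about the mean. The sketch below reproduces that arithmetic under this assumption; it is an observation about the reported numbers, not a statement of how the interval was originally derived.

```python
MEAN_AGE_AT_MOVE = 11.27  # M, in years, as reported above
SD_AGE_AT_MOVE = 5.37     # SD, in years, as reported above

# Assumption: the interval spans two standard deviations either side of the mean.
lower = MEAN_AGE_AT_MOVE - 2 * SD_AGE_AT_MOVE
upper = MEAN_AGE_AT_MOVE + 2 * SD_AGE_AT_MOVE

print(f"M - 2 SD = {lower:.2f} years")
print(f"M + 2 SD = {upper:.2f} years")
```

Read this way, the range runs from infancy to early adulthood, which is consistent with the concern about combining participants who moved at very different developmental stages.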
An item-level analysis, using a larger sample size, will be important to help clarify which items are problematic across groups. This is particularly the case for those scales and subscales on which participants scored significantly differently across measures (e.g., on themes of anxiety and aggression). For example, the Anxiety and Borderline Features scales both had multiple subscales on which participants scored significantly higher on the PAI relative to the PAIE-S. Discriminant validity studies may help discern whether the measures adequately capture Anxiety and Borderline constructs and elucidate potential over/underpathologizing by the PAI and/or PAIE-S.
In addition to validation studies, acculturation is an important factor to consider in future work. Acevedo-Polakovich et al. (Reference Acevedo-Polakovich, Reynaga-Abiko, Garriott, Derefinko, Wimsatt, Gudonis and Brown2007) remind practitioners to be attentive to clients’ immigration histories, acculturation, and acculturative stress. Indeed, researchers have identified associations between personality assessment outcomes and acculturative status across various racial and ethnic groups (Chang & Smith, Reference Chang and Smith2015; Tsai & Pike, Reference Tsai and Pike2000; Wong, Correa, Robinson, & Lu, Reference Wong, Correa, Robinson and Lu2017). Further research should include acculturative measures and explore whether test properties vary according to acculturation status. Regarding study methodology, future researchers can consider whether administration order is associated with participants’ PAI/PAIE-S profiles. Lastly, future researchers may consider whether test-retest reliability is an issue, given that both measures tap into similar constructs.
Practitioners should continue to closely consider the psychometric properties of available tests while researchers explore the best means to assess Spanish-speaking populations. We found that participants in our study frequently scored higher on PAI scales and subscales than on their PAIE-S counterparts, and that PAI and PAIE-S scales and subscales generally exhibited inadequate internal consistency when administered to our sample of Spanish-speaking participants. These findings call into question whether either measure is entirely appropriate for use with similar samples. It is our hope that these results encourage further research and test development in this area. By continuing to explore testing issues among Latin Americans and Spanish speakers, the practitioner and research communities will do service to increasingly diverse populations who deserve culturally sensitive psychological assessment.