INTRODUCTION
Fetal alcohol spectrum disorder (FASD) comprises a range of impairments stemming from prenatal alcohol exposure (PAE), including neurocognitive deficits, problems regulating affect and behaviour, and, in some cases, characteristic dysmorphic facial features and growth restriction (Cook et al., 2016; Hoyme et al., 2016; Mattson, Bernes, & Doyle, 2019). Individuals with FASD experience high rates of additional adversities and comorbid conditions (e.g., McLachlan et al., 2016; Pei, Denys, Hughes, & Rasmussen, 2011; Popova et al., 2016; Streissguth et al., 2004). North American FASD prevalence estimates range from 2% to 5%, with higher rates in forensic and criminal justice contexts, where prevalence estimates range from 10% to 36% (May et al., 2014, 2018; Popova et al., 2018; Popova, Lange, Shield, Burd, & Rehm, 2019).
A number of factors have been proposed to explain the increased risk of criminal justice contact among individuals with FASD, including the nature of neurocognitive deficits, high rates of early adversities, and challenges related to diagnosis (see Currie, Hoy, Legge, Temple, & Tahir, 2016; Flannigan, Pei, Stewart, & Johnson, 2018).
Cognitive impairment is a defining characteristic of FASD, though notable variability of relative deficits and strengths is often seen in this population (Ali, Kerns, Mulligan, Olson, & Astley, 2018; Mattson et al., 2019; Mattson, Crocker, & Nguyen, 2011). Specific cognitive impairments may include deficits in attention, executive functioning, language, memory, learning, communication, and intellectual functioning (see reviews by Kodituwakku & Kodituwakku, 2014; Mattson et al., 2019). FASD is often underrecognized owing to many factors, including the heterogeneous nature of the disability, sometimes preserved overall intellectual functioning, and compensatory skills that mask underlying neurocognitive deficits (Ali et al., 2018; Astley, 2010; Kodituwakku & Kodituwakku, 2014; Mattson et al., 2011).
There are currently several approaches to the diagnosis and classification of individuals with FASD in North America. Across guidelines, best practices include comprehensive assessment undertaken by a trained interdisciplinary team (e.g., Astley, 2004; Coles et al., 2016; Cook et al., 2016; Hoyme et al., 2016). However, given elevated rates of FASD in special populations, it is likely that clinicians working in both general health and forensic/correctional settings will encounter individuals with PAE and/or FASD in their usual practice. This is also likely to occur with increased frequency given the addition of FASD to the Diagnostic and Statistical Manual of Mental Disorders¹ (DSM-5; American Psychiatric Association, 2013, p. 86) and the International Classification of Diseases² (ICD-10; World Health Organization, 2007), and as knowledge about the disability continues to increase (e.g., Mukherjee, Hollins, & Turk, 2006).
Performance Validity Testing
Performance validity tests (PVTs) form an important aspect of neuropsychological and cognitive assessment. This is particularly true in medicolegal, correctional, and forensic contexts, where there may be greater incentive to mislead an examiner in order to increase potential financial compensation or decrease legal consequences. Correspondingly, higher base rates of non-credible responding are often observed in these populations (Ardolf, Denney, & Houston, 2007; Bush, Heilbronner, & Ruff, 2014; Bush et al., 2005; Larrabee, 2003). PVTs should be sensitive to effort, while remaining robust to the effects of cognitive impairment in order to accurately differentiate individuals with true deficits from those with non-credible responding (Bain & Soble, 2017; Dwyer, 1996). Best practice guidelines underscore the importance of selecting PVTs with established psychometric properties for both examinee and setting (e.g., clinical, medicolegal) and suggest using multiple measures over the course of an evaluation (Bush et al., 2014, 2005). These considerations are particularly critical in legal contexts, where any psychological measures introduced during court proceedings must meet standards for evidentiary admissibility (e.g., Daubert v. Merrell Dow Pharmaceuticals, 1993; R. v. Peters, 2011; R. v. Mohan, 1994).
Many PVTs have demonstrated sound psychometric properties in both clinical and forensic settings. However, their classification accuracy tends to be lower in populations marked by severe neurocognitive deficits, including individuals with intellectual disability (ID), dementia, Alzheimer’s disease, and traumatic brain injury (TBI) (Bain & Soble, 2017; Bain et al., 2019; Bigler, 2012; Dean, Victor, Boone, & Arnold, 2008; Glassmire, Wood, Ta, Kinney, & Nitch, 2019; Merten, Bossink, & Schmand, 2007; Zenisek, Millis, Banks, & Miller, 2016). For instance, participants with confirmed neurocognitive deficits often present with lower overall scores and high failure rates using established cutoff scores on PVTs, including Reliable Digit Span (RDS), Word Memory Test (WMT), Coding age-corrected scaled score (CD ACSS), Symbol Search ACSS (SS ACSS), and Coding-Symbol Search ACSS (CD-SS ACSS) (Dean et al., 2008; Erdodi et al., 2017; Merten et al., 2007; Zenisek et al., 2016). They also tend to fail more PVTs when multiple measures are administered, compared to individuals without severe neurocognitive deficits (Dean et al., 2008; Merten et al., 2007; Zenisek et al., 2016).
Consistent with this, commonly used PVTs, such as RDS and Logical Memory II Recognition (LM-II-R), have often demonstrated inadequate specificity (defined as less than 90%; Boone, 2013) and/or sensitivity (above 40% but closer to 70%; Boone, 2013) for identifying non-credible responding in individuals with neurocognitive deficits (Bain et al., 2019; Dean et al., 2008; Schroeder, Twumasi-Ankrah, Baade, & Marshall, 2012).
There is some evidence to suggest that PVT performance is associated with overall intellectual ability, where samples with low IQ perform worse on commonly used PVTs and produce more failing scores, compared to those with preserved overall intellectual functioning (Dean et al., 2008; Glassmire et al., 2019; Merten et al., 2007; Zenisek et al., 2016). However, low IQ on its own may be insufficient to explain failure on PVTs, suggesting that other mechanisms may contribute to poor performance in populations with high failure rates, such as a cumulative or interactive combination of cognitive deficits (Flaro, Green, & Robertson, 2007; Green & Flaro, 2015; Love, Glassmire, Zanolini, & Wolf, 2014; Shandera et al., 2010; Simon, 2007). False positives have adverse clinical and practical implications, as they may influence diagnostic accuracy, and consequently prevent access to appropriate treatment and services. As a result, some guidelines suggest that certain clinical groups, including individuals with ID and dementia, be exempt from PVTs (Boone, 2013).
Performance Validity Testing and FASD
Given the severe neurocognitive deficits linked with FASD, there may be an increased risk of improperly identifying individuals with the disability as non-credible responders when using PVTs. While experts caution against the use of PVTs for individuals with neurocognitive deficits, limited knowledge about FASD among practitioners, coupled with high rates of ‘missed diagnosis’ and the relative invisibility of the disability, suggests that clinicians may unknowingly use PVTs with this population (Astley, 2010; Cox, Clairmont, & Cox, 2008; May et al., 2018; Sokol, Delaney-Black, & Nordstom, 2003). However, to our knowledge, there is limited evidence to support PVT validity among adults with FASD, or among justice-involved adults, despite high rates of cognitive impairment and frequent PVT use in these populations (Farrer & Hedges, 2011; Hellenbach, Karatzias, & Brown, 2017; LaDuke, Brodale, & Rabin, 2016). In the FASD diagnostic context, inaccurate identification of invalid performance may lead to missed diagnosis, poor understanding of cognitive deficits and needs, and limited access to appropriate services. In legal contexts, consequences may be particularly serious and could include lengthier incarceration terms or inability to access appropriate defense or legal safeguards (e.g., fitness to stand trial). Research examining PVTs in cognitively impaired populations provides a helpful starting point for considering FASD PVT validity.
However, FASD may be distinguished from other neurodevelopmental and cognitive disorders based on phenotypic variability, often preserved overall intellectual ability, high comorbidity with physical and mental health conditions, and high rates of criminal justice involvement, highlighting the need for focused study in this group (Pei et al., 2011; Popova et al., 2016; Streissguth et al., 2004).
A limited number of studies have examined PVT performance patterns among individuals with FASD. Two studies focusing on the WMT and Medical Symptom Validity Test (MSVT; Green, 2003) have shown that both children and adults with FASD performed better on these measures and failed less frequently compared to those with mild TBI (Green, Montijo, & Brockhaus, 2011; Larson et al., 2015). Similarly, within the WMT standardization sample, a subset of 19 youth with fetal alcohol syndrome scored above the clinical cutoff on all three effort subtests, on average, suggesting valid performance (Green, 2003). While these findings may provide preliminary support for the valid use of the WMT in children and adolescents with FASD, research examining PVT performance in adults with FASD is limited. Moreover, given the importance of using multiple measures to assess PVT validity, additional research is needed to disentangle performance patterns in this population across a wider range of measures.
CURRENT STUDY
The current study sought to evaluate the validity of 10 commonly used PVT scores in a sample of justice-involved adults diagnosed with FASD or possible FASD, compared to a control group of adults in the criminal justice system (CJS) who did not have FASD. Based on findings from other neurocognitively compromised populations, we expected that individuals with diagnosed and possible FASD would show worse performance and higher failure rates on PVTs, compared to those without FASD (e.g., Dean et al., 2008; Merten et al., 2007).
METHOD
Participants
Data for this study were drawn from a larger project (McLachlan et al., Reference McLachlan, McNiel, Pei, Brain, Andrew and Oberlander2019). Participants included 80 justice-involved adults from a Northern Canadian correctional jurisdiction. Participants were consecutively recruited from both jail and community-based criminal justice settings and each had current legal involvement or were in custody either pre- or post-adjudication. Recruitment occurred over an 18-month period, using information sessions, posters, and direct referrals by probation officers and case managers. In total, 174 prospective participants were approached by the research team, 45 declined participation, and 50 were deemed ineligible, primarily due to age >40 years or discontinued contact with the research team. Individuals who were under a review board supervision order or considered medically or psychiatrically unstable were also excluded from this study.
Participants provided written informed consent, and study procedures were reviewed and approved by the Children’s and Women’s Research Ethics Board at the University of British Columbia, and Research Ethics Board at the University of Guelph. Although participants were provided an incentive commensurate with time spent participating in the study, their performance on this assessment was not tied to a clinical or legal outcome, thereby potentially diminishing external gains associated with intentional non-credible responding.
Overall, the sample was predominantly male and ranged from 18 to 40 years (M = 29.38, SD = 5.34) (Table 1). Participants were primarily assessed in a correctional facility (n = 70, 87.5% incarcerated) and two-thirds were awaiting adjudication at the time of study (n = 50, 62.5%). In total, 14 participants (17.5%) were diagnosed with FASD, and FASD was ruled out in 55 cases (69%). Another 11 individuals (13.8%) presented with neurocognitive impairment consistent with FASD; however, an FASD diagnosis could not be confirmed owing to inadequate information concerning PAE in most cases (required for diagnosis). Given similar neurocognitive presentations (e.g., cognition, academic skills, attention, memory, executive function, adaptive skills) between the confirmed and possible FASD groups, they were combined in the current study. Therefore, the FASD group includes 25 individuals with a confirmed or possible FASD diagnosis, while the criminal justice (CJ) group includes 55 individuals for whom a diagnosis of FASD was ruled out.
Table 1. Demographic characteristics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000132:S1355617720000132_tab1.png?pub-status=live)
WAIS-IV = Wechsler Adult Intelligence Scale-IV (Wechsler, Coalson, & Raiford, 2008). IQ = Full Scale Intelligence Quotient.
a N = 77 as raw IQ scores were not available for three participants.
*p < .05, **p < .001.
Procedure
Participants completed a comprehensive evaluation for FASD undertaken by a multidisciplinary team that adhered to the 2005 Canadian Diagnostic Guidelines for FASD³ (Chudley et al., 2005). This included a semi-structured interview canvassing personal, social, and medical history, analysis of three-digit facial photographs for sentinel facial features, medical assessment, and a comprehensive psychological assessment completed by psychologists with supervision from expert neuropsychologists. Features of FASD (e.g., growth restriction, facial features, neurocognitive deficits, and PAE) were ranked and identified according to recommended cutoff scores, and diagnostic decisions were made following an interdisciplinary case conference (Chudley et al., 2005). The larger cognitive test battery included 10 PVT scores from both stand-alone and embedded measures, including the WMT, Genuine Memory Impairment Profile (GMIP), RDS, Digit Span age-corrected scaled score (DS ACSS), CD ACSS, SS ACSS, CD-SS ACSS, Vocabulary-Digit Span ACSS (VC-DS ACSS), LM-II-R, and Word Choice (WC).
Measures
Word Memory Test (WMT; Green, 2003)
The WMT is a stand-alone PVT comprising multiple effort indicators in the context of a verbal memory task. In the current study, three WMT subtests were administered: Immediate Recall (IR), Delayed Recall (DR), and Consistency (CNS). IR and DR measure an individual’s ability to remember a list of 20 word-pairs immediately after exposure (IR) and at a 30-minute delay (DR), whereas CNS provides a measure of response consistency from IR to DR. A score <82.5% correct on IR, DR, or CNS is classified as failure (Green, 2003). Research suggests that the primary WMT classification decision is relatively insensitive to neurological diseases and memory impairment (Green, 2003). In the current study, we applied the standard <82.5% cutoff score for IR, DR, or CNS. However, the WMT classification was not used to calculate the total number of measures failed; GMIP was used in its place.
Genuine Memory Impairment Profile (GMIP; Green et al., 2011)
GMIP is an alternative WMT criterion designed to reduce false positives in cognitively impaired populations by differentiating performance below standard cutoff scores (Alverson, O’Rourke, & Soble, 2019; Rienstra, Twennaar, & Schmand, 2013). Criteria for an invalid GMIP profile involve failure on ≥1 effort subtest (IR, DR, CNS) and a discrepancy ≥30 between the means of WMT effort and memory subtests (multiple choice, paired associates, free recall) (Green et al., 2011). Research suggests that the GMIP results in lower failure rates than the WMT, and adequate specificity and sensitivity in clinical samples with mild cognitive impairment (Alverson et al., 2019; Green et al., 2011). In the current study, we applied the standard cutoff criterion involving <82.5% on IR, DR, or CNS and ≥30 discrepancy, which was used to calculate the total number of measures failed. We also applied adjusted discrepancy criteria of ≥35, 40, and 45.
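The criterion described above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in this section, not the scoring procedure of the WMT itself: the function name `gmip_invalid`, the direction of the discrepancy (mean of effort subtests minus mean of memory subtests), and the example scores are all assumptions made for the sketch.

```python
from statistics import mean

EFFORT_CUTOFF = 82.5  # percent correct; standard WMT cutoff stated above


def gmip_invalid(effort, memory, discrepancy_cutoff=30.0):
    """Apply the GMIP criterion as worded in this section (hypothetical helper).

    effort: percent-correct scores for IR, DR, CNS.
    memory: percent-correct scores for multiple choice, paired associates, free recall.
    """
    failed_effort = any(score < EFFORT_CUTOFF for score in effort)
    # Assumption: discrepancy = mean(effort subtests) - mean(memory subtests).
    discrepancy = mean(effort) - mean(memory)
    return failed_effort and discrepancy >= discrepancy_cutoff


# Hypothetical examinee (illustrative values, not study data).
effort_scores = (70.0, 75.0, 65.0)  # IR, DR, CNS
memory_scores = (40.0, 35.0, 30.0)  # MC, PA, FR

# Classification under the standard (30) and adjusted (35, 40, 45) criteria.
results = {cutoff: gmip_invalid(effort_scores, memory_scores, cutoff)
           for cutoff in (30, 35, 40, 45)}
print(results)
```

With a discrepancy of 35 points, this hypothetical examinee meets the criterion at the 30- and 35-point thresholds but not at 40 or 45, which shows how raising the discrepancy criterion can only lower the number of profiles classified as invalid.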
Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler et al., 2008)
The WAIS-IV is an overall measure of intellectual functioning for adults. In the current study, six commonly used embedded WAIS-IV PVT scores were considered, including RDS, DS ACSS, CD ACSS, SS ACSS, CD-SS ACSS, and VC-DS ACSS. Digit Span (DS) provides a measure of attention and working memory, from which a commonly used PVT can be derived by combining the total number of digits correctly recalled on two successive trials of both DS forward and DS backward. Previously, RDS ≤7 was considered indicative of invalid performance (Axelrod, Fichtenberg, Millis, & Wertheimer, 2006; Schroeder et al., 2012; Zenisek et al., 2016). More recent research suggests that a cutoff ≤6 yields improved specificity, though caution is advised in populations with ID and severe memory impairment (Schroeder et al., 2012; Webber & Soble, 2018). Using a mean cutoff score of 7.1 across 24 studies, RDS has shown moderate sensitivity (63%), good specificity (86%), and an overall hit rate of 76% in distinguishing valid from suboptimal effort (Jasinski, Berry, Shandera, & Clark, 2011). In the current study, we applied the ≤7 cutoff, which was used to calculate the total number of measures failed among participants, as well as the ≤6 cutoff for exploratory purposes.
DS ACSS has also been applied as a PVT, with scaled scores ≤5 suggesting invalid performance (Webber & Soble, 2018). Evidence suggests that DS ACSS may be as effective as RDS and potentially superior among older (e.g., 39–69) clinical groups and those at higher risk of neurocognitive impairment (Jasinski et al., 2011; Reese, Suhr, & Riddle, 2012; Spencer et al., 2013, 2017; Webber & Soble, 2018). In a sample of veterans referred to a neuropsychological clinic, DS ACSS significantly predicted group membership (i.e., valid vs. invalid performance) with an AUC of .85 (Webber & Soble, 2018). In the current study, we applied the ≤5 cutoff score for DS ACSS.
Additional WAIS-IV ACSS have been evaluated as embedded PVTs, including Coding (CD) and Symbol Search (SS), both measures of processing speed. CD ACSS ≤5 has shown good specificity (.90–1.00), but low and variable sensitivity (.04–.64) for identifying invalid performance in mixed clinical samples (Erdodi et al., 2017; Erdodi & Lichtenstein, 2017). SS ACSS scores ≤6 have shown similarly variable sensitivity (.38–.64) and good specificity (.88–.93) in a mixed clinical sample (Erdodi et al., 2017), though other studies have found that SS ACSS failed to reach minimum specificity against other validated PVTs (Erdodi & Lichtenstein, 2017). CD-SS ACSS is another embedded WAIS-IV PVT calculated by taking the difference between the CD and SS ACSS. Difference scores ≥3 on this measure have been shown to yield adequate specificity (.94) in forensic patients with schizophrenia, including a subset with a general ability index (GAI) between 70 and 79 (.97) and GAI ≤ 69 (.88) (Glassmire et al., 2019). In the current study, we applied cutoffs of ≤5 for CD ACSS, ≤6 for SS ACSS, and ≥3 for CD-SS ACSS.
The WAIS-IV Vocabulary (VC) subtest measures word knowledge and verbal concept formation and has been compared with DS as a possible PVT, with VC-DS ACSS differences ≥3 reflecting invalid performance. Mittenberg et al. (1995) found that VC-DS ACSS accurately classified 71% of cases instructed to provide invalid performance. Moreover, in a sample of 151 adults referred for neuropsychological assessment, Greve and colleagues (2003) found that the measure had good sensitivity (.67) and specificity (.80) for identifying invalid performance by participants with FSIQ ≥85, but poor sensitivity for those with FSIQ <85. In the current study, we applied the standard ≥3 cutoff score for VC-DS ACSS.
Logical Memory II Recognition (WMS-IV; Wechsler, Holdnack, & Drozdick, 2009)
The Wechsler Memory Scale – Fourth Edition (WMS-IV) is a neuropsychological test designed to assess memory in adults and comprises several PVTs, including one used in the current study. LM-II-R provides a measure of delayed verbal memory using a dichotomous recognition format, and most neurologically healthy examinees perform well on this task. LM-II-R has been applied as a PVT, with unexpectedly low raw scores (≤20) indicating invalid performance (Bortnik et al., 2010). Within the WMS-IV standardization sample, fewer than 25% of the clinical sample achieved a score indicative of poor effort, resulting in an accuracy rate of 67% (Pearson Assessment, 2009). In the current study, we applied the ≤20 cutoff criterion.
Word Choice (ACS; Pearson Assessment, 2009)
Advanced Clinical Solutions (ACS) is a test battery designed to enhance the clinical utility of the WAIS-IV and WMS-IV. The ACS WC subtest is a stand-alone PVT that uses a forced-choice recognition memory paradigm. A criterion of ≤42 (raw score) on WC has shown good classification accuracy (86%) for individuals without cognitive impairment (Bain & Soble, 2017; Barhon, Batchelor, Meares, Chekaluk, & Shores, 2015; Miller et al., 2011). However, classification accuracy is thought to be lower for individuals with cognitive impairment (69%), owing to reduced sensitivity (Bain & Soble, 2017; Davis, 2014). In the current study, we applied the standard ≤42 cutoff criterion.
Data Analysis
Participant characteristics and PVT outcomes, including dichotomous (pass/fail) scores, were compared between groups using t-tests and chi-square analyses. Failure on each of the PVTs was established using standard cutoff scores (see Measures). The total number of PVTs failed was calculated by summing the number of ‘failure’ classifications across nine PVT scores, excluding WMT (see Measures). Practice guidelines suggest that failing ≥2 PVTs within a battery of measures is indicative of invalid performance, though others recommend a more stringent criterion of ≥3 (Erdodi et al., 2018; Larrabee, 2008; Victor, Boone, Serpa, Buehler, & Ziegler, 2009). As a result, we examined the number of participants failing ≥1, 2, 3, and 4 PVTs within both participant groups. We also undertook exploratory GMIP analyses to identify a potential alternative percentage point difference for classifying suboptimal effort in this sample (e.g., differences ≥35, 40, 45). Failure rates are also presented for participants with IQ ≥70 and <70, based on clinical diagnostic criteria for ID (Carr & O’Reilly, 2016). This dichotomy was used to draw comparisons between individuals with low and higher IQ but was understood to be clinically artificial given that many other considerations factor into a diagnosis of ID. Effect sizes are reported for all analyses, including Cohen’s d (small = .2, medium = .5, large = .8), and phi (small = .1, medium = .3, large = ≥.35) (Cohen, 1988). 95% confidence intervals are also reported. Statistical analyses were conducted using IBM SPSS version 25.0 for Mac.
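As an illustration of how the failure tally described above might be computed, the following Python sketch applies the cutoffs listed under Measures to a single examinee and then checks the ≥1, 2, 3, and 4 thresholds. It is a sketch under stated assumptions rather than the analysis code used in the study (which was run in SPSS): the dictionary keys, the function name `count_failures`, and the example scores are hypothetical.

```python
import operator

# Cutoffs as listed under Measures: a score "fails" when comparison(score, threshold) is True.
# Key names are hypothetical labels chosen for this sketch.
CUTOFFS = {
    "RDS": (operator.le, 7),
    "DS_ACSS": (operator.le, 5),
    "CD_ACSS": (operator.le, 5),
    "SS_ACSS": (operator.le, 6),
    "CD_SS_ACSS": (operator.ge, 3),  # difference score
    "VC_DS_ACSS": (operator.ge, 3),  # difference score
    "LM_II_R": (operator.le, 20),
    "WC": (operator.le, 42),
}


def count_failures(scores, gmip_invalid):
    """Return the names of failed PVT scores (GMIP stands in for the WMT)."""
    failed = [name for name, (fails, cutoff) in CUTOFFS.items()
              if name in scores and fails(scores[name], cutoff)]
    if gmip_invalid:
        failed.append("GMIP")
    return failed


# Hypothetical examinee (illustrative values, not study data).
participant = {
    "RDS": 8, "DS_ACSS": 5, "CD_ACSS": 4, "SS_ACSS": 7,
    "CD_SS_ACSS": -3, "VC_DS_ACSS": 2, "LM_II_R": 19, "WC": 45,
}

failed = count_failures(participant, gmip_invalid=False)
flags = {k: len(failed) >= k for k in (1, 2, 3, 4)}
print(failed, flags)
```

Scores missing from the input are simply skipped, which is one possible way of handling the missing data noted in the tables; other conventions (for example, prorating) would change the totals.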
RESULTS
Sample characteristics
There were few demographic differences between the groups, although participants with confirmed or possible FASD were, on average, 3 years younger and presented with lower average IQ than the CJ group. In addition, substantially more participants in the FASD group (n = 19, 79%) had IQ <70 on a standard measure of intellectual functioning compared to the CJ group (n = 10, 19%), χ²(1) = 25.58, p < .001, ϕ = .58. The average IQ for the CJ group was approximately one standard deviation below the general population average (M = 83.08, SD = 12.29).
PVT Performance
Participants in the FASD group performed substantially worse on most PVTs compared to both the CJ group and published scores for individuals with and without severe cognitive impairment (Table 2). Scores for the CJ group also tended to be lower than published neurotypical scores and were comparable to those of populations with severe cognitive deficits (Table 2). Failure rates in the FASD group were highest on DS ACSS (76%), LM-II-R (68%), and CD ACSS (60%), and lowest on WC (4%) and RDS (13%).
Table 2. PVT performance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000132:S1355617720000132_tab2.png?pub-status=live)
Note. FASD = fetal alcohol spectrum disorder; CJ = criminal justice sample; WMT = Word Memory Test (Green, 2003); GMIP = Genuine Memory Impairment Profile (Green et al., 2011); IR = Immediate Recall; DR = Delayed Recall; CNS = Consistency; ACS = Advanced Clinical Solutions (Pearson Assessment, 2009); RDS = Reliable Digit Span; DS ACSS = Digit Span age-corrected scaled score; CD ACSS = Coding age-corrected scaled score; SS ACSS = Symbol Search age-corrected scaled score; CD-SS ACSS = Coding - Symbol Search age-corrected scaled score; VC-DS ACSS = Vocabulary - Digit Span age-corrected scaled score; WMS-IV = Wechsler Memory Scale, Fourth Edition (Wechsler et al., 2009); LM-II-R = Logical Memory II Recognition (Wechsler et al., 2009).
a Green (2003); b Green, Lees-Haley, and Allen (2003); c Pearson Assessment (2009); d Axelrod, Fichtenberg, Millis, and Wertheimer (2006); e Schroeder et al. (2012); f Webber and Soble (2018); g Erdodi et al. (2017); h Glassmire, Wood, Ta, Kinney, and Nitch (2019); i Mittenberg, Theroux-Fichera, Zielinski, and Heilbronner (1995); j Bortnik et al. (2010); k Alverson, O’Rourke, and Soble (2019); l Bain and Soble (2017); m Strauss et al. (2002); n Ashendorf, Clark, and Sugarman (2017); o Brockhaus and Merten (2004); p Miller et al. (2011); q Marshall and Happe (2007).
n = 67–80 due to missing data.
*p < .05, **p < .001.
Participants with FASD failed more PVTs (M = 3.52, SD = 1.29, range 1–6) than those in the CJ group (M = 1.51, SD = 1.37, range 0–5), t(78) = −6.18, p < .001, d = 1.51, 95% CI [−2.66, −1.36] (Figure 1). All participants in the FASD group failed ≥1 PVT, and all but two failed ≥2 (n = 23, 92%). In contrast, substantially fewer participants in the CJ group failed any single PVT indicator (n = 39, 71%), χ²(1) = 9.09, p = .003, ϕ = .34. They were also less likely to fail ≥2 PVTs (n = 25, 46%) compared to the FASD group, χ²(1) = 15.52, p < .001, ϕ = .44. Using more stringent criteria, 80% of participants in the FASD group (n = 20) failed ≥3 PVTs, compared to only 22% of those in the CJ group (n = 12), χ²(1) = 24.24, p < .001, ϕ = .55. Over half of participants in the FASD group (n = 13, 52%) failed ≥4 PVTs, compared to only 9% (n = 5) in the CJ group, χ²(1) = 18.15, p < .001, ϕ = .48. Last, a greater proportion of the FASD group (35–76%) was classified in the ‘fail’ range on five PVTs (WMT, DS ACSS, CD ACSS, SS ACSS, and LM-II-R), compared to the CJ group (9–35%) (Table 2).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20200810125557142-0748:S1355617720000132:S1355617720000132_fig1.png?pub-status=live)
Figure 1. Number of participants failing 0–6 PVTs (N = 80)
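As a reader's check, the χ² and ϕ values for the four failure thresholds can be recomputed from the reported counts alone. The sketch below is illustrative rather than the authors' analysis: it assumes group sizes of 25 (FASD) and 55 (CJ), which are inferred from the reported percentages and N = 80, and it assumes a Pearson χ² without continuity correction. The function and variable names are hypothetical.

```python
from math import sqrt


def chi2_phi(fail_a, n_a, fail_b, n_b):
    """Pearson chi-square (no continuity correction) and phi for a 2x2 table.

    Rows are the two groups; columns are fail / pass.
    """
    observed = [[fail_a, n_a - fail_a], [fail_b, n_b - fail_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [fail_a + fail_b, total - fail_a - fail_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, sqrt(chi2 / total)


# Assumption: group sizes inferred from the reported percentages and N = 80.
N_FASD, N_CJ = 25, 55

# Reported counts failing at least k PVTs: k -> (FASD, CJ).
fail_counts = {1: (25, 39), 2: (23, 25), 3: (20, 12), 4: (13, 5)}

results = {}
for k, (fasd_fail, cj_fail) in fail_counts.items():
    chi2, phi = chi2_phi(fasd_fail, N_FASD, cj_fail, N_CJ)
    results[k] = (round(chi2, 2), round(phi, 2))
    print(f">={k} PVTs failed: chi2(1) = {chi2:.2f}, phi = {phi:.2f}")
```

Under these assumptions, the recomputed values agree with the reported statistics to two decimal places.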
Using the GMIP, failure rates decreased from 35% to 20% for the FASD group, and from 9% to 7% for the CJ group. Increasing the difference criterion for the GMIP from 30 to 35 points reduced the failure rate to 16% (n = 4) for the FASD group, with no change for the CJ group (7%, n = 4). Further increasing the difference criterion to 40 points lowered the failure rate for both the FASD (12%, n = 3) and CJ (0%) groups. Increasing the difference criterion to 45 points resulted in only a marginal change in failure rate for the FASD group (8%, n = 2).
Examining failure rates for individuals with low and high IQ scores, we found that substantially more participants with IQ <70 failed ≥1 PVT (n = 29, 100%), compared to those with IQ ≥70 (n = 33, 69%), χ²(1) = 11.26, p = .001, ϕ = .38. Similarly, 90% (n = 26) of participants with IQ <70 failed ≥2 PVTs, compared to fewer than half of those with IQ ≥70 (n = 21, 44%), χ²(1) = 16.02, p < .001, ϕ = .46. Of those with IQ ≥70 who failed ≥2 PVTs, 19% (n = 4) had diagnosed/possible FASD, and 81% (n = 17) were not diagnosed with FASD.
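The subgroup sizes for the IQ comparison are not given directly, but they can be recovered from each count and its rounded percentage. The following sketch is illustrative only: the helper names are hypothetical, the implied sizes are an inference rather than reported values, and an uncorrected Pearson χ² is assumed.

```python
from math import sqrt


def implied_n(count, percent):
    """Group size implied by a count and its (rounded) percentage."""
    return round(count / (percent / 100))


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Each pair of reported values should imply the same subgroup size.
n_low = implied_n(29, 100)   # IQ < 70, failed >= 1 PVT
n_high = implied_n(33, 69)   # IQ >= 70, failed >= 1 PVT
assert n_low == implied_n(26, 90)
assert n_high == implied_n(21, 44)

# Rows: IQ < 70, IQ >= 70; columns: fail, pass.
chi2_ge1 = chi2_2x2(29, n_low - 29, 33, n_high - 33)
chi2_ge2 = chi2_2x2(26, n_low - 26, 21, n_high - 21)
total = n_low + n_high

for label, chi2 in ((">=1", chi2_ge1), (">=2", chi2_ge2)):
    print(f"{label} PVTs failed: chi2(1) = {chi2:.2f}, phi = {sqrt(chi2 / total):.2f}")
```

The implied sizes (29 and 48, for a total of 77) fall within the 67–80 range noted for missing data, and the recomputed statistics match the reported ones to two decimal places.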
DISCUSSION
Assessing performance validity is critical in the context of neuropsychological and cognitive evaluation, particularly in forensic and medicolegal contexts (Bush et al., 2005; Larrabee, 2003). The current study undertook a novel investigation of 10 commonly used PVT scores in justice-involved adults with diagnosed/possible FASD. Consistent with studies evaluating PVT validity in adults with a range of neurocognitive deficits, we found worse performance across multiple PVT indicators for individuals with diagnosed and possible FASD, compared to CJ controls (Dean et al., 2008; Merten et al., 2007; Zenisek et al., 2016). Almost all participants in the FASD group (92%) met criteria for non-credible responding based on the ‘two-or-more’ guideline, and 80% met criteria for suboptimal effort based on the ‘three-or-more’ guideline (Larrabee, 2008; Victor et al., 2009). Thus, participants with FASD were more likely to be identified as having provided invalid performance based on a series of 10 PVT scores. This is consistent with a large body of research examining PVT use in groups with severe cognitive impairment, which has reported higher failure rates than in unimpaired populations, as well as inadequate sensitivity and specificity for identifying invalid performance in these populations (Bain et al., 2019; Dean et al., 2008; Merten et al., 2007; Soble et al., 2018; Zenisek et al., 2016).
This finding also highlights the importance of considering the relation between PVTs when multiple tests are administered, in order to avoid over-administration of similar measures, which may result in inflated failure rates (Berthelson, Mulchan, Odland, Miller, & Mittenberg, 2013; Odland, Lammy, Martin, Grote, & Mittenberg, 2015). Moreover, it is noteworthy that individuals with FASD performed worse on PVTs compared to CJ controls. This finding suggests that the deficits associated with FASD may increase the likelihood of PVT failure even when compared to other cognitively impaired populations.
Consistent with findings that the WMT may have inadequate specificity and classification accuracy in the context of cognitive impairment, participants in the FASD group were more likely to fail this measure using the standard cutoff criterion, compared to the CJ group (Allen, Bigler, Larsen, Goodrich-Hunsaker, & Hopkins, 2007; Allen, Wu, & Bigler, 2011; Greve, Ord, Curtis, Bianchini, & Brennan, 2008; Merten et al., 2007). This finding stands in contrast to that of Larson and colleagues (2015), who found low failure rates on the MSVT in a sample of children and adolescents with FASD. Several possible factors may account for this difference, including higher rates of poor health and cognitive impairment in the current justice-involved sample, compared to children and adolescents referred to a private practice for neuropsychological assessment (Larson et al., 2015). On the other hand, it is possible that the MSVT is more robust to cognitive impairment than its original counterpart. Indeed, findings from this study suggest that the GMIP may be more appropriate for use in FASD populations, given comparable failure rates to the CJ group, and lower failure rates compared to traditional WMT failure indicators. Nonetheless, in the current sample, it took considerable adjustment to the GMIP difference criterion in order to achieve lower failure rates. Moreover, it is not possible to know whether those identified as providing inadequate effort using the traditional GMIP represent false positives given the absence of an external criterion to validate classification.
Thus, our findings suggest that using adjusted criteria developed for other clinically impaired populations, such as the GMIP, may not adequately protect against the risk of false positives for individuals with FASD, and further research is encouraged.
Participants in the FASD group were also more likely to fail a number of embedded PVTs, including DS ACSS, CD ACSS, SS ACSS, and LM-II-R. This finding is consistent with suggestions that embedded PVTs may be less robust to cognitive impairment compared to stand-alone PVTs (Zenisek et al., 2016). For instance, findings regarding the utility of LM-II-R appear mixed, and recent studies have shown that it has inadequate classification accuracy compared to other commonly used PVTs (e.g., WC) and mixed specificity and sensitivity in clinical samples, including those with cognitive impairment (Bain et al., 2019; Erdodi, Tyson, et al., 2018; Greve et al., 2008; Miller et al., 2011; Soble et al., 2018; Webber & Soble, 2018). Thus, the current findings highlight the need for further research examining performance patterns using embedded PVTs in populations with severe cognitive impairment, given their inextricable relation to cognitive ability. Moreover, caution may be warranted when using these measures in the context of severe cognitive impairment until further research is undertaken to explore their utility in these populations.
In evaluating the extent to which overall intellectual functioning contributes to PVT performance in CJ adults with and without FASD, we found that the majority of participants with IQ <70 (90%) failed ≥2 PVTs. However, nearly half of participants with IQ ≥70 (44%) also failed ≥2 PVTs. This may suggest that low IQ alone is not a sufficient predictor of poor PVT performance in individuals with FASD. This is consistent with previous findings wherein patterns of failure and false positives have varied substantially between participants of varying cognitive abilities (Flaro et al., 2007; Green & Flaro, 2015; Love et al., 2014; Smith et al., 2014).
While the finding that individuals with FASD performed worse compared to those without FASD may be unsurprising in the context of a large body of research that cautions against the use of PVTs in cognitively impaired populations, the challenges associated with identification and diagnosis for individuals with FASD suggest that clinicians may be unknowingly using PVTs with this population. In the context of limited knowledge concerning FASD among clinicians, coupled with high rates of undiagnosed cases, the risk of potentially invalid PVT interpretation in this population may be significant and lead to inaccurate conclusions regarding invalid performance. Given the lack of external incentives linked with participants’ performance in the context of this study as well as the variability of performance across measures, it is possible that the high failure rates for individuals with possible and diagnosed FASD could represent false positives.
Limitations and Future Directions
The current study uniquely contributes to the literature on the use of common, clinically normed PVTs in adults with and without FASD recruited from the CJS. However, some limitations should be noted. The study sample size was small and geographically unique, suggesting that results best generalize to similar correctional jurisdictions and warrant further study before being applied to clinical, non-CJ populations. Although a comprehensive “gold standard” FASD evaluation was completed for all participants, this study did not control for additional diagnoses or impairments that may have impacted individuals’ performance on PVTs, which may be particularly relevant in correctional populations (Farrer & Hedges, 2011; Hellenbach et al., 2017). Therefore, there is a need for further study in larger samples that also explores contributing mechanisms. Finally, the unique research design was thought to limit external incentives associated with performance, and therefore, motivation for non-credible responding. However, participants did not necessarily have an incentive to perform to the best of their ability and may still have had motivation to perform poorly (An, Kaploun, Erdodi, & Abeare, 2017; Erdodi et al., 2018). As a result, additional research involving individuals with bona fide and feigned impairments associated with FASD is also needed, in order to assess the predictive validity and psychometrics of PVTs for this unique population. Moreover, future studies should aim to explore and propose alternative cutoff scores for PVT interpretation for use with individuals who may have FASD.
Implications
This study represents an important step towards understanding whether and how PVTs should be used for individuals with FASD in the criminal justice context, in addition to furthering the literature on PVT use for individuals with severe neurocognitive deficits. Ensuring valid interpretation of PVTs is critical given the potential negative consequences associated with failure, particularly in criminal and civil legal contexts. For example, mislabelling individuals with true cognitive deficits as having provided invalid effort may result in incorrect diagnosis and prevent access to appropriate treatment opportunities or resources. In forensic and medicolegal contexts, this may extend to finding that an examinee was uncooperative or engaging in overt misrepresentation of true functioning. In turn, this may result in a range of adverse legal outcomes, including conviction, restriction from injury benefits, or restriction from legal safeguards, such as fitness to stand trial. The current findings highlight the need for further research examining PVT use in this unique population. In addition, developing practice guidelines may prove helpful in informing PVT interpretation in adults with FASD, particularly in legal and forensic contexts. There is also a critical need for increased FASD training among professionals in order to prevent misdiagnosis and to ensure that clinicians understand the complex relationship between neurocognitive impairment (and, potentially, FASD) and PVT performance, so as to support appropriate treatment and intervention practices.
ACKNOWLEDGEMENTS
Funding for the primary study was provided by the Department of Justice, Territorial Government in the study jurisdiction. There are no conflicts of interest to report for the authors.
CONFLICTS OF INTEREST
The authors have nothing to disclose.