Introduction
Over the past several decades, a growing body of research has demonstrated cognitive changes associated with systemic therapies for non-central nervous system (CNS) tumors (Ahles, Root, & Ryan, Reference Ahles, Root and Ryan2012; O'Farrell, Mackenzie, & Collins, 2013). Many people worldwide are confronted with non-CNS cancer. Age is a risk factor for cancer, and given trends toward increased longevity in industrialized countries as well as improvements in survivorship, we are likely to see increasing numbers of cancer patients, many of whom may experience treatment-associated cognitive dysfunction. Understanding cognitive changes associated with cancer treatments and their impact on quality of life and daily functioning is therefore important.
Numerous cross-sectional and longitudinal studies have demonstrated treatment-related cognitive decline in women with breast cancer who receive chemotherapy and/or endocrine therapy (see Wefel and Schagen, Reference Wefel and Schagen2012, for a review). In this patient group alone, over 60 neuropsychological studies have investigated whether chemotherapy is associated with cognitive impairment. The vast majority demonstrate cognitive impairment and/or cognitive decline in a subgroup of breast cancer patients treated with cytotoxic agents compared to breast cancer patients who did not receive chemotherapy or to non-cancer controls. Patients show deficits on a wide range of neuropsychological tests, but core impairments involve learning and memory, executive functions, and psychomotor speed, indicative of a frontal-subcortical profile (Wefel, Vardy, Ahles, & Schagen, Reference Wefel, Vardy, Ahles and Schagen2011). Cognitive problems may arise during treatment and may persist for years after its completion. Endocrine therapy (selective estrogen receptor modulators and/or aromatase inhibitors) is frequently part of the treatment strategy for many women with breast cancer and may also contribute to cognitive problems (Phillips et al., Reference Phillips, Ribi, Sun, Stephens, Thompson, Harvey and Bernhard2010; Schilder et al., Reference Schilder, Seynaeve, Beex, Boogerd, Linn, Gundy and Schagen2010). Preclinical studies as well as imaging studies have provided insights into the neurobiological changes associated with many chemotherapeutic agents that underlie the observed cognitive changes (Seigers, Schagen, van Tellingen, & Dietrich, Reference Seigers, Schagen, Van Tellingen and Dietrich2013; Seigers & Fardell, 2011).
Specifically, imaging studies have shown gray and white matter volume loss, altered white matter integrity, and abnormalities in functional activation as well as resting state network connectivity (Pomykala, de Ruiter, Deprez, McDonald, & Silverman, Reference Pomykala, de Ruiter, Deprez, McDonald and Silverman2013; Scherling & Smith, 2013). Preclinical animal studies have shown increased apoptosis in healthy proliferating cells in the central nervous system as well as damage to neural precursor cells.
Despite this strong, multidimensional body of preclinical, human, and neuroimaging research supporting a neurobiological relationship between chemotherapy and cognitive decline, critics have asserted that poor neuropsychological test performance may be influenced by factors other than neurological disease, including conditions such as anxiety and depression, or inattentiveness secondary to various causes including limited motivation. Studies on the cognitive effects of cancer treatments have shown that anxiety and depression do not seem to drive the relationship between cognitive problems as assessed by neuropsychological tests and systemic therapies (Wefel & Schagen, Reference Wefel and Schagen2012).
In neuropsychological practice, performance validity testing (PVT) (Larrabee, Reference Larrabee2012) has been advanced as an important element in the evaluation of both litigating and non-litigating patient populations. In fact, many neuropsychologists recommend that PVT be included in standard practice (Boone, Reference Boone2009; Dandachi-Fitzgerald, Reference Dandachi-Fitzgerald2013). However, PVT failure has not been systematically explored in research on the cognitive effects of systemic therapies in non-CNS cancer patients. Various terms are used in the literature to describe both intentional and unintentional underperformance that may undermine the validity of a neuropsychological examination, and at present no clear consensus exists on the preferred descriptor. We chose the term “noncredible performance”, derived from the consensus conference statement on neuropsychological assessment of effort, response bias, and malingering (Heilbronner et al., 2009).
It is plausible to imagine situations in which noncredible performance on cognitive testing is likely in this patient population and thus could invalidate the results of these studies. In clinical practice, one must always be aware of the potential for a patient to intentionally underperform for purposes of external gain. Although it may seem unlikely to practitioners not routinely caring for patients in an oncologic setting, we have each encountered patients failing PVTs both within and outside the context of overt external gain scenarios. Furthermore, many practicing in this area have experienced occasions when the medical team asserts that the observed cognitive deficits are due to the patient attempting to appear impaired rather than to treatment-related neurotoxicity. Unlike in traumatic brain injury, to date there are no studies in cancer populations reporting on noncredible performance to help practicing clinicians gauge its relative frequency or infrequency in this population. If noncredible performance in this patient population is frequent, the interpretation that cognitive decline can occur with chemotherapy would require reconsideration.
It is also imperative to evaluate the possibility of noncredible performance in non-cancer control groups as many published studies on cognitive performance of cancer patients compare patients to either published normative data of healthy individuals or to a control group consisting of women without a history of cancer. Non-patient control populations may also produce noncredible performance on formal cognitive tests when participating in a study on the cognitive effects of a disease/treatment not applicable to themselves.
There is a large body of research on the use and the limitations of PVTs (Strauss, Sherman, & Green, Reference Strauss, Sherman and Green2006). Over the years, several methods have been developed to detect noncredible performance. Examples of PVTs include the Test of Memory Malingering (TOMM) (Tombaugh, Reference Tombaugh1996), the Amsterdam Short-Term Memory Test (ASTM) (Schmand & Lindeboom, 2003), the Hiscock Forced-Choice Test (Hiscock & Hiscock, Reference Hiscock and Hiscock1989), and the Word Memory Test (WMT) (Green, Reference Green2003); the latter is one of the most popular and best investigated free-standing PVTs currently available.
Another method to detect noncredible performance is to derive effort indices from conventional neuropsychological tests, so-called embedded PVT measures. Examples of such measures are the WAIS-III Digit Span subtest, several indices of the California Verbal Learning Test, the Rey Complex Figure Test, and Warrington's Recognition Memory test (Strauss et al., Reference Strauss, Sherman and Green2006).
PVT cutoffs are available for both stand-alone tests and embedded indices. The sensitivity and specificity of these measures at specific cutoffs have been studied in naïve malingerers, in subjects instructed to simulate deficits, and in clinical populations (generally mild TBI populations).
The current study sought to define the frequency of noncredible performance in breast cancer patients before, during, and/or after completion of systemic treatment, as well as potential predictors of noncredible performance in this patient population. We examined six datasets from studies (Van Dam et al., Reference Van Dam, Schagen, Muller, Boogerd, vd Wall, Droogleever Fortuyn and Rodenhuis1998; Schagen, van Dam, Muller, Boogerd, & Lindeboom, Reference Schagen, van Dam, Muller, Boogerd and Lindeboom1999; Schagen, Muller, Boogerd, Mellenbergh, & van Dam, Reference Schagen, Muller, Boogerd, Mellenbergh and van Dam2006; Schilder et al., Reference Schilder, Seynaeve, Beex, Boogerd, Linn, Gundy and Schagen2010; Wefel, Lenzi, Theriault, Davis, & Meyers, Reference Wefel, Lenzi, Theriault, Davis and Meyers2004; Wefel, Saleeba, Buzdar, & Meyers, Reference Wefel, Saleeba, Buzdar and Meyers2010; Wefel, 2002) on the effects of chemotherapy and/or endocrine therapy on the cognitive functioning of breast cancer patients. We examined the occurrence of possible noncredible performance and differences in its rates between treatments and assessment times. We also investigated differences in possible noncredible performance between patients undergoing systemic therapies and disease-specific and healthy controls. Ruling out this potential explanatory factor for the observed deficits in cognitive function would strengthen the conclusion that therapies not targeting the CNS can have untoward effects on cognitive function.
Methods
Datasets
The characteristics of the different datasets are listed in Table 1 and Table 2. Datasets 1–3 were obtained at the Netherlands Cancer Institute (NKI). Datasets 4–6 were obtained at M.D. Anderson Cancer Center (MDACC). None of the datasets contain overlapping data.
Table 1 Characteristics of patients in NKI datasets
1Education is based on Verhage education scores 1–7 (Verhage, Reference Verhage1964), corresponding with the following US years of education: 1: 1–5 years; 2: 6 years; 3: 7–8 years; 4: 7–9 years; 5: 7–10 years; 6: 7–16 years; 7: 17–20 years.
2Fatigue based on the EORTC QLQ-C30; EORTC = European Organization for Research and Treatment of Cancer. The EORTC QLQ-C30 is a health-related quality of life questionnaire (Aaronson et al., Reference Aaronson, Ahmedzi, Bergman, Bullinger and Cull1993). Scores on the Fatigue scale range from 0 to 100, with higher scores indicating greater symptom burden.
3Anxiety and Depression based on the HSCL-25; HSCL-25 = Hopkins Symptom Checklist-25 (Hesbacher et al., Reference Hesbacher, Rickels, Downing and Stepansky1978). Scores range from 0 to 100; a higher score indicates more complaints.
CTC = cyclophosphamide, thiotepa, carboplatin; FEC = 5-fluorouracil, epirubicin, cyclophosphamide; CMF = cyclophosphamide, methotrexate, 5-fluorouracil; CT = chemotherapy; TAM = tamoxifen; EXE = exemestane.
Table 2 Characteristics of patients in MDACC datasets
1 Fatigue level on a 0-10 scale with 10 being the worst.
2Anxiety based on the State-Trait Anxiety Inventory (z-scale) (Spielberger et al., Reference Spielberger, Gorsuch, Lushene, Vagg and Jacobs1983), Minnesota Multiphasic Personality Inventory (MMPI; Greene, Reference Greene1991) scale 7 (T-scores), and the Beck Anxiety Inventory (Beck, Reference Beck1990). A higher score indicates a higher level of anxiety.
3Depression based on the Beck Depression Inventory (Beck, Reference Beck1996) and MMPI (Greene, Reference Greene1991) scale 2 (T-scores).
FAC = 5-fluorouracil, doxorubicin, cyclophosphamide; TAM = tamoxifen.
1. Dataset 1. The first dataset originated from a prospective study performed by Schagen et al. (Reference Schagen, Muller, Boogerd, Mellenbergh and van Dam2006). In this study, changes in cognitive performance were examined in three different breast cancer groups and a healthy control group. The breast cancer patient groups consisted of (1) patients who adjuvantly received four cycles of standard-dose chemotherapy (FEC) followed by one cycle of high-dose chemotherapy (CTC) (labeled “CTC”, n = 28); (2) patients who received five cycles of standard-dose FEC chemotherapy (labeled “FEC”, n = 39); and (3) a breast cancer control group (labeled “Control”, n = 57) composed of stage-1 breast cancer patients for whom systemic therapy was not part of the treatment strategy. The breast cancer patients receiving chemotherapy were randomized between high-dose and standard-dose treatment (Rodenhuis et al., Reference Rodenhuis, Bontenbal, Beex, Wagstaff, Richel, Nooij and de Vries2003). Patients who received chemotherapy also received endocrine therapy (20 mg a day for a period of 2 years). All patients underwent neuropsychological testing before chemotherapy and 6 months after treatment, over a 12-month interval. The healthy control group underwent repeated testing over a 6-month interval.
2. Dataset 2. This is the dataset of a prospective study in which the influence of endocrine therapy on cognitive functioning was examined in postmenopausal patients with breast cancer (Schilder et al., Reference Schilder, Seynaeve, Beex, Boogerd, Linn, Gundy and Schagen2010). Patients were randomized to either tamoxifen (n = 80) or exemestane (n = 99) and underwent neuropsychological testing after breast surgery and before the start of endocrine treatment. Follow-up assessment was conducted after 1 year of endocrine treatment. A healthy control group (n = 120) underwent the same assessments with a similar time interval.
3. Dataset 3. This dataset describes four patient groups that were examined on average 2 years after completion of treatment (Van Dam et al., Reference Van Dam, Schagen, Muller, Boogerd, vd Wall, Droogleever Fortuyn and Rodenhuis1998; Schagen et al., Reference Schagen, van Dam, Muller, Boogerd and Lindeboom1999). In this study, the cognitive status of breast cancer patients treated with various cytotoxic regimens as part of an adjuvant therapy strategy was examined. The groups consisted of patients treated with high-dose adjuvant chemotherapy (CTC group; n = 34) and two groups treated with conventional-dose chemotherapy (FEC group, n = 36; CMF group, n = 39). CTC and FEC patients and a select number of randomized CMF patients (n = 20) also received endocrine treatment with tamoxifen for 2 years. A group of breast cancer patients without any systemic treatment was also included (n = 34).
4. Dataset 4. The fourth dataset stems from a prospective longitudinal study performed by Wefel et al. (Reference Wefel, Saleeba, Buzdar and Meyers2010). In this study, the incidence, nature, and chronicity of cognitive dysfunction were examined in 42 breast cancer patients undergoing adjuvant chemotherapy according to the FAC regimen with or without paclitaxel. Patients were tested before, during, and at two time points after chemotherapy.
5. Dataset 5. This set contains data from a prospective longitudinal study performed by Wefel et al. (Reference Wefel, Lenzi, Theriault, Davis and Meyers2004). In this study, cognitive changes in breast cancer patients who received FAC chemotherapy with or without methotrexate and vinblastine were examined over time (n = 18). Patients underwent neuropsychological testing before and at three time points after completion of chemotherapy.
6. Dataset 6. The last dataset originates from a prospective longitudinal study performed by Wefel (2002) on cognitive changes in breast cancer patients treated with endocrine therapy (n = 62). Patients underwent neuropsychological testing after their primary medical treatment (e.g., surgery, radiation, chemotherapy) and before starting Tamoxifen. Follow-up assessment was performed twice after starting endocrine treatment.
We searched PubMed, EMBASE, and PsycINFO (January 1980–January 2013) and reference lists of English-language articles for embedded PVTs corresponding with the neuropsychological tests used in the six studies. Search terms were: (noncredible performance) AND (embedded symptom or performance validity) AND (malingering) AND (embedded effort), in combination with the neuropsychological instruments used in our datasets.
The following tests were determined to contain embedded PVT indices: California Verbal Learning Test (CVLT), Controlled Oral Word Association (COWA), WAIS-III subtest Digit Span, Rey-Osterrieth Complex Figure Test (ROCFT), Rey Auditory Verbal Learning Test (RAVLT), Stroop test, and Trail Making Test (TMT). We also examined the Fake Bad Scale (FBS) from the Minnesota Multiphasic Personality Inventory (MMPI).
Table 3 summarizes the identified performance validity indices, the operational definitions of noncredible performance for each index, and their sensitivity and specificity. For each test, the embedded PVT index was evaluated based on cutoffs previously described in published literature. Additionally, measures of fatigue, depression, and anxiety were obtained in most of the study datasets and were examined in relation to PVT performance.
Table 3 Characteristics of the studies from which cutoffs were determined
1Study participants include participants instructed to simulate, patients with and without credible performance, and patients evaluated in context of litigation. LTCR = Long term cued recall; NA = not available
Measures
1. California Verbal Learning Test. The Dutch version of the CVLT (Delis, Kramer, Kaplan, & Ober, Reference Delis, Kramer, Kaplan and Ober1987) was used in dataset 1. The indices calculated in the Dutch version differ from those in the original CVLT (Mulder, Dekker, & Dekker, 1996); however, the administration and the data obtained are identical. Millis and Putman (Reference Millis and Putman1995) identified the following five indices as suitable for examining noncredible performance: CVLT Total (sum of the scores on List A trials 1 to 5); Long Delay Cued Recall (number of correct responses on List A LDCR); Discriminability ([1 − (false positives + misses)/44] × 100, based on List A Long Term Recognition); Recognition Hits (number of correct answers on List A Long Term Recognition); and the Discriminant Function score ((CVLT Total raw score × −0.00406) + (LDCR raw score × 0.21099) + (Discriminability × 0.04988) − 5.06399).
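As an illustration, these five indices can be computed as follows. This is a minimal sketch of the arithmetic described above; the function name, argument names, and any sample values are ours, not from the original studies.

```python
# Sketch of the five CVLT-derived embedded PVT indices described by
# Millis and Putman (1995). Names and structure are illustrative.

def cvlt_indices(trial_scores, ldcr, false_positives, misses, recognition_hits):
    """trial_scores: correct responses on List A trials 1-5;
    ldcr: correct responses on Long Delay Cued Recall;
    recognition_hits: correct answers on List A Long Term Recognition."""
    total = sum(trial_scores)  # CVLT Total
    # Discriminability: [1 - (false positives + misses)/44] x 100
    discriminability = (1 - (false_positives + misses) / 44) * 100
    # Discriminant function combining three of the indices
    dfs = (total * -0.00406) + (ldcr * 0.21099) \
        + (discriminability * 0.04988) - 5.06399
    return {
        "total": total,
        "ldcr": ldcr,
        "discriminability": discriminability,
        "recognition_hits": recognition_hits,
        "discriminant_function": dfs,
    }
```

Each resulting index would then be compared against its published cutoff.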
2. Fluency. Various versions of a letter fluency test were used among the different datasets. In dataset 2, the Dutch letter combination DAT was used (Schmand, Groenink, & van den Dungen, Reference Schmand, Groenink and van den Dungen2008), which corresponds with the US FAS test. In datasets 4 to 6, the letter combinations of the Multilingual Aphasia Examination were used (Benton, Hamsher, & Sivan, Reference Benton, Hamsher and Sivan1994). Curtis, Thompson, Greve, and Bianchini (Reference Curtis, Thompson, Greve and Bianchini2008) defined a cutoff of a T-score below 32 as an indicator of possible noncredible performance.
3. Digit Span. Using the WAIS-III (Wechsler, Reference Wechsler1997) or WAIS-R (Wechsler, Reference Wechsler1981) Digit Span subtest in datasets 3 to 6, the Reliable Digit Span (RDS) was calculated. The RDS is the sum of the longest span forward and the longest span backward. An RDS score below 7 has been suggested as a cutoff for possible noncredible performance (Mathias, Greve, Bianchini, Houston, & Crouch, Reference Mathias, Greve, Bianchini, Houston and Crouch2002).
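The RDS computation and its cutoff check can be sketched as follows (a minimal illustration; the function names are ours):

```python
def reliable_digit_span(longest_forward, longest_backward):
    # RDS = longest span forward + longest span backward
    return longest_forward + longest_backward

def rds_flags_noncredible(rds, cutoff=7):
    # Scores below the cutoff (Mathias et al., 2002) flag
    # possible noncredible performance
    return rds < cutoff
```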
4. Rey-Osterrieth Complex Figure Test. The ROCFT (Rey, Reference Rey1941) was partially administered in datasets 3 and 6. Dataset 3 contained information about the copy score and the 3-min delay score of the ROCFT; dataset 6 contained information only about the copy score. Reedy et al. (2002) identified a copy score of 26 or lower and/or a 3-min delay score of 10 or lower as an indication of possible noncredible performance.
5. Rey Auditory Verbal Learning Test. Datasets 2 and 3 provided information about the recognition score of the RAVLT (van den Burg, Saan, & Deelman, Reference Van den Burg, Saan and Deelman1985). A recognition score of 9 or lower has been recommended as an indicator of possible noncredible performance (Boone, Lu, & Wen, Reference Boone, Lu and Wen2005).
6. Stroop test. The Dutch datasets 1 to 3 contained information on the Stroop color-word test (Hammes, Reference Hammes1978). Based on a study by Arentsen et al. (Reference Arentsen, Boone, Lo, Goldberg, Cottingham, Victor and Zeller2013), which demonstrated the superiority of the Word Naming and Color Naming trials over an inverted Stroop effect as effective PV measures, we chose cutoffs of ≥66 s for Word Naming and ≥93 s for Color Naming as indicators of possible noncredible performance, maintaining adequate specificity and sensitivity. In the Arentsen paper, the Comalli version of the Stroop Color Word Test was used. This version is highly comparable to the version used in the Dutch datasets (i.e., identical number of trials and items), although the Dutch version included 4 response options compared to 3 in the Comalli version.
7. Trail Making Test (TMT). All datasets contained information on Part A and Part B of the TMT (Reitan, Reference Reitan1958). Iverson, Lange, Green, and Franzen (Reference Iverson, Lange, Green and Franzen2002) defined a completion time of 63 s or higher on Part A and/or a completion time of 200 s or higher on Part B as an indicator of possible noncredible performance.
8. Hiscock and Hiscock. Hiscock and Hiscock (Reference Hiscock and Hiscock1989) developed a stand-alone PVT using the forced-choice method. One dataset contained information from a modified Hiscock & Hiscock test where 20 cards were shown for 5 s with a 5-s delay between presentation and recognition conditions. A score below chance is regarded as an indicator of possible noncredible performance.
9. Minnesota Multiphasic Personality Inventory (MMPI). The FBS, developed by Lees-Haley, English, and Glenn (Reference Lees-Haley, English and Glenn1991) and derived from items on the MMPI (Greene, Reference Greene1991), was intended to detect self-reported symptom magnification. Larrabee's (Reference Larrabee2003) regression formula was used to estimate the FBS score from the 370-item protocol for all patients in Dataset 5. An FBS score of ≥26 was used to identify possible symptom exaggeration (Nelson, Parsons, Grote, Smith, & Sisung, Reference Nelson, Parsons, Grote, Smith and Sisung2006).
10. Years of education. In the Dutch datasets 1 to 3, Verhage education scores (Verhage, Reference Verhage1964) were used, according to Dutch standards (Bouma, Mulder, Lindeboom, & Schmand, Reference Bouma, Mulder, Lindeboom and Schmand2012). The US datasets 4 to 6 describe years of education.
11. Fatigue. In datasets 1 to 3, the fatigue scale of the EORTC-QLQ-30 (Aaronson, Ahmedzi, Bergman, Bullinger, & Cull, Reference Aaronson, Ahmedzi, Bergman, Bullinger and Cull1993) was administered. The fatigue scale consists of three items. Scores range from 0 to 100, with higher scores indicating more symptoms. In dataset 6, patients were asked to rate their fatigue level on a 0–10 scale, with 10 being the worst.
12. Anxiety and Depression. In datasets 1 to 3, the Hopkins Symptom Checklist-25 (HSCL-25) (Hesbacher, Rickels, Downing, & Stepansky, Reference Hesbacher, Rickels, Downing and Stepansky1978) was used to determine anxiety and depression. The HSCL-25 is a self-report measure of psychological distress. It includes an anxiety and depression subscale. The instructions inquire about the intensity of symptoms in the previous week. Answers are scored on a scale from 1 (never) to 4 (always). A higher score indicates greater distress.
In dataset 4, the State-Trait Anxiety Inventory (STAI) (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, Reference Spielberger, Gorsuch, Lushene, Vagg and Jacobs1983) was used to determine anxiety. The STAI is a 20-item self-report measure to assess levels of situation-related anxiety. Items are rated on a 4-point Likert-type scale. The score is then converted into a Z-scale distribution based on comparisons with a healthy non-anxious sample. To assess depression, the Beck Depression Inventory (BDI) (Beck, Reference Beck1996) was used. The BDI is a 21-item self-report questionnaire measuring symptoms of depression. Items are rated on a 4-point scale. A higher score indicates greater depressive symptoms.
In dataset 5, the MMPI was used to determine anxiety and depression. MMPI scale-2 measures depression and MMPI scale-7 measures anxiety. Each scale is converted into a standardized T-score. A higher score indicates more symptoms.
In dataset 6, the BDI (Beck, Reference Beck1996) was used to determine depression. To determine anxiety, the Beck Anxiety Inventory (BAI) (Beck, Reference Beck1990) was used. The BAI is a 21-item self-report measure assessing the cognitive aspects of anxiety. Items are rated on a 4-point scale. A higher score indicates higher anxiety levels.
Statistical Analysis
The Statistical Package for the Social Sciences (SPSS), version 20.0, was used for all statistical analyses. For each PVT index, a dichotomous variable was created indicating performance above or below the cutoff. The percentages of patients and controls performing below the cutoff on each test, and between-group differences in these percentages, were analyzed using χ2 tests. Possible noncredible performance at the subject level was operationally defined as performance below the cutoff on three or more PVT indices. As discussed in Larrabee (Reference Larrabee2007), combining PVTs improves specificity, with evidence that failure on three indicators essentially eliminates false positive classifications. We applied this strict criterion because these were the measures available in the datasets, the cutoffs were not derived in our study population, the subjects in our studies had no incentive for external gain, and we wished to minimize the risk of falsely declaring noncredible performance (as opposed to cognitive dysfunction) in a population with evidence of treatment-associated brain dysfunction. Only one embedded PVT index was counted for tests that yielded multiple embedded PVT measures (Larrabee, Reference Larrabee2008), with preference given to indices below the cutoff. Logistic regression analyses were performed to examine group differences; additional factors including age, education, fatigue, anxiety, and depression were also examined. Assuming a low but non-negligible base rate of noncredible performance of 10%, likelihood ratios, pretest and posttest odds, and posttest probabilities were calculated from the sensitivity and specificity of the applied cutoffs as reported in the literature, according to the method described by Larrabee (Reference Larrabee2008). Using chained likelihood ratios, posttest probabilities of the different combinations of PVTs in each dataset were compared.
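The likelihood-ratio chaining can be sketched as follows; the function names are ours, and the sensitivity/specificity values in the example are placeholders rather than values from Table 3.

```python
# Sketch of chained likelihood ratios per Larrabee (2008): pretest odds
# from an assumed base rate are multiplied by the positive likelihood
# ratio (LR+) of each failed PVT, then converted back to a probability.

def positive_lr(sensitivity, specificity):
    # LR+ = sensitivity / (1 - specificity)
    return sensitivity / (1 - specificity)

def posttest_probability(base_rate, lrs):
    odds = base_rate / (1 - base_rate)  # pretest odds
    for lr in lrs:                      # chain the LRs of the failed PVTs
        odds *= lr
    return odds / (1 + odds)            # posttest probability

# Example: 10% base rate and failure on three PVTs (placeholder values)
lrs = [positive_lr(0.50, 0.90), positive_lr(0.60, 0.95), positive_lr(0.40, 0.92)]
p = posttest_probability(0.10, lrs)
```

Chaining in this way assumes the PVTs are conditionally independent, which is why combinations drawn from different performance domains are of particular interest.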
Results
In total, 534 breast cancer patients and 214 healthy controls were included in the current analysis. The percentage of patients performing below criterion on each PVT index and the percentage of subjects classified as exhibiting possible noncredible performance are listed in Table 4 (datasets 1 to 3) and Table 5 (datasets 4 to 6). Overall, only one patient was classified as exhibiting possible noncredible performance. None of the control subjects met the criteria for possible noncredible performance.
Table 4 Percentage of patients in NKI datasets performing below PVT criterion
1Missing data on 1 patient.
2Missing data on 2 patients.
*Significant differences between groups (p < .05).
# % of subjects who fail TMT PV measures when cutoffs are defined as 3 SD below the age-corrected normative mean (i.e., TMT A > 87 s; TMT B > 260 s).
LTCR = Long Term Cued Recall; Stroop a = word-naming; Stroop b = color-naming. CTC = cyclophosphamide, thiotepa, carboplatin; FEC = 5-fluorouracil, epirubicin, cyclophosphamide; CMF = cyclophosphamide, methotrexate, 5-fluorouracil; TAM = tamoxifen; EXE = exemestane.
Table 5 Percentage of patients in MDACC datasets performing below PVT criterion
1Missing data on 1 patient.
2Missing data on 2 patients.
FAC = 5-fluorouracil, doxorubicin, cyclophosphamide; TAM = tamoxifen.
In dataset 1, 0–7.1% of participants performed below criterion on any PVT. PVTs with at least one patient meeting criterion included: CVLT Total, CVLT Recognition, CVLT Discriminability, CVLT Long Term Cued Recall, Stroop A and Stroop B, and TMT Parts A and B. At both assessment times, no group differences were found on any PVTs or on the frequency of possible noncredible performance.
In dataset 2, 0–21.2% of participants performed below criterion on a PVT. PVTs with at least one patient meeting the criterion included: COWA, Stroop A and B and TMT Part A and B. At the first assessment, significant differences between groups were found on COWA (χ2 = 8.1; p = .017; Tamoxifen > Control, p = .032), TMT Part A (χ2 = 10.1; p = .006; Exemestane > Control, p = .003), and TMT Part B (χ2 = 6.2; p = .044; Exemestane > Control, p = .025). Logistic regression analysis indicated no associations between age, level of education, fatigue, anxiety or depression, and PVT classification. At the second assessment, no significant group differences were found. For this particular set of older subjects we conducted an exploratory analysis of the TMT using an age-adjusted cutoff of 3 standard deviations below the normative mean. This cutoff corresponded to a TMT Part A performance > 87 s, and to a TMT Part B performance of > 260 s. Table 4 shows the rates of subjects failing this age adjusted PVT classification.
In dataset 3, 0% to 7.7% of participants performed below criterion on a PVT. PVTs with at least one patient meeting the criterion included: ROCFT Delayed Recall, Stroop A and B and the TMT Part A. No significant group differences on any PVTs or on frequency of possible noncredible performance were found.
In dataset 4, 0% to 4.8% of patients performed below criterion on a PVT. At the first assessment, PVTs with at least one patient performing below criterion included: COWA, Digit Span, and TMT Part A and B. One patient met the definition of noncredible performance. At the second, third, and fourth assessment, patients performed below the PVT criterion on the TMT Part A, the Digit Span and on the COWA, respectively.
In dataset 5, no patients showed performance below criterion on a PVT or the FBS scale at the first and third assessment time. At the second assessment, patients performed below criterion on Digit Span. Additionally, one patient at this time point scored above the cutoff for the FBS. However, examination of the cognitive test results found her to be stable or improved on 21 of 22 cognitive tests compared to the first assessment.
In dataset 6, 0% to 12.9% of patients performed below criterion on a PVT. PVTs with at least one patient below criterion included: COWA, ROCFT Copy, and TMT Part A and B.
Table 6 shows the diagnostic statistics and probabilities of noncredible performance for failure on three or more PVT measures. The posttest probability of noncredible performance is on average well above .95, ranging from .918 to .999. Inspection of the data suggests that this conclusion remains true when taking expected intercorrelations between PVTs into account; that is, posttest probabilities are still well above .95 for combinations of PVTs derived from different performance domains.
Table 6 Diagnostic statistics and probabilities of noncredible performance for combinations of PVTs
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160922013803-15046-mediumThumb-S1355617714000022_tab6.jpg?pub-status=live)
Tot = California Verbal Learning Test Total; rec = CVLT recognition; ltc = CVLT Long Term Cued Recall; dis = CVLT discriminability; for = CVLT Formula; tma = Trail Making Test part A; tmb = Trail Making Test part B; sta = Stroop Word-naming card; stb = Stroop Color-naming card; cow = COWA T; rav = Rey Auditory Verbal Learning Test; rds = Reliable Digit Span; cop = Rey-Osterrieth Complex Figure Test Copy; del = ROCFT 3-min delay.
Discussion
Patients with cancer outside the CNS (e.g., breast cancer) who receive therapy that is not directed at the CNS (e.g., chemotherapy) have been reported to demonstrate cognitive dysfunction on neuropsychological tests. However, there are no reports to date examining the potential contribution of noncredible performance. Ruling out this potential explanatory factor for the observed deficits in cognitive function before treatment and the decline in cognitive function after treatment would strengthen the conclusion that non-CNS cancer and therapies not targeting the CNS can have untoward effects on cognitive function.
The current study examined 534 breast cancer patients and 214 healthy controls from six datasets of studies on the effects of chemotherapy and/or endocrine therapy on cognitive functioning to determine if there was evidence for noncredible performance. Out of 917 assessments, only one patient was classified as exhibiting suspected noncredible performance at the pre-chemotherapy time point. No evidence of noncredible performance was found for any other time points in any other patient. No healthy controls were classified as exhibiting suspected noncredible performance at any time point.
Patients’ report of mood disturbance and affective distress was, as expected, quite varied, ranging from no distress to severe symptoms of depression/anxiety. The occurrence of noncredible performance was near absent despite the presence of an array of psychological and clinical symptoms that could influence effort/motivation and thereby bias responding. Similarly, there was no evidence that the current sample of healthy controls, who may not have any personal investment in the research study at hand, was prone to noncredible performance. These findings should not be interpreted as indicating that psychological and clinical symptoms have no impact on test performance, as this was not addressed by these analyses. However, in prior publications based on these subjects (Schagen et al., Reference Schagen, Muller, Boogerd, Mellenbergh and van Dam2006; Schilder et al., Reference Schilder, Seynaeve, Beex, Boogerd, Linn, Gundy and Schagen2010; Schagen et al., Reference Schagen, van Dam, Muller, Boogerd and Lindeboom1999; Van Dam et al., Reference Van Dam, Schagen, Muller, Boogerd, vd Wall, Droogleever Fortuyn and Rodenhuis1998; Wefel et al., Reference Wefel, Lenzi, Theriault, Davis and Meyers2004; Wefel et al., Reference Wefel, Saleeba, Buzdar and Meyers2010), as well as in other studies in this area (O'Farrell et al., Reference O'Farrell, Mackenzie and Collins2013), the relationship between distress, fatigue, and cognition was minimal.
There are several limitations to the current study. This was a retrospective analysis; the original studies were not designed specifically to assess noncredible performance; the majority of PVT measures were “embedded” rather than stand-alone; and, for some embedded effort measures, the preferred test condition was not administered, limiting the sensitivity and specificity of those measures.
Furthermore, many of the PVT cutoffs were established in much younger populations with known or suspected mild traumatic brain injury, and it is unlikely that the performance characteristics of these cut points behave as well in our older, medically ill population (Heilbronner et al., 2009). This is especially evident in Dataset 2, where the subjects on average were in their mid-to-late 60s and performance below the criterion for the Trail Making Tests was much more frequent. The criterion for the Trail Making Tests was derived from a much younger population that was on average in their early 30s. Measures dependent on processing speed (such as the Trail Making Tests) are known to be more vulnerable to age-related cognitive decline (Robins Wahlin, Bäckman, Wahlin, Winblad, Reference Robins Wahlin, Bäckman, Wahlin and Winblad1996; Rodriguez-Aranda & Martinussen, Reference Rodriguez-Aranda and Martinussen2010). Thus, using the Trail Making Tests as an embedded effort measure with conventional cutoff criteria, such as those described in this study, can lead to an overestimation of noncredible performance. Furthermore, several studies that examined the TMT as an embedded PVT have reported unsatisfactory diagnostic accuracy and have cautioned against its use (Busse & Whiteside, Reference Busse and Whiteside2012; Iverson et al., Reference Iverson, Lange, Green and Franzen2002; Powell, Locke, Smigielski, & McCrea, Reference Powell, Locke, Smigielski and McCrea2011). Below-criterion performance on the ROCFT was the next most common pattern. However, when compared to the results of Reddy et al. (2013), from which the cutoffs were derived, there was no difference between rates of possible noncredible performance (12.9% in the current study compared to 11% in Reddy et al.).
An additional potential limitation is that the PVT cutoffs we used were based on studies conducted on English-speaking participants in the United States, whereas half the datasets were from studies conducted in the Netherlands that used different or translated versions of tests. One particularly notable example of this difficulty concerns the measures of lexical fluency. The cutoff for the lexical fluency measures was based on the English version of the FAS lexical fluency test (Curtis et al., Reference Curtis, Thompson, Greve and Bianchini2008). The US datasets contained the Multilingual Aphasia Examination Controlled Oral Word Association test, while the Dutch datasets contained the letter combination DAT, which corresponds with the US FAS lexical fluency measure. In addition to possible cross-cultural differences in lexical fluency performances on these tests, the English FAS and COWA have been demonstrated not to be equivalent (Barry, Bates, & Labouvie, Reference Barry, Bates and Labouvie2008). Although the cutoff was based on the standardized score rather than the raw score, it is unclear whether the diagnostic accuracy of this cutoff remains the same on different fluency measures and across cultures. That being said, the frequency of below-criterion performance on the fluency tests was similar across most datasets.
Slick, Sherman, and Iverson (Reference Slick, Sherman and Iverson1999) proposed two or more types of evidence of noncredible performance from neuropsychological testing, in combination with the presence of a substantial external incentive, as criteria for probable malingered neurocognitive dysfunction. However, the breast cancer participants and healthy controls in the datasets we investigated did not have any known external incentives that may have contributed to noncredible performance.
In an influential 2008 study, Larrabee demonstrated, using chaining of likelihood ratios, that the range of post-test probabilities of noncredible performance varies greatly when only one PVT is failed. This variation lessens for two failed PVTs and is quite restricted when three PVTs are failed. He applied this method to a sample of litigating or compensation-seeking subjects and found post-test probabilities of malingering ranging from .989 to .995 at various base rates. This suggests that even at low base rates of noncredible performance, failure on three or more PVTs provides a high probability that the classification of noncredible performance is correct. In the current study, we applied this method to our data, where the base rate of noncredible performance is low, and observed similarly high post-test probabilities, approaching 1.00.
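The chaining of likelihood ratios can be sketched as follows: the base rate is converted to prior odds, each failed PVT multiplies the odds by that test's positive likelihood ratio (LR+ = sensitivity / (1 − specificity)), and the resulting odds are converted back to a probability. The likelihood-ratio values below are assumptions chosen only to illustrate the mechanics, not values from this study or from Larrabee (2008).

```python
def posttest_probability(base_rate, likelihood_ratios):
    """Chain positive likelihood ratios onto a prior probability
    (the base rate) to obtain a post-test probability."""
    odds = base_rate / (1.0 - base_rate)  # prior odds
    for lr in likelihood_ratios:
        odds *= lr                        # one multiplication per failed PVT
    return odds / (1.0 + odds)            # back to a probability

# Illustrative (assumed) LR+ values for three failed PVTs.
lrs = [5.0, 8.0, 10.0]

# Even at a low base rate (here 10%), three failures drive the
# post-test probability well above .95.
p = posttest_probability(0.10, lrs)
```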
Based on the preponderance of the evidence, there is no suggestion that breast cancer patients who chose to participate in research studies examining cognitive function exhibit noncredible performance. Similarly, healthy controls who chose to participate in research studies examining cognitive function also did not exhibit test performances indicative of noncredible performance. It does not appear to be essential for research studies within this population to use PVTs on a routine basis.
Acknowledgment
The information in this manuscript has not been previously published either electronically or in print. The authors have no financial conflicts of interest. We received no financial support for this project.