INTRODUCTION
The ability to produce the correct words while speaking is a critical human function that, under normal circumstances, feels effortless to the speaker. Nevertheless, from time to time, most healthy adults experience word finding difficulty or “tip-of-the-tongue” (TOT) phenomena, in which a known word that is ordinarily retrieved with ease is temporarily inaccessible. Whereas occasional word finding or “naming” difficulty is not cause for concern, more severe naming difficulty is a hallmark, or at least a common symptom of several age-related neurodegenerative disorders including Alzheimer’s disease, primary progressive aphasia, frontotemporal dementia, Lewy body disease, and vascular dementia (Ferris & Farlow, 2013; Kempler & Goral, 2008; Vuorinen et al., 2000). Thus, naming assessment is a core component of neuropsychological assessment for adults in general, and for older adults, in particular.
Naming in Normal Aging
In contrast to most aspects of language that remain stable throughout adulthood, there is some evidence of age-related decline in naming. Although naming accuracy shows only subtle, if any decline with increasing age (Burke et al., 1988; Connor et al., 2004; Goodglass, 1980; Goulet et al., 1994), some decrements have been identified as prolonged response latency in naming of objects (Au et al., 1995; Connor et al., 2004; Hadac et al., 2007; MacKay et al., 2002), actions (Au et al., 1995; Ramsay et al., 1999), and auditory sounds (Hanna-Pladdy & Choi, 2010). These declines are consistent with other established cognitive changes associated with normal aging, including reductions in working memory, processing speed, episodic memory, and executive functioning (e.g., Harada et al., 2013; Murman, 2015), and a virtually linear decline in processing efficiency that begins in early adulthood (Salthouse, 2003, 2009, 2010). Thus, the availability of valid naming assessment tools is crucial for differentiating normal changes associated with healthy aging from pathological naming difficulty.
Modality-Specific Naming: Rationale
The Boston Naming Test (BNT) (Kaplan et al., 1983), consisting of 60 line-drawn objects, is undoubtedly the most frequently used measure of naming ability throughout the age span. The BNT has had profound influence on the neuropsychology of naming and has been a cornerstone in the assessment of productive language across numerous neurological disorders including stroke, epilepsy, dementia, and virtually all aphasic syndromes. In addition to its clinical applications, the BNT has been the most widely used measure of naming across decades of neuropsychological and neuroscientific investigations of brain and language. However, the BNT was developed in the clinical context of aphasia, and although visual object naming is sensitive to the anomic component of these syndromes, visual naming assesses only one mode of access into the semantic system.
In our early work with temporal lobe epilepsy (TLE) patients, we observed that patients rarely complained of difficulty naming objects in their environment; rather, they complained of difficulty retrieving words during everyday discourse. We, and others, also observed clinically that some patients who complained of word finding difficulty and demonstrated such difficulty in conversation had unimpaired scores on the BNT (personal communications; E. Strauss, June, 1998; M.R. Trennery, September, 1999). Preliminary work testing several auditory verbal word retrieval tasks found auditory description naming to be the most sensitive to left temporal, structural, and electrophysiological abnormalities that underlie left temporal lobe seizures (Hamberger & Tamny, 1999). Moreover, subjective ratings of word finding severity correlated with auditory naming, but not visual naming performance (Bell et al., 2003; Hamberger & Seidel, 2003). Considering all the above, we reasoned that utilization of auditory naming together with visual naming assessment might provide valuable clinical information, including the ability to directly compare auditory and visual object naming performance. Hence, we developed the Auditory Naming Test (ANT) and complementary Visual Naming Test (VNT), based on a normative group of young- to middle-aged adults (Hamberger & Seidel, 2003).
Coinciding with and subsequent to this initial work, converging evidence from studies involving cortical stimulation mapping (Hamberger et al., 2001; Malow et al., 1996), functional neuroimaging (Bookheimer et al., 1998; Hamberger et al., 2014; Tomaszewki-Farias et al., 2005), and electrocorticography (Cervenka et al., 2013) has shown that naming based on visual object versus auditory–verbal cueing is supported by neuroanatomically distinct brain areas. Furthermore, auditory and visual naming have been shown to be differentially affected by lesion location (Hamberger & Seidel, 2009) and surgical resection (Hamberger et al., 2010), collectively, supporting the notion of distinct neural substrates underlying auditory and visual naming, as well as the clinical utility of using both measures for localization of dysfunction and characterization of naming ability.
Regarding cognitive aging, two studies have shown better identification of naming deficits associated with dementia based on ANT performance relative to VNT or BNT performance (Hirsch et al., 2016; Miller et al., 2010). Although these findings suggest potential clinical utility of the ANT in older individuals, these studies were limited by the absence of age-appropriate normative data, and the use of stimuli that might not be optimal in an older population (e.g., “a small crease in the skin from aging,” “what an old man uses to walk with”), as these might elicit emotional reactions that could affect performance.
In this context, we also aimed to improve upon the current standard of naming assessment, the BNT, with respect to performance parameters and stimuli by addressing the following factors:
1) Timing and phonemic cueing: Although delayed responding and reliance on phonemic cueing represent classic manifestations of word finding difficulty (Goodglass et al., 1984), untimed accuracy is the sole performance metric of the BNT and other traditional naming tests (Benton & Hamsher, 1989; Kaplan et al., 1983). While the BNT includes phonemic cueing, no normative data are provided for interpretation. We, and others, have found that incorporating response latency and reliance on phonemic cueing into the performance measures increases the sensitivity of the assessment (Bell et al., 2003; Hamberger et al., 2019; Hamberger & Seidel, 2003).
2) Vocabulary level: Although naming is dependent on vocabulary knowledge, naming refers to the retrieval and production of words from an established mental lexicon; a naming test should not be a vocabulary test (Hamberger, 2015). BNT performance is influenced significantly by education level and vocabulary knowledge in both younger (Baron, 2004; Mitrushina et al., 2005; Randolph et al., 1999; Roberts & Hamsher, 1984) and older adults (Melikyan et al., 2019; Yochim et al., 2009), due, at least in part, to inclusion of relatively uncommon items (e.g., yoke, abacus) (Hawkins & Bender, 2002; Martielli & Blackburn, 2016; Schmitter-Edgecombe et al., 2000). In selecting potential naming test items, we chose target-item names with mid-range spoken word frequency, as low-frequency items would elicit errors in examinees with limited education and vocabulary due to lack of familiarity, while high-frequency items would be named rapidly by virtually all, and therefore, lack sensitivity. Although it is likely not possible to fully eliminate the influence of vocabulary, we attempted to reduce its influence such that assessment would reflect, principally, the integrity of cognitive processes that underlie targeted word retrieval.
3) Quality of visual stimuli: The BNT and other visual naming tests, developed when digitized color photographs were not readily available and reproducible, consist of line drawings, which can elicit errors due to misperception rather than naming difficulty (e.g., “pretzel” perceived as snake) (Martielli & Blackburn, 2016). We attempted to reduce the perceptual component of the task using digitized color images. Additionally, the use of color has been shown to mitigate the influence of literacy on naming (Reis et al., 2006).
Building on knowledge and experience gained from the ANT and VNT for younger adults and more recently, for children (Hamberger & Seidel, 2003; Hamberger et al., 2018), this normative study developed auditory and visual naming tests for older adults, aged 56–100 years. Consistent with other aging studies of naming, we expected reductions in performance with increasing age. Considering the likely influence of education level, we hypothesized that both ANT and VNT scores would correlate with education. Accordingly, we anticipated that it would be necessary to incorporate education level in the normative data.
METHODS
Participants
Participants were 407 healthy adults, aged 56–100 years, with 43–67 participants per 5-year age group (56–60, 61–65, 66–70, 71–75, 76–80, 81–85, and 86–90 years), and 20 individuals aged 91–100 years, recruited via advertising at Columbia University Medical Center, word of mouth, posted notices in community centers in the New York metropolitan area, and internet websites (www.researchmatch.org, www.volunteermatch.org, Craigslist). All participants were required to be native English speakers or to have learned English by the age of 5 years and been educated in English. Telephone pre-screening queried prospective participants about their neurological, psychiatric, and academic history. Individuals with a history of learning disabilities, language problems, head injury, stroke, or other neurological disorders were excluded. Corrected-to-normal vision and hearing were required for inclusion. All individuals were administered three screening measures; to screen for dementia or mild cognitive impairment (Petersen et al., 1999), participants with Mini Mental Status Exam (MMSE) <24, Mattis DRS-2 (Jurica et al., 2001) score below 125 (Fields et al., 2010), or estimated IQ (Wechsler, 2011) <70 were excluded (Brown et al., 2010). This study was completed in accordance with the Helsinki Declaration and was approved by the Institutional Review Board at Columbia University Medical Center.
Selection and Screening of Stimuli
Auditory naming: We identified 19 descriptions from the earlier, 50-item version that appeared most appropriate for use with older adults (i.e., eliminating items such as “wrinkles” and “cane”) (Hamberger & Seidel, 2003) and generated 41 new potential descriptions. Descriptions were required to be presentable within 4 s, with low likelihood that target words could be named before the final word (e.g., “an object used for weighing”). Visual naming: None of the original 50 line drawings were selected for potential use in the older adult VNT, which would consist of digitized photographs. We required pictured objects, selected from bigstockphoto.com, to be “isolated” on a white background for visual uniformity across items and to eliminate contextual cues, as degree of context would be difficult to control across items (Fig. 1).
For both the ANT and VNT, we selected and subsequently screened target words that would be familiar (based on word frequency) to most individuals. For the first screening, 12 healthy, independent adults, aged 62–86 years (mean age: 73.3 ± 8.6 years; mean years education: 14.6 ± 3.3; gender: 7 women; mean modified MMSE [mMMSE; Stern et al., 1987]: 53.2 ± 1.9 [all >50/57]) were administered the descriptions and pictures by a trained assistant. Based on accuracy and response time (RT), we selected 50 stimuli each for the ANT and VNT that best met criteria: 1) correct response from a minimum of 11/12 subjects and 2) median RT <2 s.
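The two pilot screening criteria can be expressed as a simple filter. The following is a minimal sketch, not the procedure actually used: the function name and data layout are hypothetical, and it assumes that the median RT is computed over correct responses only, which the text does not specify.

```python
from statistics import median

MIN_CORRECT = 11      # minimum correct responses out of 12 pilot participants
MAX_MEDIAN_RT = 2.0   # seconds; median RT must fall below this value


def passes_pilot_screen(responses):
    """Return True if an item meets both pilot criteria.

    responses: list of (is_correct, rt_seconds) tuples, one per pilot
    participant; rt_seconds is None when no correct response was given.
    Assumption: the median is taken over correct responses only.
    """
    n_correct = sum(1 for ok, _ in responses if ok)
    correct_rts = [rt for ok, rt in responses if ok and rt is not None]
    if not correct_rts:
        return False
    return n_correct >= MIN_CORRECT and median(correct_rts) < MAX_MEDIAN_RT


# Illustrative (made-up) pilot data for two candidate items
fast_item = [(True, 1.2)] * 11 + [(False, None)]
slow_item = [(True, 2.6)] * 12
print(passes_pilot_screen(fast_item), passes_pilot_screen(slow_item))
```

Both criteria must hold at once, so an item named correctly by all 12 participants is still rejected if responses are typically slow.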
Interim analyses were conducted on the 50 ANT and 50 VNT stimuli following data collection with 200 participants, aged 56–90 years. Items showing poor name agreement (i.e., <90% same response) and median response times >2 s were removed (11 ANT and 8 VNT items). Additionally, six VNT and three ANT items were eliminated due to 100% accuracy and rapid responses (<2 s) by every participant (e.g., ANT: towel, kitchen; VNT: bicycle, umbrella), as these items provided minimal variance. This process resulted in 36 stimuli each for the ANT and VNT that best met these criteria: 1) correct response from ≥90% of subjects and 2) median RT <2 s. When target words were not named in 20 s or following a phonemic cue, we asked participants if they knew the target word. “Don’t know” responses were extremely infrequent (<1%). Additionally, word frequencies, based on spoken English, were obtained from http://subtlexus.lexique.org, and a t test comparing mean word frequency for target words between tasks (mean ANT: 11.2 ± 13.3; mean VNT: 14.7 ± 22.0) indicated no significant difference (t(70) = −.82, p = .41). Furthermore, there was no difference in the distributions of word frequency between tasks, as assessed by the Kolmogorov–Smirnov test (p = .99). Also, word frequency did not differ between living (ANT: 15 items; VNT: 18 items) and nonliving items (ANT: 21 items; VNT: 18 items) in either task (ANT: t(34) = −.35, p = .72; VNT: t(34) = −.15, p = .88). No target words were repeated across tasks (to avoid priming effects), and no items overlapped with BNT items. Appendix A lists ANT stimuli and VNT item names.
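As a check on the word frequency comparison, the reported t statistic can be recovered from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) independent-samples t test with 36 items per task; the reported 70 degrees of freedom are consistent with that assumption, but the text does not name the variant used. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent samples, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# ANT: 11.2 ± 13.3; VNT: 14.7 ± 22.0; 36 target words per task
t, df = pooled_t(11.2, 13.3, 36, 14.7, 22.0, 36)
print(f"t({df}) = {t:.2f}")  # prints "t(70) = -0.82"
```

The result agrees with the reported value, t(70) = −.82.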
Procedure
All participants were administered the two naming tasks, the two-subtest Wechsler Abbreviated Scale of Intelligence-II (WASI-II; Wechsler, 2011), the MMSE (Folstein et al., 1975), Mattis DRS-2 (Jurica et al., 2001), and the BNT (Kaplan et al., 1983). These scores and demographic information were obtained to characterize the normative sample (Table 2). Order of naming tasks was counterbalanced across subjects. Standardized instructions were read aloud by a trained examiner. For the ANT, timing via stopwatch began when the examiner completed the final word of the description and terminated with the subject’s correct response. For the VNT, timing began with picture presentation and terminated with the participant’s correct response. Participants were permitted a maximum of 20 s to provide a correct response. When the participant provided an incorrect response(s) before the time limit, the examiner queried, “What else?” If the participant failed to provide the correct response within the allotted 20 s, the trial was coded as “incorrect,” and the examiner provided a phonemic cue (e.g., “ha” for “hammer”) and allotted five additional seconds for a correct response before initiating the next trial.
A subset of 49 participants was retested approximately 1 month after the initial testing for assessment of test–retest reliability.
Performance Measures
Performance measures consisted of those used in the original adult tests plus additional measures based on clinical experience and included in the recently published Children’s Naming Tests (Hamberger et al., 2018). Original measures include 1) Number Correct: sum of correct responses within 20 s, 2) mean RT, and 3) TOT: sum of items named accurately in 2–20 s (“delayed responses”), plus items not named by 20 s, yet named accurately following a phonemic cue. Both delayed and cued responses indicate that the word is within the individual’s mental lexicon, yet additional time or phonemic cueing was necessary for word retrieval. We have also found utility in Rapid Responses, that is, items named in <2 s, representing the absence of word finding difficulty, and an overall Summary Score that utilizes best performance as its base (i.e., number of items named in <2 s), with a penalty for TOTs (i.e., subtracting delayed yet accurate and cued accurate responses). Given manual timing, we use 2 s (rather than 1.5 s; Goodglass et al., 1984) to demarcate automatic versus conscious processing, allowing for human error and variability. TOT and Summary Scores have shown clinical utility in lateralization and localization of dysfunction (Hamberger & Seidel, 2003, 2009; Hamberger et al., 2018) and are easily calculated; therefore, we recommend these for clinical use. We recommend Number Correct to identify potential test validity issues such as impoverished vocabulary or poor effort. These scores are defined in Table 1. Normative data for all scores are presented in Supplementary Tables.
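The score definitions above can be illustrated with a short scoring routine. This is a sketch under stated assumptions rather than official scoring software: the `Trial` record and `score` function are hypothetical names, a response at exactly 2 s is treated as delayed, and a response at exactly 20 s is treated as within the time limit. Mean RT is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

RAPID_CUTOFF = 2.0   # seconds; responses faster than this are "rapid"
TIME_LIMIT = 20.0    # seconds allowed before the phonemic cue


@dataclass
class Trial:
    # Seconds to a correct, uncued response; None if not named within 20 s
    rt: Optional[float]
    # True if the item was named correctly after the phonemic cue
    cued_correct: bool = False


def score(trials):
    """Compute Number Correct, Rapid Responses, TOT, and Summary Score."""
    rapid = sum(1 for t in trials if t.rt is not None and t.rt < RAPID_CUTOFF)
    delayed = sum(
        1 for t in trials
        if t.rt is not None and RAPID_CUTOFF <= t.rt <= TIME_LIMIT
    )
    cued = sum(1 for t in trials if t.rt is None and t.cued_correct)
    tot = delayed + cued  # delayed-but-accurate plus cued-accurate responses
    return {
        "number_correct": rapid + delayed,  # correct within 20 s, uncued
        "rapid": rapid,
        "tot": tot,
        "summary_score": rapid - tot,
    }


# Illustrative (made-up) protocol of five trials
trials = [Trial(1.0), Trial(1.5), Trial(3.0), Trial(None, True), Trial(None)]
print(score(trials))
# {'number_correct': 3, 'rapid': 2, 'tot': 2, 'summary_score': 0}
```

In this example, two rapid responses are offset by one delayed and one cued response, giving a Summary Score of 0 even though three of five items were named within the time limit.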
Statistical Analyses
All measures were assessed for normality via histogram plots of the residuals. Homogeneity was assessed by ratios of the largest to smallest variance for each performance variable. Due to unequal sample sizes among age groups, we applied a conservative ratio of 2:1 (Salkind, 2010). All performance variables met assumptions for ANOVA except Number Correct scores. For variables that met criteria, multivariate ANOVAs (MANOVAs) assessed potential differences among age groups, and repeated measures MANOVA assessed differences between ANT and VNT performance. Scheffe’s post hoc tests assessed group differences following significant age group effects. For variables not meeting criteria for ANOVA, Kruskal–Wallis H-test and pairwise comparisons assessed differences among age groups and Friedman’s test assessed differences between ANT and VNT performance. Effect sizes were based on the value of partial eta squared (η2) for ANOVA and epsilon squared for Kruskal–Wallis (Okada, 2013).
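The variance-ratio check for homogeneity can be sketched as follows. The function names are hypothetical, the sample variance (n − 1 denominator) is assumed, and treating a ratio of exactly 2 as acceptable is also an assumption; the data are made up for illustration.

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


def meets_homogeneity(groups, max_ratio=2.0):
    """Apply the 2:1 criterion (inclusive threshold is an assumption)."""
    return variance_ratio(groups) <= max_ratio


# Illustrative (made-up) scores for two pairs of age groups
similar = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]
dissimilar = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
print(meets_homogeneity(similar), meets_homogeneity(dissimilar))
```

A variable failing this check would be routed to the nonparametric tests described above.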
Test–retest reliability was assessed via Pearson correlations. For analysis of internal consistency, assessed via Cronbach’s alpha with Spearman–Brown correction (Nunnally & Bernstein, 1994) in which each item is treated as a “case,” it was desirable to have a data point for every response provided for each item, and for these data to have a reasonable extent of inherent variance. Thus, binary performance variables (i.e., correct/incorrect, presence/absence of TOT) were not well suited due to the severely restricted range, and although RT was considered, the absence of RT for items not named in 20 s reduced the dataset listwise. Therefore, we coded responses from 1 to 4, defined as follows: 1 = correct word provided in less than 2 s, 2 = correct word provided in 2–20 s, 3 = correct word provided following phonemic cue, and 4 = no correct response provided. This coded measure captured all possible response types and provided an adequate degree of variance. Validity was assessed via Pearson correlations between ANT and VNT scores (i.e., Number Correct, TOT, Summary Score) and BNT scores.
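The 1–4 response coding and the alpha computation can be sketched as below. This is an illustration under assumptions, not the analysis as run: function names are hypothetical, the data matrix uses the conventional layout (one row per participant, one column per item), which may differ from the item-as-case arrangement described above, and the Spearman–Brown correction is omitted.

```python
from statistics import variance


def code_response(rt, cued_correct=False):
    """Map one trial to the 1-4 code defined in the text.

    rt: seconds to a correct uncued response, or None if not named in 20 s.
    """
    if rt is not None and rt < 2.0:
        return 1  # correct in less than 2 s
    if rt is not None and rt <= 20.0:
        return 2  # correct in 2-20 s
    if cued_correct:
        return 3  # correct following phonemic cue
    return 4      # no correct response


def cronbach_alpha(rows):
    """Standard Cronbach's alpha; rows holds one list of item codes per participant."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Illustrative (made-up) codes: three participants by three items
codes = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(codes))
```

Because every trial receives a code, no data are lost listwise, which is the motivation given above for preferring this measure to RT.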
RESULTS
For the overall sample, mean IQ was in the average range (104.6 ± 16.4) and mean years of education was 15.0 ± 2.5 years. For naming performance, results of MANOVA and Kruskal–Wallis H test revealed a significant effect of age group for all ANT scores (Number Correct: H(7) = 49.3, p < .001; TOT: F(7,399) = 8.16, p < .001; Summary Score: F(7,399) = 9.42, p < .001) and VNT scores (Number Correct: H(7) = 66.5, p < .001; TOT: F(7,399) = 11.63, p < .001; Summary Score: F(7,399) = 12.25, p < .001). However, results of post hoc comparisons indicated no significant differences between the following adjacent age groups: 56–60 and 61–65 years, 66–70 and 71–75 years, 76–80 and 81–85 years, and 86–90 and 91–100 years. Therefore, we combined adjacent age groups into 10-year age stratifications, with the oldest age group spanning 15 years (i.e., 56–65, 66–75, 76–85, and 86–100 years). As there were no sex differences on any of the ANT or VNT scores (all p > .14), normative data are combined for men and women.
Results of Pearson correlations reveal modest, yet significant correlations between education level and naming performance (ANT r’s = .26–.34, VNT r’s = .18–.22, BNT r = .37; all p < .001), and between WASI-II Vocabulary scores and naming performance (ANT r’s = .48–.52, VNT r’s = .31–.37; BNT r = .61; all p < .001). Therefore, we organized the normative naming data by education level, dichotomizing participants based on whether they completed college (≤15 years versus ≥16 years). This division split the group into approximately equal subsamples (196 versus 211) and is consistent with that used in our 2003 normative study and other established normative datasets (Heaton et al., 2004). Demographic characteristics are shown in Table 2, and normative naming data, as a function of age and education level, are presented in Tables 3 and 4.
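The resulting stratification amounts to assigning each examinee to one of eight normative cells (four age bands by two education groups). A minimal sketch follows; the function name and label strings are hypothetical, and whole-number ages are assumed.

```python
# Age bands after combining adjacent 5-year groups (years, inclusive)
AGE_BANDS = [(56, 65), (66, 75), (76, 85), (86, 100)]


def normative_cell(age, education_years):
    """Return the (age band, education group) cell for an examinee."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            edu = "<=15 years" if education_years <= 15 else ">=16 years"
            return (f"{low}-{high}", edu)
    raise ValueError("age outside the 56-100 normative range")


print(normative_cell(72, 12))  # ('66-75', '<=15 years')
print(normative_cell(90, 18))  # ('86-100', '>=16 years')
```

An examinee's scores would then be compared against the normative values for that cell.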
Mean, SD, Range. FSIQ, Full Scale IQ from WASI-II; MMSE, Mini Mental Status Examination, maximum score = 30; DRS, Dementia Rating Scale raw score; education (years) based on highest level achieved. Note: Characteristics for 10-year age intervals are provided in Supplementary Table 1.
TOT, number of tip-of-the-tongue responses; Summary Score: (number of items named <2 sec) – (TOT score); η2 = partial eta squared; superscript letters denote significant difference from listed age groups at p < .05.
a ≠ all other groups;
b ≠ Age 56–65 years;
c ≠ Age 56–65 years, Age 66–75 years;
d ≠ Age 56–65 years, Age 66–75 years, Age 76–85 years;
e ≠ Age 76–85 years, Age 86–100 years;
f ≠ Age 86–100 years;
g No significant difference from other groups;
1 epsilon squared.
Note: Normative data for additional performance scores (RT, <2 sec, ≥2 sec, Phonemic cue, Summary Scores-2) are available in Supplementary Table.
Inspection of Tables 3 and 4 and Fig. 2 suggests only subtle influence of age and education for untimed accuracy scores across tests, despite statistically significant effects of both age (ANT Number Correct: H(3) = 46.9, p < .001; VNT Number Correct: H(3) = 65.2, p < .001; BNT: H(3) = 32.2, p < .001) and education level (ANT Number Correct: H(1) = 16.6, p < .001; VNT Number Correct: H(1) = 9.3, p = .002; BNT: H(1) = 35.2, p < .001). ANT and VNT TOT and Summary Scores appear to show greater variability as a function of age and education (Fig. 3). Accordingly, results of two-way (education group by age group) MANOVA revealed main effects of education (ANT TOT: F(1,399) = 25.3, p < .001; ANT Summary Score: F(1,399) = 30.4, p < .001; VNT TOT: F(1,399) = 8.3, p < .004; VNT Summary Score: F(1,399) = 8.4, p < .004) and age group (ANT TOT: F(3,399) = 19.7, p < .001; ANT Summary Score: F(3,399) = 22.5, p < .001; VNT TOT: F(3,399) = 25.1, p < .001; VNT Summary Score: F(3,399) = 26.5, p < .001). This was modified by the interaction between these two variables for ANT (ANT TOT: F(3,399) = 3.3, p = .02; ANT Summary Score: F(3,399) = 3.7, p = .01) but not VNT scores (VNT TOT: F(3,399) = .06, p = .98; VNT Summary Score: F(3,399) = .07, p = .97). Performance differences related to age within the two education groups, detailed in Tables 3 and 4, show greater differences between age groups for time- and cue-based scores relative to Number Correct scores.
ANT versus VNT Performance
Comparison of ANT and VNT performance revealed stronger VNT performance across all age groups in both education groups (education ≤15 years: Number Correct: χ2(1) = 48.00, p < .001; TOT: F(1,191) = 41.41, p < .001; Summary Score: F(1,191) = 59.47, p < .001; education ≥16 years: Number Correct: χ2(1) = 28.40, p < .001; TOT: F(1,208) = 23.35, p < .001; Summary Score: F(1,208) = 27.81, p < .001), with no interaction between modality and age group in either education group.
Reliability and Validity
Internal consistency, assessed via Cronbach’s alpha with Spearman–Brown correction, was 0.83 for the ANT and 0.84 for the VNT, overall, reflecting a reasonable level of internal consistency. For test–retest reliability assessment, 49 participants (31 women), with proportionate representation across age groups (aged 56–65 years, n = 18; aged 66–75 years, n = 15; aged 76–85 years, n = 10; aged 86–100 years, n = 6), were retested approximately 1 month after their initial testing (mean days: 33.7 ± 4.5). There were no significant differences in IQ, education, or MMSE scores between those retested (mean IQ: 105.4 ± 15.0; education: 15.2 ± 2.2; MMSE: 28.9 ± 1.1) and not retested (mean IQ: 104.5 ± 16.4; p = .72; education: 15.0 ± 2.4, p = .61; MMSE: 28.7 ± 1.3, p = .21). Test–retest correlations (Table 5), which ranged from .46 to .76, were lowest for VNT Number Correct, which has a severely restricted range in healthy adults, and highest for Summary Scores, which we consider the most comprehensive clinical measure, and least susceptible to restricted range (both p < .001). Validity coefficients assessing relations with BNT performance were r = .59–.78 for ANT performance and r = .56–.59 for VNT performance.
RT, mean response time; TOT, number of tip-of-the-tongue responses; Summary Score: (number of correct responses <2 s – number of TOT responses).
DISCUSSION
Considering the prevalence of age-related conditions that affect expressive language, measures that provide rigorous assessment of naming in older adults are essential to clinical care and research involving neurocognitive disorders in this population. The naming tests developed here provide a promising update to the assessment of naming for older adults. Drawing from recent advances in the neuropsychology and cognitive neuroscience of naming, this normative study developed complementary auditory and visual naming tests for adults, aged 56–100 years. Our results showed age-related decrements in auditory and visual naming performance across the older adult age span. These changes, although statistically significant, were relatively subtle for untimed accuracy, whereas age effects were robust for time- and cue-based measures, underscoring the value of normative data for these more sensitive scores. With no naming differences related to sex, results were combined for men and women. However, due to a significant influence of education, normative data are stratified by both age and education level.
These measures advance naming assessment in older adults by 1) adding the auditory verbal modality to the clinical assessment of naming, which, historically, has been largely limited to visual object naming, 2) using target words that are highly likely to be within the working vocabulary of most healthy older adults, and 3) providing age-stratified and education referenced normative data, not only for traditional, untimed accuracy, but also for performance measures that capture delays in responding and reliance on phonemic cueing, that is, features that reflect efficiency of word retrieval from the mental lexicon. These measures showed sensitivity to age-related changes in healthy elders, increasing the likelihood of detecting subtle difficulties in naming at earlier stages of a degenerative process, and of potentially identifying features or patterns of naming performance associated with particular dementia subtypes.
Aging and Naming
While receptive language skills remain stable with increasing age (Burke et al., 1991) and verbal knowledge has even been found to expand across the adult lifespan (Burke & Shafto, 2008; Verhaeghen, 2003), productive language skills have been shown to decline with age. Relative to younger adults, older adults’ natural speech is characterized by simpler language, more vague terms, and empty pauses, and most relevant here, more frequent instances of TOT (Burke et al., 1991; Burke & Shafto, 2008; Kemper & Sumner, 2001), as we found in this study. The neural underpinnings of increased TOTs in older adults found here and reported by others are not entirely clear; although aging is associated with widespread changes in gray and white matter volume, the relation between the extent of neural change and cognitive performance is not straightforward. Neuroimaging studies of successful word retrieval and TOTs in younger and older adults show similar levels of increased activity in inferior frontal, left anterior insula, and anterior cingulate; however, older adults show weaker activation during TOT occurrences (Shafto et al., 2010). Unfortunately, the age difference, evident during TOT states, occurs “after the fact,” that is, well after the time window of impairment that led to the TOT state.
On the other hand, if we consider that naming failures likely reflect reduced efficiency in retrieval or production (i.e., rather than information loss), a better explanation might be found in studies of resting-state fMRI (Geerligs, Maurits, et al., 2014; Geerligs, Renken, et al., 2014; Meunier et al., 2009) which show, in young adults, modularly organized, brain-wide networks, with highly integrated local networks and weak connectivity between networks (Meunier et al., 2009). By contrast, older adults show reductions in integration within networks, and increased connectivity between networks, possibly reflecting both decrements in within-network efficiency, and less effective attempts to compensate via recruitment of other brain areas (Meunier et al., 2014). These changes in integration and connectivity might underlie our observed age-related declines in auditory and visual naming and are consistent with the linear reduction in processing efficiency in other cognitive domains (Salthouse, 2003, 2009, 2010).
Aging, Education, and Naming
Despite efforts to reduce the influence of education, correlations between education and performance were significant, resulting in education-based organization of the normative data. It is, however, possible that the influence of education on performance is not directly attributable to word knowledge of the test stimuli. As noted, lack of familiarity with target words was extremely rare. Rather, education and vocabulary might serve as proxies for the functional integrity of the mental lexicon, that is, not only its breadth but also the efficiency with which words are stored and retrieved. The correlation between education and cognitive performance in older adults has been framed within “cognitive reserve theory,” which suggests that environmental enrichment promotes an increase in synapses and vascularization, resulting in better performance (Speisman et al., 2013; Stern, 2012). An alternative, yet not mutually exclusive, position points to the association between education and health (Albert, 1995), whereby less educated individuals tend to have greater exposure to risks (e.g., occupational exposure, unhealthy habits), which might adversely affect cognition over time (Murphy et al., 2016). Taken together, the influence of education on naming in older adults likely reflects the combination of the breadth and quality of the individual’s working vocabulary and brain health, which affects the efficiency with which information is retrieved and produced.
Neural Correlates of Auditory and Visual Naming
Prior to phonological word access, both pictured objects and auditory descriptions elicit a set of dynamic processes beginning with the accumulation of sensory information. It is well established that visual object recognition engages the visual “ventral stream,” a processing pathway extending from occipital cortex to the temporal lobe (Haxby et al., 2001), and more recent multivariate neuroimaging methods have shown distinct distributed patterns of brain activation in this same region associated with different object categories (Grootswagers et al., 2017). On the other hand, auditory description naming requires receptive verbal processing, engaging the left superior, posterior temporal region (Alsop et al., 1996; Boatman et al., 1995; Vigneau et al., 2006). Although the syntactic structure of ANT descriptions is not particularly complex, syntactic processing of spoken language is required, likely engaging left middle temporal and inferior frontal cortex (Tyler, Cheung, Devereux, & Clarke, 2013). Additionally, the information provided by the descriptions, while distinctive, requires identification based on somewhat limited information, suggesting that the ANT might be more dependent than the VNT on frontally mediated executive mechanisms.
While most of the work on neural correlates of naming comes from neuroimaging studies of visual object naming, investigations of both visual and auditory naming together, using electrocortical stimulation, electrocorticography, and other techniques focusing on the left temporal region, provide additional insights. Cortical mapping studies have shown that anterior temporal stimulation tends to disrupt auditory naming, but not visual naming, whereas stimulation in the posterior temporal–parietal region tends to disrupt both visual naming and auditory naming or, at some posterior sites, visual naming only (Hamberger et al., 2001; Hamberger et al., 2007). Similarly, electrocorticography in refractory epilepsy patients has shown anatomical distinctions in high gamma activity associated with auditory naming and visual object naming (Cervenka et al., 2013). Consistent with these invasive studies, behavioral work has shown that patients with posterior temporal abnormalities perform more poorly on visual naming than on auditory naming (intimating neurocognitive specificity rather than merely task difficulty), with the reverse task-related asymmetry (auditory naming poorer than visual naming) found for patients with abnormalities in the anterior temporal region (Hamberger & Seidel, 2009). Moreover, these modality-related anatomical distinctions are not limited to clinical samples, as functional neuroimaging in healthy young adults has also shown both overlapping and task-specific areas involved in auditory versus visual naming (Hamberger et al., 2014; Tomaszewki-Farias et al., 2005).
Taken together, results from cortical stimulation, electrophysiological, neuroimaging, and behavioral studies suggest that while difficulty level might differ between tasks, auditory and visual naming probe different aspects of word retrieval that draw on distinct neural substrates and cognitive mechanisms.
Normal Aging versus Pathological Changes in Naming
Analysis of both traditional naming scores (i.e., naming accuracy within 20 s) and time- and cue-based measures suggests that naming begins to decline during one’s mid-70s. However, age-related differences were more readily apparent in both ANT and VNT TOT and Summary Scores than in untimed accuracy scores. Thus, TOT and Summary Scores might better capture symptomatic changes in naming efficiency during the early stages of neurocognitive disorders involving expressive language. As noted, other investigators have reported, relative to the BNT, better identification of both vascular and Alzheimer’s dementia using the ANT developed for younger adults (Hirsch et al., 2016; Miller et al., 2010). While these studies show promise for the clinical utility of the ANT, it is hoped that the age-appropriate ANT and VNT developed and standardized here will reliably distinguish between normal and pathological changes in naming associated with age-related neurodegenerative conditions.
Practical Considerations
From a practical standpoint, it might seem that it would have been sufficient to merely extend the normative sample for the already established 2003 “younger adult” ANT and VNT. However, with nearly two decades of experience with these original tests, including considerable feedback from users, we opted to update and improve the stimuli with the goal of providing more effective and efficient measures. Improvements included (1) the use of spoken rather than written word frequencies for target words; (2) shorter tests, as we found with both the younger adult (Hamberger & Seidel, 2003) and the more recent children’s tests (Hamberger et al., 2018) that 36 items provided sufficient sensitivity; (3) use of digitized photographs rather than line drawings; and (4) exclusion of items that might evoke negative emotional reactions in an older population. Finally, language is culturally based, culture is potentially cohort-sensitive, and both language and culture change over time. Accordingly, it could be argued that the 2003 tests for individuals aged 16–55 years are due for revision. Fortunately, these efforts are underway.
Performance Measures
To assess the efficiency of targeted word retrieval, we utilized several performance measures that capture RT and reliance on phonemic cueing. These include mean TOT and Summary Scores that combine rapid, accurate responses, and TOTs. As mean RT is cumbersome to calculate and could potentially be skewed by a few outliers, for clinical use, we recommend the more readily calculated scores that have been shown to provide robust and reliable measures of naming. Specifically, in “young” adults with unilateral TLE, we found the TOT score to be particularly sensitive to left (dominant) temporal lobe dysfunction (Hamberger & Seidel, 2003), and in children, both TOT and Summary Scores have shown sensitivity to left hemisphere, and in particular, left temporal dysfunction (Hamberger et al., 2018). As it is not yet known which measures will be most sensitive to dysfunction in older adults, we provide age-stratified, education-based normative data for multiple scores in simple and combined forms (Supplementary Tables).
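The point about mean RT being skewed by a few outliers can be illustrated with a minimal sketch. The response times, the 2-second cutoff, and the function name below are hypothetical values chosen for illustration; they are not study data and do not reproduce the scoring rules of the ANT or VNT. The sketch only shows that a single very slow response shifts a mean substantially, whereas a count of delayed responses changes by one.

```python
from statistics import mean

# Hypothetical response times (seconds) for one examinee; not study data.
# The final value is a single very slow response (an outlier).
response_times = [0.9, 1.1, 1.0, 1.2, 0.8, 1.3, 1.0, 1.1, 0.9, 19.5]

# Illustrative cutoff for calling a response "delayed"; an assumption,
# not the tests' published scoring criterion.
DELAY_CUTOFF_S = 2.0


def delayed_count(times, cutoff=DELAY_CUTOFF_S):
    """Count responses slower than the cutoff."""
    return sum(1 for t in times if t > cutoff)


with_outlier = response_times
without_outlier = response_times[:-1]

# The mean more than doubles because of one slow item.
print("mean RT with outlier:   ", round(mean(with_outlier), 2))
print("mean RT without outlier:", round(mean(without_outlier), 2))

# The count-based measure moves by exactly one item.
print("delayed count with outlier:   ", delayed_count(with_outlier))
print("delayed count without outlier:", delayed_count(without_outlier))
```

A count-based score of this kind is bounded by the number of items, so no single response can dominate it, which is one plausible reason such scores are easier to compute and interpret by hand.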
Finally, the ANT and VNT simulate the different contexts in which word retrieval occurs in day-to-day living. At times, we name objects in our visual environment (e.g., “Please pass the pepper.”), while, at others, we produce words following a speaker’s auditory verbal message (e.g., “What can I add for flavor?” “Pepper”). Auditory naming and visual naming rely on task-specific cognitive mechanisms that are supported by distinct neuroanatomical areas. As such, thorough assessment of naming requires assessment of both auditory and visual naming.
Limitations
Although we attempted to recruit comparable sample sizes across age groups with balanced representation of men and women across the age span, recruitment of the oldest individuals, particularly men, was challenging. Additionally, although education level was consistent across age groups and included a wide range, the overall level of education could be considered somewhat high, as most participants completed high school and some college. Due to the significant correlation between education and naming performance, we dichotomized the data by education level. Ideally, a normative study would allow for more fine-grained stratification; however, the sample size was limited to some extent by resource constraints. As our sample was confined to healthy older adults, we were unable to assess validity with respect to a clinical sample and therefore compared ANT and VNT performance with that of the traditional standard, the BNT. With the completion of these measures, future work in our laboratory will aim to investigate clinical validity with relevant neurological patients. Additionally, semantic and perceptual features can influence object recognition and naming. Although beyond the scope of this study, future investigations could assess the clinical significance of features, including category crowding (Gerlach, 2017), semantic and structural similarity (Dickerson & Humphreys; Lloyd-Jones & Humphreys, 1997), and visual features (Humphreys et al., 1988) of test items. Finally, we note that the ANT and VNT, like other verbal measures, are culturally bound and are therefore intended for use with individuals raised and educated in the U.S. or a similar cultural environment.
We recommend caution in interpreting test performance in bilingual individuals, as we have found bilingualism to be associated with weaker naming performance, despite unimpaired expressive vocabulary (Gooding et al., 2018).
Closing Comments
Naming is integral to neuropsychological assessment, particularly in the context of aging. Traditional measures, such as the BNT, laid the groundwork, bringing to light expressive language symptoms associated with age-related neurological disorders. Building on this foundation, the auditory and visual naming tasks developed here incorporate the knowledge gained from research in the neuropsychology and neuroscience of naming over the past several decades. The tests are psychometrically sound, demonstrating a reasonable level of internal and test–retest reliability for clinical application. With converging evidence from varied methodologies that distinct neurocognitive systems support auditory and visual naming, the ANT and VNT enable more comprehensive and more fine-tuned assessment of naming. Further, incorporating clinical features of word finding difficulty into performance measures holds promise, not only for earlier detection of clinical changes associated with neurodegenerative disorders, but also for identifying disease-specific features that might assist in the differential diagnosis of degenerative processes. We hope and expect that utilization of these measures in both clinical and research settings will improve clinical assessment and facilitate advancements in our understanding of word retrieval in aging and age-related disorders.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617721000552
Acknowledgments
We thank Kaitlin Carson, Leslie Church, Tess Jacobson and Audrey Li for assistance with data collection and Alicia Williams and Donovan Laing for assistance with data management.
FINANCIAL SUPPORT
This work was supported by the National Institutes of Health/the National Institute of Neurological Disorders and Stroke grant number R01 NS083976 (MH).
CONFLICT OF INTEREST
The authors have no conflicts of interest to disclose.
APPENDIX 1
AUDITORY NAMING TEST ITEMS VISUAL NAMING TEST ITEMS
1 Detailed administration instructions, test stimuli, and record forms can be obtained from hambergerlab@gmail.com.