Introduction
In recent years, increasing emphasis has been placed on cultural neuropsychology within clinical neuropsychology (Manly, 2008). The reasons are at least twofold. First, demand for culturally appropriate evaluation is rising rapidly. Second, in response to this demand, traditional neuropsychological practice calls for a new set of ethical and competency guidelines. One avenue to further develop cultural neuropsychology and facilitate service provision is to develop local norms for translations and adaptations of existing neuropsychological measures as well as for indigenously developed tests. A considerable number of locally developed normative datasets have been reported for tests in Mandarin Chinese, the language with the world’s largest number of native speakers.
Cultural Neuropsychology
In neuropsychology, a cultural approach entails considerations of not only translation/adaptation of tests, but also the cultural appropriateness of the test paradigms and tested constructs. The Board of Directors of the American Academy of Clinical Neuropsychology (AACN) highlighted the ethical and competency issues pertaining to cultural neuropsychology in the Practice Guidelines for Neuropsychological Assessment and Consultation (2007). It urged neuropsychologists to acquire awareness of how cultural, linguistic, and other demographic and socioeconomic factors influence test participation and results interpretation and to attain relevant training and consultation.
Luria was among the pioneers who pointed out that culture significantly impacts human cognition (Kotik-Friedgut & Ardila, 2020). He emphasized that higher mental functions develop within a specific cultural-historical environment. This is consistent with the later development of cultural (neuro)psychology, in which culture is seen not as an independent variable but as an inherent component of human life (Cole, 1990; Jahoda, 1992). It contrasts with a cross-cultural approach, in which culture is seen as one of the “antecedent variables” (e.g., Berry, 1976). Indigenous psychology is another approach, one that celebrates the notion that “psychological concepts and psychological theory, not just data collection techniques, should be developed within each culture” (Greenfield, 2000). In a sense, mainstream neuropsychology can be understood as indigenous to the West (Greenfield, 2000). Cultural and indigenous psychology can overlap in their systematic understanding of culture as a process rather than a set of variables.
Practically, one way to implement cultural neuropsychology is through indigenous development of valid neuropsychological tests with cultural, linguistic, and educational considerations for the intended population. Because such indigenous tests require lengthy development and validation, local data for translated and adapted tests with sound psychometric properties may serve as the next best option (Fernández & Abe, 2018). This endeavor, though not perfect, ensures that clinicians have “good enough” tools to serve patients who urgently need them. However, using appropriate testing tools and normative data is only one of the many elements of proper cultural neuropsychology. To treat culture as a process instead of a variable, one needs to establish a holistic appreciation of the languages, cultures, and contexts of subjects and clients beyond tests and norms. The ECLECTIC framework (Fujii, 2018) provides an excellent example of incorporating cultural considerations throughout a neuropsychological evaluation. We also briefly discuss the key linguistic and cultural considerations for Mandarin-speaking individuals in the Discussion section.
Contexts of Mandarin Neuropsychology
Historically, the Halstead–Reitan Seattle/Changsha project was one of the first efforts to introduce neuropsychology into China (Doerr & Storrie, 1981). Xu, Gong, and Matthews (1987) also developed a revised Chinese version of the Luria–Nebraska Neuropsychological Battery. However, these initial endeavors were limited by the quality of the normative data and the availability of test forms. Since the 1980s, clinical neuropsychology has seen some development in Mandarin-speaking regions as efforts were made to adapt various tests (see Ponsford, 2017; Yuan, 2000). However, neuropsychological services have remained relatively limited, as most providers lacked systematic training in neuropsychology (Chan, Leung, & Cheung, 2011; Yuan, 2000). Following a Western-centric conceptualization of neuropsychology practice, it is estimated that there are fewer than 100 neuropsychologists in mainland China (Grote & Novitski, 2016) and approximately one dozen clinical neuropsychologists in Singapore (Ponsford, 2017). However, these figures may substantially underestimate the actual number of providers and researchers in the field. Taking China as an example, neuropsychology has not been recognized as an independent field and is still at a rudimentary stage of development. In the meantime, domain-specific neuropsychological tests and selected subtests are frequently used in research settings and, in clinical settings, as supplements to other health and mental health examinations. Nevertheless, examiners are mainly research trainees and/or testing technicians who lack systematic training in neuropsychology and therefore have limited knowledge of the background of the testing tools, the integration of test findings, and their clinical implications.
In short, there is significant potential for neuropsychological services to play a more prominent role in healthcare for Mandarin-speaking individuals.
One of the difficulties of neuropsychology development in Mandarin-speaking regions may be related to the complexity and distinctiveness of the Chinese linguistic system compared with alphabetic languages such as English and German. Orthographically, the Chinese writing system has been classified as logographic, morphemic (e.g., Leong, 1973), or morphosyllabic (e.g., DeFrancis, 1989; Mattingly, 1992). Characters, as the basic writing units, carry meaning. Therefore, unlike alphabetic languages, for which overwhelming evidence suggests that phonology is the single most important mediator of language processing, it is generally accepted that reading Chinese characters is mediated by both phonological and orthographic information (Zhou & Marslen-Wilson, 1999). An example of how such psycholinguistic differences may affect neuropsychology practice can be observed in the phonemic verbal fluency test, in which word searching with phonemic cues does not engage the major structure of phonemic knowledge as it does in English, because that structure is much less prominent in Mandarin and other logographic languages. In addition, there is vast linguistic variation within the Chinese language system. By one estimate, in mainland China alone, there are over 2,000 dialects and subdialects (Li, 2006). Many of these dialects are mutually unintelligible and are classified as Chinese languages for sociopolitical reasons rather than linguistic similarities (Tang & Van Heuven, 2009).
The Mandarin dialects, which serve as the official language in mainland China (i.e., Putonghua) and Taiwan (i.e., Guoyu), are estimated to be spoken as a second language by over 30% of the population in mainland China (Li, 2006) and spoken fluently by only 67% of the population over 60 years old in Taiwan (Shih, 2012). Though Putonghua and Guoyu are largely mutually intelligible (Erbaugh, 1992; Ho, 2003), the written language differs in that mainland China uses simplified Chinese characters whereas Taiwan uses traditional characters. Furthermore, in Singapore, Mandarin is only one of the four largely distinct official languages (English, Malay, Chinese, and Tamil), making indigenous development of neuropsychological tests even more essential, but also more challenging.
In contrast, clinical neuropsychology in Hong Kong developed faster, as highly trained researchers and clinicians returned from overseas training. The Hong Kong Neuropsychological Association was established in 1998 and played an important role in the development of the specialty in the area (Chan, Leung, & Cheung, 2011). A large amount of indigenous test development and test translation/adaptation also took place (Ponsford, 2017). However, almost all neuropsychological instruments developed and/or adapted in Hong Kong are in Cantonese, the predominant dialect spoken in Hong Kong. The linguistic differences between Cantonese and Mandarin, as well as differences in population characteristics between mainland China and Hong Kong, make it difficult to simply apply Cantonese-based tools to Mandarin-speaking examinees. Nonetheless, the professional development of clinical neuropsychology in Hong Kong has much to offer Mandarin-speaking neuropsychologists in mainland China as well as the field in general.
Increasing Demands for Mandarin Neuropsychological Services
Mandarin, the language with the most native speakers, is spoken by around 898 million individuals across the globe (Simons & Fennig, 2017), over 800 million of whom reside in mainland China. By 2050, it is estimated that the population of older adults (≥ 65 years old) in mainland China will reach 400 million, with over 90 million aged 80 years or older (“Prediction of Population Aging in China,” 2007). Given the size of the Mandarin-speaking population, there is a dire need for neuropsychological services. Similarly, only limited neuropsychology services, especially in the clinical setting, are provided in Taiwan and Singapore.
As for the Chinese diaspora in North America, a 2018 estimate states that over 480,000 Chinese-speaking households are considered “limited English-speaking households,” a significant portion of which are likely Mandarin-speaking, and over 1.8 million Chinese-speaking people in the United States reported speaking English less than “very well” (United States Census Bureau, 2019). As a result, the need for adequate Mandarin services is growing. This is particularly true in metropolitan areas where large Chinese communities continue to expand, such as New York City, Los Angeles, and Vancouver. Although the North American Chinese populations differ in many aspects (e.g., cultural, historical, and socioeconomic) from those in their countries of origin, language is a key factor they share (i.e., being Mandarin-speaking). For the above-mentioned Mandarin-speaking populations, one limiting factor for adequate neuropsychological services is the scarcity of well-validated and culturally appropriate clinical tools (Lee, Wang, & Collinson, 2016; Ponsford, 2017). We attempt to bridge this critical gap by performing a comprehensive review of normative studies with Mandarin-speaking individuals. Caution is also needed: because of widespread dialectal differences and the use of Mandarin as a second language, researchers and clinicians need to carefully investigate each individual’s language history before applying the reviewed Mandarin norms.
The Current Study
Recently, there has been an increasing number of research studies involving neuropsychological tests and/or measures in Mandarin Chinese. For example, a literature search on PsycInfo returned 294 articles that contained neuropsychological data (see details of the literature search in the Method section), 64% of which were published after 2010 (see Figure 1). Informal literature searches are often performed during clinical practice to locate relevant normative data for non-English-speaking test-takers. These searches can be difficult due to accessibility barriers and publication biases (Anderson, 2001), and the search results may differ greatly in quality. In this context, a systematic review of neuropsychological studies for Mandarin Chinese speakers is timely and warranted.
In this study, we performed a systematic review of normative studies where Mandarin was indicated as the spoken dialect used. We focused on the adult population, given that pediatric neuropsychology often requires more intricate considerations with regard to the developmental stage and therefore requires separate research attention. We also focused on performance-based tests instead of self- or other-report questionnaires/scales, which also require separate research attention due to the volume of research and methodological differences. Normative data that are part of the efforts of for-profit publishers were excluded from the review due to accessibility and copyright issues. We reviewed the quality of the normative data, including descriptions of the sampling procedures, size and demographic information of the normative sample, and demographic adjustments. Practitioners are recommended to take these factors into consideration when they use these norms in the clinical setting. To make our study more relevant for neuropsychologists interested in using the measures included in this review, we also assessed the availability of the tests to our best knowledge. Our aim is for this review to not only serve as a reference guide to appropriate norms when working with a Mandarin-speaking patient, but also to guide future endeavors in test validation and development in areas where studies to date fall short.
Method
Literature Search
A literature search was performed in April 2019, September 2019, and again in December 2020 through the electronic databases PubMed and PsycInfo. Search terms included “‘neuropsych*’ AND ‘tool OR instrument OR scale OR battery OR batteries OR measure OR assessment OR test’ AND ‘China OR Chinese OR mandarin.’” Asterisks were used to truncate terms whose suffixes may differ. A further search was performed in August 2020 through the Chinese-language China Knowledge Resource Integrated Database using correspondingly translated search terms (“‘神经心理’ AND ‘工具 OR 测评 OR 量表 OR 测量 OR 评估 OR 测试’ AND ‘中文 OR 中国 OR 普通话’”). An additional search was performed through the reference lists of collected articles to ensure an exhaustive capture of relevant studies. The search resulted in 1155 articles for initial screening.
Inclusion and Exclusion Criteria
All studies that mentioned neuropsychological evaluations in Chinese languages were included for the initial review. Studies without reported neuropsychological assessment data were excluded. The remaining articles were further reviewed using the following inclusion criteria: a) the study explicitly reported normative data, though a primary aim of normative data development was not required, and b) the normative data presented in the study had a total sample size of at least 150 (Strauss, Sherman, & Spreen, 2006). Articles were excluded if they met any of the following criteria: a) the normative sample was limited to a clinical population (e.g., patients with strokes), unless the discussed neuropsychological tool was intended to be used in this clinical population (e.g., the psychometric hepatic encephalopathy score); b) the study was an interventional study with neuropsychological tools used for pre- and post-intervention comparison; c) the study was a case study with no extractable data; d) the target neuropsychological tool was not a performance-based neuropsychological tool (e.g., self-report questionnaires and clinical interviews); e) the target tool was a research paradigm not intended for clinical use; f) the study utilized a normative sample that was previously included and did not offer additional information such as stratification; g) the study focused primarily on a pediatric population (under the age of 16 years); or h) the normative data were published by a commercial publisher.
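The second-stage screening rules above can be read as a simple decision procedure. The following is a minimal sketch of that reading, not the reviewers' actual workflow; the `Study` record and all of its field names are hypothetical labels introduced here for illustration.

```python
from dataclasses import dataclass

MIN_TOTAL_N = 150  # minimum total normative sample size stated in the criteria


@dataclass
class Study:
    # Hypothetical fields; each mirrors one criterion described in the text.
    reports_normative_data: bool                    # inclusion a)
    total_n: int                                    # inclusion b)
    clinical_only_sample: bool = False              # exclusion a)
    tool_intended_for_clinical_group: bool = False  # exception to exclusion a)
    is_intervention_study: bool = False             # exclusion b)
    is_case_study_without_data: bool = False        # exclusion c)
    performance_based: bool = True                  # exclusion d) when False
    research_paradigm_only: bool = False            # exclusion e)
    duplicate_sample_no_new_info: bool = False      # exclusion f)
    primarily_pediatric: bool = False               # exclusion g)
    commercial_publisher_norms: bool = False        # exclusion h)


def is_eligible(s: Study) -> bool:
    """Return True if both inclusion criteria hold and no exclusion applies."""
    if not (s.reports_normative_data and s.total_n >= MIN_TOTAL_N):
        return False
    exclusions = [
        s.clinical_only_sample and not s.tool_intended_for_clinical_group,
        s.is_intervention_study,
        s.is_case_study_without_data,
        not s.performance_based,
        s.research_paradigm_only,
        s.duplicate_sample_no_new_info,
        s.primarily_pediatric,
        s.commercial_publisher_norms,
    ]
    return not any(exclusions)


if __name__ == "__main__":
    print(is_eligible(Study(reports_normative_data=True, total_n=151)))
    print(is_eligible(Study(reports_normative_data=True, total_n=149)))
```

Encoding the exception to exclusion a) as a separate flag keeps the one conditional rule visible; in practice each flag would be a reviewer's judgment rather than a value read mechanically from an article.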
Data Extraction and Review
All studies were read by all three reviewers, who independently performed data extraction and reviewed all articles selected for inclusion. Discrepancies were resolved through obtaining consensus by at least two reviewers.
We first extracted relevant information from each normative study regarding the sample characteristics and research design (e.g., neuropsychological tests employed, country/region of normative samples, and sampling methods). We then assessed the availability and quality of normative data for each test. When assessing the quality of normative data, we employed the following key criteria: a) region of sample collection, b) the level of the normative sample, c) sampling frame, d) stratification/adjustment based on demographic characteristics (i.e., age, gender, and education), e) total sample size, f) cell sample size, if stratification was performed, and g) whether relevant psychometric properties of the normative data were reported.
The level of normative sample was coded at four levels: national (samples aimed at representing the national population), regional (samples collected at multiple cities in the same region), local (samples collected at multiple sites in the same city), and single facility. We coded the sampling framework at two levels: randomized sampling representative of the target population and sample of convenience. For cell sample size, we calculated the average cell sample sizes by dividing the total sample size by the total number of cells to account for the differences between cell sample sizes in a given study. No animal or human subjects were involved in the current study. The research was completed in accordance with the Helsinki Declaration.
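As an illustration of the average cell sample size described above, the following minimal sketch divides the total sample size by the number of cells. It assumes a fully crossed stratification, so that the number of cells is the product of the number of levels of each stratification factor; the function name and the example figures are hypothetical and do not come from any reviewed study.

```python
from math import prod
from typing import Sequence


def average_cell_size(total_n: int, levels_per_factor: Sequence[int]) -> float:
    """Average cell size = total sample size / total number of cells.

    Assumes fully crossed strata, so the number of cells is the product
    of the number of levels of each stratification factor.
    """
    if any(levels < 1 for levels in levels_per_factor):
        raise ValueError("each factor needs at least one level")
    n_cells = prod(levels_per_factor)  # the empty product is 1 (no stratification)
    return total_n / n_cells


if __name__ == "__main__":
    # Hypothetical study: 600 participants, 4 age bands x 3 education levels.
    print(average_cell_size(600, [4, 3]))  # 600 / 12 cells = 50.0
```

An average hides imbalance between cells, so a study with a large average may still contain individual cells that are much smaller.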
Results
Power Analysis
We performed an a priori power analysis in G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) to determine the appropriate cell sample size when normative data were stratified. A matched-sample t test was assumed, with effect size d = 0.5 and significance level α = .05. We then set the lower and upper power levels (1 − β) at .80 and .95, which yielded expected cell sample sizes of 27 and 45, respectively. Therefore, cell sample sizes of < 27, 27–45, and > 45 were considered small, medium, and large, respectively.
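For readers without access to G*Power, the calculation can be approximated in plain Python. The sketch below is not the G*Power implementation: it assumes a one-tailed test (the text does not state the number of tails, but a one-tailed test appears consistent with the reported sizes of 27 and 45), and it evaluates the noncentral t power by simple numerical integration. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def simpson(f, a, b, steps=400):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_pdf(x, df):
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_critical(alpha, df):
    """Upper-tail critical value of Student's t, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(40):
        mid = (lo + hi) / 2
        cdf = 0.5 + simpson(lambda u: t_pdf(u, df), 0.0, mid)
        if cdf < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_pdf(v, df):
    if v <= 0:
        return 0.0  # exact for df > 2, which holds for every n searched below
    return math.exp((df / 2 - 1) * math.log(v) - v / 2
                    - (df / 2) * math.log(2) - math.lgamma(df / 2))


def paired_t_power(n, d, alpha):
    """One-tailed power of a matched-pairs t test with n pairs (assumption: one tail)."""
    df = n - 1
    delta = d * math.sqrt(n)  # noncentrality parameter
    crit = t_critical(alpha, df)
    phi = NormalDist().cdf
    # P(T' > crit), with T' = (Z + delta) / sqrt(V / df) and V ~ chi-square(df)
    upper = df + 12 * math.sqrt(2 * df)
    return simpson(lambda v: phi(delta - crit * math.sqrt(v / df)) * chi2_pdf(v, df),
                   0.0, upper)


def required_n(d, alpha, power):
    n = 5  # start where the numerical integration is well behaved
    while paired_t_power(n, d, alpha) < power:
        n += 1
    return n


def classify_cell_size(n):
    """Small / medium / large labels using the thresholds stated in the text."""
    return "small" if n < 27 else ("medium" if n <= 45 else "large")


if __name__ == "__main__":
    print(required_n(0.5, 0.05, 0.80), required_n(0.5, 0.05, 0.95))
    print(classify_cell_size(30))
```

Under these assumptions the sketch returns 27 and 45 pairs for power levels of .80 and .95. A two-tailed test with the same d and α would require larger samples, so the thresholds depend on this choice.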
Characteristics of the Norm Studies
Of the 1155 articles screened, 43 articles met our inclusion criteria. A flow chart reporting the article screening process is presented in Figure 2. Studies were analyzed separately if they included the same sample but focused on different neuropsychological tools and/or different subsamples. Characteristics of each study are presented in Table 1.
Note. ACE-R, Addenbrooke’s Cognitive Examination Revised; AFT, Animal Fluency Test; AMT, Abbreviated Mental Test; ANT, Animal Naming Test; Arith, Arithmetic; AVLT, Auditory Verbal Learning Test; BD, Block Design; BNT, Boston Naming Test; BVMT-R, Brief Visuospatial Memory Test-Revised; CD, Coding/Digit Symbol; CDT, Clock Drawing Test; CIS, Conflicting Instructions Task; CMMS, Chinese Mini-Mental Status; COST-M, Modified Common Objects Sorting Test; CPT-IP, Continuous Performance Test – Identical Pairs; CPT-AX, AX-Continuous Performance Test; CRT, Clock Reading Test; CSID, Community Screening Instrument for Dementia; CTT, Color Trails Test; DS, Digit Span; DST, Digit Symbol Test; FCT, Finger Construction Test; FDT, Five Digit Test; FOME-M, Modified Fuld Object Memory Evaluation; FOML, Fuld Object Memory Test; HNT, Huashan Naming Test; HVLT, Hopkins Verbal Learning Test; IHDS, International HIV Dementia Scale; LTT, Line Tracing Test; LM, Logical Memory; MoCA, Montreal Cognitive Assessment; MMSE, Mini Mental Status Exam; MSCEIT, Mayer-Salovey-Caruso Emotional Intelligence Test; NCT, Number Connection Test; NLCA, Non-Language-Based Cognitive Assessment; NUCOG, Neuropsychiatry Unit Cognitive Assessment Tool; OA, Object Assembly; PHES, Tests of the Psychometric Hepatic Encephalopathy Score; PVLT, Philadelphia Verbal Learning Test; RBANS, Repeatable Battery for the Assessment of Neuropsychological Status; RMBT, Renminbi Test; ROCFT, Rey–Osterrieth Complex Figure Test; RPT, Raven’s Standard Progressive Matrices; SDT, Serial Dotting Test; SI, Similarities; SIS, Six-Item Screener; ST, Stick Test; STT, Shape Trail Test; SC, Symbol Coding; SCWT, Stroop Color-Word Test; SDMT, Symbol Digit Modalities Test; SMR, Story Memory and Recall; TH, Tower of Hanoi; TMT, Trail Making Test; TYM, Test Your Memory; VFT, Verbal Fluency Test; VOSP, Visual Object and Space Perception; VPA, Verbal Paired Associates; VR, Visual Reproduction; VST, Victoria Stroop Test; WCST, Wisconsin Card Sorting Test; WL, Word Listing Learning; WMS, Wechsler Memory Scale.
Region: MC, Mainland China; S, Singapore; TW, Taiwan.
Sampling frame: R, randomized sampling representative of the target population; C, sample of convenience; ND, no description.
Age/gender/education: Sig., significant; Non-Sig., non-significant; ND, no description.
Cell size: L, large; M, medium; S, small.
Psychometric property: α, Cronbach’s alpha; AUC, area under the receiver operating characteristic curve; r, Pearson’s correlation coefficient; ICC, intraclass correlation coefficient; IRR, inter-rater reliability; ROC, receiver operating curves; Sens., sensitivity value; Spec., specificity value; TRR, test–retest reliability; ND, no description.
* Articles written in Mandarin Chinese
All studies were published from 1997 to 2020: 1 study from 1997, 6 from 2005 to 2009, 22 from 2010 to 2014, 11 from 2015 to 2019, and 3 from 2020. The majority of the studies were conducted in mainland China (n = 36), with others conducted in Singapore (n = 5) and Taiwan (n = 2). Of the studies conducted in mainland China, 2 collected their samples within a single facility, 30 from multiple sites in the same city (local), 2 from multiple cities in the same region (regional), and 2 aimed to represent the national population (national). Of the 43 studies, 22 utilized a convenience sampling approach, 18 utilized a randomized sample representative of the target population, and 3 did not describe their sampling approach. Total sample size ranged from 151 to 4573. Twenty-seven studies provided stratified normative data, 21 of which had large cell sizes, 2 had medium cell sizes, and 4 had small cell sizes.
Regarding demographic factors, 36 of the 43 studies analyzed the impact of one or more demographic variables on testing results. Of the 36 studies, 35 tested the influence of education and reported significant results, and 22 of them provided stratification by education level. Thirty-five studies tested the influence of age, 32 of which reported significant results, and 19 provided stratification by age. Twenty-seven studies tested the influence of gender, 18 of which reported significant results, and 9 provided stratification by gender. Finally, regarding sample age range, except for five studies that did not specify the age range, the majority reported normative data for ages 50 years and above (n = 29), and only nine studies reported data that included ages below 50 years. Of note, one study (Ding et al., 2015) did not specifically analyze the association of demographic variables with testing results within the target group but provided normative data stratified by age and education.
Psychometric properties of tests were reported in 24 of the 43 studies. Of the 24 studies, the majority (n = 21) involved classification accuracy statistics (e.g., sensitivity, specificity) for tests used as screeners [e.g., Montreal Cognitive Assessment (MoCA)]. Some metrics of reliability were reported in 10 studies.
Characteristics of Neuropsychological Tests
Regarding neuropsychological tests, the 43 identified studies covered a total of 65 distinct tests. Table 2 summarizes the characteristics of each test for practitioners’ easier reference, organized by each test’s primarily targeted domain and cross-referenced with the normative studies. A wide range of domains was covered, including screeners (n = 11), language (n = 8), memory (n = 17), visuospatial/visuoconstruction (n = 7), executive function (n = 12), working memory/processing speed (n = 9), and others (n = 2).
Note. Table organized first by test domain, then by test development type, followed by normative study number as listed in Table 1.
Test development: D, direct implementation of published tests; T, direct translation; A, adaptation; O, original test. Data type: F, full normative data; C, cutoff score.
Region: MC, Mainland China; S, Singapore; TW, Taiwan.
*Mandarin testing protocol available on public domains.
^Mandarin testing protocol with detailed accounts of cultural/linguistic adaptations written in English.
Of note, several identified tests had multiple versions, making up a total of 127 versions. We characterized these versions into four categories. Four versions (n = 4) were identified as originally developed tests. Most versions were linguistically and/or culturally adapted from their original version (n = 75); these were further differentiated into translation and adaptation, guided by the International Test Commission’s (ITC) Guidelines for Translating and Adapting Tests (2017). That is, test translation is restricted to the “choosing of language to move the test from one language and culture to another to preserve the linguistic meaning,” whereas test adaptation is a broader concept referring to moving a test from one language and culture to another, including activities of “deciding whether or not a test in a second language and culture could measure the same construct in the first language; selecting translators; choosing a design for evaluating the work of test translators (e.g., forward and backward translations); choosing any necessary accommodations; modifying the test format; conducting the translation; checking the equivalence of the test in the second language and culture and conducting other necessary validity studies” (pp. 6–7).
Guided by the recommended test development guidelines, among the 75 adapted versions, 6 versions were characterized as direct translation, where either the translation was restricted to test instructions only (i.e., testing content unmodified) or the testing content was directly translated into Mandarin Chinese without further adaptation (e.g., “As some of our enrolled subjects were not familiar with the English alphabet, we replaced the alphabet in NCT-B with the Chinese alphabet in the same order”; Li et al., 2013, p. 8746). The other 69 versions were characterized as adaptation, where the study either utilized a well-validated adapted version (e.g., WAIS-RC) or described its adaptation procedure in detail. For example,
Using the forward–backward translation procedure, the RBANS (Repeatable Battery for the Assessment of Neuropsychological Status. Copyright © 1998. NCS Pearson, Inc. Reproduced with permission. All rights reserved. Mandarin Chinese translation copyright © 2012. NCS Pearson, Inc. Translated and reproduced with permission.) was translated into Mandarin and dialect by a committee of trained multilingual research psychologists. The nature of the translation procedure ensured that the phrasing of the test items was accurate and appropriate. As certain words were not translatable or meaningful in some dialects, appropriate alternatives that were comprehensible in all languages and dialects were chosen. Wherever possible the words chosen were as semantically and phonemically as close as possible to the original English word. In some cases, culturally foreign items were replaced with a local equivalent (Collinson et al., 2014, pp. 443–444).
Finally, the remaining 48 versions were coded as “no descriptions,” as the authors provided a generic citation of an original test in a non-Chinese language (e.g., Stroop Color-Word Test; Stroop, 1935) without specifying a previously validated/translated version or their own translation/adaptation procedures. Full normative data are available for the majority of these versions (n = 100), with another 4 versions providing cutoff scores alone and 16 versions providing both full normative data and cutoff scores.
Discussion
The notion that cultural and linguistic factors are inextricable components of brain–behavior relationships has long been embedded in the founding work of neuropsychology (Cagigas & Manly, 2014). To provide adequate cultural neuropsychological services, neuropsychologists must have explicit knowledge of the available instruments and norms in the examinee’s native language, and of the linguistic and cultural underpinnings needed to fully understand the examinee’s neuropsychiatric presentation (Cagigas & Manly, 2014). Mandarin-speaking individuals present an increasing demand for adequate neuropsychological services in Mandarin. In this review, we attempted to address this critical issue by presenting a comprehensive review of available normative studies for this population, while acknowledging that adequate normative data are necessary, but by no means sufficient, for culturally informed neuropsychological practice (Fujii, 2017).
The current review identified 43 normative studies that met the inclusion criteria. The majority of these studies utilized a convenience sampling method, and most performed partial or full demographic adjustment for age, gender, and/or education. Normative samples were mostly collected in mainland China, with a few in Taiwan and Singapore. Approximately two-thirds of the studies focused on individuals over 50 years old, so relatively few normative datasets cover the full adult age span. The studies covered a wide range of performance-based neuropsychological tools, including brief screeners [e.g., Mini Mental Status Exam (MMSE), An et al., 2018; MoCA, Zhai et al., 2016], comprehensive batteries (e.g., RBANS, Collinson et al., 2014; MATRICS Consensus Cognitive Battery, Shi et al., 2015), and domain-specific tests (e.g., Auditory Verbal Learning Test, Guo et al., 2007; Boston Naming Test, Guo et al., 2006). However, a shortage of verbal-based tools was observed.
Methodologies of Reviewed Studies
We highlighted a few methodological issues noted in the reviewed studies involving sampling procedures, demographic adjustment, and reporting of psychometric properties. The importance of standard sampling procedures for proper normative data development was articulated by Strauss, Sherman,and Spreen (Reference Strauss, Sherman and Spreen2006). However, the majority of the studies from the literature search presented “normative data” in the form of descriptive neuropsychological scores from direct clinical implementation. Randomized sampling was rarely used and the “normative sample” was often confused with clinical comparison data (for a definition, see Mitsushina et al., Reference Mitsushina, Boone and D’Elia1999, p. 9). In our review, we accepted convenience sampling only if normative data were explicitly reported and not limited to one clinical sample. We caution the readers that convenience sampling constitutes a limitation, since the subjects included often have a higher socioeconomic status, compromising the representativeness of the sample. We included information about sampling methods in Table 1, based on which we urge practitioners to consider this variable when selecting their normative data. Relatedly, we considered the level of sampling (i.e., single facility, local, regional, and national) as another methodology parameter. Randomized sampling at a national level was considered a representation of the general population. However, normative data collected within a single city also have benefits as it can more accurately approximate the local population. Use of national versus local normative data should be weighed in light of goodness of fit to the case at hand, and local data are not necessarily inferior to a nationally collected norm.
In addition to proper sampling methods and sampling level, normative data should also approximate the examinee’s key demographics as closely as possible (Mitsushina et al., 1999). Of the 43 studies, 27 provided full or partial demographic adjustment (see Tables 1 and 2). We included age, gender, and education as demographic variables because they were commonly examined in the studies reviewed and found to impact neuropsychological performance. In fact, education was found to be a significant predictor of neuropsychological performance in all reviewed studies whenever examined, and gender and age were found to be significant predictors in the majority of the reviewed studies. Despite these well-established associations, these demographic variables were adjusted for in fewer than half of the studies reviewed. Additionally, the three demographic variables included in our tables do not constitute a complete list of demographics impacting cognitive performance in Mandarin-speaking individuals. For instance, socioeconomic status was found to account for several cognitive and mental health outcome discrepancies in developing regions including mainland China (Jia et al., 2014; Li et al., 2019; Xiang et al., 2018) but was not accounted for in any of the studies reviewed. We recommend that practitioners attend to details of demographic impacts, especially when the examinee presents with demographics that deviate from the center of the normative sample distribution. We also recommend that future researchers incorporate demographic adjustments to normative data whenever appropriate. Close examination of how demographic variables interact with each other may provide a more parsimonious adjustment strategy and result in a better balance with the requirements of sample size.
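One common way to implement the kind of demographic adjustment discussed above is a regression-based norm: the score expected for a person's age, education, and gender is predicted from the normative sample, and the observed score is expressed as a z-score relative to that expectation. The following is a minimal sketch under stated assumptions. The class name, its fields, and every coefficient are hypothetical placeholders chosen for illustration; none of them is taken from a reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionNorm:
    """Hypothetical linear norm: expected = b0 + b_age*age + b_edu*edu + b_female*female."""

    intercept: float
    b_age: float
    b_education: float
    b_female: float
    residual_sd: float  # SD of the regression residuals in the normative sample

    def expected_score(self, age: float, education_years: float, female: bool) -> float:
        """Score predicted for a healthy person with these demographics."""
        return (
            self.intercept
            + self.b_age * age
            + self.b_education * education_years
            + self.b_female * (1.0 if female else 0.0)
        )

    def adjusted_z(self, raw: float, age: float, education_years: float, female: bool) -> float:
        """Demographically adjusted z-score: (observed - expected) / residual SD."""
        return (raw - self.expected_score(age, education_years, female)) / self.residual_sd


# Placeholder coefficients, invented purely for illustration (NOT published norms).
norm = RegressionNorm(intercept=30.0, b_age=-0.2, b_education=0.5, b_female=1.0, residual_sd=4.0)

# A 66-year-old man with 20 years of education and a raw score of 20.
z = norm.adjusted_z(raw=20.0, age=66, education_years=20, female=False)
print(f"adjusted z = {z:.2f}")
```

With these placeholder values the expected score is 26.8, so a raw score of 20 yields an adjusted z of about -1.7. In practice the coefficients and residual standard deviation would have to come from a published normative study whose sample fits the examinee.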
Psychometric properties (e.g., score distribution, reliability, and validity) are fundamental to neuropsychological tests and vary by population. Unfortunately, we found a general lack of psychometric properties reported in the normative studies reviewed, particularly when tests were not used as diagnostic screeners. When the psychometric properties of a test are unknown for the intended population, the clinical utility of the collected normative data is limited. We included information about psychometric properties in Table 1. Where psychometric properties of a given test were not reported, we strongly recommend that practitioners verify whether such information is documented elsewhere before using the test and its normative data. Conversely, we strongly recommend that future normative research include psychometric properties of the tests used or provide citations if such information is available elsewhere.
Sample Characteristics
It is important to recognize the vast cultural and linguistic variations within Mandarin-speaking regions. Even in regions where Mandarin is the only official language, Mandarin proficiency varies among individuals and is often impacted by factors such as age, prestige of the local dialect, distinctiveness between the local dialect and Mandarin, mobility of people in the local area, and level of formal education. Given that the Mandarin dialect serves as the official language in mainland China and Taiwan, Mandarin proficiency was either screened for or assumed in the normative studies conducted in these areas. In clinical practice, however, Mandarin proficiency cannot be assumed simply based on individuals’ region of origin, and providers may encounter individuals who speak Mandarin only as a second language. Developing testing instruments and normative data for each of the 2000 dialects and subdialects in the Mandarin-speaking regions is likely impractical. One possible approach is to develop testing instruments in “universal” Chinese, where dialect-specific vocabulary and foreign derivatives are minimized. The impact of language of administration should be examined and adjusted for, if applicable. In fact, this approach is reflected in the normative studies performed in Singapore, where Mandarin Chinese is one of four largely distinct official languages (English, Malay, Chinese, and Tamil). In this context, most of the normative studies performed in Singapore did not provide monolingual norms; instead, tests were administered in the first language identified by the test-taker. Language of test administration was found to correlate significantly with at least some test performances in three studies (Collinson et al., 2014; Lim et al., 2010; Sahadevan et al., 1997). 
Unfortunately, language adjustment was not performed in these studies, raising concerns about the clinical use of these aggregated norms. Taken together, practitioners should always assess test-takers’ language preference and proficiency and use normative data that best approximate the examinee’s linguistic context when possible. Conversely, researchers should account for linguistic variation in their samples and provide relevant clarification and adjustment.
Beyond linguistic variations, education and socioeconomic status are prominent variables in Mandarin-speaking regions. Although years of education was tested and/or adjusted for in the majority of the 43 studies reviewed, a series of studies indicated that it was the quality rather than the quantity of education that accounted for variations in neuropsychological performance (Manly et al., 2004; Manly & Echemendia, 2007). Education quality can be negatively impacted by socioeconomic status, which has disproportionately affected ethnic minority and rural areas of mainland China (Qian & Smyth, 2008; Yang & Wu, 2009). Specifically, Gupta et al. (2011) documented significant effects of urban versus rural residence on domain-specific cognitive performance in a sample collected in mainland China. However, except for one study that focused on the rural population specifically (Yang et al., 2012), proxies of socioeconomic status such as rural versus urban residence and ethnic minority status were not reported or adjusted for in any other reviewed studies. This potentially creates a large underserved population, as roughly 9% of the population in mainland China are ethnic minorities and 50% live in rural areas (China Statistics/Beijing Info, 2010). Readers interested in information on the education system and socioeconomic development may consult reliable sources such as the World Bank (https://www.worldbank.org/) and the United Nations Educational, Scientific and Cultural Organisation (http://uis.unesco.org/). 
We encourage future research to incorporate indicators of socioeconomic status to better represent the diverse population within the Mandarin-speaking regions, and we encourage readers of the current review to be mindful of the representativeness of the normative data used.
The age span of the normative samples included in this review is also noteworthy. Over two-thirds of the studies reviewed included only cohorts over 50 years old. This is likely related to the increasing global prevalence of aging-related neurocognitive disorders (Ferri et al., 2005). In fact, the majority of the datasets that provided adequate sample sizes to conduct normative studies were cohorts collected for aging research. This is largely consistent with the utilization of neuropsychological services in the United States. According to a survey of physicians (Temple, Carvalho, & Tremont, 2006), the largest number of referrals to neuropsychology was placed by geriatric physicians (24%). However, the survey also revealed a high utilization rate of neuropsychological services by medical providers who work with individuals across the lifespan, for purposes including diagnosing, documenting cognitive function, and treatment planning. These demands, particularly the latter two, cannot be met through nonbehavioral methods (e.g., neuroimaging) or even brief cognitive screeners (e.g., MoCA), and yet they are often instrumental in helping patients with relatively subtle cognitive changes who expect to successfully reintegrate into their community. Therefore, we recommend that more normative studies be performed with individuals across the adult age span.
In addition to those residing in Mandarin-speaking regions, the Chinese diaspora is another large population, numbering around 40 million worldwide (Li & Li, 2013). Evaluations of this population require special consideration of acculturation, which has been repeatedly shown to impact performance on both verbal and nonverbal cognitive tests (Boone et al., 2007; Harris, Cullum, & Puente, 1995; for a systematic review, see Tan, Burgess, & Green, 2020). Unfortunately, normative studies for the Chinese immigrant population are extremely scarce. Two groups (Dick et al., 2002; Wallace, Berry, & Shores, 2018) attempted to provide normative data for this particular population. Both groups selected and translated tests that were believed to contain relatively few culture-specific concepts and/or contents. Unfortunately, both studies were excluded from the current review as they provide limited information for Mandarin-speaking immigrants. Dick and colleagues’ (2002) study had a small sample size of 71 with no further categorization by Chinese language or country/region of origin, and the work by Wallace, Berry, and Shores (2018) had a normative group that was primarily Cantonese-speaking. We hope to see continued research on test and norm development for the immigrant population, acknowledging both their acculturation experience and the cultural and linguistic heterogeneity of their country/region of origin. In the meantime, the current review can be a resource to help improve quality of care for those who reside overseas but are more strongly acculturated to their original Mandarin-speaking environment. 
To aid in the decision-making, subjective report of language usage (e.g., percentage of Mandarin spoken at home, at work, and in social settings), objective assessment of language dominance (e.g., Multilingual Naming Test, Sheng, Lu, & Gollan, 2014), and assessment of acculturation (for a list of example questions, see Fujii, 2017) should be considered.
In sum, given the cultural and linguistic diversity within the Mandarin-speaking regions, it is recommended that practitioners use normative data that approximate test-takers’ cultural and linguistic backgrounds as closely as possible. Providers are encouraged to gather information from the patient directly and to consult reliable sources to create a cultural context prior to the evaluation (for full guidance, see Fujii, 2017).
Neuropsychological Tools
Close examination of the neuropsychological tests covered in these normative studies revealed three important considerations. Firstly, despite the abundance of normative data on confrontational naming (e.g., Boston Naming Test) and semantic verbal fluency (e.g., animal fluency), there is a shortage of language-specific tools. In particular, we observed a lack of tools available to assess language function as well as general verbal abilities, including measures of premorbid estimation. Given the fundamental psycholinguistic differences between Mandarin and English, direct translation and minimal adaptation of language tests developed in English-speaking environments (e.g., phonemic verbal fluency, test of premorbid function) are likely inapplicable and/or inappropriate. This highlights the importance of indigenous test development and of a more advanced understanding of how language performances are (differently or similarly) associated with neural substrates in Mandarin-speaking individuals.
Secondly, only 15 of the 43 included normative studies involved neuropsychological batteries measuring more than two cognitive domains. Though the practice of using fixed batteries in neuropsychology has decreased in popularity, many psychologists argue that co-normed batteries are necessary to generate more reliable and valid interpretations of test scores, particularly when a properly randomized sampling procedure is not used (Rohling et al., 2015). Co-normed tests can minimize the interference caused by variations of test versions as well as cohort effects of normative data collected in different decades. We encourage future normative studies to consider employing comprehensive batteries, so that providers who find the normative sample appropriate for their test-takers can perform comprehensive evaluations with higher confidence in the interpretability of the test results.
Lastly, we noticed that some studies employed different versions of the same test (e.g., Victoria version vs. Golden’s version of the Stroop Color-Word Interference Test) without explicit clarification. Among studies that used the same version, translation and adaptation were rarely uniform, rendering little comparability between studies. The importance of proper translation for psychological tests has long been acknowledged (Hambleton, Merenda, & Spielberger, 2005; Sue & Sue, 2000), and efforts to make translated materials reasonably accessible have also been encouraged (Fujii, 2017). Among the 127 versions of the 65 tests covered in our review, only a handful had enough information to allow readers with access to the original testing protocols to replicate the translation/adaptation with a reasonable level of accuracy (these tests are denoted in Table 2). Low accessibility of testing materials creates a major barrier to the utility of the normative data collected. We encourage researchers to accurately describe and cite the version of tests used in their study and to provide detailed accounts of the steps of translation/adaptation whenever applicable. Additionally, we encourage authors to consider sharing their translation/adaptation output when possible, so that a quality translation/adaptation can be used by more researchers and providers, thereby increasing practice consistency within the field. We encourage providers to carefully review the tests before evaluation and to reach out to the authors for clarification or to procure test materials. If this is not possible, clinicians may consider translating testing material for their own use as a last resort. Several precautions when employing this method were outlined by Fujii (2017) and should be read by those who consider engaging in this practice.
Clinical Implications
We present a brief case example here to illustrate how we envision this review being used clinically. Mr. Zhang is a 66-year-old Chinese man residing in a large city in the United States, who is referred by his neurologist for complaints of memory loss. Mr. Zhang grew up in Guangdong province, China, and identifies Cantonese as his first language. However, he received formal education mainly in Mandarin and completed an 8-year medical school program (from 18 to 26 years of age) in Beijing, a Mandarin-speaking region, and has primarily spoken Mandarin since that time. He practiced medicine in Beijing until he moved to the United States 5 years ago following the birth of his grandchild. He has not worked outside the home since moving to the United States and spends most of his time homemaking. He speaks Mandarin with his immediate family and conversational English outside of his home. He also speaks Cantonese occasionally when contacting his family members who still reside in Guangdong province, China. He reports using media platforms (e.g., TV, newspaper, internet) mainly in Mandarin (simplified characters for written Chinese). In this case, although Mr. Zhang’s first language is Cantonese, Mandarin was used throughout his formal education and has been his primary language since college. Mandarin is therefore deemed the most appropriate language for testing. Given his relatively recent immigration history and low acculturation level, normative data collected in mainland China are believed to most closely approximate Mr. Zhang’s key demographics. Given his age and the referral question, a memory disorder battery is used as a foundation, which includes the following performance-based tests: Boston Naming Test; Brief Visuospatial Memory Test-Revised; Clock Drawing Test; Controlled Oral Word Association Test; RBANS; Stroop Color and Word Test; Trail Making Test; Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV), Digit Span, Coding, and Information. 
Using Table 2, normative data for a 66-year-old individual from mainland China with education stratification (a medical doctorate substantially exceeds the country’s average of around 9 years of schooling, justifying a need for education-stratified norms) are available for the following tests: Boston Naming Test; Clock Drawing Test; RBANS; Stroop Color and Word Test; Trail Making Test. Tests without appropriate normative data are replaced. WAIS-IV Digit Span is replaced with the WAIS-RC Digit Span, which is a direct Mandarin translation of the WAIS-R Digit Span. The WAIS-IV Coding is replaced with the Symbol Digit Modalities Test. The WAIS-IV Information is eliminated due to its strongly culture-specific contents, and an alternative is not accessible. The Brief Visuospatial Memory Test-Revised is replaced with the Rey–Osterrieth Complex Figure Test, because the former test has only been validated in a sample of younger adults in mainland China and a sample of older adults in Singapore. The phonemic fluency of the Controlled Oral Word Association Test is eliminated due to major linguistic differences, and the category fluency is retained. Citations for all relevant normative studies can be found in Table 1. These articles should be downloaded prior to seeing Mr. Zhang to ensure the availability of adequate testing material and normative data. To illustrate the clinical implications of appropriate normative data, we provide two examples of how standardized scores may differ using different norms. For instance, a raw score of 13 on the WAIS-R Digit Span test administered in Chinese results in a scaled score of 7 using a norm collected in a Mandarin-speaking population (Wang et al., 2011), whereas the same raw score results in a considerably higher scaled score of 10 using the Mayo normative data for older Americans (Ivnik et al., 1992). 
In contrast, a raw score of 33 seconds on the Trail Making Test-A administered in Chinese results in a percentile rank of 91 using normative data collected in a Mandarin-speaking population (Wang et al., 2011), whereas the same raw score results in a considerably lower percentile rank of 41–64 using the Mayo normative data for older Americans (Steinberg et al., 2005). It is important to note, however, that the Mandarin normative data (Wang et al., 2011) have their limitations. Specifically, the psychometric properties of the tests in this normative sample were not reported. Although the WAIS-RC has established psychometric properties in the Chinese population (Dai et al., 1990), the Trail Making Test does not. In this case, scores from the Trail Making Test using both norms may be calculated, but they should be considered tentative or supplemental when integrated with other clinical information.
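The size of the Digit Span discrepancy in the case example can be made concrete by converting each scaled score to an approximate percentile. The sketch below assumes the conventional scaled-score metric (mean of 10, standard deviation of 3) and a normal distribution; the function name is illustrative, and published norm tables may deviate from this idealized conversion.

```python
from statistics import NormalDist


def scaled_score_to_percentile(scaled_score: float, mean: float = 10.0, sd: float = 3.0) -> float:
    """Approximate percentile rank of a scaled score under a normal model.

    Assumes the conventional scaled-score metric (mean 10, SD 3); this is an
    idealized conversion, not a lookup in any published norm table.
    """
    return 100.0 * NormalDist(mu=mean, sigma=sd).cdf(scaled_score)


# The two scaled scores from the case example (same raw Digit Span score of 13).
mandarin_norm_pct = scaled_score_to_percentile(7)   # Mandarin-speaking norm
mayo_norm_pct = scaled_score_to_percentile(10)      # Mayo norm for older Americans

print(f"Scaled score 7  -> about the {mandarin_norm_pct:.0f}th percentile")
print(f"Scaled score 10 -> about the {mayo_norm_pct:.0f}th percentile")
```

Under these assumptions, the same raw score falls near the 16th percentile with the Mandarin norm and at the 50th percentile with the Mayo norm, a difference of one standard deviation that could change the clinical interpretation.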
Future Directions and Limitations
We would like to identify important future considerations for both researchers and practitioners who work with Mandarin-speaking individuals. Firstly, we highlight the need for Mandarin normative studies, particularly those conducted with standardized sampling methods and comprehensive batteries. Given the scope of logistical and financial resources required for proper norm development, we would like to encourage institutional support in this critical effort. Secondly, we urge researchers and practitioners to consider the cultural and linguistic diversity within the Mandarin-speaking regions, particularly less commonly examined factors such as socioeconomic status, and to make appropriate clarifications and adjustments in the normative data. Thirdly, we urge researchers to provide information about the psychometric properties of the tests used in the normative studies. Fourthly, due to the scope of the current review, we provided general information but did not elaborate on the test development procedures (e.g., translation vs. adaptation) of these studies. Future research should examine test development for Mandarin-speaking cultures, emphasizing the role of culture in test translation, adaptation, indigenous test development, and test administration. Lastly, we encourage transparent communication between researchers and practitioners to enhance the accessibility of Mandarin testing tools. We also suggest that international organizations such as the International Neuropsychological Society and the Asian Neuropsychological Association help enhance global communication and test accessibility.
The current review has the following limitations. Firstly, this manuscript does not cover all existing normative data for Mandarin-speaking populations. In particular, tests and normative data from commercial publishers (e.g., the Chinese WAIS-IV; Neuropsychological Measures: Normative Data for Chinese, Second Edition Revised, Lee & Wang, 2010) are excluded from this review for accessibility reasons but are extremely valuable in research and clinical practice. Secondly, we provide context for this review, such as characterization of global Mandarin-speaking populations, Mandarin-speaking cultures and societies, history of neuropsychology in China and Chinese languages, and test development and translation. These summaries are kept relatively brief due to the immediate scope of this review. However, we highlight the importance of creating an appropriate cultural context when working with Mandarin-speaking individuals and urge practitioners to engage in further reading (e.g., Chan, Leung, & Cheung, 2011; Wong, 2011) if they are not familiar with the culture. Relatedly, this manuscript tries to summarize the key information from the normative studies and may not include all the information that a provider needs for clinical judgment. Therefore, we ask that practitioners carefully read the original normative study before using the data. Thirdly, as noted before, testing materials for the normative studies are not always readily accessible, which may significantly limit the utility of the normative data. We strongly recommend that practitioners who work with Mandarin-speaking individuals regularly compile relevant resources in advance. Fourthly, the practice of neuropsychology in this review follows a primarily Western approach and does not represent the full range of neuropsychology services and global research. 
However, we consider these normative data an invaluable resource with the hope of enhancing a range of clinical practice that may not be explicitly covered in this study. Fifthly, we limit our focus to performance-based tests. We emphasize, however, that culturally appropriate neuropsychological practice consists of efforts that extend well beyond the testing process alone, such as proper understanding of the cultural background, reliable collection of clinical information, appropriate data interpretation, and effective communication of findings and recommendations. These elements and their implications are captured in the ECLECTIC framework (Fujii, 2018). Lastly, effects of interpreter-mediated testing are not discussed in this review. Considerations for interpreter-mediated testing are documented elsewhere (e.g., Artiola i Fortuny & Mullaney, 1998; Casas et al., 2012) and we encourage practitioners to familiarize themselves with the literature and be mindful when using the normative data presented in this review during interpreter-mediated testing.
References marked with an asterisk indicate studies included in the systematic review
Financial Support
There was no financial support for this work.
Conflicts of Interest
None.