A metaphor is a paradigmatic type of figurative language involving a discrepancy between the encoded, “literal” meaning of words and their occasion-specific use (Camp, 2009; Carston, 2010). Metaphors can appear in many forms, such as “Sally is a chameleon” or “Your theory is falling apart.” Accordingly, different accounts of metaphor comprehension have been proposed (see Bowdle & Gentner, 2005; Gibbs, 2011; Gibbs & Tendahl, 2006; Glucksberg, 2001; Wilson, 2011). Among them, pragmatic accounts (e.g., relevance theory) focus on metaphor in communication, highlighting the inferential mechanisms that lead to adjusting the linguistically encoded concepts to arrive at the speaker’s intended meaning (Sperber & Wilson, 2012). For instance, in “Sally is a chameleon,” the adjustment results in the broadening of the concept CHAMELEON to include not only a species of lizard but also individuals with certain psychological features (Carston, 2012). In contrast, cognitive linguistics accounts (e.g., conceptual metaphor theory) emphasize the role of metaphor in thought, seeing it in terms of conceptual mappings across cognitive domains (Gibbs, 2011; Lakoff & Johnson, 1980). The conceptual mappings emerge in our metaphorical use of language, as in “Your theory is falling apart,” for the mapping of theories onto physical constructs such as buildings (THEORIES ARE BUILDINGS).
Regardless of the theoretical approach, there is agreement that metaphors are a ubiquitous part of language and appear frequently in everyday communication, schoolbooks, academic texts, literature, and media communications (Golden, 2010; Steen, Dorst, & Hermann, 2010). Hence, difficulty in understanding metaphors may impede social communication, the ability to obtain information, as well as academic attainment.
In individuals with typical development (TD), metaphor comprehension skills mature throughout childhood until adolescence, and it is commonly assumed that the age of 10 represents a crucial moment (Lecce, Ronchi, Del Sette, Bischetti, & Bambini, 2019; Winner, Rosenstiel, & Gardner, 1976). There is, however, also awareness that metaphorical competence is evident earlier, if assessed with age-appropriate tasks (Pouscoulous, 2011, 2014; Vosniadou, Ortony, Reynolds, & Wilson, 1984). In contrast, profound and lasting difficulty in metaphor comprehension has traditionally been considered characteristic of individuals with autism spectrum disorder (ASD; Adachi et al., 2004; Happé, 1993; Rundblad & Annaz, 2010a), a neurodevelopmental condition characterized by impairments in social communication and interaction, as well as restricted and stereotyped behaviors (American Psychiatric Association, 2013). In particular, individuals with ASD have been reported to interpret metaphors literally (Happé, 1993), a phenomenon referred to as the “literality bias” or concretism (see Rossetti, Brambilla, & Papagno, 2018, for explanation of these terms).
However, there is a discrepancy in study findings. For example, several studies show no statistically significant difference between ASD and TD groups in figurative language comprehension, including metaphors (Hermann et al., 2013; Kasirer & Mashal, 2014; Mashal & Kasirer, 2011; Norbury, 2005). These findings indicate that variables other than characteristics intrinsic to ASD may explain the variation in results across studies. Group matching strategy and general language ability have previously been found to explain some of the between-study variance in figurative language comprehension (see Kalandadze, Norbury, Nærland, & Næss, 2018, for a review). Yet, the remaining unexplained variance requires an investigation of additional relevant variables.
In the behavioral and neurological literature on TD and clinical populations, there is agreement that the ability to understand metaphors hinges on task properties such as response format (e.g., multiple-choice vs. verbal explanation task) or the presence or absence of linguistic context (see Pouscoulous, 2011, 2014, for discussion of studies with TD participants, and Rossetti et al., 2018, for discussion of literature on schizophrenia). For instance, children with TD show earlier competence in metaphor comprehension when tested with an act-out rather than a verbal explanation task, perhaps due to the differences in linguistic and cognitive demands that verbal and other types of tasks pose (Pouscoulous, 2011). Similarly, response format could explain how individuals with ASD perform on metaphor tasks. For example, individuals with ASD might understand metaphors comparably to individuals with TD but have more difficulties in explaining the meaning verbally due to difficulties with expressive language (Kwok, Brown, Smyth, & Cardy, 2015). The same might be true for other properties of the metaphors (e.g., the amount and type of context available to interpret the expression, or the familiarity of the expression; Pouscoulous, 2011, 2014).
Despite this knowledge, the properties of metaphor comprehension assessment tasks in studies that compare individuals with ASD to individuals with TD have yet to be comprehensively and systematically explored. In addition, the potential interrelationships between the task properties and any between-study variation have not been systematically investigated. Reviews that have been conducted focused on ASD and figurative language in general, rather than on metaphor specifically (Gernsbacher & Pripas-Kapit, 2012; Kalandadze et al., 2018; Melogno, Pinto, & Levi, 2012; Vulchanova, Saldaña, Chahboun, & Vulchanov, 2015). However, the comprehension of metaphor might differ from the comprehension of other figurative language types in several respects (Vulchanova, Milburn, Vulchanov, & Baggio, 2019). For example, the comprehension of irony seems to depend on Theory of Mind (i.e., the ability to attribute mental states to oneself and to others) more than the comprehension of metaphor does (Happé, 1993). In addition, metonymy is processed faster than metaphor, probably due to the routinization of metonymic shifts (Bambini, Ghio, Moro, & Schumacher, 2013). Moreover, the majority of the existing reviews utilized a narrative approach (Gernsbacher & Pripas-Kapit, 2012; Melogno, Pinto, et al., 2012; Vulchanova et al., 2015), which differs from our systematic approach in fundamental ways, especially regarding transparency and systematicity of the methods used (Borenstein, Hedges, Higgins, & Rothstein, 2009).
Here, we provide a novel and thorough systematic review and meta-analysis of the properties of the metaphor tasks used in ASD research. We quantitatively compared performance on metaphor comprehension tasks between groups of individuals with ASD and TD and investigated the potential role of the task properties in between-study variation.
By systematically summarizing and synthesizing the available research that fulfills predefined inclusion criteria, our study provides robust results with implications for designing future research on figurative language and metaphor comprehension, for advancing assessment practices, and for guiding research-based intervention paradigms for individuals with ASD.
The following sections provide an overview of metaphor task properties that have been identified as critical for metaphor comprehension in TD and clinical populations (e.g., Pouscoulous, 2011; Rossetti et al., 2018). These are (a) response format (e.g., multiple-choice, meaningfulness decision), and (b) linguistic characteristics (metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality).
Response format
Evidently, the different ways of eliciting the responses when measuring metaphor comprehension pose diverse cognitive and linguistic demands. For example, earlier studies that tested metaphor comprehension of young children by asking them to explain or paraphrase a metaphor concluded that metaphor comprehension was not fully acquired until later in development (e.g., Winner et al., 1976; see Winner, 1988, for an overview; see Pouscoulous, 2011, 2014, for discussion). Alternatively, these findings may be explained by other variables such as response format demands (Pouscoulous, 2014). For example, metaphor explanation or justification tasks require a participant to articulate associations between metaphor topic and vehicle (e.g., “sister” and “butterfly” in “My sister is a butterfly”). Therefore, performance also depends on metalinguistic judgment as well as expressive language and executive control skills. In addition, verbal explanation tasks require participants to explain the meaning of a metaphor to another person, and are therefore more socially demanding than written or computer-based tasks. Explanation tasks might also trigger the processing of the other person’s reactions indicating whether the message was understood or not, thus engaging social-communication skills. By contrast, multiple-choice tasks do not rely on expressive language or metalinguistic skills and require minimal social interaction with the examiner. However, multiple-choice tasks might be more costly in terms of the need to inhibit the false alternative(s) and select the correct one, as suggested by evidence from patients with brain lesions (Rapp, Felsenheimer, Langohr, & Klupp, 2018).
The important role of the response format in metaphor comprehension is also supported by studies explicitly comparing different tasks. For instance, a study by Perlini et al. (2018) showed that only verbal explanation (but not multiple-choice) tasks yielded a statistically significant difference between patients in the early phases of psychosis and controls. In addition, Arcara et al. (2019) showed that individuals with traumatic brain injury have more difficulties in performing verbal explanation tasks on figurative language (especially proverbs) compared with multiple-choice tasks.
Linguistic characteristics
Here, we present available evidence regarding the role played by different linguistic characteristics of the metaphor: metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality.
Metaphor familiarity
Metaphors are often differentiated according to whether they are conventional (i.e., well established and often encountered in a language), or novel (i.e., not familiar, based on creative invention; Bowdle & Gentner, 2005; Rossetti et al., 2018; Varga et al., 2014). For instance, a metaphor like “The sky’s scarf is colored” (Melogno, D’Ardia, Pinto, & Levi, 2012) is considered novel, while “There is a flood outside the museum” (Rundblad & Annaz, 2010a), where flood refers to “lots of people,” is considered a lexicalized/conventional metaphor. Both behavioral and neuroimaging evidence from different populations suggests that processing varies with metaphor familiarity and, in particular, that conventional metaphors are facilitated relative to novel ones (Bambini, Gentili, Ricciardi, Bertinetto, & Pietrini, 2011; Blasko & Connine, 1993; Glucksberg, Gildea, & Bookin, 1982; Lee & Dapretto, 2006; Mashal, Faust, Hendler, & Jung-Beeman, 2009; Rapp et al., 2018; Rossetti et al., 2018; Varga et al., 2014).
This might be because at least highly conventional metaphors are retrieved from long-term memory, where they are stored as learned lexical units, whereas novel metaphors might to a greater degree depend on the pragmatic ability to make context-relevant inferences (see Pouscoulous, 2011, 2014; Wilson & Carston, 2006, for discussions). Conventional metaphors may, therefore, be understood more quickly and with less cognitive effort, whereas the online processing required for novel metaphors could result in longer processing time and involve pragmatic ability to a greater extent. Nevertheless, the exact nature of the difference in comprehension of conventional versus novel metaphors is still debated (Cardillo, Watson, Schmidt, Kranjec, & Chatterjee, 2012).
Syntactic structure of the metaphors
Metaphors in the literature and discourse appear in various syntactic structures. For example, nominal metaphors express the metaphoric meaning using a noun (e.g., “Caroline is a princess”; Wilson & Carston, 2006), predicate metaphors use a verb (e.g., “The rumor flew through the office”; Utsumi & Sakamoto, 2011), and adjective metaphors use an adjective (e.g., “sharp tongue”; Kasirer & Mashal, 2014).
The cognitive effort required for the comprehension of metaphors of different syntactic structure is likely to diverge (Cardillo et al., 2012; Chen, Widick, & Chatterjee, 2008). For instance, understanding nominal metaphors is suggested to entail either comparison (the assumption that metaphors convey similarities between semantically distinct concepts; Gentner, Bowdle, Wolff, & Boronat, 2001), categorization (the establishment of taxonomic relations between semantically distinct concepts; Glucksberg, 2003), or both comparison and categorization (Bowdle & Gentner, 2005). In contrast, predicate metaphors may be understood through a process of highlighting core abstract conceptual features of a verb (Chen et al., 2008). Adjective metaphors are also said to be comprehended through categorization (Glucksberg, 2001; Glucksberg & Keysar, 1990) or by a two-stage categorization process (Utsumi & Sakamoto, 2007). This variation resulting from the different syntactic structures of metaphors may impact study outcomes.
Linguistic context
Metaphors in real life are usually encountered in sentences and/or discourse. Therefore, presenting metaphors with little or no context creates an artificial situation and may obscure the individual’s ability to interpret a metaphorical expression. A number of studies on figurative language in individuals with TD as well as clinical populations (e.g., schizophrenia) suggest that the presence of a supportive context can significantly facilitate access to nonliteral meaning (Chakrabarty et al., 2014; Pouscoulous, 2011, 2014). In line with this, event-related brain potential studies have shown that, in the earlier phases of processing, higher integration efforts are required for metaphoric expressions presented in minimal context compared to supportive context (Bambini, Bertini, Schaeken, Stella, & Di Russo, 2016).
Stimulus modality
The mode of the metaphor stimuli (i.e., auditory vs. written/visual) may also impact performance. For example, young children are usually tested with auditory tasks where they listen to the verbal metaphors and instructions, because their reading ability is not yet adequate to complete written tasks or read instructions. However, it is not entirely clear whether and how the stimulus modality impacts metaphor comprehension in older children. In addition, metaphor tasks often incorporate a picture/image component to facilitate comprehension of verbal metaphor (e.g., in Rundblad & Annaz, 2010b). Evidence from patients with brain damage suggests that patients with right-hemisphere damage performed better on a verbal than on a visuoverbal test, relative to a control group of participants without brain damage (Rinaldi, Marangolo, & Baldassarri, 2004). This might be explained by a disadvantage in processing visual information or by the challenges associated with cross-modal processing.
In sum, evidence suggests that task properties are essential to performance on metaphor comprehension tasks. This may give rise to different processing strategies in individuals with TD and ASD and affect statistical differences between clinical and control groups. As the task properties are often associated with changes in behavioral and neural response in processing metaphors, psycho- and neurolinguistic studies are increasingly based on extensive ratings of metaphor materials. To this end, norms have been established offering metaphorical expression characterizations along several linguistic dimensions, such as familiarity, interpretability, naturalness, and imageability (e.g., Bambini, Resta, & Grimaldi, 2014; Cardillo, Schmidt, Kranjec, & Chatterjee, 2010; Cardillo, Watson, & Chatterjee, 2017; Jacobs & Kinder, 2017). These linguistic dimensions, however, are much less established in the literature on metaphor comprehension in ASD.
Metaphor comprehension task properties in studies with participants with ASD
Studies that compare individuals with ASD to individuals with TD on metaphor comprehension have employed a variety of tasks with different properties. For example, both Happé (1993) and Norbury (2005) employed a sentence completion task where the participants were asked to finish each sentence with a word they could choose from a list. Another type of multiple-choice format was used in the study conducted by Adachi et al. (2004). They tested metaphor comprehension with metaphoric scenarios where the children were asked to read the questions silently and choose from the four response options (one correct and three incorrect). In their study, Rundblad and Annaz (2010a) employed a different format, whereby open verbal responses were given in response to short stories that were accompanied by images/pictures to aid comprehension.
These studies yielded distinct results regarding the magnitude of group-level differences in metaphor comprehension between individuals with ASD and controls with TD. In particular, Adachi et al. (2004), Happé (1993), and Rundblad and Annaz (2010a) found significantly lower ability to understand metaphors in individuals with ASD, whereas Norbury (2005) found no statistically significant difference between language-ability matched groups. Presumably, the open verbal response format used in Rundblad and Annaz’s (2010a) study could be more challenging for at least some individuals with ASD with impaired metalinguistic, expressive language or executive function-related skills (Bishop & Norbury, 2005; Kwok et al., 2015; Lewis, Murdoch, & Woodyatt, 2007; Melogno, Pinto, & Levi, 2015).
Furthermore, including pictures in a metaphor task may also influence performance of individuals with ASD. In individuals with TD, including pictures in a metaphor task can be an advantage because visualization can aid comprehension of verbal metaphors. Using visual support properly, for example pictures accompanying verbal instruction to aid comprehension, is generally also encouraged in work with individuals with ASD (e.g., Dettmer, Simpson, Smith Myles, & Granz, 2000; Nelson, McDonnell, Johnston, Crompton, & Nelson, 2007; Rao & Gagie, 2006). There is evidence from a priming study of a probable benefit of using pictures over words to access meaning in ASD (Kamio & Toichi, 2000). Nevertheless, it should be noted that a task presented in two modalities may be more challenging for individuals with ASD as they may struggle to switch between visual and auditory information. This can be hypothesized on the basis of studies such as Reed and McCarthy (2011), where individuals with ASD, compared with participants with TD, showed greater difficulty when different modalities were employed than when only one modality was required. However, individual needs vary (Rao & Gagie, 2006), with some individuals with ASD benefiting most from picture support and others from written support.
Certain task properties might be more suitable than others for individuals across the spectrum, given the cognitive and linguistic strengths (e.g., unimpaired rote memory or interest in details) and differences or challenges (e.g., executive functions) often observed in this population. For example, with regard to metaphor familiarity, individuals with ASD might have more difficulties than individuals with TD in understanding novel metaphors because comprehension of novel metaphors involves pragmatic operations to a greater degree than conventional ones (Pouscoulous, 2011). In particular, by being innovative and occasion-specific, novel metaphors rely on pragmatic inference involving context-specific meaning adjustments (Recanati, 2004; Sperber & Wilson, 2012; Wilson & Carston, 2006), while conventional metaphors should depend less on inferencing and more on lexical knowledge (Pouscoulous, 2014). Nevertheless, because they are likely to be stored in the lexicon and thus linked to vocabulary knowledge, conventional metaphors might also pose problems for individuals on the spectrum (Pouscoulous, 2011). Individuals with ASD have often been shown to have compromised or biased vocabulary (Tager-Flusberg, 1992; Tager-Flusberg et al., 1990). As vocabulary knowledge is closely related to metaphor comprehension in individuals with TD (Nippold, 2016), compromised vocabulary knowledge might be linked to difficulties in metaphor comprehension in individuals with ASD with poorer vocabulary.
Some examples of the different task properties employed in the ASD literature on metaphor comprehension are provided in Table 1. The substantial variability in the assessment tasks employed may account for differences in the results of the studies, making it critical to inspect the properties of these tasks. This issue has been highlighted in a few narrative reviews. For instance, Melogno, D’Ardia, et al. (2012) stressed the heterogeneity of the tasks requiring diverse comprehension skills as the main difficulty in assessing the contribution of different tasks/variables, and they emphasized the urgent need for a careful review of the literature. Likewise, a more recent review by Siqueira, Marques, and Gibbs (2016) claimed that contrasting findings across studies of figurative language (including metaphors) in different clinical populations (including ASD) may stem more from issues of data collection than from a specific difficulty one population may have in understanding a certain type of figurative language.
Note: Original items extracted from the studies are enclosed within single quotation marks with metaphoric vehicles italicized. The instruction directly cited is enclosed within double quotation marks.
* The expression “Now I view it” is used as a novel version of the conceptual mapping KNOWING IS SEEING, as opposed to the lexicalized version “Now I see it.”
The current study: objectives and research questions
The overarching aim of this study was to advance the knowledge and awareness of the impact of task properties on metaphor comprehension performance in individuals with ASD compared to individuals with TD. We aimed to accumulate the existing knowledge by synthesizing the earlier research using the methods of systematic review and a meta-analysis.
The present study (a) explored the properties of the metaphor tasks used in ASD research and (b) investigated the group difference between individuals with ASD and TD on metaphor comprehension, as well as the relationship between the task properties and any between-study variation. We anticipated larger between-study differences in studies employing verbal explanation formats than in studies using alternative response formats (e.g., multiple-choice response format).
Method
This study was preregistered in the International Register of Systematic Reviews, PROSPERO, with the registration number CRD42017057231 (available from http://www.crd.york.ac.uk/PROSPERO/display_record.php?ID=CRD42017057231). A dual approach was utilized: a systematic review and a meta-analysis. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (http://www.prisma-statement.org/) was consulted to ensure methodological rigor.
We systematically reviewed the included studies in terms of metaphor task properties (response format and linguistic characteristics). We then undertook a meta-analysis to compare individuals with ASD to individuals with TD on metaphor comprehension, as well as to examine the relationship between response format and any between-study variation.
Data collection, study inclusion, and coding
A systematic literature search was initially conducted on April 14, 2016, and was updated on April 4, 2017. The words for the literature search and the search strategies were selected after discussions within the author team and in close collaboration with two librarians at the University of Oslo library with expertise in literature searching. The librarians’ responsibility was to ensure that the right search strategies were used and adapted correctly to the different databases. The following electronic databases were searched: PsycINFO, Linguistics and Language Behavior Abstracts (LLBA), ERIC, Embase, Norart, MEDLINE, and Web of Science. The following terms were used as keywords: ASD OR asperger* OR autis* OR “pervasive developmental disorder” combined with allegor* OR analogy OR analogies OR “figure* of speech” OR “figurative language” OR imagery OR imageries OR metaphor* OR simile*. No restrictions in terms of the publication year were applied.
In addition to the searches in the databases, the key terms (ASD and metaphor comprehension; Asperger and metaphor comprehension) were applied to Google Scholar to identify any gray literature (literature that is not published in scientific journals, e.g., working papers, conference proceedings) to minimize potential publication bias in the meta-analysis. This step is important because studies with significant results and large effect sizes are more easily published than studies that report nonsignificant findings or small effect sizes (Borenstein et al., 2009). In addition, we manually searched the tables of contents of the following key journals: Journal of Autism and Developmental Disorders and Autism. Finally, we went through the reference lists of the included articles and book chapters.
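The keyword combination described above can be sketched as a small query builder. This is only an illustration: the text says the two term groups were "combined", so joining them with AND is an assumption, the function names are hypothetical, and the actual syntax was adapted to each database by the librarians.

```python
def or_group(terms):
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(population, figurative):
    """Combine the two term groups; AND is an assumed operator."""
    return or_group(population) + " AND " + or_group(figurative)


# Keywords as listed in the search description.
population_terms = ["ASD", "asperger*", "autis*",
                    "pervasive developmental disorder"]
figurative_terms = ["allegor*", "analogy", "analogies", "figure* of speech",
                    "figurative language", "imagery", "imageries",
                    "metaphor*", "simile*"]

query = build_query(population_terms, figurative_terms)
print(query)
```

Keeping the two term lists separate makes it easy to adapt the same strategy to databases with different phrase or truncation syntax.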
To be included in both the systematic review and the meta-analysis, articles were required to meet the following predetermined criteria: (a) the studies had to report on metaphor comprehension separately (when results on metaphor comprehension were part of the results on one global figurative language variable the study was excluded); (b) only participants with ASD were included. Of note, although we consistently use the term “ASD” according to the DSM-5 (American Psychiatric Association, 2013), we expected that diagnoses in the included studies would be based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV or DSM-IV-TR; American Psychiatric Association, 1994, 2000), or International Classification of Diseases (ICD-10; World Health Organization, 1992) criteria, which prevailed at the time the studies was conducted. Thus, participants might have been diagnosed with autistic disorder, Asperger’s syndrome/disorder, or pervasive developmental disorder—not otherwise specified; (c) only the studies involving participants with primary diagnosis of ASD (without any co-occurring conditions) were included to avoid the influence of other conditions on the outcome; (d) study design had to compare individuals with ASD to individuals with TD (the groups could either be equated for chronological age [CA], CA and other variables including verbal abilities, or verbal abilities only). No CA restrictions were applied because metaphor comprehension difficulties in ASD are also found in adults with ASD (Happé, Reference Happé1993); (e) studies had to report data necessary to calculate effect sizes such as mean and standard deviation or p values as well as information and/or examples about the metaphor stimuli that were used; (f) studies could be reported in English, Norwegian, Italian, Russian, Swedish, or Danish because at least one of the authors is competent in each of these languages. 
By including several languages, we aimed to avoid the language bias often observed in systematic reviews, which is characterized by overrepresentation of English studies (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009). Titles and abstracts obtained from the search were screened for relevance by the first author based on the predetermined inclusion criteria. When the title and abstract provided insufficient information to decide on the relevance of a study, the full text was reviewed. Finally, 14 studies met the inclusion criteria. For further information on the screening process and a summary of the reasons that studies were excluded, see Figure 1.
We coded the following study characteristics: author(s), publication year, diagnostic status, comparison group, CA of the participants (mean and standard deviations), sample sizes in each group, and means and standard deviations or p values for measures of metaphor comprehension. The following information about the task properties was coded: response format, metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality.
Several considerations were made when extracting the means, standard deviations, or p values for calculating effect sizes in the meta-analysis. First, for the studies with multiple data collection points (e.g., intervention studies), only data from the first time point was coded. This was to ensure the results were not influenced by any intervention effects. Second, to avoid estimate dependency, the data from the largest sample was extracted when overlapping samples existed (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009). Third, to avoid the problems with assigning more weight to studies with more outcome variables (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009), we calculated a composite score of multiple outcomes (e.g., novel and conventional metaphors) within each of the studies. The composite score is the mean effect size, with a variance that considers the correlation among the different outcomes. Thus, every study including multiple outcomes was represented by one score, which was used as the unit of analysis.
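The composite score described above can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the composite is the unweighted mean of the within-study effect sizes and its variance includes a covariance term for every pair of outcomes. The function name `composite_effect`, the single shared correlation `r`, and the example numbers are hypothetical; the correlation actually assumed in the analysis is not stated in the text.

```python
import math

def composite_effect(effects, variances, r):
    """Combine several outcomes from ONE study into a single effect size.

    effects   -- effect sizes for the outcomes (e.g., novel, conventional)
    variances -- sampling variances of those effect sizes
    r         -- assumed correlation between outcomes (hypothetical value)
    Returns (mean effect, variance of the mean).
    """
    m = len(effects)
    mean_effect = sum(effects) / m
    # Start with the sum of the variances ...
    total = sum(variances)
    # ... and add the covariance term for every ordered pair i != j.
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i]) * math.sqrt(variances[j])
    return mean_effect, total / m ** 2

# Hypothetical study with two outcomes and an assumed correlation of 0.5
g, v = composite_effect([-0.4, -0.6], [0.1, 0.1], r=0.5)
```

With `r = 1` the variance of the composite equals that of a single outcome, and with `r = 0` it is halved for two equally precise outcomes, which is why the assumed correlation matters for the weight a study receives.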
As predetermined, the authors initially discussed the coding procedure, then the first and the second authors double-coded the data from 10 randomly selected papers and discussed the coding of the remaining 4 papers. The interrater agreements for the coded variables in the 10 randomly selected papers were as follows: 100% for author, publication year, ASD and comparison group, age of the participants, sample size in each group, metaphor familiarity, syntactic structure of the metaphor, and linguistic context; 97% for response format and stimulus modality; 93.10% for the metaphor comprehension measures (mean with SDs and p values). Of note, a divergence on the metaphor comprehension measures emerged with regard to the study by Kasirer and Mashal (Reference Kasirer and Mashal2014). The divergence was due to the inverted values for the ASD and TD groups reported in a table in the original article. The last author of the original paper confirmed the typo in email correspondence. The correct values were used for calculating the effect sizes. The other disagreements between the raters were resolved by discussion and/or by consulting the original papers.
The procedure of systematically reviewing the task properties
A comprehensive coding scheme was developed for the scrutiny of the relevant data from the included studies. Data on metaphor properties were analyzed in detail for response format and linguistic characteristics (metaphor familiarity, syntactic structure, linguistic context, and stimulus modality). The exact number of studies reporting on each of these properties was identified. The findings of the studies that experimentally examined a property of interest are presented descriptively in the Results section. A failure to take these properties into account was also considered a noteworthy finding. If a study did not report task properties, we tried to obtain the necessary information by searching the web (Google) for the task name and locating a description of the task in previous studies.
Meta-analytical procedure
Statistical analyses were conducted using the Comprehensive Meta-Analysis software Version 3 (Borenstein, Hedges, Higgins, & Rothstein, Reference Borenstein, Hedges, Higgins and Rothstein2014). Because of the importance of adjusting a meta-analysis to the studies examined (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009), we made some considerations for effect size computations. In particular, we used Hedges’ formula for the standardized mean difference with a 95% confidence interval (CI) to report effect sizes. Hedges’ g was selected because it corrects for small sample sizes (Hedges, Reference Hedges1981) and studies on metaphor comprehension in ASD often include small samples. A positive Hedges’ g value indicated that individuals with ASD had the higher group mean; a negative Hedges’ g value indicated that the groups differed in favor of the TD group. A 95% CI was calculated for each effect size to indicate whether it differed significantly from zero. The effect is statistically significant if the CI does not include zero. The effect sizes were interpreted based on Cohen’s (Reference Cohen1988) benchmarks, with d = 0.2 reflecting a small effect, d = 0.5 a medium effect, and d = 0.8 a large effect. However, these values are relative and somewhat arbitrary, both in relation to each other and to the specific study and research method employed (Cohen, Reference Cohen1988; Thompson, Reference Thompson2007). Therefore, interpreting these guidelines in relation to the clinical consequences that the effect size may have (Lakens, Reference Lakens2013) is important to avoid misleading recommendations for practice. Hence, the reporting of effect sizes in the Results section of this paper is complemented by a descriptive review.
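As an illustration of the effect size computation, the following minimal sketch derives Hedges’ g and its 95% CI from group means, standard deviations, and sample sizes, using the standard pooled-SD formula with the approximate small-sample correction factor. It is not the internal implementation of the Comprehensive Meta-Analysis software; the function name `hedges_g` and the example values are hypothetical.

```python
import math

def hedges_g(mean_asd, sd_asd, n_asd, mean_td, sd_td, n_td):
    """Return (g, variance, ci_low, ci_high) for the ASD minus TD difference.

    A negative g means the TD group had the higher mean.
    """
    df = n_asd + n_td - 2
    # Pooled within-group standard deviation
    s_pooled = math.sqrt(
        ((n_asd - 1) * sd_asd ** 2 + (n_td - 1) * sd_td ** 2) / df
    )
    d = (mean_asd - mean_td) / s_pooled
    # Approximate small-sample correction factor
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Variance of d, then of g
    var_d = (n_asd + n_td) / (n_asd * n_td) + d ** 2 / (2 * (n_asd + n_td))
    var_g = j ** 2 * var_d
    se = math.sqrt(var_g)
    return g, var_g, g - 1.96 * se, g + 1.96 * se

# Hypothetical study: 20 participants per group, TD scoring 2 points higher
g, var_g, low, high = hedges_g(10.0, 4.0, 20, 12.0, 4.0, 20)
significant = not (low <= 0.0 <= high)  # CI excluding zero
```

In this hypothetical case g is about −0.49 but the interval includes zero, which shows how a moderate point estimate from a small sample can still be nonsignificant.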
Effect sizes across studies were averaged using a random-effects model, which does not assume that all studies in the meta-analysis share a common true effect size (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009).
To visualize the distribution of effect sizes and CIs, and to detect possible outliers, a forest plot was used. We also performed sensitivity analysis to determine the impact of potential outliers. Sensitivity analysis makes it possible to estimate the adjusted overall effect size after removing studies one by one when extreme effect sizes are detected.
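The random-effects averaging and the one-study-removed sensitivity analysis could be sketched as follows. The sketch assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance, which is a common choice but is not named in the text; the function names and the example effect sizes are hypothetical and are not the data of the included studies.

```python
import math

def pool_random_effects(effects, variances):
    """Random-effects pooling; returns (pooled, se, tau2, q).

    Assumes the DerSimonian-Laird estimator of between-study variance.
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, q

def leave_one_out(effects, variances):
    """Re-estimate the pooled effect with each study removed in turn."""
    results = []
    for k in range(len(effects)):
        rest_y = effects[:k] + effects[k + 1:]
        rest_v = variances[:k] + variances[k + 1:]
        pooled, se, _, _ = pool_random_effects(rest_y, rest_v)
        results.append((k, pooled, pooled - 1.96 * se, pooled + 1.96 * se))
    return results

# Hypothetical effect sizes; the third study is an extreme value
effects = [-0.4, -0.6, -2.2, -0.5]
variances = [0.05, 0.06, 0.22, 0.08]
overall, se, tau2, q = pool_random_effects(effects, variances)
for k, pooled, low, high in leave_one_out(effects, variances):
    print(f"without study {k}: g = {pooled:.2f} [{low:.2f}, {high:.2f}]")
```

The range of the leave-one-out estimates indicates how strongly any single study, such as an outlier visible in the forest plot, drives the overall effect.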
Heterogeneity
We used the Q test of homogeneity (Hedges & Olkin, Reference Hedges and Olkin1985) to examine the heterogeneity in effect sizes. The Q statistic with its p value in a random-effects model is a test of significance and reflects whether the between-study variance is significantly different from zero. In addition, we used I², which reflects the extent of overlap of confidence intervals and is considered a measure of inconsistency.
Publication bias
Despite our efforts to identify gray literature, low-effect or nonsignificant studies could still be missing from the meta-analysis. To detect and statistically estimate the potential retrieval bias, we examined a funnel plot, in which a sample-size-dependent statistic is plotted on the y-axis and the effect size is plotted on the x-axis. In the absence of publication bias, this plot should form a symmetrical funnel (Cooper, Hedges, & Valentine, Reference Cooper, Hedges and Valentine2009). However, the funnel plot can be difficult to interpret visually when using a random-effects model (Lau, Ioannidis, Terrin, Schmid, & Olkin, Reference Lau, Ioannidis, Terrin, Schmid and Olkin2006). Therefore, a “Trim and Fill” analysis (Duval & Tweedie, Reference Duval and Tweedie2000) was also applied. If publication bias were present, the “Trim and Fill” analysis would be used to impute values in the funnel plot to make it symmetrical, and an adjusted overall mean effect size would be calculated.
Results
The results from the literature search are reported, followed by the description of results from the systematic review of the task properties. Finally, we present the results from the meta-analysis.
Results from the literature search
The electronic search yielded 1,219 references. In addition, one study was identified by searching the reference lists. All hits were screened, and the 14 studies (13 published papers and 1 conference proceeding) that met the inclusion criteria were included in the systematic review and meta-analysis. Information on the screening process and the reasons for study exclusion is reported in Figure 1.
Results from the systematic review of metaphor task properties
The detailed description of the task properties of the included studies is presented in Table 2.
Response format
The answers across the tasks were elicited by the following response formats: verbal explanation or justification, where participants were asked to explain the meaning of the expression (n = 2; Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012; Rundblad & Annaz, Reference Rundblad and Annaz2010a); multiple-choice, where participants had to choose the correct answer among a series of 3, 4, or 5 options (n = 7; Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; Huang, Oi, & Taguchi, Reference Huang, Oi and Taguchi2015; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Zheng, Jia, & Liang, Reference Zheng, Jia and Liang2015); meaningfulness decision, where participants were asked to decide whether the expression makes sense or not (yes/no; n = 4; Chouinard & Cummine, Reference Chouinard and Cummine2016; Gold & Faust, Reference Gold and Faust2010; Gunter, Ghaziuddin, & Ellis, Reference Gunter, Ghaziuddin and Ellis2002; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013). Two studies (Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002; de Villiers et al., Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) combined multiple-choice or meaningfulness decision and verbal explanation/justification formats. De Villiers et al. (Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) used multiple-choice picture modality followed by the question requiring verbal explanation. Metaphor explanation responses were reported in the results. However, the scoring strategy is not explained in their paper and, therefore, it is not clear whether the responses from the multiple-choice task have also been merged in the reported results. Gunter et al. 
(Reference Gunter, Ghaziuddin and Ellis2002) used three tasks (multiple-choice combined with verbal explanation, and a meaningfulness decision task requiring participants to decide whether metaphors were plausible or not). However, the tasks were not described in detail in the paper, so we obtained the necessary information about the task properties by searching previous studies that employed the same tasks (Bottini et al., Reference Bottini, Corcoran, Sterzi, Paulesu, Schenone, Scarpa and Frith1994; Jodzio, Lojek, & Bryan, Reference Jodzio, Lojek and Bryan2005). Furthermore, Gunter et al. (Reference Gunter, Ghaziuddin and Ellis2002) did not explain how the answers were scored and how the results obtained from the multiple-choice and verbal explanation tasks were presented in relation to each other.
None of the included studies manipulated response format in order to investigate its impact on performance.
Metaphor familiarity
Most studies employed tasks that included novel as well as conventional metaphors (n = 7; Gold & Faust, Reference Gold and Faust2010; Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Zheng et al., Reference Zheng, Jia and Liang2015), while others included only novel (n = 2; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013; Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012) or only conventional metaphors (n = 1; Rundblad & Annaz, Reference Rundblad and Annaz2010a). Four studies (Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; Chouinard & Cummine, Reference Chouinard and Cummine2016; Huang et al., Reference Huang, Oi and Taguchi2015; de Villiers et al., Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) did not specify metaphor familiarity.
Based on the results of the included studies, the impact of familiarity varied across studies, with some studies reporting group differences for conventional, but not for novel, metaphors (Kasirer & Mashal, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011), while others reported no group differences based on familiarity (Kasirer & Mashal, Reference Kasirer and Mashal2014). For example, some studies found that individuals with ASD could interpret both conventional metaphors (e.g., “Susan is a warm person”) and novel metaphors (e.g., “Susan is a toasty person”; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014), and others found that novel metaphors were more difficult for individuals with ASD than conventional metaphors, yet this was also the case for individuals with TD (Gold & Faust, Reference Gold and Faust2010; Zheng et al., Reference Zheng, Jia and Liang2015).
Syntactic structure
Based on those studies that provided information about syntactic structure or examples of metaphor items, the tasks varied greatly according to this variable as well. Six studies (Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; Chouinard & Cummine, Reference Chouinard and Cummine2016; Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013; Huang et al., Reference Huang, Oi and Taguchi2015; Zheng et al., Reference Zheng, Jia and Liang2015) involved (mostly) nominal or mixed syntactic structures. Five studies (de Villiers et al., Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011; Gold & Faust, Reference Gold and Faust2010; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011) involved word pairs (noun–adjective pairs). Note that the word-pair metaphors in de Villiers et al. (Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) were incorporated in an interrogative sentence (“Which one is the blind house?”), while the other studies did not embed word-pair metaphors in any context. The syntactic structure of the conventional metaphors in Gunter et al. (Reference Gunter, Ghaziuddin and Ellis2002) was not specified. Melogno, D’Ardia, et al. (Reference Melogno, D’Ardia, Pinto and Levi2012) and Rundblad and Annaz (Reference Rundblad and Annaz2010a) included sentences. Olofson et al. (Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014) also included sentences with either verbs (predicate metaphors) or adjectives and was the only included study that explicitly focused on conceptual metaphors. Of note, the syntactic structure might be linked to different theoretical accounts of metaphor. 
For example, pragmatics-oriented scholars mostly consider “X is Y” expressions, while the literature in cognitive linguistics focuses on the multiplicity of linguistic structures that might reflect underlying conceptual metaphors and considers metaphorically used verbs or longer expressions. However, this kind of theory-driven distinction has not been considered in the literature on ASD.
Overall, because some studies failed to provide information on syntactic structure, and several papers included only a few examples of metaphors without indicating whether the metaphor task was consistent in terms of the syntactic structure, the exact number of studies using any specific syntactic structure is impossible to report. Moreover, it is important to note that there might have been inconsistent items in the data sets. For instance, Gunter et al. (Reference Gunter, Ghaziuddin and Ellis2002) adopted novel (or unusual, as they call them in the paper) metaphors from Bottini et al. (Reference Bottini, Corcoran, Sterzi, Paulesu, Schenone, Scarpa and Frith1994), which were mostly nominal (X is Y). Following our methodological choice of basing the review on what is reported by the authors in the paper, we made a judgment based on this information and classified the items used in this study as nominal. However, we are aware that at least some metaphor items are not nominal (see the metaphor examples provided by Bottini et al., Reference Bottini, Corcoran, Sterzi, Paulesu, Schenone, Scarpa and Frith1994). Similarly, Adachi et al. (Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004) used metaphors with mixed structures. Huang et al. (Reference Huang, Oi and Taguchi2015) translated the same stimuli used by Adachi et al. (Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004) from Japanese into Taiwanese. One of the example items in both the Adachi et al. (Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004) and Huang et al. (Reference Huang, Oi and Taguchi2015) studies is, however, translated into English as a simile. 
Although metaphors and similes are different figurative types and are understood differently (Happé, Reference Happé1993), we decided to maintain these studies in the analysis both to be consistent with our methodological approach (basing the review on what was reported by the authors) and because the other example items in Adachi et al. (Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004) were metaphors.
For all the above reasons, and also because none of the included studies manipulated syntactic structure, the impact that variation in this linguistic variable might have on the group differences in metaphor comprehension is not clear.
Linguistic context
The type of context across the studies varied from none or minimal context (word pairs or sentence-level, n = 8; Chouinard & Cummine, Reference Chouinard and Cummine2016; Gold & Faust, Reference Gold and Faust2010; Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011; Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012) to scenarios or short stories with or without accompanying pictures (n = 6; Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; de Villiers et al., Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011; Huang et al., Reference Huang, Oi and Taguchi2015; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Rundblad & Annaz, Reference Rundblad and Annaz2010a; Zheng et al., Reference Zheng, Jia and Liang2015). The task employed by Melogno, D’Ardia, et al. (Reference Melogno, D’Ardia, Pinto and Levi2012) involved metaphors presented both in decontextualized sentences and in short story context. However, no results relating to the influence of the context are reported in that study. Other studies did not manipulate the context experimentally. Thus, no results regarding the impact of linguistic context on group differences in metaphor comprehension can be reported in this review.
Stimulus modality
Five studies (Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; Gold & Faust, Reference Gold and Faust2010; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013; Huang et al., Reference Huang, Oi and Taguchi2015; Zheng et al., Reference Zheng, Jia and Liang2015) presented the stimuli in written modality. Four studies (Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002, for the conventional metaphor task only; Chouinard & Cummine, Reference Chouinard and Cummine2016; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Rundblad & Annaz, Reference Rundblad and Annaz2010a) presented the metaphor stimuli aurally. Computer-based tasks were administered either aurally (Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014) or in written form (Gold & Faust, Reference Gold and Faust2010; Hermann et al., Reference Hermann, Haser, van Elst, Ebert, Müller-Feldmeth, Riedel and Konieczny2013). Five studies (Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011; Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012) did not specify the modality. Although Gunter et al. (Reference Gunter, Ghaziuddin and Ellis2002) did not report information about the stimulus modality, we could identify the modality (for conventional metaphors only) in the previous study (Jodzio et al., Reference Jodzio, Lojek and Bryan2005). De Villiers et al. 
(Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) employed stimuli with pictures, but without any indication whether participants were asked to read the metaphors or whether the questions were asked aurally. Three additional studies included stimulus material with pictures (Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Rundblad & Annaz, Reference Rundblad and Annaz2010a; Zheng et al., Reference Zheng, Jia and Liang2015).
As a final remark, the only study that reported using a test validated for the age group of the participants was that of Melogno, D’Ardia, et al. (Reference Melogno, D’Ardia, Pinto and Levi2012).
Metaphor comprehension in individuals with ASD and TD controls: A meta-analysis
Fourteen independent effect sizes, involving 336 individuals with ASD (mean sample size = 24, SD = 15.01, range 8–54) and 498 individuals with TD (mean sample size = 35.57, SD = 48.47, range 8–199), examined the differences in metaphor comprehension between the two groups. The standardized mean effect size was moderate, g = −0.63, 95% CI [−0.80, −0.46], p < .001, in favor of individuals with TD. This indicates that individuals with ASD on average have more difficulties in metaphor comprehension compared to individuals with TD.
The heterogeneity between studies was not significant, Q(13) = 16.50, p = .22, and 21.20% of true variability (I²) could be explained by individual study characteristics. Higgins, Thompson, Deeks, and Altman (Reference Higgins, Thompson, Deeks and Altman2003) provide some rough benchmarks for I², which refer to the question of what proportion of the observed variation is real. They suggest considering values below 25% as low.
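The relation between the reported Q statistic and I² can be checked with a short calculation. The sketch below uses the standard definition of I² as the share of Q that exceeds its degrees of freedom; the function name `i_squared` is hypothetical.

```python
def i_squared(q, df):
    """I² as a percentage: the share of Q in excess of its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Values reported for the 14 studies: Q = 16.50 with df = 13
print(round(i_squared(16.50, 13), 2))  # 21.21
```

The result of roughly 21.2% matches the reported value up to rounding of Q, and the truncation at zero reflects that I² is set to 0% whenever Q falls below its degrees of freedom.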
Sensitivity analysis showed that the overall effect size ranged from g = −0.66, 95% CI [−0.85, −0.46], to g = −0.57, 95% CI [−0.71, −0.42]. The funnel plot showed symmetrical distribution indicating no publication bias. No studies were imputed in a Trim and Fill analysis indicating again that no publication bias was detected. The forest plot (Figure 2) shows the group differences and CIs between individuals with ASD and TD in terms of the metaphor comprehension.
Impact of response format on between-study variance
We intended to examine the response format as a potential moderator of between-study variation. However, due to the limited number of studies on each response format category (e.g., only two studies on verbal explanation format), a meta-regression or a subgroup analysis (which may be considered as a special case of meta-regression; Fu et al., Reference Fu, Gartlehner, Grant, Shamliyan, Sedrakyan, Wilt and Trikalinos2011) would yield nonreliable results because of low statistical power. Specifically, it is recommended that for a categorical subgroup variable (response format in our case), each subgroup should include a minimum of four studies (Fu et al., Reference Fu, Gartlehner, Grant, Shamliyan, Sedrakyan, Wilt and Trikalinos2011). Therefore, we qualitatively report the observed effect sizes with CIs to identify the patterns of possible relationships between response format and the heterogeneity between studies. Although not aggregated, the descriptively reported effect sizes can still guide interpretation of results and inform future studies.
Among the four types of response format identified in the included studies, the two studies that required verbal explanations showed moderate to large effect sizes (Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012; Rundblad & Annaz, Reference Rundblad and Annaz2010a). One of these studies (Rundblad & Annaz, Reference Rundblad and Annaz2010a) generated the largest effect size among the included studies: g = −2.20, 95% CI [−3.14, −1.27]. This study employed an open verbal explanation task in which the short stories were accompanied by simple, hand-drawn pictures (hence, two modalities were involved). The experimenter read each story while presenting the child with one simple picture showing one story character. The child was asked to report what that character saw. In the other study that used the verbal explanation response format (Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012), the effect size was moderate, g = −0.62, 95% CI [−1.18, −0.04]. This study assessed metaphor comprehension using the Junior Metaphor Comprehension Test, a validated tool for use with a pediatric population (Pinto, Melogno, & Iliceto, Reference Pinto, Melogno and Iliceto2008).
Large group differences were found in the two studies that combined verbal explanation with other response formats. De Villiers et al. (Reference de Villiers, de Villiers, Diaz, Cheung, Alig and Raditz2011) combined verbal justification/explanation and picture multiple-choice response formats in the same task and yielded a large effect size: g = −0.84, 95% CI [−1.41, −0.27]. In Gunter et al. (Reference Gunter, Ghaziuddin and Ellis2002) the combined effect size for the three tasks used (multiple-choice combined with verbal explanation and meaningfulness decision) was large: g = −1.14, 95% CI [−2.17, −0.11]. Two caveats related to this study must be mentioned: first, this study included a very small sample (n = 8), and second, the stimulus material in the meaningfulness task involved linguistically complex language (e.g., “The politician who didn’t give straight answers was jumping ditches”; “The meaning of life is an itch you can’t scratch”; or “The old man had a head full of dead leaves”; see Bottini et al., Reference Bottini, Corcoran, Sterzi, Paulesu, Schenone, Scarpa and Frith1994, for more examples).
For the seven studies that employed a multiple-choice approach only (Adachi et al., Reference Adachi, Koeda, Hirabayashi, Maeoka, Shiota, Wright and Wasa2004; Huang et al., Reference Huang, Oi and Taguchi2015; Kasirer & Mashal, Reference Kasirer and Mashal2014, Reference Kasirer and Mashal2016; Mashal & Kasirer, Reference Mashal and Kasirer2011; Olofson et al., Reference Olofson, Casey, Oluyedun, van Herwegen, Becerra and Rundblad2014; Zheng et al., Reference Zheng, Jia and Liang2015), effect sizes varied from small to large: g = −0.37, 95% CI [−0.83, 0.09] to g = −0.91, 95% CI [−1.69, −0.12]. In the three studies that used meaningfulness decision tasks, effect sizes ranged from small to moderate: g = −0.39, 95% CI [−1.00, 0.23] to g = −0.52, 95% CI [−1.03, −0.01].
Discussion
The aim of the present systematic review and meta-analytic study was twofold: first, we sought to explore the properties of the metaphor tasks used in research involving individuals with ASD in a systematic manner. Second, we intended to examine the extent to which the groups of individuals with ASD differed from individuals with TD on metaphor comprehension, and whether any between-study variation could be explained by the properties of metaphor comprehension tasks.
We found that the included studies employed different types of materials and tasks, either invented by the researchers who designed the studies or adopted (and sometimes translated) from previous studies. Although the task properties varied greatly, their potential impact was rarely considered. Regarding the group differences, overall, individuals with ASD fell behind their TD controls in comprehension of metaphors. The patterns show that the verbal explanation response format (either pure verbal explanation or in combination with other response formats) resulted in the largest effect sizes. However, because task properties were rarely manipulated experimentally, their moderating role could not be established based on the included studies.
Properties of the tasks are seldom considered and/or controlled in the studies
In terms of the response format, different approaches are adopted across the studies, with the most common being the multiple-choice format. Less often used response formats include verbal explanation, followed by verbal explanation combined with another response format, and the meaningfulness decision format. It is possible that the studies involving individuals with ASD avoid using verbal explanation tasks because of known challenges related to this type of response format (i.e., it is cognitively, linguistically, and socially more demanding). As response format has been associated with between-group differences in other populations (see, for example, Perlini et al., Reference Perlini, Bellani, Finos, Lasalvia, Bonetto and Scocco2018), we anticipated detecting similar patterns in studies comparing individuals with ASD to those with TD. However, the included studies did not experimentally manipulate the response format. Therefore, firm conclusions based on the results reported in these studies cannot be drawn.
A noteworthy finding of this review is that the impact of some of the properties, such as metaphor familiarity, is more frequently considered than that of others. The reason might be that ASD is a suitable condition for studying the distinction between novel and conventional metaphors, due to the common impairment observed in pragmatic language in this population (Paul & Norbury, Reference Paul and Norbury2012). Specifically, individuals with ASD should have more problems with comprehending novel as compared to conventional metaphors because novel metaphors involve inferential pragmatic ability to a greater extent than conventional metaphors. Yet results regarding the impact of metaphor familiarity on the group difference between individuals with ASD and individuals with TD are mixed and inconclusive. This might partially be explained by the different ways in which the studies have rated the degree of familiarity. For example, familiarity is often assessed based on ratings collected from a limited number of participants, which might not be reliable given the large differences in subjective judgment of familiarity. Accordingly, Thibodeau, Sikos, and Durgin (Reference Thibodeau, Sikos and Durgin2018) have questioned the construct validity of sentence-level subjective ratings of metaphors collected from native speakers and argued that familiarity ratings are likely to be confounded with processing fluency (i.e., how easily people understand the sentences). Moreover, it may also be that other properties that covary with familiarity, such as word-level psycholinguistic characteristics (e.g., frequency, concreteness, and length), as well as metaphor characteristics such as interpretability, naturalness, and imageability, account for the divergent results (Cardillo et al., Reference Cardillo, Schmidt, Kranjec and Chatterjee2010). 
Therefore, we argue that using stimuli for which these properties have been rated by large samples of participants, and controlled for, could offer a more robust benchmark to explore the difference between familiar and unfamiliar metaphors. In addition, the use of controlled materials will favor the comparison of the findings across studies, which will be a great advantage for future systematic reviews and meta-analytic studies.
Syntactic structure is the least explored property and was not normally controlled for in the studies. Given that individuals with ASD frequently show impairments in syntax (Brynskov, Krøjgaard, & Eigsti, Reference Brynskov, Krøjgaard and Eigsti2016), and given the evidence that metaphors in different syntactic structures are comprehended differently (Cardillo et al., Reference Cardillo, Watson, Schmidt, Kranjec and Chatterjee2012; Chen et al., Reference Chen, Widick and Chatterjee2008), the syntactic structure of the metaphoric items may have been important to take into account in research with individuals with ASD. However, as most studies do not report on syntactic structure (or offer inconsistent examples), we cannot conclude that the stimuli consistently displayed the same structure throughout the task. Our counts of studies according to the syntactic structure of the metaphors should therefore be considered with caution.
The impact of context on the between-group difference is also poorly explored in ASD research. This finding is striking given that inferring meaning from context has been reported to be challenging for individuals with ASD due to a cognitive difference in the normal drive for coherence (Frith, Reference Frith1989; Frith & Happé, Reference Frith and Happé1994; Happé, Reference Happé1999; see, however, Brock, Norbury, Einav, & Nation, Reference Brock, Norbury, Einav and Nation2008, suggesting that differences in processing linguistic context in individuals with ASD are actually related to individual differences in their core language abilities). This implies that, although context may facilitate comprehension in TD, it may pose problems in individuals with ASD, which is in line with the “context blindness” hypothesis referring to a lack of contextual sensitivity in ASD (Vermeulen, Reference Vermeulen2014).
One study that was screened within this review (but excluded in the full-text screening stage due to the reported co-occurring conditions among individuals with ASD) found that context facilitates metaphor comprehension in ASD (Giora, Gazal, Goldstein, Fein, & Stringaris, Reference Giora, Gazal, Goldstein, Fein and Stringaris2012). However, one study is not enough to infer a pattern concerning the role of context. Of note, not only the presence or absence of context, but also the type of context may matter, because context with a large amount of information could hamper comprehension by overloading participants’ working memory and affecting attention (see Boxhoorn et al., Reference Boxhoorn, Lopez, Schmidt, Schulze, Hänig and Freitag2018; Pennington & Ozonoff, Reference Pennington and Ozonoff1996).
Regarding stimulus modality, most included studies used written tasks. This could be preferable when measuring metaphor comprehension in individuals with ASD because written tasks do not pose high social interaction demands and are less taxing for memory. In addition, aurally delivered tasks might be difficult for individuals with ASD due to their characteristic ways of processing auditory semantic information from spoken language (see O’Connor, Reference O’Connor2012, for a review). However, it is still unclear whether written words facilitate comprehension processes for individuals with ASD in general. There is some evidence that young individuals with ASD benefit more from written word priming (not metaphorical, but conventional, “literal” words) in their lexical access than young TD controls and older individuals with ASD (Harper-Hill, Copland, & Arnott, Reference Harper-Hill, Copland and Arnott2014; see, however, Kamio & Toichi, Reference Kamio and Toichi2000, suggesting the possible advantage of pictures over words in access to semantics in ASD). It is unknown whether similar effects extend to metaphor.
In sum, there is a lack of attention to the role of task properties in performance on metaphor comprehension tasks in the existing ASD research. We observed discrepancies in the task properties across the studies, as well as a limited number of studies experimentally manipulating the task properties. Therefore, strong conclusions about the extent to which task properties can explain the distinct findings in the ASD literature cannot be drawn from this study. Nevertheless, our study offers new insights into how studies in ASD have assessed metaphor comprehension and directs the focus toward the importance of acknowledging the substantial variability in tasks and their properties when interpreting the results from the existing studies. In addition, it calls for careful consideration when designing and reporting on task properties in metaphor studies.
Metaphor comprehension is more challenging for individuals with ASD than for individuals with TD
Overall, individuals with ASD as a group exhibited more difficulties in metaphor comprehension than the comparison group of individuals with TD. This finding is consistent with the results from prior studies (e.g., Happé, Reference Happé1993; Rundblad & Annaz, Reference Rundblad and Annaz2010a; van Herwegen & Rundblad, Reference van Herwegen and Rundblad2018), as well as with the findings from a recent meta-analysis (Kalandadze et al., Reference Kalandadze, Norbury, Nærland and Næss2018).
Taken together, this evidence indicates that, as a group, individuals with ASD more frequently experience problems in metaphor comprehension. Nevertheless, we need to acknowledge that there are several possible explanations for the significant group difference. For example, the meta-analysis (Kalandadze et al., Reference Kalandadze, Norbury, Nærland and Næss2018) and single studies have found that group-matching strategies could explain the between-study variation on figurative language comprehension. In particular, if ASD and TD groups were matched for language ability, the groups have been found to not differ significantly on metaphor comprehension (Norbury, Reference Norbury2005). These variables should necessarily be taken into account when explaining the difficulties with metaphor comprehension in individuals with ASD, together with the role of the metaphor task properties, which, despite its well-documented importance for metaphor comprehension, has not been examined until now.
Observed pattern of the associations between the response format and between-study variation
As hypothesized, verbal explanation tasks (pure verbal explanation or combined with other response formats) are, based on the observed effect sizes, most challenging for individuals with ASD as compared to TD controls. This is not surprising because explaining metaphorical meaning is cognitively, linguistically, and socially demanding, as it requires planning and formulating utterances, and thus relies on expressive language as well as metalinguistic and executive skills, which have often been found to be challenging for individuals with ASD (Hill, Reference Hill2004; Kwok et al., Reference Kwok, Brown, Smyth and Cardy2015; Lewis et al., Reference Lewis, Murdoch and Woodyatt2007). This finding also converges with results from a study of irony processing (irony being another type of figurative language) in which minimizing the verbal and pragmatic demands of the task resulted in similar accuracy in judging the speaker’s intent for ironic criticism between the groups of individuals with ASD and TD (Pexman et al., Reference Pexman, Rostad, McMorris, Climie, Stowkowy and Glenwright2011).
However, using a verbal explanation task, if validated for use with the target group, might reduce the magnitude of the group difference. For example, Melogno, D’Ardia, et al. (Reference Melogno, D’Ardia, Pinto and Levi2012) used the Junior Metaphor Comprehension Test designed for the specific age range (4–6 years) and validated it for use with a “pediatric population.” Using the validated tool likely resulted in a smaller effect size compared to other studies that used non-validated verbal explanation tasks. The distinction between the results based on assessing metaphor comprehension with a validated versus non-validated tool is fundamental, but, unfortunately, our result is based on only one study (Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012). In order to draw clearer conclusions, more studies involving validated materials to study metaphor comprehension are needed.
Meaningfulness decision tasks seem to be the least challenging for individuals with ASD when compared to TD individuals. Although somewhat surprising, this might be because this type of task requires neither expressive language skills and the planning and formulating of responses, as in verbal explanation tasks, nor the inhibition of incorrect alternatives, as in multiple-choice tasks. In addition, meaningfulness decision tasks might be less socially demanding because they require less interaction with the examiner than verbal tasks. Meaningfulness decision tasks might, therefore, be less taxing for individuals with ASD than verbal explanation tasks.
Limitations
Several limitations of this study should be considered when interpreting the findings. First, the inconsistency in the use of response formats across the studies made it unfeasible to examine the potential moderating effect of this variable. Although descriptively reported effect sizes are informative, a meta-regression or a subgroup analysis would allow for a more accurate examination of the relationships between the response format and between-study variation. Second, none of the included studies attached the stimulus materials in appendices. Some studies presented a few examples of the metaphorical items, whereas others did not report any examples. Although we did not have access to the full list of stimuli, the information provided in the papers was sufficient for our purposes. Future reviews that want to examine the consistency of the metaphor stimuli in the existing studies, which is definitely worth investigating, should contact the authors and request the full set of stimuli. Future reviews should also examine what types of metaphors the stimuli contain (e.g., nominal metaphors, as in “Sally is a chameleon,” or conceptual metaphors, like “I see it,” where “seeing” indicates “knowing” [KNOWING IS SEEING]).
Another important methodological limitation was the small sample sizes in some included studies (e.g., eight participants in Gunter et al., Reference Gunter, Ghaziuddin and Ellis2002). Because larger sample sizes correspond to less sampling error (Borenstein et al., Reference Borenstein, Hedges, Higgins and Rothstein2009), high-powered studies would provide better effect size estimates for this meta-analysis. However, an advantage of a meta-analysis over a single study is the increased statistical power achieved by aggregating the effect sizes from multiple samples. We therefore propose that the magnitude of the group difference reported in our study is a reliable estimate. These limitations will be overcome once more rigorously conducted studies are available, allowing for consistent examination of potentially important variables affecting metaphor comprehension in individuals with ASD as compared to individuals with TD.
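The gain in precision from aggregation can be illustrated with a minimal sketch. It assumes the standard formulas for Hedges' g and inverse-variance pooling described in Borenstein et al. (2009) and, for simplicity, a fixed-effect model, which is not necessarily the model used in this meta-analysis. The function names and all summary statistics are hypothetical and are not taken from any included study.

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, variance of g) for two independent groups."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    # Small-sample correction factor J
    j = 1 - 3 / (4 * df - 1)
    return j * d, j**2 * var_d

def pool_fixed(effects):
    """Inverse-variance (fixed-effect) pooling of (g, variance) pairs."""
    weights = [1 / var for _, var in effects]
    pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical summary data: (TD mean, TD SD, TD n, ASD mean, ASD SD, ASD n)
hypothetical_studies = [
    (8.0, 2.0, 8, 6.5, 2.5, 8),
    (7.5, 1.8, 20, 6.4, 2.1, 20),
    (9.1, 2.2, 15, 7.9, 2.4, 16),
]
effects = [hedges_g(*study) for study in hypothetical_studies]
pooled_g, pooled_se = pool_fixed(effects)

for g, var in effects:
    print(f"single study: g = {g:.2f}, SE = {math.sqrt(var):.2f}")
print(f"pooled:       g = {pooled_g:.2f}, SE = {pooled_se:.2f}")
```

Under these assumptions the pooled standard error is always smaller than that of any single contributing study, which is the sense in which aggregation increases power. A random-effects model would add a between-study variance component to each study's variance and widen the pooled interval accordingly.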
Implications
The main implication of the findings of our study for future research on metaphor comprehension is that stimuli and tasks need to be created by carefully taking into account a range of characteristics, such as the linguistic properties and response format, whose role in modifying behavioral and neural responses is well known in the literature (e.g., Bambini et al., Reference Bambini, Resta and Grimaldi2014; Schmidt & Seger, Reference Schmidt and Seger2009). Furthermore, studies should consistently examine the role of task properties experimentally to investigate their relationships with the performance among individuals with ASD and those with TD.
Of note, the different tasks may each have advantages, but at the same time they can impede comprehension if inappropriately used with individuals with ASD. For instance, multiple-choice tasks can be desirable from the psychometric perspective due to easy and precise scoring (Rapp et al., Reference Rapp, Felsenheimer, Langohr and Klupp2018) and high reliability (see, for instance, the different reliability values of figurative language tasks, multiple choice vs. verbal explanation, in Carotenuto et al., Reference Carotenuto, Arcara, Orefice, Cerillo, Giannino, Rasulo and Bambini2018). In contrast, multiple-choice tasks are more susceptible to measurement error due to the possibility of guessing the responses (Kline, Reference Kline2009), and they tap more executive functions because of the need to inhibit the incorrect alternatives and select the correct one. Another aspect that should be weighed up when designing metaphor comprehension tasks in ASD research concerns the number of options provided in multiple-choice tasks. For instance, there is evidence from another pragmatic domain (i.e., scalar implicatures) that presenting two versus three options might account for the presence or absence of group differences between individuals with ASD and individuals with TD (Schaeken, Van Haeren, & Bambini, Reference Schaeken, Van Haeren and Bambini2018).
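How the number of options interacts with guessing can be made concrete with a small sketch. It assumes the classical correction for guessing, in which a participant who does not know the answer picks blindly and uniformly among all options. This is an illustrative assumption, not a procedure reported in the reviewed studies, and the function names are hypothetical.

```python
def chance_level(n_options: int) -> float:
    """Expected proportion correct under blind, uniform guessing."""
    if n_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return 1 / n_options

def correct_for_guessing(p_observed: float, n_options: int) -> float:
    """Rescale observed accuracy so that chance maps to 0 and perfect to 1."""
    chance = chance_level(n_options)
    return (p_observed - chance) / (1 - chance)

# The same observed accuracy carries different information
# depending on how many options each item offers.
observed = 0.75
for k in (2, 3, 4):
    print(f"{k} options: chance = {chance_level(k):.3f}, "
          f"corrected accuracy = {correct_for_guessing(observed, k):.3f}")
```

With an observed accuracy of 0.75, the corrected value is 0.5 for two options and 0.625 for three, so identical raw scores from tasks with different numbers of options are not directly comparable under this assumption.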
As for verbal explanation tasks, these appear to be more sensitive than other tasks in detecting impairment in metaphor comprehension and allow us to establish with more confidence whether the metaphors were understood or not. However, it must be pointed out that verbal explanation tasks are not recommended for use with vulnerable groups because of the extra demands they pose on the participants (see Norbury, Reference Norbury2004). Moreover, when using verbal explanation tasks, it is important that experimenters receive adequate training in order to achieve adequate reliability in scoring responses.
In general, using metaphor tasks created ad hoc for the specific purpose of the study is often preferable for researchers, given the multifaceted nature of metaphor. However, greater advantages would be obtained from the use of validated and/or standardized tests with good psychometric properties. The absence of tests with properties that are consistently controlled for across the studies makes comparison of the results difficult. Only one study (Melogno, D’Ardia, et al., Reference Melogno, D’Ardia, Pinto and Levi2012) used a validated instrument, and despite using the demanding response format of verbal explanation, the effect size yielded was smaller than in the study by Rundblad and Annaz (Reference Rundblad and Annaz2010a), which used a non-validated verbal explanation task. This single observation should be investigated in future studies.
Another important suggestion from our findings is that the stimulus materials should be attached to the published papers. In addition, providing a detailed description of the stimulus materials is essential to enable interpretation of the findings. In general, we propose that journals develop criteria for reporting metaphor studies in order to make quality appraisal of research for the readers and for future review studies possible.
Furthermore, given that our results indicate a limited number of studies using online methods, more high-quality studies on metaphor comprehension in ASD are needed that combine the offline and online comprehension methods widely used in psycholinguistic research. Considering the many demands offline tasks pose (i.e., social and linguistic), it is difficult to pinpoint the real sources of possible difficulties in metaphor comprehension when assessed offline. Online tasks such as those employing eye-tracking methodology, priming paradigms, and computerized tests can therefore add important insight to the knowledge of metaphor comprehension in ASD by measuring implicit processing (see, for example, Naigles, Reference Naigles2017, for innovative paradigms and methods to investigate language in ASD that could beneficially be used in metaphor research as well). For instance, priming paradigms might offer fine-grained measures of the difficulties experienced by individuals with ASD, elucidating patterns in response times (Chahboun, Vulchanov, Saldaña, Eshuis, & Vulchanova, Reference Chahboun, Vulchanov, Saldaña, Eshuis and Vulchanova2017). In addition, behavioral data could profitably be combined with the data on brain functioning to explain the neurocognitive and neurolinguistic processes underlying metaphor comprehension in individuals with ASD as compared to TD controls. For instance, Gold, Faust, and Goldstein (Reference Gold, Faust and Goldstein2010) employed event-related potential recordings to examine difficulties in semantic integration in ASD. The sample in this study, however, overlapped with the sample in Gold and Faust (Reference Gold and Faust2010) and therefore the study was not included in the meta-analysis. We did not identify any other study with data about the brain response that met the inclusion criteria for this review.
The main practical implication of our findings is that individuals with ASD need extensive support to learn metaphor comprehension strategies explicitly and that plans on how to promote metaphor comprehension should be made. Intervention programs concerning metaphor comprehension in ASD are very few, but results are promising. For instance, Mashal and Kasirer (Reference Mashal and Kasirer2011) and Melogno, Pinto, and Di Filippo (Reference Melogno, Pinto and Di Filippo2017) used thinking maps to enhance the abstraction of semantic features in metaphors. Teachers, special educators, and speech and language therapists could capitalize on this evidence and develop strategies to stimulate metaphorical skills. To begin with, the students could be reminded that figurative language involves the use of words in nonliteral ways. Then the students could be encouraged to use their metalinguistic skills to consider the overlapping features between the topic and vehicle of the metaphor (Nippold, Reference Nippold2016), similarly to the approach adopted in Mashal and Kasirer (Reference Mashal and Kasirer2011) and in Melogno et al. (Reference Melogno, Pinto and Di Filippo2017). In addition, the students could be asked to collect metaphors from different sources such as advertisements and literature, including the context in which they occur (Nippold, Reference Nippold2016). Teachers may also incorporate metaphors of different degrees of familiarity in minimal or short story contexts and present them both aurally and in print, with and without pictures, eliciting the answers through different response formats (see Nippold, Reference Nippold2016, for more ideas).
Conclusions
This paper reports a systematic review and meta-analysis concerning the properties of the metaphor tasks used in ASD research and the role that they play in determining the differences between groups of individuals with ASD and those with TD in metaphor comprehension. By focusing on the impact of the task properties, this study contributes to the ongoing debate about the potential sources of between-study variation in metaphor comprehension in individuals with ASD and offers novel insights into figurative language in this population.
The included studies used an array of different tasks with a range of properties, whose impact was rarely considered and/or experimentally manipulated. Individuals with ASD in general exhibited more difficulties in metaphor comprehension than their TD counterparts, but this difference is likely to be partially related to the task properties such as the response format. Yet, more research is needed to confirm the relationship between the task properties and between-study variance.
In light of the findings of our study, we argue that future metaphor comprehension studies comparing individuals with ASD to those with TD should carefully take into account task properties such as response format and linguistic characteristics (i.e., metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality).
Consideration of task properties is also necessary in order to design appropriate educational programs to improve figurative language competence and ultimately improve communication and academic skills of individuals with ASD.
Author Contribution
Tamar Kalandadze conceptualized, designed, and administered the study; created the coding protocol; established the inclusion and exclusion criteria; collected the data; screened the titles and abstracts; read/screened full-text articles; created the coding scheme and coded the variables; analyzed and interpreted the results; drafted the manuscript; and had the main responsibility for revising and resubmitting the manuscript after peer review.
Valentina Bambini contributed to the study design, especially to the selection of the metaphor task properties and the consideration of metaphor theory aspects; did the double-coding; contributed to analysis and interpretation of the results, especially of the systematic review; contributed to the writing of the manuscript; provided feedback as well as final approval of this paper; and contributed to revising of the manuscript.
Kari-Anne B. Næss supervised the research process; contributed to the study design, especially in the meta-analysis part; contributed to the interpretation of the results; contributed to the writing of the manuscript; provided feedback as well as final approval of this paper; and contributed to revising of the manuscript.
Acknowledgments
The authors would like to thank Ingrid Lossius Falkum for her contribution in selecting the terms for our literature search. The librarians at the University of Oslo, Glenn Karlsen Bjerkenes and Magnus Heie Gregersen, deserve thanks for assisting in the literature search process. The authors would also like to thank Ellie Wilson at Science in English for proofreading and providing feedback on the paper.