A systematic review and meta-analysis of studies on metaphor comprehension in individuals with autism spectrum disorder: Do task properties matter?

Tamar Kalandadze; Valentina Bambini; Kari-Anne B. Næss

doi:10.1017/S0142716419000328

Published online by Cambridge University Press: 18 October 2019

Tamar Kalandadze*: University of Oslo, Norway; Knowledge Centre for Education, University of Stavanger, Norway
Valentina Bambini: University School for Advanced Studies IUSS Pavia, Italy
Kari-Anne B. Næss: University of Oslo, Norway
*Corresponding author. Email: tamar.kalandadze@isp.uio.no / tamar.kalandadze@uis.no


Abstract

Individuals with autism spectrum disorder (ASD) often experience difficulty in comprehending metaphors compared to individuals with typical development (TD). However, there is a large variation in the results across studies, possibly related to the properties of the metaphor tasks. This preregistered systematic review and meta-analysis (a) explored the properties of the metaphor tasks used in ASD research, and (b) investigated the group difference between individuals with ASD and TD on metaphor comprehension, as well as the relationship between the task properties and any between-study variation. A systematic search was undertaken in seven relevant databases. Fourteen studies fulfilled our predetermined inclusion criteria. Across tasks, we detected four types of response format and a great variety of metaphors in terms of familiarity, syntactic structure, and linguistic context. Individuals with TD outperformed individuals with ASD on metaphor comprehension (Hedges’ g = −0.63). Verbal explanation response format was utilized in the study showing the largest effect size in the group comparison. However, due to the sparse experimental manipulations, the role of task properties could not be established. Future studies should consider and report task properties to determine their role in metaphor comprehension, and to inform experimental paradigms as well as educational assessment.

Keywords

autism spectrum disorder; experimental pragmatics; figurative language; response format

Type: Original Article
Information: Applied Psycholinguistics, Volume 40, Issue 6, November 2019, pp. 1421-1454

Copyright: © Cambridge University Press 2019

A metaphor is a paradigmatic type of figurative language involving a discrepancy between the encoded, “literal” meaning of words and their occasion-specific use (Camp, 2009; Carston, 2010). Metaphors can appear in many forms, such as “Sally is a chameleon” or “Your theory is falling apart.” Accordingly, different accounts of metaphor comprehension have been proposed (see Bowdle & Gentner, 2005; Gibbs, 2011; Gibbs & Tendahl, 2006; Glucksberg, 2001; Wilson, 2011). Among them, pragmatic accounts (e.g., relevance theory) focus on metaphor in communication, highlighting the inferential mechanisms by which linguistically encoded concepts are adjusted to arrive at the speaker’s intended meaning (Sperber & Wilson, 2012). For instance, in “Sally is a chameleon,” the adjustment broadens the concept CHAMELEON to include not only a species of lizard but also individuals with certain psychological features (Carston, 2012). In contrast, cognitive linguistics accounts (e.g., conceptual metaphor theory) emphasize the role of metaphor in thought, describing it in terms of conceptual mappings across cognitive domains (Gibbs, 2011; Lakoff & Johnson, 1980). These conceptual mappings surface in our metaphorical use of language, as in “Your theory is falling apart,” which reflects the mapping of theories onto physical constructs such as buildings (THEORIES ARE BUILDINGS).

Regardless of the theoretical approach, there is agreement that metaphors are a ubiquitous part of language, appearing frequently in everyday communication, schoolbooks, academic texts, literature, and the media (Golden, 2010; Steen, Dorst, & Hermann, 2010). Hence, difficulty in understanding metaphors may impede social communication, the ability to obtain information, and academic attainment.

In individuals with typical development (TD), metaphor comprehension skills continue to mature throughout childhood until adolescence, and it is commonly assumed that the age of 10 represents a crucial point (Lecce, Ronchi, Del Sette, Bischetti, & Bambini, 2019; Winner, Rosenstiel, & Gardner, 1976). There is, however, also awareness that metaphorical competence is evident earlier if assessed with age-appropriate tasks (Pouscoulous, 2011, 2014; Vosniadou, Ortony, Reynolds, & Wilson, 1984). In contrast, profound and lasting difficulty in metaphor comprehension has traditionally been considered characteristic of individuals with autism spectrum disorder (ASD; Adachi et al., 2004; Happé, 1993; Rundblad & Annaz, 2010a), a neurodevelopmental condition characterized by impairments in social communication and interaction, as well as restricted and stereotyped behaviors (American Psychiatric Association, 2013). In particular, individuals with ASD have been reported to interpret metaphors literally (Happé, 1993), a phenomenon referred to as the “literality bias” or concretism (see Rossetti, Brambilla, & Papagno, 2018, for an explanation of these terms).

However, study findings are inconsistent. For example, several studies show no statistically significant difference between ASD and TD groups in figurative language comprehension, including metaphors (Hermann et al., 2013; Kasirer & Mashal, 2014; Mashal & Kasirer, 2011; Norbury, 2005). These findings indicate that variables other than characteristics intrinsic to ASD may explain the variation in results across studies. Group matching strategy and general language ability have previously been found to explain some of the between-study variance in figurative language comprehension (see Kalandadze, Norbury, Nærland, & Næss, 2018, for a review). Yet the remaining unexplained variance calls for investigation of additional relevant variables.

In the behavioral and neurological literature on TD and clinical populations, there is agreement that the ability to understand metaphors hinges on task properties such as response format (e.g., multiple-choice vs. verbal explanation) or the absence of linguistic context (see Pouscoulous, 2011, 2014, for discussion of studies with TD participants, and Rossetti et al., 2018, for discussion of the literature on schizophrenia). For instance, children with TD show earlier competence in metaphor comprehension when tested with an act-out rather than a verbal explanation task, perhaps because verbal and other types of tasks pose different linguistic and cognitive demands (Pouscoulous, 2011). Similarly, response format could explain how individuals with ASD perform on metaphor tasks. For example, individuals with ASD might understand metaphors comparably to individuals with TD but have more difficulty explaining the meaning verbally because of difficulties with expressive language (Kwok, Brown, Smyth, & Cardy, 2015). The same might be true for other properties of the metaphors (e.g., the amount and type of context available to interpret the expression, or the familiarity of the expression; Pouscoulous, 2011, 2014).

Despite this knowledge, the properties of metaphor comprehension tasks in studies comparing individuals with ASD to individuals with TD have yet to be comprehensively and systematically explored. In addition, the potential relationships between task properties and between-study variation have not been systematically investigated. Previous reviews have focused on ASD and figurative language in general rather than on metaphor specifically (Gernsbacher & Pripas-Kapit, 2012; Kalandadze et al., 2018; Melogno, Pinto, & Levi, 2012; Vulchanova, Saldaña, Chahboun, & Vulchanov, 2015). However, the comprehension of metaphor might differ from the comprehension of other types of figurative language in several respects (Vulchanova, Milburn, Vulchanov, & Baggio, 2019). For example, the comprehension of irony seems to depend more on Theory of Mind (i.e., the ability to attribute mental states to oneself and to others) than the comprehension of metaphor does (Happé, 1993). In addition, metonymy is processed faster than metaphor, probably due to the routinization of metonymic shifts (Bambini, Ghio, Moro, & Schumacher, 2013). Moreover, most existing reviews used a narrative approach (Gernsbacher & Pripas-Kapit, 2012; Melogno, Pinto, et al., 2012; Vulchanova et al., 2015), which differs from our systematic approach in fundamental ways, especially regarding the transparency and systematicity of the methods used (Borenstein, Hedges, Higgins, & Rothstein, 2009).

Here, we provide a novel and thorough systematic review and meta-analysis of the properties of the metaphor tasks used in ASD research. We quantitatively compared performance on metaphor comprehension tasks between groups of individuals with ASD and TD and investigated the potential role of the task properties in between-study variation.

By systematically summarizing and synthesizing the available research that fulfills predefined inclusion criteria, our study provides robust results with implications for designing future research on figurative language and metaphor comprehension, for advancing assessment practices, and for guiding research-based intervention paradigms for individuals with ASD.

The following sections provide an overview of the metaphor task properties that have been identified as critical for metaphor comprehension in TD and clinical populations (e.g., Pouscoulous, 2011; Rossetti et al., 2018). These are (a) response format (e.g., multiple-choice, meaningfulness decision) and (b) linguistic characteristics (metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality).

Response format

Different ways of eliciting responses when measuring metaphor comprehension pose different cognitive and linguistic demands. For example, earlier studies that tested young children’s metaphor comprehension by asking them to explain or paraphrase a metaphor concluded that metaphor comprehension was not fully acquired until later in development (e.g., Winner et al., 1976; see Winner, 1988, for an overview; see Pouscoulous, 2011, 2014, for discussion). Alternatively, these findings may be explained by other variables, such as the demands of the response format (Pouscoulous, 2014). Metaphor explanation or justification tasks, for instance, require participants to articulate the associations between metaphor topic and vehicle (e.g., “sister” and “butterfly” in “My sister is a butterfly”). Performance therefore also depends on metalinguistic judgment as well as on expressive language and executive control skills. In addition, verbal explanation tasks require participants to explain the meaning of a metaphor to another person and are therefore more socially demanding than written or computer-based tasks. Explanation tasks might also trigger the processing of the other person’s reactions, which signal whether the message was understood, thus engaging social-communication skills. By contrast, multiple-choice tasks do not rely on expressive language or metalinguistic skills and require minimal social interaction with the examiner. However, multiple-choice tasks might be more costly in terms of the need to inhibit the incorrect alternative(s) and select the correct one, as suggested by evidence from patients with brain lesions (Rapp, Felsenheimer, Langohr, & Klupp, 2018).

The important role of response format in metaphor comprehension is also supported by studies explicitly comparing different tasks. For instance, Perlini et al. (2018) found that only verbal explanation tasks (but not multiple-choice tasks) yielded a statistically significant difference between patients in the early phases of psychosis and controls. Similarly, Arcara et al. (2019) showed that individuals with traumatic brain injury have more difficulty with verbal explanation tasks on figurative language (especially proverbs) than with multiple-choice tasks.

Linguistic characteristics

Here, we present available evidence regarding the role played by different linguistic characteristics of the metaphor: metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality.

Metaphor familiarity

Metaphors are often differentiated according to whether they are conventional (i.e., well established and often encountered in a language) or novel (i.e., unfamiliar, based on creative invention; Bowdle & Gentner, 2005; Rossetti et al., 2018; Varga et al., 2014). For instance, a metaphor like “The sky’s scarf is colored” (Melogno, D’Ardia, Pinto, & Levi, 2012) is considered novel, while “There is a flood outside the museum” (Rundblad & Annaz, 2010a), where flood refers to “lots of people,” is considered a lexicalized/conventional metaphor. Both behavioral and neuroimaging evidence from different populations suggests that familiarity modulates metaphor processing, with facilitation for conventional compared to novel metaphors (Bambini, Gentili, Ricciardi, Bertinetto, & Pietrini, 2011; Blasko & Connine, 1993; Glucksberg, Gildea, & Bookin, 1982; Lee & Dapretto, 2006; Mashal, Faust, Hendler, & Jung-Beeman, 2009; Rapp et al., 2018; Rossetti et al., 2018; Varga et al., 2014).

This might be because highly conventional metaphors, at least, are retrieved from long-term memory, where they are stored as learned lexical units, whereas novel metaphors might depend to a greater degree on the pragmatic ability to make context-relevant inferences (see Pouscoulous, 2011, 2014; Wilson & Carston, 2006, for discussions). Conventional metaphors may therefore be understood more quickly and with less cognitive effort, whereas the online processing required for novel metaphors could take longer and draw on pragmatic ability to a greater extent. Nevertheless, the exact nature of the difference between the comprehension of conventional and novel metaphors is still debated (Cardillo, Watson, Schmidt, Kranjec, & Chatterjee, 2012).

Syntactic structure of the metaphors

Metaphors in the literature and discourse appear in various syntactic structures. For example, nominal metaphors express the metaphoric meaning using a noun (e.g., “Caroline is a princess”; Wilson & Carston, 2006), predicate metaphors use a verb (e.g., “The rumor flew through the office”; Utsumi & Sakamoto, 2011), and adjective metaphors use an adjective (e.g., “sharp tongue”; Kasirer & Mashal, 2014).

The cognitive effort required to comprehend metaphors with different syntactic structures is likely to differ (Cardillo et al., 2012; Chen, Widick, & Chatterjee, 2008). For instance, understanding nominal metaphors is suggested to entail comparison (the assumption that metaphors convey similarities between semantically distinct concepts; Gentner, Bowdle, Wolff, & Boronat, 2001), categorization (the establishment of taxonomic relations between semantically distinct concepts; Glucksberg, 2003), or both comparison and categorization (Bowdle & Gentner, 2005). By contrast, predicate metaphors may be understood through a process of highlighting the core abstract conceptual features of a verb (Chen et al., 2008). Adjective metaphors are said to be comprehended through categorization (Glucksberg, 2001; Glucksberg & Keysar, 1990) or through a two-stage categorization process (Utsumi & Sakamoto, 2007). This variation across syntactic structures may affect study outcomes.

Linguistic context

In real life, metaphors are usually encountered in sentences and/or discourse. Therefore, presenting metaphors with little or no context creates an artificial situation and may obscure an individual’s ability to interpret a metaphorical expression. A number of studies on figurative language in individuals with TD as well as in clinical populations (e.g., schizophrenia) suggest that a supportive context can significantly facilitate access to nonliteral meaning (Chakrabarty et al., 2014; Pouscoulous, 2011, 2014). In line with this, event-related brain potential studies have shown that, in the earlier phases of processing, metaphoric expressions presented in minimal context require greater integration effort than those presented in supportive context (Bambini, Bertini, Schaeken, Stella, & Di Russo, 2016).

Stimulus modality

The modality of the metaphor stimuli (i.e., auditory vs. written/visual) may also affect performance. For example, young children are usually tested with auditory tasks in which they listen to verbal metaphors and instructions, because their reading ability is not yet adequate for completing written tasks or reading instructions. However, it is not entirely clear whether and how stimulus modality affects metaphor comprehension in older children. In addition, metaphor tasks often incorporate a picture/image component to facilitate comprehension of the verbal metaphor (e.g., Rundblad & Annaz, 2010b). Evidence from patients with brain damage suggests that, relative to a control group without brain damage, patients with right-hemisphere damage performed better on a verbal than on a visuoverbal test (Rinaldi, Marangolo, & Baldassarri, 2004). This might be explained by a disadvantage in processing visual information or by the challenges associated with cross-modal processing.

In sum, evidence suggests that task properties substantially influence performance on metaphor comprehension tasks. They may also elicit different processing strategies in individuals with TD and ASD and thereby affect the statistical differences observed between clinical and control groups. Because task properties are often associated with changes in behavioral and neural responses during metaphor processing, psycho- and neurolinguistic studies are increasingly based on extensive ratings of metaphor materials. To this end, norms have been established that characterize metaphorical expressions along several linguistic dimensions, such as familiarity, interpretability, naturalness, and imageability (e.g., Bambini, Resta, & Grimaldi, 2014; Cardillo, Schmidt, Kranjec, & Chatterjee, 2010; Cardillo, Watson, & Chatterjee, 2017; Jacobs & Kinder, 2017). These linguistic dimensions, however, are much less established in the literature on metaphor comprehension in ASD.

Metaphor comprehension task properties in studies with participants with ASD

Studies comparing individuals with ASD to individuals with TD on metaphor comprehension have employed a variety of tasks with different properties. For example, both Happé (1993) and Norbury (2005) employed a sentence completion task in which participants were asked to complete each sentence with a word chosen from a list. Another type of multiple-choice format was used by Adachi et al. (2004), who tested metaphor comprehension with metaphoric scenarios in which children were asked to read the questions silently and choose from four response options (one correct and three incorrect). Rundblad and Annaz (2010a) employed a different format, in which participants gave open verbal responses to short stories accompanied by images/pictures to aid comprehension.

These studies yielded divergent results regarding the magnitude of group-level differences in metaphor comprehension between individuals with ASD and controls with TD. In particular, Adachi et al. (2004), Happé (1993), and Rundblad and Annaz (2010a) found a significantly lower ability to understand metaphors in the ASD group, whereas Norbury (2005) found no statistically significant difference between groups matched on language ability. The open verbal response format used in Rundblad and Annaz’s (2010a) study may have been more challenging for at least some individuals with ASD with impaired metalinguistic, expressive language, or executive function skills (Bishop & Norbury, 2005; Kwok et al., 2015; Lewis, Murdoch, & Woodyatt, 2007; Melogno, Pinto, & Levi, 2015).

Furthermore, including pictures in a metaphor task may also influence the performance of individuals with ASD. In individuals with TD, pictures can be an advantage because visualization can aid comprehension of verbal metaphors. Appropriate use of visual support, for example pictures accompanying verbal instructions, is also generally encouraged in work with individuals with ASD (e.g., Dettmer, Simpson, Smith Myles, & Granz, 2000; Nelson, McDonnell, Johnston, Crompton, & Nelson, 2007; Rao & Gagie, 2006). Evidence from a priming study suggests a probable benefit of pictures over words for accessing meaning in ASD (Kamio & Toichi, 2000). Nevertheless, a task presented in two modalities may be more challenging for individuals with ASD, who may struggle to switch between visual and auditory information. This hypothesis rests on studies such as Reed and McCarthy (2011), in which individuals with ASD, compared with participants with TD, had greater difficulty when different modalities were employed than when only one modality was required. However, individual needs vary (Rao & Gagie, 2006): some individuals with ASD benefit most from picture support, others from written support.

Certain task properties might be more suitable than others for individuals across the spectrum, given the cognitive and linguistic strengths (e.g., intact rote memory or interest in details) and differences or challenges (e.g., in executive functions) often observed in this population. For example, with regard to metaphor familiarity, individuals with ASD might have more difficulty than individuals with TD in understanding novel metaphors, because comprehending novel metaphors involves pragmatic operations to a greater degree than comprehending conventional ones (Pouscoulous, 2011). In particular, being innovative and occasion-specific, novel metaphors rely on pragmatic inference involving context-specific meaning adjustments (Recanati, 2004; Sperber & Wilson, 2012; Wilson & Carston, 2006), while conventional metaphors should depend less on inferencing and more on lexical knowledge (Pouscoulous, 2014). Nevertheless, because they are likely to be stored in the lexicon and thus linked to vocabulary knowledge, conventional metaphors might also pose problems for individuals on the spectrum (Pouscoulous, 2011). Individuals with ASD have often been shown to have compromised or biased vocabulary (Tager-Flusberg, 1992; Tager-Flusberg et al., 1990). As vocabulary knowledge is closely related to metaphor comprehension in individuals with TD (Nippold, 2016), poorer vocabulary might be linked to difficulties in metaphor comprehension in individuals with ASD.

Some examples of the different task properties employed in the ASD literature on metaphor comprehension are provided in Table 1. The substantial variability in the assessment tasks may account for differences in study results, making it critical to inspect the properties of these tasks. This issue has been highlighted in a few narrative reviews. For instance, Melogno, D’Ardia, et al. (2012) identified the heterogeneity of tasks, which require diverse comprehension skills, as the main obstacle to assessing the contribution of different tasks/variables, and they emphasized the urgent need for a careful review of the literature. Likewise, a more recent review by Siqueira, Marques, and Gibbs (2016) argued that contrasting findings across studies of figurative language (including metaphors) in different clinical populations (including ASD) may be related more to data collection issues than to a specific difficulty that a population may have in understanding a certain type of figurative language.

Table 1. Examples of metaphor task properties taken from the included studies

Note: Original items extracted from the studies are enclosed within single quotation marks with metaphoric vehicles italicized. The instruction directly cited is enclosed within double quotation marks.

* The expression “Now I view it” is used as a novel version of the conceptual mapping KNOWING IS SEEING, as opposed to the lexicalized version “Now I see it.”

The current study: objectives and research questions

The overarching aim of this study was to advance knowledge and awareness of the impact of task properties on the metaphor comprehension performance of individuals with ASD compared to individuals with TD. We aimed to integrate existing knowledge by synthesizing earlier research using a systematic review and a meta-analysis.

The present study (a) explored the properties of the metaphor tasks used in ASD research and (b) investigated the group difference between individuals with ASD and TD on metaphor comprehension, as well as the relationship between task properties and any between-study variation. We anticipated that studies employing verbal explanation formats would show larger group differences than studies using alternative response formats (e.g., multiple-choice).

Method

This study was preregistered in PROSPERO, the International Prospective Register of Systematic Reviews, under registration number CRD42017057231 (available from http://www.crd.york.ac.uk/PROSPERO/display_record.php?ID=CRD42017057231). We used a dual approach: a systematic review and a meta-analysis. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (http://www.prisma-statement.org/) was consulted to ensure methodological rigor.

We systematically reviewed the included studies in terms of metaphor task properties (response format and linguistic characteristics). We then undertook a meta-analysis to compare individuals with ASD to individuals with TD on metaphor comprehension, as well as to examine the relationship between response format and any between-study variation.
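To make the meta-analytic step more concrete, the sketch below pools per-study Hedges’ g values under a random-effects model, reports Cochran’s Q and I² as heterogeneity indices, and compares response-format subgroups with a Q-between statistic. It is a minimal illustration under stated assumptions: the DerSimonian-Laird estimator of the between-study variance (τ²), the choice to estimate τ² separately within each subgroup, the function names, and the toy inputs are all ours, not details reported in this section, and the numbers are not data from the included studies.

```python
import math
from dataclasses import dataclass


@dataclass
class Pooled:
    g: float     # pooled random-effects estimate
    se: float    # standard error of the pooled estimate
    tau2: float  # DerSimonian-Laird between-study variance (assumed estimator)
    q: float     # Cochran's Q computed with fixed-effect weights
    i2: float    # I^2: % of total variability attributed to heterogeneity


def pool_random_effects(effects, variances):
    """Pool per-study Hedges' g values with a DerSimonian-Laird random-effects model."""
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    g_fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - g_fixed) ** 2 for wi, gi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    g_re = sum(wi * gi for wi, gi in zip(w_re, effects)) / sum(w_re)
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return Pooled(g_re, math.sqrt(1.0 / sum(w_re)), tau2, q, i2)


def q_between(subgroups):
    """Compare pooled effects across subgroups (e.g., response formats).

    `subgroups` maps a label to (effects, variances). Each subgroup is pooled
    separately with its own tau^2; Q_between is the weighted spread of the
    subgroup estimates, referred to a chi-square distribution with k - 1 df.
    """
    pooled = {name: pool_random_effects(*data) for name, data in subgroups.items()}
    w = {name: 1.0 / p.se ** 2 for name, p in pooled.items()}
    grand = sum(w[n] * pooled[n].g for n in pooled) / sum(w.values())
    q_b = sum(w[n] * (pooled[n].g - grand) ** 2 for n in pooled)
    return pooled, q_b, len(pooled) - 1


if __name__ == "__main__":
    # Toy numbers only; NOT data from the reviewed studies.
    overall = pool_random_effects([-1.0, 0.0], [0.1, 0.1])
    print(f"g = {overall.g:.2f}, tau2 = {overall.tau2:.2f}, I2 = {overall.i2:.0f}%")
    _, qb, df = q_between({
        "verbal explanation": ([-1.0, -1.0], [0.1, 0.1]),
        "multiple choice": ([0.0, 0.0], [0.1, 0.1]),
    })
    print(f"Q_between = {qb:.2f} on {df} df")
```

With only a handful of studies per response format, Q-between has little power, so a non-significant subgroup difference would not by itself show that response format is irrelevant.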

Data collection, study inclusion, and coding

A systematic literature search was initially conducted on April 14, 2016, and was updated on April 4, 2017. The search terms and strategies were selected after discussions among the authors and in close collaboration with two librarians at the University of Oslo library with expertise in literature searching. The librarians were responsible for ensuring that appropriate search strategies were used and correctly adapted to the different databases. The following electronic databases were searched: PsycINFO, Linguistics and Language Behavior Abstracts (LLBA), ERIC, Embase, Norart, MEDLINE, and Web of Science. The following terms were used as keywords: ASD OR asperger* OR autis* OR “pervasive developmental disorder”, combined with allegor* OR analogy OR analogies OR “figure* of speech” OR “figurative language” OR imagery OR imageries OR metaphor* OR simile*. No restrictions on publication year were applied.
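As an illustration of how the two keyword groups fit together, the sketch below assembles them into a single Boolean string and checks locally which words a trailing-asterisk term would retrieve. It assumes that “combined with” means the two OR-blocks are joined with AND; the helper names are hypothetical, and `fnmatchcase` only approximates database truncation, whose exact syntax differs across search interfaces (which is why the strategy had to be adapted to each database).

```python
from fnmatch import fnmatchcase

# Term lists copied from the search strategy described above.
POPULATION = ["ASD", "asperger*", "autis*", '"pervasive developmental disorder"']
CONSTRUCT = ["allegor*", "analogy", "analogies", '"figure* of speech"',
             '"figurative language"', "imagery", "imageries", "metaphor*", "simile*"]


def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks):
    """Combine concept blocks with AND (our reading of 'combined with')."""
    return " AND ".join(or_block(terms) for terms in blocks)


def truncation_matches(word, term):
    """Rough, case-insensitive local check of what a truncated term retrieves."""
    return fnmatchcase(word.lower(), term.lower())


if __name__ == "__main__":
    print(build_query(POPULATION, CONSTRUCT))
    for word in ["autism", "autistic", "metaphorical", "analogous"]:
        hits = [t for t in POPULATION + CONSTRUCT if truncation_matches(word, t)]
        print(word, "->", hits)
```

The loop illustrates why untruncated entries such as “analogy” and “analogies” are listed separately: without an asterisk, a term retrieves only the exact form.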

In addition to the database searches, the key terms (ASD and metaphor comprehension; Asperger and metaphor comprehension) were entered into Google Scholar to identify gray literature (literature not published in scientific journals, e.g., working papers and conference proceedings) and thereby minimize potential publication bias in the meta-analysis. This step is important because studies with significant results and large effect sizes are more easily published than studies that report nonsignificant findings or small effect sizes (Borenstein et al., 2009). In addition, we manually searched the tables of contents of two key journals: Journal of Autism and Developmental Disorders and Autism. Finally, we went through the reference lists of the included articles and book chapters.

To be included in both the systematic review and the meta-analysis, articles were required to meet the following predetermined criteria: (a) the studies had to report on metaphor comprehension separately (studies in which metaphor comprehension results were merged into one global figurative language variable were excluded); (b) the clinical group had to consist only of participants with ASD. Of note, although we consistently use the term “ASD” in accordance with the DSM-5 (American Psychiatric Association, 2013), we expected that diagnoses in the included studies would be based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV or DSM-IV-TR; American Psychiatric Association, 1994, 2000) or International Classification of Diseases (ICD-10; World Health Organization, 1992) criteria, which prevailed at the time the studies were conducted. Thus, participants might have been diagnosed with autistic disorder, Asperger’s syndrome/disorder, or pervasive developmental disorder not otherwise specified; (c) only studies involving participants with a primary diagnosis of ASD (without any co-occurring conditions) were included, to avoid the influence of other conditions on the outcome; (d) the study design had to compare individuals with ASD to individuals with TD (the groups could be matched on chronological age [CA], on CA and other variables including verbal abilities, or on verbal abilities only). No CA restrictions were applied because metaphor comprehension difficulties are also found in adults with ASD (Happé, 1993); (e) studies had to report the data necessary to calculate effect sizes (e.g., means and standard deviations, or p values), as well as information about and/or examples of the metaphor stimuli used; (f) studies could be reported in English, Norwegian, Italian, Russian, Swedish, or Danish, because at least one of the authors is proficient in each of these languages.
By including several languages, we aimed to avoid the language bias often observed in systematic reviews, which is characterized by an overrepresentation of English-language studies (Borenstein et al., 2009). The first author screened the titles and abstracts obtained from the search for relevance based on the predetermined inclusion criteria. When the title and abstract provided insufficient information to decide on the relevance of a study, the full text was reviewed. Finally, 14 studies met the inclusion criteria. For further information on the screening process and a summary of the reasons for excluding studies, see Figure 1.

Figure 1. Flow chart for the screening and inclusion of studies based on the PRISMA statement (Moher, Liberati, Tetzlaff, & Altman, 2009).

We coded the following study characteristics: author(s), publication year, diagnostic status, comparison group, CA of the participants (means and standard deviations), sample size of each group, and means and standard deviations or p values for the measures of metaphor comprehension. The following information about the task properties was coded: response format, metaphor familiarity, syntactic structure of the metaphor, linguistic context, and stimulus modality.

Several decisions were made when extracting the means, standard deviations, or p values used to calculate effect sizes in the meta-analysis. First, for studies with multiple data collection points (e.g., intervention studies), only data from the first time point were coded, to ensure that the results were not influenced by any intervention effects. Second, to avoid dependency among estimates, data from the largest sample were extracted when samples overlapped across studies (Borenstein et al., 2009). Third, to avoid assigning more weight to studies with more outcome variables (Borenstein et al., 2009), we calculated a composite score of multiple outcomes (e.g., novel and conventional metaphors) within each study. The composite score is the mean effect size, with a variance that takes into account the correlation among the different outcomes. Thus, every study with multiple outcomes was represented by one score, which was used as the unit of analysis.
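A composite of this kind can be sketched with the standard formula for the mean of m correlated outcomes (as presented by Borenstein et al., 2009): the composite is the simple mean of the outcome effect sizes, and its variance is (1/m)² times the sum of the outcome variances plus r·√(VᵢVⱼ) for every ordered pair i ≠ j. The section does not state which correlation r was assumed or how the software applied it, so the single common `r` below is an assumption, and `composite_effect` is a hypothetical helper, not the procedure actually run in Comprehensive Meta-Analysis.

```python
import math
from itertools import combinations


def composite_effect(effects, variances, r):
    """Combine several outcomes from one study into one composite score.

    effects:   effect sizes (e.g., Hedges' g) for each outcome in the study
    variances: sampling variance of each effect size
    r:         ASSUMED correlation between any two outcomes (one common value,
               because outcome correlations are rarely reported)
    """
    m = len(effects)
    mean = sum(effects) / m
    # r * sqrt(V_i * V_j) summed over unordered pairs i < j; every pair occurs
    # twice in the full i != j double sum, hence the factor 2 below.
    cov = sum(r * math.sqrt(variances[i] * variances[j])
              for i, j in combinations(range(m), 2))
    variance = (sum(variances) + 2 * cov) / m ** 2
    return mean, variance


# Hypothetical study reporting novel and conventional metaphor outcomes.
print(composite_effect([-0.4, -0.8], [0.1, 0.1], r=0.5))
```

With r = 1 the composite is no more precise than a single outcome, and with r = 0 its variance is halved, which is why the assumed correlation matters for how much weight a multi-outcome study receives.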

As predetermined, the authors first discussed the coding procedure; the first and second authors then double-coded the data from 10 randomly selected papers and discussed the coding of the remaining 4 papers. Interrater agreement for the coded variables in the 10 randomly selected papers was as follows: 100% for author, publication year, ASD and comparison group, age of the participants, sample size of each group, metaphor familiarity, syntactic structure of the metaphor, and linguistic context; 97% for response format and stimulus modality; and 93.10% for the metaphor comprehension measures (means with SDs and p values). Of note, a discrepancy in the metaphor comprehension measures arose for the study by Kasirer and Mashal (2014). The discrepancy was due to the values for the ASD and TD groups being inverted in a table of the original article. The last author of the original paper confirmed the typo by email, and the correct values were used to calculate the effect sizes. The other disagreements between the raters were resolved by discussion and/or by consulting the original papers.
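The section does not name the agreement index behind these percentages; assuming simple percent agreement (the share of coded items on which both raters gave the same code), the computation can be sketched as below. The function name and the category labels "A", "B", "C" are hypothetical placeholders, not the study's coding categories.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items (in %) that two raters coded identically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical codes assigned by two raters to four papers.
first = ["A", "B", "C", "B"]
second = ["A", "B", "A", "B"]
print(percent_agreement(first, second))  # 75.0
```

Percent agreement does not correct for chance agreement; indices such as Cohen's kappa do, but nothing in the section indicates that one was used.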

The procedure of systematically reviewing the task properties

A comprehensive coding scheme was developed to extract the relevant data from the included studies. Data on metaphor task properties were analyzed in detail for response format and linguistic characteristics (metaphor familiarity, syntactic structure, linguistic context, and stimulus modality). The exact number of studies reporting on each of these properties was identified. The findings of the studies that experimentally examined a property of interest are presented descriptively in the Results section. Failure to take these properties into account was also considered a noteworthy finding. If a study did not report its task properties, we tried to obtain the necessary information by locating a description of the task in previous studies, searching the web (Google) for the task name.
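One way to represent such a coding scheme, and to count how many studies report each property, is a record per study in which an unreported property stays empty. The sketch below is hypothetical: the class and field names are ours, and the example values ("verbal explanation", "multiple choice", "novel") merely reuse terms from the text rather than the study's actual coding categories.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TaskCoding:
    """Task properties coded for one study; None means 'not reported'."""
    study: str
    response_format: Optional[str] = None
    familiarity: Optional[str] = None
    syntactic_structure: Optional[str] = None
    linguistic_context: Optional[str] = None
    stimulus_modality: Optional[str] = None


# Every field except the study label is a task property.
PROPERTIES = [f.name for f in fields(TaskCoding) if f.name != "study"]


def count_reporting(codings):
    """Number of studies that report each task property."""
    return {p: sum(getattr(c, p) is not None for c in codings) for p in PROPERTIES}


# Hypothetical codings for two studies.
codings = [
    TaskCoding("Study 1", response_format="verbal explanation", familiarity="novel"),
    TaskCoding("Study 2", response_format="multiple choice"),
]
print(count_reporting(codings))
```

Using None for "not reported" keeps the absence of information explicit, which matters here because unreported properties were themselves treated as a finding.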

Meta-analytical procedure

Statistical analyses were conducted using the Comprehensive Meta-Analysis software, Version 3 (Borenstein, Hedges, Higgins, & Rothstein, 2014). Because effect size computations should be adapted to the characteristics of the studies examined (Borenstein et al., 2009), we made the following choices. We used Hedges’ formula for the standardized mean difference, with a 95% confidence interval (CI), to report effect sizes. Hedges’ g was selected because it corrects for the bias of the standardized mean difference in small samples (Hedges, 1981), and studies on metaphor comprehension in ASD often include small samples. A positive Hedges’ g value indicated that individuals with ASD had the higher group mean; a negative value indicated that the groups differed in favor of the TD group. A 95% CI was calculated for each effect size to indicate whether it differed significantly from zero: the effect is statistically significant if the CI does not include zero. The effect sizes were interpreted according to Cohen’s (1988) benchmarks, with values of around 0.2, 0.5, and 0.8 reflecting small, medium, and large effects, respectively. However, these values are relative, both to each other and to the specific research area and method employed, and are somewhat arbitrary (Cohen, 1988; Thompson, 2007). Therefore, it is important to interpret these benchmarks in relation to the clinical consequences that an effect size may have (Lakens, 2013), to avoid misleading implications for practice. Hence, the effect sizes reported in the Results section are complemented by a descriptive review.
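The computation of Hedges’ g and its CI can be sketched with the textbook formulas (pooled SD, Cohen's d, and the small-sample correction factor J = 1 − 3/(4df − 1), as presented by Borenstein et al., 2009). This is an illustration, not the software's internal code; the function names and the example means, SDs, and sample sizes are hypothetical, and the sign convention (ASD minus TD) follows the text. Effect sizes derived from p values, which the study also used, are not covered here.

```python
import math


def hedges_g(mean_asd, sd_asd, n_asd, mean_td, sd_td, n_td, z=1.96):
    """Hedges' g (ASD minus TD) with its variance and a 95% CI."""
    df = n_asd + n_td - 2
    sd_pooled = math.sqrt(((n_asd - 1) * sd_asd ** 2 + (n_td - 1) * sd_td ** 2) / df)
    d = (mean_asd - mean_td) / sd_pooled  # Cohen's d; negative favors TD
    var_d = (n_asd + n_td) / (n_asd * n_td) + d ** 2 / (2 * (n_asd + n_td))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    var_g = j ** 2 * var_d
    half_width = z * math.sqrt(var_g)
    return g, var_g, (g - half_width, g + half_width)


def cohen_label(effect):
    """Verbal label from Cohen's (1988) benchmarks, applied to |effect|."""
    size = abs(effect)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical study: ASD mean 10 (SD 2, n = 20) vs. TD mean 12 (SD 2, n = 20).
g, var_g, (lo, hi) = hedges_g(10, 2, 20, 12, 2, 20)
significant = hi < 0 or lo > 0  # CI excludes zero
print(round(g, 3), cohen_label(g), significant)
```

Because J is slightly below 1, g is always a little closer to zero than d; the difference shrinks as the groups grow.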

Effect sizes across studies were averaged using a random-effects model, which does not assume that all studies in the meta-analysis share a common true effect size (Borenstein et al., 2009).
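A random-effects average can be sketched as inverse-variance weighting in which each study's weight also includes the between-study variance τ². The section does not say how τ² was estimated; the sketch assumes the DerSimonian–Laird method-of-moments estimator, a widely used choice, and `random_effects_dl` is a hypothetical helper rather than the software's implementation. The input values are invented.

```python
import math


def random_effects_dl(effects, variances, z=1.96):
    """Random-effects pooled mean using the DerSimonian-Laird tau^2 estimator.

    Returns (pooled mean, standard error, 95% CI, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, (mean - z * se, mean + z * se), tau2


# Hypothetical Hedges' g values and variances from three studies.
mean, se, ci, tau2 = random_effects_dl([-0.3, -0.6, -1.2], [0.08, 0.10, 0.12])
print(round(mean, 2), tuple(round(x, 2) for x in ci), round(tau2, 3))
```

When τ² is estimated as zero, the weights reduce to the fixed-effect weights; as τ² grows, the weights become more equal and small studies gain relative influence.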

To visualize the distribution of effect sizes and CIs and to detect possible outliers, we used a forest plot. We also performed a sensitivity analysis to determine the impact of potential outliers. When extreme effect sizes are detected, a sensitivity analysis makes it possible to estimate the adjusted overall effect size after removing studies one at a time.
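The one-study-removed procedure can be sketched by re-pooling the remaining studies once per study. To keep the block self-contained it repeats a compact random-effects pooling step, again assuming the DerSimonian–Laird estimator (the section does not specify one); the helper names and the data are hypothetical.

```python
def pool_random_effects(effects, variances):
    """Compact DerSimonian-Laird random-effects pooled mean."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)


def leave_one_out(effects, variances):
    """Pooled effect recomputed with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        rest_y = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(pool_random_effects(rest_y, rest_v))
    return results


# Hypothetical data in which the third study looks extreme.
effects, variances = [-0.5, -0.5, -2.0], [0.1, 0.1, 0.1]
print(pool_random_effects(effects, variances), leave_one_out(effects, variances))
```

Comparing each leave-one-out estimate with the full pooled estimate shows how strongly any single study, such as a suspected outlier, drives the overall result.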

Heterogeneity

We used the Q test of homogeneity (Hedges & Olkin, 1985) to examine heterogeneity in effect sizes. In a random-effects model, the Q statistic and its p value test whether the between-study variance differs significantly from zero. In addition, we used I², which reflects the extent of overlap of the confidence intervals and is considered a measure of inconsistency; it expresses the proportion of the observed variance in effect sizes that reflects true between-study differences rather than sampling error.
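Q and I² can be sketched as follows: Q is the weighted sum of squared deviations of each effect from the inverse-variance (fixed-effect) mean, and I² = max(0, (Q − df)/Q) × 100%, with df = k − 1. The p value for Q comes from a chi-square distribution with k − 1 degrees of freedom (for example `scipy.stats.chi2.sf(q, df)` if SciPy is available); it is omitted so that the sketch needs only the standard library. The helper name and the example data are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I^2 (in %)."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2 is truncated at zero when Q does not exceed its expected value df.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical effects from two clearly different studies.
print(heterogeneity([0.0, 1.0], [0.1, 0.1]))
```

Because Q grows with the number of studies while I² is a proportion, I² is easier to compare across meta-analyses of different sizes.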

Publication bias

Despite our efforts to identify gray literature, studies with small or nonsignificant effects could still be missing from the meta-analysis. To detect and statistically estimate this potential retrieval bias, we examined a funnel plot, in which a sample-size-dependent statistic is plotted on the y-axis and the effect size on the x-axis. In the absence of publication bias, this plot should form a symmetrical funnel (Cooper, Hedges, & Valentine, 2009). However, a funnel plot can be difficult to interpret visually when a random-effects model is used (Lau, Ioannidis, Terrin, Schmid, & Olkin, 2006). Therefore, a “Trim and Fill” analysis (Duval & Tweedie, 2000) was also applied. If publication bias were present, the “Trim and Fill” analysis would impute values in the funnel plot to make it symmetrical, and an adjusted overall mean effect size would be calculated.
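The “Trim and Fill” logic can be sketched with Duval and Tweedie's L0 estimator: repeatedly estimate the number k0 of missing studies from a signed-rank statistic, trim the k0 most extreme studies on the overrepresented side, re-estimate the center, then mirror the trimmed studies around that center and re-pool. This sketch makes several assumptions the section does not specify: missing studies are assumed to lie on the left (smaller effects), the center is a fixed-effect mean, and tied ranks are broken by order instead of averaged. The function names and data are hypothetical; to look for missing studies on the right, negate the effects before calling and negate the results afterwards.

```python
def fixed_mean(effects, variances):
    """Inverse-variance (fixed-effect) weighted mean."""
    w = [1.0 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)


def trim_and_fill_left(effects, variances, max_iter=100):
    """Trim and fill (L0 estimator), assuming missing studies on the LEFT.

    Returns (k0, filled effects, adjusted fixed-effect mean).
    """
    n = len(effects)
    by_size = sorted(range(n), key=lambda i: effects[i])  # ascending effects
    k0 = 0
    for _ in range(max_iter):
        kept = by_size[: n - k0]  # trim the k0 largest effects
        center = fixed_mean([effects[i] for i in kept], [variances[i] for i in kept])
        dev = [y - center for y in effects]
        # Rank |deviation| over all n studies (ties broken by order).
        ranks = [0] * n
        for rank, j in enumerate(sorted(range(n), key=lambda j: abs(dev[j])), start=1):
            ranks[j] = rank
        t_n = sum(r for r, d in zip(ranks, dev) if d > 0)  # Wilcoxon rank sum
        l0 = (4 * t_n - n * (n + 1)) / (2 * n - 1)
        new_k0 = min(max(0, round(l0)), n - 1)
        if new_k0 == k0:
            break
        k0 = new_k0
    # Final center from the trimmed set, then mirror the trimmed studies.
    kept, trimmed = by_size[: n - k0], by_size[n - k0:]
    center = fixed_mean([effects[i] for i in kept], [variances[i] for i in kept])
    filled = [2 * center - effects[i] for i in trimmed]
    adjusted = fixed_mean(list(effects) + filled,
                          list(variances) + [variances[i] for i in trimmed])
    return k0, filled, adjusted


# Hypothetical asymmetric set of effect sizes with equal variances.
print(trim_and_fill_left([0.1, 0.15, 0.2, 0.55, 0.95, 1.05], [0.04] * 6))
```

If k0 comes out as zero, no studies are imputed and the adjusted mean equals the original fixed-effect mean.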

Results

We first report the results of the literature search, followed by the results of the systematic review of the task properties. Finally, we present the results of the meta-analysis.

Results from the literature search

The electronic search yielded 1,219 references. In addition, one study was identified through searching the reference lists. All hits were screened, and 14 studies (13 published papers and 1 conference proceeding) met the inclusion criteria and were included in the systematic review and meta-analysis. Information on the screening process and the reasons for study exclusion is reported in Figure 1.