Autism spectrum disorders (ASDs) are neurodevelopmental disorders characterized by deficits in social interaction and communication, along with a propensity to engage in repetitive behaviors or have restricted interests (American Psychiatric Association, 2000). The severity of these deficits and the ways in which they are expressed vary considerably. Until recently, most children diagnosed with ASD had severe language impairments or delays, and researchers estimated that as many as half were nonverbal (Lord & Paul, 1997). However, more recent estimates suggest that 80%–86% of children with ASDs have some functional language (Lord, Risi, & Pickles, 2004). A substantial proportion of school-aged children with ASD do not appear to have deficits in vocabulary, articulation, or syntax (Joseph, Tager-Flusberg, & Lord, 2002; Kjelgaard & Tager-Flusberg, 2001). We refer to children with this profile as highly verbal.
However, there are two domains of language that appear to be impaired even in highly verbal children with ASD. First, persons with ASD have impairments in pragmatics (Kelley, Paul, Fein, & Naigles, 2006; Tager-Flusberg, Paul, & Lord, 2005; Young, Diehl, Morris, Hyman, & Bennetto, 2005), which seem related to their deficits in social interaction. Pragmatics represents the skills that allow us to use language as a social tool by going beyond the literal meaning of an utterance to understand the role that it plays in a particular interaction. Highly verbal persons with ASD often perform well on highly structured measures of pragmatic ability (Young et al., 2005). Nonetheless, children with this profile have been found to have deficits in interpreting the conversational intentions (and sometimes the meaning) of nonliteral speech acts (Adachi et al., 2004; MacKay & Shaw, 2004; Martin & McDonald, 2004); determining the amount or kind of information to provide in a conversation (Ghaziuddin & Gerstein, 1996; Paul, Orlovski, Marchinko, & Volkmar, 2009); producing pragmatically appropriate responses during a conversation (Adams, Green, Gilchrist, & Cox, 2002); and inferring information that is missing from the discourse (Le Sourn-Bissaoui, Caillies, Gierski, & Motte, 2009; Loukusa et al., 2007).
Second, the use of prosody is often atypical in ASD, even in persons with no structural language impairments (see Tager-Flusberg et al., 2005, for review). The term prosody refers to the suprasegmental characteristics of speech, including pitch, duration, and intensity. Descriptions of prosody in ASD have varied from flat and monotonous to variable, singsong, or pedantic (e.g., Kanner, 1943; Lord & Paul, 1997; Provonost, Wakstein, & Wakstein, 1966). Atypical prosodic production has been documented at all levels of ability within the autism spectrum (e.g., Baltaxe, 1984; Diehl & Paul, 2012, 2013; Grossman, Bemis, Plesa Skwerer, & Tager-Flusberg, 2010; Nadig & Shaw, 2012; Paul, Augustyn, Klin, & Volkmar, 2005; Peppé, McCann, Gibbon, O'Hare, & Rutherford, 2007; Shriberg, Paul, Black, & van Santen, 2011). There is a smaller but growing body of research exploring whether people with ASD also have deficits in the use of prosody during language comprehension (see next section for a brief review). Much of this work has focused on prosodic cues to a speaker's emotional state and pragmatic intentions, but prosody also plays a role in lexical segmentation, lexical identification, and syntactic parsing. Research on these nonpragmatic functions of prosody in ASD is critical for determining whether there are prosodic deficits that are separate from the general pragmatic deficit noted earlier.
Although work in this area has begun, the findings so far leave many questions unanswered (Chevallier, Noveck, Happé, & Wilson, 2009; Diehl, Bennetto, Watson, Gunlogson, & McDonough, 2008; Grossman et al., 2010; Paul et al., 2005; Peppé et al., 2007). The present study explores how children and adolescents with ASD use prosodic cues to disambiguate the syntactic structure of an utterance. It also addresses unanswered questions about how this ability develops in typical children between the ages of 7 and 17. In our paradigm, participants hear instructions with syntactic ambiguities that are resolved by the placement of prosodic boundaries, while their eye movements are recorded. This design allows us to measure how prosody influences comprehension over time. In the remainder of the Introduction, we discuss the prior evidence for deficits in the perception and comprehension of prosody in ASD, with a focus on syntactic parsing; recent work on prosody and syntactic parsing in typically developing (TD) preschoolers; and the hypotheses that motivate the present experiment.
The Perception and Comprehension of Prosody in ASD
Prosody is a structure that organizes the phonetic form of an utterance into larger units (e.g., prosodic words and intonational phrases) and assigns prominence to units within this structure (Beckman, 1996; Selkirk, 1986; Shattuck-Hufnagel & Turk, 1996). This prosodic structure is marked by changes in the acoustic properties of speech such as fundamental frequency, duration, pausing, and intensity. The prosodic form that a speaker uses for an utterance is shaped by its lexical content, its syntactic structure, the role of the utterance in the discourse, the speaker's emotional state and speech rate, and the intended audience (for reviews, see Shattuck-Hufnagel & Turk, 1996; Wagner & Watson, 2010). Thus prosodic form contains valid cues to the syntactic, semantic, and pragmatic interpretation of an utterance. These cues are rapidly exploited by listeners during language comprehension (Ito & Speer, 2008; Snedeker & Trueswell, 2003; for a review, see Wagner & Watson, 2010).
Much of the research on the comprehension of prosody in ASD has focused on information at the pragmatic level. For example, several studies have found that even high-functioning persons with ASD have deficits in using vocal cues to identify the speaker's emotion (Chevallier, Noveck, Happé, & Wilson, 2011; Golan, Baron-Cohen, Hill, & Rutherford, 2007; Järvinen-Pasley, Peppé, King-Smith, & Heaton, 2008; Kleinman, Marciano, & Ault, 2001; Peppé et al., 2007; Rutherford, Baron-Cohen, & Wheelwright, 2002) and using contrastive stress as a cue to discourse structure (Nappa & Snedeker, 2012; Paul et al., 2005), despite fairly strong general language abilities. However, there are good reasons for suspecting that the prosodic comprehension deficit in ASD extends beyond the use of prosody as a pragmatic cue. Electrophysiological studies suggest that the processing of the acoustic correlates of prosodic structure (such as frequency and intensity) is atypical in ASD (Kujala et al., 2007, 2010; Lepistö, Silokallio et al., 2006; Russo et al., 2008). Because the perception of pitch is critical for determining the prosodic structure of an utterance, a deficit of this kind would be expected to interfere with both the pragmatic and the nonpragmatic functions of prosody.
Thus, it is somewhat surprising that the existing studies on prosodic comprehension provide only weak evidence for deficits in nonpragmatic tasks. The three studies that have explored the use of prosodic stress for lexical identification (e.g., re-CORD vs. RE-cord) have found no differences between persons with ASD and well-matched controls (Chevallier et al., 2009; Grossman et al., 2010; Paul et al., 2005), though there is a consistent decrement in performance across the three studies (3%–6%) that fails to reach significance.
In addition, the role of prosody in syntactic parsing has been explored in five experiments, with a mixed pattern of findings. Four of these experiments used judgment tasks, in which participants heard an utterance with a grouping ambiguity (e.g., chocolate biscuits and jam vs. chocolate, biscuits, and jam) and then selected a picture or gloss that matched the utterance or judged whether a picture matched the utterance. Three of these studies found no difference between persons with ASD and typically developing (TD) controls (Chevallier et al., 2009; Paul et al., 2005; Peppé, McCann, Gibbon, O'Hare, & Rutherford, 2007), while one found that persons with ASD performed reliably worse than age- and language-matched controls (Järvinen-Pasley et al., 2008). Although these differences are open to many interpretations, age and developmental level may play a role. The participants in the Järvinen-Pasley et al. study were younger (M = 12 years, 7 months [12;7]) and less verbally proficient than those in the Paul et al. and Chevallier et al. studies, suggesting that deficits in the use of prosody for syntax may resolve over development. In contrast, the children in the Peppé et al. study were even younger (M = 9;10); but the performance of the control group was quite low, suggesting that the task may have been too difficult for these younger language-matched children (M = 6;10).
However, this explanation cannot readily account for the fifth study. Diehl, Bennetto, Watson, Gunlogson, and McDonough (2008) compared prosodic comprehension in adolescents with high-functioning ASD to that of a control group matched on age, IQ, and receptive language abilities. Participants heard syntactically ambiguous sentences, like (1) and (2), in which prosody could be used to determine the correct action.
1. Put the dog … in the box on the star (Put the dog into the box that's on a star).
2. Put the dog in the box … on the star (Put a dog that's in a box onto a star).
The group with ASD was less likely than their TD peers to act in concordance with the prosodic cue. Diehl et al.'s participants were similar in age, IQ, and language level to those in the Chevallier et al. and Paul et al. studies. Thus, any difference in performance presumably reflects the differences in the tasks that were used. One possibility is that the overt judgment tasks used by Chevallier et al. and Paul et al. may have drawn participants’ attention to the ambiguity and the contrasting prosodies, allowing them to adopt an explicit strategy incorporating these cues (Klin, Jones, Schultz, & Volkmar, 2003; Paul et al., 2005). In contrast, the participants in the Diehl et al. task may have followed the commands without becoming aware of the ambiguity. If this were true, we would expect ASD participants to be slower in making the overt judgments than are controls, who presumably do not need to devise task-specific strategies. However, Chevallier et al. found no difference in reaction times between the groups.
A second possibility is that the ASD group had difficulties with the Diehl et al. task that were unrelated to prosody. Diehl et al. used ambiguous sentences with the verb put, which requires two postverbal arguments: an object to be moved and a location to which it should be moved. This creates a strong bias to initially interpret the first prepositional phrase (PP; in the box) as a destination, resulting in verb-phrase attachment (Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In the Diehl et al. study, the ASD group only had difficulty with stimuli in which the prosodic cue was in conflict with this initial lexical bias. In typical adults, this initial bias can be revised if subsequent information indicates that this interpretation is incorrect (e.g., the prosodic break followed by a second PP), but young children fail to revise these initial commitments (Trueswell, Sekerina, Hill, & Logrip, 1999). This ability to revise emerges gradually between 5 and 11 years of age (Weighall, 2008). Therefore, it is possible that the performance of the ASD group in the Diehl et al. study is not the result of a deficit in the use of prosody but instead reflects a deficit in the ability to revise misinterpreted sentences.
The Use of Prosody for Syntactic Analysis in TD Children
When younger TD children (3–7 years) are tested on prosodic parsing using choice tasks like those above, they also perform quite poorly (Choi & Mazuka, 2003; Mazuka, Jincho, & Oishi, 2009). These failures are unlikely to result from a basic deficit in prosodic perception. Prosody plays a central role in early speech perception: newborns prefer languages that are prosodically similar to their own (Mehler et al., 1988), older infants use prosodic structure to find words in the speech stream (Johnson & Jusczyk, 2001), and prosodic information may even be used during the acquisition of syntax (Christophe, Millotte, Bernal, & Lidz, 2008; Morgan, 1996).
Snedeker and Yuan (2008) suggested that young children's failure in prosody-for-parsing tasks was due to the design of these experiments (see also Mazuka et al., 2009). Specifically, like the ASD studies, these experiments used within-subject designs that required children to shift between two response types across trials. Thus, to succeed in these tasks, children must override the interpretation that they arrived at on the last trial to reach the correct interpretation on the next. Snedeker and Yuan (2008) tested this hypothesis using a blocked design. In the first half of the study, prosodic form was manipulated between participants: half the children received instructions like (3) and half received ones like (4).
3. You can pinch the bear … with the barrette. (Use the barrette to pinch)
4. You can pinch … the bear with the barrette. (Pinch the one that has a barrette)
Then, in the second half, the conditions flipped, and participants were given new sentences with the other prosody. These sentences contain only a single ambiguous PP (in contrast with the Diehl et al. study); thus, there is no need for participants to revise their analysis of this phrase based on subsequent words. Children 4 and 5 years old carried out the instructions as accurately as adults in the first half, indicating that they were sensitive to these prosodic cues and able to use them for syntactic parsing. However, in the second half, the children tended to perseverate, resulting in chance-level performance.
Snedeker and Yuan (2008) used an additional measure: as participants listened to the instructions, their eye movements were recorded, providing information about how their interpretation of the utterance changed over time (Tanenhaus et al., 1995). They found that, during the first block, children began using prosodic information about 500 ms after the onset of the critical word (“barrette” in Instructions 3 and 4), just a few hundred milliseconds after the adults. Thus, they concluded that young children rapidly and spontaneously use prosodic information to resolve syntactic ambiguities, but these abilities can be masked by perseveration across trials in within-subject designs.
The Goals of This Study
In the present study, we use the Snedeker and Yuan (2008) task to explore prosodic comprehension in children and adolescents with high-functioning ASD and in TD peers matched for age, language ability, and IQ. This will allow us to address four open questions. First, are highly verbal children and adolescents with ASD less likely to use prosodic information during syntactic parsing than are TD children? As noted above, the findings of the prior experiments are mixed and their interpretation is uncertain. If there is a prosodic comprehension deficit in ASD, which disappears in explicit judgment tasks that focus attention on prosodic cues, then this deficit should be visible in the open-ended act-out task, particularly in the first block when participants have heard only one form of the utterance. However, if the differences between groups in the previous studies are due solely to difficulties with syntactic revision or perseveration, then the ASD group should perform as well as TD peers in the first block where there is no need to revise or resist prior interpretations.
Second, do children and adolescents with ASD make use of prosodic cues to syntax as rapidly as TD peers do? If prosodic comprehension in ASD is the result of slow strategic processes, then prosody should have little or no influence on the early eye movements of this group. In contrast, if individuals with ASD are processing this information in the same way as same-age peers, then the effects of prosody should emerge at the same time for both groups.
Third, how does this profile of abilities change from middle childhood into adolescence? To date, there is no research on how online use of prosody develops in TD children after the age of 6. The prior studies using explicit judgment tasks tentatively suggest that performance in typical children improves rapidly around 6 to 9 years of age (Vogel & Raimy, 2002), but this improvement may be delayed by a few years in children with ASD, resulting in group differences during the later part of middle childhood (Järvinen-Pasley et al., 2008), which resolve by adolescence (Chevallier et al., 2009; Paul et al., 2005). To test this developmental hypothesis, we tested two age groups: children (8–12 years) and adolescents (13–18 years).
Fourth, how are prosodic comprehension abilities in both populations affected by interference from prior utterances? We know that typical adults flexibly shift between interpretations in this task, and preschoolers do not, but we do not know when typical children gain this ability. If it emerges at the same time as the ability to revise garden path sentences, then we should expect substantial changes between ages 5 and 11. If the ability to resist interference is impaired in ASD, then performance on the second block of trials should be worse than performance on the first.
Methods
Participants
ASD group
Participants in this group were 48 individuals with high-functioning ASD who were between 7 and 17 years and had verbal abilities within (or above) the normal range (see Table 1). Participants were recruited from databases at the University of Notre Dame, Harvard University, and the Yale Child Study Center; thus, approximately half of our sample was living in the Midwest and half in the Northeast. During an initial phone screening, parents were asked about the results of previously administered standardized tests to facilitate group matching. Families were invited to participate if this interview suggested that they would meet the inclusion criteria (see below).
Table 1. Descriptive characteristics of the sample by diagnostic group
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-50617-mediumThumb-S0954579414000741_tab1.jpg?pub-status=live)
Note: Participants younger than 12.5 were included in the child groups, and those above 12.5 were included in the teen groups. IQ was measured using either the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999) or the Differential Ability Scales (Elliott, 1990). ASD, autism spectrum disorder; TD, typically developing comparison group; CELF-IV, Clinical Evaluation of Language Fundamentals, Fourth Edition (Semel et al., 2003).
Each participant was independently evaluated by our research team for diagnostic confirmation, and met DSM-IV-TR (American Psychiatric Association, 2000) criteria for one of the three ASD diagnoses (autistic disorder, Asperger disorder, or pervasive developmental disorder, not otherwise specified). Diagnostic confirmation was based on administration of the Autism Diagnostic Observation Schedule—Generic (Lord et al., 2000), the Autism Diagnostic Interview—Revised (Rutter, Le Couteur, & Lord, 2003), or the Social Communication Questionnaire—Lifetime Form (Rutter, Bailey, Berument, Lord, & Pickles, 2003), as well as the judgment of the experienced clinicians on the research team (which included a clinical psychologist with considerable experience in ASD diagnosis and a licensed speech–language pathologist). IQ was measured using either the Wechsler Abbreviated Scale of Intelligence (Wechsler, 1999) or the Differential Ability Scales (Elliott, 1990). Participants also completed the subtests in the Clinical Evaluation of Language Fundamentals—Fourth Edition (CELF-4; Semel, Wiig, & Secord, 2003) that are necessary for a Receptive Language Index (RLI) score. Participants were excluded from this study if they had a full scale IQ (FSIQ), verbal IQ (VIQ), or CELF-4 RLI score below 80; if English was not their first language and the primary language spoken at home; or if they had any uncorrected vision or hearing deficits that would have interfered with study administration. Twenty-one participants were recruited for our ASD group but were not included in our final sample of 48 participants (8 did not meet diagnostic criteria, 12 had IQ or language scores below 80, and 1 was dropped because of technical issues with session video).
Our ASD sample was 96% Caucasian and 4% other. We did not collect data on the socioeconomic status of our participants.
TD comparison group
Participants included a sample of 48 individuals between the ages of 7 and 17 (see Table 1). TD participants were recruited from databases at the University of Notre Dame, Harvard University, and the Yale Child Study Center. All participants in this group had no first-degree relatives with an ASD, no previous history of clinical diagnosis or special educational services, and were reported to be in the appropriate grade for their age in school. Participants were screened for an ASD diagnosis using the Social Communication Questionnaire—Lifetime Form and the clinical judgment of the research team described above. All had FSIQ, VIQ, and CELF-4 RLI standard scores above 80. Eight participants were recruited for our comparison group but were excluded from the final sample of 48 (3 for technical problems, 3 for failure to complete the study, and 2 were removed before data analysis to facilitate group matching). Our comparison group was 92% Caucasian, 4% African American, and 4% other.
Groups and matching
Participants in each diagnostic group were divided into two subgroups based on an age cutoff (12.5 years), creating four groups: participants with ASD younger than the cutoff (ASD child); participants with ASD older than the cutoff (ASD teen); TD peers below the cutoff (TD child); and TD peers above the cutoff (TD teen). The child groups had an average age and developmental level that was between that of the participants in the Peppé et al. (2007) and Järvinen-Pasley et al. (2008) studies, while the teen groups were similar in age to the participants in the Chevallier et al. (2009) and Paul et al. (2005) studies. The TD and ASD participants in each age group were matched on chronological age, and all four groups were matched on gender, FSIQ, VIQ, and CELF-4 RLI (Table 1).
Procedure
Participants were tested individually in a quiet room in our laboratories or in the participant's home. The procedure for the experimental task was modeled closely on Snedeker and Yuan (2008). Participants were told that they would be playing a game about following instructions. They were seated in front of an inclined podium with props (see Figure 1) and a camera in the middle that was focused on the participant's face, allowing us to code eye fixations after the experiment was completed. A second camera, placed behind the participant and to the side, recorded the participant's actions. At the beginning of each trial, the experimenter laid out the props and labeled each one twice. Then he or she played prerecorded sound files through external computer speakers. On each trial, the child heard an instruction to look at a fixation point at the center of a display, followed by a command to act on the toys. After completing this action, the child heard a second command and completed it, before moving on to the next trial. The experimenter moved out of the child's view before the first sentence and remained there until the action was completed.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-65041-mediumThumb-S0954579414000741_fig1g.jpg?pub-status=live)
Figure 1. (Color online) Sample trial in experimental setup. The setup would be accompanied by the utterance “You can feel the frog with the feather.” The feather represents the target instrument, the frog (holding a feather) is the target animal, the feather that the frog is holding is the mini-instrument, and the candle and the leopard (holding a candle) are the distractor instrument and the distractor animal, respectively.
Stimuli
The sound files and the toy sets that were used in the present study were the same as those used in Snedeker and Yuan (2008) and are described in greater detail in that paper. On the critical trials, the commands were syntactically ambiguous as in (5).
5. You can feel the frog with the feather.
Specifically, each critical instruction contained a PP headed by with that could be syntactically parsed as a part of the noun phrase (NP-attachment) or as a part of the verb phrase (VP-attachment). NP-attachment results in the phrase being semantically interpreted as a modifier (the frog that has the feather), while VP-attachment results in it being interpreted as an instrument (use the feather to feel the frog). These sentences were constructed to ensure that the verb and prepositional object were not biased toward either a modifier or an instrument reading (see Snedeker & Trueswell, 2004).
Prosody was manipulated by placing an intonational phrase break before the first NP (You can feel … the frog with the feather) to indicate a modifier reading or before the PP (You can feel the frog … with the feather) to indicate an instrument reading. This manipulation of prosody was based on the production patterns observed in child-directed speech (Snedeker & Yuan, 2008) and adult-directed speech (Snedeker & Trueswell, 2003). The set of toys that accompanied each critical trial consisted of: a target instrument, a full-scale object that could be used to carry out the action (e.g., a feather); a target animal, a stuffed animal holding a small replica of the target instrument (e.g., a frog with a feather); a distractor instrument; and a distractor animal holding a small replica of the distractor instrument (see Figure 1). The placement of the toys on the shelves was counterbalanced across trials such that each type of toy (e.g., target instrument) appeared in each quadrant.
We would expect participants who heard instrument prosody to arrive at a syntactic analysis where the PP was VP-attached and semantically interpreted as an instrument. This should result in more looks to the target instrument after the onset of the critical word (“feather”) and use of the target instrument to act upon the target animal. In contrast, participants who heard modifier prosody should interpret the PP as an NP-attached modifier indicating (redundantly) the animal that they should act upon. This should result in few looks to the target instrument and actions upon the target animal without the use of any instrument. Prior studies have documented this pattern in both adults and preschool-aged children (Snedeker & Trueswell, 2003, 2004; Snedeker & Yuan, 2008).
Design
We used a blocked design: prosody was manipulated within participant, but the instrument and modifier prosody trials were not intermixed. Instead, participants were given all the trials of one prosody type before hearing any trials of the other type. Prosody was counterbalanced across lists such that every sentence occurred with both modifier and instrument prosody across participants and each participant heard just one version of each sentence. Trial order was also counterbalanced. As a result, half the participants in every group heard the instrument prosody first and half heard the modifier prosody first. The critical trials were interspersed with filler trials using instructions that were globally unambiguous. The experiment began with two practice trials, followed by 19 trials (8 critical trials and 11 unambiguous fillers). Each trial included 2 commands. On the critical trials, the first command was always the critical command and the second instruction was an unambiguous filler. Thus, participants heard a total of 38 commands (not including practice trials), 8 of which were critical ambiguous commands.
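The counterbalancing scheme described above can be illustrated with a minimal sketch. It is not the procedure actually used to build the lists: the split of the 8 critical sentences into two blocks of 4, the number of lists (four), and the names `build_lists` and `assign_list` are all assumptions made for illustration.

```python
from itertools import product

PROSODIES = ("instrument", "modifier")


def build_lists(sentences):
    """Build four hypothetical blocked lists from the critical sentences.

    Assumption: the sentences are split into two halves, one half per block,
    and each list crosses (prosody of block 1) x (which half comes first).
    """
    half = len(sentences) // 2
    halves = (sentences[:half], sentences[half:])
    lists = []
    for first_prosody, first_half in product(PROSODIES, (0, 1)):
        # The second block always carries the other prosody (blocked design).
        second_prosody = PROSODIES[1 - PROSODIES.index(first_prosody)]
        block1 = [(s, first_prosody) for s in halves[first_half]]
        block2 = [(s, second_prosody) for s in halves[1 - first_half]]
        lists.append(block1 + block2)
    return lists


def assign_list(participant_index, n_lists=4):
    """Rotate participants through the lists (hypothetical assignment rule)."""
    return participant_index % n_lists


# Command count stated in the text: 19 trials x 2 commands per trial.
TOTAL_COMMANDS = 19 * 2
```

Under these assumptions, every sentence occurs with both prosodies across lists, each participant hears one version of each sentence, and half of the lists begin with instrument prosody.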
Coding
Trained coders, who were naive to group membership and study goals, watched the videos from the action camera and classified each action into one of four categories: instrument responses (i.e., the target instrument was used to execute the act on the target animal); mini-instrument responses (i.e., the participant used the small version of the instrument that was attached to the target animal to execute the action); modifier responses (i.e., the participant executed the action on the target animal themselves, without the instrument); and other responses (i.e., the participant performed a different action than was specified in the command, or acted on one of the distractor objects). Mini-instrument responses were treated as instrument responses in the data analysis, following Snedeker and Yuan (Reference Snedeker and Yuan2008). Intercoder reliability, computed on 20% of the participants, was very high (κ = 0.96, range = 0.77–1.00), and disagreements were resolved by consensus.
Eye movements were coded from the videotape of the participant's face, using frame-by-frame viewing. The video was recorded at the standard 30 frames per second. One coder, who had the audio on, recorded the time at which the critical sentence began and the time at which the action began. A second coder was provided with this information and coded the onset of each change in gaze and the direction of each subsequent fixation during this time window, with the audio off. This coder was blind to the prosodic form of the utterance and to the location of each toy (because the toys were not visible in the video). The participant's direction of fixation was coded as being to one of the four quadrants of the podium, to the center hole (at the camera), or away from the display. Any frames in which the participants’ eyes were not visible were excluded from the analyses. Blinks without a fixation change were coded as being to the quadrant of the fixation and blinks with a fixation change during the blink were coded as being to the quadrant of the subsequent fixation (like all other saccades). Twenty-three percent of participants were coded by an additional coder, who achieved high reliability on direction of gaze (κ = 0.84, range = 0.64–1.0). Disagreements were resolved by a third person. This method of collecting and coding eye movements has been used extensively and validated against an automated eye-tracking system (see Snedeker & Trueswell, Reference Snedeker and Trueswell2004).
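For readers unfamiliar with chance-corrected agreement, the sketch below shows one common way such reliability figures are computed. It assumes that the agreement statistics reported for the action and gaze coding are Cohen's kappa for two coders, which the text does not state explicitly; the function name and the example labels are illustrative only.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items (assumed statistic)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four trials, using two of the action categories.
coder_1 = ["instrument", "modifier", "instrument", "modifier"]
coder_2 = ["instrument", "modifier", "modifier", "modifier"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy example the coders agree on three of four trials (0.75) against a chance expectation of 0.50, so kappa is 0.5, well below the values reported here.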
We coded and analyzed the eye movements for all trials, including those where the participant did not give the predicted response (e.g., a modifier action for a sentence with instrument prosody). In reaction time studies, alternative responses are generally considered errors, and are usually excluded from the analysis. In contrast, in visual-world studies using an act-out task, alternative responses are typically included in the eye-gaze analyses for a number of reasons. First, these responses also presumably reflect linguistic processing (because the correct action and animal are used), rather than simple guesses. Second, the goal of an eye-tracking study is to determine how the interpretation of an utterance unfolds over time, independent of the ultimate response. Third and most critically, removing data based on the participant's response could result in false findings because where a participant is looking at one time can shape his or her subsequent interpretation of an ambiguous phrase (e.g., Trueswell et al., Reference Trueswell, Sekerina, Hill and Logrip1999).
Results
The results are divided into two sections below. First, we analyze the participants’ actions to understand their final interpretation of the ambiguous utterance. Second, we analyze the participants’ fixations as the utterance unfolds over time to explore the process of moment-to-moment language comprehension. Because of the prior evidence for perseveration in this task in younger children, we analyzed each block of trials separately.
Our primary analyses were mixed-effects logistic regressions that included fixed effects for prosody (modifier or instrument), age (child or teen), diagnosis (TD or ASD), and all interactions of these variables. An effects-coding scheme was used, with the first listed level of each variable coded as –1 and the second as 1. Thus, the main effects in these analyses can be interpreted as if they were analyses of variance (ANOVAs; see Footnote 1). Whenever we found an interaction between one of the participant variables (age or diagnosis) and prosody, we split the sample on that participant variable and conducted separate analyses of the two groups to understand the nature of the effect. In addition, we conducted separate analyses of the four populations (ASD child, TD child, ASD teen, and TD teen) to determine which effects were reliably present in an individual group. All analyses included random effects for both subject and verb (see Footnote 2).
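The effects-coding scheme can be made concrete with a small sketch. The dictionary and function names are illustrative, and the example covers only the fixed-effects predictors; fitting the mixed-effects logistic regression itself (with random effects for subject and verb) is not shown, because the text does not name the software used.

```python
from itertools import product

# First listed level is coded -1, the second +1 (as described in the text).
LEVELS = {
    "prosody": ("modifier", "instrument"),
    "age": ("child", "teen"),
    "diagnosis": ("TD", "ASD"),
}


def effects_code(factor, level):
    """Return -1 for the first listed level of a factor and +1 for the second."""
    first, second = LEVELS[factor]
    if level == first:
        return -1
    if level == second:
        return 1
    raise ValueError(f"Unknown level {level!r} for factor {factor!r}")


def design_row(prosody, age, diagnosis):
    """Fixed-effect predictors for one trial: main effects plus all interactions."""
    p = effects_code("prosody", prosody)
    a = effects_code("age", age)
    d = effects_code("diagnosis", diagnosis)
    return {
        "prosody": p,
        "age": a,
        "diagnosis": d,
        "prosody:age": p * a,
        "prosody:diagnosis": p * d,
        "age:diagnosis": a * d,
        "prosody:age:diagnosis": p * a * d,
    }


# One row per cell of the 2 x 2 x 2 design.
cells = [design_row(p, a, d) for p, a, d in product(*LEVELS.values())]
column_sums = {name: sum(row[name] for row in cells) for name in cells[0]}
```

With one row per cell, every predictor column sums to zero, so each coefficient is estimated relative to the grand mean rather than to a reference cell. That is the sense in which the main effects can be read like those of an ANOVA, assuming roughly balanced cells.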
Actions
Figure 2 plots the proportion of trials in the first block on which participants performed instrument actions, thus revealing that they had interpreted the ambiguous PP as VP-attached. Figure 3 plots the proportion of instrument actions during the second block of trials. Table 2 lists the results of the mixed models for both blocks.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-16465-mediumThumb-S0954579414000741_fig2g.jpg?pub-status=live)
Figure 2. Actions in Block 1. The proportion of instrument responses in Block 1, by group. We would expect that instrument prosody would elicit a large number of instrument responses, whereas the modifier prosody would elicit few instrument responses. ASD, Autism spectrum disorder; TD, typically developing comparison group.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-95879-mediumThumb-S0954579414000741_fig3g.jpg?pub-status=live)
Figure 3. Actions in Block 2. The proportion of instrument responses in Block 2, by group. We would expect that instrument prosody would elicit a large number of instrument responses, whereas the modifier prosody would elicit few instrument responses. ASD, Autism spectrum disorder; TD, typically developing comparison group.
Table 2. Analysis of actions on objects carried out by participants by block of presentation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-88518-mediumThumb-S0954579414000741_tab2.jpg?pub-status=live)
Note: The dependent variable is whether the action involved the instrument. ASD, Autism spectrum disorder; TD, typically developing comparison group. Bold indicates reliable effects.
*p < .05. **p < .01.
On Block 1, all four groups were strongly influenced by prosody and used it to roughly the same degree, resulting in a robust effect of prosody, no effect of age or diagnosis, and no interactions between these variables and prosody. Separate models for each of the four groups of participants (child TD, child ASD, teen ASD, and teen TD) confirmed that the effect of prosody was reliable in all of them.
The pattern in Block 2 was different. Again there was a reliable main effect of prosody in the omnibus ANOVA. However, there were also reliable interactions between prosody and age and between prosody and diagnosis, indicating that the effect of our manipulation varied across participant groups. To follow up on the interaction of prosody and age, we analyzed the teens and the children separately. In the teens, we found the expected effect of prosody (z = 4.98, p < .001, β = 2.11) but no main effect of or interaction with diagnosis (|z|s < 1.0, ps > .5, |β| < 0.25). Thus, teenagers with ASD performed as well as TD teens. In contrast, in the children, there was both a main effect of prosody (z = 3.21, p = .001, β = 3.21) and a robust interaction of diagnosis and prosody (z = –2.88, p = .004, β = –0.79), indicating that the children with ASD performed worse than their TD peers.
To follow up on the interaction between prosody and diagnosis, we analyzed the TD participants and the participants with ASD separately. These analyses confirmed the pattern described above. In the TD group, we found a robust effect of prosody (z = 5.78, p < .0001, β = 1.72) and no effect of or interaction with age (zs < 1, ps > .3, |β| < 0.25). However, in the ASD group, the effect of prosody (z = 2.77, p = .006, β = 0.98) was superseded by an interaction of age and prosody (z = 2.42, p = .02, β = 0.82).
Critically, when we constructed separate models for each of the four groups, we found that both groups of teens and the TD child group showed a reliable effect of prosody on their actions, whereas the ASD child group did not. Thus, although all participants were able to use prosody to guide their final interpretation of the utterances in the first block of trials, those in the ASD child group were at chance in the second block, suggesting that they had difficulty shifting their interpretation of the ambiguous utterance when the prosodic cues changed.
To get an approximate measure of reaction times, we calculated the number of frames from the onset of the prepositional object to the onset of the action. The reaction times for each block were analyzed using a mixed-model linear regression with the same independent variables as the action analyses. We found no effects of age or diagnosis and no interactions with these variables (|z|s < 1.7, ps > .1, |β|s < 3.0). In Block 1, participants responded more slowly to instrument prosody than to modifier prosody (M = 1561 ms for instrument and M = 1325 ms for modifier, z = 2.55, p = .01, β = 3.18). In Block 2, they responded more quickly to instrument prosody than modifier prosody (M = 1176 ms for instrument and M = 1455 ms for modifier, z = –3.29, p = .001, β = –4.28).
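The relation between the frame counts and the reported means can be checked with a little arithmetic. The sketch below assumes that the β values are expressed in video frames (at the 30 frames per second stated in the coding section) and that, under the –1/1 effects coding, β corresponds to roughly half the difference between the two prosody conditions. Neither assumption is stated in the text, and raw means need not match model estimates exactly.

```python
FRAME_RATE_HZ = 30  # video frame rate reported in the coding section


def frames_to_ms(frames):
    """Convert a count of video frames to milliseconds."""
    return frames * 1000.0 / FRAME_RATE_HZ


def ms_to_frames(ms):
    """Convert milliseconds to (possibly fractional) video frames."""
    return ms * FRAME_RATE_HZ / 1000.0


def approx_beta_frames(mean_instrument_ms, mean_modifier_ms):
    """Half the instrument-minus-modifier difference, in frames.

    With modifier coded -1 and instrument coded +1, a one-unit step in the
    predictor spans half the distance between the two conditions.
    """
    return ms_to_frames(mean_instrument_ms - mean_modifier_ms) / 2.0


block1 = approx_beta_frames(1561, 1325)  # reported means for Block 1
block2 = approx_beta_frames(1176, 1455)  # reported means for Block 2
```

Under these assumptions the raw means imply about 3.5 frames for Block 1 and about –4.2 frames for Block 2, which is in the neighborhood of the reported estimates (β = 3.18 and β = –4.28).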
Temporal analysis of eye movements
To explore how participants’ interpretation of the utterance changed over time, we examined fixations to the target instrument for both Block 1 (Figure 4a, b) and Block 2 (Figure 5a, b). The first three data points in each panel represent the proportion of time that participants were looking at the target instrument during each critical time window, while the last data point in each panel shows the proportion of instrument actions. The critical time windows are synchronized to the onset of the object in the PP (e.g., feather in “You can feel the frog with the feather”). Each time window begins 200 ms after the onset of the critical linguistic information, to account for the time that it takes to program and launch an eye movement (Allopenna, Magnuson, & Tanenhaus, Reference Allopenna, Magnuson and Tanenhaus1998). Our first time window (33–200 ms) is called the with-window because it includes gaze shifts that occurred in response to the initial part of the PP (with the). Our early-PP window (233–700 ms) includes fixations initiated after the onset of the critical word (feather), and our late-PP window (733–1200 ms) includes fixations initiated after the utterance ended. Participants could begin anticipating the upcoming noun (the potential instrument or modifier) as soon as they encountered the preposition (with). Thus, we might see effects of prosody as early as the with-window (during the beginning of the PP, with the). However, Snedeker and Yuan (Reference Snedeker and Yuan2008) found that effects of prosody appeared somewhat later: during the early-PP window (i.e., after the critical word) for adults and during the late-PP window (i.e., after the utterance had ended) for preschool-aged children.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-57604-mediumThumb-S0954579414000741_fig4g.jpg?pub-status=live)
Figure 4. Target instrument looking time relative to the ambiguous prepositional phrase in Block 1. (a) The responses of the child groups (8.0–12.5 years), and (b) the responses of the teen groups (12.5–17.0 years). Action responses are included in the right window for comparison. ASD, Autism spectrum disorder; TD, typically developing comparison group; PP, prepositional phrase. With-window represents 33–200 ms period during the beginning of the prepositional phrase “with the.” Early-PP window represents 233–700 ms period after onset of “feather.” Late-PP window represents 733–1200 ms after the utterance had ended.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-39997-mediumThumb-S0954579414000741_fig5g.jpg?pub-status=live)
Figure 5. Target instrument looking time relative to the ambiguous prepositional phrase in Block 2. (a) The performance of the child groups (8.0–12.5 years), and (b) the performance of the teen groups (12.5–17.0 years). Action responses are included in the right panel for comparison. ASD, Autism spectrum disorder; TD, typically developing comparison group; PP, prepositional phrase. With-window represents 33–200 ms period during the beginning of the prepositional phrase “with the.” Early-PP window represents 233–700 ms period after onset of “feather.” Late-PP window represents 733–1200 ms after the utterance had ended.
The dependent variable in these analyses was whether there was a look to the target instrument during the time window. A fixation or a saccade to the target instrument at any point during the window was coded as “1” and all other trials were coded as “0.” Tables 3 and 4 list the results of the mixed-effects models for the critical variables for Block 1 and Block 2, respectively.
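The binary coding of looks can be illustrated as follows. This is a minimal sketch: the region labels, the per-frame sample format, and the function name are hypothetical, and it assumes that each video frame has already been assigned a gaze region and a time relative to the onset of the critical word.

```python
FRAME_RATE_HZ = 30  # video frame rate reported in the coding section

# Window boundaries in ms, as given in the text.
WINDOWS_MS = {
    "with": (33, 200),
    "early-PP": (233, 700),
    "late-PP": (733, 1200),
}


def frame_time_ms(frame_index):
    """Time of a frame relative to the synchronization point, rounded to ms."""
    return round(frame_index * 1000 / FRAME_RATE_HZ)


def code_looks(samples, target="target_instrument"):
    """Return 1 per window if any frame in that window is on the target, else 0.

    `samples` is an iterable of (time_ms, region) pairs, one per video frame.
    """
    return {
        name: int(any(start <= t <= end and region == target for t, region in samples))
        for name, (start, end) in WINDOWS_MS.items()
    }


# Made-up trial: gaze on the animal, then the instrument, then the center.
trial = []
for i in range(1, 37):
    if i <= 9:
        region = "target_animal"
    elif i <= 15:
        region = "target_instrument"
    else:
        region = "center"
    trial.append((frame_time_ms(i), region))

looks = code_looks(trial)
```

At 30 frames per second the window boundaries fall on whole frames (frames 1–6, 7–21, and 22–36). In the made-up trial, the only look to the target instrument begins at frame 10 (about 333 ms), so only the early-PP window is coded as 1.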
Table 3. Temporal analyses of gaze fixations for Block 1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-37639-mediumThumb-S0954579414000741_tab3.jpg?pub-status=live)
Note: The dependent variable is whether there was a look to the target instrument during that time window. PP, Prepositional phrase; ASD, autism spectrum disorder; TD, typically developing comparison group. Bold indicates reliable effects.
*p < .05. **p < .01.
Table 4. Temporal analysis of fixations for Block 2
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-85842-mediumThumb-S0954579414000741_tab4.jpg?pub-status=live)
Note: The dependent variable is whether there was a look to the target instrument during that time window. PP, Prepositional phrase; ASD, autism spectrum disorder; TD, typically developing comparison group. Bold indicates reliable effects.
*p < .05. **p < .01.
Block 1
During the with-window (33–200 ms, initiation of “with the”) there was a significant interaction between prosody and diagnosis. To determine the source of the interaction, we analyzed the two diagnostic groups separately. In the TD group, there was no effect of prosody nor any effect of age or interaction with age (|z|s < 1.0, ps > .4, |β|s < 0.2). In contrast, for the ASD group, there was a reliable effect of prosody (z = 2.48, p = .01, β = 0.58). Participants with ASD looked at the target instrument more often in the instrument prosody condition than in the modifier prosody condition, suggesting that they had anticipated the role of the upcoming noun on the basis of intonation. While Figure 4 suggests that this effect is largely carried by the ASD teens, there was no effect of age or interaction with age (|z| < 1.0, p > .3, |β| < 0.3). Thus, the interaction between prosody and diagnosis in the with-window indicates that the prosodic manipulation had a more rapid effect on the ASD group than it did on the TD group.
In the early-PP window (233–700 ms, after onset of critical word feather), there was a large effect of prosody, with more looks to the target instrument in the instrument prosody condition. There was no interaction with either age or diagnosis, suggesting that the use of intonation was similar across the groups of participants. Separate models for each of the groups (TD child, ASD child, ASD teen, and TD teen) confirmed that the effect of prosody was reliable in each of them (see Table 5). Thus, we see no evidence that the initial processing of the PP differs between children with ASD and their TD peers. This pattern persisted in the late-PP window (733–1200 ms, after the utterance had ended). There was a robust effect of prosody, which was reliable in each of the four participant groups (see Table 5).
Table 5. The effect of prosody within each group in Block 1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709054232-58447-mediumThumb-S0954579414000741_tab5.jpg?pub-status=live)
Note: TD, Typically developing comparison group; ASD, autism spectrum disorder; PP, prepositional phrase.
*p < .05. **p < .01.
In sum, children and teens with ASD were able to use prosodic cues to guide their unfolding interpretations of the utterances during the first block of trials. The one reliable difference between the ASD group and TD controls (the interaction between diagnosis and prosody in the with-window, 33–200 ms, during the onset of the PP) indicated that participants with ASD were making more rapid use of the prosodic information.
Block 2
The eye movements in Block 2 (Figure 5a, b) showed a very different pattern. In the with-window (33–200 ms), there was a reliable interaction of prosody and age but no other effects. To explore this interaction, we conducted separate analyses for the two age groups. In the teen group, there were no reliable effects or interactions (|z|s < 1.5, ps > .1, |β|s < 0.4). However, in the child group, there was a robust reverse prosody effect: those who heard sentences with modifier prosody looked at the target instrument more than those who heard instrument prosody (z = –3.06, p = .002, β = –1.19). Furthermore, there was no main effect of diagnosis or interaction between prosody and diagnosis (|z|s < 0.3, ps > .7, |β|s < 0.2), suggesting that the reverse prosody effect was no larger in the child ASD group than in the child TD group.
These effects are presumably not a result of the prosodic manipulation itself but instead stem from interference from the earlier utterances. The children who had heard instrument prosody in Block 1 continued looking at the target instrument in Block 2, even though they were now hearing modifier prosody. Similarly, the children who had heard modifier prosody in Block 1 continued ignoring the target instrument in Block 2. Thus, this effect suggests that the child participants in both groups were predicting the meaning of the utterance on the basis of their initial experiences in the study and were (at first) failing to notice the prosodic cues that signaled a shift in interpretation.
This difference between the two age groups persisted into the early-PP window (233–700 ms, after critical word), where there was a main effect of prosody, an effect of age, and an interaction between prosody and age. To better understand the interaction, we conducted separate analyses of the children and teens. In the child group, there was no longer any effect of prosody, nor was there any effect of or interaction with diagnosis (|z|s < 1.6, ps > .1, |β|s < 0.3). In contrast, for the teen group, there was a robust effect of prosody (z = 4.63, p < .0001, β = 0.90) and no effect of or interaction with diagnosis (|z|s < 0.5, ps > .5, |β|s < 0.1). Analyses of the four groups of participants revealed robust effects of prosody in the ASD teen and TD teen groups, a marginal effect in the TD child group, but no effect in the ASD child group (see Table 6).
Table 6. The effect of prosody within each group in Block 2
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160709212121-39827-mediumThumb-S0954579414000741_tab6.jpg?pub-status=live)
Note: TD, Typically developing comparison group; ASD, autism spectrum disorder; PP, prepositional phrase.
*p < .05. **p < .01.
Finally, in the late-PP window (733–1200 ms, after end of utterance), there was a main effect of prosody in the omnibus analysis and no other effects or interactions. Analyses of the four groups of participants (Table 6) indicated that this effect was large and robust in both of the teen groups and the TD child group but absent in the ASD child group.
In sum, the teens and children show very different processing patterns in the second block. The teens use the prosodic form of the utterance to close in on the correct interpretation of the PP in the early-PP window (233–700 ms, after critical word). In contrast, the children initially predict that the utterance will have the same interpretation as the sentences in the previous block, resulting in reverse-prosody effects in the with-window (33–200 ms). This effect disappears in the later time windows, suggesting that the children are beginning to revise their interpretation of the sentence. In the TD child group, this process eventually results in patterns of fixations and actions that correctly reflect the prosody of the utterance. For the ASD children, revision is less successful, resulting in chance performance.
Discussion
The results of this experiment answer the questions that we posed in the Introduction. First, we found that children and adolescents with ASD are as likely as TD peers to use prosodic information to resolve syntactic ambiguity, provided that there is no need to revise their interpretation of the utterance or override perseveration. On the initial block of trials, both groups responded correctly about 80% of the time. Second, the ASD groups were able to use prosodic cues to syntax at least as rapidly as TD peers, suggesting that similar comprehension mechanisms were used by both populations. Specifically, in the ASD group, prosody had a reliable effect on eye movements immediately after the onset of the preposition (Block 1, with-window, 33–200 ms), suggesting that they were beginning to anticipate the content and interpretation of the PP. This effect was not present in the TD participants in the initial block of trials. Nevertheless, after the onset of the critical word, prosody had a robust effect on interpretation that was similar in all groups (Block 1, early-PP window, 233–700 ms; after critical word feather). This effect persisted throughout the trial (Block 1, late-PP window, 733–1200 ms; after end of utterance). Third, these results reveal that the developmental changes from ages 7 to 17, for both TD and ASD children, are primarily related to the ability to shift one's interpretation of an ambiguous sentence. In Block 2, both the ASD and the TD child groups initially misinterpreted the critical sentence, predicting that it would have the same interpretation as the critical sentences in Block 1. In contrast, the teens were able to quickly use the prosodic cue to shift their interpretation of the utterance and showed no evidence of interference in Block 2. Fourth, our findings suggest that younger children with ASD are less able to overcome this interference than are their TD peers (Block 2; actions).
In the remainder of this paper, we explore how these results can be reconciled with prior studies of prosody and parsing in ASD, how to account for the perseveratory errors in the child ASD group, how these findings constrain our understanding of the broader prosodic impairment in ASD, what they tell us about typical development, and the limitations of the present study.
Reconciling the findings on prosody and parsing in ASD
The results of this study are consistent with the prior literature on prosody and parsing in ASD and provide insight into findings that had seemed incompatible. As we noted in the introductory section, all of the prior studies used tasks in which participants had to shift between two different syntactic structures, creating the potential for interference across trials. We find that the ability to override this interference develops by the age of 7 or 8 in TD children and is delayed in children with ASD, emerging when they reach a verbal age of about 12. This finding suggests that children with ASD will diverge from language-matched TD peers on these tasks when they have verbal mental ages between 8 and 12 years, but otherwise they perform similarly. This prediction is confirmed for the studies using judgment tasks; studies in which the average verbal age is below 8 (Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007) or over 13 (Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005; Chevallier et al., Reference Chevallier, Noveck, Happé and Wilson2009) have found no differences between groups, while the one study with a sizable portion of children in this critical developmental window did find a difference between children with ASD and language-matched children with developmental delays (Järvinen-Pasley et al., Reference Järvinen-Pasley, Peppé, King-Smith and Heaton2008).
The Diehl et al. (Reference Diehl, Bennetto, Watson, Gunlogson and McDonough2008) study does not conform to this pattern. The participants were very similar to the ASD teen group in the present study: they had IQ and language scores within the normal range and were primarily over 12 (11–19 years, M = 15;3). Our ASD teen group avoided perseveration and performed as well as TD controls. In contrast, the ASD group in the Diehl et al. study performed worse than their TD peers. We believe that this difference is attributable to the kind of sentences used in their study. As we noted earlier, the difference between the ASD and TD groups in the Diehl et al. study was limited to the condition where prosody was in conflict with the preferred interpretation of the sentence (“Put the dog in the basket … on the star”). In this case, participants must use prosodic structure to revise an initial misinterpretation of the first PP (Tanenhaus et al., Reference Tanenhaus, Spivey-Knowlton, Eberhard and Sedivy1995). In contrast, in the present study, the critical prosodic cues always occur before the ambiguous phrase, and thus reanalysis is never required. The ability to revise misparsed utterances develops during middle childhood and appears to involve executive functions such as cognitive control and working memory (Novick, Trueswell, & Thompson-Schill, Reference Novick, Trueswell and Thompson-Schill2005; Trueswell et al., Reference Trueswell, Sekerina, Hill and Logrip1999). Thus, the group differences in the Diehl et al. study could be attributable to deficits in executive functions and syntactic revision, rather than deficits in prosody comprehension. Our results rule out an alternate interpretation of the prior literature. The discrepancy between the Diehl et al. data and the explicit judgment studies cannot be attributed to differences between action-based tasks and reflective tasks (that might give rise to strategizing); we also employed an act-out paradigm with similar task demands but found no deficits in prosodic processing for adolescents with ASD.
Why do children with ASD have difficulty overcoming interference?
The one difference that we observed between the ASD and TD groups was the poorer performance of the child ASD group on the second block, which appears to reflect a failure to override interference from the earlier trials. This failure is not absolute. If the children were completely immune to the change in prosody, they would continue responding as they did in the first block, resulting in a reliable reverse-prosody effect in their actions; instead, they performed at chance. This chance-level performance cannot be attributed to the presence of two groups (one that perseverates and one that switches): two thirds of the ASD child group produced both correct and incorrect responses in the second block. Instead, the pattern of performance across these four trials shows that the children with ASD are gradually adjusting their interpretation to match the new prosodic form: children get 78% of the actions right on Block 1; on the first trial after the switch, performance drops to 38% correct; it then gradually recovers, reaching 67% on the final trial. In contrast, for the other three groups, performance is above chance on the first switch trial (75% for ASD teen, 73% for TD child, and 71% for TD teen) and does not improve on subsequent trials. Thus, between the ages of 7 and 12, children with ASD are able to form strong expectations about syntax on the basis of prosodic information, but they have difficulty overriding these expectations when prosody changes. Nevertheless, they do detect the change in prosody, and over the course of a few trials, they shift their interpretation of the ambiguity to match it.
One possible interpretation of this finding is that it reflects a deficit in executive function. Children with ASD perform more poorly than controls on a wide range of executive function measures (for reviews see Hill, Reference Hill2004; Russo et al., Reference Russo, Flanagan, Iarocci, Berringer, Zelazo and Burack2007). The deficit that is most consistently found is a difficulty switching between different rules on the Wisconsin Card Sorting Task and similar paradigms; persons with ASD tend to perseverate, producing responses that are consistent with the rule that they had learned earlier, much as they did in our prosody task. These deficits are present even in highly verbal persons with ASD and even when participants are matched on the basis of their verbal abilities (Ambery, Russell, Perry, Morris, & Murphy, Reference Ambery, Russell, Perry, Morris and Murphy2006; Ozonoff, Pennington, & Rogers, Reference Ozonoff, Pennington and Rogers1991; Ozonoff et al., Reference Ozonoff, Cook, Coon, Dawson, Joseph and Klin2004; Verte, Geurts, Roeyers, Oosterlaan, & Sergeant, Reference Verte, Geurts, Roeyers, Oosterlaan and Sergeant2006). Thus, although we did not collect any information about executive function abilities in our sample, it is possible that the ASD and TD groups differed in this respect but that the executive function requirements of this task are simple enough that both groups have mastered them by age 13.
Characterizing the prosodic deficit in ASD
The present results also constrain our understanding of the broader prosodic deficit in ASD. If children with ASD can use prosody to determine the syntactic structure of an utterance, it suggests that their other prosodic difficulties are not due to a global deficit in prosodic comprehension, but instead reflect a more circumscribed problem. The literature suggests three possibilities.
Hypothesis 1:
The perception of prosody is intact in ASD and deficits appear only when prosodic information is used by another process (like pragmatics) which is itself impaired.
This hypothesis predicts that people with ASD should have no deficits in using prosody for nonpragmatic functions, when compared to controls with similar lexical and syntactic abilities. This prediction is consistent with most of the prior research. Children with ASD perform as well as language-matched peers in tasks tapping the use of prosody for syntactic comprehension (see above), syntactic production (Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005; Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007), and lexical comprehension (Chevallier et al., Reference Chevallier, Noveck, Happé and Wilson2009; Grossman et al., Reference Grossman, Bemis, Plesa Skwerer and Tager-Flusberg2010; Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005). The findings on the production of lexical stress (reCORD vs. REcord) are less clear: while one study found less accurate production in persons with ASD (Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005), the other found that speakers with ASD were equally accurate (Grossman et al., Reference Grossman, Bemis, Plesa Skwerer and Tager-Flusberg2010). This could reflect extraneous demands of the experimental tasks. The Paul et al. study used a reading task similar to that used to study the disambiguation of homographs (the wind blows/wind up the string). In these studies, persons with ASD typically perform worse than matched controls (Frith & Snowling, Reference Frith and Snowling1983; Happé, Reference Happé1997; Jolliffe & Baron-Cohen, Reference Jolliffe and Baron-Cohen1999; Lopez & Leekam, Reference Lopez and Leekam2003), because they have difficulty using context to identify the correct meaning (Happé, Reference Happé1997) or inhibiting one pronunciation of a string shortly after using another (Hala, Pexman, & Glenwright, Reference Hala, Pexman and Glenwright2007).
This hypothesis is also consistent with the prior evidence for a deficit in using prosody for pragmatic purposes, such as determining a speaker's emotional state on the basis of his or her tone of voice (Golan et al., Reference Golan, Baron-Cohen, Hill and Rutherford2007; Järvinen-Pasley et al., Reference Järvinen-Pasley, Peppé, King-Smith and Heaton2008; Kleinman et al., Reference Kleinman, Marciano and Ault2001; Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007; Rutherford et al., Reference Rutherford, Baron-Cohen and Wheelwright2002), or using pitch accents to encode the discourse function of a word (Baltaxe & Guthrie, Reference Baltaxe and Guthrie1987; McCaleb & Prizant, Reference McCaleb and Prizant1985; Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005; Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007). One challenge for this hypothesis is accounting for those studies that have failed to find deficits in the use of prosody for pragmatic purposes. Many of these null findings may be attributable to ceiling effects, floor effects, or the use of small and heterogeneous samples. For example, performance on the contrastive stress comprehension task used by Peppé et al. (Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007) is near chance for both groups, suggesting that the task may be too hard for this age group. In other cases, null findings could reflect the boundaries of the pragmatic deficit in ASD. 
For example, across a wide age range, persons with ASD perform as well as controls at using prosody to distinguish questions from statements (Chevallier et al., Reference Chevallier, Noveck, Happé and Wilson2009; Järvinen-Pasley et al., Reference Järvinen-Pasley, Peppé, King-Smith and Heaton2008; Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007; Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005), but this pragmatic inference is simple and does not require modeling the speaker's mental state. Pragmatic inferences with this profile (such as scalar implicatures) do not appear to be impaired in highly verbal ASD (Chevallier, Wilson, Happé, & Noveck, Reference Chevallier, Wilson, Happé and Noveck2010; Pijnacker et al., Reference Pijnacker, Geurts, Van Lambalgen, Kan, Buitelaar and Hagoort2009).
Hypothesis 2:
There are impairments in the perception of prosody in ASD, but they affect prosodic features that are not needed for prosodic parsing.
Theories of prosody generally make a distinction between paralinguistic prosody and prosodic structure. Paralinguistic prosody comprises the global properties of an utterance (speed, mean pitch, and tone of voice), which can provide information about the physiological/emotional state of a speaker. Prosodic structure has two dimensions: intonational phrasing groups words together into prosodic units, while the placement of pitch accents indicates the prominence of units within this structure (Speer & Ito, Reference Speer and Ito2009; Wagner & Watson, Reference Wagner and Watson2010). Syntactic structure is systematically linked to intonational phrasing but not to pitch prominence (Lee & Watson, Reference Lee and Watson2011) or paralinguistic prosody. In contrast, accent placement reflects how an utterance relates to the prior discourse. This suggests a theoretical possibility: perhaps the use of intonational phrasing is unimpaired in ASD while the use of pitch accents is impaired.
This theory correctly predicts that children with ASD should do poorly on tasks involving contrastive stress but well on tasks in which intonational boundaries provide syntactic information. This hypothesis is also consistent with the naturalistic production studies, which suggest that ASD is characterized by the use of repetitive and simple pitch contours, which are produced with more extreme pitch variation (Diehl, Watson, Bennetto, McDonough, & Gunlogson, Reference Diehl, Watson, Bennetto, McDonough and Gunlogson2009; Green & Tobin, Reference Green and Tobin2009; Nadig & Shaw, Reference Nadig and Shaw2012). Finally, this hypothesis suggests that pragmatic inferences that depend on paralinguistic prosody (e.g., judgments of emotion or intended audience) may show different patterns of impairment across populations than those that depend on pitch prominence.
The appeal of this hypothesis is its potential to connect deficits in linguistic tasks to deficits in perception. If pitch prominence were primarily signaled by one acoustic parameter (e.g., fundamental frequency) and intonational boundaries by another (e.g., duration or pausing), then deficits in processing accents could arise from atypical sensory processes. This hypothesis is consistent with the literature on auditory perception, which suggests that atypical processing of frequency is more common than atypical processing of duration and intensity (Jones et al., Reference Jones, Happé, Baird, Simonoff, Marsden and Tregay2009; Marco, Hinkley, Hill, & Nagarajan, Reference Marco, Hinkley, Hill and Nagarajan2011; O'Connor, Reference O'Connor2012).
Hypothesis 3:
True prosodic deficits in ASD only occur in persons who also have language delays.
Two lines of reasoning suggest that the nature of the prosodic deficit in ASD might vary with a person's overall level of linguistic functioning. First, because prosody plays a central role in language acquisition and comprehension, it seems unlikely that a child with a broad prosodic deficit would acquire language on the TD timetable. For example, both infants and adults use prosody during speech perception to find the boundaries of words (lexical segmentation; see Cutler, Dahan, & van Donselaar, Reference Cutler, Dahan and van Donselaar1997; Johnson & Jusczyk, Reference Johnson and Jusczyk2001). Consequently, someone who was prosodically insensitive would be expected to have delays in vocabulary acquisition (because of difficulty learning word forms) and spoken language comprehension (because lexical access would be slower).
Second, prosodic deficits in ASD may be more common in participants who have lexical and syntactic delays. Peppé et al. (Reference Peppé, Cleland, Gibbon, O'Hare and Martínez-Castilla2011) found that children with ASD with a history of preschool language delay performed worse than language-matched controls on most measures of expressive prosody. In contrast, those without early language impairments performed similarly to controls on every measure except imitation. In general, studies in which the ASD participants have mild language delays are more likely to find evidence for prosodic deficits (Järvinen-Pasley et al., Reference Järvinen-Pasley, Peppé, King-Smith and Heaton2008; Peppé et al., Reference Peppé, McCann, Gibbon, O'Hare and Rutherford2007) than are studies where the participants have language abilities at age level (Chevallier et al., Reference Chevallier, Noveck, Happé and Wilson2009, Reference Chevallier, Noveck, Happé and Wilson2011; Grossman et al., Reference Grossman, Bemis, Plesa Skwerer and Tager-Flusberg2010; Paul et al., Reference Paul, Augustyn, Klin and Volkmar2005).
What these findings tell us about typical development
While the primary goal of the present study was to understand prosodic parsing in ASD, this research also provides novel information about the development of this ability in TD children. We found that initial sensitivity to prosody does not change between the ages of 4 and 12: on the first block of trials, both the 4- and 5-year-olds in Snedeker and Yuan (Reference Snedeker and Yuan2008) and the 8- to 12-year-olds in the present study acted in accordance with prosody about 75% of the time. This is remarkable given the dramatic changes in attention, motivation, and education across this age range. It suggests that the present task has few extraneous demands. However, a comparison of the eye-movement data from the first block of trials indicates that school-aged (8- to 12-year-old) children are faster to use prosody than are preschoolers. In preschoolers, the effects of prosody emerged during the late-PP window (500–1000 ms after the critical word), but in older children and adolescents (in the present study) and in adults (in Snedeker & Yuan, Reference Snedeker and Yuan2008) these effects emerged during the early-PP window (0–500 ms). Thus, prosodic parsing becomes more rapid between 5 and 8 years of age, but it may not get faster after that time.
We found that the tendency to perseverate across trials, which is robustly present in 4- and 5-year-olds, has disappeared by school age (7–12 years). This is not because school-age children do not experience interference across blocks (their early eye movements suggest that they do); instead, it reflects an improved ability to overcome interference and respond in accordance with the prosodic cues. This pattern is reminiscent of the change that occurs in children's ability to revise garden-path sentences. Young children, like adults, use the information that they encounter early in a sentence to determine how to interpret syntactic ambiguities. Adults will revise these commitments on the basis of cues that occur later in the sentence, but young children will not (Trueswell et al., Reference Trueswell, Sekerina, Hill and Logrip1999). Performance on these tasks improves rapidly at around the age of 8 (Trueswell et al., Reference Trueswell, Sekerina, Hill and Logrip1999; Weighall, Reference Weighall2008), but no eye-movement data has been published from school-aged children to confirm that revision is involved. Our data fills this gap by providing clear evidence that school-aged children misanalyze this structural ambiguity (with window Block 2) and then correctly revise their interpretation (late-PP window after the utterance ended, and actions Block 2). Thus, our findings suggest that syntactic revision improves substantially at around 8 years of age. This change has been argued to reflect the development of executive functions (specifically cognitive control; see Novick et al., Reference Novick, Trueswell and Thompson-Schill2005), and the failure to revise in the ASD children, who may have executive function impairments, is consistent with this hypothesis.
Finally, we found that the TD adolescents differed from the TD children in one critical respect: their eye movements suggested that they did not experience prolonged interference when they shifted interpretations. It is unlikely that the adolescents failed to make any prediction about the syntactic ambiguity on the basis of the earlier trials; this kind of syntactic priming is a robust feature of comprehension across the lifespan (Thothathiri & Snedeker, Reference Thothathiri and Snedeker2008a, Reference Thothathiri and Snedeker2008b). One possibility is that adolescents made predictions but were able to quickly update them as the experiment progressed. If this were the case, the incorrect prediction on the first trial after the switch would be balanced out by correct predictions on subsequent trials, rendering it invisible in the present data. This ability to quickly shift perspectives could be due to an awareness of the ambiguity of the sentence and the expectation that both interpretations will be present in the study. In prior studies using similar materials, adults were typically aware of syntactic ambiguities (Snedeker & Trueswell, Reference Snedeker and Trueswell2003, Reference Snedeker and Trueswell2004).
Limitations
The present study has a number of limitations that should be considered in assessing its clinical relevance. First, we focused solely on children with strong structural language skills; thus, the results may not generalize to the broader population of children with ASD. However, as we noted above, an increasing proportion of children with ASD diagnoses have language abilities within the normal range. Furthermore, this population may be of particular relevance to those working with ASD children in mainstream educational settings. Second, we did not test children under 7, and thus we do not know whether the early development of prosodic parsing in ASD deviates from that of TD children. Third, we tested only one of the possible manipulations of prosody, and thus we cannot know whether other aspects of the syntax–prosody interface develop more slowly in ASD, perhaps because they are more subtle or complex. Fourth, we did not directly measure executive function. Such measures would be useful to test our hypothesis about the role of executive function in switching between responses in the second block of the task.
Conclusions
In sum, these results provide a window into developmental changes and moment-to-moment prosody processing in individuals with ASD and their TD peers. The subtle but striking differences found in this study highlight the importance of understanding how language comprehension unfolds over time, in addition to the final behavioral responses. To date, only one published study has used the visual-world paradigm to explore moment-to-moment language comprehension in ASD (Brock, Norbury, Einav, & Nation, Reference Brock, Norbury, Einav and Nation2008). The visual-world paradigm provides rich information about how interpretation evolves over time with minimal task demands, and thus it is well suited to exploring the processes that underlie language comprehension in developmental disorders. The present experiment demonstrates this. The eye-movement data allowed us to conclude that children and adolescents with ASD not only use prosody to resolve ambiguity but also use a mechanism that has a similar temporal profile to the one used by their TD peers. This suggests that prosodic processing in ASD does not involve task-specific strategies, because presumably these would be slower and less efficient. The eye-movement data also clarifies the nature of the difference between TD and ASD children: both groups develop expectations about the utterances on the basis of their experience, but while TD children overcome this initial misanalysis, those with ASD do not. Finally, the eye-movement data clarifies the nature of the developmental changes that occur between childhood and adolescence. If we had only the actions, we might have thought that adolescents with ASD perform better than children with ASD because they are able to resolve interference like their TD peers. Our data suggests instead that adolescents in both groups avoid interference altogether. Future studies using this approach will provide a richer understanding of language comprehension in individuals with ASD.