Introduction
The ability to recognise and decode the emotional expressions of others is a vital component of human communication and learning, and the understanding of emotion develops throughout childhood. Overall, evidence suggests that children first differentiate between general positive and negative emotions, and then gradually identify more specific categories of emotion as they get older (Sauter, Panattoni & Happé, 2013; Vicari, Snitzer Reilly, Pasqualetti, Vizzotto & Caltagirone, 2000; Widen, 2013). Emotion recognition and understanding appear to vary depending on a range of individual, familial, and environmental factors. For example, better emotion recognition is related to better language ability and theory of mind skills, more secure attachment, older age, being female, more discussion about feelings within the family, higher quality of family interactions, and higher parental education and occupational status (Cutting & Dunn, 1999; Dunn & Cutting, 1999). Additionally, children with an autism spectrum disorder (ASD) often display delays or deficits in their ability to decode and respond to emotional information (e.g., Dawson, Webb, Carver, Panagiotides & McPartland, 2004). Even adults who show “autism-like” traits, but do not meet diagnostic criteria for ASD, show differences in their recognition of emotion (e.g., Hosokawa, Nakadoi, Watanabe, Sumitani & Ohmori, 2015; Ingersoll, 2010; Poljac, Poljac & Wagemans, 2012). However, the extent to which autism-like traits relate to emotion processing in childhood is unknown. Such knowledge may advance understanding of individual differences in emotion processing, the impact these differences have upon other skills such as learning, and how these processes relate to ASD.
Research suggests that emotional information is attended to and processed more efficiently than other types of information. For instance, adults tend to allocate more attention to emotional content during word processing (e.g., Hinojosa, Méndez-Bértolo & Pozo, 2010). Consequently, perceiving emotional stimuli has been shown to influence cognitive processes, such as attention, memory, and learning, in typical children. For example, children as young as 3 years of age are better able to remember emotional stories, and emotional details from stories, compared to non-emotional stories and details (Bergen, Wall & Salmon, 2015; Christodoulou & Burke, 2016). Further, children between 8 and 11 years of age are better than younger children at remembering emotional details from stories, and can remember events better when they are associated with emotion (Davidson, Luo & Burden, 2001; Leventon & Bauer, 2016). Recent work in the fields of education and neuroscience shows that the experience and perception of emotion can have complex and profound impacts on learning processes. Emotion processing recruits areas of the brain that are also involved in other cognitive processes, such as memory and reasoning; therefore, emotion should be considered in relation to learning processes in children (Immordino-Yang, 2016; Tyng, Amin, Saad & Malik, 2017).
When adults communicate with children, they often express exaggerated emotions in their facial expressions and tone of voice (i.e., prosody). Research shows that such exaggerated speech is important for facilitating language learning in early life (Floccia et al., 2016; Nelson, Hirsh-Pasek, Jusczyk & Cassidy, 1989). However, the influence of emotional speech on language learning later in childhood has not been explored, despite evidence that speech directed at older children retains these exaggerated emotional qualities (Holodynski, 2004). Further, due to substantial changes in emotion processing throughout development, it is unclear how emotion and language intersect later in childhood (Aguert, Laval, Lacroix, Gil & Le Bigot, 2013; Mondloch, 2012). Given that emotional information can influence learning processes in many domains, and that emotional cues are often embedded in child-directed speech, it is important to uncover how emotion may impact language learning in particular.
Multiple factors can influence the process of acquiring new words in childhood. Children are able to learn new words for new objects after hearing a word just once, a process known as fast mapping. This ability can be affected by the familiarity of the objects, the semantic context (such as other descriptive information about the objects), and social cues like eye gaze (Brady & Goodman, 2014; Carey & Bartlett, 1978; Halberda, 2006). In relation to emotion, research shows that children can use emotional information appropriately as a referent for determining correct associations between objects and words – for example, associating a word spoken with sad prosody with sad faces or something broken (Berman, Graham, Callaway & Chambers, 2013; Herold, Nygaard, Chicos & Namy, 2011). However, the influence of emotional information on word learning when the emotion is not an overtly relevant cue (i.e., emotional cues which have no direct relevance to the object or the context) remains undetermined, despite the tendency of adults to speak to children in exaggerated emotional tones regardless of the context (Trainor, Austin & Desjardins, 2000). Moreover, while studies that include emotional cues as referential information demonstrate that emotion can be useful for directing associations and distinguishing between multiple referents, they do not provide insight into the influence of emotional cues on language learning when such cues are not relevant for determining the intended referent. Removing the contextual relevance of emotional cues, and the ambiguity of multiple referents, from such tasks would allow for investigation of the interconnection between emotion perception and language learning when emotional cues are extraneous (for a review on this issue, see Doan, 2010).
Additionally, it is important to consider individual variability in the processing and understanding of emotions to gain a more complete picture of how emotional cues may impact language learning in childhood (Cutting & Dunn, 1999; Dawson et al., 2004). ASD presents an extreme variation from typical development in which children often show differences in both emotion processing and language development, including deficits in both receptive and expressive single-word vocabulary (Boucher, 2012). Thus, the present investigation aimed to determine the influence of irrelevant emotional cues on language learning while also considering individual differences in autism-like traits.
Emotion and language in ASD
Children with ASD often perceive emotional faces differently compared to typically developing children. For example, children with ASD are less likely to spontaneously match faces based on similar emotional expressions than to match faces based on other features (e.g., similar physical features such as wearing glasses; Begeer, Rieffe, Terwogt & Stockman, 2006). Eye-tracking studies have shown that children with ASD tend to look less at faces in general compared to typically developing children (Tsang, 2016). Nuske, Vivanti and Dissanayake (2014) showed that children with ASD looked less at neutral faces than typically developing children did, but looked an equal amount at fearful faces. This finding suggests that the presence of an emotional expression helps to capture the attention of children with ASD relative to neutral faces. However, when looking at emotional faces, children with ASD tend to focus more on the mouth than the eyes, while the opposite is true for typically developing children, with the exception of happy faces (Johnels et al., 2017; Klin, Jones, Schultz & Volkmar, 2003).
A prominent theory in the literature suggests that language impairments in ASD can be explained by the socio-emotional-communicative impairments that are characteristic of the disorder, including reduced attention toward social stimuli such as faces and voices (Baron-Cohen, Baldwin & Crowson, 1997; Boucher, 2012). However, higher nonverbal IQ has been found to be associated with better language outcomes in children with ASD, suggesting that they may use compensatory mechanisms, such as strong rote learning ability, to learn words (Prizant, 1983). Therefore, while such children may seem similar to typical children in their language learning ability, impairments may become evident when considering the links between social information and word learning. For example, children with ASD may not show the same preference for learning words from emotional faces and voices that typical children do (Clement, Bernard, Grandjean & Sander, 2013; Singh, Morgan & White, 2004).
A number of studies have used eye-tracking to determine the degree to which children with ASD use social cues to learn words. Norbury, Griffiths and Nation (2010) found that 6- to 9-year-old children with ASD looked less at a speaker's face when the speaker produced a new word, compared to typically developing children. Despite this, the children with ASD were better able to recall the words than typically developing children, although they were less able to describe features of the object associated with each word. Norbury and colleagues suggested that the children with ASD showed superior memorisation ability, which acted as a compensatory strategy for learning the words, whereas the typically developing children used social cues to associate words with meaning. Further, Tenenbaum, Amso, Abar and Sheinkopf (2014) found that both children with ASD and typically developing children who looked more at a speaker's face (mouth and eyes) during a word learning task recognised the words faster than those who looked at the speaker's face relatively less. This finding suggests that attending to social information can facilitate word learning.
Children with ASD generally perform similarly to typical children when making associative pairings between a label and an object, although they perform worse when social cues (e.g., gaze direction) are necessary for determining object-word pairings, and they do not benefit from social feedback in the same way as typically developing children (Baron-Cohen et al., 1997; Bedford, Gliga, Frame, Hudry, Chandler, Johnson, Baron-Cohen, Bolton, Elsabbagh, Fernandes, Garwood, Pasco, Tucker & Volein, 2013; Norbury et al., 2010). When required to use emotional cues as a referent to guide object-word associations in a task measuring fast mapping ability, children with ASD did not learn the words as accurately as typically developing children. This finding suggests that children with ASD do not attend to or process the emotional information as effectively (Thurman, McDuffie, Kover, Hagerman, Channell, Mastergeorge & Abbeduto, 2015).
Emotion, language, and the broader autism phenotype
Individuals with a diagnosis of ASD are theorised to represent an extreme end of a spectrum of traits that vary across the entire population. Those who present with higher levels of autism-like traits but do not meet the criteria for a clinical diagnosis of ASD are considered to make up what is termed the “broader autism phenotype” (BAP; Piven, Palmer, Landa, Santangelo, Jacobi & Childress, 1997). Baron-Cohen, Wheelwright, Skinner, Martin and Clubley (2001) developed the Autism-Spectrum Quotient (AQ) to measure the degree of autism-like traits in the general population. Using the AQ with typical populations can provide insight into how emotion processing varies as a function of autism-like traits, and advances understanding of the BAP. A deeper understanding of the BAP can, in turn, aid understanding of the complexities of ASD, such as its aetiologies and risk factors. Among adult populations, differences in the processing of emotion have been observed according to individual differences in autism-like traits, such that those with higher levels of traits show less proficient recognition of emotional facial expressions and cues (Hosokawa et al., 2015; Ingersoll, 2010; Poljac et al., 2012). Although the AQ has been adapted to measure autism-like traits in children (AQ-Child; Auyeung, Baron-Cohen, Wheelwright & Allison, 2008), this measure has not been explored with respect to individual differences in emotion processing in childhood, or differences in language learning, despite demonstrated differences in these abilities in children with ASD.
West, Copland, Arnott, Nelson and Angwin (2017) conducted a word learning study with adults in which nonsense words for novel objects were presented with happy, fearful, or neutral prosody that was irrelevant to the novel objects. Participants were required to learn the words within a single session, and their autism-like traits were measured using the AQ. Word recognition accuracy was poorer for words presented with fearful prosody than for words presented with neutral prosody for all participants, suggesting that irrelevant emotional cues interfere with word learning processes in adults, potentially because the emotional prosody is distracting. Further, individuals with lower levels of autism-like traits also performed more poorly when learning words spoken in a happy voice compared to words spoken in a neutral voice, suggesting that these individuals were more susceptible to this interference. This finding supports evidence that those with higher levels of autism-like traits process emotion less efficiently (Hosokawa et al., 2015; Ingersoll, 2010; Poljac et al., 2012). However, given the research outlined regarding emotion processing and word learning in both typically developing children and children with ASD, it is important to investigate when and how emotional information influences word learning throughout development.
The current study aimed to address the question of whether irrelevant emotional information influences word learning in late childhood, and how this might differ for children with higher or lower levels of autism-like traits. Children aged 7 to 9 years were targeted because of their greater proficiency in recognising emotion across modalities compared to younger children (Aguert et al., 2013; Mondloch, 2012). This proficiency ensures less ambiguity in their interpretation of the stimuli, which is crucial for capturing the direct effects of emotion processing on language learning, particularly given that any interaction between emotion and language has not previously been examined at this age. Further, this age group allows for more accurate identification of autism-like traits compared to younger age groups, and most studies that have used the AQ-Child have targeted age groups similar to the present study (Melling & Swinson, 2016; Petalas et al., 2012; Ruzich et al., 2016). Eye-tracking was used during a word learning task to determine whether visual attention toward a speaker versus a novel object interacts with autism-like traits, emotion, and word learning performance. Emotional information conveyed by the speaker included facial expressions, prosody, and emotional adjectives (e.g., “lovely”, “scary”).
It was expected that children would learn new words differently if the words were presented with emotional information than if they were presented neutrally. The direction of the influence of emotion on word learning was difficult to predict; however, emotional information enhances general learning in children, and social information is valuable in word learning (Boucher, 2012; Immordino-Yang, 2016; Tyng et al., 2017). Therefore, it was hypothesised that emotionally presented words would be learned better than neutrally presented words. Further, it was expected that children with higher levels of autism-like traits, as indicated by higher scores on the AQ-Child (Auyeung et al., 2008), would be less affected by emotional cues during word learning, such that they would show no difference in word learning performance between emotionally and neutrally presented words.
In regard to eye gaze, we expected that children would fixate more on the face of the speaker, and less on the object, when an emotion was being expressed than when expressions were neutral. However, children with higher AQ-Child scores were expected to show a similar pattern of eye gaze toward faces and objects in both emotional and neutral conditions. Additionally, we expected that children with higher AQ-Child scores would fixate less on the face overall, and more on the object, in comparison to children with lower AQ-Child scores. Finally, given evidence that attention toward social cues can influence word learning (Brady & Goodman, 2014; Carey & Bartlett, 1978; Halberda, 2006; Tenenbaum et al., 2014), we investigated the relationships between eye gaze and learning performance in each emotion condition. We expected that children who looked more at the face of the speaker would be more influenced by emotional cues when learning words.
Methods
Participants
Typically developing children were recruited via a research database managed by the University of Queensland School of Psychology Early Cognitive Development Centre. Families in this database are recruited from the community, and are predominantly middle- to high-income and Caucasian. A total of 66 typically developing children participated in the study. Of these, eight were excluded from analysis: four had a diagnosis of a developmental or neurological disorder, three were bilingual, and one had missing data. Additionally, parents completed the Children's Communication Checklist, 2nd ed. (CCC-2; Bishop, 2006) to screen for communication difficulties. Eight children were excluded for having CCC-2 scores indicating potential language impairment or ASD (a Global Communication Composite score < 55 in combination with either scaled scores < 5 on the social relationships and interests subscales, or a Social Interaction Deviance Composite score ≤ −15; Bishop, 2006), and two were excluded for incomplete forms. Thus, data from 48 participants (26 male), aged 7 to 9 years (M = 7.89, SD = 0.80), were included in the behavioural analyses. Based on parent report, these children had never received a diagnosis of any autism spectrum disorder, attention deficit hyperactivity disorder, intellectual disability, or any other developmental or neurological disorder. All children attended mainstream schooling and had normal or corrected-to-normal vision and hearing. Four children were left-handed. Written informed consent was obtained from parents/guardians prior to participation, along with verbal assent from children, and ethical approval for the study was granted by the University of Queensland School of Psychology Human Research Ethics Committee.
Assessments
Autism-like traits were measured using the Autism-Spectrum Quotient: Children's Version (AQ-Child; Auyeung et al., 2008). This parent-report questionnaire was adapted from the self-report Autism-Spectrum Quotient (Baron-Cohen et al., 2001) to be applicable to children aged between 4 and 11 years. The measure determines the degree to which a child displays traits similar to those of individuals with a diagnosis of ASD. Parents indicate their level of agreement with 50 statements reflecting an autism-like trait (e.g., “S/he prefers to do things the same way over and over again”), half of which are reverse scored (e.g., “S/he prefers to do things with others rather than on her/his own”). Responses are given on a 4-point Likert-type scale, with 1 indicating “definitely agree” and 4 indicating “definitely disagree”. Each item is scored from 0–3 according to the degree to which the response endorses an autism-like trait. Thus, possible total scores range from 0–150, with lower scores indicating lower levels of autism-like traits; individuals with ASD score above 76 (Auyeung et al., 2008). The AQ-Child has been shown to have sufficient validity and reliability (Auyeung et al., 2008), and has been used in previous research to measure variation in autism-like traits in typically developing children (Melling & Swinson, 2016; Petalas et al., 2012; Ruzich et al., 2016).
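As a concrete illustration of this scoring scheme, the minimal sketch below converts raw 1–4 ratings into 0–3 item scores and a 0–150 total. It is not the published scoring key: the set of reverse-scored items passed in would need to be taken from Auyeung et al. (2008).

```python
def score_aq_child(responses, reverse_items):
    """Score a 50-item AQ-Child-style questionnaire (illustrative only).

    responses: dict mapping item number -> rating from 1 ("definitely
    agree") to 4 ("definitely disagree").
    reverse_items: set of item numbers where agreement indicates
    *fewer* autism-like traits (the actual key is in Auyeung et al.,
    2008; any set supplied here is an assumption).
    """
    total = 0
    for item, rating in responses.items():
        if not 1 <= rating <= 4:
            raise ValueError(f"item {item}: rating must be 1-4")
        if item in reverse_items:
            total += rating - 1   # 1 -> 0 ... 4 -> 3
        else:
            total += 4 - rating   # 1 -> 3 ... 4 -> 0
    return total  # 0-150 over 50 items; higher = more autism-like traits
```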
To assess general emotional understanding, the Assessment of Children's Emotional Skills (ACES; Schultz, Izard & Bear, 2004) was used. This measure is administered by the researcher with the child, and includes three subtests designed to assess children's identification of emotion from expressions, behaviours, and situations. In the expressions subtest, participants are shown 26 images of children's faces and asked whether the child in each image feels happy, angry, sad, or scared. In the behaviours subtest, 15 short scenarios are read to the child describing behaviours that indicate specific emotional states (e.g., “Rosa has her arms crossed”). In the situations subtest, 15 short scenarios are read to the child describing situations that would elicit a particular emotion (e.g., “Kelly just finished colouring a picture. You tell her that it looks nice”). In the latter two subtests, children are likewise asked whether the person feels happy, angry, sad, or scared. Scores are the sum of items answered accurately across the three subtests, and range from 0–56.
Word learning task stimuli
Objects
Stimuli included images of nine distinct novel objects for the learning trials, and nine alternate versions of the same objects for the generalisation trials, selected from the Novel Object and Unusual Name (NOUN) database, 2nd edition (Horst & Hout, 2014), which consists of images of unusual toys for which no conventional name exists. Objects with available images of alternate versions (i.e., different colours), for use in the generalisation portion of the study, were chosen from the database. Object images were arranged into three clusters to be randomly allocated to the three emotion conditions. The clusters were formed to ensure that objects within a condition were different colours, and that the general shape of the objects (round or vertical) was balanced between the emotion conditions. In addition, eight images of familiar objects (e.g., balloon, shovel) were used for practice trials.
Words
Nine nonsense words were sourced from a list in the NOUN database, which is compiled from previous childhood word learning research (Horst & Hout, 2014). All chosen nonsense words were two syllables long and had different onset letters (e.g., “blicket” and “modi”). The nonsense words were arranged into three groups to be randomly allocated to the three emotion conditions. The words within a group differed in their second letters and offset sounds, to ensure that each word was distinct within a given condition.
Videos
For the learning trials, a trained actress was filmed holding a framed picture of an object and producing a sentence containing each nonsense word with happy, fearful, or neutral emotional cues. Images of the object stimuli were superimposed onto the frame using Adobe After Effects software, such that the actress appeared to be holding a framed image of each object. In each video, the nonsense word was stated twice using the following script: “Look, it's a [nonsense word], what a/an [emotional adjective] [nonsense word]”.
There were three emotional conditions: happy, fearful, and neutral. For the happy condition, the emotional adjective “lovely” was used, along with a happy facial expression and prosody. For the fearful condition, the emotional adjective “scary” was used, with a fearful facial expression and prosody. Finally, for the neutral condition, the adjective “ordinary” was used, with a neutral face and voice. The actress generally maintained eye contact with the camera, but looked to the object each time she said the nonsense word. All videos were 10 seconds in duration. Each subgroup of three nonsense words was paired with each set of three objects and presented in each emotion condition, which resulted in 27 videos to be split over three experiment versions (see Table 1). Thus, each version of the experiment consisted of nine videos, three from each emotion condition, and with each video paired with a different novel object. Participants were randomly assigned to experiment versions.
Table 1. Counter-balanced experiment versions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_tab1.png?pub-status=live)
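One way to realise this counterbalancing is a Latin-square rotation, sketched below. The specific pairings generated here are illustrative assumptions; the actual assignments are those shown in Table 1.

```python
from itertools import product

word_groups = ["W1", "W2", "W3"]   # three groups of three nonsense words
object_sets = ["O1", "O2", "O3"]   # three clusters of three novel objects
emotions = ["happy", "fearful", "neutral"]

# All 27 word-group x object-set x emotion combinations were filmed.
all_videos = list(product(word_groups, object_sets, emotions))
assert len(all_videos) == 27

def make_version(k):
    """Version k (0, 1, or 2): each word group is paired with a
    different object set and emotion within the version, and across
    the three versions each word group appears with every object set
    and every emotion. Each triple covers three words, so a version
    comprises nine videos."""
    return [(word_groups[i],
             object_sets[(i + k) % 3],
             emotions[(i + 2 * k) % 3]) for i in range(3)]

versions = [make_version(k) for k in range(3)]
```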
Pilot testing
To validate the emotional expressions, the 27 videos were presented to 21 undergraduate psychology students (mean age = 18.62 years, SD = 1.96) who rated the level of happiness, fear, anger, and sadness presented in each video on 7-point Likert scales (1 = slightly, 7 = very much). The anger and sadness scales were included to ensure that the videos did not unintentionally convey these emotions. Participants were also given the option to select “neutral” if they felt that no emotion was expressed. To be considered a valid reflection of happy or fearful emotion, a video had to receive a rating of 5 or above on the appropriate emotion scale from the majority of participants (>50%), and, at the same time, not receive a rating of 5 or above on any other emotion from more than 20% of participants. Similarly, neutral videos were considered valid if the majority of participants (>50%) selected the “neutral” option and the video was not rated 5 or above on any emotion by more than 20% of participants. Each of the happy videos was rated 5 or above on the happy scale by at least 81% of participants (mean happy rating = 5.80), and none was rated 5 or above on any other emotion. Fearful videos were each rated 5 or above on the fear scale by at least 71% of participants (mean fear rating = 5.68), and were not rated 5 or above on any other emotion by more than 14% of participants. For each neutral video, at least 55% of participants selected the “neutral” option, and videos were not rated 5 or above on any emotion by more than 14% of participants.
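The validation rule can be stated compactly in code. This is a sketch only: the per-rater data format shown in the docstring is a hypothetical representation, not the format actually used.

```python
def video_is_valid(ratings, target):
    """Apply the pilot validation rule for one video.

    ratings: list of per-rater dicts, e.g. {"happy": 6, "fear": 1,
    "anger": 1, "sadness": 1, "neutral": False} (hypothetical format).
    target: "happy", "fear", or "neutral".
    """
    n = len(ratings)
    # The majority (>50%) must endorse the intended emotion ...
    if target == "neutral":
        endorsed = sum(bool(r["neutral"]) for r in ratings)
    else:
        endorsed = sum(r[target] >= 5 for r in ratings)
    if endorsed <= n / 2:
        return False
    # ... and no more than 20% may rate any other emotion at 5 or above.
    for emotion in {"happy", "fear", "anger", "sadness"} - {target}:
        if sum(r[emotion] >= 5 for r in ratings) > 0.20 * n:
            return False
    return True
```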
Apparatus
Children were seated at a viewing distance of 45 cm from a 22-inch computer screen, on which the task was displayed (1680 × 1050 px screen resolution). Eye gaze was measured using an SMI eye-tracker with a 120 Hz sampling rate. Areas of Interest (AOIs) were specified for the face and the object. Eye movements were calibrated using a 5-point grid, and calibration was repeated if the error value was more than 1° on the x or y axis. Dwell time (i.e., the total time spent looking at a given AOI during each trial, measured in milliseconds) was used as the key dependent variable for the eye-tracking analyses.
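To make the dwell-time measure concrete, the sketch below sums the time represented by gaze samples falling inside a rectangular AOI. The sample format and AOI pixel bounds are assumptions for illustration; in practice the SMI software computes dwell time from its own export format.

```python
SAMPLE_MS = 1000 / 120  # time represented by one sample at 120 Hz

def dwell_time_ms(samples, aoi):
    """samples: iterable of (x, y) gaze points for one 10-second trial;
    aoi: (left, top, right, bottom) bounds in screen pixels."""
    left, top, right, bottom = aoi
    inside = sum(left <= x <= right and top <= y <= bottom
                 for x, y in samples)
    return inside * SAMPLE_MS

# Hypothetical AOI bounds on the 1680 x 1050 px display:
FACE_AOI = (640, 80, 1040, 560)
OBJECT_AOI = (200, 400, 620, 900)
```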
Procedure
The task began with three recognition practice trials and three recall practice trials, to ensure children understood and could complete the tasks. In each recognition practice trial, four familiar objects were presented simultaneously on the computer screen and the children were given both written and verbal instructions to point to a target object (e.g., “Which one is the balloon?”). In each recall practice trial, one familiar object was presented at a time and children were asked to label each object by saying the name out loud.
Following the practice trials, eye movements were calibrated, and children were informed that they would be shown videos of a lady telling them the names of some strange toys, and that they should try to remember as many of the names as they could. There was a learning phase, in which all nine videos were played to the children in random order, followed by a recognition phase. In the recognition phase, each trial consisted of four objects presented simultaneously, and children were given written and verbal instructions to identify which object matched a particular nonsense word (e.g., “Which one is a Modi?”). The four images presented in each recognition trial included the target object, one other object from the same emotion condition, and one object from each of the other two emotion conditions. The order of trials was pseudorandomised (sketched below) to ensure that emotion conditions were evenly spread across the beginning, middle, and end of the phase, that the target image appeared in each of the four positions a balanced number of times, and that the same emotion condition and target object position did not occur more than twice in succession. The experimenter live-scored the object to which the child pointed in each trial, and responses were later scored for accuracy.
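A simple rejection-sampling sketch of the succession constraint follows. The evenness and position-balancing checks of the full design are omitted for brevity, the trial representation is hypothetical, and the constraints are assumed to be satisfiable.

```python
import random

def max_run_length(values):
    """Length of the longest run of identical consecutive values."""
    longest = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def pseudorandomise(trials, max_repeat=2, seed=None):
    """Shuffle until no emotion condition or target position repeats
    more than `max_repeat` times in a row.

    trials: list of dicts with "emotion" and "target_pos" keys
    (a hypothetical representation of one recognition trial)."""
    rng = random.Random(seed)
    order = trials[:]
    rng.shuffle(order)
    while (max_run_length([t["emotion"] for t in order]) > max_repeat
           or max_run_length([t["target_pos"] for t in order]) > max_repeat):
        rng.shuffle(order)
    return order
```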
Following the recognition phase, eye movements were re-calibrated, and the learning and recognition phases were repeated a second time. Children then completed two recall phases. The first recall phase (hereafter ‘free recall’) was designed to assess unaided recall of the words: children were presented with each object one at a time and asked to label it by saying the word out loud. In the second recall phase (hereafter ‘prompted recall’), children saw all of the objects again one at a time, and the experimenter provided the first phoneme of each word as a prompt to aid recall for any object that the child had not labelled in the earlier free recall phase. All children were given identical prompts, and the prompt was given once at the beginning of each trial. In both recall phases, trials were pseudorandomised to ensure an even spread of the emotion conditions across the phase, and that the same emotion condition did not occur more than twice in succession. The experimenter transcribed children's responses in each trial, and responses were later scored as correct if the word was pronounced entirely accurately or with just one inaccurate phoneme (e.g., “blickeb” was an acceptable variation of “blicket”). Responses with more than one inaccurate phoneme, and omitted responses, were scored as incorrect.
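The one-phoneme tolerance amounts to an edit distance of at most one between response and target. The sketch below approximates phonemes with characters for illustration; the actual scoring was performed over phonemes in the transcriptions.

```python
def within_one_edit(a, b):
    """True if strings a and b differ by at most one substitution,
    insertion, or deletion (edit distance <= 1)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1  # skip the shared prefix
    return (a[i + 1:] == b[i + 1:]   # one substitution
            or a[i + 1:] == b[i:]    # one deletion from a
            or a[i:] == b[i + 1:])   # one insertion into a

def recall_correct(response, target):
    return response is not None and within_one_edit(response, target)

assert recall_correct("blickeb", "blicket")  # one inaccurate "phoneme"
assert not recall_correct(None, "blicket")   # omitted response
assert not recall_correct("modi", "blicket")
```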
Finally, there was a generalisation phase, which was identical to the recognition phases, except that alternate versions of the objects (i.e., different colours) were presented. The generalisation phase was conceptually similar to the recognition phases in measuring the ability to recognise word-object associations, but determined whether these associations remained stable with altered versions of the same objects. A response was considered correct when the child selected the same shape, despite the change of colour. Thus, the generalisation phase was included as a third recognition phase for analyses. Figure 1 displays the order of phases in the word learning task. Following completion of the task, the experimenter administered the ACES with children. Parents were given the AQ-Child and the CCC-2 to complete during testing.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_fig1.png?pub-status=live)
Figure 1. Procedure of word learning task.
Results
Behavioural performance
AQ-Child scores ranged from 19 to 81 (M = 50.42, SD = 16.70) and did not differ significantly between genders (females: M = 53.09, SD = 16.03; males: M = 48.15, SD = 17.23; t(46) = 1.02, p = .312). Both recognition and recall accuracy were calculated as the proportion correct out of the total number of trials in each emotion condition. Tests of skewness and kurtosis indicated that the data were normally distributed, as all values in each condition were within the acceptable limits of −2 and +2 (George & Mallery, 2010).
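A sketch of this normality screen is shown below, with hypothetical file and column names; note that scipy reports excess kurtosis (0 for a normal distribution).

```python
import pandas as pd
from scipy.stats import kurtosis, skew

df = pd.read_csv("recognition_accuracy.csv")  # hypothetical wide file
for col in ["fear", "happy", "neutral"]:      # one column per condition
    s = skew(df[col])
    k = kurtosis(df[col])  # excess kurtosis
    flag = "OK" if abs(s) <= 2 and abs(k) <= 2 else "check distribution"
    print(f"{col}: skew = {s:.2f}, kurtosis = {k:.2f} -> {flag}")
```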
Linear mixed effects modelling (LMEM) was performed in SPSS to examine recognition performance. The model specified the following fixed effects: phase (recognition 1, recognition 2, generalisation), emotion (fear, happy, neutral), AQ score (continuous), the interaction between emotion and AQ score, and the interaction between phase and emotion. Phase and emotion were specified as repeated measures with a first-order autoregressive (AR1) covariance matrix to control for dependence within participants. For phase, generalisation was set as the reference condition; for emotion, neutral was set as the reference condition. Proportional recognition accuracy was entered as the dependent variable.
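For readers who prefer code to prose, a rough open-source analogue of this model is sketched below using statsmodels. Note that statsmodels' MixedLM approximates the within-participant dependence with a random intercept rather than SPSS's AR1 repeated-measures covariance, so estimates would differ somewhat, and the long-format column names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant x phase x
# emotion, with columns: participant, phase, emotion, aq, accuracy.
df = pd.read_csv("recognition_long.csv")

model = smf.mixedlm(
    "accuracy"
    " ~ C(phase, Treatment('generalisation'))"
    " + C(emotion, Treatment('neutral')) * aq"   # emotion, AQ, emotion x AQ
    " + C(phase, Treatment('generalisation'))"
    ":C(emotion, Treatment('neutral'))",         # phase x emotion
    data=df,
    groups="participant",  # random intercept per participant
)
print(model.fit(reml=True).summary())
```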
The LMEM analysis revealed a main effect of phase, F(1, 227) = 13.04, p < .001, ηp² = .054, such that accuracy was significantly higher in both the second recognition phase and the generalisation phase compared to the first recognition phase, t(47) = 5.29, p < .001, d = 0.73, 95% CI [0.25, 0.11]; t(47) = 5.42, p < .001, d = 0.78, 95% CI [0.28, 0.13], respectively. The effect of emotion approached significance, F(2, 269) = 2.72, p = .068, with accuracy marginally lower in the happy condition compared to fear, t(47) = 2.00, p = .052. Descriptive statistics for recognition performance are displayed in Table 2.
Table 2. Proportional recognition accuracy means (standard deviations in parentheses) for each emotion condition and recognition phase.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_tab2.png?pub-status=live)
The interaction between emotion and AQ score was not significant, F(2, 267) = 1.40, p = .248, providing no support for our hypothesis that children with higher levels of autism-like traits would be less affected by emotional cues during word learning. Further, the interaction between phase and emotion was not significant, F(4, 333) = 0.34, p = .852, indicating no differential influence of emotion at different stages of the learning process.
Two additional LMEM analyses were performed for the free and prompted recall phases separately. Each model specified the fixed effects of emotion, AQ score, and the interaction between the two. Emotion was specified as a repeated measure with a first-order autoregressive (AR1) covariance matrix, the neutral condition was set as the reference, and recall accuracy was entered as the dependent variable. No significant effects were found in either analysis. Descriptive statistics for recall performance are displayed in Table 3.
Table 3. Proportional recall accuracy means (standard deviations in parentheses) for free recall and prompted recall as a function of emotion condition.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_tab3.png?pub-status=live)
Eye-tracking analyses
Twelve participants were excluded from the eye-tracking analyses due to technical error (n = 7), the eye-tracker recording fewer than 60% of the trials (n = 4), or poor calibration, as determined by an error value greater than 2° on the x or y axis in both experiment phases (i.e., before each exposure to the videos; n = 1). Thus, data from 36 participants were included in the analysis of dwell time.
LMEM was performed in SPSS to examine eye-gaze patterns. The model specified the following fixed effects, as per the hypotheses: AOI (face, object), emotion (fear, happy, neutral), AQ score (continuous), the interactions between AOI and emotion and between AOI and AQ score, and the three-way interaction between AOI, emotion, and AQ score. AOI and emotion were specified as repeated measures with a first-order autoregressive (AR1) covariance matrix to control for dependence within participants. For AOI, object was set as the reference condition; for emotion, neutral was set as the reference condition. Dwell time (in milliseconds) was entered as the dependent variable, reflecting the time spent looking at either the face or the object in each 10-second trial, collapsed across the two learning phases (see Footnote 1).
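Under the same caveats as the earlier sketch (random intercept in place of AR1; hypothetical column names, continuing the imports defined there), the gaze model could be specified as follows. The full crossing also includes the emotion × AQ two-way term, which the reported specification did not list.

```python
# Hypothetical long-format gaze data: one row per participant x AOI x
# emotion, with columns: participant, aoi, emotion, aq, dwell_ms.
gaze = pd.read_csv("gaze_long.csv")

gaze_model = smf.mixedlm(
    # The full crossing yields the AOI, emotion, and AQ main effects
    # plus all two-way interactions and the three-way interaction.
    "dwell_ms ~ C(aoi, Treatment('object'))"
    " * C(emotion, Treatment('neutral')) * aq",
    data=gaze,
    groups="participant",
)
print(gaze_model.fit(reml=True).summary())
```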
The results revealed a main effect of AOI, F(1, 180) = 10.26, p = .002, ηp² = .054, such that the object was looked at for significantly longer than the face, t(35) = 4.26, p < .001, d = 0.71, 95% CI [490.72, 1383.78]. Further, there was a significant interaction between AOI and emotion, F(2, 157) = 3.50, p = .033, ηp² = .043. Follow-up pairwise comparisons showed that the object was looked at for significantly less time in the fearful condition compared to both the happy and neutral conditions, t(35) = 2.78, p = .009, d = 0.46, 95% CI [109.45, 699.82]; t(35) = 3.07, p = .004, d = 0.51, 95% CI [168.37, 827.49], respectively. Dwell times toward the face and object in each emotion condition are depicted in Figure 2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_fig2.png?pub-status=live)
Figure 2. Mean dwell time (milliseconds), and standard error, for object and face looking in each emotion condition.
Finally, the interaction between AOI, emotion, and AQ score approached significance, F(4, 161) = 2.27, p = .064. Given the relevance of this effect to our hypotheses, additional LMEM analyses were performed for the face and object AOIs separately, to further explore the impact of AQ score. Both analyses tested the fixed effects of emotion, AQ score, and the emotion × AQ score interaction. Emotion was specified as a repeated measure with an AR1 covariance matrix, the neutral condition was set as the reference, and dwell time was the dependent variable. The analyses of face looking and object looking each revealed a significant interaction between emotion and AQ score, F(2, 67) = 4.28, p = .018, ηp² = .113; F(2, 65) = 3.63, p = .032, ηp² = .101, respectively. In the analysis of face looking, higher AQ score was associated with shorter dwell time in the happy condition, relative to neutral, β = −11.01, SE = 5.51, t(67) = 2.00, p = .05, 95% CI [0.01, 22.02]. In contrast, in the analysis of object looking, higher AQ score was associated with longer dwell time in the happy condition, relative to neutral, β = 24.06, SE = 9.27, t(65) = 2.60, p = .012, 95% CI [5.55, 42.57]. In other words, higher AQ scores were associated with longer looking at the object, and shorter looking at the face, in the happy condition compared to neutral. This association is displayed in Figure 3.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220407002652593-0300:S0305000921000192:S0305000921000192_fig3.png?pub-status=live)
Figure 3. Associations between AQ score and dwell time for Face and Object looking, in each emotional condition.
Correlations
Pearson bivariate correlation analyses were performed to examine the relationships of dwell time toward the face and the object with recall performance (collapsed across the free and prompted recall phases) and recognition performance (collapsed across phases). There were no significant correlations between eye gaze and performance, either overall or within any emotion condition.
Additionally, correlational analyses were conducted to determine whether there were relationships between individual factors (age, ACES score), recall and recognition performance, and dwell time data. Age was not correlated with behavioural performance, but was positively correlated with overall looking time toward the face, r = 0.41, p = .012. ACES score was positively correlated with recall performance in the happy condition, r = 0.27, p = .041, but was not correlated with performance in the fear or neutral conditions, nor with any dwell time measures. Finally, age and ACES score were positively correlated with one another, r = 0.34, p = .041.
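A sketch of these bivariate correlations, again with hypothetical per-child summary columns, is shown below.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical wide-format file: one row per child, with columns such
# as age, aces, face_dwell, recall_happy.
children = pd.read_csv("child_summary.csv")

for x, y in [("age", "face_dwell"),     # age vs overall face looking
             ("aces", "recall_happy"),  # ACES vs recall, happy condition
             ("age", "aces")]:
    r, p = pearsonr(children[x], children[y])
    print(f"r({x}, {y}) = {r:.2f}, p = {p:.3f}")
```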
Discussion
The aim of this study was to determine the influence of emotional cues on eye gaze patterns and word learning ability in typically developing children who varied in their level of autism-like traits. Children's ability to correctly match an object to a label was measured twice throughout the experiment. Their ability to generalise their recognition to different coloured versions of the objects, and to label the objects in free recall and prompted recall tasks was also examined. In addition, eye-tracking technology measured the duration of looking toward the object or toward the face of the speaker during learning.
Behavioural performance
Contrary to predictions, neither recognition nor recall performance was affected by emotion or AQ-Child score. There was a trend toward recognising fewer words that had been presented with happy cues compared to fearful cues, but the effect did not reach significance (p = .052). Recognition performance increased in the second phase and the generalisation phase compared to the first phase, showing that the children effectively learned to associate the nonsense words with the objects after two exposures to the videos and were able to generalise these associations to altered versions of the objects. Recall performance was low overall, indicating general difficulty with this task.
The lack of an effect of emotion on learning performance conflicts with previous research showing an impact of emotional information on memory and learning in childhood. Studies suggesting that children's memory is enhanced for emotional stories and events (Bergen et al., 2015; Christodoulou & Burke, 2016; Davidson et al., 2001; Leventon & Bauer, 2016), and that emotion is important within education (Immordino-Yang, 2016; Tyng et al., 2017), all include emotional information which is meaningful or directly relevant to the individual or task. Even in previous child word learning research that included emotional cues, the emotion was used as a referent for guiding associations (Berman et al., 2013; Herold et al., 2011; Thurman et al., 2015). In this respect, the current study is distinct from previous research, as the emotional information did not provide any relevant referential information. Although the emotional cues offered descriptive information about the objects, this information was extraneous and not overtly linked to the object characteristics (e.g., there was nothing overtly scary about the objects labelled as ‘scary’). The results indicate that children between the ages of 7 and 9 years are not automatically influenced by irrelevant emotional cues when learning words. While word recognition accuracy for items presented with fearful cues was marginally better than for those presented with happy cues, this effect did not reach significance. Further, these results applied to all children: recognition and recall performance did not differ according to the child's level of autism-like traits.
Studies with adults have shown that irrelevant emotional information interferes with word learning, particularly for those with lower levels of autism-like traits (West et al., 2017). Thus, our findings highlight a potential age-related difference in the influence of irrelevant emotional information on word learning, indicating that such an influence might develop after 9 years of age. As the majority of language learning occurs in childhood, it is possible that learning mechanisms are less easily impeded in childhood than in adulthood. Further, these results align with the findings of Singh et al. (2004), which showed that happy voices provided no advantage for infants' recognition of words. It is therefore possible that emotional cues capture attention but do not always influence word learning processes in infants and children. Future research should determine more explicitly the degree to which emotion is attended to when it is peripheral to a task, perhaps by measuring whether children remember the emotion associated with each object (e.g., “Was this a lovely, scary, or ordinary object?”). Additionally, a delayed test of recognition and recall, perhaps on the following day, would be useful to determine whether emotional cues impact word consolidation. Finally, research on the interactions between emotion and learning more broadly for children who have attention or learning difficulties is warranted.
Eye-gaze results
In regard to eye gaze, it was expected that children would fixate more on the face of the speaker, and less on the object, when an emotion was being expressed compared to when the speaker was neutral; children with higher autism-like traits, however, were expected to show less difference in their eye gaze patterns between emotional and neutral conditions. The eye gaze results showed, firstly, that children looked for longer at the object than at the face overall, suggesting the object was the primary focus of attention while forming word-object associations. However, all children spent less time looking at the object in the fear condition relative to the happy and neutral conditions, although this decrease was not accompanied by increased looking at the face in the fear condition.
There were no overall differences in face and object looking time according to the level of autism-like traits, failing to support our hypothesis. In previous research showing that children with ASD looked less at a face than typically developing children did during word learning (Norbury et al., 2010; Tenenbaum et al., 2014), the faces provided useful referential information (e.g., directional gaze). The current study is distinct in that there was no ambiguity as to which object a word was associated with; the facial information may therefore have been less salient, and the object more so.
The finding that all children looked less at the object in the fear condition compared to neutral supports evidence suggesting that emotional information impacts attentional processes (Hinojosa et al., 2010; Singh et al., 2004; Tyng et al., 2017). Interestingly, while this effect held across participants for the fearful cues, the relative impact of neutral versus happy cues was moderated by the level of autism-like traits. Specifically, children with lower levels of autism-like traits looked less at the object in the happy condition compared to neutral, whereas higher levels of autism-like traits were associated with more looking at the object in the happy condition. These findings suggest a differential effect of valence on visual attention for children with relatively higher levels of autism-like traits: negative emotional cues diverted attention away from the object, while positive emotional cues diverted attention away from the face and toward the object. It is important to note, however, that the emotional stimuli were not balanced for intensity in the current study, and future research should address this limitation.
Finally, it was hypothesised that eye gaze would be related to the influence of emotion on learning, such that children who looked more at the face would be more influenced by the emotional cues. This hypothesis was not supported, which can potentially be explained by the strong word learning capacity of children at this age: despite the influence of emotional cues on eye gaze patterns, word learning performance did not differ in relation to eye gaze across conditions. Another potentially useful future direction may be to record eye gaze during post-learning recognition tests to investigate whether emotional information impacts this stage of processing. Further, the current study showed that greater general emotional understanding (i.e., higher ACES score) was related to better recall in the happy condition. Age also appears to be related to greater attention toward social cues, as looking time toward the face increased with age. Scores on the ACES also improved with age, which supports previous findings that emotion identification gradually improves throughout childhood (Widen, 2013).
Conclusions
When considering the impact of emotional cues on attention and word learning processes throughout development, it is interesting to note the similarities between the current findings and the adult findings of West et al. (2017). Both studies show that fearful cues distract attention in some way during word learning for all individuals, regardless of the level of autism-like traits: for adults, fear decreased word learning accuracy for all participants; for children, fear reduced looking time toward the target object for all participants. Both studies also showed that happy cues affected participants differently according to the level of autism-like traits. For adults, individuals with lower levels of autism-like traits had reduced word learning accuracy in the happy condition, but this was not the case for individuals with higher levels of autism-like traits. In children, lower levels of autism-like traits were associated with less looking toward the target object in the happy condition, and higher levels with more looking toward it. Hence, both studies suggest that fearful stimuli influence attention similarly for all participants, regardless of the level of autism-like traits, whereas happy stimuli influence attention differently according to the level of autism-like traits. Although emotion did not impact children's behavioural performance, the effects observed in adulthood may have their origin in the changes to visual attention with emotional cues that are already evident in childhood.
Overall, our results indicate that emotional cues impact visual attention in children, and do so differently according to the level of autism-like traits. However, when emotional cues are irrelevant, even an impact on visual attention does not translate into behavioural differences in word learning performance at this age. It is possible that strong language learning abilities in childhood compensate for these attentional distractions. Differences in attention toward emotion may accumulate across development and gradually lead to larger differences in emotion processing relative to word learning ability: by adulthood, irrelevant emotional cues do impact word learning performance (West et al., 2017). Our findings suggest that such influence does not become apparent until after the age of 9 years. Regarding autism-like traits, it is possible that a wider range in the levels of autism-like traits is needed to capture potential behavioural differences, and future research should endeavour to include a broader range. Within education, it may be important to consider the impact of emotion on attention, and that differences in reactivity to emotional information in childhood may be related to underlying traits of autism. Despite these considerations, however, language learning processes in childhood appear resilient.
Acknowledgements
We would like to thank Olivia Wilson for acting in our video stimuli. David Copland was supported by a UQ Vice Chancellor's Fellowship. This research was conducted with support from the ARC Centre of Excellence for the Dynamics of Language (Project ID: CE140100041).
Disclosure of interest
The authors report no conflicts of interest.