INTRODUCTION
One of the hallmarks of models of adult language comprehension is the notion that linguistic information incrementally propagates across different levels of representation (MacDonald, Pearlmutter & Seidenberg, 1994; Trueswell & Tanenhaus, 1994). A prime example of this is the case of word recognition (McClelland & Elman, 1986; Marslen-Wilson, 1987; Dell, Schwartz, Martin, Saffran & Gagnon, 1997). By most accounts, identifying a word like logs begins with the mapping of speech sounds onto phonological representations. These phonemes then activate all lexical candidates consistent with the input, and these entries in turn are linked to semantic representations of meaning (Figure 1). This description highlights two notable features of the linguistic architecture. First, since these representations are situated across multiple levels, their activation within the system is ordered. Thus some degree of phonological processing must logically precede lexical processing, since the relevant phonemic features must be analyzed in order for a word to be recognized. Second, and critically, these linguistic procedures are not strictly sequential: analysis at one level of representation can begin before analysis at the preceding level is complete.
A particularly persuasive illustration of this comes from a study by Yee & Sedivy (2006), demonstrating that hearing a word not only activates other words with overlapping phonological representations but also activates the semantic associates of words in this phonological cohort. For example, Yee and Sedivy found that adults who were instructed to select a picture of logs made spurious looks to a picture of a key in the display. This presumably occurred because the word logs activated absent members of its phonological cohort like lock, which led to semantic priming of related concepts like key. This short-lived activation of the phono-semantic competitor was time-locked to the initial 300 ms of ambiguity between the Target and the mediating phonological associate. Findings such as these demonstrate that adult word recognition is characterized by an informational cascade whereby partial phonological information incrementally activates semantic representations.
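The cascade described above can be illustrated with a toy simulation. This is a minimal sketch, not Yee and Sedivy's model or any published architecture: the phoneme transcriptions, the five-word lexicon, the equal sharing of activation among cohort members and the lock-to-key link weight of 0·5 are all hypothetical simplifications. It contrasts a cascaded system, in which partially activated word forms pass activation to semantic associates immediately, with a strictly serial one, in which nothing reaches the semantic level until a single candidate remains.

```python
# Hypothetical toy lexicon: each word is a tuple of simplified phoneme symbols.
LEXICON = {
    "logs": ("l", "O", "g", "z"),
    "lock": ("l", "O", "k"),
    "leaf": ("l", "i", "f"),
    "key": ("k", "i"),
    "whale": ("w", "e", "l"),
}

# Hypothetical semantic links from lexical entries to related concepts.
SEMANTIC_LINKS = {
    "lock": {"key": 0.5},
    "key": {"lock": 0.5},
}


def cohort(prefix):
    """Words whose onset matches the phonemes heard so far."""
    n = len(prefix)
    return [w for w, phones in LEXICON.items() if phones[:n] == tuple(prefix)]


def lexical_activation(prefix):
    """Share activation equally among cohort members (a simplifying assumption)."""
    members = cohort(prefix)
    return {w: 1.0 / len(members) for w in members} if members else {}


def semantic_activation(lex_act, cascade=True):
    """Pass lexical activation on to each word's own meaning and its associates.

    cascade=True: partial activation propagates immediately.
    cascade=False: nothing propagates until one candidate remains (serial stages).
    """
    if not cascade and len(lex_act) != 1:
        return {}
    sem = {}
    for word, act in lex_act.items():
        sem[word] = sem.get(word, 0.0) + act
        for concept, weight in SEMANTIC_LINKS.get(word, {}).items():
            sem[concept] = sem.get(concept, 0.0) + act * weight
    return sem


def key_trace(target="logs", cascade=True):
    """Activation of the concept 'key' after each successive phoneme of the target."""
    phones = LEXICON[target]
    return [
        semantic_activation(lexical_activation(phones[:i]), cascade).get("key", 0.0)
        for i in range(1, len(phones) + 1)
    ]
```

In this sketch, key is activated only while logs and lock share their onset and drops to zero once /g/ rules lock out, mirroring the short-lived, time-locked priming described above; the serial variant never activates key at all.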
But how might this ability develop? Is this informational cascade a basic architectural feature of the lexicon, or is it a late-emerging capacity? To explore these questions, we looked for evidence of cascaded processing in children's word recognition. Prior developmental research provides ample evidence that children rapidly use phonological information to restrict reference in visual forced-choice tasks (Swingley, Pinto & Fernald, 1999; Swingley & Aslin, 2000; Fernald, Swingley & Pinto, 2001; Sekerina & Brooks, 2007). For example, infants aged 1;6 reliably fixate the correct referent of a word after hearing only its onset (e.g. the /bei/ in baby; Fernald et al., 2001). Furthermore, when asked to identify a word like doggie, two-year-olds are slower to look at the referent when it is paired with a member of the same phonological cohort, like doll, than when it is paired with a non-cohort member, like tree (Swingley et al., 1999).
However, while these findings demonstrate that children can rapidly use phonological information during reference restriction, they do not provide direct evidence that semantic representations are incrementally activated as words unfold. Specifically, findings of this kind could also be explained by two other types of mechanism. First, rapid reference resolution could rely on direct links between phonological form and the visual form of the referent that bypass semantics altogether. Many studies of children's lexical processing include a familiarization phase, which could facilitate direct mappings of this kind by providing the child with repeated pairings of the target picture (e.g. dog) and the label (e.g. ‘That's a doggie!’). In studies without familiarization trials (Swingley & Aslin, 2000; Sekerina & Brooks, 2007), any direct link between phonological form and a referent would have to reflect the child's prior beliefs about prototypical referents for that label, rather than direct mappings to the experimental pictures.
Second, in these earlier studies, looks to depicted referents could reflect silent spontaneous naming. Seeing pictures of common objects may lead children to spontaneously retrieve nouns, activating their phonological forms. Once these forms are active, the incoming speech can be compared against them without transferring information from the phonological level to the semantic level. This alternative gains plausibility from recent work documenting that very young children do spontaneously activate the labels of depicted objects during preferential looking tasks, even when the object has never been named during the experiment, resulting in phonological interference on subsequent trials (Mani & Plunkett, 2010). Thus, while the existing data from young children are consistent with cascaded lexical processing, these findings could be explained by other plausible mechanisms. No study to date has provided direct evidence of the incremental transfer of information from phonological to semantic representations in young children.
To address this question, we need a measure of semantic activation that does not rely on looks to a displayed referent. We accomplished this by adapting the task from Yee & Sedivy (2006) for use with five-year-olds. Children in this age range are of particular interest because they are linguistically competent by most measures, yet they differ from adults in many important ways. Unlike adults, most five-year-olds are functionally illiterate, have substantially smaller vocabularies and possess limited metalinguistic awareness. Thus their experiences with language are considerably different from those of the well-educated adults who are typically studied. Furthermore, children at this age differ from adults on other cognitive measures. They have smaller memory spans (Dempster, 1981; Schneider & Bjorklund, 1998), slower processing speed (Kail, 1991; Kail & Salthouse, 1994) and are notoriously poor at tasks that require the inhibition of dominant responses (Piaget, 1946; Flavell, 1986; Welsh, Pennington & Groisser, 1991; Passler, Isaac & Hynd, 1985; Perner & Wimmer, 1985; Hughes & Graham, 2002).
These differences could have profound implications for the development of the language processing system. For example, resource limitations or slower processing speed might hamper children's ability to simultaneously activate phonological and semantic representations. Similarly, poor inhibitory processing could make it more difficult for children to deactivate semantic competitors, possibly increasing the costs of incrementality. Prior research on developmental sentence processing suggests that comprehension in children may be more modular, or more dependent on bottom-up information, than comprehension in adults (Traxler, 2002; Joseph, Liversedge, Blythe, White, Gathercole & Rayner, 2008; Trueswell, Sekerina, Hill & Logrip, 1999; Snedeker & Trueswell, 2004; Mazzocco, 1997; Doherty, 2004; Huang & Snedeker, 2009). This is precisely the pattern we would expect if children were less incremental, resolving ambiguity at lower levels before passing information on to higher ones.
In the following experiment, adults and children were asked to select a target (logs) in the presence of a competitor (key) that was semantically related to an absent phonological associate (lock). If incremental propagation of information across multiple levels of representation is a late-developing property of comprehension, we would expect children to generate few or no looks to the phono-semantic competitor. If, however, it is an inherent constraint of the architecture of the processing system, we would expect these looks to be common in children as well as adults.
EXPERIMENT 1
Methods
Participants
Twenty-six undergraduate students and thirty five-year-olds (ranging from 5;2 to 5;7, mean age 5;5) participated in this study. All participants were native English speakers.
Procedure and materials
Participants sat in front of an inclined podium divided into four quadrants, each containing a shelf where pictures could be placed (Figure 2). A camera at the center of the display was focused on the participant's face and recorded the direction of their gaze while they performed the task. A second camera recorded both the location of the items in the display and participants' subsequent actions. On every trial, the experimenter took out four pictures and placed one on each shelf in a prespecified order. This presentation took approximately five seconds. The experimenter then played a prerecorded utterance on a computer, which instructed participants to select one of the pictures (‘Pick up the logs’).
We defined the Target (logs) as the picture specified by the instruction. On critical trials, the Competitor (key) was semantically related to an absent member of the Target's phonological cohort (lock). The average duration of phonological ambiguity between the Target and this phonological associate was 300 ms, and the average semantic similarity between the Competitor and the phonological associate was M=0·18 (SD=0·20).¹ To avoid other potential sources of priming, the Competitor was selected to be both phonologically unrelated to the Target (i.e. not sharing an onset cluster) and semantically unrelated to it (M<0·01). On control trials, the Competitor was replaced with an unrelated Control item (carrot) that was phonologically and semantically unrelated to both the Target and its phonological associate (both Ms<0·01). In each display, the Target and the Competitor/Control item were paired with two additional Distractors that were selected using the same criteria as the Control item (whale and shirt). Pictures of these items were pretested with a separate group of participants to ensure that they spontaneously named the images with the intended words.
Sixteen base triplets, each consisting of a Target, a Competitor and a Control item, were used to generate two versions of each item (Critical vs. Control trial). These versions were distributed across two presentation lists so that each list contained eight items in each condition and each base item appeared just once per list (Appendix). Every item that appeared as a Control item on one list appeared as a Competitor on the other list, ensuring that any differences between the two item types could not be due to differences in the perceptual salience of a particular item. Each of the two lists was presented to half of the participants.
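The counterbalancing constraints above can be made explicit with a small checker. This is a sketch under an assumed pairing scheme that the text does not spell out: base triplets are taken to come in pairs whose Competitor and Control pictures are swapped, and both members of a pair share a condition on one list and switch condition on the other. The item labels in the toy example are hypothetical placeholders, not the experimental items listed in the Appendix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    target: str
    competitor: str  # related to an absent cohort member of the target
    control: str     # unrelated to the target and to its cohort member


def build_lists(triplets, pairs):
    """Assign a condition to every base item on each of two lists.

    `pairs` holds (i, j) index pairs whose Competitor and Control pictures are
    swapped (an assumed scheme). Alternate pairs start in opposite conditions,
    so each list ends up with half the items in each condition.
    """
    list_a, list_b = {}, {}
    for k, (i, j) in enumerate(pairs):
        cond_a = "critical" if k % 2 == 0 else "control"
        cond_b = "control" if cond_a == "critical" else "critical"
        for idx in (i, j):
            list_a[idx] = cond_a
            list_b[idx] = cond_b
    return list_a, list_b


def shown_pictures(triplets, assignment, condition):
    """Pictures filling the Competitor/Control slot for items in `condition`."""
    return {
        triplets[i].competitor if c == "critical" else triplets[i].control
        for i, c in assignment.items()
        if c == condition
    }


def check_lists(triplets, list_a, list_b):
    """True if both lists satisfy the constraints described in the text."""
    n = len(triplets)
    for assignment in (list_a, list_b):
        if sorted(assignment) != list(range(n)):
            return False  # every base item must appear exactly once per list
        if sum(c == "critical" for c in assignment.values()) != n // 2:
            return False  # half the items must be in each condition
    # Every Control picture on one list must appear as a Competitor on the other.
    return (
        shown_pictures(triplets, list_a, "control") == shown_pictures(triplets, list_b, "critical")
        and shown_pictures(triplets, list_b, "control") == shown_pictures(triplets, list_a, "critical")
    )


# Hypothetical toy items: two swapped pairs.
TOY_TRIPLETS = [
    Triplet("target_0", "key", "carrot"),
    Triplet("target_1", "carrot", "key"),
    Triplet("target_2", "sock", "bell"),
    Triplet("target_3", "bell", "sock"),
]
LIST_A, LIST_B = build_lists(TOY_TRIPLETS, [(0, 1), (2, 3)])
```

Under the same assumption, sixteen triplets arranged as eight such pairs would yield eight items per condition on each list.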
Eye-movements were coded frame by frame from the videotape of the participant's face by a research assistant who was blind to the location of each picture. Each recorded trial began at the onset of the instruction and ended with completion of the corresponding action. Each change in gaze direction was coded as directed towards one of the quadrants, towards the center, or as missing because of looks away from the display or blinking. Twenty-five percent of the trials were checked by a second coder, who agreed on the fixation locations for 96·1% of the coded frames. This method of measuring eye-movements has produced data equivalent to those collected using head-mounted eye-tracking (see Appendix D of Snedeker & Trueswell, 2004).
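Frame-level agreement of the kind reported above can be computed as simple percent agreement. The sketch assumes each coder's output is a list holding one gaze code per video frame; the quadrant labels TL, TR, BL and BR are hypothetical names for the four shelves.

```python
# Gaze codes from the text: one of four quadrants, the center, or missing
# (looks away from the display or blinks). Quadrant names are hypothetical.
VALID_CODES = {"TL", "TR", "BL", "BR", "center", "missing"}


def frame_agreement(coder_a, coder_b):
    """Percentage of frames on which two coders assigned the same gaze code."""
    if not coder_a or len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same non-empty set of frames")
    unknown = (set(coder_a) | set(coder_b)) - VALID_CODES
    if unknown:
        raise ValueError(f"unknown gaze codes: {sorted(unknown)}")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa is a common alternative.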
Results and discussion
Figures 3 and 4 illustrate that Target looks for both adults and children were initially around chance prior to the onset of the critical word and rapidly increased following this target word. To assess the degree of phono-semantic priming, we calculated the total looking time to the Competitor or Control as a proportion of looking time to all four cards. Each time window began and ended 200 ms after the relevant marker in the speech stream to account for the time needed to program saccadic eye-movements (Matin, Shao & Boff, 1993) and was analyzed with both subjects and items ANOVAs.
We first examined fixations during a baseline period prior to the onset of the target word (−400 through 100 ms window) and found no difference in looks to the Competitor and Control pictures in either adults (31% vs. 29%) or children (28% vs. 24%; all ps>0·15). However, following the onset of the target word (logs), looks in these conditions began to diverge. To establish when differences emerged, we calculated the proportion of fixations to the Competitor and Control pictures in 100 ms intervals beginning at the onset of the target word and continuing until 1000 ms later. Each of the eight time windows (200–900 ms) is defined as the period from the labeled time point to the frame before the onset of the next interval.
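The window computation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: each trial is assumed to be a list of per-frame gaze codes starting at instruction onset with a fixed frame duration, looking time is approximated by frame counts, the proportion is taken over frames on any of the four cards (so center and missing frames are excluded), and every speech-defined window is shifted by 200 ms for saccade programming. Quadrant labels are hypothetical.

```python
QUADRANTS = ("TL", "TR", "BL", "BR")  # hypothetical labels for the four cards


def window_proportion(frames, frame_ms, start_ms, end_ms, picture, saccade_offset_ms=200):
    """Proportion of card-directed frames on `picture` in a speech-defined window.

    The window [start_ms, end_ms) is measured from instruction onset and shifted
    by `saccade_offset_ms`. Returns None if no frame in the window is on a card.
    """
    lo, hi = start_ms + saccade_offset_ms, end_ms + saccade_offset_ms
    on_cards = [code for i, code in enumerate(frames)
                if lo <= i * frame_ms < hi and code in QUADRANTS]
    if not on_cards:
        return None
    return sum(code == picture for code in on_cards) / len(on_cards)


def binned_proportions(frames, frame_ms, word_onset_ms, picture,
                       n_bins=10, bin_ms=100, saccade_offset_ms=200):
    """Consecutive `bin_ms` windows starting at the onset of the target word."""
    return [
        window_proportion(frames, frame_ms,
                          word_onset_ms + k * bin_ms,
                          word_onset_ms + (k + 1) * bin_ms,
                          picture, saccade_offset_ms)
        for k in range(n_bins)
    ]
```

Averaging these per-trial proportions over trials, and then over participants or items, would give the inputs to the subjects and items ANOVAs; that step is not shown here.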
In adults, an omnibus ANOVA revealed a significant interaction between time window and trial type (Critical vs. Control) (F1(7,175)=2·96, p=0·006, η2=0·11; F2(7,105)=2·06, p=0·05, η2=0·12). Follow-up analyses revealed that fixations to the Competitor were greater than those to the Control in the 300 ms (F1(1,24)=4·89, p=0·03, η2=0·19; F2(1,15)=4·23, p=0·06, η2=0·22) and 400 ms time windows (F1(1,24)=4·54, p=0·04, η2=0·18; F2(1,15)=3·25, p=0·09, η2=0·18).² Like Yee & Sedivy (2006), we found evidence of a short-lived activation of the phono-semantic competitor that was time-locked to the initial ambiguity between the Target and the mediating phonological associate. A parallel ANOVA on children's fixations also revealed a significant interaction between time window and trial type (F1(7,203)=2·41, p=0·02, η2=0·08; F2(7,105)=1·59, p=0·15, η2=0·10). Follow-up analyses revealed that fixations to the Competitor were greater than those to the Control from the 200 ms (F1(1,28)=3·98, p=0·05, η2=0·13; F2(1,15)=2·87, p=0·11, η2=0·16) through the 600 ms time window (F1(1,28)=4·45, p=0·04, η2=0·13; F2(1,15)=3·88, p=0·07, η2=0·21). Thus children, like adults, showed a period of semantic priming from a phonological competitor.
We compared the degree of priming in these two groups by analyzing the mean proportion of looks to the Competitor/Control pictures in an ANOVA with trial type (Critical vs. Control) as a within-subjects variable and age (Adult vs. Child) as a between-subjects variable. We focus on the region of significant priming in children (200–600 ms window) to determine whether priming in this group differed significantly from the priming seen in adults. Figure 5 illustrates that looks to the Competitor were significantly greater than those to the Control picture among both adults (20% vs. 14%) and children (23% vs. 15%) (F1(1,54)=12·86, p=0·001, η2=0·19; F2(1,30)=9·24, p=0·005, η2=0·24). However, there were no effects of age or interaction between age and trial type (all ps>0·20), suggesting that children exhibited the same degree of phono-semantic priming as adults.
Surprisingly, the children's actions provided additional insight into the development of lexical processing. While adults made no errors in this task, children mistakenly selected a non-Target picture on 4% of all trials. Figure 6 illustrates that while children were equally likely to select a Distractor object in the two trial types (ps>0·80, Fisher's exact test), they were far more likely to mistakenly select the Competitor on critical trials than to select the matched Control item on control trials (p=0·01, Fisher's exact test). This suggests that children were sometimes unable to inhibit the activation of the phono-semantic prime. Altogether, our findings suggest that early lexical processing involves cascading activation across levels of representation: partial phonological activation of word forms is propagated up to the semantic level, resulting in eye-movements to (and sometimes selection of) semantic associates.
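Fisher's exact test, used above for the error counts, compares two proportions in a 2×2 table without relying on large-sample approximations, which suits rare events such as these selection errors. The sketch below computes the two-sided p-value by summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one; this is the convention used, for example, by R's fisher.test and SciPy's scipy.stats.fisher_exact. The table in the usage comment is a textbook-style example, not the children's error data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows might be trial types (critical, control) and columns outcomes
    (selected the picture of interest, did not).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


# Example: fisher_exact_two_sided(8, 2, 1, 5) is about 0.035
```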
However, inspection of Figures 3 and 4 suggests one potential limitation of these data. While the significant preference for the Competitor over the Control item did not appear until after the onset of the critical word, a small, non-significant difference between the two emerged towards the end of the baseline period. These early looks to the Competitor could reflect processing of the word based on coarticulatory information: because we did not splice the instructions, participants may have had access to relevant acoustic information before the first 100 ms time window. Alternatively, this difference could reflect differences in visual salience. While we attempted to control for salience by using the same pictures as Competitors and Controls, the salience of an item in context presumably depends on the other items in the scene, which were necessarily different across the two trial types. Finally, it could simply be noise.
To explore whether the preference for the Competitor could be due to perceptual biases of this kind, we conducted two additional analyses. First, we compared the proportion of looks to the Competitor/Control pictures on critical and control trials across two time periods of interest: the baseline period prior to the onset of the target word (‘Pick up the’) and the critical windows associated with phono-semantic priming (300–400 ms window in adults and 200–600 ms window in children). Critically, we found a significant interaction between trial type and time period (F1(1,54)=6·21, p=0·02, η2=0·10; F2(1,30)=4·17, p=0·05, η2=0·12), indicating that the onset of the target word was followed by an increased preference for the Competitor. There was no further interaction with age (all ps>0·50), suggesting that this effect did not differ across the two groups. Second, we examined the subset of items for which looks to the Competitor and Control item were matched prior to the onset of the critical word (eight of sixteen items). In adults, looks to the Competitor did not differ from those to the Control item during the baseline period (29% vs. 29%; p>0·90), but during the critical window, looks to the Competitor exceeded those to the Control item (34% vs. 17%; F1(1,25)=8·39, p=0·01, η2=0·25). A similar pattern emerged in children: looks to the Competitor and Control item did not differ during the baseline period (28% vs. 29%, respectively; p>0·90), but during the critical window, looks to the Competitor exceeded those to the Control item (26% vs. 16%; F1(1,29)=19·14, p=0·001, η2=0·31).
A more direct way to determine whether looks to the Competitor truly reflect lexical access of the Target is to use the same displays but modify the instructions to ask for an unrelated picture (e.g. ‘Pick up the shirt’). If the earlier preference for the Competitor is not specifically linked to linguistic processing, then we should again find greater fixations to the Competitor. If, however, this preference reflects phono-semantic priming, then looks to the Competitor should no longer differ from those to the Control.
EXPERIMENT 2
Methods
Participants
Twenty-six undergraduate students and thirty five-year-olds (ranging from 5;1 to 5;6, mean age 5;3) participated in this study. All participants were native English speakers.
Procedure and materials
The procedure and materials were identical to those of Experiment 1, except that the target utterance now asked for a Distractor, e.g. ‘Pick up the shirt’. We refer to this picture as the Target but continue to refer to the pictures of interest as the Competitor and Control items. The data were coded in the manner described in Experiment 1. Twenty-five percent of trials were double coded, and inter-coder reliability was 95·6%.
Results and discussion
Figures 7 and 8 illustrate that Target looks for both adults and children again began around chance prior to the onset of the critical word and rapidly increased following this target word. As in Experiment 1, we found no difference in the proportion of looks to the Competitor and Control pictures prior to the onset of the target word in either adults (22% vs. 23%) or children (27% vs. 29%; all ps>0·50). We then calculated the proportion of Competitor and Control fixations in 100 ms intervals beginning at the onset of the target word and continuing until 1000 ms later. However, unlike in Experiment 1, an omnibus ANOVA revealed no significant interaction between time window and trial type (Control vs. Critical) in either adults or children (all ps>0·50). A closer examination of the fine-grained time windows also revealed no effect of trial type in any of the individual intervals (all ps>0·20).
Next we focused on Competitor/Control looks during the significant priming windows established in Experiment 1 (300–400 ms window in adults and 200–600 ms window in children). Using an ANOVA, we compared how looks to these items varied with respect to trial type (Critical vs. Control) as a within-subjects variable and Experiment (1 vs. 2) as a between-subjects variable. Both adults (F1(1,50)=6·61, p=0·01, η2=0·12; F2(1,30)=3·77, p=0·06, η2=0·11) and children (F1(1,58)=5·86, p=0·02, η2=0·10; F2(1,30)=5·25, p=0·03, η2=0·15) demonstrated the predicted interaction between trial type and Experiment. This suggests that looks to the Competitor were only greater than looks to the Control item in situations where the Competitor was semantically related to a phonological associate of the spoken target word.
Finally, as in Experiment 1, adults never made errors in their actions in this task. Children also made fewer incorrect selections in this task than in Experiment 1 (1% vs. 4% of all trials, Z=2·05, p=0·04). Critically, the frequency of errors in Experiment 2 did not differ between selections of the Competitor and Control items (p>0·40, Fisher's exact test). Focusing just on the Critical trials, we found a greater tendency to select the Competitor in Experiment 1 than in Experiment 2 (p=0·02, Fisher's exact test). This suggests that children's errors were driven by their failure to inhibit the activation of the phono-semantic prime.
General discussion
This study demonstrates the presence of an informational cascade in early word recognition. Like adults, children map partial speech input onto phonological representations, which in turn activate candidate lexical entries and their semantic representations. These findings provide converging evidence that the ability to incrementally process information across multiple levels of representation is a basic architectural feature of the lexicon (Swingley et al., 1999; Fernald et al., 2001; Sekerina & Brooks, 2007). However, our results also point to a possible difference between the two age groups. Unlike adults, children continued to look at the phono-semantic prime even after the ambiguity between the referent and the mediating phonological associate had been resolved. Furthermore, children were more likely to mistakenly select this prime than an unrelated item. Thus while adults are able to use subsequent phonological information to swiftly rule out the phono-semantic competitor, children sometimes fail to do so.
This raises the possibility that children are less adept at resolving the competition between the target and the phono-semantic prime. Parallel difficulties in overriding an initial misinterpretation occur in a variety of linguistic domains, ranging from syntactic ambiguity resolution (Trueswell et al., 1999; Snedeker & Trueswell, 2004) to homonym interpretation (Mazzocco, 1997; Doherty, 2004) and pragmatic inferencing (Huang & Snedeker, 2009). For example, Trueswell and his colleagues (1999) presented adults and five-year-olds with temporarily ambiguous sentences like ‘Put the frog on the napkin in the box’. When the sentences were presented in contexts with just one frog, both adults and children initially misinterpreted the first prepositional phrase (‘on the napkin’) as a location. However, when adults heard the disambiguating phrase (‘in the box’), they quickly reinterpreted the preceding phrase as a modifier of the noun. Children, by contrast, never made this revision and continued to interpret the phrase as a location, even performing actions that reflected this misanalysis.
Novick and colleagues have suggested that children's inability to revise, despite the presence of incongruent linguistic cues, may be due in part to the immaturity of cognitive control mechanisms at this age (Novick, Trueswell & Thompson-Schill, 2005). Cognitive control, it is argued, is necessary for any task in which one must reconcile conflicting information or override a preferred analysis. These abilities continue to develop throughout middle childhood, as evidenced by children's poor performance on measures such as the Stroop task, the go/no-go task (Bunge, Dudukovic, Thomason, Vaidya & Gabrieli, 2002), delayed-response tasks (Diamond & Doar, 1989) and tasks of selective attention (Luciana & Nelson, 1998; Pearson & Lane, 1991). This would also be in line with a recent study demonstrating that children's ability to inhibit a default interpretation during language comprehension is related to their performance on a Dimensional Change Card Sorting task (Jincho, Mazuka & Yamane, 2007).
However, while the cognitive control hypothesis seeks explanations for children's linguistic behavior in co-occurring changes across multiple domains, an alternative strategy is to closely examine the process of word recognition itself for mechanisms that might account for the observed differences between adults and children. Two possibilities come to mind. First, the more persistent activation of the phono-semantic prime in children could reflect slower or less efficient processing of the incoming phonological information. This could result in a weaker advantage for the target word form relative to the absent cohort competitor, and thus lead to continued interference from the phono-semantic competitor. Our results provide some support for this hypothesis. In the first 100 ms window following the onset of the target word, adults' average looks to the Target exceeded those of children in both Experiment 1 (36% vs. 27%; F1(1,54)=3·91, p=0·05, η2=0·09; F2(1,30)=3·10, p=0·09, η2=0·09) and Experiment 2 (38% vs. 28%; F1(1,54)=11·33, p=0·001, η2=0·17; F2(1,30)=7·88, p=0·009, η2=0·21). This latter difference is particularly informative, since it suggests that children's delays were not driven solely by the semantic priming of the Competitor but might instead reflect reduced efficiency of bottom-up activation from the speech signal. This hypothesis also provides an alternative account of recent findings demonstrating extended phonological cohort competition in children at this age (e.g. looks to a lock after hearing logs; Sekerina & Brooks, 2007).
Second, the children's failure could reflect the immaturity of a mechanism that inhibits competing lexical representations. Such mechanisms are a common feature of current models of adult word recognition. For example, the TRACE model (McClelland & Elman, 1986) includes both excitatory connections between phonological and lexical units and inhibitory connections between units at the same level. These latter connections serve an important role in resolving competition among active candidate forms, yet they do so through a much more bottom-up process: inhibition of one node is a passive result of the activation of some other node. Thus, on this account, a developmental change in the inhibition of lexical competitors would be captured by increasing the strength of these local inhibitory connections over time. Such a proposal seems quite different in spirit from one invoking the development of a central control process.
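The idea that lateral inhibition between same-level units resolves competition passively, and that strengthening it would speed that resolution, can be illustrated with a two-unit toy network. This is a minimal sketch, not TRACE itself: it has no decay, no phoneme layer and no time-aligned copies of units, and the bottom-up input, number of ambiguous steps, threshold and inhibition strengths (`gamma`) are all hypothetical values. The target and competitor units receive equal input while the input is ambiguous; afterwards only the target does, so the competitor is driven down solely by inhibition from the target.

```python
def clip(x, lo=0.0, hi=1.0):
    """Keep activations within [lo, hi]."""
    return max(lo, min(hi, x))


def simulate(gamma, ambiguous_steps=3, total_steps=20, bottom_up=0.2):
    """Activations of a target and a competitor unit that inhibit each other.

    Both units get `bottom_up` input during the ambiguous steps; afterwards only
    the target does. Each unit is inhibited in proportion (gamma) to the other's
    activation, and both are updated synchronously from the previous step.
    """
    target = competitor = 0.0
    trace = []
    for step in range(total_steps):
        competitor_input = bottom_up if step < ambiguous_steps else 0.0
        new_target = clip(target + bottom_up - gamma * competitor)
        new_competitor = clip(competitor + competitor_input - gamma * target)
        target, competitor = new_target, new_competitor
        trace.append((target, competitor))
    return trace


def steps_to_suppress(gamma, threshold=0.1, ambiguous_steps=3, total_steps=20):
    """Steps after disambiguation until the competitor falls below threshold (None if never)."""
    trace = simulate(gamma, ambiguous_steps, total_steps)
    for step, (_, competitor) in enumerate(trace):
        if step >= ambiguous_steps and competitor < threshold:
            return step - ambiguous_steps + 1
    return None
```

With these toy settings, raising `gamma` from 0.1 to 0.3 cuts the time needed to suppress the competitor from six steps to three, and with no inhibition the competitor is never suppressed (the sketch has no decay). Strengthening local inhibitory weights alone, with no central controller, changes how quickly competition is resolved.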
These experiments suggest several lines of inquiry. First, they raise the question of whether there are early individual differences in the processes underlying word recognition and whether these differences have implications for later development. Recent developmental work suggests that there are robust individual differences in the speed of word recognition in infancy that predict differences in linguistic and cognitive abilities throughout early childhood (Fernald, Perfors & Marchman, 2006; Marchman & Fernald, 2008). Work on adult word recognition has also highlighted individual differences in frequency and cohort effects, which in turn influence the speed of lexical processing (Mirman, Dixon & Magnuson, 2008). Second, these results raise the question of whether incremental propagation is present at even earlier stages of lexical development. We are currently using this procedure to examine word recognition in three-year-olds. Evidence of phono-semantic priming at this age would provide further support for the hypothesis that incremental propagation is a basic architectural feature of the lexicon that is present early in development.