Introduction
Processing spoken language is a complex task, in part due to the multi-dimensional and variable characteristics of speech. This ability continues to develop throughout childhood, even into adolescence (Rigler, Farris-Trimble, Greiner, Walker, Tomblin & McMurray, Reference Rigler, Farris-Trimble, Greiner, Walker, Tomblin and McMurray2015). Language complexity is further intensified in multilingual contexts: learners who are exposed to more than one language also receive more variable speech input (Byers-Heinlein & Fennell, Reference Byers-Heinlein and Fennell2014), and the multiplicity of phonetic cues from different languages can create challenges for the learner. The way that bilingual learners cope with variability is not well documented, especially in the context of spoken word recognition. Coarticulation, a process where sounds in words influence each other (Fowler, Reference Fowler1980), is an ever-present source of phonetic variability. Here, we investigate how children use fine-grained phonetic information (i.e., coarticulation) during word processing. First, we examine potential developmental changes between younger and older monolingual children. Second, we compare how older monolingual and bilingual children process coarticulation cues within words to examine the influence of bilingualism on language processing.
Coarticulation processing in adults and children
Coarticulation and phonetic variability impact spoken word recognition in adults (Archibald & Joanisse, Reference Archibald and Joanisse2011; Beddor, McGowan, Boland, Coetzee & Brasher, Reference Beddor, McGowan, Boland, Coetzee and Brasher2013; Dahan, Magnuson, Tanenhaus & Hogan, Reference Dahan, Magnuson, Tanenhaus and Hogan2001; Desmeules-Trudel & Zamuner, Reference Desmeules-Trudel and Zamuner2019; Gow, Reference Gow2003; McMurray, Clayards, Tanenhaus & Aslin, Reference McMurray, Clayards, Tanenhaus and Aslin2008; McMurray, Tanenhaus & Aslin, Reference McMurray, Tanenhaus and Aslin2002; Salverda, Kleinschmidt & Tanenhaus, Reference Salverda, Kleinschmidt and Tanenhaus2014) and children (Cross & Joanisse, Reference Cross and Joanisse2018; Johnson & Jusczyk, Reference Johnson and Jusczyk2001; Mahr, McMillan, Saffran, Weismer & Edwards, Reference Mahr, McMillan, Saffran, Ellis Weismer and Edwards2015; Paquette-Smith, Fecher & Johnson, Reference Paquette-Smith, Fecher and Johnson2016; Zamuner, Moore & Desmeules-Trudel, Reference Zamuner, Moore and Desmeules-Trudel2016). One coarticulatory phenomenon that influences auditory word recognition in monolingual English-speaking adults and young children is vowel nasalization (Beddor et al., Reference Beddor, McGowan, Boland, Coetzee and Brasher2013; Zamuner et al., Reference Zamuner, Moore and Desmeules-Trudel2016). Regressive vowel nasalization is a well-known coarticulatory pattern in English: vowels are partially nasalized before nasal consonants (Beddor, Reference Beddor2009; Cohn, Reference Cohn1990). The gesture associated with the nasal consonant (i.e., lowering of the velum) starts early and overlaps with the preceding vowel. 
This results in a vowel that has a different acoustic quality than when it is followed by a non-nasal consonant, likely due to a general loss in spectral energy (Delattre, Reference Delattre1965; Maeda, Reference Maeda, Huffman and Krakow1993) and the emergence of nasal formants and antiformants (Fujimura, Reference Fujimura1962; Kurowski & Blumstein, Reference Kurowski, Blumstein, Huffman and Krakow1993). In perception, Beddor et al. (Reference Beddor, McGowan, Boland, Coetzee and Brasher2013) found that English-speaking adult listeners recognized words with nasal consonants (e.g., scent) more quickly when the vowel was nasalized for a longer proportion of its duration than when it was nasalized for a shorter period. This indicates that coarticulation cues that occur early in the speech signal contribute to faster recognition of the target than coarticulation cues that occur late in the signal, even though nasalization cues are non-contrastive, optional, and variable in English. This finding demonstrates that listeners use coarticulatory vowel nasalization as a cue to word recognition. This evidence also suggests that word representations are rich and include some amount of fine-grained phonetic detail (see also Browman & Goldstein, Reference Browman and Goldstein1986; Pierrehumbert, Reference Pierrehumbert, Gussenhoven and Warner2002), which is traditionally considered redundant in the literature on lexical and phonological storage (Archangeli, Reference Archangeli1988; Keating, Reference Keating1988; Lahiri & Marslen-Wilson, Reference Lahiri and Marslen-Wilson1991; Steriade, Reference Steriade and Goldsmith1995).
Only a few studies have investigated the influence of coarticulation on word processing in development. Mahr et al. (Reference Mahr, McMillan, Saffran, Ellis Weismer and Edwards2015) demonstrated the influence of coarticulation across word boundaries in 18–24-month-olds. In another study, Paquette-Smith et al. (Reference Paquette-Smith, Fecher and Johnson2016) showed that two-year-old toddlers (2;0 to 2;5 years old) were able to detect mismatches in a variety of coarticulation cues within words, suggesting that children can use subphonemic (i.e., fine-grained phonetic detail) information during spoken word recognition. This finding is further supported by Zamuner et al. (Reference Zamuner, Moore and Desmeules-Trudel2016), who found that two- and three-year-old (monolingual) toddlers, as well as adults, use regressive vowel nasalization during online spoken word recognition. In the procedure, participants were presented with spoken words that had been cross-spliced to manipulate coarticulation cues. For example, the oral vowel from the word boat [boʊt] was replaced with the nasalized vowel from the word bone [bõʊ̃n] (the tilde represents nasalization on the vowel), yielding a stimulus with a mismatching nasalized vowel [bõʊ̃t]. Adults and children were able to perceive the mismatching vowel nasalization, as indicated by the fact that they looked towards the image of the bone when they heard boat presented with a nasalized vowel. However, while adults ended up fixating the target boat well above chance by the end of the mismatching ([bõʊ̃t]) trials, young children were unable to resolve the ambiguity caused by the cross-splicing. Instead, children hovered around chance towards the end of the trial, as if they could not decide which of the two words they had heard. 
This pattern of results showed time-dependent sensitivity to coarticulation cues in toddlers, but also suggested that young listeners had difficulty resolving phonetic mismatches, which could be attributed to children's relative inefficiency in resolving lexical competition compared to adults (Huang & Snedeker, Reference Huang and Snedeker2011; Rigler et al., Reference Rigler, Farris-Trimble, Greiner, Walker, Tomblin and McMurray2015; Sekerina & Brooks, Reference Sekerina and Brooks2007; Swingley, Pinto & Fernald, Reference Swingley, Pinto and Fernald1999). As a tentative explanation based on Huang and Snedeker (Reference Huang and Snedeker2011) who found that five-year-old children showed continued interference from a competitor word, Zamuner et al. (Reference Zamuner, Moore and Desmeules-Trudel2016) hypothesized that the smaller number of exemplars in memory may yield less robust word representations, and therefore result in lower activation of the target word when the auditory stimuli contained mismatches. Another complementary hypothesis was based on the less mature processing system in toddlers, yielding different competitor inhibition mechanisms (Huang & Snedeker, Reference Huang and Snedeker2011) and thus difficulties in recognizing ambiguous stimuli. This is an open question, as little work has examined the development of the link between spoken word processing and competitor inhibition in children (e.g., with nine-year-olds and sixteen-year-olds, see Rigler et al., Reference Rigler, Farris-Trimble, Greiner, Walker, Tomblin and McMurray2015). However, the general finding seems to be that children are slower than adults at activating targets and inhibiting competitors (Cross & Joanisse, Reference Cross and Joanisse2018; Huang & Snedeker, Reference Huang and Snedeker2011; Rigler et al., Reference Rigler, Farris-Trimble, Greiner, Walker, Tomblin and McMurray2015). In our study, we compare and extend the findings with toddlers from Zamuner et al. 
(Reference Zamuner, Moore and Desmeules-Trudel2016) with a group of older monolingual children (4;3 to 6;5 years old), to investigate whether older children are better able to resolve mismatching coarticulatory cues.
Second language perception and phonetic variability
In addition to competition and inhibition mechanisms, exposure to phonetic variability has been shown to significantly impact lexical processing and word learning in children (e.g., Rost & McMurray, Reference Rost and McMurray2009) as well as adult second language (L2) processing (Barcroft & Sommers, Reference Barcroft and Sommers2005). However, the picture is not as clear for bilingual children. In their review article, Byers-Heinlein and Fennell (Reference Byers-Heinlein and Fennell2014) argue that exposure to more than one language often results in more phonetic variation in the input for young learners, which could in turn result in maintained sensitivity to more phonetic contrasts than monolinguals. For example, bilinguals can be exposed to two languages from the same person or within code-switched sentences (Byers-Heinlein, Reference Byers-Heinlein2013), and speech sounds produced by bilinguals are often different from monolinguals (MacLeod & Stoel-Gammon, Reference Macleod and Stoel-Gammon2009). Bilingual learners are thus exposed to greater variability than monolinguals in general. Therefore, given that exposure to variability influences early lexical processing (Rost & McMurray, Reference Rost and McMurray2009), bilingual learners are expected to maintain sensitivity to phonetic distinctions in both of their languages (for a review of the process in adults see Flege, Reference Flege2007), i.e., they ought to discriminate more contrasts than monolinguals (Burns, Yoshida, Hill & Werker, Reference Burns, Yoshida, Hill and Werker2007; Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola & Nelson, Reference Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola and Nelson2008; Sundara, Polka & Genesee, Reference Sundara, Polka and Genesee2006). 
However, monolingual children are expected to maintain sensitivity to distinctions that are contrastive in their native language but lose this ability for foreign contrasts, as has been repeatedly shown in the literature (Kuhl et al., Reference Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola and Nelson2008).
To date, very little work has been conducted on the interplay between phonetic details and word recognition in bilingual children, a process that depends on the ability to distinguish sounds. Some work has examined how monolinguals and bilinguals process a Catalan vowel contrast between /ɛ/ and /e/, which maps to a single vowel category in Spanish. Children's sensitivity to this contrast appears to depend partly on the stimuli used. In one study which included cognates (Ramon-Casas, Swingley, Sebastián-Gallés & Bosch, Reference Ramon-Casas, Swingley, Sebastián-Gallés and Bosch2009), Catalan–Spanish bilingual children (aged 17 to 27 months) were insensitive to the /ɛ/-/e/ contrast. However, in a study using novel words (Ramon-Casas, Fennell & Bosch, Reference Ramon-Casas, Fennell and Bosch2017), bilinguals aged 21 and 22 months were able to perceive the /ɛ/-/e/ contrast. While the work by Ramon-Casas and colleagues illustrates the variation in phonemic perception between monolinguals and bilinguals, these studies do not examine how bilingual children cope with coarticulatory information within spoken words. It is important to make this distinction because phonetic cues differ across languages in how they are realized (Cohn, Reference Cohn1990). For example, as mentioned above, vowel nasalization is coarticulatory, non-contrastive and variable in English (Beddor, Reference Beddor2009); vowel nasalization is not necessary for recognizing words in English, e.g., the word scent can be realized with a vowel that is more or less nasalized and listeners will recognize the word anyway. 
However, languages like French have phonological nasalization on vowels (Cohn, Reference Cohn1990), which can also be variably realized: words can differ solely in vowel nasalization (e.g., pain [pẽ] ‘bread’ ~ paix [pɛ] ‘peace’), and phonological nasalization is expressed through longer nasalization duration on the vowel (Desmeules-Trudel & Brunelle, Reference Desmeules-Trudel and Brunelle2018). French learners must remain phonologically sensitive to vowel nasalization duration to differentiate between words, while the same cue does not indicate meaning differences between words for English listeners (i.e., variable nasalization duration always corresponds to a coarticulatorily nasalized vowel in English), even though the cue can be used to speed up word recognition.
When it comes to bilingual children's perception of phonetic properties that are present in both of their languages (e.g., vowel nasalization for English–French bilinguals, however with different phonological status in their languages), little is known concerning the perception of sublexical (e.g., coarticulatory) information. However, we know that young monolingual children's word recognition patterns are significantly influenced by coarticulation (Paquette-Smith et al., Reference Paquette-Smith, Fecher and Johnson2016; Zamuner et al., Reference Zamuner, Moore and Desmeules-Trudel2016) and bilingual children can maintain sensitivity to phonetic properties in more than one language (Burns et al., Reference Burns, Yoshida, Hill and Werker2007; Kuhl et al., Reference Kuhl, Conboy, Coffey-Corina, Padden, Rivera-Gaxiola and Nelson2008; Ramon-Casas et al., Reference Ramon-Casas, Fennell and Bosch2017; Sundara et al., Reference Sundara, Polka and Genesee2006).
This leads us to the first formal goal of the current study, which is to investigate whether bilingual children are more or less sensitive to (mismatching) coarticulatory cues than monolinguals. Specifically, we aim to determine whether the presence of contrastive vowel nasalization in the L2 (French) influences the perception of (non-contrastive) nasal coarticulation in the L1 (English). In the current study, we operationalize sensitivity to nasal coarticulation through potential disruptions in the word recognition patterns of items that contain phonetic mismatches (i.e., the presence of a nasalized vowel in an oral-consonant context, see below for specific methods). Based on previous research, we expect that bilingual children will display sensitivity to coarticulation, just like their monolingual peers. However, no strong predictions can be made as to whether bilinguals’ sensitivity will be lower than, equal to, or greater than that of monolinguals. On the one hand, some studies have documented that monolinguals and bilinguals show similar processing abilities (e.g., Byers-Heinlein, Fennell & Werker, Reference Byers-Heinlein, Fennell and Werker2013; Legacy, Zesinger, Friend & Poulin-Dubois, Reference Legacy, Zesinger, Friend and Poulin-Dubois2018), which would yield similar patterns of sensitivity to coarticulation (i.e., equal disruption in word processing when mismatching cues are present) in monolinguals and bilinguals. On the other hand, some research has demonstrated a bilingual advantage in processing (for a review, see Bialystok, Craik & Luk, Reference Bialystok, Craik and Luk2012), and yet other research has found that bilinguals lag behind their monolingual peers (Pelham & Abrams, Reference Pelham and Abrams2014). 
If these hypotheses are true, given the fact that both languages’ systems influence each other (Brasileiro Reis Pereira, Reference Brasileiro Reis Pereira2009; Fabiano & Goldstein, Reference Fabiano and Goldstein2005; Paradis, Reference Paradis2001), one might predict that English–French bilinguals’ sensitivity to vowel nasalization will be different from monolingual English listeners. For example, since English–French bilingual children have to maintain a phonological contrast between non-nasalized and nasalized vowels in their (French) lexicon, one might expect their phonological system to treat coarticulatory vowel nasalization in English differently, perhaps with greater sensitivity to coarticulation. This prediction would be supported by the fact that bilingual listeners are exposed to more variability for this phonetic cue (Byers-Heinlein & Fennell, Reference Byers-Heinlein and Fennell2014), and that this kind of variability may motivate maintaining fine-grained perceptual abilities for vowel nasalization in English–French bilingual children.
The second goal of the current study is to examine if four- to six-year-old children are able to resolve coarticulatory mismatches within words, compared to the younger group from Zamuner et al. (Reference Zamuner, Moore and Desmeules-Trudel2016). We are interested in this question because younger children (two- and three-year-olds) could not yet overcome the coarticulation mismatch in Zamuner et al.'s (Reference Zamuner, Moore and Desmeules-Trudel2016) previous investigation. Since Huang and Snedeker (Reference Huang and Snedeker2011) found evidence that five-year-old children show sustained interference from competitors during word discrimination, we do not expect our group of four- to six-year-old participants to resolve the phonetic mismatch as efficiently as adults. However, we predict that they will be more adult-like in their resolution of competitor interference than Zamuner et al.'s (Reference Zamuner, Moore and Desmeules-Trudel2016) toddlers, given their older age.
Methods
Participants
The group of younger monolinguals comprised 19 children, aged 2;1 to 3;10, who completed the study published in Zamuner et al. (Reference Zamuner, Moore and Desmeules-Trudel2016), and whose data are reanalyzed below in order to provide a comparison with the older monolingual group (Footnote 1). The other children, aged between 4;0 and 6;11 years, completed the same experiment (N = 119). We focused on this age range to determine if children older than 3;0 years could resolve phonetic ambiguity created by mismatching phonetic cues. Children were tested in a sound-attenuated room in either a university campus lab or a museum-based lab. Twenty bilinguals (ten girls, ten boys; age range: 4;3 to 6;5 years; M = 5;4 years; SD = 8.1 months) and twenty age-matched monolinguals (seven girls, thirteen boys; age range: 4;3 to 6;5 years; M = 5;4 years; SD = 8.2 months) are included in the current analysis (Footnote 2). Note that the majority of the data collection was conducted in a museum-based lab, which has the objective of involving the community in developmental research through research participation and knowledge translation to the families. In this context, parents are welcome to walk into the reception area of the museum-based lab and are invited to participate in the research with their child. Consequently, given the inclusive mandate of the testing setting, we did not restrict our recruitment criteria to only monolinguals and bilinguals, but rather gave all children the opportunity to participate in a research activity. However, as we were interested in sensitivity to coarticulation and in the ability to resolve coarticulatory mismatches in monolinguals and bilinguals, our strategy was to select two groups of age-matched monolinguals and bilinguals. Note that all children from the large testing sample who met the inclusion criteria below were included in the analyses.
The final group of children included in the analyses were either English monolinguals or English–French bilinguals who had not been diagnosed with a speech or hearing delay, as determined by parental questionnaire. Our criteria for determining bilingualism in children were established through parental report and are as follows: children had to be considered English-dominant (i.e., 50% or more exposure to English in the family, to ensure that the children would know all of the words used in the experiment), had to be exposed to L2 French ≥ 30% of the time for two consecutive years (across contexts such as home, daycare/school, with extended family), had to be exposed to French as an L2 from the first year of life, and had to be exposed to L2 French at least 30% of the time through overall development. No participants spoke an L2 other than French. Due to the absence of widespread norms for establishing bilingualism status in L2 research, we required bilingual participants to be exposed to L2 French for more than a quarter of their linguistic interactions, hence the 30% cut-off point. Bilingual children's average exposure to French across development was 45.1%, SD = 7.8%. Monolingual children had been exposed to French less than 30% overall and had been exposed to French for less than two consecutive years. Monolingual children's average exposure to French across development was 6.1%, SD = 6.3%. Note that in the Canadian education system, children can be exposed to French as an L2 relatively early, at four or five years old, which means that all children in our sample were likely exposed to French to some extent, although this exposure was not sustained in the monolingual group. Other children who were tested did not fit our bilingualism criteria or were not age-matched (N = 26) or contributed only one trial in one of the experimental conditions (N = 37). 
Note that these latter 37 children fixated to target images in filler trials, but did not meet our criterion of looking to at least two trials in both the same-splice and cross-splice conditions. There were 16 participants excluded based on a failure to calibrate or other technical problems (N = 14), fussing (N = 1), or parental interference (N = 1).
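To make the group-assignment logic concrete, the inclusion criteria can be sketched as a small classification function. This is only an illustration under stated assumptions: the record type `ExposureReport`, its field names, and the reading of "less than two consecutive years" as applying to years with at least 30% French exposure are hypothetical and do not come from the parental questionnaire itself.

```python
from dataclasses import dataclass


@dataclass
class ExposureReport:
    """Hypothetical summary of one parental report (field names are illustrative)."""
    english_pct_family: float        # % exposure to English in the family
    french_pct_overall: float        # % exposure to French across development
    consecutive_years_french: float  # longest run of years with French >= 30% (assumed reading)
    french_from_first_year: bool     # French exposure began in the first year of life


def classify(report: ExposureReport) -> str:
    """Return 'bilingual', 'monolingual', or 'excluded' under the criteria stated in the text."""
    is_bilingual = (
        report.english_pct_family >= 50.0          # English-dominant
        and report.french_pct_overall >= 30.0      # at least 30% French overall
        and report.consecutive_years_french >= 2.0 # two consecutive years of French
        and report.french_from_first_year          # French from the first year of life
    )
    if is_bilingual:
        return "bilingual"
    is_monolingual = (
        report.french_pct_overall < 30.0
        and report.consecutive_years_french < 2.0
    )
    if is_monolingual:
        return "monolingual"
    return "excluded"  # fits neither group definition
```

Under this sketch, a child with 45% French exposure from birth over four years would be classed as bilingual, whereas a child with 35% French exposure for a single year would fit neither group.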
Stimuli
Stimuli were six pairs of imageable English nouns (see Table 1). Both words in a pair started with the same consonant and vowel, followed by either an oral consonant (e.g., boat [boʊt]) or a nasal consonant (e.g., bone [bõʊ̃n]), and the two final consonants had the same place of articulation. As in Zamuner et al. (Reference Zamuner, Moore and Desmeules-Trudel2016), three additional experimental pairs (duck–dumptruck, leg–lemon, and egg–M) were excluded from the analyses because they carried multiple coarticulation cues: nasalization and place of articulation. There were nine filler pairs (boots–carrot, star–keys, monkey–camel, frog–fish, dog–elephant, turtle–sandwich, chicken–kangaroo, doll–clock, and flower–sun). The stimuli were recorded by a female native speaker of Canadian English and normalized for amplitude at 70 dB. A trained phonetician spliced the stimuli by keeping the initial and final consonants of an oral word token (e.g., [boʊt]_1) and replacing the original vowel with one from another token of the same word (e.g., [boʊt]_2) or a nasal token (e.g., [bõʊ̃n]_N), cutting at zero crossings to avoid acoustic artifacts like clicks or noises in the final signal. This yielded two splicing conditions: one with matching phonetic cues (same-splice, e.g., [b_1 o͜ʊ_2 t_1]), and one with mismatching phonetic cues (cross-splice, e.g., [b_1 õʊ̃_N t_1]). In Table 1, the two rightmost columns indicate the vowel onset timing within the target word.
Table 1. List of the target-competitor pairs and durational details about the auditory stimuli.
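The splicing step can be illustrated with a minimal sketch. It is not the procedure the phonetician followed (the splicing was done by hand): the function names, the use of plain lists of samples, and the choice of upward zero crossings as cut points are assumptions made for illustration.

```python
def nearest_upward_crossing(samples, index):
    """Return the sample index closest to `index` where the waveform
    goes from negative to non-negative (an upward zero crossing)."""
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0.0 <= samples[i]
    ]
    if not crossings:
        return index  # no crossing found; fall back to the requested index
    return min(crossings, key=lambda i: abs(i - index))


def splice_vowel(carrier, donor, carrier_vowel, donor_vowel):
    """Replace the vowel of `carrier` with the vowel of `donor`.

    `carrier_vowel` and `donor_vowel` are approximate (start, end) sample
    indices; each boundary is snapped to the nearest upward zero crossing
    so that the joined waveform has no abrupt jump at the cut points.
    """
    c_start = nearest_upward_crossing(carrier, carrier_vowel[0])
    c_end = nearest_upward_crossing(carrier, carrier_vowel[1])
    d_start = nearest_upward_crossing(donor, donor_vowel[0])
    d_end = nearest_upward_crossing(donor, donor_vowel[1])
    # Initial consonant of the carrier + donor vowel + final consonant of the carrier
    return carrier[:c_start] + donor[d_start:d_end] + carrier[c_end:]
```

In the cross-splice condition, `carrier` would be an oral token (e.g., boat) and `donor` a nasal token (e.g., bone); in the same-splice condition, `donor` would be a second token of the oral word.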

The images used in the experiment were the same size, and animacy was also controlled for within pairs (e.g., adding eyes to the cloud image which was paired with the image of a clown) in order to minimize preference effects in children. Potential frequency effects could not be controlled for due to the limited number of familiar and picturable C(C)VC-C(C)VN English minimal pairs. We also could not include word frequency as a covariate in our analyses given the low number of items in our procedure, although this question could be the object of further investigations in the future.
Design and procedure
Children were tested by themselves or on their parent's lap. Eye gaze data was collected on an Eyelink1000 (campus-based lab) or an Eyelink1000 Plus (museum-based lab) eye tracker in monocular remote mode, measuring movements of the right eye. The experimenter proceeded through a three-point calibration before the familiarization phase. During the familiarization, children saw each test and filler image and heard the corresponding unspliced label. During the experimental phase, children saw a central fixation point to ensure that they looked at the center of the screen at the beginning of the trial, then two images appeared on the screen. Experimental and filler images always appeared in the same pairs. Images were displayed for 1500 ms, and then an audio clip with the phrase “Look at the [target]” played. The images remained on the screen for four seconds after the onset of the sound file; each trial lasted approximately six seconds. The entire experiment took approximately five minutes to complete, with a total of 18 trials. The splicing condition in which each item was presented was counterbalanced across participants (e.g., half of the children heard boat in the same-splice condition, and half in the cross-splice condition).
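One simple way to achieve this kind of counterbalancing is a two-list scheme in which the condition assigned to each item flips between consecutive participants. The sketch below is an assumption about how such lists could be built, not a description of the actual experiment script; the function name and the placeholder item labels other than boat are hypothetical.

```python
def assign_conditions(items, participant_index):
    """Alternate splicing conditions across items, and flip the pattern
    for every other participant so each item occurs equally often in
    both conditions across the sample."""
    conditions = ("same-splice", "cross-splice")
    return {
        item: conditions[(position + participant_index) % 2]
        for position, item in enumerate(items)
    }


# Placeholder item labels: only "boat" is taken from the text.
items = ["boat", "item_2", "item_3", "item_4", "item_5", "item_6"]
list_a = assign_conditions(items, participant_index=0)
list_b = assign_conditions(items, participant_index=1)
```

With six items, each participant would hear three items per condition, and any given item (e.g., boat) would appear in the same-splice condition for even-numbered participants and in the cross-splice condition for odd-numbered ones.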
Results
Analysis procedure
Eye movement data (right eye only) was extracted using DataViewer 2.16 in 50-ms time bins. Proportions of fixations to the target images within the time bins were calculated as:
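Given the later statement that proportions were calculated based on fixations to the images only, one form consistent with the text for the proportion of target fixations in time bin t would be the following. The symbols n_target and n_competitor (the number of samples on each image within the bin) are notation assumed here for illustration.

```latex
P_{\mathrm{target}}(t) = \frac{n_{\mathrm{target}}(t)}{n_{\mathrm{target}}(t) + n_{\mathrm{competitor}}(t)}
```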

The data were statistically analyzed using generalized additive mixed models (GAMMs; Wood, Reference Wood2017), which can account for nonlinear trends through time, as found in eye tracking data. GAMMs can also include (linear or nonlinear) random effects, and account for autocorrelation in the time-dependent data (i.e., one data point in time is necessarily correlated with the preceding data point, which can lead to overconfident model estimates; Baayen, van Rij, de Cat & Wood, Reference Baayen, van Rij, de Cat, Wood, Speelman, Heylen and Geeraerts2018; Porretta, Kyröläinen, van Rij & Järvikivi, Reference Porretta, Kyröläinen, van Rij and Järvikivi2018). Furthermore, GAMMs do not assume normal distribution of the data, which makes them appropriate for eye fixation data. We will present two models in the current paper: first, a GAMM on the fixations to the target (e.g., the boat image) in the cross-splice condition only, to assess sensitivity to English nasal coarticulation in monolinguals and bilinguals as well as competitor inhibition patterns; and second, a GAMM on the fixations to the target image in the filler trials, to assess group differences between mono- and bilinguals in unspliced words.
The dependent variable of the GAMMs was the empirical-logit-transformed fixations (Barr, Reference Barr2008) to the target. Empirical logits are an approximation of the log odds of looking to one image (e.g., the target) compared to the other image (e.g., the competitor), calculated as:

$$\mathrm{elog} = \ln\left(\frac{y + 0.5}{N - y + 0.5}\right)$$
where y corresponds to the number of samples during which the target was fixated, and N corresponds to the total number of samples within the time bin (i.e., eye tracker sampling at 500 Hz, thus 25 samples per 50-ms time bin).
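As a worked illustration, the transformation can be sketched in a few lines, assuming the standard empirical logit of Barr (2008), ln((y + 0.5) / (N − y + 0.5)). The function name and the constant `SAMPLES_PER_BIN` are introduced here for illustration only.

```python
import math

SAMPLES_PER_BIN = 25  # 500 Hz sampling * 0.050 s per bin


def empirical_logit(y, n=SAMPLES_PER_BIN):
    """Empirical logit of y target samples out of n samples in a bin.

    Adding 0.5 to numerator and denominator keeps the value finite
    when the target was fixated on none or all of the samples.
    """
    if not 0 <= y <= n:
        raise ValueError("y must lie between 0 and n")
    return math.log((y + 0.5) / (n - y + 0.5))


# Bins with no, roughly half, and all samples on the target
values = [empirical_logit(y) for y in (0, 12, 25)]
```

The transform is antisymmetric around an even split: a bin with no target samples and a bin with only target samples yield values of equal magnitude and opposite sign, while a near-even split yields a value close to 0.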
The independent factors of interest for the GAMMs were the time window of analysis (between 300 ms and 2000 ms for experimental trials, and between 300 ms and 1000 ms for filler trials, see below) and language background (young monolingual, old monolingual or old bilingual). We chose a shorter window of analysis for the filler trials since these were not spliced, were unambiguous, and there was no expected effect of phonological competition (e.g., the item star was presented next to the item keys). The peak in average fixations to the target in the filler trials occurred at approximately 750 ms for all three groups.
For the GAMM on experimental trials, we modeled empirical-logit-transformed fixations in the cross-splice condition in order to assess sensitivity to phonetic mismatch. The analysis window for experimental trials ran from 300 ms after word onset, to account for eye movement programming delay (Buckler & Fikkert, Reference Buckler and Fikkert2016; Zamuner et al., Reference Zamuner, Moore and Desmeules-Trudel2016), until 2000 ms after word onset, a point at which children are still likely to be looking at the images in response to the prompt. Similarly to other GAMM analyses (Porretta, Tucker & Järvikivi, Reference Porretta, Tucker and Järvikivi2016; Porretta et al., Reference Porretta, Kyröläinen, van Rij and Järvikivi2018), random effects corresponded to a combination of participant and trial (i.e., event), allowing each trial (for each participant) to have its own intercept in the model. An AR-1 autocorrelation value of 0.868 was empirically determined based on the data and included in the experimental GAMM formula; the corresponding value for the filler-item GAMM was 0.706. Autocorrelation values correspond to the average correlation of a given data point with the preceding one in the time series. We present below a difference curve generated from the GAMM (fixations to the target by young monolinguals minus fixations by old monolinguals) to assess the differences in fixations to the target in the cross-splice condition, and thus examine whether one group was more sensitive to phonetic mismatches (i.e., to coarticulation) than the other. We also present a difference curve between older bilinguals and monolinguals to assess the group differences concerning sensitivity to nasal coarticulation.
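The notion of an AR-1 value can be illustrated with the usual lag-1 autocorrelation estimator. The text states only that the values were empirically determined, so the function below is an assumed, generic estimator applied to a single time series (e.g., model residuals ordered by time bin), not the exact computation behind the reported values.

```python
def lag1_autocorrelation(series):
    """Estimate the correlation between each point and the preceding one.

    Uses the standard autocorrelation estimator at lag 1: the sum of
    products of successive deviations from the mean, divided by the
    total sum of squared deviations.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0.0:
        return 0.0  # constant series: no variance to correlate
    numerator = sum(
        (series[i] - mean) * (series[i - 1] - mean) for i in range(1, n)
    )
    return numerator / denominator
```

Values close to 1, such as those reported here, indicate that neighbouring time bins carry largely redundant information, which is why the dependency needs to be accounted for in the model.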
Analysis of proportions of fixations, experimental trials
In this analysis, we were interested in the effect of language background (bilingual or monolingual) and participant age (young monolinguals and old monolinguals) on sensitivity to nasal coarticulation. Eye tracking results in Figure 1 show the general effects of splicing condition on fixations to the target image. Higher values on the y-axis suggest that children tended to fixate the target image more. In all groups, participants looked more to targets in the same-splice condition (grey lines) compared to the cross-splice condition (blue lines). Focusing on the same-splice condition (grey lines), there does not seem to be a difference between monolinguals and bilinguals, as demonstrated by similar shapes and overlapping error bars throughout the analysis window. However, young monolinguals fixated to the target slightly less than older monolinguals between 500 ms and 1000 ms within the trial, although the error bars seem to overlap with older children. This suggests relatively similar processing abilities for all children for same-splice words.

Figure 1. Overall fixation patterns to the target by splicing condition and participant language background. Higher proportions of fixations on the y-axis correspond to more fixations to the target (e.g., boat) and lower proportions of fixations on the y-axis correspond to more fixations to the competitor (e.g., bone).
In the cross-splice condition (blue lines in Figure 1), bilinguals (triangles) maintained similar proportions of fixations to the target as older monolinguals, but young monolinguals (empty squares) fixated more to the target (e.g., boat) than both the older groups. Note that in our procedure, proportions of fixations were calculated based on fixations to the images only, which suggests that young monolinguals shifted their gaze between the target (e.g., boat) and the (nasal) competitor (e.g., bone) more than the other groups. The younger group was thus disrupted by the phonetic mismatch, but also fixated less to the competitor image, which points to lower sensitivity to coarticulatory nasalization. In other words, young monolinguals did not inhibit the target (e.g., boat) as much as the older groups when hearing phonetic cues that corresponded to the nasal competitor (e.g., bone), suggesting that they might not take coarticulation into account as much when processing words.
This finding is also supported by the statistical analysis presented in the difference curves in Figure 2 (also see Table A1 in Appendix). For illustration purposes, this figure presents difference curves in fixations to the target (e.g., blue empty-squares curve minus blue circles curve in Figure 1, and blue triangle curve minus blue circles curve in Figure 1) between young monolinguals and older monolinguals (Figure 2A), as well as between older bilinguals and older monolinguals (Figure 2B) within the time window (x-axis) on separate panels. These curves were computed with the plot_diff function of the itsadug package (van Rij, Wieling, Baayen & van Rijn, Reference van Rij, Wieling, Baayen and van Rijn2017). This function plots difference curves in predicted (mean and confidence intervals) data by the model. Portions of the difference curves that are significantly above or below 0 represent a significant difference between the two groups for a given time interval, and are noted with red-shaded intervals below. Y-values below 0 represent more fixations to the competitor by young monolinguals or bilinguals than old monolinguals, and y-values above 0 represent more fixations to the target image.

Figure 2. Difference curves (experimental trials) as predicted by the GAMM analysis for young monolinguals against old monolinguals (A) and old bilinguals against old monolinguals (B) in the cross-splice condition only. (Red) shaded area corresponds to the interval in which the difference is significant. Grey bands around the average curve correspond to 95% confidence intervals.
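The shaded intervals in difference-curve plots of this kind can be read as the stretches of time in which the confidence band around the estimated difference excludes zero. The sketch below illustrates that reading in Python under stated assumptions: the difference curve and the half-width of its confidence interval are already available on a common time grid, and the function name and all numeric values are hypothetical. It is not the itsadug implementation and does not reproduce the study's model output.

```python
from typing import List, Sequence, Tuple


def significant_windows(
    times: Sequence[float],
    diff: Sequence[float],
    half_width: Sequence[float],
) -> List[Tuple[float, float, int]]:
    """Return (start, end, sign) for each run where the CI excludes zero.

    sign is +1 when the whole interval lies above zero and -1 when it
    lies below zero. Runs of opposite sign are reported separately.
    """
    windows: List[Tuple[float, float, int]] = []
    start = None
    current_sign = 0
    prev_t = None
    for t, d, h in zip(times, diff, half_width):
        if d - h > 0:
            sign = 1          # lower bound above zero
        elif d + h < 0:
            sign = -1         # upper bound below zero
        else:
            sign = 0          # interval contains zero
        if sign != current_sign:
            if current_sign != 0:
                windows.append((start, prev_t, current_sign))
            start = t if sign != 0 else None
            current_sign = sign
        prev_t = t
    if current_sign != 0:
        windows.append((start, prev_t, current_sign))
    return windows


# Hypothetical difference curve sampled every 100 ms (illustrative values only).
times = [0, 100, 200, 300, 400, 500]
diff = [0.0, 0.1, 0.3, 0.3, 0.05, -0.4]
half_width = [0.2] * 6
windows = significant_windows(times, diff, half_width)
```

With these illustrative values, the curve is flagged as above zero from 200 to 300 and as below zero at 500; the resolution of the reported window boundaries depends on the spacing of the time grid.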
We were thus able to establish that young monolinguals fixated significantly more to the target in cross-spliced trials between 950 ms and 1385 ms when compared to older monolinguals (deviance explained of 39.7%). This supports our observation that young monolinguals shift their gaze between the two images in the coarticulatory mismatch condition more than older monolinguals, and thus that they might not be as sensitive to coarticulation as the latter group. Furthermore, towards the end of the trials, no differences in the raw data or statistical analysis emerged between younger and older monolinguals, suggesting that the older group did not resolve the phonetic mismatch better than the younger group.
The lack of apparent difference between older bilinguals and older monolinguals in the raw data (Figure 1) suggests that bilingual listeners were as sensitive to nasal coarticulation as monolingual children overall: both groups were sensitive to coarticulation (i.e., see dip in blue curve for both groups in the time window of analysis in Figure 1). This is also borne out by the statistical analysis of cross-splice items, where no difference emerged when computing the difference curve in fixations to the target between bilinguals and monolinguals (Figure 2B). Although we are aware that it is difficult to formulate strong conclusions from null results, the overwhelming similarity of the fixation curves between older monolinguals and bilinguals in the cross-splice condition (Figure 1) points in the direction of similar sensitivity to nasal coarticulation in both groups. It is possible that with a different type of measure, age group or coarticulation contrast, one may see differences in processing between monolinguals and bilinguals. However, for the contrast tested in this study, we observed no statistical difference between the groups of monolinguals and bilinguals.
Analysis of filler trials
This analysis compared all groups’ fixations to targets on the filler trials (Figure 3) in order to assess the potential differences across groups when processing regular speech. In Figure 4, we show difference curves between young monolinguals and older monolinguals (A) as well as between older bilinguals and older monolinguals (B). Visualization of the GAMM results (see Table A2 in Appendix for the numeric output of the model; deviance explained of 56.2%) in Figure 4 shows that young monolinguals fixated significantly less to the target than older monolinguals for the entire duration of the trials (Figure 4A). However, there was no significant difference between monolingual and bilingual children in filler trials (Figure 4B). This means that, on the one hand, younger monolinguals were less efficient than older monolinguals at fixating to the target, and, on the other hand, that (older) monolingual and bilingual participants process English filler words similarly. While it would be informative to have independent measures of children's language skills using standardized tests, this analysis of filler trials suggests that there are no differences in processing abilities for non-cross-spliced items in age-matched monolinguals and bilinguals, in addition to less efficient general processing in younger monolingual children compared to older children.

Figure 3. Overall fixation patterns to the target for filler trials by participant language background and age. The green bars/lines represent the window of analysis.

Figure 4. Difference curves (filler trials) as predicted by the GAMM analysis for young monolinguals against old monolinguals (A) and old bilinguals against old monolinguals (B). (Red) shaded area corresponds to the interval in which the difference is significant. Grey bands around the average curve correspond to 95% confidence intervals.
Discussion
Bilingual spoken word recognition is made more complex because two languages exist in a listener's mind. While research has found that monolingual adults and toddlers are able to use English vowel nasalization during spoken word recognition (Beddor et al., 2013; Zamuner et al., 2016), little research has focused on the development of these abilities over time, and on how bilingual children process phonetic details during spoken word recognition. Thus, we investigated the development of English monolingual children's sensitivity to English coarticulatory vowel nasalization. We also examined how English–French bilingual children process English coarticulatory vowel nasalization.
First, we compared data from a group of younger monolingual children (aged 2 to 3 years) to a group of older monolingual children (aged 4 to 6 years). The statistical analysis showed that younger monolingual children tended to fixate more slowly and less to targets in filler trials, which can be explained by a less mature word processing system. In the cross-splice condition, listeners were presented with targets that contained a mismatched nasalized vowel. This led all groups to fixate more to the competitor (bone) and less to the target (boat) for a portion of the trial. However, fixations to the target in the cross-splice condition were significantly higher for younger monolinguals than older monolinguals (i.e., closer to 50% fixations than older monolinguals, since both groups had fixations well below 50% in this splicing condition). Thus, our data suggest that the older monolinguals were more sensitive to coarticulation (i.e., young monolinguals were not capable of inhibiting the non-nasalized target as much as old monolinguals), and that the activation of the target was more disrupted by the mismatch in the older group (since, at that point within the trial, the competitor is hypothesized to be activated). Consequently, one could argue that the older monolingual children were better at resolving the phonetic mismatch because they recovered from a larger disruption. However, in terms of the proportion of looks to the target in the cross-splice condition, both the younger and older monolinguals peak at 50%. Thus, fixation data indicate that even older monolingual children cannot resolve the phonetic mismatch (fixations hover around chance in the cross-splice condition, 2000 ms after word onset), similar to the younger toddler participants.
This corresponds to the Huang and Snedeker (2011) explanation that children have sustained competitor interference, and it is not until children are older that they have more adult-like processing patterns (see also Rigler et al., 2015).
The second goal was to compare the processing of coarticulation cues that vary across languages in bilinguals and compare those results with a group of monolinguals. To do this, we examined English vowel nasalization in a group of English monolinguals and a group of English–French bilinguals. English and French are ideal languages to investigate this question, since French uses vowel nasalization as a phonological distinction (Cohn, 1990), while English contains vowel nasalization merely as a coarticulatory property (Beddor, 2009). We initially predicted that bilingual children would be sensitive to nasal coarticulation in English, but made no strong prediction on the potential differences between monolingual and bilingual children's processing of nasal coarticulation. Past research has argued that monolinguals and bilinguals process language similarly (Byers-Heinlein et al., 2013), in which case no difference is expected between our groups. However, others have suggested a bilingual advantage (Bialystok et al., 2012) or disadvantage (Pelham & Abrams, 2014), in which case we would have expected to find differences across the groups concerning sensitivity to nasal coarticulation. We found that monolingual and bilingual children displayed similar sensitivity to vowel nasalization (Legacy et al., 2018), as shown by similar patterns of fixations to the target images, across the time course, and in both the same-splice and cross-splice conditions. This is in line with some research showing that monolinguals and bilinguals process certain aspects of language similarly.
Given that the children in our study were relatively balanced bilinguals and exposed to both English and French from a young age, perhaps it is not surprising that the bilinguals were not different from the monolinguals in processing the coarticulation cues. Moreover, while French has a phonological contrast between oral and nasal vowels, English has pervasive nasal coarticulation between nasal consonants and oral vowels, which means that English monolingual children are also exposed to cues of nasalization in their linguistic environment. Perhaps this is enough for English monolinguals to develop similar sensitivity to nasal coarticulation as English–French bilinguals, even though the way the cues are used in the different languages varies. It is possible that if one were to test a cue that occurs in only one of the two languages, there would be evidence of differences between bilinguals and monolinguals.
In summary, we find that children's sensitivity to coarticulation cues grows between the ages of two and six years; however, older children continue to have sustained competitor interference. We also found that bilingual children’s sensitivity to coarticulatory information in their L1 patterns similarly to that of monolinguals, even when the coarticulatory cue is contrastive in their L2. Our results are in line with previous studies showing equal sensitivity to phonetic details for bilinguals and monolinguals (Liu & Kager, 2018).
There are a number of possible directions for future research. First, a parallel study in French with French monolingual and English–French bilingual children would answer whether there are any bi-directional effects, that is, whether bilingual children process phonological and coarticulatory vowel nasalization differently from monolinguals. Second, one could examine a group of younger English–French bilinguals to establish whether they are similar to the English monolinguals. We predict that a similar group of younger bilinguals would perform like the younger monolinguals. This is because our older bilinguals were relatively balanced in their exposure to English and French (recall that the bilingual children's average exposure to French across development was 45.1%). This leads to another avenue for future research, which would be to examine processing in bilinguals with different language experience, to see whether less proficient bilinguals would be more likely to draw on the phonemic inventory of their L1 to facilitate speech perception in their L2, and vice versa (Desmeules-Trudel, 2018). For example, one could test whether children who were more dominant in French would show more sensitivity to English nasalization, as a cross-over transfer from French.
While the current findings raise a number of intriguing questions, our findings support the idea that phonological representations are rich and include phonetic details (Browman & Goldstein, 1986; Pierrehumbert, 2002), such as coarticulation. This research also highlights that, in some instances, bilinguals can show similar processing to their monolingual peers. The historically monolingual focus in language research belies the fact that a majority of the world's children are exposed to more than one language (Grosjean, 1982). It is therefore crucial to understand how speech perception, and linguistic skills more generally, develop in this, the majority population.
Acknowledgments
We thank Sarah Colby, Maya Pilin, Myriam Ducos, Émilie Piché and Keara Boyce for assistance with data collection. FDT benefited from doctoral scholarships from the Fonds de Recherche du Québec – Société et Culture and the Social Sciences and Humanities Research Council of Canada (SSHRC), CM from an M.A. scholarship from SSHRC, and TZ from a Natural Sciences and Engineering Research Council of Canada (NSERC) research grant for the completion of this research. We also thank audiences at the ICIS 2018 and CLA 2017 annual meetings.
Appendix
Table A1. Summary of the GAMM (experimental trials) output.

Table A2. Summary of the GAMM (fillers) output.
