INTRODUCTION
This study reports the results of a longitudinal case study examining the acquisition of the English voice system by a three-year-old native speaker of Dutch. Two main research questions are addressed: first, the study examines how and to what extent a young native speaker of Dutch acquires the production of a laryngeal system in a second language, in which he is suddenly immersed. The main question addressed is whether the child makes a distinction between the L1 and L2 phonetics or uses just one phonetic system for the production of both languages. Second, this study examines in what sense early L2 acquisition of a laryngeal system is similar to or different from L1, simultaneous bilingual and late L2 acquisition. On the one hand, early L2 acquisition is crucially different from L1 acquisition, as the child has already acquired the phonetics of the first language. On the other hand, it also differs from late L2 acquisition, as the L1 system has only just been acquired by the child and may exert less influence on the L2 than in late L2 acquisition.
BACKGROUND
Although both Dutch and English have a contrast between voiced and voiceless stops, the contrast is phonetically realized in different ways in word-initial position. Whereas in Dutch the contrast is one between prevoiced and short-lag stops, English contrasts short-lag with long-lag stops. Dutch is therefore sometimes called a ‘voicing language’ (i.e. a language with a contrast between stops which are generally produced with prevoicing and unaspirated stops), while English is termed an ‘aspirating language’ (i.e. a language which contrasts short-lag with long-lag, aspirated stops) (Jansen, 2004: 1). Single word-initial voiced stops are nearly always produced with prevoicing in Dutch (Simon, 2009; Van Alphen, 2004). English voiced stops are usually not prevoiced, though some variability has been observed. Williams (1977), for instance, found a bimodal distribution in voiced stops produced by ten American English adults, with scattered items in the negative Voice Onset Time (VOT) range and a peak in the short-lag region. Normally, however, voiced stops in English are realized in the short-lag VOT region, with values roughly between 0 and 25 ms (Docherty, 1992; Flege, 1982; Lisker & Abramson, 1964; Simon, 2009).
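The three VOT regions referred to above can be made concrete with a minimal sketch. The function name `classify_vot` is hypothetical, the 25 ms upper bound for short lag follows the 0 to 25 ms range cited in the text, and treating every value above that bound as long lag is a simplifying assumption rather than a claim about the study's own criteria.

```python
def classify_vot(vot_ms, short_lag_max=25.0):
    """Assign a VOT value (in ms) to one of three broad regions.

    A negative VOT means that voicing starts before the stop release
    (prevoicing, or voicing lead). The short-lag upper bound of 25 ms is
    taken from the range cited in the text; everything above it is
    labelled long lag, which is an assumption made for illustration.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms <= short_lag_max:
        return "short-lag"
    return "long-lag"


# Hypothetical values, not taken from the study's data.
examples = [-85.0, 12.0, 70.0]
labels = [classify_vot(v) for v in examples]
```

Under this scheme a Dutch-like voice contrast pairs "prevoiced" with "short-lag", whereas an English-like contrast pairs "short-lag" with "long-lag".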
With respect to the reported ages at which monolingual children acquire the laryngeal contrast in their native language, there is variation in the literature. This variation is presumably due to individual differences between children as well as to the use of different criteria for determining ‘acquisition’ (Macken & Barton, 1979: 42). Macken & Barton (1979), for instance, considered the English short lag–long lag contrast to be fully acquired by a child when the child produced stops with mean VOT values that fall within the adult VOT ranges. Van der Feest (2007: 128), on the other hand, divided children into two groups on the basis of whether or not they had produced ‘at least one clear instance of a voiced initial target word and one clear instance of a voiceless initial target word’. Despite these differences in methodology, a comparison of studies examining the acquisition of the laryngeal contrast in voicing and aspirating languages reveals that the laryngeal contrast between obstruents in languages with a short lag–long lag contrast is generally produced at an earlier age than in languages with a voicing lead–short lag contrast. English-speaking children have been reported to acquire the contrast between voiced and voiceless stops at around the age of 2 ; 0 (Macken & Barton, 1979; Snow, 1997). Macken & Barton (1979) report that, after an initial stage in which there is no contrast at all between children's voiced and voiceless stops, children go through a stage in which they do make a contrast between the two categories, but both are realized in the short-lag region and consequently fall within what adults perceive as voiced stops. Children learning a voicing language, on the other hand, acquire the contrast between prevoiced and short-lag VOT at around the age of 3 ; 0 or even later. Whereas the realization of short-lag voiceless stops is acquired much earlier, the production of prevoicing may not be fully acquired by the age of 3 ; 0 (Kager, Van der Feest, Fikkert, Kerkhoff & Zamuner, 2007; Kuijpers, 1993; Macken & Barton, 1980; Van der Feest, 2007).
Table 1 summarizes the VOT ranges for Dutch and English children and adults reported in the literature. Since VOT depends on, for instance, speech rate (Kessinger & Blumstein, 1998) and place of articulation (Cho & Ladefoged, 1991), these values are rough indications.
TABLE 1. VOT values for Dutch and English stops in adult and child speech
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083951-05197-mediumThumb-S0305000909009386_tab1.jpg?pub-status=live)
a The number of informants in the samples (N) is provided between parentheses.
b All studies on VOT in child speech report individual differences between children. Since we cannot discuss all individual results, values for age groups were calculated and presented.
Studies on simultaneous bilingual acquisition of a voicing and an aspirating language confirm the finding that the voice contrast in aspirating languages is acquired before the one in voicing languages. Deuchar & Clark (1996) conducted a longitudinal study with one child learning both Spanish and English from birth. At age 2 ; 3 the child had acquired the voice contrast in English, but had not attained adult-like values in Spanish, as both voiced and voiceless stops were produced within the short-lag region (though the beginning of a contrast could be detected within this region). Kehoe, Lleó & Rakow (2004) investigated the acquisition of stop consonant voicing in four Spanish–German bilingual children and found that none of the children produced voicing lead in Spanish voiced stops at age 2 ; 6. Similarly, Johnson & Wilson (2002) found that Japanese voiced stops were produced without prevoicing by two bilingual English–Japanese children aged 2 ; 2 and 4 ; 8. They also found that the two children produced voiceless stops with much longer VOTs than their parents, in Japanese (their mother's L1) as well as in English (their father's L1). Overlong VOTs are also reported by Watson (1991) in the French and English stops of French–English bilingual children.
The question of whether children employ one system for both languages or two separate systems has been a central issue in the literature on bilingualism. Deuchar & Quay (2000: 46, 111–13) argue that the question ‘one system or two?’ cannot easily be answered by looking at inventories of segmental phonology, but that the when and how of language differentiation can be examined by focusing on the acquisition of a phonological contrast, like the voice contrast, which is realized differently in the two languages under investigation. Since the child in the present study was raised monolingually until the age of 3 ; 2 (see Method), this case study differs from simultaneous bilingual acquisition, as the child at age 3 ; 2 had already acquired an L1 phonetic system. Studies on (late) L2 acquisition have shown that learners transfer VOT values from the L1 into the L2, both in perception (e.g. Pater, 2003) and in production (e.g. Flege, Frieda, Walley & Randazza, 1998; Suomi, 1980). Simon (2009) analyzed the production of the English voice contrast by sixteen adult native speakers of (Belgian) Dutch learning English and showed that the informants produced long-lag voiceless stops in English, but realized voiced stops with voicing lead. This difference was ascribed to the acoustic salience of long-lag stops compared to short-lag stops: L2 learners notice that English voiceless stops are different from Dutch voiceless stops, which triggers the acquisition process. Prevoiced stops, on the other hand, are acoustically non-salient and function as a major cue for the voice character of the stop in the L1 (see Van Alphen, 2004). Hence, even though prevoicing is acquired late, once it is acquired it is easily transferred into a foreign language.
HYPOTHESES
On the basis of the literature overview on L1, bilingual and L2 acquisition presented above, two hypotheses can be formulated regarding the development of the L2 English voice contrast by a Dutch-speaking child. Since L1 studies have shown that prevoicing is only acquired around the age of 3 ; 0 or even later, the child's system will depend on whether he has acquired Dutch prevoiced stops at the moment he comes into contact with English. The two hypotheses can be formulated as follows: (1) If the child has acquired the production of prevoicing in Dutch at the start of data collection, it is predicted that he will transfer prevoicing into English. This hypothesis is based on the finding in L2 acquisition studies that once prevoicing is acquired, it is very difficult to lose it. (2) If the child has not yet acquired prevoicing in Dutch at the outset of this study, he will produce both Dutch and English voiced stops in the short-lag region. Since long-lag, aspirated stops are acoustically salient and acquired early in L1 and bilingual acquisition, it is assumed that the child will start producing English aspirated stops early in the acquisition process. As a result, the child's English system is predicted to have a contrast between prevoiced and long-lag stops (if hypothesis 1 is confirmed) or between short-lag and long-lag stops, i.e. the target L2 system (if hypothesis 2 is confirmed).
METHOD
Participant
The informant for this study is a male native speaker of Dutch, who was 3 ; 6 when the first recording took place. The child, who in this paper will be referred to as George, moved with his Dutch-speaking parents from Groningen, a town in the north of the Netherlands, to the US (Massachusetts) when he was 3 ; 2. He was exposed to English as a second language only three months later, when he started attending an American preschool, i.e. seven weeks before the first recording took place. His parents reported no hearing or speech impairments.
Language context
Before the family moved to the US, George had not had any extensive contact with English or any other foreign language. After George and his parents started living in the US, the language input George received was situationally determined: Dutch was the language used at home in child–parent as well as parent–parent interactions, and English was used in the child's preschool and in the playground, so that the child was exposed to English most of the time on weekdays.
Procedure
George was recorded during eleven sessions over a period of seven months. The recordings took place every two or three weeks, with a longer break of seven weeks after session 4. The experiment, which consisted of a repetition task and a picture-naming task, was conducted both in Dutch and in English. The child was seated in front of a computer in a quiet room in his home, with a microphone positioned on a stand between the computer and the child. His speech was recorded with Adobe Audition 2.0. The Dutch and English recordings were carried out by native speakers of the two languages, so as to put the child in ‘monolingual mode’, rather than in ‘bilingual mode’ (see Johnson & Wilson, 2002: 274). The Dutch and English data collection sessions were conducted on the same day or within a couple of days, with the Dutch session preceding the English one. As the experimenters would play with the child for at least fifteen minutes prior to the commencement of the recording session, so as to activate the language in which data were going to be collected, it is assumed that the Dutch task which preceded the English one did not have an effect on the child's productions in the latter task. The reason why a repetition and a picture-naming task were conducted was that George knew relatively few words in English at the outset of the study and the repetition task allowed us to collect data without the child having to come up with English words himself.
Stimuli
The stimuli consisted of monosyllabic words with a single onset alveolar (/t/ or /d/) or bilabial (/p/ or /b/) stop. During the repetition task the child was shown pictures on a computer screen and simultaneously heard the name of the object spoken by a native speaker of the language over the computer. The words in the repetition task were monosyllabic minimal (or near-minimal) pairs, such as poot ‘paw’–boot ‘boat’ for Dutch and pear–bear for English, and were balanced for place of articulation of the onset stop (see Appendix A). Five Dutch and an equal number of English fillers (e.g. snow and cloud) were inserted.
In the picture-naming task, the child was shown pictures on a computer screen and asked to name the object or an object related to the picture. The picture-naming task elicited 24 stop-initial words. Most words formed near-minimal pairs, in which the vowel following the stops was the same or had the same or a similar height (e.g. tuin ‘yard’–duim ‘thumb’ for Dutch and tongue–duck for English) (see Appendix B).
Data analysis
The tasks were designed to elicit 12 target tokens in the repetition task and 24 in the picture-naming task in each of the eleven sessions. (The number of tokens in each task was reversed in the first two sessions, as the child's active English vocabulary was relatively small at the outset of the study.) The child produced an average of 36 tokens per session for Dutch and 32 for English. In total, 394 Dutch and 352 English tokens could be used in the analysis. All VOT measurements were carried out in Praat (4.5.12; Boersma & Weenink, 2008).
RESULTS
The following two sections discuss the production of voiceless and voiced stops. Since the data do not show a normal distribution, non-parametric statistical assessments will be used. Results from the repetition and the picture-naming task are pooled, as the difference between the VOTs in the two tasks proved to be non-significant (Mann–Whitney test: for Dutch U=4167, Z=−0·139, p=0·890; for English U=3325, Z=−0·631, p=0·528).
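To make the Mann–Whitney comparisons reported here easier to follow, the sketch below computes U, a Z score and a two-tailed p-value in plain Python. It is an illustration only: the function names are hypothetical, the Z score uses the large-sample normal approximation without a tie correction, and the example values are invented rather than taken from George's recordings. The statistical package used for the study may have applied different corrections.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, Z, two-tailed p) for two independent samples.

    U is the smaller of the two U statistics, so Z is never positive.
    The p-value relies on the normal approximation (no tie correction),
    which is an assumption of this sketch.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical VOT values (ms) for two tasks, not the study's data.
repetition = [55.0, 61.0, 66.0]
naming = [70.0, 74.0, 81.0]
u, z, p = mann_whitney_u(repetition, naming)
```

Because the smaller U is reported, the resulting Z is negative or zero, which matches the sign convention of the Z values quoted in the text.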
Voiceless stops
The box-and-whisker plot in Figure 1 presents the results for Dutch and English /t/ and /p/ in all eleven sessions together. Outliers were identified as tokens whose VOT lay more than 1·5 times the Interquartile Range (IQR) beyond the quartiles. Such outlier values (>1·5 IQR) are presumed to be the result of abnormal speech rate or loudness and were excluded from the analysis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083955-00775-mediumThumb-S0305000909009386_fig1g.jpg?pub-status=live)
Fig. 1. VOT for /p/ and /t/ in all eleven sessions together.
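The outlier criterion described above can be sketched as follows, on the assumption that it corresponds to the usual box-plot fences (values below Q1 − 1·5 × IQR or above Q3 + 1·5 × IQR). The function name `split_outliers`, the choice of the "inclusive" quartile method and the example values are all assumptions made for illustration; different software packages compute quartiles slightly differently.

```python
import statistics


def split_outliers(vots, k=1.5):
    """Split VOT values into (kept, dropped) using box-plot fences.

    A value is dropped when it lies more than k * IQR below the first
    quartile or above the third quartile. The quartile method used here
    is an assumption; the study does not specify one.
    """
    q1, _median, q3 = statistics.quantiles(vots, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in vots if low <= v <= high]
    dropped = [v for v in vots if v < low or v > high]
    return kept, dropped


# Hypothetical VOT values (ms), not taken from the study's data.
tokens = [60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0, 160.0]
kept, dropped = split_outliers(tokens)
```

In this invented sample only the 160 ms token falls outside the fences and is set aside.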
Three main observations can be made regarding George's realization of voiceless stops in Dutch and English. A first observation is that the VOTs are significantly higher for English than for Dutch, for both places of articulation (Mann–Whitney two-tailed test for /t/: U=1600, Z=−6·398, p<0·01; for /p/: U=2795, Z=−5·054, p<0·01). Though the box plot shows that there is considerable overlap in the realizations of the tokens and the child does not clearly separate the Dutch VOT range from the English one, he does make a subtle contrast between Dutch and English voiceless stops. Note that the VOT range within which tokens are produced is extremely large, both in Dutch (from 12·1 to 133·4 ms) and in English (from 6·3 to 163·1 ms). However, since variability is a characteristic of child language in general (see Macken & Barton, 1979: Table 1), this variability may not be the result of exposure to two languages.
Second, the analysis revealed that George's VOT values for Dutch stops are much higher than the adult L1 Dutch norm and that his voiceless stops are thus not typically Dutch. Whereas the VOT range of Dutch voiceless stops has been reported to be 0–25 ms in child as well as in adult speech, the child's median VOT over all sessions is 70·1 ms for /t/ and 64·3 ms for /p/, values which fall within the long-lag VOT region, typical of English voiceless stops (see Table 1). The box plot shows that George produces hardly any Dutch tokens within the short-lag region.
Finally, the child produced significantly longer VOTs in alveolar /t/ than in bilabial /p/ in English, but not in Dutch (Mann–Whitney two-tailed test for English: U=2318, Z=−3·731, p<0·01; for Dutch: U=3760, Z=−0·923, p=0·356). The finding in English is in line with earlier studies which have shown that the further back in the oral cavity the consonant is produced, the longer the VOT will be (Cho & Ladefoged, 1991). The finding that the child does not follow this place of articulation effect in Dutch may indicate that the child has not completely reached the adult VOT targets. Figure 2 presents the median VOT values for voiceless (alveolar and bilabial) stops in all eleven individual Dutch and English sessions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083949-87086-mediumThumb-S0305000909009386_fig2g.jpg?pub-status=live)
Fig. 2. VOT in individual sessions.
Kruskal–Wallis tests were carried out for Dutch and English and revealed that there are significant differences in VOT distributions across the eleven sessions (Dutch: χ²(10, N=181)=32·87, p<0·01; English: χ²(10, N=157)=44·96, p<0·01). Post-hoc Mann–Whitney tests comparing the results of session 1 with those of session 11 show that there is a significant rise for both Dutch (U=28, Z=−3·202, p=0·001) and English (U=29, Z=−2·688, p=0·006). In the majority of cases, individual successive sessions do not show significant rises or falls, though there is a significant rise between sessions 1 and 2 for Dutch (U=23, Z=−3·552, p<0·01) and English (U=33, Z=−1·985, p=0·049), which may indicate that the child acquired the long-lag stops of English very early on in the acquisition process and simultaneously adjusted his Dutch stops in the direction of the English ones.
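For readers unfamiliar with the Kruskal–Wallis test used for the session comparison, the following sketch shows how the H statistic (reported above as χ²) can be computed from ranked data. The function names and the example values are hypothetical, and the p-value step is omitted: H is simply compared against a chi-square distribution with k − 1 degrees of freedom, where k is the number of sessions.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples.

    H is corrected for ties in the standard way. Whether the software
    used in the study applied the same correction is not stated there.
    """
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, len(groups) - 1


# Hypothetical VOT values (ms) for three sessions, not the study's data.
sessions = [[20.0, 25.0, 30.0], [40.0, 45.0, 50.0], [70.0, 80.0, 90.0]]
h, df = kruskal_wallis_h(sessions)
```

With eleven sessions there are 10 degrees of freedom, for which the chi-square critical value at p = 0·01 is about 23·21; both statistics reported in the text exceed it.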
Voiced stops
Figures 3 and 4 present histograms displaying the frequency with which voiced stop-initial words were realized in binary VOT ranges in all eleven Dutch and English sessions respectively. The VOTs of Dutch and English voiced stops did not differ significantly according to place of articulation (two-tailed Mann–Whitney for Dutch: U=4208, Z=−1·019, p=0·308; for English: U=2703, Z=−1·450, p=0·147) and hence the results for bilabial /b/ and alveolar /d/ are collapsed.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083956-08461-mediumThumb-S0305000909009386_fig3g.jpg?pub-status=live)
Fig. 3. Frequency of Dutch voiced stops in binary VOT ranges.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083947-42254-mediumThumb-S0305000909009386_fig4g.jpg?pub-status=live)
Fig. 4. Frequency of English voiced stops in binary VOT ranges.
Figures 3 and 4 show that for both Dutch and English there is a bimodal distribution: there are scattered items with negative VOTs and a high frequency of tokens in the short-lag VOT range, with a peak between 20 and 30 ms for both languages. While there is a great deal of variability in the VOTs of George's Dutch and English voiced stops, there are clearly two main VOT ranges and hardly any tokens in between these two ranges, i.e. between −60 and 10 ms. This means that the child's productions of voiced stops are not random and are in fact similar to the distribution reported by Williams (1977) for adult L1 English speakers.
The graph in Figure 5 presents the percentage of prevoiced tokens in the eleven individual sessions in Dutch and English. A linear regression analysis revealed that in the Dutch sessions, there is a downward trend in the percentage of prevoiced tokens over the eleven sessions (regression coefficient β=−8·240, p<0·01). No such trend can be observed for the English data (regression coefficient β=−2·481, p=0·327). The graph also shows that the difference between Dutch and English, which is as much as 78·9% in the first session, is considerably smaller in the last four sessions (with differences in sessions 8, 9, 10 and 11 of 11·8%, 18·3%, 11·1% and 19·4% respectively).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160627083847-92340-mediumThumb-S0305000909009386_fig5g.jpg?pub-status=live)
Fig. 5. Percentage of voiced stops produced with prevoicing in the individual sessions.
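The trend analysis over sessions can be illustrated with a short sketch that first computes the percentage of prevoiced tokens (negative VOT) and then fits an ordinary least-squares line to the per-session percentages. It assumes that the reported regression coefficient is an unstandardized slope in percentage points per session, which the text does not state explicitly. The function names and the numbers are hypothetical.

```python
def percent_prevoiced(vots):
    """Percentage of tokens with negative VOT (voicing lead)."""
    return 100.0 * sum(1 for v in vots if v < 0) / len(vots)


def ols_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: four sessions with a steady decline in prevoicing.
session_numbers = [1, 2, 3, 4]
percentages = [90.0, 80.0, 70.0, 60.0]
slope, intercept = ols_slope(session_numbers, percentages)

# Hypothetical voiced-stop tokens (ms) from a single session.
share = percent_prevoiced([-80.0, -60.0, 15.0, 20.0])
```

A negative slope indicates that the share of prevoiced tokens falls from session to session, which is the pattern reported for Dutch.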
DISCUSSION
The case study examining the acquisition of the English voice contrast in stops by a three-year-old native speaker of Dutch aimed to answer two questions. The first question is whether George develops two separate phonetic systems for his L1, Dutch, and the L2, English, or whether he has just one phonetic component for both languages. Although the child's phonetic realizations of both voiced and voiceless stops in Dutch and English are variable, some clear trends could be observed. It was found that George's voiceless stops were realized in the long-lag VOT region in English as well as in Dutch and adults would thus categorize all the child's voiceless stops as aspirated. Since voiceless stops in Dutch are never aspirated, George's Dutch voiceless stops sound distinctively non-native. However, the analysis also revealed that over all sessions together George produced significantly longer VOTs in English than in Dutch. This indicates that he keeps a subtle contrast between Dutch and English voiceless stops.
The observation for the voiced stops is similar to that for the voiceless ones: here, too, George gradually moves the Dutch stops in the direction of the English ones. Whereas nearly all tokens are produced with prevoicing in the first sessions, the production of prevoicing in Dutch decreases as the English acquisition process goes on. Although George also produces a fair number of tokens with prevoicing in English, the last Dutch and English recordings contain hardly any prevoiced tokens.
Thus, when George was 3 ; 6 and had only just begun learning English, he had acquired the Dutch phonetic realizations, with prevoiced /b/ and /d/ and unaspirated /p/ and /t/. However, when the child starts acquiring the English phonetics of voiced and voiceless stops, he moves the Dutch phonetics in the direction of the English target realizations. Even in the early sessions, the VOTs are high compared to the L1 Dutch norm, in which unaspirated stops have a VOT of around 20 ms. It is possible that these relatively high values are already an effect of influence from English in the weeks before the first recording took place (cf. Kuhl, Tsao & Liu (2003), who showed that even short-term exposure to Mandarin Chinese had a positive effect on American infants' foreign language phonetic perception).
In sum, the analysis revealed that George contrasts short-lag with long-lag stops in both Dutch and English and thus adapts the Dutch phonetic system in the direction of the English one. However, even in the last sessions (9–11) the VOT values for English voiceless stops are higher than those for Dutch and in nine of the eleven sessions the percentage of prevoiced tokens is higher in Dutch than in English, suggesting that the child differentiates between the two languages, though not in a target-like manner. This is reminiscent of Stage II in Macken & Barton's (1979) analysis of L1 English child speech.
The second issue is the extent to which early L2 acquisition of a voice system is similar to or different from L1 acquisition, simultaneous bilingual and late L2 acquisition. The hypothesis formulated at the outset of this paper was that, if the child had acquired the production of prevoicing in Dutch at the start of the study, he would transfer it into English, as it has been reported that it is hard to ‘unlearn’ the production of prevoicing in an L2 if it is a major cue to the voice character in the L1. Alternatively – if the child had not acquired prevoicing at the start of the study – it was hypothesized that the child would produce both Dutch and English stops in the short-lag region.
As hypothesized, the child acquired the phonetic realization of long-lag stops early on in the acquisition process, as even in the early sessions the VOTs for the English voiceless stops are well within (and sometimes exceeding) the target VOT range.
The analysis of voiced stops revealed that the child had acquired prevoicing in Dutch at age 3 ; 6, since he produced the great majority of voiced stops with voicing lead in the first session. However, the prediction that the child would transfer prevoicing into English was only partly borne out. While he does start producing prevoicing in English to some extent, the longitudinal analysis reveals a downward trend in the production of prevoicing in Dutch, and in the last four sessions hardly any Dutch or English tokens are realized with prevoicing. This difference between the adult L2 learners in Simon (2009), who do not suppress prevoicing in Dutch and transfer it to English, and the child, who in the last recording sessions produces hardly any prevoiced stops in Dutch or English, could be ascribed to the fact that between the ages of 3 ; 0 and 4 ; 0, the child's L1 system can still easily change as the result of exposure to an L2. Whereas for adult L1 Dutch speakers the production of prevoicing is a long-standing habit which is hard to ‘unlearn’ when speaking an L2 with short-lag instead of prevoiced stops, the child had probably only just acquired the production of prevoicing in Dutch. When he was then, at age 3 ; 6, exposed to a language with short-lag stops, he easily got rid of prevoicing again and produced short-lag stops instead of prevoiced ones. The case study illustrates how flexible a three-year-old child's L1 phonetic system still is and how easily it can be influenced by a foreign language in which the child is immersed.
APPENDIX A. REPETITION TASK
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027044615834-0700:S0305000909009386_tab2.gif?pub-status=live)
APPENDIX B. PICTURE-NAMING TASK
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151027044615834-0700:S0305000909009386_tab3.gif?pub-status=live)