1. Introduction
Researchers in second-language (L2) speech learning have long known that L2 accent is affected by the learners’ native language (L1) (Flege, 2002; Flege, Schirru & MacKay, 2003; Mayr, 2005; Mayr & Escudero, 2010; Simon, 2009). The opposite scenario, i.e. the effect of the L2 learning experience on L1 pronunciation, has, however, only recently received systematic attention (Dmitrieva, Jongman & Sereno, 2010; de Leeuw, Schmid & Mennen, 2010; de Leeuw, Mennen & Scobbie, in press; Mennen, 2004).
The non-pathological decrease in proficiency in a previously acquired language is referred to as L1 attrition (Köpke & Schmid, 2004; Schmid, 2010). L1 attrition may affect entire bilingual speech communities, a phenomenon sometimes referred to as societal loss or attrition (e.g. Bullock & Gerfen, 2004a, b, 2005). The focus of the present study is, however, on attrition at the level of individual speakers.
Some research in this area has focused on the complete loss of an individual's L1, for example in the context of international adoptees (Ventureyra, Pallier & Yoo, 2004). The majority of studies, including the present one, are, however, concerned with late consecutive bilinguals, resident in an L2-speaking environment, who encounter changes in their native-language accent, often despite continued use of the L1. Some of these individuals may be perceived to have a foreign accent in their native language (de Leeuw et al., 2010; Hopp & Schmid, in press). However, their intelligibility will not normally be adversely affected.
A recent example of the latter type is Dmitrieva et al.'s (2010) study of word-final obstruent voicing in Russian, a language characterized by neutralization of the voicing distinction in final position. In this study, native Russian speakers living in the United States who have knowledge of English, a language that maintains a voicing contrast, were found to devoice word-final obstruents in Russian to a lesser extent than monolingual native speakers of Russian. Specifically, while Russian monolinguals produced differences in closure/frication duration and release duration, Russian speakers with English-language experience also made a contrast in the duration of the preceding vowel and the duration of voicing into closure/frication, suggesting an effect of L2 learning on L1 pronunciation.
Similarly, in an earlier study, Flege (1987) investigated voice onset time (VOT) in word-initial plosives produced by several types of English–French bilinguals alongside monolingual controls. The bilinguals included a group of native speakers of American English who were long-term residents in Paris and married to French speakers, and a group of native French speakers who were long-term residents in the United States. The study found that these two groups of bilinguals produced plosives with compromise VOT values in both languages (see Footnote 1). These results were interpreted as evidence for merged L1–L2 representations, which, according to the Speech Learning Model (SLM) (see Flege, 1995, 2002), exist in a common “phonological space” (Flege, 1995, p. 239). Similar effects of cross-linguistic assimilation are reported in Major's (1992) study of stop consonant voicing in native speakers of American English living in Brazil, and in Peng's (1993) study of fricative production in Taiwanese Amoy–Mandarin bilinguals.
In contrast, Flege and Eefting (1987) found that advanced Dutch learners of English produced Dutch /t/ with VOT values that were shorter, and thus more dissimilar from English ones, than those produced by Dutch speakers with less English-language experience. Likewise, de Leeuw et al. (in press) report that two out of ten native German speakers who were long-term residents in Canada “overshot” the monolingual German norm with respect to the tonal alignment of a pre-nuclear rise. Specifically, the tonal alignment at the end of the rise was later in their German productions, and thus more dissimilar from the L2 norm than that of native German speakers resident in Germany. Similar patterns are also reported for these speakers’ productions of the lateral phoneme /l/ (de Leeuw, 2009). Together, the results of these studies indicate that cross-linguistic interaction may not necessarily lead to assimilation of L1 and L2 categories, but may instead result in dissimilation, or polarization of categories. According to the SLM (Flege, 1995, 2002), this happens so that L1 and L2 categories are kept maximally distinct.
Polarization has also been observed in the context of bilingual vowel systems. Guion (2003), for example, investigated the production of the three Quichua vowels /ɪ a ʊ/ and the five Spanish vowels /i e a o u/ by four types of L1 Quichua L2 Spanish bilinguals. Simultaneous and early bilinguals as well as some mid bilinguals managed to keep L1 and L2 categories distinct, while late bilinguals produced vowels in both languages with L1-like properties. Importantly, bilinguals who had acquired L2 vowels produced their L1 Quichua categories as a whole higher in the vowel space than bilinguals who had not. Guion argues that this shift in the L1 vowel space was caused by an attempt to achieve sufficient perceptual distinctiveness between L1 and L2 categories, consistent with Adaptive Dispersion Theory (Lindblom, 1986, 1998).
Chang (2010, 2011) reports a similar shift in the L1 vowel system of late English–Korean bilinguals, although in this case in the direction of the L2 system. In other words, in his study, the shift could not have been caused by a need for greater distinctiveness of L1 and L2 categories, as in Guion's, but rather by assimilatory processes, akin to those found in Flege (1987) or Major (1992). Chang argues that the disparity between his study and Guion's may be due to differences in the age of onset of L2 learning: Guion's subjects were early bilinguals, Chang's late bilinguals.
Chang's study is also of interest for another reason: while much of the literature on L1 attrition focuses on highly experienced late L2 learners, his study is the first to demonstrate changes to L1 pronunciation in initial stages of L2 learning. More specifically, 19 native English beginning learners of Korean from the United States participated in a six-week intensive language course in Korea. At the end of each week of instruction, their production of monosyllabic words in Korean and English was assessed acoustically. The results for English stop consonants and vowels revealed systematic shifts in the direction of Korean over the course of the study. These changes, while measurable, were relatively subtle and, unlike studies of advanced L2 learners (de Leeuw et al., 2010; Hopp & Schmid, in press), did not result in a noticeable foreign accent in the participants’ L1.
Further evidence for relatively quick changes in L1 accent comes from Sancier and Fowler's (1997) longitudinal study of a native speaker of Brazilian Portuguese with extensive English-language experience who regularly travelled between Brazil and the United States. The study found that her VOT values in both languages were consistently shorter after several months in Brazil than after months spent in the United States, although the magnitude of the difference was relatively minor. Interestingly, native Brazilian Portuguese listeners were responsive to these subtle changes in accent, and managed to differentiate her Portuguese productions in terms of recent linguistic experience. On the basis of such findings, Chang (2010) concludes that quick subtle changes in L1 accent, referred to as L1 phonetic drift in his terminology, may happen “as a matter of course” (p. 190; see Footnote 2).
However, it is not clear whether interaction effects of this kind are indeed inevitable. Mennen (2004), for instance, found that while four out of five advanced Dutch learners of Greek failed to produce tonal alignment in pre-nuclear rises accurately in either language, one speaker's productions were entirely native-like in the L1 and L2. Likewise, de Leeuw et al. (in press) report that one of their L1 German – L2 English participants produced tonal alignment at the start and end of the pre-nuclear rise in conformity with the norms of monolingual speakers of either language. Finally, Major (1992) reports that one of his L1 English – L2 Portuguese participants managed to produce stop consonants in both languages with VOT values within the monolingual range. These results suggest (i) that it may be possible to acquire a native-like accent in the L2, without thereby exhibiting attrition in the L1, and (ii) that attrition in L1 accent is characterized by a fair amount of interpersonal variation.
Variability across individuals may, however, pose methodological difficulties. Thus, most studies on attrition in L1 accent have opted for a cross-sectional design, and involve comparisons of attriters with monolingual control speakers (e.g. Dmitrieva et al., 2010; de Leeuw et al., in press; Flege, 1987; Major, 1992; Mennen, 2004). Comparisons of this kind are not always straightforward, however, since differences in the size and shape of individuals’ vocal tracts need to be taken into account as well as the specific regional and social features of their L1 accent.
In contrast, longitudinal studies are not affected by interpersonal variation of this kind, as they involve comparisons of individuals with themselves. However, it is rarely feasible to observe attriters over a long period of time. Thus, Chang's (2010, 2011) study was limited to six weeks and Sancier and Fowler's (1997) to a few months. Moreover, even if individuals’ speech were assessed over a period of several decades, observable differences may not be a result of L1 attrition, but instead due to factors such as changes in the linguistic norms of the L1 speech community, or physiological and maturational changes in these individuals. As Harrington (2006) and Harrington, Palethorpe and Watson (2000a, b) have shown, even the Queen's accent is not immune to change over time. The present study aims to combine the advantages of a cross-sectional and a longitudinal design by exploring the unique case of bilingual monozygotic twin sisters with differing linguistic experience during the last 30 years.
Specifically, this study investigates whether L1 attrition has occurred in the speech of a monozygotic twin who emigrated from the L1 environment 30 years ago. This was tested by comparing her speech productions to those of her identical twin sister, who has been living in the L1-speaking environment all her life.
A twin study, such as the present one, is particularly interesting in the context of L1 attrition, as it allows for control of variables that cannot easily be controlled for in other types of design, including developmental L1 acquisition, L1 regional accent, as well as physiological and neurological characteristics. With respect to the latter, it has been shown that monozygotic twins share highly similar characteristics in terms of speech and language. Thus, cortical structure in the brain areas responsible for speech and language input and output processing, such as Broca's and Wernicke's areas, as well as frontal brain regions, has been found to be genetically influenced, with brain structure presenting as increasingly similar in individuals with increasing genetic similarity (Thompson, Cannon, Narr, van Erp, Poutanen, Huttunen, Lönnqvist, Standertskjöld-Nordenstam, Kaprio, Khaledy, Dail, Zoumalan & Toga, 2001).
In terms of the physiological characteristics of speech, it has been shown that monozygotic twins have vocal tracts of equal length. This is because vocal tract length is correlated with height and weight (Fitch & Giedd, 1999). As a result, it is not surprising that the acoustic characteristics of speech of monozygotic twins are also highly similar (Loakes, 2006; Nolan & Oh, 1996; Przybyla, Horii & Crawford, 1992; Whiteside & Rixon, 2003). Whiteside and Rixon (2003), for instance, found that the adult monozygotic twin brothers in their study were much more similar in terms of second-formant (F2) onset and target patterns than their age- and sex-matched sibling who participated in the study two years later.
On the basis of these findings, we hypothesize that the monozygotic twins in the present study would present with near-identical phonetic properties in their L1, if their linguistic experience had not diverged. What is more, it will be argued that any salient observable differences in the twins’ current L1 accent are a result of their differing linguistic environments. Note that both twins use their L1 on a regular basis (see below for details), and thus in the present study a lack of L1 use is an inadequate explanation for attrition.
However, in view of recent studies suggesting that attrition may not apply “across the board” (de Leeuw et al., in press; Major, 1992; Mennen, 2004), it is not clear whether the twins’ L1 accents will actually exhibit any differences at all. It is perfectly conceivable that the effect of L2 use on L1 accent turns out to be inconsequential. Moreover, it is possible that certain areas of pronunciation are more vulnerable to attrition than others. As a result, the twin sisters were tested in a number of areas that differ systematically across Dutch, their L1, and English, their L2, with VOT in word-initial plosives (Experiment 1) and vowels (Experiment 2) the focus of this study. Suprasegmental aspects of their speech have also been examined, but are reported elsewhere (Mennen, Mayr & Price, 2011).
2. Experiment 1
2.1 Participants
A set of monozygotic female adult twins, MZ and TZ, both consecutive Dutch–English bilinguals, participated in the study. They were 62 years of age at the time of data collection, with no history of speech, language or hearing difficulties. Details of their heights, weights, and estimated vocal tract lengths are given in Table 1. Vocal tract length was calculated on the basis of fourth-formant (F4) measurements taken at the mid-point of each vowel token (see Experiment 2), using the formula
\begin{equation}
L = \frac{(2n - 1)\,c}{4\,F_n}
\end{equation}
where n = formant number, L = length of tube (cm), c = speed of sound in air (35,000 cm/sec) and F_n = resonance frequency of formant n (Hz) (Carey, 2002). Inspection of Table 1 shows that the twins’ vocal tracts are identical in length.
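As a rough illustration of the formula, the following sketch computes the length of a uniform tube from a single formant measurement. The function name and the example F4 value are hypothetical and are not taken from the study; only the formula and the speed of sound (35,000 cm/sec) come from the text.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # value quoted in the text


def vocal_tract_length_cm(formant_hz: float, n: int = 4,
                          c: float = SPEED_OF_SOUND_CM_PER_S) -> float:
    """Estimate tube length (cm) from the n-th resonance frequency (Hz).

    Implements L = (2n - 1) * c / (4 * F_n), the resonance formula for a
    uniform tube closed at one end.
    """
    if formant_hz <= 0:
        raise ValueError("formant frequency must be positive")
    if n < 1:
        raise ValueError("formant number must be 1 or greater")
    return (2 * n - 1) * c / (4.0 * formant_hz)


# Hypothetical F4 value, chosen only to illustrate the arithmetic.
example_f4_hz = 3500.0
print(vocal_tract_length_cm(example_f4_hz))  # 7 * 35000 / 14000 = 17.5 cm
```

In practice one would presumably average the estimate over all vowel tokens of a speaker, since the text states that F4 was measured at the mid-point of each token.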
Table 1. Height, weight and vocal tract details for MZ and TZ.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160203060026301-0744:S136672891100071X_tab1.gif?pub-status=live)
In terms of linguistic experience, MZ and TZ developed their L1 in the same environment, growing up in the province of Noord-Holland, where Standard Dutch is spoken. Both encountered L2 English for the first time at a MULO-B high school, where they studied the language between the ages of 13 and 18 years. Subsequently, they both worked for an international telecommunications company, using English alongside Dutch in their work environment on a daily basis. At the age of 22, MZ met her future English-speaking husband, which increased her use of English outside work. Four years later, aged 26, she gave up her post in telecommunications, whilst still using English on a daily basis with her future husband. In contrast, TZ continued to use English mostly for work purposes.
At the age of 32, the twins’ language environments changed when MZ left the Netherlands for the United Kingdom, while TZ remained in the Netherlands. As a consequence, over the last 30 years, MZ has been predominantly using her L2 in social, community or work contexts. She prefers to read books and newspapers in English, and only watches English-language television. On the other hand, she spends an average of five weeks a year in the Netherlands visiting family, where she speaks only Dutch. When in the United Kingdom, an average of four hours a week is spent speaking Dutch on the telephone. In addition, MZ and her daughter, an English-dominant English–Dutch bilingual, use both languages regularly, with frequent code-switches.
In contrast, TZ has lived in the same region of the Netherlands all her life, and consequently predominantly uses Dutch in everyday interactions. However, English being widely used in Dutch society, she is regularly exposed to the language, for example via television and films. Moreover, as previously mentioned, she has used English for work purposes and in social situations, for example during visits to the United Kingdom.
2.2 Materials and procedure
Dutch and English both distinguish voiced and voiceless plosives. However, the distinction is implemented differently in the two languages in terms of VOT, i.e. the timing relation between release of the plosive and the onset of voicing. Thus, voiceless plosives in English are aspirated in word-initial position, characterized by long-lag VOT values (Docherty, 1992), while they are unaspirated in Dutch, with short-lag VOT values (Lisker & Abramson, 1964; Simon, 2009). Voiced plosives, in turn, are characterized by prevoicing in Dutch, i.e. vocal fold vibration occurs before the release of the plosive (van Alphen, 2004; van Dommelen, 1983), while they have short-lag VOT values in English (Docherty, 1992). For this reason, Dutch has been referred to as a voicing language and English as an aspirating language (Jansen, 2004).
Table 2 depicts the materials used in the experiment. The stimuli consist of monosyllabic real words of Dutch and English, distinguished in terms of place of articulation and voicing. The words were matched cross-linguistically as much as possible, yielding a number of interlingual homographs. Note that there is no voiced velar plosive /ɡ/ in Dutch, except in a few English loanwords. Hence the Dutch word list encompasses 4 (words) × 5 (plosives) = 20 target items, the English word list 4 (words) × 6 (plosives) = 24 target items. Each of these was produced four times in a carrier phrase, i.e. Ik zei X (Dutch: “I said X”) and I say X (English), for a total of 80 Dutch tokens and 96 English tokens from each participant.
Table 2. Stimulus material used in Experiment 1; 1PS denotes 1st person singular verb form.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-06144-mediumThumb-S136672891100071X_tab2.jpg?pub-status=live)
The materials were recorded in individual sessions in a sound-attenuated booth, using a Zoom H2 Handy Recorder with a sampling rate of 44.1 kHz and 16-bit resolution. Importantly, the Dutch and English materials were collected on different days to avoid dual language activation as much as possible (Grosjean, 1989, 2001).
For the same reason, each session commenced with a brief interaction in the relevant language. Subsequently, the participants were asked to read the target sentences at a natural pace. In order to conceal the purpose of the experiment from them, the wordlist for VOT was randomly interspersed with items from a list targeting vowels (Experiment 2) as well as monosyllabic distractor items that were not analyzed further. A short break was scheduled at the end of each 15-minute block to avoid fatigue effects. Including the materials for prosodic analysis reported elsewhere (Mennen et al., 2011), it took each participant approximately two hours to complete the recording session in each language.
2.3 Analysis and results
The digitized materials were transferred to a standard PC, and analyzed acoustically using PRAAT software (Boersma & Weenink, 2010). VOT was measured from the release burst, signalled by a sharp peak in waveform energy, to the start of the oscillating line indicating the onset of voicing of the following vowel. If voicing began during the plosive closure period, VOT was measured from the point at which vocal fold vibration could be detected in the waveform, coupled with the presence of aperiodic wide-band energy visible in the spectrograms, up to the release burst.
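The measurement procedure can be sketched as a small calculation over two hand-labelled landmarks per token. The function names, the example time stamps, and the signed convention (negative values for a voicing lead) are assumptions of this sketch; the text specifies only which landmarks delimit the interval.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return signed VOT in milliseconds from two landmark times (seconds).

    Positive: voicing starts after the release burst (voicing lag).
    Negative: voicing starts during the closure (voicing lead / prevoicing).
    The sign convention is an assumption of this sketch.
    """
    return (voicing_onset_s - release_s) * 1000.0


def voicing_type(vot: float) -> str:
    """Label a token as 'lead' (prevoiced) or 'lag'."""
    return "lead" if vot < 0 else "lag"


# Hypothetical landmark times, not measurements from the study.
tokens = {
    "lag example": (0.512, 0.530),   # release, then voicing onset
    "lead example": (0.640, 0.545),  # voicing begins before the release
}
for label, (release, voicing) in tokens.items():
    value = vot_ms(release, voicing)
    print(label, round(value, 1), voicing_type(value))
```

Keeping lead and lag on one signed scale makes it straightforward to compute, for example, the share of tokens produced with a voicing lead.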
Figure 1 depicts MZ's and TZ's mean VOT values, with standard deviations, for the Dutch voiceless plosives /p t k/.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-13995-mediumThumb-S136672891100071X_fig1g.jpg?pub-status=live)
Figure 1. VOT (in ms) for Dutch voiceless plosives (MZ and TZ); error bars indicate +/–1 SD.
Inspection of the figure indicates that TZ's VOT values fall within the short-lag range characteristic of voiceless plosives in native Dutch speech (see Flege & Eefting, 1987; Lisker & Abramson, 1964; Simon, 2009). The relatively shorter values for bilabial plosives compared with alveolars and velars, in turn, are expected since the differing cavity sizes behind the articulators result in differences in air pressure (Cho & Ladefoged, 1999).
MZ's productions of the Dutch voiceless plosives, in contrast, are characterized by considerably longer VOT values than those produced by her twin sister, with values that are intermediate between those of the native Dutch and the native English norms (Docherty, 1992; Flege & Eefting, 1987; Lisker & Abramson, 1964).
A cross-linguistic comparison of her productions of voiceless plosives, in turn, shows minor differences in the mean values for /t/ and /k/ in Dutch and English. However, as the standard deviations in Figure 2 indicate, MZ's VOT values exhibit a large degree of cross-linguistic overlap. Moreover, the direction of the difference in mean values is inconsistent, with slightly higher values for English /t/ than Dutch /t/, but slightly lower values for English /k/ than Dutch /k/. Taken together, we thus contend that the results for the voiceless plosives suggest cross-linguistic assimilation patterns, with compromise VOT values that differ from the native norms of the L1 and the L2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-31628-mediumThumb-S136672891100071X_fig2g.jpg?pub-status=live)
Figure 2. VOT (in ms) for Dutch and English voiceless plosives (MZ); error bars indicate +/–1 SD.
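The overlap argument based on standard deviations can be made concrete with a short sketch that compares the mean ± 1 SD ranges of two sets of tokens. All VOT values below are invented for illustration and are not the study's data; the helper names are likewise hypothetical.

```python
from statistics import mean, stdev


def sd_interval(values: list[float]) -> tuple[float, float]:
    """Return (mean - 1 SD, mean + 1 SD) using the sample standard deviation."""
    m, sd = mean(values), stdev(values)
    return (m - sd, m + sd)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical VOT values (ms) for one plosive in each language.
dutch_t = [40.0, 45.0, 50.0, 55.0]
english_t = [44.0, 50.0, 56.0, 62.0]

print(sd_interval(dutch_t), sd_interval(english_t))
print(intervals_overlap(sd_interval(dutch_t), sd_interval(english_t)))
```

Overlapping ± 1 SD ranges are only a descriptive indication of similarity, in keeping with the visual inspection of error bars described in the text; they are not a statistical test.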
Figure 3 depicts the mean VOT values, with standard deviations, of MZ's and TZ's productions of the Dutch voiced plosives /b/ and /d/. Inspection of the figure indicates that TZ's productions are strongly prevoiced, in conformity with the native Dutch norm (Lisker & Abramson, 1964; van Alphen, 2004). This pattern was found to be highly consistent, with 100% of her tokens produced with a voicing lead. Interestingly, the same applies to MZ's productions, with virtually identical VOT values to her sister's. This indicates that her voiced plosives have not undergone L1 attrition.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-49069-mediumThumb-S136672891100071X_fig3g.jpg?pub-status=live)
Figure 3. VOT (in ms) for Dutch voiced plosives (MZ and TZ); error bars indicate +/–1 SD.
Moreover, VOT measurements of her English voiced plosives also indicate consistent prevoicing. In fact, as Figure 4 shows, the voicing lead in MZ's productions of English plosives is even longer than in her Dutch ones. This suggests transfer of prevoicing from the L1 to the L2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-18619-mediumThumb-S136672891100071X_fig4g.jpg?pub-status=live)
Figure 4. VOT (in ms) for Dutch and English voiced plosives (MZ); error bars indicate +/–1 SD.
2.4 Discussion
The results for MZ's voiceless plosives indicate cross-linguistic assimilation patterns. Thus, she produced /p t k/ in both languages with VOT values that are longer than the native Dutch norm, but not as long as the long-lag aspirated plosives of English. Assimilatory patterns with compromise VOT values of this kind have been demonstrated in a number of previous studies on L1 attrition (Flege, 1987; Major, 1992).
MZ's voiced plosives, in contrast, were consistently prevoiced in both languages, in line with the native Dutch norm. There is thus no evidence for L1 attrition with respect to Dutch /b/ and /d/. This indicates an interesting asymmetry in MZ's VOT patterns, with L1 attrition apparent in her productions of the voiceless categories, but not the voiced ones. Moreover, with respect to the latter, MZ appears to have transferred the articulatory patterns underpinning voicing in her L1 to the L2. Note, however, that prevoicing may sometimes occur in English voiced plosives, in particular in voiced environments (Docherty, 1992).
An interesting finding of the study is that MZ's English voiced plosives are even more prevoiced than her Dutch ones. One interpretation would be to suggest that she has acquired separate representations for voiced categories in the two languages and is attempting to differentiate them thus, exhibiting a polarization effect. Unlike the studies reviewed earlier (de Leeuw et al., in press; Flege & Eefting, 1987), this would involve a shift of her L2 categories, rather than L1 categories, a scenario that has been demonstrated in a number of previous studies. The early Italian–English bilinguals in Flege et al.'s (2003) study, for instance, produced L2 English /eɪ/ with greater tongue movement than native English speakers, which the authors interpret as an attempt to differentiate it from monophthongal Italian /e/. In the present context, however, such a polarization effect is unlikely given the relatively minor difference between MZ's Dutch and English categories. Moreover, one would expect it to operate in the opposite direction, with VOT values for English /b d g/ that are even longer than the short-lag categories of the native English norm. Other factors, such as differences in speaking rate, are therefore more likely explanations, although the specific reason for the observed pattern remains unclear.
Overall, then, the findings obtained here suggest cross-linguistic assimilation patterns for all plosives, with voiced ones realized with a voicing lead in both languages, and voiceless ones with VOT values intermediate between short-lag and long-lag categories. These results conform closely to those of Simon (2009). In that study, L1 Dutch learners of English from Flanders were able to produce the aspirated long-lag plosives of English accurately, but not the unaspirated short-lag ones, instead realizing the latter with a voicing lead, as in the present study. Simon argues that this may be due to the greater acoustic salience of aspiration over prevoicing. Moreover, prevoicing functions as a critical cue for the identification of voiced plosives in Dutch. Thus, van Alphen (2004) has shown that native Dutch speakers frequently misidentify tokens of Dutch voiced plosives that lack a voicing lead. It is therefore not surprising if Dutch speakers transfer this feature to the L2. Also, as MZ produced her voiceless plosives with intermediate values, some of which still fall within the short-lag range, e.g. /p/, she would have been unable to maintain sufficient contrast between her voiced and voiceless categories, had she succeeded in suppressing prevoicing for /b d g/.
Finally, despite the consistency between the two studies, there is one notable difference: Simon's subjects not only acquired long-lag categories in English, but also managed to retain their short-lag categories in Dutch. MZ, in contrast, failed to differentiate voiceless plosives in the two languages. This disparity suggests differences in the interaction of the L1 and L2 sound systems across the two studies. Thus, Simon's subjects, who are university students of English living in a Dutch-speaking environment, receive the majority of their input in the L1, and consequently, L2-to-L1 interaction effects are perhaps less likely. MZ, on the other hand, receives most of her input in the L2, and as a consequence may be more prone to bidirectional interference effects.
3. Experiment 2
3.1 Materials and procedure
Experiment 2 focuses on the twins’ vowel productions in Dutch and English. Standard Dutch distinguishes the phonologically short vowels /ɪ ɛ ɑ ɔ ʏ/, the phonologically long vowels /i e a o u ø y/, the closing diphthongs /ɛɪ ɔu œy/ and schwa (Booij, 1995; Gussenhoven, 1999). Note that the nominal Dutch monophthongs /e o ø/ are characterized by vowel-inherent spectral change (Adank, van Hout & Smits, 2004). Dutch also has a number of marginal vowels, such as the nasal vowels /ɛ̃ ɑ̃ ɔ̃/. However, these are only used in a few loanwords (Collins & Mees, 2003). Standard Southern British English (SSBE), in turn, contains the short vowels /ɪ ɛ æ ʌ ɒ ʊ/, the long vowels /i ɑ ɔ u ɜ/, the closing diphthongs /eɪ aɪ ɔɪ əʊ aʊ/, the centring diphthongs /ɪə eə ʊə/ and schwa (Roach, 2004). Only the 15 Dutch vowels /i ɪ e ɛ a ɑ o ɔ u ø y ʏ ɛɪ ɔu œy/ and the 16 English vowels /i ɪ ɛ æ ɑ ʌ ɔ ɒ u ʊ ɜ eɪ aɪ ɔɪ əʊ aʊ/ were included in the study. To control for phonetic context effects, these were embedded in a /hVt/ frame, yielding the target words depicted in Table 3.
Table 3. Target words and corresponding phonetic symbols (Experiment 2); asterisks indicate non-words.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-52728-mediumThumb-S136672891100071X_tab3.jpg?pub-status=live)
Note that only some of the target words are real words. Despite language-specific spelling conventions, orthography-based difficulties could not be ruled out altogether. In order to ensure activation of the intended vowel categories, production of the target words was therefore primed by the use of real-word rhyming prompts. During the recording, the participants read out the primes for each category followed by the relevant target word produced once in isolation and subsequently in a carrier phrase, e.g. moet “must” (1PS), zoet “sweet”, hoet (non-word), ik zei hoet “I said hoet”; rat, mat, hat, I say hat. This procedure was repeated four times for each vowel category, yielding a total of 15 × 4 = 60 Dutch phrases and 16 × 4 = 64 English phrases. Recall that these were interspersed with other materials during data collection. Otherwise, the procedure followed the pattern described in Experiment 1.
3.2 Analysis and results
Following extraction of the target words from the carrier phrases, using PRAAT software (Boersma & Weenink, Reference Boersma and Weenink2010), the duration of each vowel was measured from the first positive peak in the digitized waveform up to, but not including, the portion of acoustic silence that signals the closure period of the post-vocalic plosive. Subsequently, the frequency of the first two formants was measured using formant trackers, set at a frequency maximum of 5000 Hz with a dynamic range of 30 dB and a window length of 0.025 seconds. F1 and F2 frequencies were measured at the vowel mid-point for the monophthongs, and at the 25% and 75% portions for the diphthongs. Where mistracking occurred, the automatically tracked formants were hand corrected.
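The sampling scheme for the formant measurements can be illustrated as follows. This is a minimal sketch, assuming that vowel onset and offset times have already been determined by hand as described above; the function name `measurement_times` and the constant names are hypothetical, and the constants merely record the tracker settings reported in the text rather than calling PRAAT.

```python
# Minimal sketch, not the study's analysis script. Names are hypothetical.
# The constants record the tracker settings reported in the text.
MAX_FORMANT_HZ = 5000      # frequency maximum of the formant tracker
DYNAMIC_RANGE_DB = 30      # dynamic range
WINDOW_LENGTH_S = 0.025    # analysis window length in seconds


def measurement_times(onset, offset, is_diphthong):
    """Return the time points (in seconds) at which F1 and F2 are read off.

    Monophthongs are sampled at the vowel mid-point, diphthongs at the
    25% and 75% points of the vowel.
    """
    if offset <= onset:
        raise ValueError("offset must be later than onset")
    duration = offset - onset
    proportions = (0.25, 0.75) if is_diphthong else (0.5,)
    return [onset + p * duration for p in proportions]


# Hypothetical vowel lasting from 1.0 s to 1.5 s in the recording.
print(measurement_times(1.0, 1.5, is_diphthong=False))
print(measurement_times(1.0, 1.5, is_diphthong=True))
```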
Figure 5 displays the mean F1 and F2 frequencies (in Hz) of the Dutch monophthongs, as produced by MZ and TZ. Note that the nominal Dutch monophthongs /e o ø/ are discussed together with the diphthongs further below.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-26346-mediumThumb-S136672891100071X_fig5g.jpg?pub-status=live)
Figure 5. F1~F2 plot of Dutch monophthongs, as produced by MZ (filled circles) and TZ (unfilled circles).
Inspection of Figure 5 shows that TZ's and MZ's vowel spaces are fundamentally different. As expected, TZ's productions conform closely to the native norm for Dutch vowels (Adank et al., Reference Adank, van Hout and Smits2004). On the other hand, with the exception of /a/ and /u/, MZ's Dutch monophthongs are characterized by consistently greater degrees of openness, compared with those of her twin sister, as manifest in systematically higher F1 values. Thus, across all Dutch monophthongs, her mean F1 value is 594 Hz, compared with a mean of 511 Hz for TZ. In some instances, e.g. /ɔ/ and /ɛ/, MZ's categories differ in F1 from her sister's by more than 200 Hz.
In order to determine cross-linguistic effects, MZ's productions of the Dutch monophthongs were also compared with those of her English monophthongs. Inspection of the F1~F2 plot in Figure 6 shows that MZ did not produce a contrast between some Dutch and English categories. Her Dutch /ɛ/, for instance, which, as we have seen, is considerably higher in F1 than her sister's (see Figure 5), is indistinguishable from her English /ɛ/. Similar patterns hold for Dutch /i ɪ ɔ u/. Note that the last of these, /u/, is involved in a three-way merger together with English /u/ and /ʊ/.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-43275-mediumThumb-S136672891100071X_fig6g.jpg?pub-status=live)
Figure 6. F1~F2 plot of MZ's productions of the Dutch (filled circles) and English (unfilled circles) monophthongs.
Not all categories exhibit cross-linguistic assimilations, however. MZ's Dutch /ɑ/, for example, is clearly distinct from her English /ɑ/, perhaps because of the differences in duration between the two categories (MZ's Dutch /ɑ/: 100 ms (SD: 10); English /ɑ/: 247 ms (SD: 26)). Interestingly, the former has nevertheless undergone a shift in F1 towards a more open position (see Figure 5). Likewise, MZ produced the front rounded Dutch vowels /y/ and /ʏ/ with considerably higher F1 values than TZ despite the fact that these categories have no English counterparts.
Figure 7 depicts TZ's and MZ's productions of the Dutch diphthongs, as measured from the 25% to the 75% portions of each category. TZ's productions are represented by unbroken arrows in the figure, MZ's by broken arrows.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-66487-mediumThumb-S136672891100071X_fig7g.jpg?pub-status=live)
Figure 7. F1~F2 plot of Dutch diphthongs, as produced by MZ (broken arrows) and TZ (unbroken arrows).
As the figure indicates, TZ's and MZ's diphthong productions differ considerably. TZ's productions conform to the norms expected for Dutch (Adank et al., Reference Adank, van Hout and Smits2004). MZ's productions, on the other hand, are consistently higher in F1 than her sister's. Moreover, /e o ø/ are also produced with considerably greater vowel-inherent spectral change, as the comparatively longer arrows in Figure 7 indicate.
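One simple way to quantify the arrow lengths is the Euclidean distance between the 25% and 75% measurement points in the F1~F2 plane. The sketch below is an assumption about how such a measure could be computed, not a procedure reported in the study; the function name `spectral_change` is hypothetical and the formant values are invented for illustration and are not measurements from MZ or TZ.

```python
import math


def spectral_change(start, end):
    """Euclidean length (in Hz) of the F1~F2 arrow from the 25% to the 75% point.

    `start` and `end` are (F1, F2) pairs in Hz.
    """
    (f1_start, f2_start), (f1_end, f2_end) = start, end
    return math.hypot(f1_end - f1_start, f2_end - f2_start)


# Invented values for two hypothetical speakers producing the same vowel.
short_arrow = spectral_change((450, 2000), (420, 2040))
long_arrow = spectral_change((550, 1900), (470, 2050))

# The speaker with the longer arrow shows more vowel-inherent spectral change.
print(short_arrow, long_arrow, long_arrow > short_arrow)
```

A distance in raw Hz weights F2 movement more heavily than an auditory scale would, so this measure is best read as a description of the plotted arrows rather than of perceived change.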
A comparison of MZ's productions of the English and Dutch diphthongs, in turn, suggests a lack of cross-linguistic distinctiveness for some categories (see Figure 8). Thus, MZ produced Dutch /e/, /o/ and /ɔu/ similarly to English /eɪ/, /əʊ/ and /aʊ/, respectively. She made a clear contrast between the remaining Dutch and English diphthongs, however. Interestingly, as with the monophthongs, Dutch diphthong categories that do not have an L2 counterpart, e.g. /ø/, were also characterized by a shift towards more open positions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160713150129-05855-mediumThumb-S136672891100071X_fig8g.jpg?pub-status=live)
Figure 8. F1~F2 plot of MZ's productions of the Dutch (broken arrows) and English (unbroken arrows) diphthongs.
3.3 Discussion
The purpose of this experiment was to determine whether long-term residence in an English-speaking environment has affected MZ's production of L1 vowels. A comparison of her Dutch vowel productions with those of her twin sister indeed suggests L1 attrition. Interestingly, the patterns for the monophthongs and the diphthongs are not erratic, but follow a general trend towards more open realizations compared with the native Dutch norm. How can this pattern be explained?
To begin with, the direction of the shift in MZ's vowel space is not surprising. After all, SSBE vowels are generally more open than their Standard Dutch counterparts (Adank et al., Reference Adank, van Hout and Smits2004; Deterding, Reference Deterding1997; Hawkins & Midgley, Reference Hawkins and Midgley2005). Thus, the mean F1 value for female native speakers of Dutch in Adank et al.'s (2004) study is 481 Hz, compared with a mean value of 612 Hz produced by the female native speakers of SSBE in Deterding's (1997) study. Accordingly, the shift in MZ's Dutch vowel space is in the direction of her L2.
The cross-linguistic differences between the vowel systems of Dutch and English can also explain the acoustic dimension involved in the shift, i.e. F1. That is, independent of specific categories, the two languages differ systematically in F1, but not F2. Moreover, the human auditory system shows greater sensitivity to differences between lower frequencies than higher frequencies (Goldstein, Reference Goldstein2010). This makes F1-related changes in vowel systems more likely (see Chang, Reference Chang2010).
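The point about auditory sensitivity can be made concrete by converting frequencies to an auditory scale. The sketch below uses Traunmüller's approximation of the Bark scale, which is an assumption introduced here for illustration and was not part of the analysis in this study. The two F1 values are the mean values reported for TZ and MZ in Section 3.2; the F2-region values are hypothetical and chosen only so that the difference in Hz is the same.

```python
def hz_to_bark(frequency_hz):
    """Convert Hz to Bark using Traunmueller's approximation (an assumption here)."""
    return 26.81 * frequency_hz / (1960.0 + frequency_hz) - 0.53


# Mean F1 of the Dutch monophthongs as reported in the text: TZ 511 Hz, MZ 594 Hz.
f1_difference_bark = hz_to_bark(594) - hz_to_bark(511)

# Hypothetical pair in the F2 region with the same 83 Hz spacing.
f2_difference_bark = hz_to_bark(1583) - hz_to_bark(1500)

# The same difference in Hz spans more of the auditory scale at low frequencies.
print(round(f1_difference_bark, 2), round(f2_difference_bark, 2))
```

On this scale, an 83 Hz difference in the F1 region covers roughly twice as much auditory distance as the same difference in the F2 region, which is in line with the argument that F1-related shifts are perceptually more salient.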
Finally, the mechanisms underpinning the shift in MZ's L1 vowel space need to be considered. According to Flege's (1995) SLM, cross-linguistically similar L1 and L2 categories are perceptually related to each other, forming merged L1–L2 representations. Consistent with this account, MZ did not produce a cross-linguistic difference between some of her vowel categories. For example, her Dutch /ɛ/ may have been “pulled” towards a more open, and thus more English-like position due to interlingual identification with English /ɛ/. The same mechanism could then be responsible for changes to other L1 categories. Thus, according to this account, the observed shift in F1 across the various Dutch categories could be the result of a series of unconnected changes affecting pairs of L1 and L2 vowels.
However, a token-by-token explanation cannot fully account for the observed patterns. After all, not only did L1 categories assimilated to L2 ones shift towards a more open position, but also L1 categories with no counterpart in the L2, such as /ʏ/ and /ø/. An alternative explanation would be to suggest that cross-linguistic interactions operate at a system-wide level, rather than at the level of individual sounds.
This interpretation has been invoked in previous studies of bilingual vowel systems (Chang, Reference Chang2010, Reference Chang, Lee and Zee2011; Guion, Reference Guion2003). Thus, Guion (Reference Guion2003) argued that changes in the vowel space of the early Quichua–Spanish bilinguals in her study were motivated by a need for greater dispersion between L1 and L2 categories. However, polarization effects of this kind cannot account for the findings of the present study, since MZ's L1 categories shifted in the direction of the L2, rather than away from it. The results obtained here are more akin to those of Chang (Reference Chang2010, Reference Chang, Lee and Zee2011) in which late L1 English – L2 Korean bilinguals also exhibited assimilatory patterns. Note that in contrast to the present study, the bilinguals in his study raised their L1 vowels, rather than lowered them. Moreover, the F1 shift in Chang's study was relatively subtle, with an average decrease for his female participants of 17 Hz over the six-week period of the study, while the mean F1 difference between MZ's and TZ's Dutch monophthongs was 83 Hz.
The specific mechanisms underpinning system-wide shifts are not entirely clear. Perhaps they are instigated by token-level assimilations of particular categories, such as Dutch /ɛ/ and English /ɛ/ in the present context. This might skew the internal consistency of the system, and trigger subsequent changes to other categories in an attempt to achieve a state of equilibrium. As Chang speculates, perhaps bilinguals become gradually attuned to the average F1 of the L2 spectrum, which may then become linked to the L1 spectrum.
This explanation presupposes similar changes to the various vowel categories. However, this was not the case in the present context. Thus, the magnitude of the changes to MZ's Dutch vowels is not equal across categories. Moreover, Dutch /a/ did not participate in the shift, perhaps because it cannot be lowered any further due to physiological constraints. As a consequence, MZ's L1 vowel space does not constitute a shifted replica of TZ's, but instead has its own unique configuration. The results of this experiment thus suggest a complex interaction between token-level and system-level shifts in bilingual vowel systems.
4. General discussion and conclusion
The present study is the first to investigate L1 attrition in the speech of bilingual monozygotic twins differing in linguistic experience. As such, it provides a unique control setting for the assessment of bilingual speech, making it possible to control for variables that cannot easily be controlled for in other studies, such as the social, physiological and neurological characteristics of speech.
The main purpose of the study was to determine whether MZ, who has been living in an L2-speaking environment for the last 30 years, shows signs of attrition in her pronunciation of Dutch, her L1. This was assessed by comparing her speech to that of her identical twin sister, TZ, who has been living in the Netherlands all her life. Two different areas of pronunciation were examined: Experiment 1 focused on the twins’ production of VOT in word-initial plosives, Experiment 2 on their production of vowels.
In both experiments, systematic differences between the twin sisters were observed. In Experiment 1, MZ produced Dutch voiceless plosives with consistently longer VOT values than her sister, suggesting an influence of the long-lag categories of English. L1 attrition was also apparent in Experiment 2, as MZ produced most of her Dutch vowels with inaccurately high F1 values. Interestingly, where attrition occurred, it manifested itself in assimilatory patterns. No instances of cross-linguistic dissimilation were observed. These results are consistent with Flege's (1995) claim that differences between L1 and L2 categories are more likely to be perceived in early than late bilinguals.
Overall, the present study suggests that long-term experience with L2 English has affected MZ's L1 accent. In this respect, she differs from individuals in previous studies who managed to produce their L1 and L2 within the monolingual norms of either language (de Leeuw et al., in press; Major, Reference Major1992; Mennen, Reference Mennen2004). The specific factors that have led to L1 attrition in MZ's speech are not known. Her frequent code-switches in interactions with her daughter may have been a contributing factor, however. After all, code-switching has been found to increase the likelihood of late consecutive bilinguals living in an L2-speaking environment being perceived as non-native in their L1 (de Leeuw et al., Reference de Leeuw, Schmid and Mennen2010).
Nevertheless, not all areas investigated were subject to L1 attrition. MZ's production of the Dutch voiced plosives /b/ and /d/, for instance, was native-like. In fact, the results indicate an L1 effect on the L2, with her English voiced plosives realized with a consistent voicing lead. Similarly, despite wide-ranging changes to her vowel system, MZ's production of Dutch /a/ was accurate. These results suggest that some areas of pronunciation may be more prone to attrition than others. At present, it is not entirely clear, however, which areas they are. In future research, the same individuals should be tested across a range of areas of pronunciation. This will make it possible to control for individual differences across attriters. To this end, work on suprasegmental aspects of MZ's and TZ's speech is currently underway (Mennen et al., Reference Mennen, Mayr and Price2011). Together with the present study, this may go some way towards elucidating the relative contribution of segmental and suprasegmental aspects of pronunciation to L1 attrition.
Future work is also needed to determine whether the areas most affected by attrition are the same as those with which advanced L2 learners struggle most. Recent research involving foreign accent ratings revealed comparable patterns for groups of attriters and late L2 learners (Hopp & Schmid, in press). Whether the specific phonetic characteristics that mark them out as foreign-accented are the same remains to be seen.
Finally, this study has theoretical implications. A number of explanatory frameworks have been put forward in second-language speech learning (Best & Tyler, Reference Best, Tyler, Bohn and Munro2007; Escudero, Reference Escudero2005; Flege, Reference Flege and Strange1995; Lado, Reference Lado1957). However, only Flege's (1995) SLM accounts for bidirectional influence in bilingual sound systems. The other models either do not address L2-to-L1 interaction or specifically preclude it.
As we have seen, the SLM claims that L2 sounds are perceptually linked to their closest L1 counterparts. It is, however, not clear on what basis this linkage occurs. Moreover, cross-linguistic interaction may not, or not only, arise at the level of individual sounds. Instead, the results of Experiment 2 suggest that bilinguals may at least in part link L1 and L2 sounds at a system-wide level. Provided these results are confirmed in perception experiments, extensions to existing theoretical accounts are required. Such extensions are also needed because existing accounts currently only explain interactions at the level of segments, leaving suprasegmental aspects of speech in bilinguals unaccounted for. It is hoped that such extensions will go some way towards improving our understanding of the nature of interactions in bilingual sound systems.