
Effects of age and experience on the production of English word-final stops by Korean speakers*

Published online by Cambridge University Press:  27 January 2010

WENDY BAKER*
Affiliation: Brigham Young University
*Address for correspondence: Wendy Baker, PhD, Department of Linguistics and English Language, Brigham Young University, 4057 JFSB, Provo, UT 84601 wendy_baker@byu.edu

Abstract

This study examined the effect of second language (L2) age of acquisition and amount of experience on the production of word-final stop consonant voicing by adult native Korean learners of English. Thirty learners, who differed in amount of L2 experience and age of L2 exposure, and 10 native English speakers produced 8 English monosyllabic words ending in voiced and voiceless stops. These productions were presented to 10 English listeners for perceptual judgment and subjected to acoustic analyses to determine how well learners produced vowel duration and closure (stop gap) duration, two cues to stop consonant voicing. Results revealed that even learners with 10 years of L2 experience did not always produce stop consonant voicing accurately, that learners' age of acquisition influenced their production of both cues, that vowel duration was easier to learn than closure duration, and that English listeners used both these cues in their judgments of production accuracy.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2010

Two factors that are often confounded in second language (L2) research are amount of L2 experience (often described as the length of a speaker's residence in a target language country) and age of acquisition (frequently defined as a speaker's age upon arrival in a target language country). Indeed, when researchers test learners’ abilities, they often find that the learners who arrived in a target language country at a younger age have resided in that country longer. This relationship between amount of L2 experience and age of acquisition makes it difficult for researchers to tease apart the individual effects of each of these factors (see Piske, MacKay and Flege, 2001, for discussion). In addition, most studies investigating age and experience effects have compared learners who began L2 learning in childhood or early adolescence with those who learned their L2 after a putative “critical period”, or in adulthood (e.g., see Flege, Yeni-Komshian and Liu, 1999; Baker and Trofimovich, 2005). Recent studies, however, have demonstrated that age of acquisition effects, at least for some L2 abilities, may occur even when comparing adult learners with adult learners (Stevens, 1999; Chiswick and Miller, 2008). Generally, these past studies of age effects in adulthood have focused on learners’ global, self-rated L2 ability (see Birdsong and Molis, 2001, for an exception), but have not closely examined the relationship between age and experience, especially with respect to particular aspects of L2 learning.

In the current study, we sought to extend this past research by investigating the individual effects of age and experience on two specific aspects of adult (i.e., post-puberty) L2 phonological learning. Specifically, we wished to determine how adult learners’ amount of L2 experience and age of acquisition influence their ability to produce two different phonetic cues for the same segmental target (closure duration and vowel duration as cues for word-final English stop consonant voicing). We examined two cues for the same segmental target to determine whether (and why) some cues seem to be more impervious to factors such as age and experience than others. To accomplish these goals, we asked adult native Korean learners of English, who differed in their amount of experience with English and in their age of arrival in the US, to produce monosyllabic English words ending in voiced or voiceless stop consonants (/d/ vs. /t/).

L2 experience

The role of experience in L2 phonological learning is still relatively unclear. While a growing number of studies have demonstrated that learners tend to improve in their ability to perceive and produce different aspects of the L2 as their amount of L2 experience increases, other studies have found little or no effect of experience on L2 phonological learning. Increasing amounts of L2 experience have been linked to improvements in production of L2 prosody and stress (Trofimovich and Baker, 2006, 2007; Nguyen, Ingram and Pensalfini, 2008), perception and production of L2 vowels (Flege, Bohn and Jang, 1997; Baker and Trofimovich, 2005), perception of coarticulatory rules (Levy and Strange, 2008) and judgments of perceptual similarity between native language (L1) and L2 segments (Trofimovich, Baker and Mack, 2001). In contrast, however, other studies have revealed that amount of experience has a minimal effect on L2 phonological learning (Oyama, 1976; Tahta, Wood and Loewenthal, 1981; Flege, 1988; Moyer, 1999). One example of this is Cebrian (2006), who found that amount of experience with English did not affect native Catalan speakers’ identification of tense and lax English vowels. Regardless of how long these speakers had studied English, they used durational cues to identify these vowels instead of spectral cues.

Why is it that amount of L2 experience does not always influence L2 phonological learning? One obvious reason is that there are vast methodological differences among studies in the choice of L2 features examined, and languages studied, as well as in the elicitation tasks and analyses used. Piske, MacKay and Flege (2001) suggest at least two other reasons. The first reason, they argue, is that measuring amount of experience as the number of years learners have spent in a target country is problematic because learners could live in that country for many years and interact only with speakers of their L1. The second reason offered by Piske et al. is that many studies often do not include learners with a sufficiently broad difference in amount of experience. For example, Flege (1988) compared Spanish speakers learning English with either 1.1 or 5.1 years of English experience and did not find a difference between the two groups. In the current study, we attempted to address these two criticisms by examining only learners who use the L2 often (i.e., as college students studying in their L2) and who differ from each other extensively in their amount of L2 experience (less than 1 year, 3 years and 10 years).

Learners’ age

Another possible reason for conflicting findings regarding L2 experience effects is that learners’ age may be a confounding factor even in cases where adult L2 learners are studied. Because age of acquisition effects are typically not controlled for in studies of adult (i.e., post-puberty) L2 learning, these effects may mask or even overshadow L2 experience effects, making it hard for researchers to determine the precise contribution of L2 experience to phonological learning. While no research has to date directly shown that age of acquisition effects occur across the lifespan for specific features of L2 perception and production, there is sufficient evidence to suggest that adult learners who acquire their L2 earlier in life may reach a higher level of L2 accuracy than those adult learners who acquire it later. For example, Birdsong and Molis (2001) showed a gradual decline for native Spanish speakers’ English grammaticality judgments as these speakers’ age of acquisition of English increased even after puberty, suggesting that age of acquisition effects are not tied to a “critical period,” reflecting strictly maturational effects, but may occur gradually throughout the lifespan. Similar findings have also been shown in examinations of age of acquisition effects for global ratings of L2 ability in datasets as large as the US 2000 census (Stevens, 1999; Chiswick and Miller, 2008). In addition, age of acquisition effects for adult L2 learners have recently been reported by Flege, Birdsong, Bialystok, Mack, Sung and Tsukada (2006) and by Trofimovich and Baker (2006). Both sets of researchers found a confound between age of acquisition and amount of experience for some features of L2 production, concluding that age of acquisition even for adults was in some cases a better predictor of L2 ability than amount of L2 experience.

Age and experience effects on specific L2 phonetic features

In light of the findings above (e.g., Flege et al., 2006; Trofimovich and Baker, 2006), in the current study we examined the influence of age and experience on adult L2 phonological learning. It appears that the influence of these factors might differ depending on the specific aspect of L2 phonology examined (Bohn and Flege, 1990; Flege et al., 1997; Baker and Trofimovich, 2005). For instance, Bohn and Flege (1992) showed that more experienced German learners of English were able to produce English /æ/ (a vowel not found in German) more accurately than inexperienced learners but that the two groups did not differ in their ability to produce English vowels /i/ and /ɪ/ (both of which are similar to German vowels). Trofimovich and Baker (2006, 2007) recently hypothesized that cross-language comparisons could explain why the learning of some English suprasegmentals (i.e., stress timing) by Korean learners is susceptible to effects of L2 experience while the learning of others (speech rate, pause frequency) is not. They concluded that features that differed significantly across the L1 (Korean) and L2 (English) in terms of their acoustic–phonetic realization were easier to acquire in the L2 than features that did not differ significantly across the two languages. Flege, MacKay and Meador (1999) found similar results when examining age of acquisition effects on the perception and production of English vowels by native Italian speakers. They found that “new” vowels (i.e., L2 vowels that do not have similar counterparts in the L1) were less affected by age of acquisition than were “similar” vowels (i.e., L2 vowels with phonetically similar counterparts in the L1).

In the current study, we hypothesized that it is also possible that amount of experience and age of acquisition would influence the learning of different phonetic cues making up the same target. In other words, some cues to a phonetic target may be more learnable than others given the same amount of L2 experience or age of acquisition. To examine this issue in detail, we investigated how two cues to word-final stop consonant voicing in English (vowel duration, closure duration) are acquired by Korean learners with different amounts of L2 experience and different ages of acquisition.

Final consonant voicing in English and Korean

L2 learners of English often have difficulty producing and/or perceiving the contrast between voiced (e.g., /d/ as in bad) and voiceless (e.g., /t/ as in bat) final stops (Flege and Port, 1981; Mack, 1982; Elsendoorn, 1984; Flege, Munro and Skelton, 1992; Flege, 1993; Yavas, 1997). In addition, children acquiring English as an L1 also have difficulties perceiving and producing this contrast accurately (Tsukada, Birdsong, Mack, Sung, Bialystok and Flege, 2004; Nittrouer, 2004; Lowenstein and Nittrouer, 2008). Word-final consonant voicing thus appears to be ideal for studying effects of experience and age of acquisition. There are many factors that contribute to making word-final consonant voicing difficult to acquire. Some of these factors include: (1) the existence of several phonetic cues to signaling voicing in English; (2) the frequency with which word-final voicing contrasts occur in spontaneous speech; and (3) for Korean learners of English, the differences in word-final consonant voicing between English and Korean. These factors are discussed in turn.

One reason for the pervasive difficulty of word-final consonant voicing is that learners must perceive and produce several different cues to the voiced–voiceless distinction, including closure duration and preceding vowel duration (House and Fairbanks, 1953; Lisker, 1957; Kluender, Diehl and Wright, 1988). Closure (stop gap) duration refers to the time interval between the last formant transition for the preceding vowel and the onset of the burst for the stop (Lisker, 1957). Essentially, closure duration is a measure of how long the airflow is obstructed before a stop is released if, of course, there is a release burst (Lisker, 1972). In English, closure duration is often longer for voiceless than for voiced stops. Vowel duration, in turn, refers to the total length of the vowel preceding a stop. In English, vowels are often longer in a voiced than in a voiceless context. Other cues to word-final stop consonant voicing include the intensity and duration of the release burst (Park and Kang, 2006) and F1 offset frequency (Crowther and Mann, 1992). While these other cues are important for identifying both the place of articulation and the degree of voicing, cross-language differences between the use of closure and vowel duration in English and Korean, as well as the saliency and frequency of these cues in English, provide an interesting test case for studying age and experience effects on adult L2 phonological learning. Therefore, our discussion is limited to these two phonetic cues.

A second reason for the difficulty of word-final consonant voicing is that English final stops are often deleted, especially in fast speech (Jurafsky, Bell, Gregory and Raymond, 2001), although, on average, speakers typically release final stops more often than not (women: 67%, men: 59%; Bell, Jurafsky, Fosler-Lussier, Girand, Gregory and Gildea, 2003). Since at least one cue to word-final consonant voicing (closure duration) is absent in cases when stops are not released, learning the voiced–voiceless distinction in English is a difficult task. This may explain why native Korean speakers are less accurate in perceiving stop voicing in word-final than in initial or medial positions, and are much less accurate in perceiving word-final stop voicing when the closure duration is absent (Park and Kang, 2006).

Finally, specifically for Korean learners of English, this distinction is difficult, perhaps because Korean does not have word-final stop consonant voicing distinctions (Chen, 1970; Eckman, 1977, 1981; Major and Faudree, 1996). Unlike English, Korean has a three-way stop distinction: lax or plain stops, aspirated stops, and fortis or tensed (produced with a tightened glottis) stops (Cho, Jun and Ladefoged, 2002). However, Korean stop consonants in word- or coda-final positions are always voiceless and are never released, meaning that they are produced with no closure duration and no release burst (Sohn, 1999; Lee and Ramsey, 2000). In addition, as discussed above, both vowel duration and closure duration in English differ for word-final voiced and voiceless stops. In contrast, the difference in vowel duration preceding word-final stops is much less pronounced in Korean and, as a result, very likely not as salient as in English (Chen, 1970).

Thus, in order to distinguish between English voiced and voiceless word-final stops, a Korean learner must be able to perceive and produce two cues that are in opposite relation to each other: a longer vowel duration and a shorter closure duration for voiced stops, and a shorter vowel duration and a longer closure duration for voiceless stops (House and Fairbanks, 1953; Lisker, 1957; Mack, 1982; Flege, 1993). Having to learn these two cues to consonant voicing, both of which involve durational differences, appears to be a difficult task requiring fairly extensive amounts of L2 experience to accomplish (e.g., Flege et al., 1992; Flege, 1993).

We reasoned that the two cues to word-final consonant voicing, vowel duration and closure duration, should present adult Korean learners with different degrees of difficulty. We hypothesized that a distinction in vowel duration would be easier for Korean learners to acquire than a distinction in closure duration. Learning vowel duration differences for voiced and voiceless stops should be relatively easy because vowel length is phonemic in Korean. Although clear-cut vowel duration differences are gradually disappearing in many dialects of modern Korean (Sohn, 1999; Lee and Ramsey, 2000), Korean speakers should nonetheless be relatively sensitive to vowel length, being able to use this sensitivity to perceive and produce vowel length differences in English.

In contrast, learning closure duration differences should be relatively complex for these learners. Closure duration can only be a reliable cue to word-final consonant voicing if speakers release the stop, thereby signaling to the listener how long their closure duration is. Since final stops in English are often not released, Korean learners of English will have been exposed to this cue less often than to vowel duration (Bell et al., 2003). In addition, because Korean has no released word-final stops, learners may have more difficulty acquiring the distinction in closure duration than the distinction in vowel duration. Thus, we reasoned that learning both to release English word-final stops and to use the appropriate closure duration in producing them should take Korean learners a considerably longer amount of time than learning vowel length differences for English word-final stops.

The current study

The first objective of the current study was to determine whether amount of L2 experience and age of acquisition affected how accurately adult Korean learners of English can produce word-final consonant voicing. The second objective of this study was to determine whether the influence of these two factors differs as a function of the type of phonetic cue for word-final consonant voicing (vowel duration vs. closure duration). Considering previous studies on other aspects of L2 learning (e.g., Stevens, 1999; Birdsong and Molis, 2001; Moyer, 2004), we predicted that age of acquisition effects may in fact be more strongly related to L2 production accuracy than L2 experience effects, even for adult L2 learners. To determine how Korean learners’ productions of voiced and voiceless stops in English are affected by amount of their L2 experience and their age of acquisition, we asked English listeners to judge the quality of these productions and also performed acoustic analyses of the same words. By comparing acoustic analyses to native speaker judgments, we sought both to determine whether these two measures yield similar findings and to obtain a more comprehensive picture of L2 learners’ ability to produce word-final stop voicing in English.

Method

Participants

The participants were thirty Korean learners of English and ten native English speakers. Because in this study we wished to examine adult L2 learners who differed in their age of acquisition and amount of L2 experience, none of the thirty Korean learners was younger than eighteen at the time of their arrival in the US (for studies that have documented age of acquisition effects for younger L2 learners, see Flege, MacKay and Meador, 1999; Baker and Trofimovich, 2005). The thirty learners were divided into three groups, with ten learners in each, depending on their amount of experience with English. The first group (beginning learners) included functionally monolingual Korean speakers with very little experience with English. These learners had arrived in the US at a mean age of twenty-nine (range: 24–33 years), had resided in the US for about three months (range: 1–5 months) and were on average twenty-nine years of age (range: 25–33). The second group (intermediate learners) included learners who arrived in the US at a mean age of twenty-four (range: 18–30 years), had resided in the US for about three years (range: 2.1–3.6) and were on average twenty-eight years of age (range: 22–33). The third group (advanced learners) comprised learners who had arrived in the US at a mean age of twenty-one (range: 18–25 years), resided in the US for about ten years (range: 7–15 years), and were on average thirty-two years of age (range: 28–36 years). Ten native English speakers (NS group), with an average age of twenty-six years, formed a control group.

All participants were students at a major English-speaking university and used English on a daily basis. The Korean participants’ daily language use was verified by asking them to estimate how often they used English in performing several tasks (talking to friends, using English at school, in church, etc.). The beginning learners overall used English less often daily than the advanced learners (F(2,27) = 4.98, p < .025), but the three groups did not differ in their daily use of English at home (22% on average), at school (80% on average) and in interactions with friends (36% on average). This suggested that the Korean participants used English in the US to a similar degree (at least 80% of the time at school) and in similar situations. Although most of the Korean participants had taken English classes in Korea (usually starting at the age of thirteen), they had not been exposed to English spoken natively and had limited knowledge of English before arriving in the US. A summary of the participants’ background information appears in Table 1.

Table 1. Means and Standard Deviations (in parentheses) for participant variables.

note: Chronological age, AOA, LOR are in years; self-ratings are based on a 10-point scale.

Although all participants were adults, the beginning learners were slightly older than both the intermediate and the advanced learners at the time of their arrival in the US (F(2,27) = 15.87, p < .001). The advanced learners were also slightly older than both the intermediate learners and the English speakers at the time of testing (F(3,36) = 6.57, p < .01). These analyses thus confirmed that the learners’ length of residence (LOR), a measure of L2 experience, and their age at the time of arrival (AOA) in the US, a measure of age of acquisition, were confounded in this study (Hakuta, Bialystok and Wiley, 2003; Stevens, 2004; Trofimovich and Baker, 2006). The implications of this are discussed for each analysis conducted.

All participants also rated their proficiency in English and Korean on a 10-point scale (1 = I don't know any English/Korean, 10 = I am a native speaker of English/Korean). The learners estimated their proficiency in Korean at the native-speaker level but differed in their English proficiency (F(3,36) = 119.91, p < .001), with the advanced learners rating themselves as more proficient than the intermediate learners, and the intermediate learners rating themselves as more proficient than the beginning learners. All three groups of learners rated themselves lower in their English proficiency than the group of native English speakers did.

Materials

The stimuli used in this study included two pairs of phonemically contrasting English vowels: /i/–/ɪ/ and /æ/–/ɛ/. These vowels were placed in two phonetic contexts in English monosyllabic CVC words, with one phonetic context containing a voiced and one containing a voiceless stop in final position: (/i/: beat, bead; /ɪ/: bit, bid; /æ/: bat, bad; /ɛ/: bet, bed). The stimuli were produced by a female native English speaker (age: 31) who had minimal experience with a foreign language. The female speaker recorded the eight target words along with ten other CVC words (used here as distractors) which contained the same four (/i/, /ɪ/, /æ/, /ɛ/) and two additional (/u/, /ʊ/) English vowels in the same and other phonetic contexts (e.g., h_t, h_d). (For data pertaining to these vowels and these other phonetic contexts, see Baker and Trofimovich, 2006. The words used here as distractors were book, good, booed, boot, he'd, hid, head, had, hood, who'd.) All recordings were digitized at 16 kHz, normalized for peak intensity, and ramped off during the first and last 15 ms to prevent audible clicks. Prior to the experiment, ten native speakers of English identified the stimuli with 98% accuracy in an open-choice identification task.
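The ramping and peak normalization described above can be illustrated with a minimal sketch. It is not the procedure actually used to prepare the stimuli: the linear ramp shape, the order of the two steps, the target peak of 1.0 and the function name `ramp_and_normalize` are all assumptions made for illustration; only the 16 kHz sampling rate and the 15 ms ramp length come from the text.

```python
def ramp_and_normalize(samples, sample_rate=16_000, ramp_ms=15.0, peak=1.0):
    """Fade the first and last `ramp_ms` of a signal and rescale its peak.

    Assumptions (not stated in the source): linear ramps, ramping before
    normalization, and a target peak amplitude of 1.0.
    """
    out = [float(s) for s in samples]
    # Number of samples covered by each ramp (240 at 16 kHz and 15 ms).
    n_ramp = min(int(round(sample_rate * ramp_ms / 1000.0)), len(out) // 2)
    for i in range(n_ramp):
        gain = i / n_ramp          # rises from 0 towards 1
        out[i] *= gain             # onset ramp
        out[-1 - i] *= gain        # offset ramp (mirror image)
    # Peak normalization: scale so the largest absolute sample equals `peak`.
    max_abs = max((abs(s) for s in out), default=0.0)
    if max_abs > 0.0:
        out = [s * peak / max_abs for s in out]
    return out
```

Because the first and last samples are multiplied by a gain of zero, the signal starts and ends at silence, which is what prevents audible clicks at the edges of an excised word.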

Procedure

The participants performed a picture naming task. They were tested individually in a quiet room, using a personal computer and speech presentation software (Smith, 1997). In this task, the participants were asked to name black-and-white line drawings, whose names (or descriptions) contained the four target vowels. For example, the word bed was used to name the drawing depicting a bed. Because some drawings did not unambiguously depict the objects to be named, the participants were familiarized with the intended words during the study phase. In the study phase, the participants twice viewed each drawing and repeated the intended word recorded by the female native English speaker. The intended word was presented first in a sentence, which provided the necessary context, and then in citation form. These sentences (e.g., He bid seven dollars, describing an image of several dollar bills) provided context to help participants remember the more abstract words such as bid and bet. The participants had no more difficulty remembering these words than the more concrete words with unambiguous labels (e.g., bat). In the test phase, which contained two blocks of twenty-four randomized trials, the participants attempted to spontaneously name the drawings by labeling them with the appropriate words. When no response was given, the expected word was played via headphones and the participant repeated it. This accounted for less than 1% of the total productions. Only spontaneous productions from the test phase were used in subsequent analyses.

The version of a picture naming task used in this study, one that included an auditory model for the learners to repeat, has been used to elicit production data in previous studies of L2 phonological learning (e.g., Tsukada et al., 2004). This task allows for eliciting fluent speech while avoiding reading (a potentially confounding factor) as part of the task. This task was deemed appropriate for this study because it familiarized the participants (in both the study phase and during the first block of trials) with the words to be used in describing the pictures and thus ensured that the participants produced identical and therefore maximally comparable speech samples. Because only spontaneous productions from the test phase were used in all subsequent analyses, the likelihood of the participants’ direct mimicry of the auditory models played was minimized. Although we could have elicited the target words in spontaneous speech or in a more naturalistic task, doing so would have increased the likelihood that speakers would produce unreleased stops, making it impossible for us to measure closure duration. Indeed, the more natural the task, the less likely speakers are to release word-final stops (Wolfram, 1969). Because we hoped to see how learners produced the two cues to word-final consonant voicing (closure duration, vowel duration), if in fact they could produce them, we chose a task where they would be most likely to do so.

The participants' responses were recorded using a Shure unidirectional head-mounted microphone (SM10A) and Sony DAT tape recorder (TCD-D8). The last spontaneously produced token of each stimulus word was selected from the recordings of each of the forty participants. As with the test stimuli, the selected words were excised from the recording, ramped off during the first and last 15 ms to eliminate audible clicks and normalized for peak intensity.

Perception task

The 320 recorded word tokens (8 words × 4 groups × 10 participants), along with the entire set of recorded distractor words, were randomized and re-recorded onto a high-quality audiotape, with each stimulus presented with a 4-second interval. This audiotape was subsequently played to ten listeners (aged: 18–27) for identification. All listeners were functionally monolingual speakers of English who grew up in monolingual English homes. During listening sessions, which took place in a language laboratory in small groups, the listeners were instructed to listen to each word token (heard over individual headsets) and to choose on an answer sheet one of four response alternatives. For the target words containing /i/ and /ɪ/, the response options were always the same: bid, bit, bead and beat (printed in one of four random orders). Similarly, for the target words containing /æ/ and /ɛ/, the response options were also the same: bed, bet, bad and bat (again, printed in one of four random orders). That is, for each target word (e.g., bed), listeners always had one correct response alternative (bed) along with three other choices: bet (correct vowel, wrong voicing), bad (wrong vowel, correct voicing), bat (wrong vowel, wrong voicing). Using these four response alternatives permitted us to see whether mispronunciations of target words were due to incorrect production of the vowel, the final consonant, or both. During the listening sessions, the listeners were not told which words were produced nor what vowels or final consonants were intended. The listeners were instructed to base their judgments on the words that were actually spoken, not the words that they thought the speaker may have intended.
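The logic of the four response alternatives can be sketched as a small lookup. This is only an illustration of how a listener's response might be sorted into the four categories described above; the dictionary `WORDS`, the function name `classify_response` and the category labels are hypothetical and do not come from the study's materials.

```python
# Each target word is described by its vowel and the voicing of its final stop.
WORDS = {
    "beat": ("i", "voiceless"), "bead": ("i", "voiced"),
    "bit": ("ɪ", "voiceless"), "bid": ("ɪ", "voiced"),
    "bat": ("æ", "voiceless"), "bad": ("æ", "voiced"),
    "bet": ("ɛ", "voiceless"), "bed": ("ɛ", "voiced"),
}


def classify_response(target, response):
    """Label a listener's response relative to the intended target word."""
    target_vowel, target_voicing = WORDS[target]
    response_vowel, response_voicing = WORDS[response]
    vowel_ok = target_vowel == response_vowel
    voicing_ok = target_voicing == response_voicing
    if vowel_ok and voicing_ok:
        return "correct"
    if vowel_ok:
        return "voicing error"      # correct vowel, wrong final consonant
    if voicing_ok:
        return "vowel error"        # wrong vowel, correct final consonant
    return "vowel and voicing error"
```

For the target bed, the responses bed, bet, bad and bat fall into the four categories in that order, mirroring the example given in the text.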

Overall accuracy

For all statistical tests, the alpha level for significance was set at .05. The reported effect sizes are partial eta squared (ηp²), calculated by dividing the effect sum of squares by the effect sum of squares plus the error sum of squares. A Bonferroni procedure was applied to adjust the alpha level for tests of simple main effects. All correlations are based on two-tailed distributions.
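The effect size definition above can be written out as a short sketch. The first function follows the verbal definition directly; the second relies on the standard algebraic identity that recovers the same quantity from an F ratio and its degrees of freedom, which is useful when sums of squares are not reported. Both function names are illustrative, not part of any statistics package.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect sum of squares divided by effect plus error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_ratio, df_effect, df_error):
    """Same quantity recovered from F, since F = (SS_eff/df_eff) / (SS_err/df_err)."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)
```

Applied to an F ratio of 28.39 on 3 and 36 degrees of freedom, the second function gives a value of about .70.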

Our first analysis focused on overall production accuracy. In this analysis, we compared the participants’ word production scores, defined as the number of listeners (out of 10) who identified each word as its intended target. We derived two production scores for each participant: one for the voiced context (a mean for bead, bid, bad, bed) and one for the voiceless context (a mean for beat, bit, bat, bet). These scores were submitted to a two-way repeated measures analysis of variance (ANOVA) with group (4) as a between-subjects factor and voicing (2) as a within-subjects factor. This analysis yielded only a significant main effect of group (F(3,36) = 28.39, p < .0001, η2p = .70), with no significant main effect of voicing (F(1,36) = .05, p = .82, η2p = .001) and no significant two-way interaction (F(3,36) = 2.11, p = .12, η2p = .15). Tukey Honestly Significant Difference or HSD post hoc tests showed that the scores for the NS group were significantly higher than those for the three learner groups (p < .0001). In addition, the scores for the advanced group were higher than the scores for the beginning group (p = .012), although neither group's scores differed from those for the intermediate group. Word production scores are plotted for each group in Figure 1.

Figure 1. Mean word production scores in the voiced and voiceless context for the three learner groups and the group of native English speakers. Brackets enclose 2 SEs (Standard Errors).

To determine if the effect of L2 experience (suggested by the ANOVA) was independent from the effect of age of acquisition (AOA), a first-order partial correlation was computed between the Korean learners’ production scores (n = 30), pooled over voiced and voiceless contexts, and amount of L2 experience, defined here as length of residence, with AOA partialled out. This analysis yielded a non-significant correlation between LOR and production scores after AOA was partialled out (r(27) = .07, n.s.). However, the correlation between production scores and AOA remained significant after LOR was partialled out (r(27) = −.38, p = .043) (the older the learner upon arrival, the lower the production score). This result indicates that the learners’ ability to produce the target words accurately did not seem to depend on their L2 experience (indexed as LOR) but instead was related to the age at which they were exposed to English in the US.
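The first-order partial correlation used here follows the standard formula r(xy.z) = (r(xy) − r(xz) r(yz)) / √((1 − r(xz)²)(1 − r(yz)²)), tested on n − 3 degrees of freedom (27 for n = 30). The sketch below is illustrative only: the zero-order correlations passed to `partial_corr` are hypothetical placeholders, because the article does not report them, and the function names are ours.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def t_from_partial_r(r, n, n_controls=1):
    """t statistic and degrees of freedom for a partial correlation."""
    df = n - 2 - n_controls
    return r * math.sqrt(df / (1 - r ** 2)), df


# Hypothetical zero-order correlations (not taken from the article):
# x = production score, y = LOR, z = AOA.
r_partial = partial_corr(r_xy=0.30, r_xz=-0.60, r_yz=-0.45)

# t statistic for the reported AOA partial correlation, r(27) = -.38, n = 30.
t_value, df = t_from_partial_r(-0.38, n=30)
print(df, round(t_value, 2))  # 27 -2.13
```

With 27 degrees of freedom, a t value of this size lies just beyond the two-tailed .05 critical value of about 2.05, which is consistent with the reported p = .043.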

Voicing confusions

Our next analysis centered on the participants’ production of stop consonant voicing, the focus of this study. We calculated the frequency with which the 10 listeners misidentified each intended target because of voicing confusions. That is, for each participant, we computed the number of listeners (out of 10) who misidentified each word token as having the correct vowel but the wrong voicing. As before, we calculated two scores for each participant: one for the voiced context (bead, bid, bad, bed misidentified as beat, bit, bat, bet, respectively) and one for the voiceless context (beat, bit, bat, bet misidentified as bead, bid, bad, bed, respectively). Overall, voicing confusions accounted for up to 20% of all errors in listener identification. The remainder of errors was due to vowel substitution (bad heard as bed) or due to both vowel substitution and voicing confusion (bad heard as bet). Voicing confusion scores for each group appear in Figure 2.

Figure 2. Mean voicing confusion scores in the voiced and voiceless context for the three learner groups and the group of native English speakers. Brackets enclose 2 SEs.

Voicing confusion scores were submitted to a similar two-way group (4) × voicing (2) repeated measures ANOVA which revealed a significant main effect of group (F(3,36) = 7.88, p < .001, η2p = .40), and a significant two-way interaction (F(3,36) = 3.84, p = .017, η2p = .24), but no significant main effect of voicing (F(1,36) = .14, p = .71, η2p = .004). Tests of simple main effects, used to explore the significant interaction, revealed that in the voiced context the NS group had significantly fewer voicing confusions than the three learner groups (p < .005), which did not differ from one another. In the voiceless context, however, both the NS and the advanced groups had significantly fewer voicing confusions than the other two learner groups (p < .008). Thus, it seemed that the learners with ten years of experience might have learned to produce English voiceless (but not voiced) stops accurately. These results, however, are only tentative without examining the confounding factor of AOA.

Again, to determine if this effect of L2 experience was independent from the effect of AOA, first-order partial correlations were computed between the learners’ voicing confusion scores (separately for voiced and voiceless contexts, n = 30) and LOR, with AOA partialled out. After partialling out AOA, the correlations between LOR and voicing confusion scores were not significant in either context (rs(27) = −.19 to −.13, n.s.). After partialling out LOR, the correlation between AOA and voicing confusion scores was non-significant in the voiceless context (r(27) = .10, n.s.); however, it remained significant in the voiced context (r(27) = .36, p = .05) (the older the learner upon arrival, the more voicing confusions). This suggests that the learners’ ability to produce English stops accurately (at least in the voiced context) was related more to AOA than to LOR.

To sum up, the results of the listener perception task showed some differences among the learner groups in the accuracy of their productions of word-final stops. However, it was age of acquisition (indexed here by AOA), not amount of experience (indexed here by LOR), that appeared to explain these differences among the learner groups.

Acoustic measurements

Although revealing, the findings reported thus far do not clarify which of the two cues to word-final stop consonant voicing (vowel duration, closure duration) the learners were able to exploit in their production of English stops. Therefore, we carried out three acoustic analyses to determine how accurately the learners mastered these two cues. We counted the number of released voiced and voiceless word-final stops and performed vowel duration and closure duration measurements for each of the 8 target words spoken by the 30 Korean learners and 10 English speakers (320 tokens; see Footnote 1). Acoustic measurements were taken by hand directly from the waveform display of ESPS/+WAVES speech software. Vowel duration was measured between two cursors placed to demarcate the beginning of periodicity for the vowel following the release of the word-initial /b/ and the end of periodicity prior to the word-final stop. Closure duration was measured (when present) between two cursors placed to demarcate the end of periodicity for the vowel and the release burst for the stop.
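The two duration measurements reduce to differences between three hand-placed cursor times. A minimal sketch, assuming each token is stored as three time stamps in seconds; the `TokenLandmarks` class, its field names, and the example values are hypothetical and serve only to make the measurement definitions concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenLandmarks:
    vowel_onset: float       # start of periodicity after the initial /b/ release (s)
    vowel_offset: float      # end of periodicity before the final stop (s)
    burst: Optional[float]   # release burst of the final stop (s); None if unreleased


def vowel_duration_ms(tok):
    """Vowel duration: onset of periodicity to end of periodicity."""
    return (tok.vowel_offset - tok.vowel_onset) * 1000


def closure_duration_ms(tok):
    """Closure duration: end of periodicity to release burst, if a burst exists."""
    if tok.burst is None:
        return None  # unreleased stops have no measurable closure duration
    return (tok.burst - tok.vowel_offset) * 1000


# Hypothetical cursor times for one released and one unreleased token.
released = TokenLandmarks(vowel_onset=0.100, vowel_offset=0.300, burst=0.380)
unreleased = TokenLandmarks(vowel_onset=0.100, vowel_offset=0.250, burst=None)
print(vowel_duration_ms(released), closure_duration_ms(released))
print(closure_duration_ms(unreleased))
```

Returning `None` for unreleased stops mirrors the restriction, described below, that closure duration was measured only when a release burst was present.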

Vowel duration

Our first analysis here examined whether the learners produced a distinction in vowel length before voiced versus voiceless word-final stops. For each participant, we computed two measurements: one for the voiced context (a mean for bead, bid, bad, bed) and one for the voiceless context (a mean for beat, bit, bat, bet). Vowel durations in both contexts are shown for each group in Table 2. These measurements were submitted to a two-way group (4) × voicing (2) repeated measures ANOVA which yielded only a significant main effect of voicing (F(1,36) = 229.89, p < .001, η2p = .87), with no significant main effect of group (F(3,36) = 1.16, p = .34, η2p = .09) and no significant two-way interaction (F(3,36) = .64, p = .59, η2p = .05). This pattern of findings indicated that vowels were longer in the voiced than in the voiceless context and that, within each context, vowel durations did not differ among the four groups (see Footnote 2). Apparently, even the learners with the least amount of L2 experience (less than one year) could produce vowel durations before voiced and voiceless stops in a native-like manner, although again these results are speculative without an examination of the influence of AOA.

Table 2. Vowel and closure durations (ms) in the voiced and voiceless contexts (Standard Errors appear in parentheses).

As in the previous analyses, we computed first-order partial correlations between vowel duration measurements (in voiced and voiceless contexts) and the learners’ AOA and LOR (n = 30). After partialling out AOA or LOR from these relationships, we found no significant correlations (rs(27) = −.23 to .04, n.s.). This suggested that production of vowel duration was unrelated to LOR or AOA. Thus, all learners, regardless of amount of their L2 experience or age of acquisition, were able to make a vowel duration distinction before voiced and voiceless final stops.

Released stops

Our next analysis focused on the number of released word-final stops (stops with a release burst) produced by each participant group. Being able to use closure duration as a cue to consonant voicing implies that a speaker produces a released stop. Therefore, we calculated for each participant the number of released stops in the voiced and voiceless context (out of four possible per context). A stop was considered to be released if a burst was clearly visible on a waveform display and also clearly audible on the recording. The frequency rates of released word-final stops appear in Table 3.

Table 3. Number and percent of released word-final stops in the voiced and voiceless contexts (Standard Errors appear in parentheses).

These frequencies were submitted to a two-way group (4) × voicing (2) repeated measures ANOVA which yielded a significant main effect of voicing (F(1,36) = 6.94, p = .012, η2p = .16), and a significant main effect of group (F(3,36) = 4.05, p = .014, η2p = .25) but no significant two-way interaction (F(3,36) = .87, p = .47, η2p = .07). The significant main effect of voicing showed that more released word-final stops were produced in the voiceless than in the voiced context. Notably, the native speakers released word-final stops in both contexts 100% of the time, a finding that we revisit below. Tukey HSD post hoc tests, used to explore the significant main effect of group, showed that only the learners with the least amount of experience (less than one year) produced fewer released stops than the native speakers did (p < .007). Setting aside for the moment the influence of AOA, it therefore appears that with about three years of L2 experience the learners were able to produce released word-final stops at a native speaker rate.

As before, we computed first-order partial correlations between frequencies of released stops (separately in each context) and the learners’ AOA and LOR (n = 30). After partialling out AOA or LOR from each relationship, we found no significant associations (rs(27) = −.03 to .21, n.s.). This suggested that production of released word-final stops was unrelated to LOR or AOA.

Closure duration

We next examined whether participants produced a distinction in closure duration for voiced and voiceless stops. Closure duration measurements were obtained only for the word-final stops that were released (since only released stops would have a measurable closure duration): 68% (54/80) of all word tokens for the beginning group, 85% (68/80) for the intermediate, 88% (70/80) for the advanced, and 100% (80/80) for the NS group. As before, we computed two measurements for each participant: one for the voiced context (a mean for bead, bid, bad, bed) and one for the voiceless context (a mean for beat, bit, bat, bet). These closure duration values (shown in Table 2) were submitted to a two-way group (4) × voicing (2) repeated measures ANOVA which revealed a significant main effect of voicing (F(1,36) = 146.38, p < .0001, η2p = .80), and a significant main effect of group (F(3,36) = 13.99, p < .0001, η2p = .54) but no significant two-way interaction (F(3,36) = .20, p = .90, η2p = .02). The significant main effect of voicing indicated that closure durations were longer in the voiceless than in the voiced context. Tukey HSD post hoc tests exploring the significant main effect of group showed that, regardless of context, the three learner groups produced significantly longer closure durations (i.e., more typical of voiceless stops) than the NS group did (p < .035). The advanced learners’ closure durations were shorter than those of the beginning learners (p = .007), although neither group differed significantly from the intermediate group. Apparently, learning to produce closure duration in a native-like manner was a difficult task for all learners, even after approximately ten years of L2 experience.

As in the previous analyses, we computed first-order partial correlations between closure duration measurements (in voiced and voiceless contexts) as well as AOA and LOR (n = 30). After partialling out AOA, we found no significant correlations between closure duration and LOR (rs(27) = −.23 to .12, n.s.). However, when we partialled out LOR, the correlations between closure duration and AOA remained significant in the voiced context (r(27) = .49, p = .007), and in the voiceless context (r(27) = .48, p = .007) (the older the learner upon arrival, the longer, or less native-like, the closure durations). This result suggested that, as with listener judgments analyses, production of closure durations was associated with AOA, not LOR.

Acoustic values as predictors of native English listener judgments

The acoustic analyses presented thus far established that the L2 learners mastered at least some aspects of word-final stops in English. Overall, the two main cues to consonant voicing were easier to produce for voiceless than voiced stops. Differences in vowel duration were also easier to acquire than differences in closure duration. What these analyses did not establish, however, is whether and how the examined cues to consonant voicing (vowel duration, closure duration, frequency of released stops) contribute to native English listeners’ judgments of how English word-final stops are produced. Although most English speakers seem to rely on vowel duration as a cue to word-final consonant voicing (Raphael, 1972; Elsendoorn, 1984), it is impossible to know whether our listeners based their accuracy judgments in the perception task on differences in vowel or closure duration, or whether they used both these cues or perhaps neither. Therefore, to determine which cues to consonant voicing affected listener judgments, we submitted all acoustic measurements and listener-based word production scores to correlation and regression analyses.

We first computed zero-order correlations among listener-based word production scores and the three acoustic measurements (vowel duration, closure duration, frequency of released stops) for the three learner groups (n = 30) separately in the voiced and voiceless context. This analysis showed that some acoustic measurements were significantly correlated with word production scores and with one another, especially in the voiceless context (Table 4).

Table 4. Summary of correlation analyses among acoustic measurements and listener-based word production scores.

note: *p < .05, **p < .01, ***p < .001.

Because there existed some strong associations among the acoustic measurements, which resulted in multi-collinearity, and because our dataset was not sufficiently large (n = 30), a multiple regression analysis with listener-based word production scores as the criterion measure and all three acoustic measurements as predictors was not possible. Instead, the listener-based word production scores were regressed on each of the three acoustic measurements individually, separately for the voiced and voiceless context. The goal of these analyses was to estimate the degree to which each of the three acoustic measurements examined here predicted listener-based word production scores. These six separate linear regression analyses (three per context) allowed for determining the amount of variance that each of the acoustic measurements (vowel duration, closure duration, frequency of released stops) shared with listener-based word production scores.

A summary of three statistically significant regression models (one for the voiced context, and two for the voiceless context) appears in Table 5. Each model represents a predictive relationship between listener-based word production scores (criterion measure) and one of the three acoustic measurements (predictor variable), with beta values (B, β) used as parameters and t values testing the significance of these parameters. Of particular interest here is the metric of the goodness-of-fit of each model (Adjusted R²) which represents the proportion of the variation in listener-based word production scores that can be explained by each predictor variable. In the voiced context, closure duration explained about 16% of variance in word production scores (the shorter the closure duration, the higher the score). In the voiceless context, both vowel duration and frequency of released stops significantly predicted word production scores: these variables explained about 21% (the shorter the vowel duration, the higher the score) and 47% (the more word-final stops released, the higher the score) of the variance in word production scores.
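For readers who wish to see how a single-predictor model and its adjusted R² are obtained, the following is a minimal sketch of ordinary least squares with one predictor. It uses the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1. The data are invented for illustration and are not the study's measurements; the function name is ours.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares; return fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # unstandardized coefficient (B)
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)            # proportion of variance explained
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor: n - p - 1 = n - 2
    return slope, intercept, r2, adj_r2


# Invented toy data: a predictor (e.g., a duration measure) and a criterion score.
predictor = [1, 2, 3, 4, 5]
criterion = [2, 1, 4, 3, 5]
slope, intercept, r2, adj_r2 = simple_regression(predictor, criterion)
print(round(slope, 2), round(r2, 2), round(adj_r2, 2))  # 0.8 0.64 0.52
```

Because the adjustment penalizes small samples, adjusted R² is always at or below R²; with n = 30 and one predictor the difference is small.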

Table 5. Summary of regression analyses for acoustic measurements as predictors of listener-based production scores.

note: *p < .05, **p < .01, ***p < .001.

Discussion

The objective of the current study was to determine whether age of acquisition and amount of L2 experience affected the ability of adult Korean learners of English to produce English final consonant voicing, and whether the influence of these two factors depended on the phonetic cue (vowel duration or closure duration). Our results showed that production of word-final stops in English indeed depended on age of acquisition and amount of L2 experience and posed a considerable problem for the Korean learners.

Effects of L2 experience and age

Results of listener-based production scores indicated that even after about ten years of L2 experience, the learners’ overall word production accuracy and their production accuracy with respect to word-final consonant voicing (at least in the voiced context) were significantly lower than the native speakers’ production accuracy. In addition, correlation and regression analyses showed that both cues to word-final consonant voicing contributed to predicting native English listeners’ judgments of the learners’ word production accuracy. In the voiced context, closure duration appeared to be a significant predictor; in the voiceless context, both vowel duration and frequency of released stops emerged as significant predictors. This result suggested the importance of both cues to learning word-final stop consonant voicing in English. Moreover, acoustic analyses of the two cues to stop consonant voicing indicated that these cues presented different degrees of difficulty for the learners. Vowel duration was much easier for the learners to acquire than closure duration. All learners, even those with less than one year of L2 experience, produced vowel duration differences before voiced and voiceless stops in a native-like manner. However, even after ten years of L2 experience, the learners were not native-like in their production of word-final stop closure duration in English.

Most importantly, however, the age of learners’ exposure to the L2 in the US was a stronger predictor of accuracy than was amount of their L2 experience. In particular, this study yielded evidence that some aspects of the voicing distinction in English appeared to be related more to the age at which the learners were exposed to English in the US than to the amount of their L2 experience. In other words, the learners’ age at the time of L2 exposure (range: 18–33 years), not the amount of their L2 experience (range: 1 month–15 years), was associated with listener-based word production scores and acoustic measurements of closure durations. The adult learners in this study who arrived in the US in their early twenties tended to produce CVC words with word-final stops more accurately and tended to produce closure durations which were more typical of native speaker values than the learners who arrived in the US in their late twenties and early thirties. These effects were manifested in both listener judgments and acoustic analyses, demonstrating the pervasive nature of this finding.

Although age effects have been documented frequently in studies of L2 phonological learning by child learners (e.g., Flege, MacKay and Meador, 1999), such effects are relatively uncommon in investigations of adult L2 learning (Birdsong and Molis, 2001; Trofimovich and Baker, 2006). To the best of our knowledge, this study is the first to document post-critical period age of acquisition effects for a specific L2 phonetic feature. This finding suggests an age-related decline in language functioning even after puberty, not just within the putative critical or sensitive period for learning an L2 (Hakuta et al., 2003; for review, see Bialystok and Hakuta, 1999). Such an age-related decline could be due to a number of cognitive and social factors that correlate with an individual's age: memory capacity and processing speed (Rabinowitz, Ornstein, Folds-Bennett and Schneider, 1994), patterns of language socialization and use (Jia and Aaronson, 2003) and/or amount of formal schooling (Flege, MacKay and Meador, 1999). These and other factors (e.g., Moyer, 2004), and their influence on adult L2 phonological learning, need to be examined in future research.

What this finding does suggest is that differences in age of acquisition may also account for conflicting findings in studies examining L2 experience effects. That is, researchers may need to control for age of acquisition effects, even for adult L2 learners, in order to be able to determine whether and to what extent learners improve in their L2 perception and production as a function of L2 experience. In fact, in this study if we had not examined age of acquisition as a factor separately from L2 experience, we would have been compelled to conclude that amount of experience did in fact influence the learning of English word-final consonant voicing. Instead, we found that age of acquisition, not amount of L2 experience, predicted learners’ accuracy for this L2 feature. Even ten years of residence in the US was not enough experience for the adult learners to overcome age of acquisition effects. Baker and Trofimovich (2005) recently showed that, with more L2 experience, adult Korean learners of English improved in their production of only some L2 vowels (i.e., only those English vowels that did not have similar counterparts in Korean). By contrast, child L2 learners were able to improve in their production of all vowels examined in that study. Combined with the results of this study, these findings suggest that amount of L2 experience may play a much less significant role for adult than for child L2 learners. Certainly, more research is needed to disentangle the effects of L2 experience and age of acquisition in both child and adult L2 phonological learning.

Effect of phonetic cue

In the introduction to this study, we argued that the effects of age of acquisition and amount of L2 experience may manifest themselves differently not only for specific aspects of the L2, such as distinct segmental or suprasegmental targets (Bohn and Flege, 1992; Trofimovich and Baker, 2006, 2007), but also for different phonetic cues making up the same target. If this were indeed the case, we reasoned, then it would be possible to explain (and perhaps also to predict) with greater precision why certain aspects of the L2 are more learnable than others, especially by younger learners or by learners with more L2 experience. The findings of this study, which examined the acquisition of one L2 target (stop consonant voicing) cued by two phonetic distinctions (vowel duration, closure duration), yielded some support for this hypothesis. Overall, the findings indicated that some characteristics of a single phonetic target may be more learnable than others.

Whether or not differences in L2 learning demonstrated in this study are traceable to age of acquisition or L2 experience, there are two findings relevant to this issue. The first finding is that word-final stop consonant voicing in English appeared to be more learnable in the voiceless than in the voiced context. Our analyses of listener-based voicing confusions indicated that the Korean learners with about ten years of L2 experience regardless of their age of acquisition were native-like in the voiceless context but performed significantly more poorly than the native English speakers in the voiced context. Our acoustic measurements also showed that all learners produced a greater number of released word-final stops in the voiceless than in the voiced context. Most likely, the voiceless context was inherently easier for the learners than the voiced context because Korean word-final, pre-pausal stops are always voiceless (Sohn, 1999; Lee and Ramsey, 2000), although more research with learners from another language background could clarify this issue. It is possible, for example, that more voiceless than voiced stops would be released because longer closure durations usually relate to more intense release bursts (see Sundara, 2005). Further research examining release bursts in L2 learners’ speech will be helpful in understanding the relative importance of this feature.

Nonetheless, this initial cross-language similarity between English and Korean did not seem to trigger a complete transfer of accurate voicing production from Korean to English. Indeed, only the learners who arrived in the US at around the age of twenty and had resided there for about ten years were able to produce word-final voiceless stops with native-like accuracy. Although cross-language similarity may help learners initially, for example, by making them aware of how their L1 relates to their L2, the learning of a particular L2 target to native-like mastery (notably, in a single phonetic context) might require not just extensive L2 experience (about ten years) but perhaps explicit training accompanied by intensive, daily L2 practice.

The second finding is that vowel duration, as a cue to stop consonant voicing, was easier for the learners to acquire than closure duration. In fact, even the learners with the least amount of L2 experience (less than one year) and the highest age of acquisition (over age thirty) in this study produced a distinction in vowel length in a native-like manner. There are at least three factors that would make vowel duration easier to learn than closure duration. One factor is related to the status of vowel length in Korean. Although clear-cut vowel duration differences are gradually disappearing in many dialects of modern Korean, such differences can still be observed for a few vowel pairs (Sohn, 1999; Lee and Ramsey, 2000). Vowel duration thus appears to be readily available for a Korean learner to signal voicing distinctions in English, a cue that can be learned with a minimal amount of L2 experience and by older learners. In this regard, Korean learners of English are similar to native English speakers learning vowel length distinctions in Swedish (McAllister, Flege and Piske, 2002). Although vowel length does not signal phonemic distinctions in English, it can sometimes be used by listeners in vowel identification, for example, to determine if /æ/ or /ɛ/ is produced (Whalen, 1989), and in making contrasts such as those discussed in this study. Thus, the Korean learners in this study, just as the English speakers learning Swedish in the McAllister et al. study, could use their sensitivity to vowel durations in their L1 to help them learn vowel length distinctions in their L2. In other words, the Korean speakers in this study and the English speakers in the McAllister et al. study most likely were able to rely on their knowledge of how vowel duration is used at the phonetic level in their L1s to enable them to learn vowel duration differences in their L2s.

Another factor which likely contributed to making vowel duration easier to learn than closure duration is that vowels, as continuants, may be more perceptually salient than consonants, leading learners to acquire a vowel-based cue before a consonant-based cue (see Collins, Trofimovich, White, Cardoso and Horst, in press, and DeKeyser, Salaberry, Robinson and Harrington, 2002, for a discussion of the role of saliency in L2 learning). For example, Bohn (1995) found that speakers of German acquired durational differences more easily than spectral differences (differences in formant frequencies) when learning English vowels. Bohn speculated that this finding was due to differences in cue saliency, since distinctions in vowel duration were easier to perceive and therefore easier to acquire than distinctions in spectral qualities of vowels. Saliency (which we here equate broadly with perceptibility) thus appears to be an important factor determining how susceptible an L2 feature is to effects of experience (and perhaps age of acquisition effects as well) and, consequently, how quickly this feature is learned (see Goldschneider and DeKeyser, 2001). This conclusion, however, must remain speculative until a usable metric of phonetic saliency is developed and validated in future research.

Yet another factor that could render vowel duration a more learnable cue than closure duration relates to the frequency of occurrence of released word-final stops in the speech of native English speakers. In English, word-final stops are produced with a release burst in a variable manner, determined by a variety of phonetic factors, such as the vowel preceding the stop or its place of articulation (e.g., Lisker, 1999), and sociolinguistic variables, including the speaker's gender, speaking style and dialect (e.g., Bond and Moore, 1994; Byrd, 1994). Although word-final stops in English are frequently released (e.g., Bent and Bradlow, 2003; Tsukada et al., 2004), as were, in fact, all stops produced by the native speakers in this study, the learners might not have been consistently exposed to high frequencies of released word-final stops throughout their experience with English (see Tsukada et al., 2004). Because of this variable nature of word-final stops, the learners may not have received highly frequent, clear and reliable evidence of how closure duration is used as a cue to stop consonant voicing (see Holt and Lotto, 2006, for a discussion of cue weighting in speech categorization). As a result, closure duration proved difficult for learners to acquire.

Concluding remarks

The findings of the current study, which focused on the role of experience but yielded evidence of age of acquisition effects in L2 phonological learning, prompt at least four broad conclusions. The first of these conclusions is that learning word-final stop consonant voicing in English is a complex task for L2 learners from many language backgrounds (e.g., Flege, Munro and Skelton, 1992; Yavas, 1997). This task may require extensive amounts of L2 experience because word-final stop consonant voicing is cued by several phonetic distinctions. The results of this study extend previous investigations of word-final stop consonant learning by showing that a distinction in closure duration might make this aspect of English particularly hard to acquire. The second broad conclusion prompted by the findings of this study is that native English listeners seem to rely on several cues to word-final consonant voicing (vowel duration, closure duration, release burst of a word-final stop) in their perceptual judgments of word production accuracy. This finding could have important implications for L2 teaching, as several phonetic cues may need to be targeted in teaching, most likely through explicit instruction, for learners to be able to produce voicing distinctions accurately and for English listeners to avoid misperception of word-final voicing. A third broad conclusion that emerges from this study is that L2 experience and age of acquisition effects may manifest themselves differently not only for individual segmental or suprasegmental targets but also for different phonetic cues making up the same target. A segmental target in the L2 (word-final stop voicing) can be acquired in one context (voiceless) but not in another (voiced), and one cue to such a target (vowel duration) can be acquired more easily than another (closure duration). The final conclusion is that post-puberty age of acquisition might be a stronger predictor of L2 segmental accuracy than amount of experience, even for adult L2 learners. Overall, these findings show the complex nature of L2 phonological learning and suggest possible reasons why some L2 targets, whether segmental or suprasegmental, are more learnable than others.

Footnotes

* This research was partially supported by research grants from the University of Illinois at Urbana-Champaign and Brigham Young University. An earlier version of this paper was presented at the 2002 meeting of the Acoustical Society of America, Cancun, Mexico. The author gratefully acknowledges Pavel Trofimovich and three anonymous reviewers for their helpful suggestions on earlier drafts of this paper.

1 An anonymous reviewer suggested that a ratio of vowel duration to closure duration might be a more accurate measure of voicing. We computed such ratios for all participants and compared them across the participant groups. The findings of this analysis were identical to the results of our separate analyses of vowel duration and closure duration. Because our intention here was to examine these two cues individually, we report separate analyses for vowel duration and closure duration.
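
The ratio measure described in this footnote can be illustrated with a minimal sketch. It assumes only that the ratio is vowel duration divided by closure duration for each token, averaged within each voicing context; the function name, data layout and all duration values below are hypothetical and are not taken from this study.

```python
from statistics import mean


def vowel_closure_ratio(vowel_ms: float, closure_ms: float) -> float:
    """Return vowel duration divided by closure duration (both in ms)."""
    if closure_ms <= 0:
        raise ValueError("closure duration must be positive")
    return vowel_ms / closure_ms


# Hypothetical (vowel_ms, closure_ms) pairs for one speaker.
# These values are illustrative only, NOT measurements from the study.
tokens = {
    "voiced": [(240.0, 60.0), (200.0, 80.0)],
    "voiceless": [(150.0, 100.0), (120.0, 120.0)],
}

# Mean ratio per voicing context; a larger ratio in the voiced context
# reflects a longer vowel combined with a shorter closure.
mean_ratio = {
    context: mean(vowel_closure_ratio(v, c) for v, c in pairs)
    for context, pairs in tokens.items()
}

for context, ratio in mean_ratio.items():
    print(f"{context}: {ratio:.2f}")
```

A single ratio of this kind folds both cues into one number, which is why the main analyses keep vowel duration and closure duration separate when the goal is to examine each cue individually.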

2 Because in this analysis vowel duration was compared between the voiced and the voiceless context, both tense and lax vowels were included in each context. To ensure that the tense/lax status of vowels did not influence any of our findings, we compared vowel durations for tense and lax vowels. Although for all groups of participants tense vowels were longer in duration than lax vowels (Fs(1,36) > 9.84, ps < .003), the tense/lax status of vowels did not interact with the group variable (ps > .09).

References

Baker, W. & Trofimovich, P. (2005). Interaction of native- and second-language vowel system(s) in early and late bilinguals. Language and Speech, 48, 1–27.
Baker, W. & Trofimovich, P. (2006). Perceptual paths to accurate production of L2 vowels: The role of individual differences. International Review of Applied Linguistics, 44, 231–250.
Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M. & Gildea, D. (2003). Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America, 113, 1001–1024.
Bent, T. & Bradlow, A. (2003). The interlanguage speech intelligibility benefit. Journal of the Acoustical Society of America, 114, 1600–1610.
Bialystok, E. & Hakuta, K. (1999). Confounded age: Linguistic and cognitive factors in age differences for second language acquisition. In Birdsong, D. (ed.), Second language acquisition and the critical period hypothesis, pp. 161–181. Mahwah, NJ: Erlbaum.
Birdsong, D. & Molis, M. (2001). On the evidence for maturational constraints in second language acquisition. Journal of Memory and Language, 44, 235–249.
Bohn, O.-S. (1995). Cross-language speech perception in adults: First language transfer doesn't tell it all. In Strange, W. (ed.), Speech perception and linguistic experience: Issues in cross-language research, pp. 279–304. Timonium, MD: York Press.
Bohn, O.-S. & Flege, J. E. (1990). Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics, 11, 303–328.
Bohn, O.-S. & Flege, J. E. (1992). The production of new and similar vowels by adult German learners of English. Studies in Second Language Acquisition, 14, 131–158.
Bond, Z. S. & Moore, T. J. (1994). A note on the acoustic–phonetic characteristics of inadvertently clear speech. Speech Communication, 14, 324–337.
Byrd, D. (1994). Relations of sex and dialect to reduction. Speech Communication, 15, 39–54.
Cebrian, J. (2006). Experience and the use of non-native duration in L2 vowel categorization. Journal of Phonetics, 34, 372–387.
Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22, 129–159.
Chiswick, B. R. & Miller, P. W. (2008). A test of the Critical Period Hypothesis for language learning. Journal of Multilingual and Multicultural Development, 29, 16–29.
Cho, T., Jun, S.-A. & Ladefoged, P. (2002). Acoustic and aerodynamic correlates of Korean stops and fricatives. Journal of Phonetics, 30, 193–228.
Collins, L., Trofimovich, P., White, J., Cardoso, W. & Horst, M. (in press). Some input on the easy/difficult grammar question. The Modern Language Journal.
Crowther, C. S. & Mann, V. (1992). Native language factors affecting use of vocalic cues to final consonant voicing in English. Journal of the Acoustical Society of America, 92, 711–722.
DeKeyser, R., Salaberry, R., Robinson, P. & Harrington, M. (2002). What gets processed in processing instruction? A commentary on Bill VanPatten's “Processing instruction: An update”. Language Learning, 52, 805–823.
Eckman, F. R. (1977). Markedness and the contrastive analysis hypothesis. Language Learning, 27, 315–330.
Eckman, F. R. (1981). On predicting phonological difficulty in second language acquisition. Studies in Second Language Acquisition, 4, 18–30.
Elsendoorn, B. (1984). Tolerances of durational properties in British English vowels. Unpublished PhD dissertation, University of Utrecht.
Flege, J. E. (1988). Factors affecting degree of perceived foreign accent in English sentences. Journal of the Acoustical Society of America, 84, 70–79.
Flege, J. E. (1993). Production and perception of a novel second-language phonetic contrast. Journal of the Acoustical Society of America, 93, 1589–1607.
Flege, J. E., Birdsong, D., Bialystok, E., Mack, M., Sung, H. & Tsukada, K. (2006). Degree of foreign accent in English sentences produced by Korean children and adults. Journal of Phonetics, 34, 153–175.
Flege, J. E., Bohn, O.-S. & Jang, S. (1997). Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics, 25, 437–470.
Flege, J. E., MacKay, I. R. A. & Meador, D. (1999). Native Italian speakers’ production and perception of English vowels. Journal of the Acoustical Society of America, 106, 2973–2987.
Flege, J. E., Munro, M. J. & Skelton, L. (1992). Production of the word-final English /t/–/d/ contrast by native speakers of English, Mandarin, and Spanish. Journal of the Acoustical Society of America, 92, 128–143.
Flege, J. E. & Port, R. (1981). Transfer and developmental processes in adult foreign language speech production. Applied Psycholinguistics, 5, 323–347.
Flege, J. E., Yeni-Komshian, G. H. & Liu, S. (1999). Age constraints on second language learning. Journal of Memory and Language, 41, 78–104.
Goldschneider, J. M. & DeKeyser, R. M. (2001). Explaining the natural order of L2 morpheme acquisition in English: A meta-analysis of multiple determinants. Language Learning, 51, 1–50.
Hakuta, K., Bialystok, E. & Wiley, E. (2003). Critical evidence: A test of the Critical-Period Hypothesis for second language acquisition. Psychological Science, 14, 31–38.
Holt, L. L. & Lotto, A. J. (2006). Cue weighting in auditory categorization: Implications for first and second language acquisition. Journal of the Acoustical Society of America, 119, 3059–3071.
House, A. S. & Fairbanks, G. (1953). The influence of consonant environment upon the secondary acoustical characteristics of vowels. Journal of the Acoustical Society of America, 25, 105–113.
Jia, G. & Aaronson, D. (2003). A longitudinal study of Chinese children and adolescents learning English in the United States. Applied Psycholinguistics, 24, 131–161.
Jurafsky, D., Bell, A., Gregory, M. & Raymond, W. D. (2001). Probabilistic relations between words: Evidence from reduction in lexical production. In Bybee, J. & Hopper, P. (eds), Frequency and the emergence of linguistic structure, pp. 229–254. Philadelphia: John Benjamins.
Kluender, K. R., Diehl, R. L. & Wright, B. A. (1988). Vowel-length differences before voiced and voiceless consonants: An auditory explanation. Journal of Phonetics, 16, 153–169.
Lee, I. & Ramsey, S. R. (2000). The Korean language. Albany: SUNY Press.
Levy, E. S. & Strange, W. (2008). Perception of French vowels by American English adults with and without French language experience. Journal of Phonetics, 36, 141–157.
Lisker, L. (1957). Closure duration and the inter-vocalic voiced–voiceless distinction in English. Language, 33, 42–49.
Lisker, L. (1972). Stop duration and voicing in English. In Valdman, A. (ed.), Papers in linguistics and phonetics to the memory of Pierre Delattre, pp. 339–343. Paris: Mouton.
Lisker, L. (1999). Perceiving final voiceless stops without release: Effects of preceding monophthongs versus nonmonophthongs. Phonetica, 56, 44–55.
Lowenstein, J. H. & Nittrouer, S. (2008). Patterns of acquisition of native voice onset time in English-learning children. Journal of the Acoustical Society of America, 124, 1180–1191.
Mack, M. (1982). Voicing dependent vowel duration in English and French: Monolingual and bilingual production. Journal of the Acoustical Society of America, 71, 173–178.
Major, R. C. & Faudree, M. C. (1996). Markedness universals and the acquisition of voicing contrasts by Korean speakers of English. Studies in Second Language Acquisition, 18, 69–90.
McAllister, R., Flege, J. E. & Piske, T. (2002). The influence of L1 on the acquisition of Swedish quantity by native speakers of Spanish, English and Estonian. Journal of Phonetics, 30, 229–258.
Moyer, A. (1999). Ultimate attainment in L2 phonology: The critical factors of age, motivation, and instruction. Studies in Second Language Acquisition, 21, 81–108.
Moyer, A. (2004). Age, accent, and experience in second language acquisition: An integrated approach to critical period inquiry. London: Multilingual Matters.
Nguyen, T. A.-T., Ingram, C. L. J. & Pensalfini, J. R. (2008). Prosodic transfer in Vietnamese acquisition of English contrastive stress patterns. Journal of Phonetics, 36, 158–190.
Nittrouer, S. (2004). The role of temporal and dynamic signal components in the perception of syllabic-final stop voicing by children and adults. Journal of the Acoustical Society of America, 115, 1777–1790.
Oyama, S. (1976). A sensitive period for the acquisition of nonnative phonological system. Journal of Psycholinguistic Research, 5, 261–283.
Park, H. & Kang, S.-H. (2006). A cross-linguistic study of the perception of the voicing contrast in English plosives. Eoneohag, 45, 3–37.
Piske, T., MacKay, I. R. A. & Flege, J. E. (2001). Factors affecting degree of foreign accent in an L2: A review. Journal of Phonetics, 29, 191–215.
Rabinowitz, M., Ornstein, P. A., Folds-Bennett, T. H. & Schneider, W. (1994). Age-related differences in speed of processing: Unconfounding age and experience. Journal of Experimental Child Psychology, 57, 449–459.
Raphael, L. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristics of word-final consonants in American English. Journal of the Acoustical Society of America, 51, 1296–1303.
Smith, S. C. (1997). UAB Software [Computer software]. Department of Rehabilitation Sciences, University of Alabama at Birmingham.
Sohn, H.-M. (1999). The Korean language. Cambridge: Cambridge University Press.
Stevens, G. (1999). Age at immigration and second language proficiency among foreign-born adults. Language in Society, 28, 555–578.
Stevens, G. (2004). Using census data to test the critical-period hypothesis for second language acquisition. Psychological Science, 15, 215–216.
Sundara, M. (2005). Acoustic–phonetics of coronal stops: A cross-language study of Canadian English and Canadian French. Journal of the Acoustical Society of America, 118, 1026–1037.
Tahta, S., Wood, M. & Loewenthal, K. (1981). Foreign accents: Factors relating to transfer of accent from the first language to a second language. Language and Speech, 24, 265–272.
Trofimovich, P. & Baker, W. (2006). Learning second-language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition, 28, 1–30.
Trofimovich, P. & Baker, W. (2007). Learning prosody and fluency characteristics of L2 speech: The effect of experience on child learners’ acquisition of five suprasegmentals. Applied Psycholinguistics, 28, 251–276.
Trofimovich, P., Baker, W. & Mack, M. (2001). Context- and experience-based effects on the learning of vowels in a second language. Studies in the Linguistic Sciences, 31, 167–186.
Tsukada, K., Birdsong, D., Mack, M., Sung, H., Bialystok, E. & Flege, J. E. (2004). Release bursts in English word-final voiceless stops produced by native English and Korean adults and children. Phonetica, 61, 67–83.
Whalen, D. H. (1989). Vowel and consonant judgments are not independent when cued by the same information. Perception & Psychophysics, 46, 284–292.
Wolfram, W. (1969). A sociolinguistic description of Detroit Negro speech. Washington, DC: Center for Applied Linguistics.
Yavas, M. (1997). The effects of vowel height and place of articulation in interlanguage final stop devoicing. International Review of Applied Linguistics, 35, 115–125.

Table 1. Means and Standard Deviations (in parentheses) for participant variables.

Figure 1. Mean word production scores in the voiced and voiceless context for the three learner groups and the group of native English speakers. Brackets enclose 2 SEs (Standard Errors).

Figure 2. Mean voicing confusion scores in the voiced and voiceless context for the three learner groups and the group of native English speakers. Brackets enclose 2 SEs.

Table 2. Vowel and closure durations (ms) in the voiced and voiceless contexts (Standard Errors appear in parentheses).

Table 3. Number and percent of released word-final stops in the voiced and voiceless contexts (Standard Errors appear in parentheses).

Table 4. Summary of correlation analyses among acoustic measurements and listener-based word production scores.

Table 5. Summary of regression analyses for acoustic measurements as predictors of listener-based production scores.