In the field of second language acquisition (SLA), there is a growing consensus across a range of theoretical perspectives that adult second language (L2) learners’ speech can continue to develop as a function of increased practice and experience (i.e., experience effects), and that the extent to which these learners can eventually improve their L2 performance is strongly tied to their age of acquisition (i.e., age effects; e.g., Flege, 2016, for the speech learning model). While much discussion in this research area has concerned the acquisition of L2 segmentals (Piske, MacKay, & Flege, 2001), a growing number of L2 speech researchers have also examined the underlying mechanisms of L2 fluency development.
The existing literature has worked extensively to illustrate which subconstructs of L2 speech (speed, breakdown, or repair) determine native speakers’ perception of fluency, and what kinds of learner factors are crucial for efficient and effective fluency development (for a comprehensive review, see Segalowitz, 2016). Due to the relatively limited quantity and quality of samples used in previous studies, however, little is known about the acoustic correlates of perceived fluency at different proficiency levels, and the role of learner variables (experience and age) in the attainment of various levels of L2 fluency performance.
In the context of 90 adult Japanese learners of English with diverse L2 experience and 10 native speakers (N = 100), this study aimed to examine the specific linguistic characteristics and learner profiles of low-, mid-, and high-level fluency performance. We elucidated which aspects of temporal information (speed, breakdown, and repair) native speakers differentially relied on while assessing the overall fluency of the native and nonnative speech samples. Subsequently, we probed the extent to which these low-, mid- and high-level L2 fluent learners differed in terms of the length of residence (0–18 years) and the age of arrival (19–40 years) to an L2 speaking environment.
BACKGROUND
L2 perceived, utterance, and cognitive fluency
In its broadest sense, fluency, especially in practice, has been considered as equivalent to general oral proficiency (Chambers, 1997). In a narrower sense, many L2 scholars have focused on which acoustic properties relate to the optimal, smooth, and fluid delivery of L2 speech (utterance fluency), and how these features interact to influence native speakers’ fluency judgments of L2 speech (perceived fluency; Skehan, 2003; Tavakoli & Skehan, 2005). In the existing literature, the components of utterance fluency have been analyzed through three groups of objective measures: (a) breakdown (e.g., the number of filled and unfilled pauses between and within clauses); (b) speed (e.g., the number of pruned syllables uttered per minute); and (c) repair (e.g., the number of repetitions and self-corrections; Bosker, Pinget, Quené, Sanders, & de Jong, 2013). As summarized in Table 1, it has been generally shown that native speakers’ fluency judgments are mainly associated with the breakdown and speed measures, and, to a much lesser degree, linked to the repair measures (see Footnote 1).
Table 1. Summary of five major L2 fluency studies examining the relationship between perceived and utterance fluency
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab1.gif?pub-status=live)
Furthermore, a growing number of SLA scholars have also examined the cognitive processes underlying fluent speech performance (cognitive fluency). For example, Kormos (2006) proposed that certain aspects of utterance fluency measures (breakdown, speed, and repair) can reflect three different stages of L2 speech production: conceptualizing the message, encoding and formulating linguistic information, and monitoring one's own output. Specifically, one breakdown fluency measure, the number of final-clause pauses, is argued to signal L2 learners’ engagement with conceptualization and content planning; another breakdown measure, the number of mid-clause pauses, is related to the present state of L2 learners’ timely phonological, lexical, and syntactic encoding; and repair fluency measures reflect the amount of attentional resources that L2 learners have for the purpose of monitoring their own speech. By comparison, speed fluency measures (e.g., speech/articulation rate) are thought to involve every dimension of L2 speech production (conceptualization, formulation, and monitoring), and act as a crucial indication of automatization.
Whereas much attention has been given to examining the complex relationships between perceived, utterance, and cognitive fluency, the extent to which these different components of fluency actually develop as L2 learners become more proficient over time (beginner → intermediate → advanced) has remained surprisingly understudied. As observed in Table 1, previous studies have drawn on relatively small data sets (N = 16–40) focusing on particular groups of L2 learners with relatively homogeneous proficiency levels, a common methodological problem in the field of SLA, as pointed out by Norris, Plonsky, Ross, and Schoonen (2015). According to a componential view of proficiency, L2 speech is a composite phenomenon constituting both global dimensions (e.g., perceived fluency) and subconstructs (e.g., breakdown, speed, and repair fluency; de Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012), and the associations between these global and subskill domains may vary in relation to different proficiency levels (Higgs & Clifford, 1982). In L2 assessment research (which is outside of the domain of L2 fluency research), for example, it has been shown that the relative weight of vocabulary appropriateness in global oral proficiency judgments (e.g., comprehensibility and communicative adequacy) may be strong for low-level proficiency L2 learners, while grammar and pronunciation accuracy could be more distinguishing of high-level proficiency L2 learners (e.g., Isaacs & Trofimovich, 2012).
Though few in number, some researchers have examined which acoustic variables constitute beginner-, intermediate-, and advanced-level L2 fluency. Adopting a cross-sectional design, Cucchiarini, Strik, and Boves (2000) compared the acoustic characteristics of the speech of two different groups of L2 Dutch learners (beginner vs. intermediate), finding that their perceived fluency was predicted by different types of utterance fluency (breakdown fluency for the beginner group vs. speed fluency for the intermediate group). More recently, Derwing, Munro, Thomson, and Rossiter (2009) longitudinally tracked 32 L2 learners of English over the first 2 years of their immersion in Canada. The results showed that the participants steadily improved their perceived fluency by developing their articulation rate throughout the project, but dramatically decreased the number of pauses in their speech only within the first year of the research. Using a quasi-experimental, pretest/posttest design, Lambert, Kormos, and Minn (2017) examined how L2 learners enhanced diverse dimensions of utterance fluency while repeating the same communicative task six times over a 2-hr English conversation session. The results showed that the subconstructs of L2 utterance fluency developed at different learning rates, as predicted by Kormos's (2006) psycholinguistic model of speech production. The participants continued to increase their speech rate throughout the multiple task repetitions (i.e., automatizing all the relevant speech processing systems); their final-clause and mid-clause pauses significantly decreased over the first three or four repetitions; and their self-repairs substantially declined only between the fourth and fifth repetitions.
Taken together, the previous literature suggests that adult L2 learners’ improvement can be observed particularly (a) in the development of breakdown and speed fluency by enhancing their smooth and fluid access to the conceptualizer and formulator in the initial stage of SLA (beginner → intermediate); and (b) in the development of repair and speed fluency by optimizing the process of monitoring in the later stages of SLA (intermediate → advanced). Following this line of thought, the current study aimed to revisit the acoustic correlates of low-, mid-, and high-level fluent speech in the context of a relatively large-scale data set covering a wide range of proficiency levels (N = 100).
EXPERIENCE AND AGE EFFECTS ON ADULT SECOND LANGUAGE SPEECH LEARNING
In the field of L2 speech research, scholars have explored two essential questions regarding the mechanisms underlying successful L2 pronunciation learning: (a) how adult L2 learners can quickly improve the spectral and temporal dimensions of consonants and vowels in relation to increased experience (i.e., the role of experience in rate of learning); and (b) the extent to which they can eventually refine the nativelikeness of their pronunciation proficiency, especially in accordance with learners’ age of acquisition (i.e., the role of age in ultimate attainment; for a comprehensive review, see Saito, in press). With respect to the former (rate of learning), length of residence (LOR) has been considered as a rough proxy for L2 experience, as it does not always mirror how L2 learners actually use the target language. For example, certain learners could choose to use their native language (L1) rather than their L2 as the primary language of communication for the duration of their potentially extensive residence (Flege & Liu, 2001). However, there is ample evidence that adult L2 speech learning continues to take place as a function of increased LOR, as long as learners use the target language through interaction with other native and nonnative speakers on a daily basis (e.g., Derwing & Munro, 2013; Saito, 2015).
Within the first few years of immersion, many adult L2 learners’ phonological forms quickly become intelligible, especially in the context of frequently used words (Munro & Derwing, 2008, for vowels; Saito & Munro, 2014, for approximants). These L2 learners appear to continue to enhance their segmental (e.g., Baker, 2010, for stops; Saito & Brajot, 2013, for approximants; Flege, Bohn, & Jang, 1997, for vowels) and prosodic (e.g., Trofimovich & Baker, 2006, for word stress and intonation) accuracy over an extensive period of time (e.g., 5–10 years of LOR). This process of phonological reattunement is assumed to facilitate learners’ comprehension and production of a number of phonologically similar words (e.g., minimal pairs; Bundgaard-Nielsen, Best, & Tyler, 2011), and is used as empirical support for many theoretical accounts that claim that even adult L2 learners can learn new sounds in a manner similar to L1 acquisition (e.g., Flege, 2016, for the speech learning model; Best & Tyler, 2007, for the perceptual assimilation model-L2).
With respect to the ultimate attainment of L2 speech learning, a number of large-scale studies have demonstrated strong age effects on the final quality of adult L2 learners’ pronunciation proficiency after years of immersion in an L2 speaking environment. Whereas some L2 learners can achieve near-nativelike L2 pronunciation proficiency, especially when exposed to the target language from an early age (<6–7 years), other L2 learners with late age of acquisition (AOA) profiles (>12–14 years) tend to have detectable accents (e.g., Flege, Munro, & MacKay, 1995; Granena & Long, 2013; Saito, 2013). This could be because adult L2 learners have already lost their access to the innate language acquisition device by which to pick up the target language at a nativelike level based on mere exposure (Abrahamsson & Hyltenstam, 2008; Granena & Long, 2013), or because certain cognitive abilities (e.g., working memory, brain size, speech processing, and attentional/inhibitory control) relevant for successful language acquisition likely decline after the age of 18–20 years (Birdsong, 2006; Saito, 2013).
Although L2 pronunciation learning involves a wide range of acoustic phenomena (e.g., the accurate and fluent use of consonants, vowels, word stress, intonation, and rhythm), it is noteworthy that the aforementioned studies have been primarily concerned with the effects of experience and age on the development of L2 segmental accuracy. What has remained understudied to date is the extent to which such findings could be generalized to other aspects of L2 pronunciation development (prosody, rhythm, and fluency). To our knowledge, very few studies have examined the role of LOR and AOA particularly in L2 fluency development and attainment, especially focusing on a range of L2 learners with varied experience, age, and proficiency profiles. With a total of 30 Korean learners of English (LOR = 0.1 to 15 years), Trofimovich and Baker (2006) showed that the participants’ breakdown fluency (the number of pauses) was associated with their LOR, especially among the inexperienced and moderately experienced learners (LOR < 3 years). In the context of 102 experienced German learners of English (LOR > 10 years), Lahmann, Steinkrauss, and Schmid (2017) did not find any significant relationship between participants’ age of arrival and their utterance fluency performance (breakdown, speed, and repair), suggesting that many L2 learners may be able to attain high-level fluency as a function of increased experience regardless of their starting AOA (different from L2 segmental learning, which is amenable to both experience and age effects throughout one's life).
To further examine this topic, the current study compared 90 Japanese learners of English with varied LOR and AOA profiles (see the Method section) with 10 native speaker baselines. Our data set departed in quantity/quality from the aforementioned studies (Trofimovich & Baker, 2006, for 30 inexperienced and experienced L2 learners; Lahmann et al., 2017, for 102 experienced learners but without any comparison with inexperienced learners or native speakers). We aimed to identify low-, mid-, and high-fluent speakers by way of a cluster analysis based on 10 native listeners’ subjective judgment scores (i.e., perceived fluency). Subsequently, we investigated which components of utterance fluency (breakdown, speed, and repair) could distinguish between the three different Japanese (low, mid, and high) and the English baseline (native) groups. Finally, we explored whether and to what degree the grouping category (low, mid, or high) could be related to the participants’ LOR and AOA profiles. The following research questions were thus formulated:
1. How do breakdown, speed, and repair fluency correlate with native speakers’ intuitive judgments of fluency?
2. Which utterance fluency measures distinguish between learners at low, mid, high, and native levels of perceived fluency?
3. To what extent do experience and age factors influence the attainment of such different fluency levels?
METHOD
Participants in the current study included 100 speakers (90 nonnatives and 10 natives) who provided spontaneous speech samples, and 10 native listeners who rated all the speech samples for perceived fluency.
Speech samples
A total of 90 spontaneous speech samples were drawn from our unpublished corpus, which currently comprises 500+ Japanese learners of English with varied L2 learning experience in Japan, Canada, the United States, and the United Kingdom (for details, see Saito, 2017; for the materials deposited in IRIS, see Marsden, Mackey, & Plonsky, 2016). All of them were native speakers of Japanese (both of their parents were L1 Japanese speakers) and started learning L2 English from Grade 7 in foreign language classroom settings in Japan.
Speakers
To cover a wide range of proficiency levels and learner profiles, the 90 participants were selected in accordance with the following categories, which were adapted from Trofimovich and Baker (2006): inexperienced learners (LOR = 0 years), experienced learners (LOR ≤ 5 years), and attainers (LOR ≥ 6 years). For the latter two groups, care was taken to choose only those who had reported using L2 English as their main language of communication at the time of the investigation. According to the analyses of individual interviews, their L2 use was considered highly frequent on a 6-point scale (1 = infrequent, 6 = very frequent; M = 5.3; range = 4–6). This was done to avoid including L2 learners who actually continued to use L1 Japanese despite their residence in Canada, and whose LOR therefore would not correlate with the actual quantity/quality of their L2 experience (Flege, 2016). Finally, we selected from the same corpus a group of native English speakers who had completed the same task in order to provide baseline data for the purpose of comparison.
1. Inexperienced Japanese learners (n = 10). A total of 10 inexperienced university-aged Japanese learners (at the time of the project) were chosen to provide the lower range of the baseline data (L2 learners without any experience abroad; M age = 20.4 years; range = 18–21 years). Since they had never stayed nor studied abroad (LOR = 0 months), their performance was considered to serve as a proxy for the initial stage of Japanese learners’ L2 fluency development (solely based on their 6 years of foreign language experience in Japan).
2. Experienced Japanese learners (n = 40). A total of 40 Japanese learners were chosen for the “experienced” category (M age = 34.7 years; range = 22–48 years). These learners had a range of LOR profiles in Vancouver and Calgary, Canada (M = 1.4 years; range = 0.1 to 5 years) with widely different AOA points (M age = 28.3 years; range = 19–40 years). Given the cross-sectional (Trofimovich & Baker, 2006) and longitudinal (Munro, Derwing, & Saito, 2013) evidence that much L2 speech learning likely takes place over an extensive period of immersion (LOR = 0–5 years), their performance was assumed to represent the midstage of L2 fluency development.
3. Japanese attainers (n = 40). In line with the standards in L2 ultimate attainment research (e.g., DeKeyser, 2013), a total of 40 Japanese attainers were also included (M age = 40.2 years; range = 28–63 years). They had been in Canada for at least 6 years (M = 11.3 years; range = 6–18 years), and had various AOA profiles (M age = 27.1 years; range = 21–36 years). Their performance was considered to indicate the final stage of L2 fluency development.
4. Native English baselines (n = 10). To provide the upper range of the baseline data (targetlike forms without any foreign accents), this group comprised a total of 10 native speakers of English recruited in Vancouver, Canada (M age = 27.5 years; range = 18–37 years). At least one of their parents was an L1 English speaker. They reported that they had been using English as their L1 from birth onward and had limited knowledge/use of the other official language in Canada, French.
Task procedure
All the speakers engaged in a timed picture description task designed to elicit spontaneous language production, where the primary focus was on conveying meaning rather than form under communicative pressure (Spada & Tomita, 2010). The task was developed based on picture description tasks that have been widely used in previous L2 speech research where L2 learners explained a series of pictures in a sequence (e.g., Derwing et al., 2009) or a single picture (e.g., Munro & Mann, 2005; see Footnote 2). In this task, the participants described seven pictures with a limited amount of planning time (i.e., 5 s per photo). Whereas the first four pictures were used as practice for participants to get used to the task procedure (describing a photo with little planning), the remaining three pictures were used for the final analyses. Given that the current study included inexperienced learners who had noted much difficulty in producing free speech, especially due to the significant lack of their conversational experience inside/outside classrooms, the decision was made to provide three key words so that they could at least start producing language without too much silence at the beginning of each picture description (see Footnote 3).
The first 10 s of each picture description were cut, combined, and saved in a WAV file for each speaker (10 s × 3 pictures = 30 s in total). Efforts were made to ensure that each sample started from the beginning of the picture description without initial dysfluencies (e.g., false starts and/or hesitations) and ended at a phrase boundary. The total length of each speech sample (30 s per speaker) was comparable to previous fluency research (e.g., Bosker et al., 2013, for 20 s; Derwing et al., 2009, for 20 s; see Footnote 4). All the speech data were individually recorded in a quiet room in a community center, a university lab, or the participants’ residences using digital Roland-05 audio recorders (set to 44.1 kHz sampling rate with 16-bit quantization).
Perceived fluency analyses
Listeners
A total of 10 native listeners were recruited in London, United Kingdom, to assess all the speakers’ global fluency (M age = 29.3 years; range = 18–51 years). They were born and raised in English-speaking families in London and had at least one native English-speaking parent. None of the listeners had studied Japanese prior to the project. Their familiarity with Japanese-accented English was moderate (M = 2.9, range = 1–4) on a 6-point scale (1 = not at all, 6 = very much; see Footnote 5).
Rating procedure
Following the methodology of Derwing et al. (2009), the listeners received a brief explanation of the definition of perceived global fluency (i.e., the flow and smoothness of speech); notably, raters were not asked to pay attention to specific subconstructs of L2 speech (i.e., utterance fluency features), such as the number of pauses, repetitions, and self-corrections. Next, they proceeded with a practice session where they rated three speech samples (not included in the main data set) and explained their decisions for each sample. After we ensured that each listener focused on fluency (the raters’ comments mainly concerned “tempo” rather than the overall proficiency, accuracy, or complexity of L2 speech), they proceeded to assess a total of 100 speech samples that were played in a randomized order through Praat (Boersma & Weenink, 2012) on a 9-point scale (1 = not fluent at all, 9 = very fluent).
As operationalized in the previous literature (e.g., Bosker et al., 2013) and in order to tap into their initial intuitions and impressions about the L2 speech, the listeners were permitted to listen to each sample only once. In addition, the listeners were explicitly told to use the entire 9-point scale as much as possible, and were informed that the data set represented a wide scope of adult L2 fluency proficiency ranging from inexperienced learners (without any experience abroad) to experienced learners (with extensive LOR in an L2 speaking environment) to native speakers. Since each listener session took approximately 2 hr (including explanation, training, and rating), all the listeners took a 10-min break halfway through.
Interrater agreement
According to the results of Cronbach α analyses, the 10 listeners showed relatively high interrater agreement on their intuitive judgments of perceived L2 fluency (α = 0.98), in line with other fluency studies (e.g., Bosker et al., 2013, for α = 0.97).
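As an illustration of how such an agreement index can be obtained, the following minimal sketch computes Cronbach's α by treating each listener as an "item" and each speech sample as a case. The function name `cronbach_alpha` and the toy ratings are hypothetical; the study does not report which software was used for this step.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a samples-by-listeners table of scores.

    ratings: list of rows, one per speech sample; each row holds one
    score per listener (all rows must have the same length).
    """
    n_raters = len(ratings[0])
    # Sample variance of each listener's scores across speech samples.
    rater_vars = [variance([row[j] for row in ratings]) for j in range(n_raters)]
    # Sample variance of the summed score given to each speech sample.
    total_var = variance([sum(row) for row in ratings])
    return (n_raters / (n_raters - 1)) * (1 - sum(rater_vars) / total_var)


# Hypothetical ratings: 5 speech samples x 3 listeners on the 9-point scale.
toy_ratings = [
    [2, 3, 2],
    [4, 4, 5],
    [6, 5, 6],
    [7, 8, 7],
    [9, 8, 9],
]
alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 3))
```

Values close to 1 indicate that listeners rank the samples in much the same way, which is the condition under which averaging scores across listeners is usually considered defensible.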
Utterance fluency analyses
All the speech samples were transcribed into analysis of speech units (Foster, Tonkyn, & Wigglesworth, 2000). Conforming to Kormos's (2006) utterance and cognitive fluency model, these samples were coded for three different dimensions of utterance fluency (breakdown, speed, and repair), which are assumed to correspond to three stages of L2 speech production (conceptualization, formulation, and monitoring). We purposefully selected these utterance measures as they have been found to demonstrate little intercollinearity (Bosker et al., 2013). For breakdown fluency, the number of filled (e.g., ah, oh, and eh) and unfilled (>250 ms; Bosker et al., 2013) pauses in the middle and at the end of clauses was manually counted and divided by the total number of words (see Footnote 6). For speed fluency, the total number of syllables was divided by the total phonation time (excluding all filled pauses) to yield the articulation rate. For repair fluency, the number of repetitions and self-corrections was divided by the total number of words.
Three trained researchers served as analysts for this portion of the study: the second, third, and fourth authors. In a 1-hr training session, they received explicit explanation of each category of breakdown, speed, and repair fluency. Next, they practiced and discussed the analytic procedure with five similar speech samples (not included in the main data set). After they confirmed their clear understanding of the utterance fluency categories, they analyzed 10 samples randomly selected from the main data set in order to check intercoder reliability. The results of Cronbach α analyses found relatively high α values for breakdown (α = 0.95 for filled pauses, α = 0.92 for unfilled pauses, α = 0.91 for final-clause pauses, and α = 0.91 for mid-clause pauses), speed (α = 0.93 for articulation rate), and repair (α = 0.96 for repetitions, α = 0.97 for self-corrections). Finally, each of the three researchers was randomly assigned a different subset of 30 speech samples to analyze.
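The five measures can be illustrated with a short sketch that turns hand-coded counts into ratios. The class and field names (`SampleCounts`, `utterance_fluency`) and the toy counts are hypothetical and are not taken from the study's coding sheets. Articulation rate is computed here in its conventional form, syllables per second of phonation time.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Hand-coded counts for one 30-s speech sample (hypothetical fields)."""
    n_words: int
    n_syllables: int
    phonation_time_s: float  # speaking time in seconds, pauses excluded
    mid_clause_pauses: int
    final_clause_pauses: int
    repetitions: int
    self_corrections: int


def utterance_fluency(c):
    """Return the five utterance fluency measures for one sample."""
    return {
        # Breakdown fluency: pauses per word, by clause position.
        "mid_clause_pause_ratio": c.mid_clause_pauses / c.n_words,
        "final_clause_pause_ratio": c.final_clause_pauses / c.n_words,
        # Speed fluency: syllables per second of phonation time.
        "articulation_rate": c.n_syllables / c.phonation_time_s,
        # Repair fluency: repairs per word.
        "repetition_ratio": c.repetitions / c.n_words,
        "self_correction_ratio": c.self_corrections / c.n_words,
    }


# Hypothetical counts for a single speaker.
sample = SampleCounts(
    n_words=50, n_syllables=70, phonation_time_s=20.0,
    mid_clause_pauses=5, final_clause_pauses=4,
    repetitions=2, self_corrections=1,
)
measures = utterance_fluency(sample)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Normalizing pause and repair counts by the number of words keeps the measures comparable across speakers who produce different amounts of speech in the same 30 s.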
RESULTS
The first objective of the statistical analyses was to identify three different levels of L2 fluency (low, mid, and high) based on the results of the 10 listeners’ rating scores of 90 nonnative speech samples. To this end, a hierarchical cluster analysis using Ward's method was adopted to categorize all the samples (n = 90) into smaller homogeneous groups. In accordance with a visual inspection of the dendrogram (see Figure 1), a three-cluster solution was adopted, dividing the 90 Japanese learners into three groups: low (n = 29), mid (n = 30), and high (n = 31).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_fig1g.gif?pub-status=live)
Figure 1. Dendrogram tree of hierarchical clusters based on the participants’ perceived fluency scores.
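The logic of agglomerative clustering with Ward's criterion can be sketched in a few lines of standard-library Python. This is an illustrative toy implementation under stated assumptions, not the procedure actually run in the study: the statistical package used is not reported, the function name `ward_clusters` is hypothetical, and the input here is a made-up set of mean fluency scores for nine speakers.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion.

    points: list of equal-length numeric lists (one per speaker).
    k: number of clusters to stop at.
    Returns a list of k clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]
    dims = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dims)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        # Find the pair of clusters whose merger is cheapest.
        _, i, j = min(
            ((merge_cost(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical mean perceived fluency scores (9-point scale) for 9 speakers.
toy_scores = [[1.5], [2.0], [2.4], [4.8], [5.1], [5.5], [7.9], [8.2], [8.6]]
groups = ward_clusters(toy_scores, k=3)
print(groups)
```

At each step the two clusters whose merger adds the least within-cluster variance are joined; cutting the resulting tree at three clusters corresponds to the low, mid, and high grouping described above.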
The descriptive statistics of the participants’ perceived fluency scores are summarized in Table 2. According to the results of 95% confidence interval analyses, there was no overlapping of the groups’ mean scores, indicating that the four groups (low, mid, high, and native) significantly differed in their perceived fluency performance at a p < .05 level.
Table 2. Descriptive statistics of perceived and utterance fluency scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab2.gif?pub-status=live)
Note. Perceived fluency scores were based on a total of 10 native listeners’ intuitive judgments on a 9-point scale (1 = not fluent at all, 9 = very fluent).
The second objective of the statistical analyses was to investigate the relationship between the five utterance fluency measures: final-clause pause ratio, mid-clause pause ratio, articulation rate, repetition ratio, and self-correction ratio (for the descriptive results, see Table 2). According to the results of Pearson correlation analyses (as summarized in Table 3), three significant correlations were found: between final-clause pauses (breakdown) and articulation rate (speed); between mid-clause pauses (breakdown) and articulation rate (speed); and between repetitions and self-corrections (repair; p < .005, Bonferroni corrected). In contrast, no significant correlation was found between the two breakdown measures (mid-clause vs. final-clause pauses). The repair measures were not significantly associated with either the breakdown or the speed measures (p > .005). In keeping with Kormos's (2006) proposal, the results suggest that the five utterance fluency measures included in the current study seem to tap into the participants’ abilities to perform three separate cognitive operations during L2 speech production: (a) conceptualization (final-clause pauses and articulation rate), (b) formulation (mid-clause pauses and articulation rate), and (c) monitoring (repetitions and self-corrections).
Table 3. Results of Pearson correlation analyses between five utterance fluency measures
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab3.gif?pub-status=live)
Note. *p < .005 (Bonferroni corrected)
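The corrected threshold reported above is consistent with dividing the conventional .05 level by the number of pairwise correlations among five measures; that derivation is an assumption here, since the text reports only the corrected value. The sketch below counts the pairs, derives the threshold, and includes a plain Pearson r function for illustration (the variable names are hypothetical).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


MEASURES = [
    "final_clause_pause_ratio",
    "mid_clause_pause_ratio",
    "articulation_rate",
    "repetition_ratio",
    "self_correction_ratio",
]

# Every unordered pair of measures is tested once.
pairs = list(combinations(MEASURES, 2))

# Bonferroni correction: family-wise alpha divided by the number of tests.
bonferroni_alpha = 0.05 / len(pairs)

print(len(pairs), bonferroni_alpha)
```

With five measures there are ten unordered pairs, so each individual correlation is evaluated against .05 / 10.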
The third objective of the statistical analyses was to illustrate the acoustic correlates of the 10 native listeners’ intuitive fluency judgments of the 100 native and nonnative speakers. Given that the listeners demonstrated relatively high interrater agreement as to L2 fluency judgments (α > 0.95), the perceived fluency scores were averaged across raters to generate a single score for each speaker. To analyze the relationship between perceived and utterance fluency, the mean fluency scores (dependent variable) and all the breakdown, speed, and repair measures (independent variables) were submitted to a set of Pearson correlation analyses. As shown in Table 4, the perceived fluency scores were significantly linked to mid- and final-clause pauses and articulation rate (p < .010, Bonferroni corrected). However, the role of the repair factor (repetition and self-correction) in perceived fluency remained unclear, as the correlation between the repetition ratio and perceived fluency reached only marginal significance (p = .014).
Table 4. Correlation coefficients between perceived fluency scores and five utterance fluency measures
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab4.gif?pub-status=live)
Note. *p < .010 (Bonferroni corrected)
To further examine the relative weights of the five utterance measures in the perceived fluency scores, a stepwise multiple regression analysis was performed. As summarized in Table 5, the regression model, which included three utterance fluency variables (articulation rate, mid-clause pauses, and final-clause pauses), accounted for 45.0% of the variance in perceived fluency, with no evidence of strong collinearity in the model (VIF = 1.85). According to this model, the native listeners used speed fluency (articulation rate) as a primary cue, and breakdown fluency (mid- and final-clause pauses) as a secondary cue for the perceived fluency judgments.
Table 5. Results of multiple regression analysis using acoustic variables as predictors of perceived fluency
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab5.gif?pub-status=live)
Note. The variables entered into the regression equation included mid-clause pause ratio, final-clause pause ratio, articulation rate, repetition ratio, and self-correction ratio.
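As a reading aid for the collinearity check, the variance inflation factor for predictor j is defined from the R² obtained when that predictor is regressed on the remaining predictors. The worked line below simply inverts the standard definition for the reported value of 1.85; it assumes that this figure refers to a single predictor and is not taken from the authors' output.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j} = 1 - \frac{1}{1.85} \approx 0.46
```

Under that assumption, about 46% of the variance in the predictor would be shared with the other predictors, which is below the levels conventionally treated as problematic (e.g., VIF > 10).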
The fourth objective of the statistical analyses was to examine how the five utterance fluency scores (final-/mid-clause pauses, articulation rate, repetitions, and self-corrections) distinguished between the four different perceived fluency groups (low, mid, high, and native). A set of one-way analyses of variance (ANOVAs) was performed with perceived fluency level as the grouping factor and each of the utterance fluency scores as the dependent variable (Bonferroni corrected, p < .016).
As shown in Table 6, the ANOVAs showed that whereas the final-clause pause factor distinguished between low and mid levels of perceived fluency (p = .015), the final-clause pause ratios of the other groups (mid, high, and native) appeared to be similar (p > .016). The mid-clause pause factor differentiated not only between low and mid levels of perceived fluency (p = .001) but also between mid and high levels of perceived fluency (p = .005). There was no statistically significant difference in the mid-clause pause ratio between the high and native fluency groups (p > .016). The articulation rate factor distinguished between all four levels of perceived fluency (p = .006 for low and mid, p = .002 for mid and high, and p < .001 for high and native). Finally, the ANOVAs did not find any significant group effects for the repair factors (repetition and self-correction ratio) at a p < .016 level.
Table 6. Summary of group differences for low, mid, high and native levels of perceived fluency
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab6.gif?pub-status=live)
Note. ᵃp < .016 (Bonferroni corrected)
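For readers who want to see what a one-way ANOVA of this kind computes, the following is a minimal pure-Python sketch. The function name and the toy groups are hypothetical and do not reproduce the study's data; the p value is omitted because it requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups.

    In a one-way design, eta squared and partial eta squared coincide.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared

# Hypothetical toy scores for three fluency groups (not the study's data).
low, mid, high = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]
f_value, df1, df2, eta_sq = one_way_anova([low, mid, high])
```

If the .016 threshold reflects three adjacent group comparisons (low vs. mid, mid vs. high, high vs. native), it corresponds to .05 / 3 ≈ .0167; this reading is an assumption based on the comparisons reported in the text.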
The fifth and final objective of the statistical analyses was to illustrate which learner profile variables, experience (LOR) and age (age of arrival), characterized the L2 participants who attained different levels of perceived fluency. With respect to the effect of L2 experience, we ran a one-way ANOVA to see whether the three groups of Japanese learners (29 low-fluent learners, 30 mid-fluent learners, and 31 high-fluent learners) significantly differed according to their LOR backgrounds (0–18 years). As shown in Table 7, the results yielded a significant effect of group, F(2, 87) = 49.264, p < .001, ηp² = 0.53, indicating that the experience factor (LOR) accounted for 53% of the variance in the participants’ perceived fluency performance. A set of multiple comparison analyses further revealed that the LOR factor significantly distinguished the three levels of perceived fluency (p < .001 for both low versus mid and mid versus high) at a p < .025 level (Bonferroni corrected).
Table 7. Descriptive statistics of learner length of residence profiles
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab7.gif?pub-status=live)
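The reported effect size for LOR can be checked against the reported F ratio and degrees of freedom using the standard identity for partial eta squared. The line below is a reader's verification under that identity, not part of the original analysis.

```latex
\eta_p^2 = \frac{F \cdot df_{\text{between}}}{F \cdot df_{\text{between}} + df_{\text{within}}}
         = \frac{49.264 \times 2}{49.264 \times 2 + 87}
         = \frac{98.528}{185.528} \approx 0.53
```

The degrees of freedom are also consistent with the design: three groups give 3 − 1 = 2, and 90 learners give 90 − 3 = 87.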
In terms of the influence of the participants’ age profiles (age of arrival; AOA), we eliminated from the data set a total of 10 inexperienced Japanese learners, all of whom belonged to the low-fluency group, as they had no AOA records because they had no experience abroad. With the remaining 80 Japanese learners (19 low-fluent learners, 30 mid-fluent learners, and 31 high-fluent learners), a one-way ANOVA did not find a significant effect of group, F(2, 77) = 0.441, p = .645, ηp² = 0.02 (summarized in Table 8). Thus, the results suggested that AOA did not play a substantial role in the attainment of high-level fluent speech.
Table 8. Descriptive statistics of learner age of acquisition profiles
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab8.gif?pub-status=live)
To further examine the profiles of L2 learners who actually attained “nativelike” fluency performance, the following procedure from the nativelikeness literature was adopted (e.g., DeKeyser, Reference DeKeyser2013). We calculated the means and confidence intervals (CI) of the baseline group's perceived fluency scores (see Table 2) and then counted how many Japanese learners’ fluency performance fell within two CIs of the baseline mean values. Out of the 90 participants, only 7 learners’ fluency performance was identified as nativelike. As shown in Table 9, these participants’ LOR and AOA profiles ranged widely, suggesting that neither LOR nor AOA could be a reliable predictor of the incidence of nativelike L2 fluency.
Table 9. Learner profiles of seven Japanese learners who attained nativelike fluency performance
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab9.gif?pub-status=live)
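The counting procedure can be sketched as below. This is only one possible reading of the criterion: it assumes that "within two CIs" means within twice the half-width of a normal-approximation 95% CI around the baseline mean, and the function names and toy scores are hypothetical rather than drawn from the study.

```python
import math
import statistics

def ci_half_width(scores, z=1.96):
    """Half-width of a normal-approximation 95% CI for the mean (assumption)."""
    return z * statistics.stdev(scores) / math.sqrt(len(scores))

def count_nativelike(learner_scores, baseline_scores, n_widths=2.0):
    """Count learners whose score lies within n_widths CI half-widths
    of the baseline mean (one assumed reading of the criterion)."""
    baseline_mean = statistics.mean(baseline_scores)
    margin = n_widths * ci_half_width(baseline_scores)
    return sum(1 for s in learner_scores if abs(s - baseline_mean) <= margin)

# Hypothetical toy fluency ratings on the 9-point scale.
baseline = [8.0, 8.0, 9.0, 9.0]
learners = [5.0, 7.5, 8.0, 9.0]
n_nativelike = count_nativelike(learners, baseline)
```

A different operationalization (for example, a band based on standard deviations) would change the count, so the threshold definition should be stated explicitly in any replication.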
DISCUSSION AND CONCLUSION
In the context of 100 spontaneous speech samples (produced by 90 inexperienced/experienced Japanese learners and attainers, and 10 native speakers) and 10 native listeners who judged the overall fluency of the speech data on a 9-point scale (1 = dysfluent, 9 = very fluent), the current study was designed to probe the complex mechanisms underlying perceived (overall impression), utterance (breakdown, speed, and repair) and cognitive (conceptualization, formulation, and monitoring) fluency at different proficiency levels. To examine the generalizability of the topic (L2 fluency development) to the overall framework of L2 speech learning (Flege, Reference Flege2016), the study also aimed to identify whether and to what degree the different proficiency levels could be related to L2 learners’ individual differences in terms of overseas experience (operationalized as LOR) and AOA (the first intensive exposure to L2 English). A summary of the results is presented in Table 10.
Table 10. Summary of Acoustic Characteristics and Learner Profiles of Low, Mid, High and Native Fluency
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180413115154361-0313:S0142716417000571:S0142716417000571_tab10.gif?pub-status=live)
Note. Dashed lines separate different fluency levels that are distinguished by a given acoustic and learner variable. CI = confidence interval.
With respect to the first research question (the relationship between perceived and utterance fluency), the results of the correlation and multiple regression analyses showed that the native listeners tended to use speed (articulation rate) as a primary acoustic cue (explaining 45% of variance) and breakdown (final- and mid-clause pauses) as a secondary acoustic cue (explaining 12% of variance) for their overall fluency judgments. By contrast, the extent to which they relied on the repair-related information (repetitions and self-corrections) remained unclear. The relative importance of the acoustic information in perceived fluency (speed > breakdown > repair) here is in line with findings reported in existing studies (e.g., Bosker et al., Reference Bosker, Pinget, Quene, Sanders and de Jong2013).
Turning to the second research question, the current study further examined whether and to what degree the listeners differentially used breakdown, speed, and repair information while assessing different levels of speech fluency. To this end, four proficiency categories (low, n = 29; mid, n = 30; high, n = 31; and native, n = 10) were determined based on cluster analyses of the 10 native listeners’ fluency ratings of the 100 speech samples. The results of the series of ANOVAs provided three unique findings. The native listeners used all of the breakdown/speed measures to differentiate low- and mid-level fluency; two measures (mid-clause pauses and articulation rate) to differentiate mid- and high-level fluency; and only one measure (articulation rate) to differentiate high- and native-level fluency. The results here lend some empirical support to Cucchiarini et al.’s (Reference Cucchiarini, Strik and Boves2000) claim that the acoustic correlates of perceived fluency may differ depending on the level of proficiency, with breakdown fluency being a relatively strong predictor for beginners’ L2 fluency, and speed fluency for more advanced learners’ fluency.
In terms of the third research question (the role of LOR and AOA in low-, mid-, and high-level fluency), the ANOVAs showed that the three proficiency groups significantly differed according to the participants’ LOR profiles, but not according to their AOA profiles. The results of the CI analyses (summarized in Tables 7 and 8) suggest that L2 learners may need different amounts of experience to achieve mid-level fluency proficiency (LOR > 3.7 years) and high-level fluency proficiency (LOR > 8.8 years) regardless of their age of arrival in an L2 speaking environment (18–40 years). The results here concur with previous findings on the presence of strong experience effects (Trofimovich & Baker, Reference Trofimovich and Baker2006), but a lack of any significant age effects (Lahmann et al., Reference Lahmann, Steinkrauss and Schmid2017) on L2 fluency development. Of note, this pattern for the temporal aspect of L2 speech learning differs from the widely accepted view with regard to L2 segmental acquisition, where both experience and age effects are equally strong (Flege, Reference Flege2016).
Although the cross-sectional nature of the data set in the current study does not directly relate to development per se, several scholars have provided theoretical (de Jong et al., Reference de Jong, Steinel, Florijn, Schoonen and Hulstijn2012) and empirical (Derwing et al., Reference Derwing, Munro, Thomson and Rossiter2009) evidence that L2 learning takes place on a continuum of perceived fluency (low → mid → high) as a function of increased experience. Given that the current study featured a large number of L2 learners with diverse proficiency (low to high) and experience (0–20 years of LOR) profiles, it can be argued that examining the acoustic characteristics of their speech can provide several tentative explanations for how L2 learners develop different aspects of utterance fluency (final-clause pauses, mid-clause pauses, and articulation rate) to reach low, mid, high, and native fluency-proficiency levels over an extensive period of time (>10 years) with a varied degree of learner awareness (explicitly and implicitly). In particular, their developmental patterns could be discussed in relation to Kormos's (Reference Kormos2006) proposal of the different stages of cognitive operations during L2 speech production (conceptualization, formulation, articulation, and monitoring), and the amount of L2 experience required to reach each fluency level (operationalized as LOR).
In the initial stage of L2 fluency development (low → mid-level fluency), we would like to argue that much learning can be observed, particularly in the decreasing number of final-clause pauses; this claim stems from the finding that many L2 learners in the current study with adequate amounts of experience (LOR = 3.7–7.1 years) demonstrated nativelike pause frequency. As Kormos (Reference Kormos2006) suggested, the frequency of final-clause pauses is hypothesized to capture the efficient and timely conceptualization during L2 speech production (see Götz, Reference Götz2013; Lambert et al., Reference Lambert, Kormos and Minn2017). Thus, the findings here indicate that inexperienced L2 learners (e.g., LOR < 0.8 years) may conceptualize what to say more slowly. Given that spontaneous production entails various levels of processing operations in parallel (Skehan, Reference Skehan2014), this delay in conceptualization could be due to the interaction of problems at both conceptualization and formulation. That is, inexperienced L2 learners’ relatively weak representational and processing systems in the target language require excessive amounts of cognitive resources for linguistic encoding and formulation, leaving considerably less cognitive capacity that they could use for conceptualization.
As their L2 experience and proficiency increase (e.g., approximately 5 years of immersion), these learners may continue to enhance and then attain more prompt and robust retrieval of the preverbal message even during spontaneous L2 speech production (like speaking an L1). Although the mildly experienced learners’ conceptualization processes may reach nativelike efficiency in terms of the final-clause pause ratio, the other aspects of their fluency performance (mid-clause pause ratio and articulation rate) could still be substantially different from those of advanced-level L2 learners (e.g., LOR = 8.8–12.4 years) and native speakers.
In the later stages of L2 fluency development (mid → high-level fluency), the frequency of L2 learners’ mid-clause pauses appears to reach nativelike levels, suggesting that their linguistic encoding processes are optimized in keeping with their gradually developing phonetic, lexical, and grammatical knowledge over approximately 10 years (8.8–12.4) of LOR. To reach native-level perceived L2 fluency, however, even such experienced L2 learners still need to enhance their articulation rate by automatizing both the conceptualization and the formulation processes at a faster speed (Trofimovich & Baker, Reference Trofimovich and Baker2006). Since we found neither LOR nor AOA to be a predictor of perceived nativelike fluency performance, it remains open to further investigation which factors (“beyond” LOR and AOA), such as cognitive abilities (e.g., Granena & Long, Reference Granena and Long2013, for aptitude; O'Brien, Segalowitz, Collentine, & Freed, Reference O'Brien, Segalowitz, Collentine and Freed2006, for working memory), motivation (Saito, Dewaele, & Hanzawa, Reference Saito, Dewaele and Hanzawa2017, for integrativeness, instrumentality vs. metacognition) and personality (e.g., Dewaele & Furnham, Reference Dewaele and Furnham2000, for extroversion vs. introversion), could facilitate the attainment of nativelike fluency.
At the same time, it is crucial to point out that our findings confirmed a generally accepted view that L2 learners’ linguistic systems and behaviors are essentially different from those of native speakers (e.g., Abrahamsson & Hyltenstam, Reference Abrahamsson and Hyltenstam2008). As shown in our data set, even many highly experienced L2 learners still significantly differed from the native baseline group, especially in terms of speed fluency (while they demonstrated nativelike breakdown and repair fluency). It is theoretically intriguing to further pursue the underlying mechanism for the attainment of native-level fluency proficiency (i.e., what kinds of L2 learners can ultimately achieve all dimensions of utterance fluency at nativelike levels?). From educational perspectives, however, the current study rather suggests that L2 learners should selectively work on certain temporal features directly related to their different learning goals, such as the attainment of mid-level fluency (final-/mid-clause pauses and articulation rate) and high-level fluency (mid-clause pauses and articulation rate). All the relevant features here have been identified as crucial for successful L2 comprehensibility (Isaacs & Trofimovich, Reference Isaacs and Trofimovich2012). Our arguments here concur with a growing number of scholars who have claimed that L2 learning should aim at increasing comprehensibility and intelligibility rather than nativelikeness as a realistic, prioritized, and attainable goal (e.g., Derwing & Munro, Reference Derwing and Munro2013; Isaacs & Trofimovich, Reference Isaacs and Trofimovich2012; Jenkins, Reference Jenkins2014; Saito, Reference Saito2015).
The repair factor (i.e., frequency of repetitions and self-corrections) did not significantly relate to the native listeners’ fluency ratings, nor did it distinguish between any proficiency groups in the current study (low vs. mid vs. high vs. native). In conjunction with the finding that both L2 learners and native baselines produced a similar number of repairs during their picture descriptions, our findings echo previous studies evidencing the weak role of the repair phenomenon in L2 fluency (e.g., Kormos & Dénes, Reference Kormos and Dénes2004; Prefontaine, Kormos, & Johnson, Reference Prefontaine, Kormos and Johnson2016). At the same time, however, the lack of any significant associations related to repair fluency also casts doubt on the construct validity of the repair measures used in the current (and other existing) studies. Although we used separate categories to capture two types of repair (i.e., repetitions and self-corrections), different types of repair could be further analyzed at a fine-grained level, such as appropriateness repair (specifying ambiguous and/or incoherent content message more precisely) and error repair (modifying erroneously activated lexical, syntactical, morphological, and phonological forms at the stage of formulation; Kormos, Reference Kormos1999). Future studies are warranted to scrutinize precisely which types of repair could be uniquely tied to L1 and L2 fluency (for similar arguments, see Bosker et al., Reference Bosker, Pinget, Quene, Sanders and de Jong2013).
Another issue that needs to be discussed is the lack of age effects on the final quality of L2 fluency attainment. One possible interpretation is that age effects may be relatively weak for those dimensions of L2 speech where much learning likely takes place as long as L2 learners regularly use and practice the target language for an extensive period of time. One such dimension with much learning potential is the approximation of nativelike fluency. As shown in the current study, Japanese learners with extended amounts of L2 experience (LOR > 8.8 years) seemed to attain similar results to the native speaker baseline in many dimensions of utterance fluency (e.g., the frequency of final-/mid-clause pauses, repetitions, and self-corrections); a significant difference between the high- and native-level fluency groups was observed only in articulation rate (for similar findings, see also Trofimovich & Baker, Reference Trofimovich and Baker2006). Previous literature has indicated that strong age effects can be clearly observed for acquisitionally difficult dimensions of L2 speech, such as prosodic and segmental accuracy (Flege et al., Reference Flege, Munro and MacKay1995; Saito, Reference Saito2013, in press) and lexicogrammatical complexity (Lahmann, Steinkrauss, & Schmid, Reference Lahmann, Steinkrauss and Schmid2016). To summarize, the results here suggest that there are learning patterns generalizable to various dimensions of L2 speech learning (i.e., strong and extensive experience effects; Flege, Reference Flege2016, for segmentals; Trofimovich & Baker, Reference Trofimovich and Baker2006, for suprasegmentals), and patterns specific to L2 fluency attainment (i.e., weak age effects and high-level achievement; Lahmann et al., Reference Lahmann, Steinkrauss and Schmid2017).
To close, two primary limitations need to be acknowledged with an eye toward further replication and elaboration of this topic. First, all the fluency analyses were based on 30 s of spontaneous speech elicited by a single task: a timed picture description where speakers were not required to conceptualize a great deal of content message (which is argued to influence the frequency of final-clause pauses; Kormos, Reference Kormos2006). Following previous L2 speech literature, the findings need to be replicated with various task modalities/demands, such as pretask and online planning time (Yuan & Ellis, Reference Yuan and Ellis2003), task repetition (Ahmadian & Tavakoli, Reference Ahmadian and Tavakoli2011), and single versus dual task conditions (Révész, Michel, & Gilabert, Reference Révész, Michel and Gilabert2016). Another crucial limitation of the study concerns the lack of instruments evaluating the influence of L1 fluency on L2 fluency. Whereas the L1–L2 fluency link is particularly strong among inexperienced learners (Derwing et al., Reference Derwing, Munro, Thomson and Rossiter2009), speakers’ L1 speech rate seems to continue to be a strong predictor of L2 speed fluency (de Jong, Groenhout, Schoonen, & Hulstijn, Reference de Jong, Groenhout, Schoonen and Hulstijn2015). To disentangle L1 speaking styles from any discussion related to L2 fluency proficiency, it is necessary for future studies to adopt both L1 and L2 fluency measures.
ACKNOWLEDGMENTS
We are grateful to the journal associate editor, Annie Tremblay, and two anonymous Applied Psycholinguistics reviewers for their constructive feedback on an earlier version of the manuscript. We also acknowledge Hui Sun, Keiko Hanzawa, Takumi Uchihara, and George Smith for their help with data collection and analyses. The project was funded by the Grant-in-Aid for Scientific Research in Japan (No. 26770202) and the Birkbeck College Additional Research Support.