Introduction
Second language (L2) acquisition researchers have become increasingly interested in the complex relationship between learners’ motivations and emotions, and their combined influence on L2 speech learning outcomes in classroom settings. A growing body of cross–sectional evidence suggests that individuals with greater motivation and more positive emotions are more willing to communicate in the L2 and more inclined to engage in extended L2 practice, often resulting in advanced proficiency (Botes, Dewaele, & Greiff, 2022a; Dörnyei, 2020). Recent developments in this area of study have seen a growing emphasis on longitudinal investigations (Li & Wei, 2023). These studies have shed light on the dynamic interplay between motivation, enjoyment, anxiety, and L2 outcomes over a specific time frame, typically spanning one academic term (3-4 months). We argue that to disentangle the causal relationships between motivation, enjoyment, anxiety, and acquisition, the topic needs to be reexamined in relation to a wider range of potentially important factors over a relatively long period (> 1 semester). This paper presents a longitudinal study focusing on 121 Japanese high school learners of English–as–a–Foreign–Language (EFL). It explores how these EFL learners engaged in L2 English practice in the classroom setting, and how their motivation, emotions, and L2 speech proficiency evolved over a period of 1.5 years. Here, we aim to empirically test four hypotheses concerning the directionality of this relationship:
1. Changes in motivation and emotions serve as a driving force for L2 speech learning (motivation and emotions → L2 speech learning).
2. Enhanced L2 speech learning fosters increased levels of motivation and enjoyment and lower levels of anxiety (L2 speech learning → motivation and emotions).
3. Motivation, emotions, and L2 speech learning are interwoven and correlated with each other without clear directionality (motivation/emotions ↔ L2 speech learning).
4. The nature of the relationship evolves, shifting from correlational in the short term to causal in the long term.
Background
L2 speech learning in EFL classrooms
To date, extensive research has been dedicated to understanding the factors influencing the success of L2 speech learning in naturalistic settings, demonstrating that adult L2 learners can continue to acquire new sounds as long as they are immersed in sufficient daily input across various contexts (Flege & Bohn, 2021). A growing amount of attention has been directed to exploring the applicability of these findings to foreign language learning contexts, where learners typically receive a few hours of language instruction weekly, termed the “minimal input” condition (Larson-Hall, 2008, p. 36). In these studies, L2 speech proficiency outcomes have been assessed using diverse metrics, such as national standardized test scores (e.g., BISTA assessment; Baumert et al., 2020), native speakers’ judgments of overall impression (e.g., overall foreign accentedness and comprehensibility; Nagle, 2018), and specific pronunciation tests (e.g., English [r] and [l] for Japanese learners of English; Saito, 2019). Factors traditionally linked to successful outcomes include the age of learning (Larson-Hall, 2008), duration of learning (Jaekel et al., 2017), and participation in extracurricular activities (e.g., Muñoz, 2014 for study abroad; Saito & Hanzawa, 2016 for conversation activities).
While these experience–related factors account for small–to–medium effects (e.g., r = .30-.40; Saito & Hanzawa, 2016), they leave much of the variance in outcomes unexplained. This observation has led some researchers to propose that beyond external experiences, intrinsic individual differences significantly impact L2 speech learning success (Trofimovich et al., 2015). Specifically, when learners with similar exposure and practice opportunities are compared, those with certain motivational and emotional profiles may leverage each learning opportunity more effectively, resulting in greater gains (Moyer, 2014; Nagle, 2018; Saito et al., 2017; Zhou & Papi, 2023). These researchers seem to share an assumed logical sequence: greater motivation and more positive emotions lead to increased L2 practice and experience, which in turn leads to acquisition. This investigation aims to scrutinize the interplay between motivation and emotions in L2 speech learning.
Cross–sectional investigations of motivation, emotions, and L2 learning
Over the last two decades, various conceptual frameworks have been developed to elucidate the complex nature of L2 learners’ motivation and emotion profiles. Dörnyei’s L2 Motivational Self System has become the default framework in Second Language Acquisition research (Dörnyei, 2005, 2009, 2020). It posits that L2 learners’ motivational dispositions can be conceptualized as the Ideal L2 Self (attributes the learner ideally aspires to attain) and the Ought–to L2 Self (attributes the learner believes are necessary to meet external expectations, duties, and obligations). In more recent developments, this self-system has been expanded upon with the introduction of the 2 × 2 model of future self-guides, delineating the Ideal Self/Own (with promotion focus) versus Others and Ought–to L2 Self/Own (with prevention focus; Papi et al., 2019). L2 learners with Ideal L2 Self/Own are likely to practice the target language and maximize each practice opportunity (eager L2 use) while those with Ought–to L2 Self/Own tend to resort to the minimal use of the target language to avoid making mistakes (vigilant L2 use; Papi & Khajavy, 2021).
Cross–sectional evidence has consistently demonstrated a stronger correlation between the motivation profiles related to Ideal L2 Self and higher levels of L2 proficiency (measured via general proficiency test scores) and achievement (operationalized via in–house exam results), in contrast to the motivation profiles associated with the Ought–to L2 Self (Dörnyei & Chan, 2013; Papi & Khajavy, 2021; Lamb, 2012; Moskovsky et al., 2016; Papi & Teimouri, 2014). This distinction between the roles of the Ideal and Ought–to L2 Self in L2 learning outcomes is further supported by meta–analytic evidence (Al-Hoorie, 2018; r = .20 vs. -.05).
The last decade of L2 acquisition research has witnessed a significant transition in scholarly focus. The traditional exclusive focus on negative emotions (i.e., Foreign Language Classroom Anxiety [FLCA]; Horwitz et al., 1986) has been replaced by a more holistic perspective that includes both negative and positive emotions (i.e., Foreign Language Enjoyment [FLE]; Dewaele & MacIntyre, 2014). Researchers have become aware that learner emotions do not emerge in isolation but are part of a highly dynamic system including interacting linguistic, social, and psychological factors (Dewaele et al., 2018).
A meta-analysis demonstrated that FLE and FLCA are independent dimensions that are moderately negatively correlated (r = -.31) (Botes et al., 2022a, p. 214). Although FLCA and FLE are linked to several similar learner–internal sociobiographical variables (age, education, proficiency level, multilingualism), they are also associated with distinct learner–external variables. FLE tends to be linked with classroom factors, such as relationships with teachers and peers (Dewaele et al., 2018) and with teaching methods (Dewaele et al., 2024). In contrast, FLCA is more strongly related to a learner’s personality traits such as neuroticism (Dewaele & MacIntyre, 2019). Meta–analytic reviews have identified a small–to–medium negative correlation between FLCA and general academic achievement in an L2 (r = -.36 in Teimouri et al., 2019; r = -.39 in Botes et al., 2020) and a moderate positive correlation between FLE and both self–reported achievement and academic achievement (r = .27 and r = .30, respectively; Botes et al., 2022a). A stronger correlation emerged between FLE and Willingness to Communicate (r = .48; Botes et al., 2022a).
The relationship between motivation and emotions has been the focus of intense debate in the fields of psychology and applied linguistics. They are different in nature but are connected to some extent (MacIntyre et al., 2019; Dörnyei, 2020). In contrast with emotions, motivation is specifically goal-oriented and reflects the effort and investment learners put into the FL learning process and in the development of a new identity (Ushioda & Dörnyei, 2009). Dörnyei (2020) argues that some emotions may have “a certain amount of goal-directed quality” (p. 121) but that they are not directly linked to goal–specific action like motivation. At most they may have vague “action tendencies” (p. 121). He disagrees with MacIntyre et al. (2019) that positive emotions are intrinsically motivating and negative emotions demotivating. He sees emotions as occupying the backseat in the learning process: “they can sustain and amplify existing motivation […] and they can also instigate the generation of new, goal–directed behavioral scripts (i.e., motives proper)” (Dörnyei, 2020, pp. 121–122). Dörnyei and Henry (2022) view emotions as the fuel that sustains motivation. This view is not unlike the one proposed in Dewaele et al. (2023) where FLE emerged as a buoy that could help sustain sagging motivation.
The current study adopts a perspective where motivation and emotions are intertwined, analyzing sociopsychological individual differences through both motivational constructs (Ideal and Ought–to L2 Self) and emotional aspects (FLCA and FLE).
Longitudinal investigations of motivation, emotions, and L2 learning
Researchers have been increasingly interested in the dynamic nature of motivation and emotions and how they connect to L2 learning. Longitudinal research designs have been used with multiple data collection points over various periods, ranging from a couple of days (Guedat-Bittighoffer & Dewaele, 2023) to one academic semester (Elahi Shirvan & Taherian, 2021). Several longitudinal studies have investigated how participants’ motivation and emotion profiles at the outset of the project shaped their L2 learning gains. For instance, Saito et al. (2017) explored how 40 Japanese EFL students with diverse motivation profiles, including integrativeness, instrumentality, and international posture, developed their L2 speech proficiency (comprehensibility) over the course of one academic semester (3 months). The results indicated that those with vague, long–term goals for learning L2 English improved their comprehensibility (although not nativelikeness) in L2 speech proficiency.
Similarly, by tracking the speech development of 83 Chinese EFL students over one academic semester, Zhou and Papi (2023) discovered that the strength of participants’ self-images as their ideal selves could predict the development of L2 comprehensibility (see also Nagle, 2018, involving English speakers learning Spanish). Interestingly, these findings observed in classroom settings were not replicated in naturalistic environments. For example, Sun et al. (2024) did not find the Ideal Self to be a significant predictor for L2 speech development among 50 Chinese learners of English during their four–month study abroad period.
Regarding the relationship between emotions and L2 learning gains, there are short–term intervention studies (1-2 hours) suggesting that individuals experiencing greater negative emotions (anxiety) tend to show smaller learning gains when receiving instruction (e.g., Sheen, 2008; Miller & Godfroid, 2020). As for longitudinal investigations, Li and Wei (2023) investigated how positive and negative emotions related to L2 achievement, measured through in–house English tests, among a total of 954 junior secondary English learners at four intervals (T1-T4) over the course of one semester (3 months). Structural equation modeling revealed that these emotions independently showed associations with English achievement at T2 (one week later) and T3 (five weeks later). However, only FLE was significantly associated with achievement at T4 (nine weeks later), while the significant associations between FLCA and acquisition disappeared at the end of the project.
Motivation for the current study
Existing literature has provided both cross–sectional and longitudinal evidence supporting a significant association between motivation, FLE, FLCA, and the rate of L2 speech learning (and general L2 learning proficiency and outcomes). However, only a few studies have explored the directionality of the link between motivation, FLE, and FLCA on the one hand and L2 learning on the other hand. Such an investigation could be directly relevant to the theoretical core of major motivation and emotions frameworks (e.g., L2 Motivational Self System, FLCA, and FLE). These frameworks address the relationship between different types of motivation and emotions and their differential impacts on L2 learning experience and development. Considering the nascent literature on the topic, it seems that three different positions have emerged:
1. Motivation and emotions as a driving or inhibiting force for L2 learning (see Figure 1 Model 1): The first position posits that motivation and emotions are key driving forces for L2 learning. It suggests that learners who engage in practice opportunities (i.e., experience) with stronger motivation, stronger positive emotions, and weaker negative emotions can maximize each input opportunity, leading to increased learning gains (e.g., Moskovsky et al., 2016; Papi, 2018; Papi & Teimouri, 2014).
2. Enhanced motivation and positive emotions through L2 learning (see Figure 1 Model 2): The second perspective argues that greater L2 learning experience and development can lead to enhanced motivation and enjoyment and reduce anxiety (Botes et al., 2020, 2022a). For example, incorporating real–life task activities can boost motivation (Heydarnejad et al., 2022), and more meaning–focused activities can increase enjoyment (Dewaele et al., 2024).
3. Interdependence of motivation, emotions, and L2 learning (see Figure 1 Model 3): The third perspective suggests that the relationship between motivation and emotions and L2 learning is intertwined, with multidirectional causality and nonlinear trajectories. Within this position, some emphasize strong interdependence (i.e., correlations) between motivation, emotions, and L2 learning without specifying directionality (e.g., Dörnyei, 2005), while others underline the time–specific nature of the relationship. Motivation and emotions may initially drive the learning, but once a certain level of proficiency is attained, the newly developed abilities can strengthen motivation and positive emotions in a positive feedback loop while reducing negative emotions (e.g., Li et al., 2022).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_fig1.png?pub-status=live)
Figure 1. Visual Summary of Model Comparisons. FLE_Social for Foreign Language Enjoyment Social; FLE_Personal for Foreign Language Enjoyment Personal; FLCA for Foreign Language Classroom Anxiety; Experience for percentage of L2 use inside classrooms.
In addition to examining which of these three models best explains the data, it is crucial to adopt a long–term longitudinal design to investigate the directionality of the relationship between motivation, emotions, and acquisition. This is because L2 learning is often slow, gradual, and dynamic, especially in countries where learners have limited access to L2 input outside the classroom. The investigation of the directionality between motivation, emotions, and acquisition is both theoretically and pedagogically relevant. Educators need to know whether enhancing and maintaining students’ motivation and positive emotions while reducing their negative emotions should be a priority, as it can anchor and accelerate L2 learning in the long run (motivation/emotions → learning). Alternatively, helping students perceive their progress first may be the best remedy for boosting motivation and emotions (learning → motivation/emotions). It is also possible that targeting motivation, emotions, and L2 development as equal pedagogical focuses could lead to an optimal L2 learning loop (motivation, emotions ↔ learning).
Saito et al. (2017) and Zhou and Papi (2023) have tracked L2 learning behaviors at multiple points while measuring motivation and emotions only once. Li and Wei (2023) took a different approach, examining the links between emotion and learning outcomes at four testing points, revealing limited changes in emotion and L2 proficiency over time. Similarly, other studies conducted longitudinal investigations of changes in motivation and emotions, highlighting the relatively stable nature of these variables (e.g., Dewaele et al., 2023; Elahi Shirvan & Taherian, 2021). The lack of significant change in motivation, emotions, and L2 outcomes may be attributed to the relatively short–term timeframe of these studies (one academic semester: 3-4 months). The current study is designed to address these methodological concerns, looking at the longitudinal relationship between changes in motivation, emotions, and L2 speech proficiency for an extensive period of classroom experience (i.e., 1.5 years).
Current study
In this research project, we explored the interplay between motivation, emotions, and L2 speech proficiency in a cohort of 121 Japanese high school students over a period of 1.5 years. The study was conducted at a well–regarded public high school located in northern Japan. Data collection was longitudinal, occurring at four intervals. We initially assessed the students’ L2 speech proficiency shortly after they commenced high school, providing sufficient time for their adaptation to the new learning environment (T1: Summer 2016). Further assessments of L2 speech proficiency were conducted at intervals of 6 months (T2: Winter 2016), 10 months (T3: Spring 2017), and 18 months (T4: Winter 2018). In addition, surveys to evaluate the students’ profiles in terms of motivation, emotions, and language learning experiences were conducted at T2, T3, and T4.
In our initial report based on data from T2 and T3 (Saito et al., 2018), we examined the predictive capacity of motivation and emotions for short–term L2 speech development over a 4–month period. The current final report extends the analysis to the entire duration of 1.5 years, aiming to clarify the directionality of the relationship between changes in motivation, emotions, and L2 speech development over a long period. Here, we were particularly interested in three time windows, which were assumed to represent three distinct phases of L2 speech learning within the classroom context:
• T1-T2 (first 6 months): This short–term period represents how quickly participants can improve their L2 speech while adjusting to the new learning environment.
• T1-T3 (10 months): This mid–term period reflects how much proficiency they can achieve after spending sufficient time adapting to the learning environment.
• T1-T4 (1.5 years): This long–term period captures the extent to which they continue improving their L2 speech proficiency over the long run.
To explore how changes in motivation, emotions, and L2 speech proficiency evolved at different time points (T1, T2, T3, T4), we employed a two–step approach. The first step involved assessing the magnitude of change in motivation, emotions, and L2 speech proficiency over the research period. This was examined using a mean–based analysis (one–way repeated–measures ANOVAs). Subsequently, we examined the interdependencies between shifts in motivation, emotions, and L2 speech proficiency, utilizing relative gain scores. To this end, a variance–based approach, structural equation modeling (SEM), was employed to determine which theoretical model in Figure 1 best fits the observed data at each time point.
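The first, mean–based step can be illustrated with a minimal sketch of a one–way repeated–measures ANOVA. The function name `rm_anova` and the toy scores are hypothetical and invented purely for illustration (they are not data from this study), and the sketch omits the sphericity checks and corrections that a full analysis would normally include.

```python
def rm_anova(scores):
    """One-way repeated-measures ANOVA on a participants x time-points table.

    scores[i][j] is the score of participant i at time point j.
    Returns (F, df_time, df_error). No sphericity correction is applied.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of time points
    grand_mean = sum(sum(row) for row in scores) / (n * k)

    time_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    # Partition the total sum of squares into time, subject, and error parts.
    ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_time - ss_subjects

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_time / df_time) / (ss_error / df_error)
    return f_value, df_time, df_error


# Invented toy data: 3 hypothetical participants x 4 time points (T1-T4).
toy_scores = [
    [3, 4, 4, 5],
    [4, 4, 5, 6],
    [2, 3, 4, 4],
]
f_value, df_time, df_error = rm_anova(toy_scores)
print(f"F({df_time}, {df_error}) = {f_value:.2f}")
```

Removing between–participant variance from the error term is what distinguishes this design from a between–groups ANOVA: each learner serves as their own baseline across testing points.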
The central research question probes the dynamic interaction between young EFL students’ motivation, emotions, and their L2 speech proficiency, particularly in the context of their language practice in their classroom. We posited three potential patterns of interaction:
1. Enhanced motivation and positive emotions catalyzing increased L2 speech development (motivation/emotions → acquisition; Moskovsky et al., 2016; Papi, 2018)
2. Advancements in L2 speech proficiency bolstering motivation/emotions (acquisition → motivation/emotions; Botes et al., 2020, 2022a; Heydarnejad et al., 2022)
3. A bidirectional relationship between changes in motivation/emotions and L2 speech proficiency (motivation/emotions ↔ acquisition; Dörnyei, 2005; Li et al., 2022)
Given that our longitudinal data highlight three distinct phases of L2 speech learning trajectories (6 months, 10 months, and 1.5 years), we also introduced a fourth possibility:
4. The correlational versus causal nature of the relationship between motivation, emotions, and L2 speech learning may vary depending on the specific time point in the learning process within classroom settings, with motivation and emotions serving as crucial driving forces, especially for long–term L2 speech development (Moyer, 2014).
Method
Participants
A total of 121 first–year high school students (ages 15-16) were recruited. They included 50 male and 71 female participants. Their backgrounds in English as a Foreign Language (EFL) were diverse, as indicated by their varied age of EFL onset (M = 10.0 years, SD = 3.2, Range = 3-13 years), total duration of EFL study (M = 874.1 hours, SD = 564.5, Range = 150-4,500 hours), and length of extracurricular EFL instruction, such as study school attendance (M = 492.0 hours, SD = 564.5, Range = 0-3,150 hours). Given that all students were admitted to the high school based on a uniform entrance examination, their English proficiency levels were relatively homogeneous. Their general proficiency, as measured by the EIKEN Test in Practical English Proficiency, ranged between Grade Pre-2 and Grade 2. This level of proficiency corresponds to A2 (basic user) and B1 (independent user) on the Common European Framework of Reference for Languages scale (EIKEN Foundation of Japan, 2017).
During the project duration, from Summer 2016 to Winter 2018, the students were taught according to a standard EFL syllabus. They were required to attend seven EFL classes each week, with each session lasting 50 minutes. The classes were delivered by three Japanese teachers, each possessing very advanced proficiency in English. Comprehensive classroom observations were undertaken, and the details of these observations are presented in the Supporting Information-S1. As previously established in EFL studies (e.g., Larson-Hall, 2008; Muñoz, 2014), participants’ L2 learning experience profiles were collected using self–report questionnaires at three different time points (T2, T3, and T4).
To obtain informed consent from both participating students and their parents, a set of steps was carefully taken. First, the teachers invited the students’ parents to attend a meeting where the investigators of the project explained the academic purpose of the project (i.e., tracking high–school students’ L2 speaking proficiency development) and the precautionary measures taken for personal information protection. After obtaining the parents’ consent, the teachers explained the same details to the students before they took the speaking tests and the experience and motivation/emotion questionnaires. Once the students agreed to participate, they signed the consent form at the end of the questionnaires.
Motivation and emotions questionnaires
To capture participants’ motivation and emotions, a composite questionnaire was administered. The questionnaire comprised 8 items for motivation and 18 items for emotion. It was specifically designed to measure the distinct aspects of motivation as conceptualized in Dörnyei’s L2 Motivational Self System (Dörnyei, 2005), which includes Ideal Self and Ought–to Self, and FLCA as well as FLE outlined in Dewaele and MacIntyre (2014). Although a more recent model of motivation, the 2 × 2 model of future self-guides, was proposed by Papi et al. (2019), this could not be incorporated into the questionnaire due to the timing of data collection (2016 to 2018). Similarly, the questionnaire predates the recent conceptualization of FLE, which distinguishes one superordinate and three lower–order dimensions: FLE Teacher, FLE Social, and FLE Personal (Botes et al., 2020).
In the current study, FLE was operationalized for Social (n = 5 items) and Personal (n = 5 items) based on Dewaele and MacIntyre’s (2014) Principal Components Analysis of the 21 FLE items. FLE_Social refers to good social relationships with peers and teachers while FLE_Personal refers to “an internal sense of enjoyment in the face of challenges” (p. 231). The decision to focus on the two lower–order dimensions was based on the need for a higher level of granularity and previous findings that FLE_Personal was a better predictor of gain than FLE_Social (Dewaele & MacIntyre). All items were based on a 6-point scale.
FLCA was measured through eight items first used by Dewaele and MacIntyre (2014) and later validated in Botes, van der Westhuizen, et al. (2022b). The items refer to mild–to–strong physical symptoms of anxiety and to manifestations of social anxiety. Participants responded to each statement on a 6-point scale. The reliability of the composite questionnaire was checked via Cronbach’s alpha coefficient. Given that all the reliability coefficients exceeded .70 (see Table 2), they could be considered acceptable in relation to the field–specific guidelines (Larson-Hall, 2015). The composite questionnaire is deposited in L2 Speech Tools and is publicly available for replication (Mora-Plaza et al., 2022: https://sla-speech-tools.com/).
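For readers unfamiliar with the reliability check, the following minimal sketch shows how Cronbach’s alpha can be computed from item–level responses using the standard formula. The function name `cronbach_alpha` and the toy responses are hypothetical and invented for illustration only; they are not the questionnaire data reported in Table 2.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items table.

    responses[i][j] is respondent i's rating on item j.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])  # number of items in the subscale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Invented toy data: 5 hypothetical respondents rating 4 items on a 6-point scale.
toy_responses = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 5, 6, 6],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.2f}; above .70 threshold: {alpha > 0.70}")
```

In practice the coefficient would be computed separately for each subscale (e.g., Ideal L2 Self, FLCA) at each time point.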
L2 speech proficiency measures
To collect and assess a relatively large number of participants’ speech within a short period without any significant delay (within 10 days), all participants took an automated English–speaking test, the Telephone Standard Speaking Test (TSST), developed by ALC Press Inc. in Japan. The TSST was modeled after the Oral Proficiency Interview format established by the American Council on the Teaching of Foreign Languages. Scores range from one to nine, with nine being the highest score. Comprehensive validation studies have been conducted to confirm the reliability and validity of the TSST (for a comprehensive summary, see Koizumi, 2017, 2021). For instance, notable findings regarding the TSST’s validation include high test–retest reliability (r = .73), strong agreement with the face–to–face Standard Speaking Test (r = .89), and significant correlations with external proficiency tests (ranging from .6 to .8; e.g., EIKEN, TOEFL). This validation evidence underscores the TSST’s effectiveness and reliability in evaluating the speaking proficiency of English language learners in an automated and efficient manner.
Using their home phones or cell phones at their convenience, participants were instructed to take the exam in a quiet room to ensure their speech was clear for accurate ratings, as background noise can negatively impact raters’ assessments. The testing agency monitored participants’ progress and completion of the test. Their test attendance and scores were factored into their final assessment by their EFL teachers. In cases where technical issues arose (e.g., unstable cell phone signals), participants were permitted to retake the test, but only once. Such instances were exceedingly rare, with only one participant failing and consequently dropping out of the study. To facilitate the tracking and comparison of participants’ test scores over time, they were given a deadline to complete the test, typically within 10 days of receiving the phone number for the test.
Each test session lasted 15 minutes, during which participants responded to 10 recorded questions. These questions were randomly selected from a diverse databank, ensuring exposure to varied content at each testing point (T1 to T4). Because the test was designed to elicit spontaneous speech, participants had no prior knowledge of the questions and were given 45 seconds to formulate and provide a detailed answer for each. To eliminate any potential confusion, the task instructions were provided in both English and Japanese. The questions varied in response type, including three narrative, three descriptive, and four reasoning questions. Each speech sample submitted through the TSST was evaluated by three trained judges. This tripartite evaluation approach ensured high reliability and agreement among the judges (Koizumi, 2017). The structured and monitored testing procedure, combined with the standardized evaluation by trained judges, contributed to the validity and reliability of the speech proficiency assessments in the study (Footnote 1).
EFL experience questionnaire
Participants’ experiences of learning L2 English were measured using a questionnaire adapted from the EFL Experience Questionnaire (Saito et al., 2017). While a range of items were surveyed in the questionnaire, only the experience variable most relevant to the current study (participants’ recent experience of learning L2 English in classroom settings) was included in the analysis. To quantify the degree to which they practiced L2 English, participants were asked to report the percentage of time spent using L2 English during English classes (for similar methods, see Larson-Hall, 2008; Muñoz, 2014).
In the current investigation and analysis, the participants’ experience outside classrooms was not included. The participants’ experiences outside classrooms varied substantially in many different ways: modes (conversational activities with L1 vs. L2 English speakers; reading vs. listening; presence vs. absence of after–school activities) and quantity (some reporting zero hours, others reporting up to 100 hours per week). As a result, each dataset was neither normally distributed nor comparable. Given that the sample size (n = 121 students) was rather small for SEM analyses, we decided to focus on one key experience variable most relevant to the project (i.e., L2 use inside classrooms) and not to include multiple extracurricular experience variables to avoid weakening statistical power. For open science purposes and for those interested in follow–up analyses, the participants’ L2 use outside classrooms is available in the shared dataset. Supplementary materials are available via the Open Science Framework (https://osf.io/tacuy/?view_only=4d07da7d753a4a00915a15cf27db549b).
Results
Measurement invariances
To examine the degree to which constructs were properly measured over time, we investigated the longitudinal measurement invariance of motivation (i.e., Ideal L2 Self and Ought–to L2 Self) and emotions (i.e., FLE [Social, Personal] and FLCA). Specifically, each subconstruct was separately modeled as a latent variable at each time point, and the data–model fit was assessed. For example, to investigate the measurement invariance of Ideal L2 Self, we modeled it as a latent variable loaded on the four observed variables (for these variables or items, see L2 Speech Tools [https://sla-speech-tools.com/]) at each time point. The model was tested for (a) configural invariance (i.e., whether the one-factor model with no equal constraints on parameters across time fit the data well), (b) weak invariance (i.e., whether the model fit the data well, with factor loadings specified to be equal across time), and (c) strong invariance (i.e., whether the model fit the data well, additionally with intercepts specified to be equal across time). Thus, parameters were gradually constrained to be of equal size, with no constraints on configural invariance, constraints on factor loadings for weak invariance, and constraints on factor loadings and intercepts for strong invariance. Satisfying these measurement invariances suggests that changes between time points are related to the latent variable being modeled. Following the procedures in Kline (Reference Kline2023), Nagle (Reference Nagle2023), and Vandenberg and Lance (Reference Vandenberg and Lance2000), the measurement invariance was analyzed for each construct/sub-construct (Motivation [Ideal, Ought-to], Enjoyment [Social, Personal], and Anxiety) at each time point (T2, T3, and T4). In the current investigation, we considered constructs with levels of configural, weak, and strong invariance as adequate for longitudinal analyses. However, we exercised caution when constructs failed to reach at least the configural level of invariance, as this indicated that what comprised these constructs may have varied at each time point.
The descriptive statistics for each item in Table 1 show similar means and standard deviations over time, particularly for motivation, suggesting little change longitudinally. For example, for Ideal L2 Self Item 1, these values were 3.21 and 1.55 at Time 2, 3.34 and .53 at Time 3, and 3.19 and 1.57 at Time 4. The results for measurement invariance in Table 2 were mixed depending on the constructs. For motivation, both Ideal L2 Self and Ought–to L2 Self showed strong invariance. For emotions, FLE Social showed strong invariance, whereas FLE Personal showed only configural invariance. The results suggest that the components of Ideal L2 Self, Ought–to L2 Self, and FLE (i.e., how the questionnaire responses factored into these constructs) remained relatively stable and longitudinal changes in Ideal L2 Self, Ought–to L2 Self, and FLE (Social, Personal) were considered to reflect changes in these constructs. In contrast, FLCA showed none of these invariances. Any observations related to longitudinal changes in FLCA in the current dataset should be interpreted cautiously since the changes could be due to the measurement nonequivalence of questionnaire items, not actual changes in the constructs.
Table 1. Descriptive statistics for items measuring motivation and emotions
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_tab1.png?pub-status=live)
Note: The questionnaires to measure motivation and emotions were administered at Times 2 through 4. They were not administered at Time 1. Each item is on a scale of 1 through 6. Item numbers correspond to those in L2 Speech Tools https://sla-speech-tools.com/ (Composite Motivation & Emotions Questionnaire). For example, Ideal L2 Self Item 1 in the table above corresponds to “I imagine myself as someone who is able to speak English.” in the supplementary material. FLE for Foreign Language Enjoyment; FLCA for Foreign Language Classroom Anxiety. aGrand means by construct.
Table 2. Fit indices for the tests of measurement invariance of motivation and emotions
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_tab2.png?pub-status=live)
Note: df = degrees of freedom. CFI = comparative fit index. RMSEA = root mean square error of approximation. CI = confidence interval. SRMR = standardized root mean square residual; aThe configural invariance model specifies the same factor structure with no equal constraints on parameters across time; bThe weak (i.e., metric) invariance model specifies the same factor structure with equal factor loadings across time; cThe strong (i.e., scalar) invariance model specifies the same factor structure with equal factor loadings and intercepts across time. FLE for Foreign Language Enjoyment; FLCA for Foreign Language Classroom Anxiety.
Motivation, emotions, and L2 speech development
Table 3 summarizes statistics for motivation, emotions, experience, and L2 speech proficiency profiles over time. The initial aim of the statistical analyses was to evaluate the overall development of participants’ motivation (Ideal L2 Self and Ought–to L2 Self), emotions (FLE [Social, Personal] and FLCA), and L2 speech proficiency. For motivation, FLE, and FLCA, one–way repeated-measures ANOVAs were conducted, using Ideal L2 Self, Ought–to L2 Self, FLE [Social, Personal], and FLCA scores as dependent variables and Time as the independent variable (T2-T4). The results indicated no significant development in terms of Ideal L2 Self, F(2, 202) = .763, p = .468, η² = .022, Ought–to L2 Self, F(2, 202) = 2.803, p = .063, η² = .027, FLE Social, F(2, 202) = .463, p = .630, η² = .004, FLE Personal, F(2, 202) = 1.922, p = .149, η² = .018, and FLCA, F(2, 202) = .050, p = .951, η² = .001.
Table 3. Summary statistics for motivation, emotions, experience, and L2 speech proficiency profiles over time
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_tab3.png?pub-status=live)
Note: FLE for Foreign Language Enjoyment (6-point); FLCA for Foreign Language Classroom Anxiety (6-point); TSST for Telephone Standard Speaking Test (9-point). aThis t0_to_t1_relative_gain score is .00000252100840336522. bThis t0_to_t2_relative_gain score is −.00000181818181818153. cThis t0_to_t2_relative_gain score is −.00000142857142857808.
In contrast, participants’ L2 speech scores increased significantly over time, with a relatively large effect, F(2, 202) = 41.127, p < .001, η² = .272. Multiple comparison analyses revealed significant improvements in participants’ TSST scores from T1 to T4 (p < .016; Bonferroni corrected). The gains were statistically significant from T1 to T2 (t = -3.382, p < .001, d = .28), from T2 to T3 (t = -3.107, p = .002, d = .028), and from T3 to T4 (t = -5.00, p < .001, d = .42).
These findings suggest that while there was clear improvement in L2 speech proficiency throughout the project, learners’ motivation and emotions remained relatively stable, showing no significant shifts despite the participants’ enhanced L2 proficiency. This result did not appear to clearly support any of our hypotheses regarding the link between motivation, emotions, and acquisition.
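The Bonferroni-corrected threshold used for the pairwise comparisons above follows from dividing the family-wise α by the number of comparisons. The sketch below is illustrative only: it assumes a family-wise α of .05 and three adjacent comparisons (T1 to T2, T2 to T3, T3 to T4), which would give roughly .0167, close to the reported threshold of .016. The function name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Return the per-comparison significance threshold."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumption: family-wise alpha of .05 split over three adjacent comparisons.
threshold = bonferroni_alpha(0.05, 3)

# p-values as reported in the text; "p < .001" is entered as its upper bound.
reported_p = {"T1-T2": 0.001, "T2-T3": 0.002, "T3-T4": 0.001}
significant = {pair: p < threshold for pair, p in reported_p.items()}

print(round(threshold, 4))
print(significant)
```

Under these assumptions, all three reported comparisons fall below the corrected threshold.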
Model comparisons
The second aim of the statistical analysis was to examine which theoretical model in Figure 1 most aptly fit the observed data at each time point. The three models make different predictions: (1) motivation and enjoyment trigger L2 speech development, while anxiety delays it (motivation/emotions → acquisition); (2) L2 speech development boosts motivation and enjoyment and limits anxiety (acquisition → motivation/emotions); and (3) motivation, emotions, and L2 speech development are interwoven (motivation/emotions ↔ acquisition). We compared the two causality models (Models 1 and 2) and one correlation model (Model 3) via structural equation modeling (SEM). To fit the models, maximum likelihood estimation with robust standard errors and a scaled test statistic (MLR) was used with the lavaan package (Rosseel, Reference Rosseel2012) in R.
In light of our focus on the impact of motivation and emotions on L2 speech development, and given the significant improvement in participants’ L2 proficiency from T1 to T4, the analyses focused on participants’ improved L2 proficiency scores at T2 (six months), T3 (ten months), and T4 (1.5 years) relative to the onset of the project (T1, 0 months). To determine the extent of learning at each data collection point, residual T2, T3, and T4 scores were generated while controlling for T1 scores. T1 scores reflected their L2 English outcomes after years of EFL learning experiences in various settings before entering high school. Therefore, the residual T2, T3, and T4 scores indicated what they had learned after six months, ten months, and 1.5 years of study at high school, factoring out any influence of their diverse L2 English learning unrelated to the current project (i.e., how well they already spoke L2 English prior to the project).Footnote 2
In contrast, because participants’ motivation and emotion profiles showed no significant shifts in the initial ANOVA analyses, these scores were treated as relatively stable traits, and the raw motivation and emotion scores at T2, T3, and T4 were used.
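The residual gain scores described above can be illustrated with a minimal sketch. It assumes a simple linear regression of the later score on the T1 score, with the residual taken as the gain; the text does not specify the exact procedure, so this is one plausible reading rather than the authors' implementation. The function name `residual_gain` and the scores are hypothetical and are not study data.

```python
from statistics import mean


def residual_gain(baseline, later):
    """Residuals of `later` after a simple linear regression on `baseline`."""
    mx, my = mean(baseline), mean(later)
    sxx = sum((x - mx) ** 2 for x in baseline)
    if sxx == 0:
        raise ValueError("baseline scores must vary")
    slope = sum((x - mx) * (y - my) for x, y in zip(baseline, later)) / sxx
    intercept = my - slope * mx
    # Residual = observed later score minus the score predicted from baseline.
    return [y - (intercept + slope * x) for x, y in zip(baseline, later)]


# Hypothetical TSST-like scores for six learners (illustration only).
t1 = [3.0, 4.0, 4.0, 5.0, 6.0, 7.0]
t2 = [3.5, 4.0, 5.0, 5.5, 6.0, 8.0]
gains = residual_gain(t1, t2)
print([round(g, 3) for g in gains])
```

By construction, such residuals average zero and are uncorrelated with the baseline scores, so a positive value indicates more improvement than would be predicted from a learner's T1 score alone.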
Given that all theories on L2 speech development consider experience as the main catalyst for acquisition (e.g., Flege & Bohn, Reference Flege, Bohn and Wayland2021), the initial two causality models incorporate a hierarchical relationship between experience and enhanced L2 speech proficiency. As mentioned in the Current Study section, the L2 learning experience was operationalized as the extent to which participants practiced L2 English within classrooms, and such experience profiles were measured as participants’ self-reports of the percentage of time spent using L2 English in class.
The first causal model posits that motivation and emotions determine the degree of L2 practice within classroom settings, which subsequently influences L2 speech learning gains (Figure 1 Model 1). Meanwhile, the second causal model suggests that while increased practice opportunities bolster L2 speech proficiency, such advancements in L2 speech capabilities can, in turn, bolster motivation and enjoyment while reducing anxiety (Figure 1 Model 2). Conversely, the correlational model suggests a complex interplay among motivation and emotions, experience, and acquisition, without a predefined causal direction. This model integrates two motivation variables, three emotion variables, and one experience variable, each purportedly impacting participants’ gain scores (Figure 1 Model 3). The results of SEM are presented at T2 (6 months), T3 (10 months), and T4 (1.5 years), respectively.
To evaluate each model, we used fit indices and criteria recommended in the SEM literature: (a) a Comparative Fit Index (CFI) value of .90 or higher, (b) a Root Mean Square Error of Approximation (RMSEA) value of .08 or lower, and (c) a Standardized Root Mean Square Residual (SRMR) value of .08 or lower (Browne & Cudeck, Reference Browne, Cudeck, Bollen and Long1993). To compare models, we used Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. Models with smaller values were considered more appropriate. Along with these statistical criteria, the substantive interpretability of the models was also considered. Note that Model 3, being a perfect–fit model (i.e., a saturated model; e.g., Kline, Reference Kline2023), did not allow for the evaluation of its fit using indices such as CFI. However, it was possible to compare it with other models using indices such as AIC.
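For readers less familiar with information criteria, the sketch below shows the standard definitions, AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters, n the number of observations, and ln L the log-likelihood. The log-likelihoods and parameter counts are hypothetical values chosen for illustration, not results from this study, and n = 121 is assumed from the reported sample size.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical models: (log-likelihood, number of free parameters).
models = {"A": (-940.0, 12), "B": (-935.0, 21)}
n_obs = 121  # assumption: sample size reported in the text

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in models.items()
}
preferred_by_aic = min(scores, key=lambda name: scores[name][0])
preferred_by_bic = min(scores, key=lambda name: scores[name][1])
print(scores)
print(preferred_by_aic, preferred_by_bic)
```

In this illustration, the more complex model B has the higher log-likelihood, but both criteria penalize its additional parameters enough that the simpler model A is preferred.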
-
• T2-6 Months: According to the results of the chi–square tests (see Table 4), Model 1 did not fit the data well (CFI = .371, RMSEA = .141 [95% confidence interval = .072, .217], and SRMR = .081). Model 2 fit the data well (CFI = .997, RMSEA = .024 [.000, .133], and SRMR = .046). Model 3, being a saturated model, did not allow for evaluating its fit. Comparatively, Model 2 showed lower AIC and BIC values (1902.773 and 1980.588, respectively) than Model 3 (2248.045 and 2345.315, respectively). Thus, we considered Model 2 to be the best model to explain the data at T2. As visually summarized in Figure 2, the regression analyses revealed significant relationships between experience, enhanced L2 proficiency, and emotions, indicating that the extent of practice in classrooms positively correlated with greater L2 learning gains (β = .188, p = .026). This increase in L2 proficiency was associated with an increase in positive emotions (β = .377, p < .001 for FLE Social; β = .312, p < .001 for FLE Personal) and a decrease in negative emotion (β = -.200, p = .006 for FLCA).
-
• T3-10 Months: Ten months after the project began, neither causal Model 1 nor Model 2 adequately fit the data, as indicated by their fit indices overall (CFI = .577, RMSEA = .138 [.063, .219], and SRMR = .076 for Model 1; CFI = .890, RMSEA = .167 [.092, .249], and SRMR = .083 for Model 2). Consequently, only the correlational Model 3 received support at this time point in the dataset. As visually summarized in Figure 3, the correlational analysis within Model 3 revealed significant associations between FLE and in–classroom experience (r = .371, p < .001 for FLE Social; r = .285, p < .001 for FLE Personal), as well as between motivation and improved L2 proficiency (r = .332, p = .001 for Ideal L2 Self).
-
• T4-1.5 Years: After an extensive period since the outset of the project (1.5 years), Model 1 fit the data well (CFI = 1.000, RMSEA = .000 [.000, .061], and SRMR = .021), whereas Model 2 showed poor fit (CFI = .929, RMSEA = .156 [.078, .242], and SRMR = .093). Model 3, as a saturated model, was not suitable for fit evaluation. Comparatively, Model 1 showed lower AIC and BIC values (1652.377 and 1728.743, respectively) than Model 3 (1969.708 and 2065.166, respectively). The results of the regression analysis indicated the following relationship between motivation, emotions and acquisition: Those with stronger motivation (Ideal L2 Self) showed more input and engagement within their classroom experience (β = .301, p = .017); such increase in in–class L2 experience led to more L2 speech learning gains (β = .236, p = .018). For a visual summary, see Figure 4.
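The fit decisions reported for the three time points can be retraced with a short sketch that applies the stated cutoffs (CFI ≥ .90, RMSEA ≤ .08, SRMR ≤ .08) to the point estimates given in the text. Requiring all three cutoffs at once is an assumption made here for illustration; the authors describe judging the indices "overall". The names `Fit` and `meets_cutoffs` are hypothetical.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    cfi: float
    rmsea: float
    srmr: float


def meets_cutoffs(fit: Fit) -> bool:
    """Assumption: a model is acceptable only if all three cutoffs are met."""
    return fit.cfi >= 0.90 and fit.rmsea <= 0.08 and fit.srmr <= 0.08


# Point estimates as reported in the text for the two causal models.
reported = {
    ("T2", "Model 1"): Fit(0.371, 0.141, 0.081),
    ("T2", "Model 2"): Fit(0.997, 0.024, 0.046),
    ("T3", "Model 1"): Fit(0.577, 0.138, 0.076),
    ("T3", "Model 2"): Fit(0.890, 0.167, 0.083),
    ("T4", "Model 1"): Fit(1.000, 0.000, 0.021),
    ("T4", "Model 2"): Fit(0.929, 0.156, 0.093),
}

acceptable = {key: meets_cutoffs(fit) for key, fit in reported.items()}
for (time, model), ok in acceptable.items():
    print(time, model, "acceptable" if ok else "not acceptable")
```

Under this strict reading, only Model 2 at T2 and Model 1 at T4 pass, and neither causal model passes at T3, which matches the pattern described above.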
Table 4. Comparisons of Model 1 (motivation/emotions → acquisition), Model 2 (acquisition → motivation/emotions), and Model 3 (motivation/emotions ↔ acquisition)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_tab4.png?pub-status=live)
Notes: χ² = Chi–square test of model fit; df = degrees of freedom; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation; CI = Confidence Interval; SRMR = Standardized Root Mean square Residual. AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion. aModel 3 is a saturated model—a type of a model that fits the data perfectly, making it impossible to test the model-data fit (e.g., Kline, Reference Kline2023).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_fig2.png?pub-status=live)
Figure 2. Best-fitting model at Time 2 (6 months from Time 1): Model 2.
Note: Values in the figure are standardized estimates. FLE for Foreign Language Enjoyment; FLCA for Foreign Language Classroom Anxiety. Improved L2 proficiency refers to TSST gain scores from Time 1 to Time 2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_fig3.png?pub-status=live)
Figure 3. Best(-fitting) model at Time 3 (10 months from Time 1): Model 3.
Note: Values in the figure are standardized estimates. FLE for Foreign Language Enjoyment; FLCA for Foreign Language Classroom Anxiety. aTSST gain scores from Time 1 to Time 3.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250210094524016-0010:S0272263125000038:S0272263125000038_fig4.png?pub-status=live)
Figure 4. Best-fitting model at Time 4 (1.5 years from Time 1): Model 1.
Note: Values in the figure are standardized estimates. FLE for Foreign Language Enjoyment; FLCA for Foreign Language Classroom Anxiety. Improved L2 proficiency refers to TSST gain scores from Time 1 to Time 4.
Discussion
This study investigated the relationship between individual differences in motivation, emotions (FLE and FLCA), and L2 speech learning among 121 Japanese EFL high school students over a 1.5–year period. We tested four hypotheses: two causality models positing that enhanced motivation and emotions drive L2 speech learning (Model 1: motivation/emotions → acquisition) and that improved L2 speech proficiency boosts motivation and emotions (Model 2: acquisition → motivation/emotions), alongside a correlation model suggesting a bidirectional relationship between motivation, FLE, FLCA and acquisition without a predefined direction (Model 3: motivation/emotions ↔ acquisition). Finally, we explored a fourth hypothesis suggesting that the nature of the relationship between motivation, emotions, and L2 speech learning may evolve over different phases of EFL speech learning (6 months, 10 months, 1.5 years), with the predictive power of motivation and emotions becoming stronger over time (Models 2 and 3 → Model 1).
The findings revealed three main insights: First, the motivation and emotions measured through questionnaires remained stable throughout the project. Second, participants’ L2 speech proficiency consistently improved at each testing point (6 months, 10 months, and 1.5 years). Third, while the causal roles of motivation and emotions in L2 speech learning remained unclear within the first year of the study, the motivation/emotions → acquisition model (Model 1) emerged as the best–fitting model for explaining long–term outcomes (1.5 years).
Overall, our findings corroborate previous longitudinal evidence suggesting that L2 learners’ motivation and emotion profiles exhibit only mild fluctuations in classroom settings (e.g., Dewaele et al., Reference Dewaele, Saito and Halimi2023). This in turn suggests that motivation and emotions could be a static construct when profiles are operationalized via questionnaire instruments and averaged on a group level (Jiang & Dewaele, Reference Jiang and Dewaele2015). One reason for this could be the fact that the instruments are too blunt to measure what happens below the surface (i.e., group averages). In contrast, the participants’ L2 speech proficiency significantly improved in classroom environments through consistent participation in English education (Jaekel et al., Reference Jaekel, Schurig, Florian and Ritter2017) and through increased in–class practice within such settings (Muñoz, Reference Muñoz2014). Our findings align with the major theoretical paradigm demonstrating that L2 learners possess the capacity to acquire new sounds and languages regardless of age and contexts, given sufficient input opportunities (Flege & Bohn, Reference Flege, Bohn and Wayland2021).
The results of the ANOVA analyses presented here, which highlight the stability of motivation/emotions alongside the dynamism of L2 speech learning, do not straightforwardly align with any of the postulated hypotheses (motivation/emotions → acquisition, acquisition → motivation/emotions, and motivation/emotions ↔ acquisition). The longitudinal analysis showed that motivation, emotions, and acquisition are not directly comparable due to the contrasting nature of these elements (i.e., stable motivation/emotions vs. dynamic L2 speech learning). Instead, the findings suggest a possible pathway: motivation and emotions might initially amplify opportunities for practicing the target language in classroom settings; this positive link between motivation, emotions, and practice could foster greater autonomy and optimism, prompting more L2 use in classrooms, which could in turn enhance L2 speech proficiency (stable motivation and emotions → increased practice opportunities → acquisition; Dewaele & Meftah, Reference Dewaele and Meftah2024; Dörnyei, Reference Dörnyei2020).
However, the assertion about the stability of motivation and emotions needs to be interpreted with caution and thus further examined with more balanced research designs. Scholars have repeatedly argued that individuals’ motivation and emotions fluctuate at every point of L2 use (Waninge, Dörnyei, & De Bot, Reference Waninge, Dörnyei and De Bot2014). The practice of measuring and averaging learners’ motivation and emotions may not accurately reflect their dynamic, time–sensitive nature (Zhang, Dai, & Wang, Reference Zhang, Dai and Wang2020; Talebzadeh et al., Reference Talebzadeh, Elahi Shirvan and Khajavy2020). To more effectively capture this phenomenon, it is crucial to employ diverse instruments beyond questionnaires, such as motometers and interviews (Elahi Shirvan & Taherian, Reference Elahi Shirvan and Taherian2021 for emotion; Liu & Wang, Reference Liu and Wang2017 for motivation), or classroom observations and interviews to understand the specific causes of fluctuations (Dewaele & Pavelescu, Reference Dewaele and Pavelescu2021).
Turning to the results of the SEM, our findings support an additional possibility: the causal role of motivation and emotions in L2 speech learning (stable motivation/emotions → acquisition [Model 1]) may become clearer when the relationship is examined over a longer period of time, given that this model was not the best fit within the first year of the project.
In the initial six months (T1-T2), for example, the best–fitting model suggested that an increase in classroom L2 practice could be associated with enhanced L2 speech proficiency, which in turn influenced motivation and emotions profiles, particularly the emotional aspects (FLE and FLCA). This in turn supports the acquisition → motivation/emotions model (Model 2) that posits L2 proficiency gains as determinants of motivation and emotions (Botes et al., Reference Botes, Dewaele and Greiff2020, 2022a). The influence of L2 speech proficiency on motivation and emotion is plausible, especially given that substantial learning tends to occur during the first few months in new L2 learning environments (Flege & Bohn, Reference Flege, Bohn and Wayland2021). During this early phase, learners’ speech proficiency may improve regardless of their initial motivation and emotional state; rather, any measurable gains in L2 proficiency during this period may further enhance their motivation and emotions toward learning the L2.
Around the time of 10 months (T1-T3), the causal relationship became less clear, suggesting a mutual influence between motivation, emotions, experience, and acquisition (motivation/emotions ↔ acquisition [Model 3]). This suggests that not only do motivation and emotions have strong associations with acquisition, but these relationships are also equally connected to the extent of target language practice in classroom settings.
Over the longer term (1.5 years; T1-T4), a clear directionality finally emerged between greater motivation and emotions, specifically Ideal L2 Self, and more classroom practice opportunities, which then positively affected L2 speech learning gains, lending empirical support to the motivation/emotions → acquisition model (Model 1). The results here support the views that Ideal Self can lead learners to engage in more practice (Dörnyei, Reference Dörnyei2020) and that L2 speech proficiency improves as a function of increased input and output opportunities (Flege & Bohn, Reference Flege, Bohn and Wayland2021).
Taken together, the findings support our fourth hypothesis that the relationship between motivation, emotions, and L2 speech learning evolves at different stages of EFL learning (short–, mid–, and long–term), with the predictive power of motivation and emotions becoming more pronounced in the long term. This observation corroborates the feedback loop model proposed by Li, Dewaele, et al. (Reference Li, Dewaele, Pawlak and Kruk2022), which posits that elevated motivation and enjoyment, combined with lower anxiety, foster language acquisition, which in turn impacts motivation and emotions. Here, our findings introduce the notion of a reverse feedback mechanism that applies across short–, mid–, and long–term contexts of classroom L2 speech learning:
-
• Short– and Mid–Term (< 1 year): It appears that learners are driven to enhance their L2 speech proficiency, irrespective of their initial profiles of motivation and emotions, particularly within the setting of students newly enrolled in a prestigious school. This motivational phase seems to extend up to a year of EFL learning, during which motivation, emotions and acquisition begin to mutually influence one another as has been found in other contexts (Dewaele & Meftah, Reference Dewaele and Meftah2024).
-
• Long-Term (> 1 year): Beyond the one–year mark within the same educational context, learners who have attained a certain level of L2 speech proficiency might hit a developmental plateau. It is at this juncture that a stable motivation and an optimal emotional profile become crucial, motivating learners to persist in their extensive practice and to find joy in doing so while keeping anxiety in check. The ability to nurture motivation and enjoyment, and to allow them to intertwine, is the basis for further progress and achievement in L2 speech. The framework proposed here relates to cross–sectional evidence indicating that among highly experienced L2 learners in classroom settings, those with highly advanced L2 speech proficiency often exhibit strong professional motivation (Moyer, Reference Moyer2014).
The analogy that initial motivation and enjoyment act as a nurturing nest from which learners, akin to fledgling birds, eventually venture out underscores the importance of these affective factors. Once motivation and enjoyment are fully grown, they encourage learners to persist in high–quality language practice, thereby helping them achieve advanced L2 proficiency (Dewaele & Meftah, Reference Dewaele and Meftah2024).
Finally, unlike FLE, the current dataset did not identify FLCA as a significant predictor of the amount of L2 practice or development at any point. The only suggested pathway is that improved L2 speech proficiency leads to reduced FLCA, particularly within the first six months (T1-T2). These findings align with recent research, which shows that while cross–sectional evidence supports a correlation between FLCA and L2 learning outcomes (e.g., Teimouri et al., Reference Teimouri, Goetze and Plonsky2019), longitudinal data have not found FLCA to play a significant role in L2 learning outcomes (Li & Wei, Reference Li and Wei2023).
Our findings echo Dewaele and MacIntyre’s (Reference Dewaele and MacIntyre2014) original hypothesis that, although FLE and FLCA may overlap, they represent fundamentally different phenomena. Together with recent research, our tentative conclusion is that FLE is more closely related to the factors currently driving L2 learning in classroom settings, such as ongoing classroom practices (e.g., peer engagement and cooperation; Li, Huang et al., Reference Li, Huang and Li2021), teaching methods (e.g., more communicative vs. traditional/language–focused methods; Dewaele et al., Reference Dewaele, Guedat‐Bittighoffer and Dat2024), teacher characteristics (e.g., the use of humor; Dewaele et al., Reference Dewaele, Saito and Halimi2022), and long–term development behaviors (Li & Wei, Reference Li and Wei2023; Saito et al., Reference Saito, Dewaele, Abe and In’nami2018). Consequently, FLE appears to predict the extent to which L2 learning occurs (Dewaele et al., Reference Dewaele, Guedat‐Bittighoffer and Dat2024; Li & Wei, Reference Li and Wei2023; Saito et al., Reference Saito, Dewaele, Abe and In’nami2018). In contrast, FLCA seems to be a more stable, trait–like emotion that can have a disruptive, narrowing effect in the moment (Dewaele & MacIntyre, Reference Dewaele and MacIntyre2014) and/or a consequence (rather than predictor) of L2 learning and use (as suggested by the current dataset).
Limitations
Looking forward, we acknowledge limitations and propose directions for further research as follows:
-
1. Only a limited number of motivation dimensions and learner emotions were considered. The effect of more dimensions from within the self–system framework could be explored (e.g., Papi et al., Reference Papi, Bondarenko, Mansouri, Feng and Jiang2019) as well as a wider range of positive and negative emotions (e.g., Boredom and Peace of Mind; Dewaele & Meftah, Reference Dewaele and Meftah2024; Li & Wei, Reference Li and Wei2023).
-
2. Participants’ L2 use was operationalized through self-reports in the current study in order to narrow down the scope of the experience to what was directly relevant to the main focus of the study (how much participants had practiced L2 English within classroom settings). However, the method itself has been problematized by many scholars. For example, a majority of previous L2 motivation studies have used criterion measures wherein participants rated their L2 use using a 6-point scale for a range of broad statements (e.g., “I am prepared to expend a lot of effort in learning English”). As Al-Hoorie et al. (Reference Al-Hoorie2018) pointed out, such practice (described as the “questionnaire curse”) does not clearly capture the multifaceted nature of participants’ EFL experience. Notably, certain scholars have attempted to capture the dynamic nature of L2 learning experiences in various settings by asking participants to track precisely how much and how they practiced L2 English daily using a range of online instruments (e.g., The Lang-Track-App; Arndt, Granfeldt, & Gullberg, Reference Arndt, Granfeldt and Gullberg2023). Other scholars have proposed more sophisticated measures to capture the depth and intensity of the EFL experience (e.g., social network analysis; Strawbridge, Reference Strawbridge2023). However, we also caution that simply including as many experience variables as possible may not be a realistic option for future L2 speech research. To our knowledge, there is little research that can successfully link such highly complex experience profiles to any aspects of L2 learning. Additionally, including numerous experience variables as predictors would necessitate a large sample size; without one, statistical power would be very low.
Finally, some scholars have argued that experience variables alone may not fully explain the outcomes of L2 speech learning, as certain learners with greater aptitude may be able to make better use of every input and output opportunity than others (Mora, Reference Mora, Derwing, Munro and Thomson2022). More relevant discussion and future studies are needed to further address these methodological and conceptual concerns.
-
3. Relatedly, Papi and Hiver (Reference Papi and Hiver2024) proposed a new framework of qualitatively different L2 learning experiences. Under this framework, L2 learning experiences can be further categorized according to learners’ agentic behaviors, such as input-seeking, interaction-seeking, information-seeking, and feedback-seeking. Future studies should disentangle the dynamic interplay between different types of motivation, qualitatively different learning experiences and behaviors, and acquisition (cf., Papi & Khajavy, Reference Papi and Khajavy2021; Papi et al., Reference Papi, Bondarenko, Mansouri, Feng and Jiang2019).
-
4. It is important to remember that the degree of measurement invariance for FLCA was below the configural level, suggesting that the way the questionnaire items factored into the construct of FLCA varied at each time point. That is, any relevant findings related to FLCA may not be comparable across these different time points (T2-T4). More broadly, there is still much ongoing discussion regarding how many constructs underlie the complex phenomenon of L2 emotions (e.g., Dewaele & MacIntyre, Reference Dewaele and MacIntyre2014, for Enjoyment and Anxiety; Li & Wei, Reference Li and Wei2023 for Enjoyment, Boredom, and Anxiety) and what kinds and how many questionnaire items are needed to reliably tap into these supposedly different constructs (e.g., Botes et al., Reference Botes, Dewaele and Greiff2022a, for the Short–Form Foreign Language Classroom Anxiety Scale). To our knowledge, no studies have ever explored how these constructs remain the same or change when data is collected at different time points. Relatedly, there is a consensus that the key emotion constructs may be difficult to fully validate. The emotions are not “pure” and they do not operate in a social vacuum. Dewaele and MacIntyre (Reference Dewaele and MacIntyre2014) pointed out that Enjoyment is a multifaceted emotion, including learners’ feelings of pride, creativity, interest, fun, and excitement in class, combined with social factors that may change over time, such as the relationship with peers and the perception of support and friendliness of the teacher. Learner emotions are part of a complex and dynamic system, which means that isolated incidents such as mocking by peers, a harsh remark by a teacher, or a disappointing test result could all undermine the learners’ positive feelings and dent their enjoyment.
MacIntyre (Reference MacIntyre, Gkonou, Daubney and Dewaele2017) underlined the existence of fluctuations in anxiety but he could have been talking about any other learner emotion: “Anxiety is continuously interacting with several other learner, situational and other factors including linguistic abilities, physiological reactions, self–related appraisals, pragmatics, interpersonal relationships, specific topics being discussed, type of setting in which people are interacting and so on” (p. 23). The decision about what items to include in instruments to tap into different but interacting emotions is thus by definition fraught. In the current investigation, we would like to acknowledge (a) that we simply adopted well–researched questionnaire instruments for motivation (Dörnyei, Reference Dörnyei2005) and emotion (Dewaele & MacIntyre, Reference Dewaele and MacIntyre2014) and followed the analysis protocols for comparability with the ample previous literature using the same methods; and (b) that the development and validation of context–specific motivation and emotion questionnaires were beyond the main focus of the current study. Thus, we now call for future studies to further examine whether, to what degree, and how these oft–used questionnaires (typically used in cross–sectional datasets) can be adapted when applied to longitudinal datasets.
-
5. Lastly, L2 speech proficiency was measured using a proficiency test (TSST). Research shows diverse progress and results across L2 speech proficiency subdimensions; while learners may quickly advance in fluency and lexicogrammar, segmental accuracy tends to show greater individual variance and is influenced by learners’ aptitude (e.g., Saito, Suzukida, & Sun, Reference Saito, Suzukida and Sun2019 for phonemic coding; Kachlicka et al., Reference Kachlicka, Saito and Tierney2019 for auditory processing; Darcy, Mora & Daidone, Reference Darcy, Mora and Daidone2016 for inhibitory control). Future studies should investigate if the relationship between motivation, emotions, and acquisition varies between relatively easy aspects (e.g., fluency) and more difficult aspects (e.g., segmental details; see Saito, Suzuki et al., Reference Saito, Suzuki, Oyama and Akiyama2021).
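To make the sample-size concern raised under the second limitation more concrete, the following is a minimal sketch (not part of the analyses reported in this study) that simulates the power of the overall F test in an ordinary multiple regression as the number of predictors grows. The sample size of 121 matches the current study; the effect size (Cohen's f² = 0.05), the alpha level, the predictor counts, and all function names are illustrative assumptions, and the regression F test is only a simplified stand-in for the SEM approach used here.

```python
import random


def chi2(rng, df):
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    return rng.gammavariate(df / 2.0, 2.0)


def noncentral_chi2(rng, df, lam):
    # One squared shifted standard normal plus an independent chi-square on df - 1.
    shifted = (rng.gauss(0.0, 1.0) + lam ** 0.5) ** 2
    return shifted + (chi2(rng, df - 1) if df > 1 else 0.0)


def simulated_power(n, k, f2, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo power of the overall F test with k predictors and n cases.

    f2 is Cohen's f-squared for the whole model (an assumed value, not an
    estimate from the study's data).
    """
    rng = random.Random(seed)
    df1, df2 = k, n - k - 1
    lam = f2 * n  # Cohen's noncentrality: f2 * (df1 + df2 + 1)

    # Empirical critical value from the simulated null distribution of F.
    null = sorted(
        (chi2(rng, df1) / df1) / (chi2(rng, df2) / df2) for _ in range(reps)
    )
    crit = null[int((1 - alpha) * reps)]

    # Share of simulated F statistics under the alternative that exceed it.
    hits = sum(
        (noncentral_chi2(rng, df1, lam) / df1) / (chi2(rng, df2) / df2) > crit
        for _ in range(reps)
    )
    return hits / reps


# Hypothetical scenario: n = 121 learners, small overall effect (f2 = 0.05).
for k in (2, 5, 10, 20):
    print(f"predictors={k:2d}  estimated power={simulated_power(121, k, 0.05):.2f}")
```

Under these assumptions, the overall effect size is held constant while the number of predictors increases, so the same signal is spread over more degrees of freedom and the estimated power falls as more experience variables are added.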
Conclusion
Tracking the development of 121 Japanese EFL students over 1.5 years, this study explored the dynamic relationships between motivation, positive and negative emotions, and L2 speech proficiency. The mean–based analyses (ANOVAs) suggest that motivation and emotions remain relatively stable for these beginner EFL learners, fueling greater engagement with in–class practice and subsequent gains in L2 speech. Notably, the variance-based analyses (SEM) indicate that the nature of the relationship between motivation, emotions, and L2 speech proficiency (whether causal or correlational) may evolve over time (Li et al., Reference Li, Dewaele, Pawlak and Kruk2022). At the outset of their language learning journey, many L2 learners actively engage with the target language, enhancing their proficiency. This initial phase may exhibit a reciprocal influence between learners’ motivation, emotions, and their language learning, particularly within the first year of EFL learning. As learners adapt and progress, the significance of robust motivation, growing enjoyment, and limited anxiety becomes evident: together, they fuel consistent language practice and facilitate ongoing improvement in L2 speech proficiency.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S0272263125000038.
Acknowledgements
We are grateful to all the participating teachers, school administrators, and students who supported and encouraged us throughout every stage of this project. We were fortunate to have had the opportunity to work with them over three consecutive years. We also appreciate the helpful comments provided by the SSLA reviewers and the journal editor, Luke Plonsky, on earlier versions of this paper. This project was funded by the Grant-in-Aid for Scientific Research in Japan (16H03455) awarded to Mariko Abe, and the UK-ISPF Grant (1185702223) and the Leverhulme Trust Grant (RPG-2024-391) awarded to Kazuya Saito.