Orthographic effects on pronunciation refer to the influence of orthographic form on language learners’ pronunciation accuracy. Research has shown that orthography has both positive and negative effects on second language (L2) pronunciation (Bürki et al., 2019). Some studies found a facilitative effect of orthography, with L2 learners making fewer phoneme errors in their L2 production when given both orthographic and audio input than when given audio input alone (Erdener & Burnham, 2005; Rafat, 2015). Conversely, other studies found that exposure to orthography led to less target-like pronunciation (Bassetti, 2017; Bassetti & Atkinson, 2015). For example, a study of native English-speaking learners of German found that orthographic input often provides contradictory information regarding German final devoicing (Hayes-Harb et al., 2018).
How L2 orthography affects L2 phonology could depend on whether the L1 and the L2 use the same script. Bassetti (2008) pointed out the importance of investigating L1s and L2s with different scripts and distinguished two types of orthographic effects: inter-orthographic and intra-orthographic effects. An inter-orthographic effect refers to L2 learners applying L1 grapheme-to-phoneme correspondence (GPC) rules to interpret an L2 orthography that resembles the L1 orthography. An intra-orthographic effect refers to L2 learners recoding L2 graphemes with incorrect L2 GPCs. Therefore, an inter-orthographic effect can arise only when the L1 and the L2 share similar scripts. Otherwise, only intra-orthographic effects are possible.
Inter-orthographic effects depend on shared scripts and transparent L1 orthographies
The study of inter-orthographic effects has largely centered on speakers whose L1 and L2 are both alphabetic but differ in orthographic depth (Katz & Frost, 1992). Orthographic depth refers to a transparent-to-opaque continuum along which the consistency of the GPCs of a writing system varies. A language is considered to have a transparent orthography (e.g., Italian and Spanish) if it has unambiguous and regular GPCs such that its orthography reliably represents pronunciation. In contrast, a language is considered to have an opaque orthography (e.g., Dutch and English) if its GPCs are inconsistent such that phonemic interpretations vary with context. For example, in English, the letter “a” is mapped to /æ/ in “apple”, /ɑː/ in “father”, /ə/ in “about”, and /eɪ/ in “base”.
A previous study by Erdener and Burnham (2005) demonstrated that native users of transparent scripts might rely more on orthography during L2 processing than those whose native languages were less orthographically transparent. L2 learners with a transparent L1 script tend to be misled by L2 orthography when it does not congruently map onto the L2 phonology (i.e., incongruent GPCs), whereas L2 learners with an opaque L1 script may have a weaker connection between orthography and phonology. Moreover, congruence between the L1 and L2 GPCs could aid L2 phonological accuracy, whereas incongruence would inhibit L2 learning (Escudero et al., 2014).
Inter-orthographic effects were found in learners of English as a second language (ESL) with Italian as an L1 (Bassetti, 2017; Bassetti & Atkinson, 2015). Both Italian and English are alphabetic. Italian is orthographically transparent, while English is orthographically opaque. Italian participants applied the Italian double-letter convention to interpret English graphemes, resulting in non-target-like L2 pronunciation (Bassetti, 2017; Bassetti & Atkinson, 2015). They tended to produce longer vowels and consonants for digraphs than for single letters in English, such that /iː/ was longer in “seen” than in “scene”. The shared alphabetic, letter-based script allowed Italian GPCs to override L2 phonetic knowledge in English. Inter-orthographic effects thus depend on a script shared between the L1 and the L2, and on a transparent L1 orthography, such that L2 learners tend to decode L2 orthography using their L1 GPCs.
The role of L1 transparency in intra-orthographic effects
Few studies have explored intra-orthographic effects by examining speakers whose L1 and L2 have different writing systems. Sokolović-Perović et al. (2020) examined the influence of the number of letters on the duration of consonants and vowels in the English pronunciation of late Japanese–English sequential bilinguals. Even though Japanese has a non-alphabetic script, Japanese bilinguals produced a longer sound when they saw a double-letter English consonant. There is no correspondence between double letters and extended sounds in Japanese. In English, consonant lengthening (gemination) is not distinctive. For example, the double letters “nn” in “dinner” do not correspond to /nː/. Nevertheless, some multimorphemic words contain geminates in English. For instance, “misspell” is pronounced as /ˌmɪsˈspel/. Intra-orthographic effects provided a possible explanation in this case. The Japanese participants developed an incorrect conception of English gemination during their L2 acquisition. This misconception was internal to English orthography, since neither Japanese nor English provided a source for it.
Japanese uses two non-alphabetic scripts, Kanji and Kana. Kanji is based on orthographically opaque characters, while Kana is based on highly transparent syllables. Native users of transparent alphabetic scripts tend to depend on orthographic information to access L2 phonology (Erdener & Burnham, 2005). Sokolović-Perović et al. (2020) argued that the transparent though non-alphabetic Kana would make participants rely more on English orthographic forms when producing English words. Unlike inter-orthographic effects, in which specific L1 orthography-phonology correspondences are transferred through a script shared between the L1 and the L2, intra-orthographic effects might involve the transfer of a general L1 reading strategy that relies heavily on orthography to decode phonology. To test this mechanism, it is important to examine L2 learners with an orthographically opaque L1, who do not habitually form a strong connection between orthography and phonology during L1 processing.
Cantonese and Mandarin Chinese have non-alphabetic and opaque orthography
In this study, we investigated intra-orthographic effects in ESL learners with Chinese as an L1, as the Chinese script is opaque and non-alphabetic. Chinese is logographic, reflecting semantic designs rather than phonological structure, with its written units corresponding to ideographs (Perfetti et al., 1997). Chinese characters map onto syllabic morphemes as the speech units. Some characters are the simplest pictographs that cannot be further divided, such as 人 (/jan4/ in Cantonese Chinese, the fourth tone; “man”), 山 (/saan1/ in Cantonese Chinese, the first tone; “mountain”), and 木 (/muk6/ in Cantonese Chinese, the sixth tone; “wood”). Some characters are associative compounds of two or more pictographs or ideographs that refer to meaning. For example, 信 (/seon3/ in Cantonese Chinese, the third tone; “letter” or “believe”) is made up of the pictographs 人 and 言 (/jin4/ in Cantonese Chinese, the fourth tone; “language”); and 休 (/jau1/ in Cantonese Chinese, the first tone; “break”) contains the pictographs 人 and 木. Many Chinese characters are phono-semantic compounds made up of semantic radicals and phonetic-related components. For example, 睬 (/coi2/ in Cantonese Chinese, the second tone; “look”) is made up of a semantic radical 目 (“eye”) and a phonetic component 采 (/coi2/ in Cantonese Chinese, the second tone; “pick”).
In contrast to Japanese, Chinese characters are highly opaque as they do not possess clear segmental structures relating to phonology. The connection between Chinese orthography and phonology is considered substantially weak. For example, the pictograph 女 (/neoi5/ in Cantonese Chinese, the fifth tone; “female”) and the compound 安 (/on1/ in Cantonese Chinese, the first tone; “safe”) share the same component 女, but they have entirely different pronunciations. The pictograph 衣 (/ji1/ in Cantonese Chinese, the first tone; “clothes”) and the compound 醫 (/ji1/ in Cantonese Chinese, the first tone; “heal”) look completely different yet have the same pronunciation. Even for characters with a phonetic component, that component may provide only limited phonetic clues due to discrepant tones and consonants. For example, the character 份 (/fan6/ in Cantonese Chinese, the sixth tone; “part”) contains the phonetic component 分 (/fan1/ in Cantonese Chinese, the first tone; “divide”), but 分 and 份 have different pronunciations due to tonal differences. The character 鍾 (/zung1/ in Cantonese Chinese, the first tone; “clock”) contains the phonetic component 童 (/tung4/ in Cantonese Chinese, the fourth tone; “child”), but 童 and 鍾 have different pronunciations due to differences in both consonants and tones.
The current study included both Cantonese ESL learners from Hong Kong and Mandarin ESL learners from Mainland China as participants. Although they share logographic written forms that are considered highly opaque, Cantonese and Mandarin are considered two different Chinese dialects with distinctive phonemes, tones, grammars, and vocabularies (Snow, 2004). Cantonese is spoken mainly in the south of China (e.g., Hong Kong, Macau, and Guangzhou). Mandarin is the official language in China and is spoken in Mainland China and Taiwan. Unlike Hong Kong people, who learn to read Cantonese by rote only, Mainland people learn to read Mandarin through Pinyin, a Romanized phonetic script that represents Mandarin phonemes. Pinyin is considered to have a highly transparent orthography with highly consistent letter-sound correspondences (Bassetti, 2007). English and Mandarin Pinyin share many letters, but these letters do not sound the same across the two languages.
L2 proficiency, phonological awareness, and orthographic effects
Bassetti et al. (2020) and Veivo and Järvikivi (2013) found a negative relationship between L2 proficiency and inter-orthographic effects. Escudero et al. (2014) found that L2 proficiency significantly predicted performance, but only in perceiving difficult pseudoword pairs. Some studies found that orthographic knowledge could lead to L2 misconceptions regardless of language proficiency. For example, Erdener and Burnham (2005) tested native Turkish and English speakers using languages unfamiliar to them, while Bassetti and Atkinson (2015) tested experienced Italian ESL learners; both studies reported inter-orthographic effects regardless of L2 proficiency. Studies that found a significant effect of proficiency argued that low-proficiency learners failed to link orthographic and phonological representations and that their inadequate phonological representations led to more pronunciation errors (Escudero et al., 2014; Veivo & Järvikivi, 2013). High-proficiency learners, on the other hand, might have integrated orthographic and phonological representations with high phonological accuracy.
Similarly, Bassetti et al. (2020) demonstrated that more accurate L2 phonological representation, as reflected by higher phonological awareness in L2 learners, was linked to weaker inter-orthographic effects on L2 production. Phonological awareness refers to the awareness of speech sounds and the ability to reflect on, analyze, and manipulate them (Leong et al., 2005; McBride-Chang, 1995). It helps children initially grasp letter-sound relationships in word reading (Treiman, 1991). For Italian ESL learners in Bassetti et al. (2020), greater awareness that consonant length is not contrastive in English was related to a smaller double-consonant-to-single-consonant duration ratio in their English production. That is, better performance on L2 phonological awareness predicted weaker inter-orthographic effects on L2 production. Higher L2 proficiency also predicted weaker inter-orthographic effects on L2 production in that study. More proficient Italian ESL learners probably had a better understanding that singleton and geminate consonants are not contrastive in English as they are in Italian; therefore, they were less likely to decode L2 orthography using their L1 GPCs.
These previous studies (e.g., Bassetti et al., 2020; Veivo & Järvikivi, 2013) investigated how inter-orthographic effects relate to L2 proficiency and phonological awareness in users of alphabetic native languages. How intra-orthographic effects relate to L2 proficiency and L2 phonological awareness remains an open question. It is possible that L2 proficiency is particularly important for intra-orthographic effects because they are internal to the L2 orthography and do not involve the transfer of incongruent L1-to-L2 GPCs.
The present study
The current study extended the investigation of intra-orthographic effects to ESL learners with non-alphabetic and orthographically opaque Chinese as an L1. Cantonese ESL learners from Hong Kong learn to read Chinese by rote, so they do not have an L1 reading strategy that relies heavily on orthography to decode phonology. By examining intra-orthographic effects in Cantonese ESL learners from Hong Kong in Experiment 1, this study tested whether the transfer of such an L1 reading strategy was the primary mechanism for intra-orthographic effects on L2 production. Experiment 1 also tested whether L2 phonological awareness and L2 proficiency predicted intra-orthographic effects on L2 production.
Experiment 2 compared orthographic effects experienced by Cantonese ESL learners from Hong Kong and Mandarin ESL learners from Mainland China. Having learned Chinese through transparent Pinyin scripts, Mandarin ESL learners develop an L1 reading strategy to access phonology through highly consistent orthography-phonology correspondences. As Pinyin shares a script with English, inter-orthographic effects are possible in addition to intra-orthographic effects. Experiment 2 aimed to compare the performance between Cantonese and Mandarin ESL learners to examine the impact of knowing Pinyin on orthographic effects.
Experiment 1
By studying experienced Cantonese ESL learners from Hong Kong, Experiment 1 examined intra-orthographic effects on L2 pronunciation in ESL learners whose L1 has an opaque orthography and a completely different writing system from the L2. These native Cantonese speakers from Hong Kong have Cantonese as their dominant language and typically begin to learn English in kindergarten. Although English is one of the official languages in Hong Kong, these Cantonese speakers rarely use English outside classrooms. Hong Kong ESL learners live in a non-immersive environment, and their exposure to native spoken English is far from sufficient (Wong et al., 2021). These Cantonese natives typically speak English with a Cantonese accent despite years of classroom English learning throughout their primary, secondary, and tertiary education. Other researchers have considered them advanced or experienced ESL learners (Chan, 2019) or ESL students (Gan, 2012) rather than bilinguals. Therefore, our participants from Hong Kong were considered experienced ESL learners rather than early bilinguals. It is also noteworthy that English teaching in Hong Kong predominantly uses exercises emphasizing rote memorization, such as dictation and grammar practice (Poon, 2010). English classrooms in Hong Kong emphasize reading and writing skills rather than listening and speaking. Public admission examinations in Hong Kong also focus on reading, grammar, and composition rather than communication and speaking accuracy (Evans, 1996).
Previous studies have suggested that native users of opaque orthographies may adopt a “whole-picture” mental representation in lexical reading, while speakers of transparent languages depend on segmental orthographic representation (Erdener & Burnham, 2005; Lemhöfer et al., 2008). Without an L1 reading habit of forming a solid connection between orthography and phonology to transfer to L2 processing, Cantonese participants are expected to rely less on English orthographic forms when producing English words. Instead, their L1 reading strategy of using the “whole picture” to represent lexical items might work well for English, as English has an opaque orthography with inconsistent GPCs. Therefore, the use of the L1 reading strategy would predict minimal orthographic effects, at least for the silent-letter stimuli, which involve irregular GPC rules. If intra-orthographic effects are found in the English production of Cantonese participants, it implies that they are relying on the graphemes to decode the pronunciation to a certain extent, although such a reading strategy is not transferred from their L1. Such a finding would rule out the transfer of an L1 reading strategy that relies heavily on orthography to decode phonology as the key mechanism behind intra-orthographic effects. More importantly, the presence of intra-orthographic effects in Cantonese participants would also suggest that they are not completely transferring the “whole-picture” reading strategy from the L1 to process English words.
Experiment 1 also investigated the role of L2 phonological awareness and L2 proficiency in predicting intra-orthographic effects on L2 production. Since the intra-orthographic effects do not involve the transfer of L1 orthography-phonology correspondences to decode L2 orthography, L2 phonological awareness and L2 proficiency should be the primary factors in determining the quality of L2 phonological representation, thereby predicting intra-orthographic effects.
Past literature has implied that phonological awareness skills are not always fully developed in adults (Moran & Fitch, 2001; Spencer et al., 2008). ESL learners could show a wide range of English phonological awareness skills depending on their prior experience with alphabetic literacy in their L1s (Holm & Dodd, 1996; Read et al., 1986). In this study, the Cantonese ESL learners have an opaque L1 orthography and learn Cantonese phonology by rote memorization. Frequently engaging in syllable-level processing in the L1, Cantonese-speaking young adults from Hong Kong were shown to have limited phonological awareness in English and greater difficulty in processing English nonwords compared to native speakers from Australia, as well as ESL learners from Mainland China and Vietnam, all of whom have alphabetic literacy in their L1s (Holm & Dodd, 1996). These findings suggest that early literacy processing skills from the L1 are transferred to ESL learning (Holm & Dodd, 1996). Hence, it is predicted that Cantonese ESL learners, especially those with low phonological awareness and proficiency in English, would show strong intra-orthographic effects on L2 production in the current study.
To measure the extent of orthographic influence on L2 pronunciation, the homophone and silent-letter read-aloud tasks from Bassetti and Atkinson (2015) were adopted. The homophone read-aloud task examined L2 speakers’ production accuracy for English homophones to test whether their production would be biased by differences in orthography. For example, “aloud” and “allowed” are spelled differently but have the same pronunciation. Pronouncing the two words differently indicates an orthographic effect, in that participants derive the pronunciation from the spelling. Silent letters refer to letters lacking phonetic correspondences, such as “b” in “lamb” (/læm/) and “l” in “salmon” (/ˈsæmən/). A failure to omit silent letters in production results in an orthography-induced epenthesis (the insertion of a sound that has a grapheme but no phonetic counterpart), indicating orthographic effects on L2 pronunciation (Bassetti & Atkinson, 2015).
Italian ESL learners in Bassetti and Atkinson (2015) produced on average 40% of the stimuli as non-homophonic pairs in the homophone read-aloud task. In the silent-letter read-aloud task, 85% of stimuli were pronounced with added phonemes. The findings showed inter-orthographic effects, in which incongruence between L1 and L2 GPCs led to L2 production mistakes. For Cantonese participants, since the L1 and the L2 do not share a script, transferring L1 orthography-phonology correspondences to the L2 is not possible. High error rates of pronouncing homophonic pairs as non-homophonic and a high rate of orthography-induced epenthesis would therefore indicate not inter-orthographic effects but intra-orthographic effects that are internal to the L2.
Phonological awareness in English was measured through a phoneme deletion task and a pseudoword read-aloud task. High accuracy rates in these tasks reflect strong phonological awareness. Participants’ overall English proficiency level was indicated by their scores in the Hong Kong Diploma of Secondary Education (HKDSE) in English language, a standardized public exam for university admission in Hong Kong. HKDSE results are also widely accepted by more than 280 tertiary institutions worldwide.
Method
Participants
Fifty-four undergraduate students with Cantonese as an L1 and English as an L2 from City University of Hong Kong were recruited from the Basic Psychology participant pool and received course credit for their participation. Participants were aged between 18 and 23 years (M = 19.2, SD = 1.37) and reported no history of language, hearing, or reading impairments. Participants’ onset of English learning ranged from age 1 to 5 years (M = 3.35, SD = 1.03), and they had been learning English for 12 to 20 years (M = 15.8, SD = 1.86). Table 1 summarizes the participants’ demographic and language information.
Table 1. Demographic and Language of Participants in Experiment 1

Note. M = mean. SD = standard deviation.
Materials
Materials used in Experiment 1 (as described below) can be accessed via https://osf.io/njd7z/.
Language history questionnaire
A language history questionnaire was used to collect participants’ demographic data regarding their gender, age, grades of English language in public examinations and other English-learning history, such as age of acquisition (AoA) and duration of living in English-speaking countries.
Homophone read-aloud task
Stimuli included 24 homophonic word pairs (Appendix A), half of which were adopted from Bassetti and Atkinson (2015) and the rest of which were taken from an online homophone database (Aloisi, 2008).
Silent-letter read-aloud task
Stimuli included eight target words with silent letters (Appendix B) adopted from Bassetti and Atkinson (2015). Each target word contained one of the three silent letters “b”, “d”, or “l”. There were four words for “b” and two words each for “d” and “l”.
Phoneme deletion task
Thirty-two target words (Appendix C), including 16 congruent and 16 incongruent stimuli, were adopted from Tyler and Burnham (2006). A stimulus is congruent when deleting its first phoneme leaves a string of letters that matches the spelling of the correct phonological response. For instance, deleting the first phoneme of the congruent stimulus “bride” (/braɪd/) yields the phonological response /raɪd/, which matches the spelling “ride”. In contrast, deleting the first phoneme of the incongruent stimulus “worth” (/wɜːθ/) yields the phonological response /ɜːθ/, which does not match the remaining letter string “orth”; instead, /ɜːθ/ corresponds to the word “earth”. The stimuli were recorded by a female native speaker of English with a British accent. She read each stimulus aloud three times, and the recording with the most natural intonation and moderate speed was selected as the stimulus.
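The congruence distinction can be illustrated with a minimal sketch. It assumes, purely for illustration, that the first phoneme is written with a single letter and that the correct response has a conventional spelling; the helper name `is_congruent` and its `onset_letters` parameter are hypothetical and are not part of the original materials from Tyler and Burnham (2006).

```python
def is_congruent(stimulus: str, response_spelling: str, onset_letters: int = 1) -> bool:
    """Return True when the letters left after dropping the onset
    match the conventional spelling of the correct spoken response.

    onset_letters is an illustrative assumption: the number of letters
    that spell the first phoneme (1 for both examples in the text).
    """
    remaining_letters = stimulus[onset_letters:].lower()
    return remaining_letters == response_spelling.lower()


# The two examples given in the text: (stimulus, spelling of correct response).
examples = [("bride", "ride"), ("worth", "earth")]

for stimulus, response in examples:
    label = "congruent" if is_congruent(stimulus, response) else "incongruent"
    print(f"{stimulus} -> {response}: {label}")
```

For stimuli whose first phoneme is spelled with more than one letter, `onset_letters` would need to be set accordingly.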
Pseudoword read-aloud task
Fifteen pseudowords that sound like real words but carry no semantic content (e.g., “burd”, pronounced like “bird”; Appendix D) were adopted from Lukatela and Turvey (1991).
Procedure
Participants were seated individually in front of a Windows-running PC in the Social Science Laboratories at City University of Hong Kong. A Logitech H340 USB headset with a mounted microphone was used to present the audio stimuli and capture participants’ spoken responses. Instruction was provided verbally in Cantonese by a trained experimenter prior to each task. First, participants gave consent and filled in the language history questionnaire through Qualtrics (Qualtrics, 2005). Then participants completed all other tasks through Paradigm (Perception Research Systems, 2007) in a standardized order as follows: homophone and silent-letter read-aloud, phoneme deletion and pseudoword read-aloud.
For the homophone and silent-letter read-aloud task, all homophonic words and silent-letter words were presented in a randomized order. After receiving the instruction, participants were told to press SPACEBAR to begin whenever they were ready. For each trial, one stimulus word was shown visually at the center of the computer screen for participants to read aloud. Participants were given as much time as they needed and were instructed to press SPACEBAR after their verbal responses. The next stimulus was then presented after 5 s. There were 32 trials in total. No feedback was given in the experimental trials.
For the phoneme deletion task, participants went through a demonstration trial and two practice trials with auditory answers prior to the experimental trials. Participants were visually presented with a stimulus at the center of the computer screen and heard its auditory form simultaneously. They were required to pronounce it without the first phoneme. After each trial, participants pressed SPACEBAR and waited 5 s for the next trial. There were 32 trials in total. No feedback was given in the experimental trials.
The procedure for the pseudoword read-aloud task was the same as for the other read-aloud tasks, except that a demonstration trial and two practice trials with auditory answers were presented prior to the experimental trials. There were 15 trials in total. All spoken responses were captured and recorded by Paradigm. Participants were given breaks between the tasks and were debriefed upon completion of the whole study.
Results
Data of each task in Experiment 1 can be accessed via https://osf.io/njd7z/.
Two homophonic pairs, “caught, court” and “sauce, source”, which are considered nonhomophones by rhotic speakers, were excluded from analysis. Since the instruction did not specify which accent to adopt for the task, the two pairs were excluded to avoid potential accent-specific effects. Participants’ spoken responses from the phoneme deletion task and the read-aloud tasks were scored by two qualified Cantonese-speaking English teachers with prior training in English phonetics. Rater 1 was a high school teacher with 7 years of teaching experience. Rater 2 was a primary school English teacher with 8 years of teaching experience. They were blind to the hypotheses of this study and were asked to listen to and rate participants’ pronunciation independently according to the scoring scheme. Table 2 presents the scoring scheme for each task. There was no restriction on the order of rating or the number of times they could listen to the recordings. The inter-rater reliability was high for all tasks. Cohen’s kappa was .91 for the silent-letter read-aloud task, .81 for the homophone read-aloud task, and .95 for each of the phoneme deletion and pseudoword read-aloud tasks. Participants’ grades in HKDSE in English Language were also coded for analysis (Table 2). Table 3 presents the descriptive statistics for all tasks.
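For readers unfamiliar with the reliability index, the following is a minimal sketch of how Cohen’s kappa can be computed for two raters. The ratings are hypothetical and coded as binary (1 = error, 0 = correct) only for illustration; they are not the study data, and the actual coding follows the scoring scheme in Table 2. The function name `cohens_kappa` is likewise an illustrative choice.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater1)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = counts1.keys() | counts2.keys()
    expected = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten responses (1 = error, 0 = correct).
rater1_scores = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater2_scores = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

kappa = cohens_kappa(rater1_scores, rater2_scores)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on nine of ten items (observed agreement .90) against a chance agreement of .50, so kappa is .80.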
Table 2. Scoring Scheme for all Tasks and the Hong Kong Diploma of Secondary Education Examination (HKDSE) in English Language in Experiment 1

Note. To provide a finer discrimination of candidates’ ability at the top end, level 5** is awarded to the highest-achieving 10% (approximately) level 5 candidates and level 5* is awarded to the next highest-achieving 30% (approximately) level 5 candidates.
Table 3. Descriptive Statistics for Error Rates in Homophone and Silent-Letter Read-aloud Tasks, and Accuracy Rates in Phoneme Deletion and Pseudoword Read-aloud Tasks in Experiment 1

Homophone read-aloud task
Participants produced homophonic word pairs as nonhomophones at a mean error rate of 13.2% (SD = 15%). The word pair “seas, seize” had the highest error rate (29.6%); the word pairs “son, sun” and “thai, tie” had the lowest error rate (1.85%). No obvious error pattern was found.
Silent-letter read-aloud task
The mean error rate for Cantonese participants was 32.2% (SD = 19.2%). However, performance was inconsistent across the silent-letter “b” stimuli: “climb” (61.1%) and “lamb” (61.1%) had the highest error rates, whereas “comb” had a low error rate of 22.2%. A chi-square test of independence indicated that the distribution of participants’ epentheses differed significantly across the silent-letter “b” stimuli “climb”, “lamb”, and “comb”, χ²(2) = 21.8, p < .001.
For the silent letter “l”, “walk” yielded a much lower error rate (7.4%) than its counterpart “salmon” (46.3%). A chi-square test of independence confirmed that the distribution of participants’ epentheses differed significantly across the two stimuli, χ²(1) = 20.8, p < .001. As AoA and exposure to the L2 affect L2 learning (Indefrey, 2006), a possible explanation is that “walk” (M = 3.45) is acquired at an earlier age than “salmon” (M = 8) (Kuperman et al., 2012). Another possible reason is that participants were more familiar with “walk” than with “salmon”. Familiarity ratings on a 7-point scale from our pilot data for Experiment 2 confirmed this (walk: M = 6.96; salmon: M = 4.88).
For the silent letter “d”, “Wednesday” and “landscape” yielded error rates of 18.5% and 22.2%, respectively. A chi-square test of independence showed no significant difference in the distribution of participants’ epentheses across these two stimuli, χ²(1) = 0.228, p = .633.
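The chi-square statistics above can be approximately reconstructed from the reported error rates. The sketch below is an illustration under stated assumptions, not the authors’ analysis script: the error counts are back-calculated from the reported percentages with N = 54 (e.g., 61.1% of 54 is 33), and the Pearson statistic is computed without a continuity correction. The helper names `chi_square_independence` and `word_by_outcome` are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


N = 54  # participants in Experiment 1


def word_by_outcome(error_counts):
    """One row per word: [responses with epenthesis, responses without]."""
    return [[errors, N - errors] for errors in error_counts]


# Counts back-calculated from the reported percentages (an assumption).
b_stat, b_df = chi_square_independence(word_by_outcome([33, 33, 12]))  # climb, lamb, comb
l_stat, l_df = chi_square_independence(word_by_outcome([4, 25]))       # walk, salmon
d_stat, d_df = chi_square_independence(word_by_outcome([10, 12]))      # Wednesday, landscape

for label, stat, df in [("b", b_stat, b_df), ("l", l_stat, l_df), ("d", d_stat, d_df)]:
    print(f"silent letter {label}: chi-square({df}) = {stat:.3f}")
```

Under these assumptions the statistics come out at roughly 21.8, 20.8, and 0.228, in line with the reported values.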
Phoneme deletion and pseudoword read-aloud tasks
For the phoneme deletion task, participants obtained a mean accuracy rate of 60% (SD = 19.2%). In the current study, the mean error rates of congruent stimuli (M = 39.2%, SD = 12.5%) and incongruent stimuli (M = 40.7%, SD = 12.7%) did not differ significantly, F(1, 53) = .68, p = .41. The mean accuracy rate for the pseudoword read-aloud task was 63.5% (SD = 3.28%). Overall, Cantonese participants displayed a moderate level of phonological awareness, which was consistent with previous findings of a relatively low level of phonological awareness in ESL learners from Hong Kong compared to those from Mainland China and Vietnam (Holm & Dodd, 1996).
Orthographic effects, phonological awareness, and L2 proficiency
The total phonological awareness (PA) score was the sum of raw scores from the phoneme deletion and the pseudoword read-aloud task. A higher total PA score indicates a higher level of phonological awareness. Ten Pearson’s correlation tests between homophone read-aloud task, silent-letter read-aloud task, phoneme deletion task, pseudoword read-aloud task, and HKDSE scores were conducted at a Bonferroni-adjusted alpha level of .005 (0.05/10). Table 4 shows the correlation matrix.
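As a brief illustration of where the ten tests and the adjusted alpha level come from, the sketch below enumerates all pairs among the five measures and divides the conventional .05 level by the number of pairs. The variable names are illustrative only.

```python
from itertools import combinations

# The five measures that were correlated pairwise in Experiment 1.
measures = [
    "homophone read-aloud",
    "silent-letter read-aloud",
    "phoneme deletion",
    "pseudoword read-aloud",
    "HKDSE",
]

# Every unordered pair of measures corresponds to one Pearson correlation test.
pairs = list(combinations(measures, 2))

# Bonferroni adjustment: divide the family-wise alpha by the number of tests.
alpha_adjusted = 0.05 / len(pairs)

print(len(pairs), alpha_adjusted)
```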
Table 4. Correlation Matrix Among Error Rates in the Orthographic Effect Tasks, Accuracy Rates in the Phonological Awareness (PA) Tasks, the Total PA Score, and the Hong Kong Diploma of Secondary Education Examination (HKDSE) Score in Experiment 1

Note. *p < .005 (Bonferroni-adjusted).
A significant correlation was found between accuracy rates from the phoneme deletion and pseudoword read-aloud tasks at a Bonferroni-adjusted level (r = .63, p < .001), indicating that these tasks showed sufficient validity in measuring phonological awareness while measuring different dimensions of it. Participants’ error rate in the homophone read-aloud task showed a significant negative correlation with the PA score at a Bonferroni-adjusted level (r = −.67, p < .001), and the HKDSE score at a Bonferroni-adjusted level (r = −.51, p < .001). PA score had a significant positive correlation with HKDSE score at a Bonferroni-adjusted level (r = .53, p < .001). The silent-letter read-aloud performance showed no significant correlation with any other variables.
Discussion
The Cantonese ESL participants showed intra-orthographic effects on their L2 pronunciation. In contrast to inter-orthographic effects, these effects are intra-orthographic because Cantonese and English use different scripts, so interference from L1 orthography-to-phonology correspondences is not involved. Transferring the L1 reading strategy, which relies heavily on orthography to decode phonology, to L2 reading could possibly explain the intra-orthographic effects demonstrated by Japanese-English sequential bilinguals in Sokolović-Perović et al. (Reference Sokolović-Perović, Bassetti and Dillon2020). However, it could not explain the orthographic effects demonstrated by our Cantonese ESL learners, as they do not use any transparent orthographies in their L1. The presence of orthographic effects suggested that the Cantonese participants relied on English spellings to decode pronunciation to a certain extent, even though such a reading strategy is not transferred from their L1. This also implies that the Cantonese participants were not always using the “whole-picture” reading strategy from the L1 to process English words in this experiment.
Although English is considered to have an opaque orthography with many irregular and inconsistent GPCs (Borleffs et al., Reference Borleffs, Maassen, Lyytinen and Zwarts2017), its alphabetic nature still allows a good proportion (79.3%; cf. 90.4% for German) of phonological representations to be correctly retrieved from its orthography using the GPC rules (Ziegler et al., Reference Ziegler, Perry and Coltheart2000). One possibility is that the Cantonese participants recognize this property of English and adopt a “hybrid” reading strategy that allows them to decode English words at several different grain sizes, including mapping from grapheme to phoneme, from letter pattern to rime or syllable, and at the whole-word level.
In this study, Cantonese ESL learners with higher phonological awareness and English proficiency were less influenced by orthographic forms in their English pronunciation. In line with the inter-orthographic effects demonstrated in Bassetti et al. (Reference Bassetti, Mairano, Masterson and Cerni2020), intra-orthographic effects in this study were also negatively related to both L2 proficiency and phonological awareness. This implies that the accuracy and precision of L2 phonological representations, as reflected by the level of phonological awareness and L2 proficiency, are likely to be important predictors of both inter- and intra-orthographic effects.
According to the fuzzy lexicon hypothesis, the phonological representation of L2 words is not fully specified and lacks details such that some phonemes and phonemic sequences are underspecified (Cook et al., Reference Cook, Pandža, Lancaster and Gor2016). This is more likely at the early stage of L2 acquisition and for less-proficient L2 learners (Cook et al., Reference Cook, Pandža, Lancaster and Gor2016). Hence, fuzzy L2 phonological representation in L2 learners might force them to seek orthographic input as another source of information.
However, neither L2 proficiency nor phonological awareness correlated with performance in the silent-letter read-aloud task. Cantonese participants tended to pronounce the silent letter “b” in “climb” and “lamb”, but much less so in “comb”. Performance for “walk” and “salmon” was also inconsistent. These results suggested that Cantonese participants were not aware of the convention of silent letters in English.
Since Cantonese participants have long been observed to show a consistent tendency to devoice the final obstruent in English (Chan, Reference Chan2006; Chan & Li, Reference Chan and Li2000; Edge, Reference Edge1991), the attempt to pronounce the silent letter “b” was less likely to be due to contrastive differences in the sound inventory between Cantonese and English, or to the articulation of permissible final consonants. In English, “b” and “d” can be in either word-initial or word-final position. In Cantonese, “b” and “d” can only be in word-initial position. Our Cantonese participants did not consistently omit the letter “b” in word-final /mb/ or /bt/ clusters, which ran counter to their habit of omitting English obstruents in word-final position. This implied that the intra-orthographic effect is less likely to stem from L1 influence. Rather, it is probably internal to English, as silent-letter words in English trigger a regularity violation.
Experiment 2
Experiment 1 examined intra-orthographic effects in Cantonese ESL learners, who have an orthographically opaque and non-alphabetic L1 such that their reading strategy does not rely on highly consistent orthography-phonology correspondences. These results ruled out the transfer of such an L1 reading strategy as a key mechanism underlying intra-orthographic effects. Instead, our results from Experiment 1 suggested that how accurately L2 learners phonologically represent L2 words predicted intra-orthographic effects. Experiment 2 aimed to further examine the role of the L1 reading strategy in intra-orthographic effects by studying Mandarin Chinese ESL learners from Mainland China. Unlike Cantonese speakers in Hong Kong, who learn to read Cantonese by rote only, Mandarin ESL learners from Mainland China learn to read through Pinyin, a transparent Romanized phonetic script that represents the pronunciation of Chinese characters.
In Mainland China, learning Pinyin in primary school is a core component of the national curriculum as the first step in learning Chinese (Ministry of Education of the People’s Republic of China, 2011). The primary function of Pinyin is to link abstract Chinese characters to their pronunciation. Children can learn the pronunciation of novel words through their Pinyin. Pinyin is a useful tool for Chinese characters that people are able to pronounce but unable to write. In addition, Pinyin is the main tool for typing Chinese characters on computers or smartphones using an English keyboard. Pinyin is critical for Chinese reading. A recent study has identified a reciprocal relationship between Pinyin skills and character recognition (Zhang et al., Reference Zhang, Georgiou, Inoue, Zhong and Shu2020).
The Pinyin system uses 25 of the 26 letters of the Roman alphabet used in English (all except “v”) and adds “ü”. Many graphemes are shared by English and Mandarin Pinyin, but they do not sound the same across the two languages (see Appendix E for a display of Pinyin initials and finals and their examples, together with IPA-based symbols and approximate English pronunciation). Unlike English words, which consist of consonants and vowels, Pinyin syllables are built from initials and finals. Initials consist of consonants; finals include (1) simple vowels (e.g., a, e, and ü) or (2) compound finals. Compound finals are composed of two or three vowels (e.g., ai, ei, and uei), or a vowel followed by a nasal consonant (e.g., an, ian, and ong). A Pinyin syllable is spelled as either an initial plus a final or a final alone, and it carries one of four tones: a flat tone (–), a rising tone (/), a falling-rising tone (∨), or a falling tone (\).
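The initial-plus-final structure described above can be made concrete with a minimal sketch. It assumes the standard inventory of 21 Pinyin initials and ignores tone marks; the function name `split_syllable` is hypothetical, and the treatment of “y” and “w” as part of the final's spelling is a simplification made only for this illustration.

```python
# Standard Pinyin initials. Two-letter initials come first so that
# "zh", "ch", and "sh" are matched before "z", "c", and "s".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
]

def split_syllable(syllable):
    """Split a toneless Pinyin syllable into (initial, final).

    A syllable with no initial (a final standing alone) is returned
    with an empty string as its initial.
    """
    for initial in INITIALS:
        # Require at least one character after the initial for the final.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable

for example in ["ma", "zhong", "an", "lü", "shuai"]:
    print(example, split_syllable(example))
```

Because each initial and final maps consistently to one pronunciation, a decomposition of this kind is enough to read a Pinyin syllable aloud, which is the sense in which the script is described as transparent.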
As Lü (Reference Lü2017) pointed out, reading acquisition of a language crucially starts with discovering the basic unit that is embedded in each graphic symbol, followed by uncovering the mapping details between the graphic symbol and its sound. For Chinese, learning to read first requires individuals to realize that each Chinese character is monosyllabic and corresponds to a morpheme with little phonological consistency. Therefore, learning to read in Chinese requires extensive memorization (Bialystok et al., Reference Bialystok, McBride-Chang and Luk2005). Learning to read words in English begins with realizing that letters represent sounds (Lü, Reference Lü2017). The alphabetic system of English leads readers to rely heavily on phonics in learning to read words (Bialystok et al., Reference Bialystok, McBride-Chang and Luk2005). In later literacy, English speakers gradually figure out that the GPCs might not always be regular. The fundamental difference between Chinese and English learning might prompt Cantonese ESL learners to adopt different reading strategies for the two languages, as shown in Experiment 1.
However, Mandarin speakers rely on alphabetic Pinyin to pronounce Chinese characters. As the writing system of Pinyin is highly transparent due to consistent letter-sound correspondences (Bassetti, Reference Bassetti, Andreas, Xin and Yexin2007), it is possible that Mandarin speakers use the reading strategy of Pinyin in English reading, leading to non-target-like English pronunciation. Pinyin does not contain silent letters. Common silent letters in English, such as “b”, “d”, and “h”, are all pronounced in Pinyin. Besides, each Roman letter in Pinyin corresponds to only one sound, while one Roman letter in English could link to different sounds. For example, “a” in Pinyin is similar to /ɑː/ only, but “a” in English is mapped to /ɑː/ in “father”, /ə/ in “about”, and /eɪ/ in “base”. As Chinese and English belong to entirely different writing systems, inter-orthographic effects that rely on grapheme-to-phoneme conversions between the two languages are unlikely. However, it is possible that the shared scripts of Pinyin and English exert extra inter-orthographic effects in Mandarin speakers due to incongruent GPCs between Pinyin and English.
It is likely that the daily practice of using transparent Pinyin scripts to read or type in Chinese encourages Mandarin speakers to also rely on orthographies when learning English. Regardless of the incongruent GPCs between English and Pinyin, literacy of Pinyin aided the development of phonological awareness in the early stage of learning both Mandarin and English (McBride-Chang et al., Reference McBride-Chang, Bialystok, Chong and Li2004; McDowell & Lorch, Reference McDowell and Lorch2008). Numerous studies have found that Mainland participants outperformed Hong Kong participants in phonemic-related tasks in Chinese (Cheung & Chen, Reference Cheung and Chen2004; Cheung et al., Reference Cheung, Chen, Lai, Wong and Hills2001) and English (McDowell & Lorch, Reference McDowell and Lorch2008). These findings suggest that Mandarin speakers may be more aware of the relationship between graphemes and phonemes compared to Cantonese-speaking ESL learners, thereby making them more dependent on orthographies.
Experiment 2 investigated whether Mandarin-speaking participants would be susceptible to more orthographic effects in their English pronunciation than Cantonese-speaking participants, due to possible inter-orthographic effects from Pinyin, when controlling for L2 proficiency. Controlling for L2 proficiency is crucial as Hong Kong ESL learners have an earlier age of English acquisition and are exposed to English more in their daily life compared to Mandarin speakers from Mainland China, due to differences in their English education systems and English-speaking environments (Nunan, Reference Nunan2003). Also, consistent with Experiment 1, we hypothesized that L2 proficiency would predict orthographic effects on L2 production for both Cantonese and Mandarin participants. Since word familiarity has an influence on orthographic effects (Veivo & Järvikivi, Reference Veivo and Järvikivi2013), stimuli in Experiment 2 were better controlled for word familiarity.
Experiment 2 also used homophone and silent-letter read-aloud tasks to examine orthographic effects on English pronunciation in Chinese ESL learners from Hong Kong and China. Because HKDSE scores are not available for Mandarin participants from China, and to rule out the potential influence of the time gap between test-taking and the current experiment, Experiment 2 used the Lexical Test for Advanced Learners of English (LexTALE; Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012), which is a valid and standardized test for measuring English proficiency in advanced L2 learners.
Method
Participants
Sixty participants aged between 17 and 24 years (M = 20.2, SD = 1.65), including 30 native Mandarin speakers from China and 30 native Cantonese speakers from Hong Kong, were recruited from the Basic Psychology Participant Pool at City University of Hong Kong. Participants received course credits for their participation. All participants reported no history of speech or hearing impairments and spoke English as their L2, with an onset of English learning ranging from age 3 to 12 years (M Cantonese = 4.63; M Mandarin = 7.73). Table 5 shows the demographic information and language background of the participants in Experiment 2.
Table 5. Demographic Information and Language Background of Participants in unmatched and matched groups of LexTALE in Experiment 2

Note. M = mean. SD = standard deviation.
To control for L2 proficiency between the Cantonese- and Mandarin-speaking groups, 13 participants from each group were selected based on their LexTALE percentage-correctness scores. The resulting Cantonese and Mandarin matched groups both had a mean LexTALE percentage-correctness of 65. The mean AoA for the Cantonese- and Mandarin-speaking matched groups was 4.23 and 7.77, respectively.
Materials
Materials used in Experiment 2 (as described below) can be accessed via https://osf.io/njd7z/.
Language history questionnaire
The questionnaire used in Experiment 1 was revised by adding questions tailored for Mandarin participants, such as questions related to China’s national college entrance examination.
Pilot study for stimuli selection
To ensure that all stimulus words were highly familiar to Chinese ESL learners, a pilot study was conducted through an online survey on Qualtrics (Qualtrics, 2005). Stimuli included 34 homophonic word pairs, 15 silent-letter words, and corresponding foil words in which the matching letter is pronounced (e.g., “height” as the foil for “honest”). The homophonic stimuli were selected from an online homophone database (Aloisi, Reference Aloisi2008). The pronunciation of all stimuli was checked via the online Oxford English Dictionary (Oxford University Press, n.d.). A group of 52 ESL learners from China (N = 27) and Hong Kong (N = 25) rated all stimuli on subjective word familiarity with a 7-point scale (Nusbaum et al., Reference Nusbaum, Pisoni and Davis1984), in which a “1” indicated that the word is unknown and a “7” indicated that the word is of the highest familiarity.
Homophone read-aloud task
Twenty-one homophonic word pairs (Appendix F) that were pronounced the same in both British English and American English were selected as stimuli. From piloting, the mean subjective word familiarity was 6.01 out of 7 (SD = .52).
Silent-letter read-aloud task
The stimuli included 10 silent-letter words and 10 foil words with matched pronounced-letters (Appendix G). The silent letters included “b” (4 words), “d” (2 words), “h” (2 words), and “l” (2 words). There was no significant difference on word familiarity, t fam (18) = .77, p = .448, or word frequency, t freq (18) = −.90, p = .38, between the silent-letter words (M fam = 6.64, SD fam = .31; M freq = 30691, SD freq = 23108) and the foils (M fam = 6.52, SD fam = .41; M freq = 82284, SD freq = 179915).
LexTALE
LexTALE is a visual lexical decision task that measures vocabulary knowledge for advanced ESL learners as an indication of their general English proficiency (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012). In a study involving 72 Dutch and 87 Korean ESL learners, LexTALE scores significantly correlated with different measures of general English proficiency, including (a) L1-L2 noun translation performance, (b) L2-L1 noun translation performance, (c) the Quick Placement Test, a commercial placement test commonly used by universities in Europe for assigning students to different English course levels, and (d) the Test of English for International Communication (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012). During LexTALE, participants were presented with English stimuli one by one and were asked to determine whether each stimulus was a real English word or not. Participants’ English proficiency was indicated by % correct_av, a measure of percentage-correctness with an adjustment for the unequal proportions of words and nonwords (Lemhöfer & Broersma, Reference Lemhöfer and Broersma2012).
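The adjustment for unequal proportions of words and nonwords can be sketched as follows. The item counts (40 words and 20 nonwords) are taken from the published description of LexTALE and are an assumption here, since they are not stated in this text; the function name `percent_correct_av` is hypothetical. The idea is to average the percentage correct for words and for nonwords, so that each item type contributes equally to the score.

```python
def percent_correct_av(words_correct, nonwords_correct,
                       n_words=40, n_nonwords=20):
    """Averaged percentage correct across words and nonwords.

    ASSUMPTION: default item counts follow the published LexTALE
    description (40 words, 20 nonwords); adjust them if the test
    version differs.
    """
    if not 0 <= words_correct <= n_words:
        raise ValueError("words_correct out of range")
    if not 0 <= nonwords_correct <= n_nonwords:
        raise ValueError("nonwords_correct out of range")
    word_pct = words_correct / n_words * 100
    nonword_pct = nonwords_correct / n_nonwords * 100
    # Averaging the two percentages weights words and nonwords equally,
    # even though there are twice as many words as nonwords.
    return (word_pct + nonword_pct) / 2

# Hypothetical participant: 32 of 40 words and 10 of 20 nonwords correct.
score = percent_correct_av(32, 10)
print(round(score, 1))
```

A plain percentage over all 60 items would give this hypothetical participant 70%, whereas the averaged measure gives a lower score because accepting nonwords is penalized as heavily as missing words.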
Procedure
A procedure similar to that of Experiment 1 was used in Experiment 2. Participants first completed LexTALE on the official website, followed by the language background questionnaire, the read-aloud tasks, and a rating task on subjective word familiarity. In the read-aloud tasks, participants were presented with a word on the screen via Microsoft PowerPoint. Participants read aloud the presented word after a cue sound. Participants then pressed SPACEBAR to reveal the next word. The stimuli were presented in a randomized order. Participants’ spoken responses were captured by a Logitech H340 USB headset with a mounted microphone and recorded by Windows Voice Recorder.
Results
Data of each task in Experiment 2 can be accessed via https://osf.io/njd7z/.
Two native English-speaking raters, one with a British accent and the other with a North American accent, independently listened to and coded participants’ spoken responses in a randomized order. Raters had received formal training in English phonology and were asked to rate the stimuli in the silent-letter read-aloud task against a standard British accent to avoid potential effects of dialect differences, especially for the letter “l”. The first rater held a certificate of TEFL (Teaching English as a Foreign Language) and had 3 years of English-teaching experience. The second rater had 6 years of English-teaching experience and held certificates of TEFL and TESOL (Teaching English to Speakers of Other Languages). When the two raters could not reach an agreement, which was the case for 3.36% of the words, a third native English-speaking rater with a British accent made the final judgment.
The inter-rater reliability was measured by the S score (Bennett et al., Reference Bennett, Alpert and Goldstein1954), an index of the reliability of categorical measurements that adjusts the percentage of raters’ agreement for agreement reached by chance. The mean S scores for the homophone, silent-letter, and pronounced-letter read-aloud tasks were .96 (SD = .06), .92 (SD = .1), and .90 (SD = .14), respectively, indicating high inter-rater reliability. Results from all 30 Cantonese and 30 Mandarin participants are presented first, followed by results involving a direct comparison of the 13 Mandarin and 13 Cantonese participants matched on L2 proficiency.
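The chance adjustment in the S score can be illustrated with a minimal sketch. It assumes the standard formulation S = (k × Po − 1) / (k − 1), where Po is the observed proportion of agreement and k is the number of coding categories; treating each response as a two-category judgment (error vs. no error) is an assumption made here, and the function name `s_score` and the example codes are hypothetical.

```python
def s_score(rater_a, rater_b, n_categories=2):
    """Bennett, Alpert, and Goldstein's S for two raters.

    Chance agreement is taken to be 1 / n_categories, so with two
    categories S reduces to 2 * Po - 1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    p_observed = agreements / len(rater_a)
    return (n_categories * p_observed - 1) / (n_categories - 1)

# Hypothetical codes for 20 responses (1 = error, 0 = no error):
# the raters disagree on exactly one response.
rater_1 = [1] * 5 + [0] * 15
rater_2 = [1] * 5 + [0] * 14 + [1]
print(round(s_score(rater_1, rater_2), 2))
```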
Word familiarity check
All the word stimuli were rated as highly familiar: homophones (M = 6.47, SD = .50), silent-letter words (M = 6.57, SD = .62), and pronounced-letter words (M = 6.62, SD = .54). There was no significant difference on subjective word familiarity ratings between Cantonese and Mandarin participants for the stimuli.
Homophone read-aloud task
Mean error rates for Cantonese and Mandarin participants were 13.7% (SD = 12.6%) and 18.3% (SD = 7.3%), respectively. Both Cantonese and Mandarin participants had the highest error rates in pronouncing the homophonic word pairs “pear, pair” (M Cantonese = M Mandarin = 66.7%) and “sweet, suite” (M Cantonese = 53.3% and M Mandarin = 83.3%).
Silent-letter read-aloud task
The mean error rates in the silent-letter read-aloud task for the Cantonese and Mandarin participants were 26.7% (SD = 15.4%) and 19.3% (SD = 14.6%), respectively. Appendix H shows the mean error rate for each stimulus. A mixed ANOVA revealed that participants had a significantly higher error rate for the silent letter “b” (M = 52.5%, SD = 34%) compared to other target letters (M = 4.83%, SD = 9.48%), F(1, 58) = 118.8, p < .001, regardless of participants’ L1.
Unlike Experiment 1, Cantonese participants in this experiment showed a higher consistency on stimuli with silent letter “b”. A Chi-Square test of independence showed that the distribution of Cantonese participants’ epentheses across the silent letter “b” stimuli, “climb”, “lamb”, “doubt”, and “debt” was not significantly different, χ2 (3) = 2.83, p = .419. Mandarin participants had a lower error rate for the stimulus “doubt” (M = 23.3%) compared to the other silent letter “b” stimuli (mean range = 43.3%–63.3%). A Chi-Square test of independence showed that the distribution of Mandarin participants’ epentheses across the silent letter “b” stimuli, “climb”, “lamb”, “doubt”, and “debt” was significantly different, χ2 (3) = 9.83, p = .02. This inconsistency in Mandarin participants across stimuli with silent letter “b” implies that the orthographic effect is likely to come from the L2 instead of the L1. None of the participants in this study pronounced /l/ in the word “walk”. Only 10% of the participants produced /l/ in “calm”. Both groups of participants had a low error rate of pronouncing the “d” sound and the “h” sound (mean range = 3.33%–6.67%). The higher consistency in Experiment 2 might be attributed to a better control of word familiarity.
In contrast to the silent-letter words, only 1.67% of participants failed to pronounce /b/ in the four pronounced-letter words. The word “bulb”, which contains a pronounced “l” sound, generated the highest percentage of errors (M Cantonese = 66.7%, M Mandarin = 40%). For “d”, however, Cantonese participants tended to devoice it because it occurs in syllable-final position. Participants performed well with “h” in both silent-letter and pronounced-letter words.
Orthographic effects and L2 proficiency
Participants’ English proficiency was indicated by the percentage of correctness from LexTALE. Cantonese participants (M = 68, SD = 8.57) had a significantly higher LexTALE percentage-correctness than Mandarin participants (M = 59.7, SD = 9.12), t(58) = 3.65, p < .01. Three Pearson’s correlation tests between homophone read-aloud task error rate, silent-letter read-aloud task error rate, and LexTALE percentage-correctness were conducted separately for Cantonese and Mandarin participants with a Bonferroni-adjusted alpha level of .016 (0.05/3). Table 6 presents the correlation matrix.
Table 6. Correlation Matrix for Error Rates in the Homophone Read-aloud Task, the Silent-Letter Read-aloud Task, and the LexTALE Percentage-correctness in Experiment 2

Note. *p < .016 (Bonferroni-adjusted).
For Cantonese participants, LexTALE percentage-correctness negatively correlated with homophone read-aloud error rate at a Bonferroni-adjusted level (r = −.46, p = .011), while it did not correlate with silent-letter read-aloud error rate (r = −.25, p = .18). No correlation was found between the error rates in the two read-aloud tasks (r = .14, p = .457). For Mandarin participants, neither homophone read-aloud error rate (r = −.36, p = .054) nor silent-letter read-aloud error rate (r = −.25, p = .175) correlated with LexTALE percentage-correctness. No correlation was found between the error rates from the two read-aloud tasks at a Bonferroni-adjusted level (r = .38, p = .038).
Matching L2 proficiency in the Cantonese- and Mandarin-speaking groups
No significant difference was found in the mean error rates in the homophone read-aloud task between the Cantonese (M = 15.8%, SD = 13.7%) and Mandarin matched groups (M = 16.5%, SD = 8.39%), t(24) = −.165, p = .871. An independent-samples t-test on the mean error rates in the silent-letter read-aloud task revealed that the Cantonese matched group (M = 30.1%, SD = 11.9%) performed significantly worse than the Mandarin matched group (M = 19.2%, SD = 12.6%), t(24) = 2.41, p = .024.
Discussion
Experiment 2 examined whether Mandarin-speaking participants would be susceptible to more orthographic effects in English production than Cantonese-speaking participants due to their L1 experience in using transparent Pinyin, which shares its script with English. The results revealed that both Mandarin and Cantonese participants were subject to intra-orthographic effects in L2 phonology. Mandarin and Cantonese participants did not differ in terms of orthographic effects on their homophone pronunciation when the two groups were matched on L2 proficiency. Both Mandarin and Cantonese participants showed similarly high error rates for the same word pairs (“pear, pair” and “sweet, suite”), implying that both groups probably experienced intra-orthographic effects that were attributable to inadequate L2 phonological representations.
However, Mandarin participants performed significantly better than Cantonese participants in silent-letter read-aloud when L2 proficiency was controlled. Our results provided no evidence that Mandarin participants were subject to additional intra-orthographic effects from transferring the reading strategy of Pinyin to English reading. There was also no evidence that Mandarin participants suffered extra inter-orthographic effects through applying incongruent GPCs from Pinyin to English. Cantonese participants seemed to experience more orthographic effects in silent-letter words. Note that these results must be interpreted carefully due to the small sample size and the potential selection bias introduced by matching on L2 proficiency. The conclusions should be considered with caution since they are partly based on null results.
Experiment 2 also tested whether L2 proficiency is related to orthographic effects using LexTALE. Consistent with Experiment 1, L2 proficiency was negatively correlated with homophone read-aloud error rate, but not silent-letter read-aloud error rate, in Cantonese participants. The homophone read-aloud task may tap into a vast array of GPCs, while the silent-letter read-aloud task only taps into a specific set of GPC irregularities. In addition, the performance in silent-letter read-aloud may depend heavily on participants’ individual word knowledge and whether they know the rule of silent letters in English. In contrast to our prediction, L2 proficiency did not seem to correlate with orthographic effects in Mandarin participants.
Compared to Experiment 1, Cantonese participants in Experiment 2 seemed to perform more consistently across stimuli with the same silent letters, which might be attributed to the use of highly familiar words as stimuli. Both Mandarin and Cantonese participants tended to pronounce the silent letter “b” while making few errors for all other silent letters (i.e., “d” and “l”). To explore whether the influence of L1 phonotactics could explain this pattern of results, it is important to look at the phonotactic differences between English and Chinese. Chinese ESL learners typically delete or devoice consonant clusters in English (e.g., /mp/ and /lb/) as consonant clusters are not permissible in either Cantonese or Mandarin (Radant et al., Reference Radant, James and Huang2009). All silent letters in our stimuli are part of a consonant cluster in that they are preceded and/or followed by another consonant. If our Cantonese and Mandarin participants tended to delete an element in consonant clusters in English due to L1 phonotactic influences, deletion should have happened to all silent letters, including “b”. But that did not happen.
In our stimuli, the silent letter “b” is always in final position, except for “debt”, whereas the other silent letters, “d” and “l”, are in medial positions. The consonant /b/ can only be in word-initial position and is not permissible in word-final position in either Cantonese or Mandarin. Due to this phonotactic constraint from L1, native Cantonese speakers tend to omit the final obstruents in English (Chan, Reference Chan2006; Chan & Li, Reference Chan and Li2000; Edge, Reference Edge1991). Similarly, final-consonant deletion is also prevalent in Mandarin ESL speakers (Broselow et al., Reference Broselow, Chen and Wang1998; Lin & Johnson, Reference Lin and Johnson2010) as obstruent consonants are not permissible in word-final position in Mandarin. Therefore, for both Cantonese and Mandarin participants, pronouncing the silent letter “b” in word-final position could not be explained by the influence of L1 phonotactics. Taken together, the intra-orthographic effects observed among our participants were more likely to come from the L2 itself than from the influence of L1 phonotactics.
General discussion
This study aimed to investigate whether native users of a non-alphabetic language with an opaque writing system, namely Cantonese-speaking ESL learners (Experiments 1 and 2) and Mandarin-speaking ESL learners (Experiment 2), were subject to intra-orthographic effects on L2 production. It also examined whether L2 proficiency (Experiments 1 and 2) and phonological awareness (Experiment 1) were related to intra-orthographic effects. Experiment 1 demonstrated that Cantonese ESL learners were subject to intra-orthographic effects on their L2 pronunciation. Cantonese ESL learners have an orthographically opaque L1 such that their L1 reading strategy does not rely heavily on orthography to decode phonology. The result implied that transfer of an L1 reading strategy that relies on highly consistent orthography-phonology correspondences might not be the key factor underlying intra-orthographic effects. Higher English proficiency and higher phonological awareness were related to greater resistance to the negative influence of L2 orthography in Cantonese participants.
Experiment 2 systematically compared Cantonese ESL learners and Mandarin ESL learners, who use transparent Pinyin to represent the pronunciation of Chinese characters, while controlling for English proficiency between the groups. Cantonese participants were subjected to greater orthographic effects on silent-letter read-aloud than Mandarin participants, whereas no significant difference was found for homophone read-aloud. Mandarin participants did not seem to be subjected to additional inter-orthographic effects due to their training in Pinyin. Consistent with Experiment 1, the homophone read-aloud error rate was negatively correlated with L2 proficiency in Cantonese participants. Unexpectedly, L2 proficiency was not related to orthographic effects in Mandarin participants.
The role of L1 transparency in intra-orthographic effects
Sokolović-Perović et al. (Reference Sokolović-Perović, Bassetti and Dillon2020) found that L2 speakers, whose L1 has both transparent scripts and opaque scripts, were still influenced by L2 orthographic forms in their L2 pronunciation. However, the intra-orthographic effects may not be solely due to the transfer of the L1 reading strategy that relies on orthography to access phonology as suggested by Sokolović-Perović et al. (Reference Sokolović-Perović, Bassetti and Dillon2020). Our findings on Cantonese speakers with no history of using transparent orthography in their L1 ruled out the transfer of the L1 reading strategy as the dominant factor contributing to intra-orthographic effects.
One possibility is that the Cantonese participants used different methods when approaching Cantonese and English reading because the two writing systems are fundamentally different. The two discrepant scripts might have prevented native Cantonese participants from directly applying their L1 reading strategies to L2. Meanwhile, it was naturally easy to use orthographic information as a reference to phonology in English reading. Therefore, they could adopt different reading strategies in L1 and L2 or something in between (i.e., a “hybrid” strategy as discussed earlier). To some extent, the finding that Pinyin did not exert additional intra-orthographic effects in our Mandarin participants supports this speculation, as discussed in a later section.
For ESL learners who use non-alphabetic and opaque L1 scripts, intra-orthographic effects on their English production could be due to the use of non-native-like GPCs to decode English graphemes. Non-native-like L2 GPCs can be established and strengthened through interaction between L2 orthographic and phonological input during L2 development (Sokolović-Perović et al., Reference Sokolović-Perović, Bassetti and Dillon2020). It is possible that more exposure to written than spoken input during L2 learning partially explains the strong orthographic effects on L2 pronunciation (Sokolović-Perović et al., Reference Sokolović-Perović, Bassetti and Dillon2020). In addition, it is likely that L2 phonological representations are altered by L2 orthography during the L2 learning process. L2 learners typically learn the L2 in a classroom setting that emphasizes reading and writing more than listening and speaking (Richards & Rodgers, Reference Richards and Rodgers2014). Therefore, with high exposure to L2 orthography, L2 learners might use L2 orthographic information to specify L2 phonological representations in a way that native speakers do not. As implied by Muneaux and Ziegler (Reference Muneaux and Ziegler2004), such restructuring might vary unsystematically from item to item. This may help explain why participants’ performance was sometimes inconsistent across stimuli of the same kind, as observed in the silent-letter read-aloud tasks in this study.
The role of L2 proficiency in orthographic effects
Experiment 1 used participants’ HKDSE scores in English, obtained right before their university admission, as an indication of their English proficiency. Experiment 2 used LexTALE, a valid and standardized test that participants completed during the experiment. Although HKDSE scores were limited by the time gap between test-taking and the experiment, the two experiments consistently showed that higher English proficiency was associated with weaker orthographic effects on L2 production. This suggests that the HKDSE score in English was an acceptable measure of L2 proficiency.
Most importantly, this result was also in line with the findings in Bassetti et al. (Reference Bassetti, Mairano, Masterson and Cerni2020), Escudero et al. (Reference Escudero, Simon and Mulak2014), and Veivo and Järvikivi (Reference Veivo and Järvikivi2013). Veivo and Järvikivi (Reference Veivo and Järvikivi2013) pointed out that lower-proficiency learners differ from higher-proficiency learners in terms of the quality of their orthographic and phonological representations. Less proficient learners have separate and less accurate orthographic and phonological representations, while more proficient learners have integrated and more accurate orthographic and phonological representations. Therefore, it is possible that lower-proficiency L2 learners rely more on regularities and make more errors in orthographic effect tasks. In contrast, high-proficiency L2 learners have developed more stable orthographic and phonological representations, which prevent them from being misled by irregular GPCs. It is likely that a greater level of phonological awareness gained at an early stage of L2 learning will facilitate the development of greater proficiency at a later stage, as found by Yeung and Chan (Reference Yeung and Chan2013) among Chinese ESL preschoolers. Those L2 learners who are sensitive to phonemic knowledge will be more likely to develop integrated and well-structured orthographic and phonological L2 representations, thereby achieving a higher L2 proficiency.
In contrast to Cantonese speakers, L2 proficiency was not related to orthographic effects in Mandarin speakers. It is likely that this difference stems from individual differences in knowledge of English words or in familiarity with English phonological rules. Different approaches to English teaching are adopted in Hong Kong and in Mainland China. As discussed in McDowell and Lorch (Reference McDowell and Lorch2008), English teaching in Mainland China emphasizes phonetic training and the mapping between written forms and sounds. Nearly all schools in Mainland China teach English phonology using the IPA (International Phonetic Alphabet), a phonetic notation system that uses a set of symbols to represent the sounds of human spoken languages (Cao, Reference Cao2017). It is likely that the use of the IPA helps students become more aware of the pronunciation of words with unusual grapheme-to-phoneme mappings, which they gradually memorize. For Mandarin learners in this study, since our stimuli were English silent-letter words and homophones, the results may have been heavily influenced by their personal knowledge of the stimulus words, that is, by whether they knew the irregular phonological rules in English (e.g., silent-letter word rules) and the particular words with irregular GPCs.
The interference effect of shared orthography
For Mandarin ESL speakers, L1 (Pinyin) and L2 (English) both use alphabetic scripts to represent sounds. To pronounce an English word correctly, they must dissociate each letter from its established Pinyin phonological representation, especially when the GPCs of Pinyin and English are incongruent. However, our Mandarin participants did not seem to show interference from shared orthography. According to the multilevel activation framework of Chinese lexical processing (Taft et al., Reference Taft, Zhu and Peng1999), the phonological unit is activated at the character level and multicharacter level instead of the radical level. Also, Mandarin is a tonal language in which the pronunciation of a word depends on its tone. It is possible that syllable-based and tonal processing in Mandarin makes it hard to readily transfer Pinyin GPCs to English, which emphasizes phonemic and segmental processing (Bialystok et al., Reference Bialystok, McBride-Chang and Luk2005).
While shared orthography between the L1 and the L2 has been found to be harmful in L2 speech production (Escudero et al., Reference Escudero, Simon and Mulak2014), we cannot rule out the possibility that Pinyin training increased Mandarin participants’ phonological awareness, which helped them better learn English GPCs. Indeed, experience with alphabetic scripts allows Mandarin speakers to decompose words into their constituent sounds (Cheung & Chen, Reference Cheung and Chen2004; Shu et al., Reference Shu, Peng and McBride-Chang2008). Our study did not test phonological awareness in Mandarin participants. Future studies could use phoneme-related tasks with pseudoword stimuli that manipulate GPC congruence between Pinyin and English.
Previous findings were inconsistent regarding whether shared orthography between Pinyin and English influenced L2 phonological representations. A recent study testing native English-speaking children from a Chinese immersion program revealed that Pinyin knowledge was not harmful to English literacy (Lü, Reference Lü2017). In the study by Pytlyk (Reference Pytlyk2011), native English speakers who learned Mandarin as an L2 showed no interference from learning Pinyin compared to learning with no orthography or with a non-Roman phonetic script (Zhuyin). In contrast, Hayes-Harb and Cheng (Reference Hayes-Harb and Cheng2016) reported an interference effect: native English speakers learning Mandarin via Pinyin performed worse than those who learned via Zhuyin, especially for stimuli that had incongruent phonological forms between Pinyin and English. Bassetti (Reference Bassetti, Andreas, Xin and Yexin2007) also revealed a negative effect of Pinyin learning for English learners of Mandarin.
Differences in stimuli and participants might explain discrepancies in these findings. For example, Bassetti (Reference Bassetti, Andreas, Xin and Yexin2007) specifically tested three vowel pairs, whereas Pytlyk (Reference Pytlyk2011) and Hayes-Harb and Cheng (Reference Hayes-Harb and Cheng2016) manipulated different consonants. Note that the stimuli used in Hayes-Harb and Cheng (Reference Hayes-Harb and Cheng2016) were quasi-Mandarin forms that are not possible in Mandarin, created for the purpose of experimental manipulation. General stimuli without manipulation were used in the present study. Further studies are encouraged to use controlled stimuli with careful manipulation to examine the possible interference effect between Pinyin and English. In addition, Pytlyk (Reference Pytlyk2011) and Hayes-Harb and Cheng (Reference Hayes-Harb and Cheng2016) studied naïve L2 learners, while Bassetti (Reference Bassetti, Andreas, Xin and Yexin2007) and the current study investigated experienced L2 learners. This is crucial because it is not known whether the potential interference effect (if any) is short-term or long-lasting, or whether it differentially affects participants with different L2 proficiency levels.
Conclusions
In summary, our study examined the role of L1 transparency in intra-orthographic effects on L2 pronunciation by studying language learners with a non-alphabetic and orthographically opaque L1 and an alphabetic L2. Findings from our Cantonese and Mandarin ESL learners suggest that, at least for Chinese script users, the transfer of an L1 reading strategy that relies on orthography to decode phonology is unlikely to be the only factor behind intra-orthographic effects. No extra intra-orthographic effects were found in our Mandarin speakers, who were also literate in Pinyin, an alphabetic system that uses the same script as English. L2 proficiency and phonological awareness skills seem to be important predictors of orthographic effects on L2 pronunciation in ESL learners with a non-alphabetic L1.
Acknowledgements
Kit Ying Chan is the final-year project supervisor for Wenxiyuan Deng, and part of this research study was conducted in partial fulfillment of the requirements for the Bachelor of Social Science in Psychology. Kit Ying Chan is the master’s thesis supervisor for Ka Man Au Yeung, and part of this research study was conducted in partial fulfillment of the requirements for the Master of Arts in Applied Social Sciences.
Replication Package
Replication data and materials for this article can be found at https://osf.io/njd7z/.