Introduction
English vocabulary has expanded over centuries by ‘borrowing’ lexical items from other languages (Katamba, 2005; Durkin, 2014). Compared with European languages, non-European languages have never been major sources of borrowing in English, and Chinese has remained even more peripheral. Scholars have recorded no more than a few hundred English words of Chinese origin. This, however, does not make it easier to study the etymology and semantics of Chinese loanwords. The complication arises from the various source dialects from which Chinese words were borrowed (Mandarin, Cantonese, Amoy, Hokkien, etc.) and also from transcription processes, in which Chinese logograms are ‘romanised’ into phonetic representations so as to be readable for English speakers. This procedure is easily affected by the transcribers' own cognition and by the transcription systems employed, and the arbitrariness of these variables contributes much to the fact that the orthography of Chinese loanwords, especially those that entered English early, is prone to change. This article aims to shed some light on how the ways of transcription may affect the spelling of Chinese loanwords.
Although English loanwords in Chinese have received much attention, especially among Chinese scholars (e.g. Zhou & Jiang, 2004; Xu & Tian, 2017), the contribution of Chinese to English vocabulary has attracted fewer studies. Up to now, most studies of Chinese borrowings have relied on dictionaries of various kinds rather than on journals or newspapers. The reason might be the notion that a word does not officially enter a language until it is recorded in a dictionary. In fact, studies based on newspaper corpora (e.g. Zhu, 2011) tend to overestimate the influence of the Chinese lexicon on English. Among the dictionaries consulted, the Oxford English Dictionary (2018, henceforth OED) is sometimes used: Chan & Kwok (1985) and Moody (1996) consult the OED on etymology, but few studies (e.g. Cruz–Cabanillas, 2008) construct corpora on the basis of the OED. The most recent research involving OED-based analysis is Durkin (2014), who devotes a paragraph to Chinese borrowings. However, the OED is an ideal source for corpus construction, not only because of its authority but also because the online version allows researchers to search for particular material with great ease. This is why I use the OED in this paper.
This study focuses on the current orthography of Mandarin loanwords that were recorded in English before 1900 (henceforth termed ‘early loanwords’). Loanwords from Cantonese and other dialects are excluded because Mandarin Chinese is phonologically different from other dialects and employs distinctive transcription systems. Moreover, focusing on early recorded loanwords offers the opportunity to see how transcription systems compete with each other: while Mandarin loanwords in the first half of the 20th century are predominantly influenced by the Wade–Giles system (henceforth WG) and those entering English from then on are almost exclusively transcribed in accordance with the Hanyu Pinyin system (henceforth HP), a word that was brought into English in the 19th century or earlier has the chance to be affected by both. These criteria yield 90 words in the OED. Nine are removed, either because their origin is debatable (e.g. the word China, which is unlikely to be of Chinese origin) or because they have become obsolete (e.g. moc-main), leaving 81 words in the corpus. Their time of entry is plotted in Figure 1.
The influence of transcription systems
The transcription systems
An important reason why spelling variation is common among Mandarin loanwords is that ‘[t]here is no unanimity regarding the transcription of Chinese’ (Anderson, 1970: 1). This is mainly due to the nature of the Chinese language: since Chinese is written logographically, the representation of its words is not necessarily connected with their phonology, so that the writing system is shared by different spoken varieties which may be mutually unintelligible. In consequence, the transcription process was heavily influenced by the spoken variety a particular transcriber adopted, and it was hardly subject to any standard rule before the 19th century. Things changed, however, with the introduction and promotion of WG in the late 1860s and HP in the 1950s. Other systems, perhaps as a result of the rise of Sinology, have also been created since. For instance, Anderson (1970) compares five different transcription systems, and in Huang & Xu (2016) the number increases to seven. However, those transcription systems do not carry equal weight: as Moody (1996) observes, many words entering English dictionaries between 1912 and 1949 are in WG, and all loanwords after 1950 use HP romanisation. This is also confirmed in the corpus: it is extremely hard to find a Mandarin loanword transcribed in the Gwoyeu Romatzyh system or the Yale system. A word is either in WG or in HP, or it is currently written without conforming to any of the transcription systems Anderson studies, as shown in Figure 2.
Why are WG and HP much more influential than other systems? There may be several reasons. WG was an early invention by British linguists and prevailed among English speakers, especially the British, before other systems were devised. HP was introduced much later, but it was officially promoted by the Chinese government through ‘a series of documents and policies to ensure the legal status of Hanyu Pinyin in language using and language learning’ (Huang & Xu, 2016: 104), which led a number of foreign media outlets to declare their adoption of the new system (see Mathews, 1978; Chicago Tribune, March 11, 1979). It seems that a transcription system remains no more than a scholar's product unless it is created early or officially promoted.
Transcribing Mandarin before the invention of standard systems
The transcription of Mandarin loanwords was not entirely chaotic before the invention of transcription systems. Some patterns can be observed in early transcriptions:
1. the letter k is frequently used to represent the sound [tɕ] or [tɕʰ], as in Ki (pronounced [tɕʰiː] in Chinese) or Tai ki ([ˈtɑɪˈtɕiː]);
2. ts is occasionally found as well for [tɕʰ], as in Tsing ([tɕʰɪŋ]);
3. doubled vowels (e.g. oo, ee) are seen in loanwords like mafoo or wampee;
4. h is sometimes added after a vowel, as in Chah or Peh tong.
Those patterns may reflect what Peperkamp (2005: 346) calls ‘perceptual assimilation,’ during which process ‘non-native sound structures are assimilated to ones that are well-formed in the native language’. In other words, people tended to use established sounds in English to approximate the sounds of Mandarin (Cruz–Cabanillas, 2008). The alveolo-palatal affricates [tɕ] and [tɕʰ], for example, are assimilated either as a velar plosive k or as a dento-alveolar affricate ts. Where a Mandarin vowel sounds longer than its English counterpart, the letter that represents the sound is doubled, a strategy already used to transcribe new words in Old English (Crystal, 2013). Finally, the combinations ‘ah’ and ‘eh’ are a bit trickier. Moody (1996: 412) briefly mentions h after the vowel when he discusses the etymology of mah-jong, a traditional Chinese board game, suggesting that h represents ‘a marking of a Cantonese lower-range tone’. However, as the author himself acknowledges, mah-jong might rather be of Mandarin origin, because in Cantonese the name of this game is pronounced like mah-jeuk, in which the second character jeuk is evidently pronounced differently from jong (cf. in Mandarin the same game is transcribed in WG as ma-chiang and in HP as ma-jiang). Instead, as both Wade (1867) and Giles (1892) explain, h indicates the shortness of the vowel before it, a characteristic of the Mandarin ‘entering tone’. The entering tone is typical of some dialects (e.g. the Wu dialect) in southeast China but is absent from the Peking (now Beijing) dialect (Giles, 1892: xxxviii). As both the later version of WG and HP were developed on the basis of the Peking dialect, the final symbol h gradually died out, making it difficult to determine the etymology of some early imported Chinese words.
However, since Shanghai, Ningbo and Wenzhou (noted by Giles, 1892, as ‘Ningpo’ and ‘Wenchow’), located around the Yangtze River, are among the earliest cities that had a commercial connection with the British Empire, it is reasonable to infer that the early scribes were heavily influenced by those southeast dialects. As a result, some early Chinese loanwords, including Chah and Peh, end with h, and the influence carries on to mah-jong.
Such patterns are neither consistent in form nor applicable to every word, revealing the subjectivity of early scribes. The use of the final h also shows how regional dialects exerted their influence on Mandarin. In fact, Mandarin was not established as the standard spoken language of Chinese before the 20th century, and the dialect itself was not homogeneous but a collection of several spoken varieties that share many similarities (for a comprehensive summary of the dialects of Chinese, see Giles, 1892). As a result, early transcriptions are highly individual and cannot be standardised.
On the other hand, the patterns discussed above reflect the fact that Mandarin was in the process of becoming a dominant dialect, and the prominence of Mandarin made the pursuit of standardisation possible. For convenience of comparison I will refer to transcriptions of Mandarin before WG as ‘Early Transcriptions’ (henceforth ETs). The evidence presented above indicates that ETs show a trend towards systematicity, though it is worth noting that ETs are not deliberately designed transcription systems like WG or HP, but only attempts to fit Chinese loanwords into the established phonetic and phonological conventions of English.
Diachronic changes in spelling: the influence of transcription systems
Influential transcription systems are powerful, not only in dominating the transcription of Chinese words in a particular period but also in altering the orthography of words that were absorbed into English vocabulary before their invention. Ideally, had no spelling changed, all 81 words in the corpus would be written in ETs. Nevertheless, as Figure 2 suggests, more than half of the early loanwords may have undergone some orthographic change, since their current forms accord with the later-invented WG or HP. Yang (2009: 101) briefly discusses how HP refreshes the spelling of an existing word written in WG (e.g. from chiao to jiao) in the 20th century. Yet the reality is even more complex with early loanwords. As I observed in the OED, this refreshment may happen twice, with WG superseding the original spelling, which is then replaced by HP.
As we can imagine, there are several possible routes through which changes took place: for example, a given word might be altered by WG alone (noted as ETs→WG) or by both WG and HP (ETs→WG→HP), or it may skip the influence of WG and be changed directly by HP (ETs→HP), or it may resist any change (ETs). There are also six words (ti-tzu, tou, tung, Wei, wei ch'i and wen li), all of which entered the English lexicon between 1870 and 1900, that were originally written in WG. Though WG was not finally established until the publication of Herbert A. Giles's Chinese-English Dictionary in 1892, Sir Thomas Wade's textbook already prevailed in Britain, as ‘the members of the British Consular Service in China … begin their Chinese studies with the Tzu Erh Chi …’ (Giles, 1892: vi). It is not surprising that a few words were imported after 1870 in the WG form or at least in a form obviously inspired by the WG rules (the syllable tzu and the aspiration marker are no doubt Wade's inventions; see Wade, 1867: ix). Therefore, those six words are marked as WG, and in theory they could also be influenced by HP (WG→HP). All 81 words in the corpus are examined against these six hypothetical routes, with the result shown in Figure 3.
Figure 3 shows that WG exerts a much more powerful influence on orthographic change than HP: 59% of all early loanwords are affected by WG in some way (ETs→WG, ETs→WG→HP or WG), but for HP the figure plummets to less than 9% (ETs→WG→HP). No word in the corpus changes through the routes ETs→HP or WG→HP. It is quite clear that WG is the major factor in orthographic change over time.
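The route classification described above can be sketched in a few lines of Python. This is only an illustrative sketch: the encoding of spelling histories, the names `HISTORIES`, `ROUTES`, `classify_route` and `share_affected`, and the four-word sample are assumptions introduced here, with the histories read off the examples discussed in this article rather than taken from the full corpus.

```python
from collections import Counter

# Hypothetical encoding: each word maps to the systems of its attested
# spellings in chronological order ("ET" = early transcription,
# "WG" = Wade-Giles, "HP" = Hanyu Pinyin). Only four words are included,
# with histories read off the examples discussed in the article.
HISTORIES = {
    "t'ai chi": ["ET", "WG"],       # Tai ki -> T'ai chi
    "p'o": ["ET", "WG"],            # peh -> p'o
    "qi": ["ET", "WG", "HP"],       # ki -> chi -> qi
    "ti-tzu": ["WG"],               # first recorded in a WG form
}

# The six routes named in the text, keyed by the sequence of systems.
ROUTES = {
    ("ET",): "ETs",
    ("ET", "WG"): "ETs→WG",
    ("ET", "WG", "HP"): "ETs→WG→HP",
    ("ET", "HP"): "ETs→HP",
    ("WG",): "WG",
    ("WG", "HP"): "WG→HP",
}


def classify_route(history):
    """Return the route label for a chronological list of systems."""
    collapsed = []
    for system in history:
        # Successive spellings in the same system count as one stage.
        if not collapsed or collapsed[-1] != system:
            collapsed.append(system)
    key = tuple(collapsed)
    if key not in ROUTES:
        raise ValueError(f"not one of the six routes: {key}")
    return ROUTES[key]


def share_affected(routes, system):
    """Fraction of routes in which the given system ('WG' or 'HP') appears."""
    routes = list(routes)
    hits = sum(1 for route in routes if system in route.split("→"))
    return hits / len(routes)


routes = {word: classify_route(h) for word, h in HISTORIES.items()}
counts = Counter(routes.values())
print(counts)
print(share_affected(routes.values(), "WG"), share_affected(routes.values(), "HP"))
```

The shares printed for this toy sample do not reproduce the percentages in Figure 3, which rest on all 81 words; the sketch only shows how a share such as ‘affected by WG in some way’ follows from the route labels.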
Explaining diachronic changes
Time
This dramatic contrast leads to the following question: why did HP not show its strength in refreshing the spelling of early loanwords? Apparently it is not because the early loanwords did not survive to the time when HP was invented: the words marked as obsolete by the OED were excluded from the corpus at the outset, and many of the remaining words, including kow-tow and I Ching, are still active in Present-Day English. But the influence of time cannot be ruled out, since there is more than half a century between the introduction of WG and that of HP. A word might be ‘unstable’ when it was first brought into English and prone to external alteration. As it was used more often, its spelling became ‘fossilised’ and resistant to further influence. For some words, the interval between the first record and the conversion of the spelling to the WG form is relatively short. P'o, for instance, was first recorded by the OED in 1850 in the form peh, and it changed to p'o no later than 1914. The same holds for pailou, which was first introduced in 1836 and evolved to the current form at some point before 1887.
On the other hand, there are plenty of counterexamples where the interval is long enough to cast doubt on whether words are ‘fossilised’ over time. T'ai chi was first recorded in English in 1736 in the spelling Tai ki, and it was altered by WG around 1914, almost two centuries later. Qi is another example: first introduced in 1736 as ki, it was fairly stable until 1897, when the form chi (written in WG) was noted (see Table 1). If there were a fossilisation process in early Mandarin loanwords, WG should not have been powerful enough to reform their spellings more than 150 years later. Therefore, while time could be a factor which favours the early-invented system, it cannot explain everything.
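The intervals behind this argument can be made explicit with a small calculation. The sketch below uses only the dates quoted in the two preceding paragraphs; the names `RESPELLINGS` and `lag_years` are illustrative, and the second year is an upper bound wherever the text says ‘no later than’, ‘before’ or ‘around’.

```python
# (year first recorded, year by which the respelled form is attested),
# taken from the dates quoted in the text. The second year is only an
# upper bound or approximation where the text hedges the date.
RESPELLINGS = {
    "p'o": (1850, 1914),
    "pailou": (1836, 1887),
    "t'ai chi": (1736, 1914),
    "qi": (1736, 1897),
}


def lag_years(first_recorded, respelled_by):
    """Years between the first record and the attested respelling."""
    return respelled_by - first_recorded


lags = {word: lag_years(*years) for word, years in RESPELLINGS.items()}

# List the words from the shortest to the longest interval.
for word, lag in sorted(lags.items(), key=lambda item: item[1]):
    print(f"{word}: {lag} years")
```

The two short intervals (pailou and p'o) and the two long ones (qi and t'ai chi) fall into clearly separate groups, which is why time alone does not account for the pattern.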
The design of WG and HP
Another possible factor lies within the systems themselves: WG is more powerful because native English speakers prefer it, as it is relatively successful in assimilating the pronunciation of Chinese into English. Although both transcription systems were originally designed not to transliterate Chinese into other languages but to assist with learning Chinese, Wade took into consideration how the targeted learners, i.e. native English speakers, could learn to pronounce Chinese correctly without interference from their knowledge of English phonology (see Wade, 1867: ix for his justification of employing hs and dismissing sz). On the other hand, the primary targets of HP are Chinese children who learn Mandarin as their first language, though the designers also intended to make the system ‘internationally acceptable’ (Clement, 2002: 45). Despite various studies of how HP enhances learning Chinese (e.g. Wang et al., 2006; Lü, 2017) and one study that claims that ‘a switch of Chinese romanization standard from Wade–Giles to Pinyin is likely to gain support from library users of Chinese collections’ (Young, 1992: 28), there is no evidence showing that HP is more welcome in the English-speaking world. On the contrary, native English speakers occasionally complain that HP ‘introduced new problems by arbitrarily assigning unused letters on the typewriter keyboard to other sounds, such as “q” for one version of the “ch” sound.’ (Brooks & Keliher, 2002). Huang & Xu (2016: 109) also admit that ‘[t]he usage of some letters showing certain sounds is not in accordance with the international principles’. The design of HP involves the redefinition of some letters from the Latin alphabet, which many native English speakers find bizarre.
Some redefined letters and their pronunciation according to IPA, as contrasted with WG, are summarised in Table 2.
It is not difficult to see how the HP symbols confuse English speakers: c is pronounced neither as [k] as in cook nor as [s] as in peace, but very much like ts in cats. Similarly, q in HP is never pronounced as [k] as it is in English, but sounds close to ch in child in spite of being alveolo-palatal. The confusion is exacerbated when native speakers of English see x frequently appearing as the initial letter, as in xia or xu, as those combinations are impossible in English. Without proper training in Chinese it is hard for English speakers to be aware that xia sounds a bit like shia or sia (cf. WG notes it as hsia). WG is a compromise as well, because it sometimes uses the same combination, e.g. ch, to represent potentially different sounds (the apostrophe in ch’ indicates aspiration, but in practice it is often omitted even in dictionaries like the OED). These sounds are carefully distinguished in HP, but the high degree of discrimination is achieved at the cost of employing and converting exotic symbols (q, j and zh, respectively). Therefore, for anyone who is familiar with English phonology, HP may seem unhelpful when one attempts to transliterate Chinese loanwords.
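The trade-off described in this paragraph can be illustrated with a small lookup table. The following is a sketch under stated assumptions: it covers only seven initials, it uses the standard correspondences between HP, WG and IPA rather than the contents of Table 2 (which is not reproduced here), it ignores the WG variants tz and tz' used before the vowel ŭ, and the names `HP_INITIALS` and `group_by_wg` are hypothetical.

```python
# HP initial -> (WG initial with aspiration mark, IPA value).
# Standard correspondences for initials only; not a full converter.
# Simplification: WG writes tz / tz' instead of ts / ts' before the vowel ŭ.
HP_INITIALS = {
    "j": ("ch", "tɕ"),
    "q": ("ch'", "tɕʰ"),
    "x": ("hs", "ɕ"),
    "zh": ("ch", "ʈʂ"),
    "ch": ("ch'", "ʈʂʰ"),
    "c": ("ts'", "tsʰ"),
    "z": ("ts", "ts"),
}


def group_by_wg(drop_apostrophe):
    """Group HP initials by their WG spelling.

    With drop_apostrophe=True the aspiration mark is removed first,
    imitating the common practice of omitting it in print.
    """
    groups = {}
    for hp, (wg, _ipa) in HP_INITIALS.items():
        key = wg.replace("'", "") if drop_apostrophe else wg
        groups.setdefault(key, []).append(hp)
    return groups


print(group_by_wg(drop_apostrophe=False))
print(group_by_wg(drop_apostrophe=True))
```

With the apostrophe retained, WG ch already covers two HP initials; once it is dropped, four distinct HP initials collapse into the single WG spelling ch. HP keeps them apart, but only by giving j, q, x, c and zh values that English spelling does not suggest.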
While WG represents a process of nativisation (Boberg, 1997) or, in this particular situation, anglicisation, HP may do the reverse. We could see it as a system built not on English phonological conventions but parallel to them. Chinese children who have not been exposed to other languages might find it easy to learn HP, but speakers of other languages could find it difficult to adapt. This is supported by a recent quantitative study by Hayes–Harb & Cheng (2016: 11), in which the researchers find that ‘[n]ative speakers of English who have access to Pinyin … experienced difficulty learning the words’ phonological forms due to the interference from English grapheme-phoneme correspondence’. One result of this difficulty, we may assume, is that native English speakers are reluctant to refresh the spellings of Mandarin loanwords according to the rules of HP.
Conclusion
This article briefly discusses how the orthography of Mandarin loanwords that entered English before 1900 has been changed by transcription systems that were invented later, such as Wade–Giles and Hanyu Pinyin. Although both were influential in the 20th century, Wade–Giles seems to be more powerful than its counterpart in transforming the spelling of the loanwords. Apart from time, the design of the transcription systems may also play a role: the system that is built to assimilate the foreign phonetics to English and causes less confusion receives more opportunities to exert its influence.
ZHEN WU is a PhD student at the Survey of English Usage, University College London, where he received his MA degree in English linguistics. With a CELTA qualification, he taught ESL in China for several years before he continued his academic pursuit in the United Kingdom. His research interests include modern English grammar, corpus linguistics and historical syntax. Currently, he focuses especially on the variations of noun phrases in English and syntactic theories on NPs. Email: zhen.wu.16@ucl.ac.uk
Appendix: Mandarin loanwords before 1900 and their time of entry recorded in OED
cha, chah (1616), chen shu (1655), chin chin (1795), congou (1725), fan-tan (1878), fen (1852), feng-shui (1797), fum (1820), hoey (1865), I Ching (1876), kai-shu (1876), kang (1772), kow-tow, kotow (1804), Kuan (1864), kylin (1857), li (1588), likin (1862), li-shu (1824), Lohan (1878), lü (1655), mafoo (1863), Miao (1834), Miaotze (1810), Ming (1795), mou (1836), moutan (1808), nienhao (1820), oolong (1845), paiban (1884), pailou (1836), paitung (1736), pan (1874), pe-tsai (1788), petuntse (1728), pi (1871), pipa (1839), p'o (1850), pongee (1711), qi (1850), Qin (1790), qin (1839), Qing (1790), san hsien (1839), se (1874), Shang (1669), shang (1887), shen (1847), sheng (1795), Song (1657), suan-pan (1736), Sui (1738), suona (1881), ta chuan (1894), T'ai chi (1736), tan (1886), Tang (1669), tao (1704), taotai (1747), te (1895), tiao (1883), t'ien (1613), t'ing (1853), ti-tzu (1874), tou (1899), ts'ao shu (1876), tung (1788), tutang (1613), wampee (1830), Wei (1894), wei ch'i (1871), wenli (1887), wonk (1900), wu-wei (1859), yamun, yamen (1747), yang (1671), yang ch'in (1876), yao (1834), yin (1671), Yuan (1673), yüeh ch'in (1839), yulan (1822)