What and where is the word?

Catherine McBride-Chang; Hsuan-Chih Chen; Benjawan Kasisopa; Denis Burnham; Ronan Reilly; Paavo Leppänen

doi:10.1017/S0140525X1200009X

Published online by Cambridge University Press: 29 August 2012

Catherine McBride-Chang: Affiliation:
Department of Psychology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong. cmcbride@psy.cuhk.edu.hk; http://www.psy.cuhk.edu.hk/en/people/cmcbride/cmcbride.php
Hsuan-Chih Chen: Affiliation:
Department of Psychology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong. hcchen@psy.cuhk.edu.hk; http://www.psy.cuhk.edu.hk/en/people/hcchen/hcchen.php
Benjawan Kasisopa: Affiliation:
MARCS Institute, University of Western Sydney–Bankstown, Milperra, NSW 2144, Australia. b.kasisopa@uws.edu.au; http://marcs.uws.edu.au/?q=people/benjawan-kasisopa
Denis Burnham: Affiliation:
MARCS Institute, University of Western Sydney–Bankstown, Milperra, NSW 2144, Australia. d.burnham@uws.edu.au; http://marcs.uws.edu.au/?q=people/professor-denis-burnham
Ronan Reilly: Affiliation:
MARCS Institute, University of Western Sydney–Bankstown, Milperra, NSW 2144, Australia; Department of Computer Science, National University of Ireland, Maynooth, County Kildare, Ireland. ronan.reilly@nuim.ie; http://www.cs.nuim.ie/~ronan/
Paavo Leppänen: Affiliation:
Department of Psychology, University of Jyväskylä, FIN-40014 Jyväskylä, Finland. paavo.ht.leppanen@jyu.fi; https://www.jyu.fi/ytk/laitokset/psykologia/henkilokunta/leppanen_p

Abstract

Examples from Chinese, Thai, and Finnish illustrate why researchers cannot always be confident about the precise nature of the word unit. Understanding ambiguities regarding where a word begins and ends, and how to model word recognition when many derivations of a word are possible, is essential for universal theories of reading applied to both developing and expert readers.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 35, Issue 5, October 2012, pp. 295–296

Copyright: Copyright © Cambridge University Press 2012

Frost's main argument rests on the somewhat tenuous assumption that the concept of word recognition is constant across scripts. His theory and other theories attempting to model visual word recognition in alphabetic languages have been based on single-syllable words, though there have been recent attempts to explain how longer words, for example those with prefixes and suffixes, are read (Grainger & Ziegler 2011; Perry et al. 2010). Most such models of visual word recognition assume that the concept of a word as the unit of analysis in reading is always clear across scripts.

Frost's description of Chinese is particularly problematic in this respect. To be sure, Chinese characters are monosyllabic (he argues that almost all Chinese words are monosyllabic), and most characters can also serve as words. However, the majority of words in Chinese consist of two or more morphemes. Moreover, recognition of both two-morpheme words and characters can be affected uniquely by character frequency as well as by word frequency, in both children and adults. The structures of characters and words also differ, as indicated by experimental manipulations (Liu et al. 2010) that highlight the importance of semantic/morphological information in Chinese (Chen & Shu 2001; Tsang & Chen 2010; Wong & Chen, in press). For example, whereas parents tend to teach children to write words based primarily on phonological information in alphabetic languages such as Hebrew (Aram & Levin 2001; 2004), they almost never focus on phonological information in teaching words or characters in Chinese (Lin et al. 2009; 2012); rather, they point out either visual-orthographic or morphological information as they talk about print.
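
The character-versus-word frequency distinction can be made concrete with a minimal Python sketch. It assumes a hypothetical, hand-made corpus that is already segmented into words (real studies rely on large frequency norms, which are not reproduced here). Because a character's count pools over every word that contains it, character frequency can diverge sharply from the frequency of any single word built on that character.

```python
from collections import Counter

# Hypothetical toy corpus, already segmented into words.
# These counts are illustrative only; they are NOT real Chinese frequency norms.
TOY_CORPUS = [
    ["电脑", "很", "贵"],
    ["我", "打", "电话"],
    ["电话", "没", "电"],
    ["我", "的", "电脑"],
]


def word_frequencies(corpus):
    """Count how often each word (single- or multi-character) occurs."""
    return Counter(word for sentence in corpus for word in sentence)


def character_frequencies(corpus):
    """Count how often each character occurs, regardless of which word contains it."""
    return Counter(char for sentence in corpus for word in sentence for char in word)


words = word_frequencies(TOY_CORPUS)
chars = character_frequencies(TOY_CORPUS)

print(chars["电"], words["电"])       # 5 1
print(words["电脑"], words["电话"])   # 2 2
```

In this toy corpus, 电 occurs five times as a character but only once as a free-standing word; this is the kind of dissociation that allows character frequency and word frequency to be manipulated separately in experiments.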

The actual unit of reading as word versus character continues to be debated in Chinese (Chen et al. 2003). A crude analogy in English might be that wallpaper and paperweight, or toenail and tiptoe, are all common words consisting of common morphemes (e.g., paper; toe). Yet not all English speakers analyze these words as made up of separate morphemes, or acquire each morpheme of a compound word before learning the compound itself; some compound words are learned holistically first. Chinese multimorphemic words are often learned as single entities, with the individual characters comprising them becoming salient only when explicitly analyzed. The way in which Chinese appears on a page, with each character equally spaced and no differences in spacing to distinguish conceptual words, may also make the concept of a word confusing. This issue does not arise for the alphabetic languages highlighted by Frost, because words are spatially distinguished in these orthographies.

However, this is not the case in Thai, another alphabetic script. Thai has no spaces between words, so words cannot be defined by spacing. Importantly, whereas spacing between words in Chinese does not change the speed of reading for adults, in Thai (artificial, experimental) spacing between words actually facilitates word reading, particularly for poor readers (Kasisopa et al. 2010). In Thai, word-position frequencies (i.e., statistical regularities) of particular graphemes within words are used to help identify where a word is likely to begin and end. For example, graphemes that have a high probability of occurring as the initial or final grapheme of a word may help readers define word boundaries: They effectively direct eye movements to the optimal viewing position in a word (Kasisopa et al. 2010). Nevertheless, defining words in Thai remains relatively ambiguous, because word segmentation relies heavily on sentential context (Aroonmanakun 2007). Thus, what is segmented as a single word in one sentence may be a phrase or even a sentence in another context. For example, the string “คนขับรถ” can be a single word or a whole sentence, depending on the context in which it occurs: It is a single word in the sentence คนขับรถเป็นผู้ชาย, that is, คนขับรถ (driver) เป็น (is) ผู้ชาย (man) (“the driver is a man”), but a three-word phrase in the sentence คนขับรถเร็ว, that is, คน (man) ขับ (drive) รถ (car) เร็ว (fast) (“the man drives the car fast”). In addition, many Thai word strings are ambiguous. For example, ตากลม can be read either as ตาก + ลม (exposed + wind = “exposed to wind”) or ตา + กลม (eye[s] + round = “round eyes”).
Such ambiguities and the aforementioned contextual effects make it difficult to design automatic segmentation strategies, because the number of decisions that cannot be made by machine is surprisingly high (Aroonmanakun 2002). Extensive top-down processing is required to resolve these ambiguities. Thus, in both Chinese and Thai, reading requires a certain amount of flexibility before the word can emerge as a clear and salient unit of recognition.
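
The segmentation problem can be illustrated with a minimal Python sketch that enumerates every way to split a string into words from a dictionary. The tiny lexicon is a hypothetical stand-in containing only the words from the examples above, and exhaustive dictionary lookup is a generic technique, not the collocation-based method Aroonmanakun describes. The sketch only shows that lookup alone returns several candidate segmentations; choosing among them still needs context.

```python
from functools import lru_cache

# Hypothetical toy lexicon: only the words needed for the examples in the text.
# A realistic segmenter would use a large dictionary plus statistics or context.
TOY_LEXICON = {"ตาก", "ลม", "ตา", "กลม", "คน", "ขับ", "รถ", "คนขับรถ"}


def all_segmentations(text, lexicon=TOY_LEXICON):
    """Return every way to split `text` into a sequence of lexicon words."""
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Segmentations of text[start:]; the empty suffix has exactly one (no words).
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            prefix = text[start:end]
            if prefix in lexicon:
                for rest in split_from(end):
                    results.append([prefix] + rest)
        return results

    return split_from(0)


for text in ("ตากลม", "คนขับรถ"):
    print(text, "->", [" + ".join(seg) for seg in all_segmentations(text)])
# ตากลม -> ['ตา + กลม', 'ตาก + ลม']
# คนขับรถ -> ['คน + ขับ + รถ', 'คนขับรถ']
```

Both strings come back with two dictionary-consistent readings, so a purely lexical segmenter has no basis for choosing between them without further (top-down or statistical) information.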

For Finnish, an alphabetic orthography with word boundaries marked by spaces, defining the nature of a visual word is also potentially important. Because Finnish has a highly inflectional morphology with 15 cases, plural markers, and various clitics, each noun can have more than 2,000 orthographic variants (Niemi et al. 1994). The root also often changes along with inflections. Moreover, because Finnish allows so much compounding, a complex concept that other languages might express with several words can be expressed as a single word, especially for verbs. For example, “lukea” is the basic form of “to read,” whereas “(vielä) luettuammekin…” means “(even) after we had also read…”. In the latter example, the stem change (“luettua”) denotes the past event, “mme” is an inflectional form for “we,” and “kin” is a common ending meaning “also.” These characteristics have prompted Hirsimäki et al. (2006) to suggest that, for Finnish, “traditional models based on full words are not very effective” (p. 539). Rather, these researchers advocate a word fragment model. Thus, expert Finnish readers efficiently use frequently occurring sub-units of words for which there are well-formed/strong representations and which often do not obey classical word or syllable boundaries.
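
The word fragment idea can be illustrated with a minimal Python sketch: given an inventory of sub-word units, split an inflected form into the fewest units that cover it, using dynamic programming. The inventory is a hypothetical, hand-picked set chosen only to reproduce the luettua + mme + kin analysis above. Hirsimäki et al. learn their units from data with an unsupervised method and apply them to speech recognition, so this is a simplified stand-in, not their model.

```python
# Hypothetical, hand-picked inventory of word fragments ("morphs").
# It is NOT learned from data; it only reproduces the split described in the text.
TOY_MORPHS = {"luke", "a", "luett", "ua", "luettua", "mme", "kin"}


def segment_into_morphs(word, morphs=TOY_MORPHS):
    """Split `word` into the fewest fragments from `morphs`; return None if no split exists."""
    n = len(word)
    # best[i] is the shortest fragment sequence covering word[:i],
    # or None if word[:i] cannot be covered by the inventory.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in morphs:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


print(segment_into_morphs("lukea"))          # ['luke', 'a']
print(segment_into_morphs("luettuammekin"))  # ['luettua', 'mme', 'kin']
print(segment_into_morphs("luettuamme"))     # ['luettua', 'mme']
```

Minimizing the number of fragments is a simplifying assumption; a data-driven model would instead weigh candidate fragments by how often they occur. The point the sketch preserves is that a small inventory of reusable sub-units can cover many inflected forms, including ones never seen as whole words.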

Given these three examples, it is not entirely clear that there can or should be a universal model of visual word recognition. Indeed, what constitutes a word across scripts may be somewhat ambiguous, though dimensions of orthography, phonology, morphology, syntax, and semantics are clearly universal components of reading.

References

Aram, D. & Levin, I. (2001) Mother–child joint writing in low SES: Sociocultural factors, maternal mediation, and emergent literacy. Cognitive Development 16:831–52.

Aram, D. & Levin, I. (2004) The role of maternal mediation of writing to kindergartners in promoting literacy in school: A longitudinal perspective. Reading and Writing 17:387–409.

Aroonmanakun, W. (2002) Collocation and Thai word segmentation. In: Proceedings of the Fifth Symposium on Natural Language Processing and the Fifth Oriental COCOSDA (International Committee for the Coordination and Standardization of Speech Databases and Assessment Techniques) Workshop, ed. Theeramunkong, T. & Sornlertlamvanich, V., pp. 68–75. Sirindhorn International Institute of Technology.

Aroonmanakun, W. (2007) Thoughts on word and sentence segmentation in Thai. In: Proceedings of the Seventh Symposium on Natural Language Processing, Pattaya, Thailand, December 13–15, 2007, ed. Kawtrakul, A. & Zock, M., pp. 85–90. Kasetsart University.

Chen, H.-C. & Shu, H. (2001) Lexical activation during the recognition of Chinese characters. Psychonomic Bulletin and Review 8:511–18.

Chen, H.-C., Song, H., Lau, W. Y., Wong, E. & Tang, S. L. (2003) Developmental characteristics of eye movements in reading Chinese. In: Reading development in Chinese children, ed. McBride-Chang, C. & Chen, H.-C., pp. 157–69. Praeger.

Grainger, J. & Ziegler, J. (2011) A dual-route approach to orthographic processing. Frontiers in Psychology 2:54. doi:10.3389/fpsyg.2011.00054. (Web journal, online publication).

Hirsimäki, T., Creutz, M., Siivola, V., Kurimo, M., Virpioja, S. & Pylkkönen, J. (2006) Unlimited vocabulary speech recognition with morph language models applied to Finnish. Computer Speech and Language 20(4):515–41.

Kasisopa, B., Reilly, R. & Burnham, D. (2010) Orthographic factors in reading Thai: An eye tracking study. In: Proceedings of the Fourth China International Conference on Eye Movements (CICEM), Tianjin, China, May 24–26, 2010, ed. Shen, D., Bai, X., Yan, G. & Rayner, K., p. 8. Tianjin Normal University.

Lin, D., McBride-Chang, C., Aram, D., Levin, I., Cheung, R. Y. M. & Chow, Y. Y.-Y. (2009) Maternal mediation of writing in Chinese children. Language and Cognitive Processes 24(7–8):1286–311.

Lin, D., McBride-Chang, C., Aram, D., Shu, H., Levin, I. & Cho, J.-R. (2012) Maternal mediation of word writing in Chinese across Hong Kong and Beijing. Journal of Educational Psychology 104:121–37.

Liu, P. D., Chung, K. K. H., McBride-Chang, C. & Tong, X. (2010) Holistic versus analytic processing: Evidence for a different approach to processing of Chinese at the word and character levels in Chinese children. Journal of Experimental Child Psychology 107:466–78.

Niemi, J., Laine, M. & Tuominen, J. (1994) Cognitive morphology in Finnish: Foundations of a new model. Language and Cognitive Processes 9(3):423–46.

Perry, C., Ziegler, J. C. & Zorzi, M. (2010) Beyond single syllables: Large-scale modeling of reading aloud with the Connectionist Dual Process (CDP++) model. Cognitive Psychology 61:106–51.

Tsang, Y.-K. & Chen, H.-C. (2010) Morphemic ambiguity resolution in Chinese: Activation of the subordinate meaning with a prior dominant-biased context. Psychonomic Bulletin and Review 17:875–81.

Wong, A. W. K. & Chen, H.-C. (in press) Is syntactic-category processing obligatory in visual word recognition? Evidence from Chinese. Language and Cognitive Processes. doi:10.1080/01690965.2011.603931.