“Cmabirdge” reads almost as well as “Cambridge,” but only in some languages. Ram Frost is right to point out that tolerance to letter-position swaps is not a universal feature of reading. His hypothesis that writing systems “optimally represent the languages' phonological spaces” (sect. 3, para. 1) is appealing, and it is indeed a crucial consideration when discussing the possibility of spelling reform: some variations in writing systems may be more “rational” than they first appear (Dehaene 2009, pp. 32–37). Does it follow, however, that current open-bigram models of orthographic processing are, in Ram Frost's words, “ill-advised”? And what is the best strategy for achieving a “universal model of reading”?
From a neuroscientific perspective, much insight can be gained from limited models that consider in detail not only the problems raised by a specific script and language, but also the neurobiological constraints on how the brain might solve them. Our bigram neuron hypothesis, which postulates that the left occipitotemporal visual word form area (VWFA) may contain neurons tuned to ordered letter pairs, was presented in this context as a useful solution to the position-invariant recognition of written words in English, French, and other languages written in the Roman alphabet (Dehaene et al. 2005). A functional magnetic resonance imaging (fMRI) experiment designed to test the predictions of this model demonstrated that reading indeed relies on a hierarchy of brain areas sensitive to increasingly complex properties, from individual letters to bigrams and to higher-order combinations of abstract letter representations (Vinckier et al. 2007). These regions form a gradient of selectivity through the occipitotemporal cortex, with activation becoming more selective for higher-level stimuli towards the anterior fusiform region (Fig. 1) (see also Binder et al. 2006). Interestingly, a similar gradient may also exist for Chinese script (Chan et al. 2009). It would be important to test for such a gradient in Hebrew readers.
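To make the open-bigram idea concrete, here is a minimal Python sketch; it is an illustration, not the implementation of Dehaene et al. (2005). Each word is encoded as the set of ordered letter pairs it contains, allowing at most two intervening letters. The `max_gap=2` window, the function names, and the overlap score are illustrative assumptions. The score is simply the fraction of one word's bigrams that are also present in the other, a crude stand-in for how strongly a population of bigram detectors would respond to both strings.

```python
from itertools import combinations


def open_bigrams(word: str, max_gap: int = 2) -> set[str]:
    """Return the set of ordered letter pairs (open bigrams) in `word`.

    A pair (word[i], word[j]) with i < j is kept when at most `max_gap`
    letters intervene between the two letters; max_gap=2 is an
    illustrative assumption, not a parameter taken from the original model.
    """
    word = word.lower()
    return {
        word[i] + word[j]
        for i, j in combinations(range(len(word)), 2)
        if j - i - 1 <= max_gap
    }


def bigram_overlap(a: str, b: str, max_gap: int = 2) -> float:
    """Fraction of a's open bigrams that also occur in b (between 0 and 1)."""
    bigrams_a = open_bigrams(a, max_gap)
    return len(bigrams_a & open_bigrams(b, max_gap)) / len(bigrams_a)


if __name__ == "__main__":
    # Compare the target with itself, a transposed-letter string, and a
    # control in which the transposed letters are replaced by other letters.
    for probe in ("cambridge", "cmabirdge", "cnobuvdge"):
        print(probe, round(bigram_overlap("cambridge", probe), 2))
```

With these choices, “cmabirdge” shares 17 of the 21 open bigrams of “cambridge,” whereas the control string “cnobuvdge,” in which the transposed letters are replaced rather than swapped, shares only 5. This is one simple way to see why an ordered-pair code tolerates letter transpositions.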
Figure 1. Hierarchical Coding of Letter Strings in the Ventral Visual Stream. Top: Design and examples of the stimuli used, with increasing structural similarity to real words. Bottom: fMRI results. The image illustrates the spatial layout of the sensitivity of the occipitotemporal cortex to letter strings of differing similarity to real words. Activations become more selective for higher-level stimuli (i.e., stimuli more similar to real words) toward the anterior fusiform regions. This is taken as evidence for a hierarchy of brain areas sensitive to increasingly complex properties, from individual letters to bigrams and to higher-order combinations of letters. (Adapted from Vinckier et al. 2007.)
We agree with Frost that developing a more general, language-universal model of reading acquisition is a major goal for future research. Crucially, however, we would add that such a universal model should incorporate strong constraints from brain architecture, not just from linguistics. Existing connectionist models typically incorporate few neurobiological constraints and, as a result, provide information-processing solutions that need not be realistic at the brain level. Reading is a ventral visual stream process that “recycles” existing visual mechanisms used for object recognition (Dehaene 2009; Szwed et al. 2009; 2011; however, see Reich et al. 2011). As such, it is heavily constrained by the limitations of the visual brain, for example, the necessity to process information step by step through distinct visual areas with increasing receptive field sizes (V1, V2, V3, V4, V5, LO, MT …). Implementing these constraints in general models has so far proven very challenging (although see Mozer 1987). Indeed, important advances in the field have been guided predominantly by narrow, language-specific theories that hardwire these constraints into their architectures. Nevertheless, the vast neurobiological knowledge about these regions should ultimately be tapped by a more general model. Starting from a generic, biologically realistic neuronal architecture and using realistic synaptic plasticity rules, such a model would converge on a specific architecture for the VWFA in any language. It could include a Bayesian implementation of the informative fragments model, which comes close to predicting the real-life responses of ventral visual stream neurons involved in object recognition (Ullman 2007).
Would such a model, once developed, substantiate Frost's claim that the internal code for letter strings varies strongly across languages, depending on their phonology and word structure? Here, we should clear up a frequent confusion. Magnetoencephalography (MEG) experiments, with their high temporal resolution, have shown that during online processing, when a fluent reader reads an actual word, the first major response of the visual system, peaking roughly 130 msec after word onset, is determined overwhelmingly by the frequency of the letter combinations that make up the word, whereas lexical and phonological effects come into play much later (Simos et al. 2002; Solomyak & Marantz 2010). Thus, in adults, the VWFA may reflect a relatively isolated stage of orthographic processing that is essentially immune to phonological and semantic influences (Dehaene & Cohen 2011; but see Price & Devlin 2011). This is not to say, however, that the orthographic code acquired in the course of learning cannot be shaped by the needs of the phonological and semantic systems to which the VWFA ultimately projects. The anatomical localization of the VWFA is strongly influenced not only by bottom-up visual constraints (Hasson et al. 2002), but also by the lateralization of the target spoken language (Pinel & Dehaene 2009). MEG shows that, in English readers, the visual word form system decomposes words into their morphological constituents (prefixes, roots, and suffixes) about 170 msec after stimulus onset (Solomyak & Marantz 2010). Such decomposition is automatic and operates even with pseudo-affixed words such as “brother,” which can be falsely decomposed into “broth” and “er” (Lewis et al. 2011).
Thus, the visual system has internalized orthographic units that are relevant to morphological and lexical knowledge. Although this has not yet been demonstrated, we consider it likely that the VWFA also codes for frequent substrings that facilitate the mapping onto phonemes, such as “th” or “ain” in English. Indeed, this hypothesis may explain why English reading, with its complex grapheme–phoneme mappings, causes greater activation in the VWFA than does Italian reading (Paulesu et al. 2000).
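The claim that early visual responses track the frequency of letter combinations, and that frequent substrings such as “th” or “ain” may be coded as units, can be illustrated with a simple measure: the mean corpus count of a string's adjacent bigrams. The sketch below is hypothetical. The toy word list stands in for a real lexicon, and it counts word types, whereas studies typically rely on large, token-frequency-weighted corpora and often on position-specific counts.

```python
from collections import Counter


def bigram_counts(corpus: list[str]) -> Counter:
    """Count adjacent letter pairs across all words of a corpus (type counts)."""
    counts = Counter()
    for word in corpus:
        w = word.lower()
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    return counts


def mean_bigram_frequency(string: str, counts: Counter) -> float:
    """Average corpus count of the adjacent bigrams in `string` (needs >= 2 letters).

    Bigrams absent from the corpus contribute 0 (Counter returns 0 for missing keys).
    """
    s = string.lower()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    return sum(counts[p] for p in pairs) / len(pairs)


# Hypothetical toy corpus; a realistic measure would use a large lexicon.
TOY_CORPUS = ["the", "then", "there", "rain", "train", "brain", "gain", "this", "that"]
counts = bigram_counts(TOY_CORPUS)

if __name__ == "__main__":
    for probe in ("thain", "xqzvw"):
        print(probe, mean_bigram_frequency(probe, counts))
```

On this toy corpus, the pseudoword “thain” scores 3.5 (its bigrams th, ha, ai, and in occur 5, 1, 4, and 4 times), whereas the consonant string “xqzvw” scores 0. In a graded stimulus design, strings like these would fall at opposite ends of the continuum of similarity to real words.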
In this context, we have no difficulty in accepting Frost's argument that the optimal neural code for letter strings might have to be much less tolerant to letter swaps in Hebrew than in English. This view predicts root detectors in the more anterior part of the VWFA in Hebrew readers, as well as sharper tuning curves for letter and bigram detectors. Testing such predictions for scripts other than Latin is an important goal for future neuroimaging experiments. A readily available tool is fMRI repetition suppression, which has proven sensitive to subtle properties of object, number, and letter tuning (Dehaene et al. 2004; Grill-Spector et al. 1999). Alternatively, multivariate pattern analysis may provide more direct access to the fine-tuning characteristics of the VWFA (Braet et al. 2012).
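Frost's prediction of reduced tolerance to letter swaps can be expressed in a toy open-bigram scheme by assuming, purely for illustration, that sharper tuning means each bigram detector tolerates fewer intervening letters. The hypothetical `max_gap` parameter below sets that window, and the sketch measures how much more a transposed-letter string overlaps with the target than a substituted-letter control does. This is an assumption-laden caricature of tuning width, not a model of Hebrew root morphology.

```python
from itertools import combinations


def open_bigrams(word: str, max_gap: int) -> set[str]:
    """Ordered letter pairs with at most `max_gap` letters between them."""
    word = word.lower()
    return {
        word[i] + word[j]
        for i, j in combinations(range(len(word)), 2)
        if j - i - 1 <= max_gap
    }


def overlap(a: str, b: str, max_gap: int) -> float:
    """Fraction of a's open bigrams that also occur in b."""
    bigrams_a = open_bigrams(a, max_gap)
    return len(bigrams_a & open_bigrams(b, max_gap)) / len(bigrams_a)


def transposition_advantage(target: str, transposed: str, substituted: str,
                            max_gap: int) -> float:
    """Overlap gained by a transposed-letter string over a substituted-letter control."""
    return overlap(target, transposed, max_gap) - overlap(target, substituted, max_gap)


if __name__ == "__main__":
    # Narrowing the window (a hypothetical stand-in for sharper tuning).
    for gap in (2, 1, 0):
        adv = transposition_advantage("cambridge", "cmabirdge", "cnobuvdge", gap)
        print(f"max_gap={gap}: transposition advantage = {adv:.2f}")
```

With these strings, the advantage of the transposed form over the control shrinks from about 0.57 (max_gap=2) to 0.53 (max_gap=1) and vanishes at max_gap=0, where only adjacent pairs are coded. In this simplified sense, narrower tuning yields a code that no longer tolerates letter swaps, which is the kind of difference repetition suppression or pattern analysis could probe.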
ACKNOWLEDGMENT
MS was funded by an Iuventus Plus grant from the Polish Ministry of Science and Higher Education and by a European Research Council (ERC) Advanced Grant (230313).