Frost's appeal for universal theorizing contrasts with the recommendation formulated by Jacobs and Grainger (1994) to “grow” models not wildly but in accord with a few general principles and a few pragmatic stratagems. Frost proposes a two-tier approach to modeling that comprises a “universal” level and a “local” level. This two-tier approach is analogous to Marr's (1982) distinction between the computational theory (that answers the “why?” question) and its implementation (that addresses the “how?” question). Frost is therefore following in the footsteps of a number of protagonists of this top-down approach to scientific theorizing, championed in recent years by the Bayesian tradition. In a nutshell, the claim is that it is the computational theory that is important and we can basically ignore the details of its implementation.
With respect to understanding reading, Frost's major claim is that language-specific research does not contribute to a general theory of reading. Taking the example of research on orthographic processing in Indo-European languages, he states that “sensitivity to letter order per se does not tell us anything interesting about how the brain encodes letter position, from the perspective of a theory of reading. Instead, it tells us something very interesting about how the cognitive system treats letters in specific linguistic environments” (sect. 1.1, para. 6, original emphasis). This statement is as perplexing as it is logically flawed. A theory of reading must encompass a model of how the cognitive system treats letters in specific linguistic environments; therefore, if the latter is deemed to be of interest, that interest must carry over to the former. Moreover, we would argue that taking Frost's appeal seriously would likely lead research as far astray as the Chomskyan movement's focus on uncovering innate linguistic abilities in a quest for the “universal grammar.”
What might a universal theory of reading look like? First, and here we agree with Frost (but who could disagree?), it would need to be founded on basic principles of human information processing. Recent years have seen a consensus develop around at least two such principles: (1) the brain is an information-processing device that strives to make optimal decisions; and (2) behavior adapts optimally to environmental constraints via the principles of statistical learning. In fact, it is precisely these two principles that, for the moment, constitute Frost's universal model of reading. Not a major breakthrough, we would argue, since (1) such general principles are often acknowledged by theorists working at a more local level (e.g., Dufau et al., in press); and (2) although they certainly provide important constraints on theories of reading, without the details of the implementation they would be of little use to a speech therapist trying to understand why a child cannot learn to read.
Here we describe a universal theory of reading that has been slowly nurtured by the kind of refinement process for scientific progress advocated by Jacobs and Grainger (1994). Indeed, applying a language-specific approach has helped uncover a number of universal clues. These clues are neural, developmental, and computational. The neural clue is that reading involves a quite particular piece of neural machinery (the visual system and the visual word form area; see Dehaene & Cohen 2011). The developmental clue is that humans are already experts at object recognition when they start to learn to read. The computational clue is that there is one way to represent and compare sequences that has proved very efficient from a purely computational perspective, and goes by the name of “string kernels” (Hannagan & Grainger, in press).
Like three lines crossing at the same point, these cross-language – even cross-domain – clues lead to the proposal that humans must represent words not by assigning absolute positions to different parts, but by keeping track of the relationship between parts – that is, by detecting feature combinations (see also Whitney 2001). In English, for instance, orthographic processing would involve extracting ordered letter combinations: reading TIME requires extracting, among others, the combination TI but not IT. This is equivalent to representing words as points in a high-dimensional space indexed by all possible letter combinations. In machine learning, the function that compares sequences in this space is known as a string kernel. String kernels drastically improve the linear separability of word representations, a property that is as desirable in English as in Hebrew, or whatever the statistics of one's particular lexicon turn out to be. Given that the primate visual system is already believed to use feature combinations to represent visual objects (e.g., Brincat & Connor 2004), this scheme is also the minimal modification to the child's pre-existing visual object recognition system. In addition, string kernels form a quite versatile solution that has recently found success in disciplines ranging from bioinformatics to automatic language processing. These successes were obtained parsimoniously by varying only three parameters (length of letter combinations, gap penalty, and wildcard character), demonstrating that different kernels can capture the particulars of many application domains. String kernels can also be trained to ignore certain dimensions in the letter-combination space and to favor others. In this view, the task of learning the visual word code becomes the task of learning to represent words by adequate feature combinations – learning the right string kernel function.
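To make the representational scheme concrete, the toy sketch below (in Python) builds an ordered-letter-combination representation and compares two words with the resulting kernel. It is purely illustrative and is not the implementation of Hannagan and Grainger (in press): the fixed combination length of two letters and the particular gap-penalty value are assumptions made only for this example, and the wildcard parameter mentioned above is omitted.

```python
# Illustrative open-bigram string kernel (toy example, not the authors' model).
# Words are mapped to points in a space indexed by ordered letter pairs;
# similarity is the dot product of these representations.

from collections import defaultdict
from itertools import combinations

def bigram_vector(word, gap_penalty=0.5):
    """Map a word to weights over its ordered letter pairs.

    Each ordered pair (word[i], word[j]) with i < j is weighted by
    gap_penalty ** (j - i - 1), so adjacent letters count most and
    widely separated letters count less.
    """
    vec = defaultdict(float)
    for i, j in combinations(range(len(word)), 2):
        vec[word[i] + word[j]] += gap_penalty ** (j - i - 1)
    return vec

def string_kernel(w1, w2, gap_penalty=0.5):
    """Similarity of two words: dot product of their bigram representations."""
    v1 = bigram_vector(w1, gap_penalty)
    v2 = bigram_vector(w2, gap_penalty)
    return sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())

if __name__ == "__main__":
    # TIME contains the ordered combination TI but not IT, so order matters.
    print(string_kernel("TIME", "TIME"))   # identity: highest value
    print(string_kernel("TIME", "TMIE"))   # transposed letters: still high
    print(string_kernel("TIME", "EMIT"))   # reversal: shares no ordered pair
```

On this toy measure, a transposed-letter neighbour such as TMIE remains much closer to TIME than the reversal EMIT, which is the qualitative pattern the proposal is meant to capture.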
The theory thus satisfies Frost's first criterion for model evaluation, the “universality constraint,” since it applies to different languages, and indeed goes beyond this restricted universality because it applies to different modalities, different object categories, and even to very different fields of science. It also satisfies Frost's second criterion (“linguistic plausibility”), since, following Grainger and Ziegler (2011), we argue that the nature of the mapping of orthographic features onto higher-level linguistic representations (i.e., the appropriate string kernel) is constrained by the very nature of these higher-level representations. Finally, and most importantly, the theory also exhibits explanatory adequacy. It provides a better fit to a set of independently established benchmark phenomena (Davis 2010) than do competing models, and it does so with a much smaller number of free parameters.