Models are no new beasts to scholars of bilingualism. During the last several decades we have seen many interesting and important models that postulate how the bilingual mind works. But specific, computationally implemented, models are far less common than general, verbal, models of bilingualism. This is because the former require efforts on the part of the researcher to conduct algorithmic and representational implementations, whereas the latter do not. The central question is: What good does implementation do in telling us about the bilingual mind beyond what the verbal models do? This Special Issue is an attempt to address this question with seven computational models of bilingualism from different research labs.
Before I present the reader with an overview of these seven models and their significance and implications with regard to the central question asked above, let me briefly discuss some general points regarding computational modeling of bilingualism in relation to the focus of this Special Issue.
First, the emergence of cognitive science has been strongly based on a computer metaphor of how the mind works (see Gardner, 1985, for a historical review), and as such, computational modeling has played a vital role in understanding human cognitive and linguistic mechanisms. Because of the requirement of implementation, computational modelers need to be explicit about the assumptions of their hypotheses, as they must specify, algorithmically, the very basic concepts in a model (e.g., “similarity”, “adjacency”, or “association” defined in quantitative and numerical terms). In many of our sister fields and in cognitive science in general, computational models based on algorithmic implementations have been very influential (see examples in Elman, Bates, Johnson, Karmiloff-Smith, Parisi & Plunkett, 1996; MacWhinney, 2010). By contrast, while a number of computational models of bilingualism have been developed (see Grosjean & Li, 2013; Thomas & van Heuven, 2005, for reviews), the progress here has been slow, especially compared with advances in computational modeling in the monolingual context. The models presented in this Special Issue thus fill a big gap in this regard.
Second, a wide variety of computational models have been implemented for language studies in the past decade, but a specific class of computational models has proven particularly useful to the understanding of the bilingual mind. This is the class of models that are inspired by connectionism, Parallel Distributed Processing (PDP), or artificial neural networks. For example, one well-known model, the Bilingual Interactive Activation (BIA) model developed by Dijkstra and van Heuven (1998), relies on the principle of interactive activation, an important mechanism used in PDP or connectionist models. In general, connectionist networks have been helpful in understanding a wide range of phenomena in linguistic behavior, such as speech perception, speech production, semantic representation, reading acquisition, and lexical acquisition. Not coincidentally, then, most of the seven computational models presented in this Special Issue incorporate connectionist or PDP mechanisms for learning and representation.
Third, previous computational models of bilingualism have been largely designed to account for linguistic representations in the mature, proficient adult bilingual speaker's knowledge rather than for developmental changes that occur in the bilingual learner. Along with the focus on building “proficient” models is often the use of “localist” representations of lexical items or concepts, which means a one-to-one correspondence between nodes in the model and items in the lexicon (e.g., as in the BIA model). More recently, researchers have recognized the role of computational modeling both in identifying processing mechanisms and in capturing developmental patterns across stages of learning (e.g., French & Jacquet, 2004; Hernandez, Li & MacWhinney, 2005; Zhao & Li, 2010). In this Special Issue, we see models that draw on learning principles in describing important developmental variables such as bilingual proficiency and age of L2 onset that underlie both acquisition and processing. Moreover, most of the seven models use “distributed” instead of “localist” representations, implementing more complex and realistic models that make contact with detailed features of the linguistic input or the learner. One challenge to the research community is how we can develop integrated models of both acquisition/development and representation/processing, and readers will be glad to see that several papers in this Special Issue have made an attempt in this direction.
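The contrast between “localist” and “distributed” representations can be made concrete with a minimal sketch. The toy lexicon, the four semantic features, and the function names below are hypothetical illustrations of the general idea and are not taken from any of the seven models. With one node per word, a translation pair shares nothing in the representation itself; with feature vectors, the overlap is directly measurable.

```python
import math

# Hypothetical toy lexicon: two English words and one Spanish word.
WORDS = ["dog", "cat", "perro"]

# Hypothetical semantic features: [animate, canine, feline, pet].
FEATURES = {
    "dog":   [1.0, 1.0, 0.0, 1.0],
    "cat":   [1.0, 0.0, 1.0, 1.0],
    "perro": [1.0, 1.0, 0.0, 1.0],
}

def localist(word):
    """Localist coding: exactly one dedicated node per lexical item."""
    return [1.0 if w == word else 0.0 for w in WORDS]

def distributed(word):
    """Distributed coding: a word is a pattern over shared features."""
    return FEATURES[word]

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Translation equivalents are unrelated under localist coding ...
print(cosine(localist("dog"), localist("perro")))
# ... but overlap fully under this (assumed) distributed coding.
print(cosine(distributed("dog"), distributed("perro")))
# Semantically related words overlap partially.
print(cosine(distributed("dog"), distributed("cat")))
```

In a localist model, any relation between “dog” and “perro” must therefore be stipulated through explicit connections, whereas in a distributed model it can fall out of the shared features.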
In the first paper, Monner, Vatz, Morini, Hwang and DeKeyser address a long-standing issue in bilingualism and language acquisition: To what extent is the learning of a second language influenced by the entrenchment of one's first language? This question is related to the so-called critical period, which has traditionally been attributed to a biologically determined timetable. Johnson and Newport (1989) proposed the “less is more” account instead of a biologically-based account of critical periods for language learning, which argues that less well developed cognitive capacities (e.g., memory) actually confer learning advantages on young learners. Monner and colleagues tested the “less is more” hypothesis using a connectionist model that learns gender assignment and agreement in Spanish and French. A significant contribution of this work is that it not only tests a theoretical hypothesis, but also illustrates that computational models can flexibly bring important variables under systematic control, variables that are otherwise confounded in natural learning settings. For example, in Monner et al.'s network, an increase in working memory is simulated by the use of new cell assemblies in the model, whereas L1 entrenchment is simulated by training the network with variable-length exposure to L1 before the onset of L2 (see Zhao & Li, 2010, for manipulations of “early” vs. “late” learners). In this way, the modeling results allow us to dissociate effects due to age of L2 onset from those due to memory capacity, thereby specifying the individual and joint contributions of these two variables. This example illustrates the important role that modeling can play in bilingualism research, given that in human learning the two variables (age and memory) are naturally confounded.
Cuppini, Magosso and Ursino present a computational model to tackle the relationship between the degree of L2 proficiency and the interaction between L1 and L2 lexical semantic representations. In the last decade there has been growing evidence that, contrary to the critical period hypothesis, bilingual speakers may recruit the same brain areas to handle both L1 and L2, although the degree of involvement of these areas may be different depending on the level of L2 proficiency (see Abutalebi, 2008, for a review). Cuppini and colleagues’ model has a distinct lexical representation for each language but a common conceptual representation for the two languages, and training in the model shows different strengths of connection between the L1 and L2 lexical systems as a function of different L2 proficiency: as it becomes more proficient in the L2, the model is able to make direct connections between L2 lexical form and conceptual knowledge, not relying heavily on the connections between L1 and L2 lexical form representations. These patterns are consistent with Green's (2003) convergence hypothesis, according to which increased L2 proficiency will lead to neurocognitive convergence with regard to the way bilinguals represent and process the two languages. The model also aspires to connect with neuroscience and neuroimaging data through simulations with neurally plausible mechanisms such as Hebbian learning and long-term potentiation, which is a unique feature of the model.
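To illustrate how a Hebbian mechanism can yield a direct link from L2 word form to concept that strengthens with proficiency, consider the following minimal sketch. It is not Cuppini et al.'s implementation: the saturating update rule, the learning rate, and the equation of “proficiency” with number of exposures are all simplifying assumptions made here for illustration.

```python
def hebbian_update(w, pre, post, rate=0.1, w_max=1.0):
    """Saturating Hebbian rule (assumed form): the weight grows only when
    the pre- and post-synaptic units are active together, and growth
    slows as the weight approaches w_max."""
    return w + rate * pre * post * (w_max - w)

def train_l2_link(exposures, rate=0.1):
    """Strength of a hypothetical L2-form-to-concept connection after a
    given number of exposures in which both units are fully co-active."""
    w = 0.0
    for _ in range(exposures):
        w = hebbian_update(w, pre=1.0, post=1.0, rate=rate)
    return w

# Hypothetical low- vs. high-proficiency learners (few vs. many exposures).
for n in (0, 5, 50):
    print(n, round(train_l2_link(n), 3))
```

Under these assumptions the direct L2-to-concept weight rises monotonically with exposure, which is the qualitative pattern described above; how that weight trades off against L1-to-L2 lexical links depends on details that this sketch leaves out.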
At the core of the above studies are the computational learning mechanisms in the model, which provide excellent examples for integrating learning and processing, a much-needed research direction as discussed earlier. The next paper, by Zhao and Li, further illustrates this integration. Many bilingual memory models have been concerned with the issue of single versus distinct storage of bilingual lexical knowledge, and most of these models focus on the lexical knowledge of the proficient adult speaker, as mentioned earlier. Zhao and Li take a developmental approach toward this issue, as they have done elsewhere within the DevLex framework (e.g., Zhao & Li, 2010; see also reviews in Li & Zhao, 2012, in press). Specifically, they build a connectionist model called DevLex II that can track how bilingual lexical representation develops as a function of L2 onset time relative to L1 learning in the model and the direction of interaction between L1 and L2. Implementing both Hebbian learning and spreading activation principles, Zhao and Li's model simulates performance patterns in cross-language semantic priming that have been reported in the empirical literature, including effects of priming direction and types of priming involved (semantic vs. translation). Analysis of such effects in the model also provides a computational account of priming based on the implementation of distributed and overlapping semantic features within and across languages.
Shook and Marian present the Bilingual Language Interaction Network for Comprehension of Speech, or the BLINCS model. In contrast to the BIA model that has been used to account for visual word recognition, the BLINCS model is designed to examine spoken word comprehension. Previous computational models of spoken word recognition include the BIMOLA model (Bilingual Model of Lexical Access, Lewy & Grosjean, 2008), which involves the use of interactive activation principles and localist representations as in the BIA model. The unique features of the BLINCS model are (i) its incorporation of both classic connectionist learning algorithms and unsupervised self-organizing maps (SOM), and (ii) bidirectional excitatory and inhibitory connections within and across different levels of processing. The first feature provides the model with a means to represent detailed linguistic and phonological properties while at the same time adapting to cognitive demands in real-time processing, whereas the second feature allows the model to capture lexical interactions within and across languages. The BLINCS model is a significant step forward from the earlier SOMBIP model (Self-Organizing Model of Bilingual Processing; Li & Farkas, 2002) in that it simulates bilingual lexical activation as it unfolds in time, even though the SOMBIP model was also motivated by considering issues of learning and representation and bidirectional cross-language interactions.
While a large number of connectionist models of language acquisition have relied on the SOM architecture (see Li & Zhao, 2012, for a bibliography), only a handful of bilingual models have used SOM (including the SOMBIP and BLINCS discussed above). Kiran, Grasemann, Sandberg and Miikkulainen present a SOM-based model to simulate patterns of bilingual language recovery in aphasic patients following treatment. Their model, DISLEX, has been previously applied to simulate aphasia and bilingual lexical representation (Miikkulainen, 1997; Miikkulainen & Kiran, 2009). In the current study, the model is applied to simulate behavioral patterns on a case-by-case basis for each of the 17 patients who underwent treatment following injury. The model's close match with real behavioral data from individual patients is testimony that computational models, when properly constructed, can closely reflect realistic linguistic processes (in addition to other advantages of modeling discussed in this Introduction). I say “properly constructed” because in order to reflect empirical patterns, the model must incorporate important variables underlying patterns of behavior, including the patient's language history with regard to age of L1 and L2 acquisition, proficiency, and the dominance of the treatment language. More impressive is the model's ability to predict the efficacy of rehabilitation in each of the bilingual's languages. In reality, each bilingual patient underwent rehabilitation treatment for only one of their languages (English or Spanish) due to empirical constraints, but Kiran et al.'s model is trained for recovery in both languages following lesion, thus showing the considerable advantage and flexibility of the model as compared with examination of the actual patient. Finally, in empirical studies the researcher works with the injured patient and cannot go back to study the patient's pre-lesion condition, whereas in computational modeling the researcher can examine the intact model, lesion it, and then track the performance of the same model before and after lesion, as is done by Kiran and colleagues in their study.
A significant role that computational modeling plays is in the selection and manipulation of critical variables and parameters for systematically testing alternative hypotheses and for evaluating the performance and outcome of the relevant hypotheses. The paper by Roelofs, Dijkstra and Gerakaki shows just how this is done. Using the WEAVER++ model, the authors test competing accounts of spoken word production. The model has been previously applied to account for monolingual word production processes, and is used here to test two contrasting hypotheses of bilingual word translation: (i) the discrete-flow model, which assumes that context effects (e.g., semantic facilitation) arise at the conceptual level during translation, and the selected concepts activate the corresponding words, but the words in the to-be-translated language are not activated; and (ii) the continuous-flow model, which assumes a cascade architecture, in which activation can spread from concepts to words across the two languages regardless of the concept selection process. The simulation results demonstrate that the continuous-flow model can account for key findings regarding context effects in translation from L1 to L2 and from L2 to L1. This study also shows how a general computational model such as WEAVER++ can account for very specific bilingual processing effects, in particular, context effects due to semantically-related words or pictures in the L1 and L2 word translation process.
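The difference between the two hypotheses can be sketched in a few lines. The code below is only a schematic contrast under assumptions of my own (a two-concept English/Dutch toy lexicon, arbitrary activation values, and a single spreading parameter); it is not WEAVER++ and makes no claim about its actual equations.

```python
# Hypothetical concept activations: the target concept plus a
# semantically related context item.
CONCEPTS = {"DOG": 1.0, "CAT": 0.4}

# Hypothetical word forms for each concept in two languages.
LEXICON = {
    "DOG": {"en": "dog", "nl": "hond"},
    "CAT": {"en": "cat", "nl": "kat"},
}

def discrete_flow(concepts, lexicon, target):
    """Only the selected concept passes activation on, and only to its
    word in the response (target) language."""
    selected = max(concepts, key=concepts.get)
    acts = {}
    for concept, forms in lexicon.items():
        for lang, word in forms.items():
            passes = concept == selected and lang == target
            acts[word] = concepts[concept] if passes else 0.0
    return acts

def continuous_flow(concepts, lexicon, spread=0.5):
    """Cascade: every active concept spreads a proportion of its
    activation to its words in both languages, before any selection."""
    return {
        word: spread * concepts[concept]
        for concept, forms in lexicon.items()
        for word in forms.values()
    }

print(discrete_flow(CONCEPTS, LEXICON, target="en"))
print(continuous_flow(CONCEPTS, LEXICON))
```

The two functions make different predictions about which words are active during translation, and it is differences of this kind that a full simulation can test against observed context effects.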
An increasing number of studies have examined bilingual language processing and acquisition in the Chinese–English bilingual context, due to the unique features of the Chinese language and its orthography in comparison to Western languages (see Li, Tan, Bates & Tzeng, 2006, for reviews). The final paper in the Special Issue, by Yang, Shu, McCandliss and Zevin, provides a computational model to study typical and atypical reading processes in Chinese and English that involve distinct writing systems. In order to identify language-general and language-specific patterns in reading and reading acquisition, Yang et al. construct two sets of simulations, one for each language separately, and one for both languages in the same model. By comparing findings from the two sets of simulations within the same computational architecture, they are able to identify the individual and joint contributions of phonology and semantics to reading. More importantly, they find that the same computational architecture gives rise to differences between languages, which suggests that such differences are due to statistical properties of the writing systems to be learned (i.e., in terms of the relationships among print, sound, and meaning), rather than the cognitive architecture for learning to read. Yang et al.'s model clearly contrasts with models that assume different neurocognitive structures for L1 versus L2 reading in different writing systems, such as the accommodation-assimilation hypothesis (Perfetti, Liu, Fiez, Nelson, Bolger & Tan, 2007).
I hope that by this point the reader is convinced that computational models have much to offer to the understanding of the bilingual mind, over and beyond what general verbal, hypothesis-driven, models can do. Implementation of computational models forces the researcher to be very explicit about their hypotheses, predictions, materials, and testing procedures, and at the same time, gives the flexibility of parameter selection and reliability of testing that are often not found in empirical studies. Indeed, the potential of a bilingual computational model lies in its ability to identify gaps in experimental designs, and in systematic manipulation of variables such as age of acquisition (early vs. late) and proficiency (high vs. low), variables that may be naturally confounded in experimental or learning situations. The seven models presented in this Special Issue certainly demonstrate the advantages of, and the need for, developing more computational models of bilingualism, as they deepen our understanding of the complex interactive mechanisms involved in the acquisition and processing of two linguistic systems. The models also provide a good variety of computational architectures, examine a range of theoretical issues, cover the linguistic domains of phonology, grammar, and lexicon, and analyze both spoken and written languages across different bilingual populations. More important, most of the models attempt to integrate learning and processing within a computational framework, moving the field forward in an important direction.
There remain a number of challenges ahead for computational modeling of bilingualism to play a more significant role. One is how we can bridge computational modeling results with a variety of other behavioral, neuropsychological, and neuroimaging findings. The Kiran et al. study is exemplary in this regard. The studies of Cuppini et al., Roelofs et al. and Zhao and Li also make clear efforts to match their modeling data with neural or behavioral responses in order to obtain convergent results. Another challenge that lies ahead is how we can develop models that make predictions in light of the simulations and empirical data. In some cases, the empirical data have not yet been obtained, or cannot be obtained (e.g., in the case of brain injury one cannot go back to pre-lesion conditions), and this is the occasion where modeling results will be most helpful. Not only should computational modeling verify existing patterns of behavior on another platform, it should also inform theories of bilingualism by making distinct predictions under different hypotheses or conditions. In so doing, computational modeling will provide a new forum for generating novel ideas, inspiring new experiments, and helping formulate new theories (see McClelland, 2009, for a discussion of the role of modeling in cognitive science). Finally, computational modelers should follow a recent call by Addyman and French (2012) to make an effort to provide user-friendly interfaces and tools to non-modelers, so that many more researchers of bilingualism can test computational models without fearing the technical hurdles posed by programming languages, source code, and simulation environments.