Bordag, Gor, and Opitz (2021) present the Ontogenesis Model of L2 Lexical Representation (OM) as an explanation of how words in a second language are represented in memory. According to the model, a word is represented by six quantities: three indicating how well its orthography, phonology, and semantics are learned and three representing how strongly those representations are interconnected. Over time, a word's orthographic, phonological, and semantic representations become better learned and more strongly interconnected (see Figure 4 of the article). As outlined in the paper, that framework provides insight into a broad range of lexical behaviours and empirical regularities from the study of second-language acquisition.
The OM is a timely and valuable contribution. It presents a broad and integrative approach to thinking about lexical representation. It confronts the complexities of multilingual representation. It acknowledges the necessity of developing formal models to handle the scale of a problem as large as language. However, the OM stops short of its principal goal.
Although the OM is a theory of lexical representation, it does not define what a lexical representation is or how a lexical representation develops as a function of language experience. Rather, it assumes that words have lexical representations and focuses instead on modelling how quickly and how completely those representations are learned. Thus, if one asks what the OM knows about <CAT>, it can report how well <CAT> is learned but cannot report what <CAT> means. In that sense, the OM is more like a theory of associative learning than a theory of mental representation.
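To make the contrast concrete, the following is a minimal sketch of what a six-quantity record might look like. The class name `OMEntry`, the field names, the 0 to 1 scale, the pairwise reading of the three interconnection strengths, and the averaging rule are all our assumptions for illustration; the OM itself does not specify them.

```python
from dataclasses import dataclass, astuple


@dataclass
class OMEntry:
    """Hypothetical record of the six OM quantities (names and 0-1 scale assumed)."""
    orthography: float   # how well the written form is learned
    phonology: float     # how well the spoken form is learned
    semantics: float     # how well the meaning is learned
    orth_phon: float     # assumed strength of the orthography-phonology link
    orth_sem: float      # assumed strength of the orthography-semantics link
    phon_sem: float      # assumed strength of the phonology-semantics link

    def how_well_learned(self) -> float:
        # Illustrative summary only: an unweighted mean of the six values.
        values = astuple(self)
        return sum(values) / len(values)


# Made-up values for the word <CAT>.
cat = OMEntry(0.9, 0.7, 0.6, 0.8, 0.5, 0.4)
print(cat.how_well_learned())
```

A record of this kind can answer how well <CAT> is learned, but no field in it encodes what <CAT> means.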
As a point of comparison, consider current models of lexical representation. Theories such as HAL (Lund & Burgess, 1996), LSA (Landauer & Dumais, 1997), BEAGLE (Jones & Mewhort, 2007), Word2Vec (Mikolov, Sutskever, Chen, Corrado & Dean, 2013), and GloVe (Pennington, Socher & Manning, 2014) are fully specified models that, when applied to a record of natural language (e.g., newspapers, novels, or internet chatter), derive a unique vector to represent each word. Once derived, those vectors can be decomposed into their representational elements (Hollis & Westbury, 2016), applied, and compared to human lexical behaviour. Consistent with the OM's goals, those theories track changes in lexical behaviour as a function of age (Montag, Jones & Smith, 2015), reading history (Aujla, in press), culture (Johns & Jamieson, 2019), and multilingual language exposure (Johns, Sheppard, Jones & Taler, 2016). Those representations can also stand in for human knowledge in computer models of memory (Johns, Jones & Mewhort, 2021), decision making (Bhatia, 2017), language production (Johns, Jamieson, Crump, Jones & Mewhort, 2020), and cognitive disorder (Johns et al., 2018). Thus, whereas the OM provides a framework for thinking about lexical representations, modern theories already articulate methods to directly and computationally derive those representations.
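The idea of deriving a vector for each word from a record of language can be illustrated with a deliberately simplified sketch. It counts co-occurrences within a sliding window, loosely in the spirit of HAL, and is not the published algorithm of any model cited above; the toy corpus, the window size, and the function names are assumptions made for illustration.

```python
import math


def cooccurrence_vectors(sentences, window=2):
    """Build one count vector per word from words seen within `window` positions."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for s in sentences:
        for i, target in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][index[s[j]]] += 1.0
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (invented); real models are trained on far larger records of language.
corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "the cat ate the fish".split(),
    "the dog ate the bone".split(),
    "the stock market fell sharply".split(),
]

vocab, vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_market = cosine(vectors["cat"], vectors["market"])
print(sim_cat_dog > sim_cat_market)
```

Even in this toy case, the vector for <CAT> ends up closer to <DOG> than to <MARKET>, because the representation is derived from the contexts in which each word occurs rather than assumed in advance.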
In summary, the OM is an intriguing theoretical framework for explaining second-language learning. However, its breadth of vision obscures its specificity of explanation. Our goal is to encourage a formalization of the OM, perhaps grounded in a theory we have named or in a different kind of theory that we have not named (Griffiths, Steyvers & Tenenbaum, 2007; Jamieson, Avery, Johns & Jones, 2018; Kwantes, 2005). No matter the outcome, a formal expression of the OM will allow researchers to interrogate the model, compare it to state-of-the-art language models, and leverage its insights to make discoveries. We are hopeful that future work can meet that goal.