New theories are a constant of the now vast literature on code-mixing (CM). The Gradient Symbolic Computation model proposed by Goldrick, Putnam and Schwartz (Goldrick, Putnam & Schwartz) will appeal to many, especially those who already espouse constraint-based approaches to grammar. As variationist sociolinguists, we particularly welcome the model's incorporation of “relative probabilities of certain structures”, a feature we believe can enhance our chances of capturing actual CM behavior. We also applaud Goldrick et al.’s efforts to integrate experimental findings on co-activation with grammatical principles. Our questions concern the utility of “doubling constructions” to showcase the model and, by extension, the degree to which it can account for bilinguals’ spontaneous production of CM. A historical perspective on the field shows that none of the myriad theories of CM, often inspired by competing sets of grammatical principles, has yet achieved broad acceptance. In the absence of a widely endorsed evaluation metric – still sadly lacking – how are we to decide amongst them?
Sociolinguists would require that a model be tested against, and supported by, the data of actual spontaneous bilingual production. We first observe that in this kind of data, the doubling constructions the model is claimed to account for are exceedingly rare. Goldrick et al. acknowledge this, but imply that such constructions are nonetheless a consistent feature of CM corpora (emphasis ours). Empirical research suggests otherwise: in over a dozen bilingual data sets (involving thousands of tokens of CM) systematically studied by us, only a very few (e.g., Poplack, Wheeler & Westwood, 1987; Sankoff, Poplack & Vanniarajan, 1990) featured any instances at all of such constructions, and rarely did these exceed a handful in any one. This makes the phenomenon not only rare but sporadic, explaining why Sankoff et al. (1990) referred to it (as would we) as an “ad-hoc processing mechanism”. While doubling may nonetheless serve the purpose of model development, its very exceptionality raises the question of whether and how its analysis can be generalized to the bulk of the data, surely a desideratum of any predictive account of CM.
Even if doubling were a robust phenomenon, other key elements of the proposed model are also at odds with the facts of CM on the ground. Goldrick et al. maintain that “speakers are not only uttering lexical items from both languages but are also integrating grammatical principles from each linguistic system” and, further, that such integration is reflected in “the transfer of grammatical patterns from one language to another”. But “blend representations” (p. 6), already proposed by Weinreich (1963), should not be conflated with grammatical convergence in production. Actual bilingual behavior, whether viewed in diachronic or synchronic perspective, fails in the aggregate to support blending; instead, it reveals bilinguals’ knowledge and application of independent, language-particular grammatical principles.
Diachronically, grammatical convergence between languages in contact, although widely assumed, is seldom satisfactorily demonstrated once accountable methodology and appropriate benchmarks are employed (e.g., Poplack, Zentz & Dion, 2012; Silva-Corvalán, 2008; Torres Cacoullos & Travis, 2015). Synchronically, empirical study of many bilingual speech communities has confirmed that the vast majority of CM data is constituted by lone other-language items, which, in turn, tend to be morphologically and syntactically integrated into recipient-language grammar (e.g., Poplack & Meechan, 1998). This is consonant with – and identical to – lexical borrowing. Here, although the lexical item derives from one (donor) language, only the grammar of the other language is at play, not a blend of both. Multiword CM, on the other hand, is indeed compatible with co-activation of both of the speakers’ grammars: intra-sententially, code-switching is strongly preferred where the word orders of both languages are homologous, a fact modeled by the Equivalence Constraint (Poplack, 1980). But switching at points of congruence between two languages does not entail amalgamation of their grammatical principles. On the contrary, coincidental inter-linguistic similarities aside, the multiword strings are internally consistent only with the grammatical principles of the language from which they are drawn. Thus, neither of the major manifestations of CM involves blending: lexical borrowing because only one (recipient-language) grammar is involved, code-switching because two grammars are independently at play.
Goldrick et al. adduce further evidence for blending from cross-language priming and cognate effects. But the only study of bilingual syntactic priming outside the lab (Travis, Torres Cacoullos & Kidd, 2015) reveals that it is weaker across than within languages, and that its relative strength varies according to contextual features and particular constructions of the language of the target, not the prime. As to cognate effects on phonetic variation, these are shown to emerge from the cumulative effect of usage patterns: when bilinguals’ use of both languages is considered, cognate words occur less often than non-cognates in the relevant contexts (Brown, 2015). These findings from spontaneous speech, showing that the conditioning factors are language-specific, are consistent with “gradient co-activation” but not with blending of grammars.
Even co-activation, though clearly pertinent in at least some kinds of CM, is difficult to quantify. We agree that linguistic experience is crucial. But in bilinguals’ experience, language-internal inherent variability, for example in word order (unacknowledged here, but see e.g., Poplack, Sayahi, Mourad & Dion, 2015; Sankoff et al., 1990), is inescapable. Given variability, what would constitute appropriate input training data for learning algorithms? And how can activation values be meaningfully estimated in the absence of information about speakers’ actual exposure to, relative proficiency in and frequency of use of the languages, not to mention the prevailing norms of their bilingual community? These crucial predictors of the selection, form and placement of CM cannot simply be surmised, nor are they typically available from lab-based studies of university-student L2 learners.
In sum, a minor phenomenon (if it qualifies as that) has served as the foundation for a model at odds with the patterns suggested by the behavior of major phenomena. Goldrick et al. assure us that their probabilistic approach can accommodate “sociolinguistic factors”; we look forward to this eventuality. In the interim, we commend the authors for bringing to the fore several of the many challenges confronting the modeling of CM.