Christiansen & Chater (C&C) claim universality and primacy for their Chunk-and-Pass processing approach in language acquisition and suggest that music provides an example of another complex acoustic signal with multilayered structuring, to which one could apply the Chunk-and-Pass strategy as well. However, fundamental issues that C&C leave unaddressed suggest that this strategy may not be generalizable to typologically diverse languages and to domains beyond language. We discuss two such issues: (1) cross-linguistic differences (e.g., morphology and word-order) and (2) domain-specific differences (e.g., language versus music).
It is unclear how the Chunk-and-Pass strategy would work in the acquisition of synthetic languages with complex inflectional morphology (e.g., Tamil, Turkish, Navajo, Quechua, Cree, Swahili). Because there is extensive suffixation (through agglutination or fusion), the morpheme-to-word ratio in such languages is high, resulting in lengthy words. A single multimorphemic word expresses meanings that in a language with limited or no inflection would require a multiword clause or sentence to express. Although C&C suggest that chunking of complex multimorphemic words, by means of local mechanisms (e.g., formulaicity), also applies to agglutinative languages, they mainly consider evidence based on English, whose impoverished inflection and low morpheme-to-word ratio (particularly in its verb forms) facilitate chunking using the word (as opposed to its subparts) as a basic unit of analysis.
In C&C's framework, frequency and perceptual salience (rather than innate grammatical mechanisms) drive the chunking process. Existing studies on the acquisition of morphologically complex languages indicate that mechanisms proposed for English do not readily extend to synthetic language types (Kelly et al. 2014). Crucially, lexicon-building does not take place through storage of frequently encountered (and frequently used) exemplars in memory; instead, the chunking strategy may be only a first step in the process, in preparation for the next stage, namely, grammatical decomposition of stored units and the acquisition of the combinatorial principles determining their subparts (see Rose & Brittain 2011 for evidence from Northern East Cree). However, even a minor role for the chunking strategy in relation to morphologically complex languages may be problematic. A single verb root/stem, for example, is manifested through numerous surface realizations rendering the frequency factor unreliable. Additionally, evidence from children acquiring Quechua and Navajo, two morphologically rich languages, indicates that regardless of the perceptual salience of the verb root/stem (i.e., word initial in Quechua and word final in Navajo), the children's earliest verb forms were root/bare stems, not permitted in the adult grammar; however, they never produced isolated affixes, contrary to what would be predicted if they were using a simple chunking procedure (Courtney & Saville-Troike 2002). Interestingly, Tamil children use bare stems in imperative contexts, similar to adults. In contrast, their earliest indicative (nonimperative) verb forms are non-adult-like and consist predominantly of verbal participles (derived or inflected nonfinite stems) with the auxiliary, tense, and agreement suffixes stripped away (Lakshmanan 2006). The mismatch between the children's early verbs and the adult input (consisting of complex multimorphemic words) emphasizes the role of innate knowledge of fundamental grammatical concepts (e.g., verb root/stem, inflected stem, and affixes).
A Chunk-and-Pass strategy alone (without independent grammatical mechanisms) cannot explain children's success with “free word order” found in many morphologically complex languages. In Tamil, an SOV (Subject-Object-Verb) language, sentential constituents (NPs, PPs, and CPs) may appear in noncanonical sentential positions through rightward and leftward scrambling. Tamil is a null argument language, and sentences with overt realization of all arguments are rare. Tamil children between the ages of 17 months and 42 months exhibit sensitivity to Case restrictions and movement constraints on scrambling and successfully use adult-like word order permutations to signal interpretive differences (Focus versus Topic) (Sarma 2003).
A Chunk-and-Pass strategy would predict that shorter sentences are easier for children to process and produce than longer sentences. However, this cannot explain scenarios where the reverse situation holds. For example, Tamil children (below age 5) produce significantly fewer participial relatives than older children. They also strongly prefer tag relatives to the participial relative, although the former are longer and less frequent than the latter. Crucially, the participial relative, though shorter (and more frequent), is structurally more complex because it involves movement (Lakshmanan 2000).
Let us now examine the generalizability of the Chunk-and-Pass approach to other complex acoustic input, as in the case of music. Some argue that music contains some semantic information, such as meaning that emerges from sound patterns resembling qualities of objects and suggesting emotional content, or sometimes as a result of symbolic connections with related but external material (Koelsch 2005). For example, Wagner was known to use short musical melodies (leitmotifs) that represented characters in his operas, such that interactions between characters could be inferred or interpreted from the musical composition. However, these more concrete occurrences are outliers among musical works, and other interpretations of musical semantics remain much weaker than in the context of language. Thus, although it is possible there is structural chunking, music lacks the semantic information to inform something like an “interactionist” approach (McClelland 1987) to parsing.
Another way in which music differs from language is in the context of anticipation. C&C discuss anticipation as a predictive perceptual strategy that helps streamline the process of organizing incoming speech signals. Although music perception involves anticipation, music provides clues of a different nature regarding what will follow in a phrase. Anticipation based on hierarchical phrase structure might be similar across language and music, but listeners also use rhythm, meter, and phrase symmetry to predict how a musical phrase will end. C&C also discuss anticipation in discourse; however, anticipation works differently in music. In ensemble performances, musicians often simultaneously produce and perceive (their own and others') music, which is different from linguistic turn-taking.
In sum, it is unclear why the “mini-linguist” theory of language acquisition and the processing theory need to be mutually exclusive: Why can't the child be acquiring grammar as a framework for processing and chunking? What also needs explanation is the question: Why would having only domain-general mechanisms for processing different types of complex acoustic input be advantageous?