1. Introduction
Language is fleeting. As we hear a sentence unfold, we rapidly lose our memory for preceding material. Speakers, too, soon lose track of the details of what they have just said. Language processing is therefore “Now-or-Never”: If linguistic information is not processed rapidly, that information is lost for good. Importantly, though, while fundamentally shaping language, the Now-or-Never bottleneck (see Footnote 1) is not specific to language but instead arises from general principles of perceptuo-motor processing and memory.
The existence of a Now-or-Never bottleneck is relatively uncontroversial, although its precise character may be debated. However, in this article we argue that the consequences of this constraint for language are remarkably far-reaching, touching on the following issues:
1. The multilevel organization of language into sound-based units, lexical and phrasal units, and beyond;
2. The prevalence of local linguistic relations (e.g., in phonology and syntax);
3. The incrementality of language processing;
4. The use of prediction in language interpretation and production;
5. The nature of what is learned during language acquisition;
6. The degree to which language acquisition involves item-based generalization;
7. The degree to which language change proceeds item-by-item;
8. The connection between grammar and lexical knowledge;
9. The relationships between syntax, semantics, and pragmatics.
Thus, we argue that the Now-or-Never bottleneck has fundamental implications for key questions in the language sciences. The consequences of this constraint are, moreover, incompatible with many theoretical positions in linguistic, psycholinguistic, and language acquisition research.
Note, however, that arguing that a phenomenon arises from the Now-or-Never bottleneck does not necessarily undermine alternative explanations of that phenomenon (although it may). Many phenomena in language may simply be overdetermined. For example, we argue that incrementality (point 3, above) follows from the Now-or-Never bottleneck. But it is also possible that, irrespective of memory constraints, language understanding would still be incremental on functional grounds, to extract the linguistic message as rapidly as possible. Such counterfactuals are, of course, difficult to evaluate. By contrast, the properties of the Now-or-Never bottleneck arise from basic information processing limitations that are directly testable by experiment. Moreover, the Now-or-Never bottleneck should, we suggest, have methodological priority to the extent that it provides an integrated framework for explaining many aspects of language structure, acquisition, processing, and evolution that have previously been treated separately.
In Figure 1, we illustrate the overall structure of the argument in this article. We begin, in the next section, by briefly making the case for the Now-or-Never bottleneck as a general constraint on perception and action. We then discuss the implications of this constraint for language processing, arguing that both comprehension and production involve what we call “Chunk-and-Pass” processing: incrementally building chunks at all levels of linguistic structure as rapidly as possible, using all available information predictively to process current input before new information arrives (sect. 3). From this perspective, language acquisition involves learning to process: that is, learning rapidly to create and use chunks appropriately for the language being learned (sect. 4). Consequently, short-term language change and longer-term processes of language evolution arise through variation in the system of chunks and their composition, suggesting an item-based theory of language change (sect. 5). This approach points to a processing-based interpretation of construction grammar, in which constructions correspond to chunks, and where grammatical structure is fundamentally the history of language processing operations within the individual speaker/hearer (sect. 6). We conclude by briefly summarizing the main points of our argument.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100645-46263-mediumThumb-S0140525X1500031X_fig1g.jpg?pub-status=live)
Figure 1. The structure of our argument, in which implicational relations between claims are denoted by arrows. The Now-or-Never bottleneck provides a fundamental constraint on perception and action that is independent of its application to the language system (and hence sits outside the diamond in the figure). Specific implications for language (indicated inside the diamond) stem from the way the Now-or-Never bottleneck necessitates Chunk-and-Pass language processing, with key consequences for language acquisition. The impact of the Now-or-Never bottleneck on both processing and acquisition together further shapes language change. All three of these interlinked claims concerning Chunk-and-Pass processing, acquisition as processing, and item-based language change (grouped together in the shaded upper triangle) combine to shape the structure of language itself.
2. The Now-or-Never bottleneck
Language input is highly transient. Speech sounds, like other auditory signals, are short-lived. Classic speech perception studies have shown that very little of the auditory trace remains after 100 ms (Elliott 1962), with more recent studies indicating that much acoustic information is already lost after just 50 ms (Remez et al. 2010). Similarly, and of relevance for the perception of sign language, studies of visual change detection suggest that the ability to maintain visual information beyond 60–70 ms is very limited (Pashler 1988). Thus, sensory memory for language input is quickly overwritten, or interfered with, by new incoming information, unless the perceiver in some way processes what is heard or seen.
The problem of the rapid loss of the speech or sign signal is further exacerbated by the sheer speed of the incoming linguistic input. At a normal speech rate, speakers produce about 10–15 phonemes per second, corresponding to roughly 5–6 syllables every second, or 150 words per minute (Studdert-Kennedy 1986). However, the resolution of the human auditory system for discrete auditory events is only about 10 sounds per second, beyond which the sounds fuse into a continuous buzz (Miller & Taylor 1948). Consequently, even at normal rates of speech, the language system needs to work beyond the limits of auditory temporal resolution for nonspeech stimuli. Remarkably, listeners can learn to process speech in their native language at up to twice the normal rate without much decrement in comprehension (Orr et al. 1965). Although the production of signs appears to be slower than the production of speech (at least when comparing the production of ASL signs and spoken English; Bellugi & Fischer 1972), signed words are still very brief visual events, with the duration of an ASL syllable being about a quarter of a second (Wilbur & Nolen 1986; see Footnote 2).
Making matters even worse, our memory for sequences of auditory input is also very limited. For example, it has been known for more than four decades that naïve listeners are unable to correctly recall the temporal order of just four distinct sounds – for example, hisses, buzzes, and tones – even when they are perfectly able to recognize and label each individual sound in isolation (Warren et al. 1969). Our ability to recall well-known auditory stimuli is not substantially better, with capacity estimates ranging from 7 ± 2 items (Miller 1956) to 4 ± 1 (Cowan 2000). A similar limitation applies to visual memory for sign language (Wilson & Emmorey 2006). This poor memory for auditory and visual information, combined with the fast and fleeting nature of linguistic input, imposes a fundamental constraint on the language system: the Now-or-Never bottleneck. If the input is not processed immediately, new information will quickly overwrite it.
Importantly, the Now-or-Never bottleneck is not unique to language but applies to other aspects of perception and action as well. Sensory memory is rich in detail but decays rapidly unless it is further processed (e.g., Cherry 1953; Coltheart 1980; Sperling 1960). Likewise, short-term memory for auditory, visual, and haptic information is also limited and subject to interference from new input (e.g., Gallace et al. 2006; Haber 1983; Pavani & Turatto 2008). Moreover, our cognitive ability to respond to sensory input is further constrained in a serial (Sigman & Dehaene 2005) or near-serial (Navon & Miller 2002) manner, severely restricting our capacity for processing multiple inputs arriving in quick succession. Similar limitations apply to the production of behavior: The cognitive system cannot plan detailed sequences of movements – a long sequence of commands planned far in advance would lead to severe interference and be forgotten before it could be carried out (Cooper & Shallice 2006; Miller et al. 1960). However, the cognitive system adopts several processing strategies to ameliorate the effects of the Now-or-Never bottleneck on perception and action.
First, the cognitive system engages in eager processing: It must recode the rich perceptual input as it arrives to capture the key elements of the sensory information as economically, and as distinctively, as possible (e.g., Brown et al. 2007; Crowder & Neath 1991); and it must do so rapidly, before new input overwrites or interferes with the sensory information. This notion is a traditional one, dating back to early work on attention and sensory memory (e.g., Broadbent 1958; Coltheart 1980; Haber 1983; Sperling 1960; Treisman 1964). The resulting compressed representations are lossy: They provide only an abstract summary of the input, from which the rich sensory input cannot be recovered (e.g., Pani 2000). Evidence from the phenomena of change and inattentional blindness suggests that these compressed representations can be very selective (see Jensen et al. 2011 for a review), as exemplified by a study in which half of the participants failed to notice that someone to whom they were giving directions, face-to-face, was surreptitiously exchanged for a completely different person (Simons & Levin 1998). Information not encoded in the short time during which the sensory information is available will be lost.
Second, because memory limitations also apply to recoded representations, the cognitive system further chunks the compressed encodings into multiple levels of representation of increasing abstraction in perception, and decreasing levels of abstraction in action. Consider, for example, memory for serially ordered symbolic information, such as sequences of digits. Typically, people are quickly overloaded and can recall accurately only the last three or four items in a sequence (e.g., Murdock 1968). But it is possible to learn to rapidly encode, and recall, long random sequences of digits by successively chunking such sequences into larger units, chunking those chunks into still larger units, and so on. Indeed, an extended study of a single individual, SF (Ericsson et al. 1980), showed that repeated chunking in this manner makes it possible to recall with high accuracy sequences containing as many as 79 digits. But, crucially, this strategy requires learning to encode the input into multiple, successive, and distinct levels of representation – each sequence of chunks at one level must be shifted as a single chunk to a higher level before more chunks interfere with or overwrite the initial chunks. Indeed, SF chunked sequences of three or four digits, the natural chunk size in human memory (Cowan 2000), into a single unit (corresponding to running times, dates, or human ages), and then grouped sequences of three to four of those chunks into larger chunks. Interestingly, SF also verbally produced items in overtly discernible chunks, interleaved with pauses, indicating how action also follows the reverse process (e.g., Lashley 1951; Miller 1956). The case of SF further demonstrates that low-level information is far better recalled when organized into higher-level structures than merely coded as an unorganized stream.
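The recursive grouping that SF employed can be sketched as a toy computation. This is purely illustrative, not a cognitive model; the specific digit sequence and the chunk size of four are our own assumptions for the example:

```python
def chunk(items, size=4):
    """Group a flat sequence into chunks of at most `size` items,
    mirroring the roughly 3-4 item span of short-term memory."""
    return [tuple(items[i:i + size]) for i in range(0, len(items), size)]

digits = [int(d) for d in "79418254907123581321"]  # 20 arbitrary digits

# Level 1: digits -> small chunks (for SF: running times, dates, ages)
level1 = chunk(digits)    # 5 chunks of 4 digits each
# Level 2: chunks of chunks -- only these few units must be held "now"
level2 = chunk(level1)    # 2 higher-level chunks

# Recall unpacks the hierarchy top-down, recovering the full sequence
recalled = [d for super_chunk in level2 for ch in super_chunk for d in ch]
assert recalled == digits
```

The point of the sketch is that no level ever holds more than a handful of units at once, yet the full 20-digit sequence survives, because each level recodes the one below before interference sets in.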
Note, though, that lower-level information is typically forgotten; it seems unlikely that even SF could recall the specific visual details of the digits with which he was presented. More generally, the notion that perception and action involve representational recoding at a succession of distinct representational levels also fits with a long tradition of theoretical and computational models in cognitive science and computer vision (e.g., Bregman 1990; Marr 1982; Miller et al. 1960; Zhu et al. 2010; see Gobet et al. 2001 for a review). Our perspective on repeated multilevel compression is also consistent with data from functional magnetic resonance imaging (fMRI) and intracranial recordings, suggesting cortical hierarchies across vision and audition – from low-level sensory to high-level perceptual and cognitive areas – that integrate information at progressively longer temporal windows (Hasson et al. 2008; Honey et al. 2012; Lerner et al. 2011).
Third, to facilitate speedy chunking and hierarchical compression, the cognitive system employs anticipation, using prior information to constrain the recoding of current perceptual input (for reviews, see Bar 2007; Clark 2013). For example, people see the exact same collection of pixels either as a hair dryer (when viewed as part of a bathroom scene) or as a drill (when embedded in a picture of a workbench) (Bar 2004). Using prior information to predict future input is therefore likely to be essential to successfully encoding that input (as well as helping us react faster to it). Anticipation allows faster, and hence more effective, recoding when oncoming information creates considerable time urgency. Such predictive processing will be most effective to the extent that the greatest possible amount of available information (across different types and levels of abstraction) is integrated as fast as possible. Anticipation is equally important for action. For example, manipulating an object requires anticipating the grip force needed to deal with the loads generated by the accelerations of the object; grip force is adjusted too rapidly during manipulation to rely on sensory feedback (Flanagan & Wing 1997). Indeed, the rapid prediction of the sensory consequences of actions (e.g., Poulet & Hedwig 2006) suggests the existence of so-called forward models, which allow the brain to predict the consequences of its actions in real time. Many have argued (e.g., Wolpert et al. 2011; see also Clark 2013; Pickering & Garrod 2013a) that forward models are a ubiquitous feature of the computational machinery of motor control and, more broadly, of cognition.
The three processing strategies we mention here – eager processing, computing multiple representational levels, and anticipation – provide the cognitive system with important means to cope with the Now-or-Never bottleneck. Next, we argue that the language system implements similar strategies for dealing with the here-and-now nature of linguistic input and output, with wide-reaching and fundamental implications for language processing, acquisition and change as well as for the structure of language itself. Specifically, we propose that our ability to deal with sequences of linguistic information is the result of what we call “Chunk-and-Pass” processing, by which the language system can ameliorate the effects of the Now-or-Never bottleneck. More generally, our perspective offers a framework within which to approach language comprehension and production. Table 1 summarizes the impact of the Now-or-Never bottleneck on perception/action and language.
Table 1. Summary of the Now-or-Never bottleneck's implications for perception/action and language
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100645-85008-mediumThumb-S0140525X1500031X_tab1.jpg?pub-status=live)
The style of explanation outlined here, focusing on processing limitations, contrasts with a widespread interest in rational, rather than processing-based, explanations in cognitive science (e.g., Anderson 1990; Chater et al. 2006; Griffiths & Tenenbaum 2009; Oaksford & Chater 1998; 2007; Tenenbaum et al. 2011), including language processing (Gibson et al. 2013; Hale 2001; 2006; Piantadosi et al. 2011). Given the fundamental nature of the Now-or-Never bottleneck, we suggest that such explanations will be relevant for explaining language use only insofar as they incorporate processing constraints. For example, in the spirit of rational analysis (Anderson 1990) and bounded rationality (Simon 1982), it is natural to view aspects of language processing and structure, as described below, as “optimal” responses to specific processing limitations, such as the Now-or-Never bottleneck (for this style of approach, see, e.g., Chater et al. 1998; Levy 2008). Here, though, our focus is primarily on mechanism rather than rationality.
3. Chunk-and-Pass language processing
The fleeting nature of linguistic input, in combination with the impressive speed with which words and signs are produced, imposes a severe constraint on the language system: the Now-or-Never bottleneck. Each new incoming word or sign will quickly interfere with previously heard and seen input, providing a naturalistic version of the masking used in psychophysical experiments. How, then, is language comprehension possible? Why doesn't interference between successive sounds (or signs) obliterate linguistic input before it can be understood? The answer, we suggest, is that our language system rapidly recodes this input into chunks, which are immediately passed to a higher level of linguistic representation. The chunks at this higher level are then themselves subject to the same Chunk-and-Pass procedure, resulting in progressively larger chunks of increasing linguistic abstraction. Crucially, given that the chunks recode increasingly larger stretches of input from lower levels of representation, the chunking process enables input to be maintained over ever-larger temporal windows. It is this repeated chunking of lower-level information that makes it possible for the language system to deal with the continuous deluge of input that, if not recoded, is rapidly lost. This chunking process is also what allows us to perceive speech at a much faster rate than nonspeech sounds (Warren et al. 1969): We have learned to chunk the speech stream. Indeed, we can easily understand (and sometimes even repeat back) sentences consisting of many tens of phonemes, despite our severe memory limitations for sequences of nonspeech sounds.
What we are proposing is that during comprehension, the language system – similar to SF – must keep on chunking the incoming information into increasingly abstract levels of representation to avoid being overwhelmed by the input. That is, the language system engages in eager processing when creating chunks. Chunks must be built right away, or memory for the input will be obliterated by interference from subsequent material. If a phoneme or syllable is recognized, then it is recoded as a chunk and passed to a higher level of linguistic abstraction. And once recoded, the information is no longer subject to interference from further auditory input. A general principle of perception and memory is that interference arises primarily between overlapping representations (Crowder & Neath 1991; Treisman & Schmidt 1982); crucially, recoding avoids such overlap. For example, phonemes interfere with each other, but phonemes interfere very little with words. At each level of chunking, information from the previous level(s) is compressed and passed up as chunks to the next level of linguistic representation, from sound-based chunks up to complex discourse elements (see Footnote 3). As a consequence, the rich detail of the original input can no longer be recovered from the chunks, although some key information remains (e.g., certain speaker characteristics; Nygaard et al. 1994; Remez et al. 1997).
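The core mechanism can be caricatured as a toy pipeline that eagerly recodes "sounds" into words and words into phrases, clearing each lower-level buffer the moment a chunk is passed up. The miniature lexicon, the phrase inventory, and the bracket labels below are all invented for the illustration, not drawn from any real grammar:

```python
# Hypothetical toy inventories, purely for illustration.
LEXICON = {("th", "e"): "the", ("d", "o", "g"): "dog", ("r", "a", "n"): "ran"}
PHRASES = {("the", "dog"): "NP[the dog]", ("ran",): "VP[ran]"}

def chunk_and_pass(stream):
    sounds, words, phrases = [], [], []
    for sound in stream:
        sounds.append(sound)                    # transient sensory buffer
        if tuple(sounds) in LEXICON:            # chunk recognized: recode it,
            words.append(LEXICON[tuple(sounds)])
            sounds.clear()                      # ...and discard the low-level detail
        if tuple(words) in PHRASES:             # then pass upward again
            phrases.append(PHRASES[tuple(words)])
            words.clear()
    return phrases

print(chunk_and_pass(["th", "e", "d", "o", "g", "r", "a", "n"]))
# -> ['NP[the dog]', 'VP[ran]']
```

Note that at no point does any buffer hold more than a few units, and once a sound sequence is recoded as a word its segmental detail is gone – the "lossy compression" described above.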
In production, the process is reversed: Discourse-level chunks are recursively broken down into subchunks of decreasing linguistic abstraction until the system arrives at chunks with sufficient information to drive the articulators (either the vocal apparatus or the hands). As in comprehension, memory is limited within a given level of representation, resulting in potential interference between the items to be produced (e.g., Dell et al. 1997). Thus, higher-level chunks tend to be passed down immediately to the level below as soon as they are “ready,” leading to a bias toward producing easy-to-retrieve utterance components before harder-to-retrieve ones (e.g., Bock 1982; MacDonald 2013). For example, if there is a competition between two possible words to describe an object, the word that is retrieved more fluently will immediately be passed on to lower-level articulatory processes. To further facilitate production, speakers often reuse chunks from the ongoing conversation, which will be particularly rapidly available from memory. This phenomenon is reflected in the evidence for lexical (e.g., Meyer & Schvaneveldt 1971) and structural priming (e.g., Bock 1986; Bock & Loebell 1990; Pickering & Branigan 1998; Potter & Lombardi 1998) within individuals, as well as alignment across conversational partners (Branigan et al. 2000; Pickering & Garrod 2004); priming is also extensively observed in text corpora (Hoey 2005). As noted by MacDonald (2013), these memory-related factors provide key constraints on the production of language and contribute to cross-linguistic patterns of language use (see Footnote 4).
A useful analogy for language production is the notion of “just-in-time” stock control (see Footnote 5), in which stock inventories are kept to a bare minimum during the manufacturing process (Ohno & Mito 1988). Similarly, the Now-or-Never bottleneck requires that, for example, low-level phonetic or articulatory decisions not be made and stored far in advance and then reeled off during speech production, because any buffer in which such decisions could safely be stored would quickly be subject to interference from subsequent material. So the Now-or-Never bottleneck requires that once detailed production information has been assembled, it be executed straightaway, before it can be obliterated by the oncoming stream of later low-level decisions, similar to what has been suggested for motor planning (Norman & Shallice 1986; see also MacDonald 2013). We call this proposal Just-in-Time language production.
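Just-in-Time production can be sketched as lazy top-down expansion: a high-level chunk is broken into subchunks, and each articulatory-level unit is emitted the moment it is ready, so nothing is buffered far in advance. The expansion table and chunk labels below are hypothetical, chosen only to make the sketch concrete:

```python
def produce(message, expansions):
    """Lazily expand a high-level chunk top-down, emitting each
    lowest-level unit as soon as it is ready (buffers stay minimal)."""
    if message not in expansions:
        yield message                      # articulatory-level unit: say it now
        return
    for part in expansions[message]:
        yield from produce(part, expansions)

# Hypothetical expansion table for the illustration.
EXPANSIONS = {
    "MSG[dog ran]": ["NP[the dog]", "VP[ran]"],
    "NP[the dog]": ["the", "dog"],
    "VP[ran]": ["ran"],
}

assert list(produce("MSG[dog ran]", EXPANSIONS)) == ["the", "dog", "ran"]
```

Because `produce` is a generator, the later parts of the message are never expanded until the earlier parts have been emitted – a minimal analogue of keeping the low-level "inventory" near zero.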
3.1. Implications of Strategy 1: Incremental processing
Chunk-and-Pass processing has important implications for comprehension and production: It requires that both take place incrementally. In incremental processing, representations are built up as rapidly as possible as the input is encountered. By contrast, one might imagine a parser that waits until the end of a sentence before beginning syntactic analysis, or a system in which meaning is computed only once syntax has been established. Such processing, however, would require storing a stream of information at a single level of representation and processing it later – which the Now-or-Never bottleneck rules out, because of the severe interference that would arise between such representations. Incremental interpretation and production therefore follow directly from the Now-or-Never constraint on language.
To get a sense of the implications of Chunk-and-Pass processing, it is useful to relate this perspective to specific computational principles and models. How, for example, do classic models of parsing fit within this framework? A wide range of psychologically inspired models involve some degree of incrementality of syntactic analysis, which can potentially support incremental interpretation (e.g., Phillips 1996; 2003; Winograd 1972). For example, the “sausage machine” parsing model (Frazier & Fodor 1978) proposes that a preliminary syntactic analysis is carried out phrase-by-phrase, but in complete isolation from semantic or pragmatic factors. But for a right-branching language such as English, chunks cannot simply be built left-to-right, because the leftmost chunks are incomplete until later material has been encountered. Frameworks from Kimball (1973) onward imply “stacking up” incomplete constituents that may then all be resolved at the end of the clause – an approach that runs counter to the memory constraints imposed by the Now-or-Never bottleneck. Reconciling right-branching structure with incremental chunking and processing is one motivation for the flexible constituency of combinatory categorial grammar (e.g., Steedman 1987; 2000; see also Johnson-Laird 1983).
With respect to comprehension, considerable evidence going back more than four decades supports incremental interpretation (e.g., Bever 1970; Marslen-Wilson 1975). The language system uses all available information to integrate incoming input as quickly as possible, updating the current interpretation of what has been said so far. This information includes not only sentence-internal cues about lexical and structural biases (e.g., Farmer et al. 2006; MacDonald 1994; Trueswell et al. 1993), but also extra-sentential cues from the referential and pragmatic context (e.g., Altmann & Steedman 1988; Thornton et al. 1999), as well as the visual environment and world knowledge (e.g., Altmann & Kamide 1999; Tanenhaus et al. 1995). As the incoming acoustic information is chunked, it is rapidly integrated with contextual information to recognize words, consistent with a variety of data on spoken word recognition (e.g., Marslen-Wilson 1975; van den Brink et al. 2001). These words are then, in turn, chunked into larger multiword units, as evidenced by recent studies showing sensitivity to multiword sequences in online processing (e.g., Arnon & Snider 2010; Reali & Christiansen 2007b; Siyanova-Chanturia et al. 2011; Tremblay & Baayen 2010; Tremblay et al. 2011), and subsequently further integrated with pragmatic context into discourse-level structures.
Turning to production, we start by noting the powerful intuition that we speak “into the void” – that is, that we plan only a short distance ahead. Indeed, experimental studies suggest that, when producing an utterance involving several noun phrases, people plan just one (Smith & Wheeldon 1999), or perhaps two, noun phrases ahead (Konopka 2012), and that they can modify a message during production in light of new perceptual input (Brown-Schmidt & Konopka 2015). Moreover, speech-error data (e.g., Cutler 1982) reveal that, across representational levels, errors tend to be highly local: Phonological, morphemic, and syntactic errors apply to neighboring chunks within each level (where material may be moved, swapped, or deleted). Consequently, speech planning appears to involve just a small number of chunks – a number that may be similar across linguistic levels – though these chunks cover different amounts of time depending on the level in question. For example, planning involving chunks at the level of intonational bursts stretches over considerably longer periods of time than planning at the syllabic level. Similarly, processes of reduction that facilitate production (e.g., modifying the speech signal to make it easier to produce, such as reducing a vowel to a schwa, or shortening or eliminating phonemes) can be observed across different levels of linguistic representation, from individual words (e.g., Gahl & Garnsey 2004; Jurafsky et al. 2001) to frequent multiword sequences (e.g., Arnon & Cohen Priva 2013; Bybee & Scheibman 1999).
Some may object that the Chunk-and-Pass perspective's strict notion of incremental interpretation and production leaves the language system vulnerable to the substantial ambiguity that exists across many levels of linguistic representation (e.g., lexical, syntactic, pragmatic). So-called garden path sentences, such as the famous “The horse raced past the barn fell” (Bever 1970), show that people are indeed vulnerable to at least some local ambiguities: They invite comprehenders to take the wrong interpretive path by treating raced as the main verb, which leads them to a dead end. Only when the final word, fell, is encountered does it become clear that something is wrong: raced should be interpreted as a past participle that begins a reduced relative clause (i.e., the horse [that was] raced past the barn fell). The difficulty of recovery in such garden path sentences indicates how strongly the language system is geared toward incremental interpretation.
Viewed as a processing problem, garden paths occur when the language system resolves an ambiguity incorrectly. But in many cases, it is possible for an underspecified representation to be constructed online, and for the ambiguity to be resolved later when further linguistic input arrives. This type of case is consistent with Marr's (1976) proposal of the "principle of least commitment": that the perceptual system resolves ambiguous perceptual input only when it has sufficient data to make it unlikely that such decisions will subsequently have to be reversed. Given the ubiquity of local ambiguity in language, such underspecification may be used very widely in language processing. Note, however, that because of the severe constraints the Now-or-Never bottleneck imposes, the language system cannot adopt broad parallelism to further minimize the effect of ambiguity (as in many current probabilistic theories of parsing, e.g., Hale 2006; Jurafsky 1996; Levy 2008). Rather, within the Chunk-and-Pass account, the sole role for parallelism in the processing system is in deciding how the input should be chunked; only when conflicts concerning chunking are resolved can the input be passed on to a higher-level representation. In particular, we suggest that competing higher-level codes cannot be activated in parallel. This picture is analogous to Marr's principle of least commitment in vision: Although there might be temporary parallelism to resolve conflicts about, say, correspondence between dots in a random-dot stereogram, it is not possible to create two conflicting three-dimensional surfaces in parallel; and whereas there may be parallelism over the interpretation of lines and dots in an image, it is not possible to see something as both a duck and a rabbit simultaneously.
More broadly, higher-level representations are constructed only when sufficient evidence has accrued that they are unlikely later to need to be replaced (for stimuli outside the psychological laboratory, at least).
Maintaining, and later resolving, an underspecified representation will create local memory and processing demands that may slow down processing, as indicated, for example, by increased reading times (e.g., Trueswell et al. 1994) and distinctive patterns of brain activity (as measured by ERPs; Swaab et al. 2003). Accordingly, when the input is ambiguous, the language system may require later input to recognize previous elements of the speech stream successfully. The Now-or-Never bottleneck requires that such online "right-context effects" be highly local because raw perceptual input will be lost if it is not rapidly identified (e.g., Dahan 2010). Right-context effects may arise where the language system can delay resolution of ambiguity or use underspecified representations that do not require resolving the ambiguity right away. Similarly, cataphora, in which, for example, a referential pronoun occurs before its referent (e.g., "He is a nice guy, that John"), requires the creation of an underspecified entity (male, animate) when he is encountered, which is resolved to be coreferential with John only later in the sentence (e.g., van Gompel & Liversedge 2003). Overall, the Now-or-Never bottleneck implies that the processing system will build the most abstract and complete representation that is justified, given the linguistic input.Footnote 6
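The idea of an underspecified representation resolved by later right-context can be sketched schematically. The `Referent` class, its feature names, and the two-step usage below are purely illustrative assumptions on our part, not a processing model from the literature:

```python
# Illustrative sketch (hypothetical, not an implemented model): on
# encountering a cataphoric pronoun ("He is a nice guy, that John"),
# the system commits only to the features it can justify (male, animate)
# and binds the referent when later right-context arrives.

class Referent:
    def __init__(self, **features):
        self.features = features   # partial commitments, e.g., gender, animacy
        self.referent = None       # left unresolved for now

    def resolve(self, name):
        """Later input supplies the antecedent, resolving the entity."""
        self.referent = name

he = Referent(gender="male", animate=True)  # built at "He ..."
he.resolve("John")                          # bound at "... that John"
```

The point of the sketch is that no commitment made at the pronoun ever has to be undone; the later input only fills in what was deliberately left open.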
Of course, outside of experimental studies, background knowledge, visual context, and prior discourse will provide powerful cues to help resolve ambiguities in the signal, allowing the system rapidly to resolve many apparent ambiguities without incurring a substantial danger of "garden-pathing." Indeed, although syntactic and lexical ambiguities have been much studied in psycholinguistics, increasing evidence indicates that garden paths are not a major source of processing difficulty in practice (e.g., Ferreira 2008; Jaeger 2010; Wasow & Arnold 2003).Footnote 7 For example, Roland et al. (2006) reported corpus analyses showing that, in naturally occurring language, there is generally sufficient information in the sentential context before the occurrence of an ambiguous verb to specify the correct interpretation of that verb. Moreover, eye-tracking studies have demonstrated that dialogue partners exploit both conversational context and task demands to constrain interpretations to the appropriate referents, thereby side-stepping effects of phonological and referential competitors (Brown-Schmidt & Konopka 2011) that have otherwise been shown to impede language processing (e.g., Allopenna et al. 1998). These dialogue-based constraints also mitigate syntactic ambiguities that might otherwise disrupt processing (Brown-Schmidt & Tanenhaus 2008). This information may be further combined with other probabilistic sources of information such as prosody (e.g., Kraljic & Brennan 2005; Snedeker & Trueswell 2003) to resolve potential ambiguities within a minimal temporal window.
Finally, it is not clear that undetected garden path errors are costly in normal language use, because if communication appears to break down, the listener can repair the communication by requesting clarification from the dialogue partner.
3.2. Implications of Strategy 2: Multiple levels of linguistic structure
The Now-or-Never bottleneck forces the language system to compress input into increasingly abstract chunks that cover progressively longer temporal intervals. As an example, consider the chunking of the input illustrated in Figure 2. The acoustic signal is first chunked into higher-level sound units at the phonological level. To avoid interference between local sound-based units, such as phonemes or syllables, these units are further recoded as rapidly as possible into higher-level units such as morphemes or words. The same phenomenon occurs at the next level up: Local groups of words must be chunked into larger units, possibly phrases or other forms of multiword sequences. Subsequent chunking then recodes these representations into higher-level discourse structures (which may themselves be chunked further into even more abstract representational structures beyond that). Similarly, production requires running the process in reverse, starting with the intended message and gradually decomposing it into increasingly specific chunks, eventually resulting in the motor programs necessary for producing the relevant speech or sign output. As we discuss in section 3.3, the production process may further serve as the basis for prediction during comprehension (allowing higher-level information to influence the processing of current input). More generally, our account is agnostic with respect to the specific characterization of the various levels of linguistic representationFootnote 8 (e.g., whether sound-based chunks take the form of phonemes, syllables, etc.). What is central to the Chunk-and-Pass account is that there is some form of sound-based chunking (or visual-based chunking, in the case of sign language), together with a sequence of increasingly abstract levels of chunked representations into which the input is continually recoded.
Figure 2. Chunk-and-Pass processing across a variety of linguistic levels in spoken language. As input is chunked and passed up to increasingly abstract levels of linguistic representations in comprehension, from acoustics to discourse, the temporal window over which information can be maintained increases, as indicated by the shaded portion of the bars associated with each linguistic level. This process is reversed in production planning, in which chunks are broken down into sequences of increasingly short and concrete units, from a discourse-level message to the motor commands for producing a specific articulatory output. More-abstract representations correspond to longer chunks of linguistic material, with greater look-ahead in production at higher levels of abstraction. Production processes may further serve as the basis for predictions to facilitate comprehension and thus provide top-down information in comprehension. (Note that the names and number of levels are for illustrative purposes only.)
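The successive recoding described above can be illustrated with a minimal sketch. Everything here – the toy inventories, the greedy longest-match strategy, and the function name – is our own illustrative assumption, intended only to show how each level recodes its input into larger chunks and then discards the lower-level material:

```python
# A minimal, purely illustrative Chunk-and-Pass sketch: each level greedily
# recodes its input into larger, more abstract chunks and passes only the
# chunks upward. The inventories below are hypothetical toy examples.

SYLLABLES = {("g", "uh", "d"): "good", ("m", "or", "n"): "mor", ("ih", "ng"): "ning"}
WORDS = {("good",): "good", ("mor", "ning"): "morning"}
PHRASES = {("good", "morning"): "GREETING"}

def chunk(stream, inventory, max_len=3):
    """Greedily recode a sequence of lower-level units into higher-level
    chunks, preferring the longest match. The recoding is lossy: once a
    chunk is formed, the lower-level sequence it came from is discarded."""
    out, i = [], 0
    while i < len(stream):
        for n in range(min(max_len, len(stream) - i), 0, -1):
            key = tuple(stream[i:i + n])
            if key in inventory:
                out.append(inventory[key])
                i += n
                break
        else:
            i += 1  # unchunkable material is simply lost, as under the bottleneck
    return out

phonemes = ["g", "uh", "d", "m", "or", "n", "ih", "ng"]
sounds = chunk(phonemes, SYLLABLES)   # ['good', 'mor', 'ning']
words = chunk(sounds, WORDS)          # ['good', 'morning']
message = chunk(words, PHRASES)       # ['GREETING']
```

Note that running the same function at each level yields progressively fewer, more abstract chunks spanning progressively more of the original signal, mirroring the hierarchy in Figure 2.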
A key theoretical implication of Chunk-and-Pass processing is that the multiple levels of linguistic representation typically assumed in the language sciences are a necessary by-product of the Now-or-Never bottleneck. Only by compressing the input into chunks and passing them to increasingly abstract levels of linguistic representation can the language system deal with the rapid onslaught of incoming information. Crucially, though, our perspective also suggests that the different levels of linguistic representation do not have a true part–whole relationship with one another. Unlike in the case of SF, who learned strategies to perfectly unpack chunks from within chunks to reproduce the original string of digits, language comprehension typically employs lossy compression to chunk the input. That is, higher-level chunks will not in general contain complete copies of lower-level chunks. Indeed, as speech input is encoded into ever more abstract chunks, increasing amounts of low-level information will typically be lost. Instead, as in perception (e.g., Haber 1983), there is greater representational underspecification at higher levels of representation because of the repeated process of lossy compression.Footnote 9 Thus, we would expect a growing involvement of extralinguistic information, such as perceptual input and world knowledge, in processing higher levels of linguistic representation (see, e.g., Altmann & Kamide 2009).
Whereas our account proposes a lossy hierarchy across levels of linguistic representation, only a very small number of chunks are represented within a level: otherwise, information is rapidly lost due to interference. This has the crucial implication that chunks within a given level can interact only locally. For example, acoustic information must rapidly be coded in a non-acoustic form, say, in terms of phonemes; but this is only possible if phonemes correspond to local chunks of acoustic input. The processing bottleneck therefore enforces a strong pressure toward local dependencies within a given linguistic level. Importantly, though, this does not imply that linguistic relations are restricted only to adjacent elements but, instead, that they may be formed between any of the small number of elements maintained at a given level of representation. Such representational locality is exemplified across different linguistic levels by the local nature of phonological processes such as reduction, assimilation, and fronting, including more elaborate phenomena such as vowel harmony (e.g., Nevins 2010); by speech errors (e.g., Cutler 1982); by the immediate proximity of inflectional morphemes and the verbs to which they apply; and by the vast literature on the processing difficulties associated with non-local dependencies in sentence comprehension (e.g., Gibson 1998; Hawkins 2004). As noted earlier, the higher the level of linguistic representation, the longer the limited time window within which information can be chunked. Whereas dealing with just two center-embeddings at the sentential level is prohibitively difficult (e.g., de Vries et al. 2011; Karlsson 2007), we are able to deal with up to four to six embeddings at the multi-utterance discourse level (Levinson 2013).
This is because chunking takes place over a much longer timescale at the discourse level than at the sentence level, providing more time to resolve the relevant dependency relations before they are subject to interference.
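The claim that chunks can interact only within the small set maintained at each level can be made concrete with a toy sketch. The `Level` class, its method names, and the capacity of three are illustrative assumptions on our part, not empirical estimates:

```python
# Hypothetical sketch of representational locality: each linguistic level
# holds only a small buffer of chunks, so dependencies can be formed only
# among the few items currently maintained; anything older has already
# been passed upward or lost to interference.
from collections import deque

class Level:
    def __init__(self, capacity=3):
        self.buffer = deque(maxlen=capacity)  # interference limits capacity

    def add(self, chunk):
        self.buffer.append(chunk)  # the oldest chunk is silently evicted

    def can_link(self, a, b):
        """A dependency between two chunks is possible only while both
        are still held at this level."""
        return a in self.buffer and b in self.buffer

syntax = Level(capacity=3)
for word in ["the", "dog", "that", "the", "cat"]:
    syntax.add(word)

# "cat" can still relate to "that" (local), but "dog" is already gone:
syntax.can_link("cat", "that")   # True
syntax.can_link("cat", "dog")    # False
```

On this sketch, relations need not hold between strictly adjacent elements, but they must be formed among whatever few chunks the level currently maintains – exactly the pressure toward locality described above.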
Finally, as indicated by Figure 2, processing within each level of linguistic representation takes place in parallel – but with a clear temporal component – as chunks are passed between levels. Note that, in the Chunk-and-Pass framework, it is entirely possible that linguistic input can simultaneously, and perhaps redundantly, be chunked in more than one way. For example, syntactic chunks and intonational contours may be somewhat independent (Jackendoff 2007). Moreover, we should expect further chunking across different "channels" of communication, including visual input such as gesture and facial expressions.
The Chunk-and-Pass perspective is compatible with a number of recent theoretical models of sentence comprehension, including constraint-based approaches (e.g., MacDonald et al. 1994; Trueswell & Tanenhaus 1994) and certain generative accounts (e.g., Jackendoff's [2007] parallel architecture). Intriguingly, fMRI data from adults (Dehaene-Lambertz et al. 2006a) and infants (Dehaene-Lambertz et al. 2006b) indicate that activation responses to a single sentence systematically slow down when moving away from the primary auditory cortex, either back toward Wernicke's area or forward toward Broca's area, consistent with increasing temporal windows for chunking when moving from phonemes to words to phrases. Indeed, the cortical circuits processing auditory input, from lower (sensory) to higher (cognitive) areas, follow different temporal windows, sensitive to more and more abstract levels of linguistic information, from phonemes and words to sentences and discourse (Lerner et al. 2011; Stephens et al. 2013). Similarly, the reverse process, going from a discourse-level representation of the intended message to the production of speech (or sign) across parallel linguistic levels, is compatible with several current models of language production (e.g., Chang et al. 2006; Dell et al. 1997; Levelt 2001).
Data from intracranial recordings during language production are consistent with different temporal windows for chunk decoding at the word, morphemic, and phonological levels, separated by just over a tenth of a second (Sahin et al. 2009). These results are compatible with our proposal that incremental processing in comprehension and production takes place in parallel across multiple levels of linguistic representation, each with a characteristic temporal window.
3.3. Implications of Strategy 3: Predictive language processing
We have already noted that, to be able to chunk incoming information as fast and as accurately as possible, the language system exploits multiple constraints in parallel across the different levels of linguistic representation. Such cues may be used not only to help disambiguate previous input, but also to generate expectations for what may come next, potentially further speeding up Chunk-and-Pass processing. Computational considerations indicate that simple statistical information gleaned from sentences provides powerful predictive constraints on language comprehension and can explain many human processing results (e.g., Christiansen & Chater 1999; Christiansen & MacDonald 2009; Elman 1990; Hale 2006; Jurafsky 1996; Levy 2008; Padó et al. 2009). Similarly, eye-tracking data suggest that comprehenders routinely use a variety of sources of probabilistic information – from phonological cues to syntactic context and real-world knowledge – to anticipate the processing of upcoming words (e.g., Altmann & Kamide 1999; Farmer et al. 2011; Staub & Clifton 2006). Results from event-related potential experiments indicate that rather specific predictions are made for upcoming input, including its lexical category (Hinojosa et al. 2005), grammatical gender (Van Berkum et al. 2005; Wicha et al. 2004), and even its onset phoneme (DeLong et al. 2005) and visual form (Dikker et al. 2010).
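As a purely illustrative sketch of how simple statistical information gleaned from sentences can yield graded expectations for upcoming words, consider an incrementally updated bigram model. The toy corpus and function names below are our own assumptions, not a model from the literature:

```python
# Toy illustration (our sketch): bigram statistics, accumulated one input
# at a time and without storing the sentences themselves, yield graded
# expectations for the next word of the kind that could speed
# Chunk-and-Pass processing.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # counts[prev][next]

def observe(sentence):
    """Update bigram statistics online from a single pass over the input."""
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1

def predict(prev):
    """Return candidate next words with probabilities, most expected first."""
    options = counts[prev]
    total = sum(options.values())
    if total == 0:
        return []
    return sorted(((w, c / total) for w, c in options.items()),
                  key=lambda pair: -pair[1])

for s in ["the dog barked", "the dog slept", "the cat slept"]:
    observe(s)

predict("the")   # 'dog' (p = 2/3) is ranked above 'cat' (p = 1/3)
```

Richer models in the cited literature combine many such cues across levels; the sketch shows only the minimal case of within-level statistics supporting anticipation.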
Accordingly, there is a growing body of evidence for a substantial role of prediction in language processing (for reviews, see, e.g., Federmeier 2007; Hagoort 2009; Kamide 2008; Kutas et al. 2014; Pickering & Garrod 2007) and evidence that such language prediction occurs in children as young as 2 years of age (Mani & Huettig 2012). Importantly, as well as exploiting statistical relations within a representational level, predictive processing allows top-down information from higher levels of linguistic representation to rapidly constrain the processing of the input at lower levels.Footnote 10
From the viewpoint of the Now-or-Never bottleneck, prediction provides an opportunity to begin Chunk-and-Pass processing as early as possible: to constrain representations of new linguistic material as it is encountered, and even incrementally to begin recoding predictable linguistic input before it arrives. This viewpoint is consistent with recent suggestions that the production system may be pressed into service to anticipate upcoming input (e.g., Pickering & Garrod 2007; 2013a). Chunk-and-Pass processing implies that there is practically no possibility of going back once a chunk is created because such backtracking tends to derail processing (e.g., as in the classic garden path phenomena mentioned above). This imposes a Right-First-Time pressure on the language system in the face of linguistic input that is highly locally ambiguous.Footnote 11 The contribution of predictive modeling to comprehension is that it facilitates local ambiguity resolution while the stimulus is still available. Only by recruiting multiple cues and integrating these with predictive modeling is it possible to resolve local ambiguities quickly and correctly.
Right-First-Time parsing fits with proposals such as that of Marcus (1980), in which local ambiguity resolution is delayed until later disambiguating information arrives, and with models in which aspects of syntactic structure may be underspecified, therefore not requiring the ambiguity to be resolved (e.g., Gorrell 1995; Sturt & Crocker 1996). It also parallels Marr's (1976) principle of least commitment, as we mentioned earlier, according to which the perceptual system should, as far as possible, only resolve perceptual ambiguities when sufficiently confident that they will not need to be undone. Moreover, it is compatible with the fine-grained weakly parallel interactive model (Altmann & Steedman 1988) in which possible chunks are proposed, word-by-word, by an autonomous parser and one is rapidly chosen using top-down information.
To facilitate chunking across multiple levels of representation, prediction takes place in parallel across the different levels but at varying timescales. Predictions for higher-level chunks may run ahead of those for lower-level chunks. For example, most people simply answer "two" in response to the question "How many animals of each kind did Moses take on the Ark?" – failing to notice the semantic anomaly (i.e., it was Noah's Ark, not Moses' Ark) even in the absence of time pressure and when made aware that the sentence may be anomalous (Erickson & Mattson 1981). That is, anticipatory pragmatic and communicative considerations relating to the required response appear to trump lexical semantics. More generally, the time course of normal conversation may lead to an emphasis on more temporally extended higher-level predictions over lower-level ones. This may facilitate the rapid turn-taking that has been observed cross-culturally (Stivers et al. 2009) and which seems to require that listeners make quite specific predictions about when the speaker's current turn will finish (Magyari & de Ruiter 2012), as well as being able to quickly adapt their expectations to specific linguistic environments (Fine et al. 2013).
We view the anticipation of turn-taking as one instance of the broader alignment that takes place between dialogue partners across all levels of linguistic representation (for a review, see Pickering & Garrod 2004). This dovetails with fMRI analyses indicating that although there are some comprehension- and production-specific brain areas, spatiotemporal patterns of brain activity are in general closely coupled between speakers and listeners (e.g., Silbert et al. 2014). In particular, Stephens et al. (2010) observed close synchrony between neural activations in speakers and listeners in early auditory areas. Speaker activations preceded those of listeners in posterior brain regions (including parts of Wernicke's area), whereas listener activations preceded those of speakers in the striatum and anterior frontal areas. In the Chunk-and-Pass framework, the listener lag primarily derives from delays caused by the chunking process across the various levels of linguistic representation, whereas the speaker lag predominantly reflects the listener's anticipation of upcoming input, especially at the higher levels of representation (e.g., pragmatics and discourse). Strikingly, the extent of the listener's anticipatory brain responses was strongly correlated with successful comprehension, further underscoring the importance of prediction-based alignment for language processing. Indeed, analyses of real-time interactions show that alignment increases when the communicative task becomes more difficult (Louwerse et al. 2012). By decreasing the impact of potential ambiguities, alignment thus makes processing as well as production easier in the face of the Now-or-Never bottleneck.
We have suggested that only an incremental, predictive language system, continually building and passing on new chunks of linguistic material, encoded at increasingly abstract levels of representation, can deal with the onslaught of linguistic input in the face of the severe memory constraints of the Now-or-Never bottleneck. We suggest that a productive line of future work is to consider the extent to which existing models of language are compatible with these constraints, and to use these properties to guide the creation of new theories of language processing.
4. Acquisition is learning to process
If speaking and understanding language involves Chunk-and-Pass processing, then acquiring a language requires learning how to create and integrate the right chunks rapidly, before current information is overwritten by new input. Indeed, the ability to quickly process linguistic input – which has been proposed as an indicator of chunking ability (Jones 2012) – is a strong predictor of language acquisition outcomes from infancy to middle childhood (Marchman & Fernald 2008). The importance of this process is also introspectively evident to anyone acquiring a second language: Initially, even segmenting the speech stream into recognizable sounds can be challenging, let alone parsing it into words or processing morphology and grammatical relations rapidly enough to build a semantic interpretation. The ability to acquire and rapidly deploy a hierarchy of chunks at different linguistic scales is parallel to the ability to chunk sequences of motor movements, numbers, or chess positions: It is a skill, built up by continual practice.
Viewing language acquisition as continuous with other types of skill learning is very different from the standard formulation of the problem of language acquisition in linguistics. There, the child is viewed as a linguistic theorist who has the goal of inferring an abstract grammar from a corpus of example sentences (e.g., Chomsky 1957; 1965) and only secondarily learning the skill of generating and understanding language. But perhaps the child is not a mini-linguist. Instead, we suggest that language acquisition is nothing more than learning to process: to turn meanings into streams of sound or sign (when generating language), and to turn streams of sound or sign back into meanings (when understanding language).
If linguistic input is available only fleetingly, then any learning must occur while that information is present; that is, learning must occur in real time, as the Chunk-and-Pass process takes place. Accordingly, any modifications to the learner's cognitive system in light of processing must, according to the Now-or-Never bottleneck, occur at the time of processing. The learner must learn to chunk the input appropriately – to learn to recode the input at successively more abstract linguistic levels; and doing this requires, of course, learning the structure of the language being spoken. But how is this structure learned?
We suggest that, in language acquisition, as in other areas of perceptual-motor learning, people learn by processing, and that past processing leaves traces that can facilitate future processing. What, then, is retained, so that language processing gradually improves? We can consider various possibilities: For example, the weights of a connectionist network can be updated online in the light of current processing (Rumelhart et al. 1986a); in an exemplar-based model, traces of past examples can be reused in the future (e.g., Hintzman 1988; Logan 1988; Nosofsky 1986). Whatever the appropriate computational framework, the Now-or-Never bottleneck requires that language acquisition be viewed as a type of skill learning, such as learning to drive, juggle, play the violin, or play chess. Such skills appear to be learned through practicing the skill, using online feedback during the practice itself, although the consolidation of learning occurs subsequently (Schmidt & Wrisberg 2004). The challenge of language acquisition is to learn a dazzling sequence of rapid processing operations, rather than conjecturing a correct "linguistic theory."
4.1. Implications of Strategy 1: Online learning
The Now-or-Never bottleneck implies that learning can depend only on material currently being processed. As we have seen, this implication requires a processing strategy according to which modification to current representations (in this context, learning) occurs right away; in machine-learning terminology, learning is online. If learning does not occur at the time of processing, the representation of linguistic material will be obliterated, and the opportunity for learning will be gone forever. To facilitate such online learning, the child must learn to use all available information to help constrain processing. The integration of multiple constraints – or cues – is a fundamental component of many current theories of language acquisition (see, e.g., contributions in Golinkoff et al. 2000; Morgan & Demuth 1996; Weissenborn & Höhle 2001; for a review, see Monaghan & Christiansen 2008). For example, second-graders' initial guesses about whether a novel word refers to an object or an action are affected by that word's phonological properties (Fitneva et al. 2009); 7-year-olds use visual context to constrain online sentence interpretation (Trueswell et al. 1999); and preschoolers' language production and comprehension are constrained by pragmatic factors (Nadig & Sedivy 2002). Thus, children learn rapidly to apply the multiple constraints used in incremental adult processing (Borovsky et al. 2012).
Nonetheless, online learning contrasts with traditional approaches in which the structure of the language is learned offline by the cognitive system accumulating a corpus of past linguistic inputs and choosing the grammar or other model of the language that best fits with those inputs. For example, in both mathematical and theoretical analysis (e.g., Gold 1967; Hsu et al. 2011; Pinker 1984) and in grammar-induction algorithms in machine learning and cognitive science, it is typically assumed that a corpus of language can be held in memory, and that the candidate grammar is successively adjusted to fit the corpus as well as possible (e.g., Manning & Schütze 1999; Pereira & Schabes 1992; Redington et al. 1998). However, this approach involves learning linguistic regularities (at, say, the morphological level) by storing and later surveying relevant linguistic input at a lower level of analysis (e.g., involving strings of phonemes), and then attempting to determine which higher-level regularities best fit the database of lower-level examples. There are a number of difficulties with this type of proposal – for example, that only a very rich lower-level representation (perhaps combined with annotations concerning relevant syntactic and semantic context) is likely to be a useful basis for later analysis. But more fundamentally, the Now-or-Never bottleneck requires that information be retained only if it is recoded at processing time: Phonological information that is not chunked at the morphological level and beyond will be obliterated by oncoming phonological material.Footnote 12
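The contrast between online learning and corpus-based batch learning can be sketched as follows. Both toy functions are our own illustrations (not models from the literature); they arrive at the same word statistics, but differ in what must be held in memory along the way, and it is the batch learner's verbatim storage of the input that the Now-or-Never bottleneck rules out:

```python
# Schematic contrast (our sketch): the online learner updates its model
# from each input as it is processed and then discards the input; the
# batch learner must first store the whole corpus for later analysis.

def learn_online(utterances):
    model = {}
    for utt in utterances:                        # each utterance is available only now
        for word in utt.split():
            model[word] = model.get(word, 0) + 1  # update during processing
        # utt is not retained; only its trace in `model` survives
    return model

def learn_batch(utterances):
    corpus = list(utterances)                     # verbatim storage of the input --
    model = {}                                    # ruled out by the Now-or-Never bottleneck
    for utt in corpus:                            # later survey over the stored corpus
        for word in utt.split():
            model[word] = model.get(word, 0) + 1
    return model

learn_online(["the dog barked", "the cat slept"])
# {'the': 2, 'dog': 1, 'barked': 1, 'cat': 1, 'slept': 1}
```

For simple frequency counts the two coincide; the argument in the text is that for richer regularities (e.g., agreement), whatever the online learner's current model fails to encode is lost and cannot be recovered by a later survey.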
So, if learning is shaped by the Now-or-Never bottleneck, then linguistic input must, when it is encountered, be recoded successively at increasingly abstract linguistic levels if it is to be retained at all – a constraint imposed, we argue, by basic principles of memory. Crucially, such information is not, therefore, in a suitably “neutral” format to allow for the discovery of previously unsuspected linguistic regularities. In a nutshell, the lossy compression of the linguistic input is achieved by applying the learner's current model of the language. But information that would point toward a better model of the language (if examined in retrospect) will typically be lost (or, at best, badly obscured) by this compression, precisely because those regularities are not captured by the current model of the language. Suppose, for example, that we create a lossy encoding of language using a simple, context-free phrase structure grammar that cannot handle, say, noun-verb agreement. The lossy encoding of the linguistic input produced using this grammar will provide a poor basis for learning a more sophisticated grammar that includes agreement – precisely because agreement information will have been thrown away. So the Now-or-Never bottleneck rules out the possibility that the learner can survey a neutral database of linguistic material, to optimize its model of the language.
The emphasis on online learning does not, of course, rule out the possibility that any linguistic material that is remembered may subsequently be used to inform learning. But according to the present viewpoint, any further learning requires reprocessing that material. So if a child comes to learn a poem, song, or story verbatim, the child might extract more structure from that material by mental rehearsal (or, indeed, by saying it aloud). The online learning constraint is that material is learned only when it is being processed – ruling out any putative learning processes that involve carrying out linguistic analyses or compiling statistics over a stored corpus of linguistic material.
If this general picture of acquisition as learning-to-process is correct, then we should expect the exploitation of memory to require “replaying” learned material so that it can be reprocessed. Thus, the application of memory itself requires passing through the Now-or-Never bottleneck – there is no way of directly interrogating an internal database of past experience; indeed, this viewpoint fits with our subjective sense that we need to “bring to mind” past experiences or rehearse verbal material to process it further. Interestingly, there is now also substantial neuroscientific evidence that replay does occur (e.g., in rat spatial learning; Carr et al. 2011). Moreover, it has long been suggested that dreaming may have a related function (here using “reverse” learning over “fictional” input to eliminate spurious relationships identified by the brain; Crick & Mitchison 1983; see Hinton & Sejnowski 1986, for a closely related computational model). Deficits in the ability to replay material would, in this view, lead to consequent deficits in memory and inference; consistent with this viewpoint, Martin and colleagues have argued that rehearsal deficits for phonological pattern and semantic information may lead to difficulties in the long-term acquisition and retention of word forms and word meanings, respectively, and their use in language processing (e.g., Martin & He 2004; Martin et al. 1994). In summary, then, language acquisition involves learning to process, and generalizations can only be made over past processing episodes.
4.2. Implications of Strategy 2: Local learning
Online learning faces a particularly acute version of a general learning problem: the stability-plasticity dilemma (e.g., Mermillod et al. 2013). How can new information be acquired without interfering with prior information? The problem is especially challenging because reviewing prior information is typically difficult (because recalling earlier information interferes with new input) or impossible (where prior input has been forgotten). Thus, to a good approximation, the learner can only update its model of the language in a way that responds to current linguistic input, without being able to review whether any updates are inconsistent with prior input. Specifically, if the learner has a global model of the entire language (e.g., a traditional grammar), the learner runs the risk of overfitting that model to capture regularities in the momentary linguistic input at the expense of damaging the match with past linguistic input.
Avoiding this problem, we suggest, requires that learning be highly local, consisting of learning about specific relationships between particular linguistic representations. New items can be acquired, with implications for later processing of similar items; but learning current items does not thereby create changes to the entire model of the language, thus potentially interfering with what was learned from past input. One way to learn in a local fashion is to store individual examples (this requires, in our framework, that those examples have been abstractly recoded by successive Chunk-and-Pass operations, of course), and then to generalize, piecemeal, from these examples. This standpoint is consistent with the idea that the “priority of the specific,” as observed in other areas of cognition (e.g., Jacoby et al. 1989), also applies to language acquisition. For example, children seem to be highly sensitive to multiword chunks (Arnon & Clark 2011; Bannard & Matthews 2008; see Arnon & Christiansen, submitted, for a reviewFootnote 13). More generally, learning based on past traces of processing will typically be sensitive to details of that processing, as is observed across phonetics, phonology, lexical access, syntax, and semantics (e.g., Bybee 2006; Goldinger 1998; Pierrehumbert 2002; Tomasello 1992).
That learning is local provides a powerful constraint, incompatible with typical computational models of how the child might infer the grammar of the language – because these models typically do not operate incrementally but range across the input corpus, evaluating alternative grammatical hypotheses (so-called batch learning). But, given the Now-or-Never bottleneck, the “unprocessed” corpus, so readily available to the linguistic theorist or to a computer model, is lost to the human learner almost as soon as it is encountered. Where such information has been memorized (as in the case of SF's encoding of streams of digits), recall and processing are slow and effortful. Moreover, because information is encoded in terms of the current encoding, it becomes difficult to neutrally review that input to create a better encoding, and to cross-check past data to test wide-ranging grammatical hypotheses.Footnote 14 So, as we have already noted, the Now-or-Never bottleneck seems incompatible with the view of a child as a mini-linguist.
By contrast, the principle of local learning is respected by other approaches. For example, item-based (Tomasello 2003), connectionist (e.g., Chang et al. 1999; Elman 1990; MacDonald & Christiansen 2002),Footnote 15 exemplar-based (e.g., Bod 2009), and other usage-based (e.g., Arnon & Snider 2010; Bybee 2006) accounts of language acquisition tie learning and processing together – and assume that language is acquired piecemeal, in the absence of an underlying Bauplan. Such accounts, based on local learning, provide a possible explanation of the frequency effects that are found at all levels of language processing and acquisition (e.g., Bybee 2007; Bybee & Hopper 2001; Ellis 2002; Tomasello 2003), analogous to exemplar-based theories of how performance speeds up with practice (Logan 1988).
The local nature of learning need not, though, imply that language has no integrated structure. Just as in perception and action, local chunks can be defined at many different levels of abstraction, including highly abstract patterns, for example, governing subject, verb, and object; and generalizations from past processing to present processing will operate across all of these levels. Therefore, in generating or understanding a new sentence, the language user will be influenced by the interaction of multiple constraints from innumerable traces of past processing, across different linguistic levels. This view of language processing as involving the parallel interaction of multiple local constraints is embodied in a variety of influential approaches to language (e.g., Jackendoff 2007; Seidenberg 1997).
4.3. Implications of Strategy 3: Learning to predict
If language processing involves prediction – to make the encoding of new linguistic material sufficiently rapid – then a critical aspect of language acquisition is learning to make such predictions successfully (Altmann & Mirkovic 2009). Perhaps the most natural approach to predictive learning is to compare predictions with subsequent reality, thus creating an “error signal,” and then to modify the predictive model to systematically reduce this error. Throughout many areas of cognition, such error-driven learning has been widely explored in a range of computational frameworks (e.g., from connectionist networks, to reinforcement learning, to support vector machines) and has considerable behavioral (e.g., Kamin 1969) and neurobiological support (e.g., Schultz et al. 1997).
Predictive learning can, in principle, take a number of forms: For example, predictive errors can be accumulated over many samples, and then modifications made to the predictive model to minimize the overall error over those samples (i.e., batch learning). But this is ruled out by the Now-or-Never bottleneck: Linguistic input, and the predictions concerning it, is present only fleetingly. But error-driven learning can also be “online” – each prediction error leads to an immediate, though typically small, modification of the predictive model; and the accumulation of these small modifications gradually reduces prediction errors on future input.
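The online alternative can be sketched in a few lines (a minimal illustration of error-driven updating; the alternating input stream and the learning rate are our own invented parameters, not a model from the literature):

```python
# Minimal sketch of online error-driven prediction: after each symbol,
# nudge the predictive model by a small step toward what actually
# occurred, so that no corpus of past input need be stored.

from collections import defaultdict

LEARNING_RATE = 0.1
# P(next symbol | previous symbol), lazily initialized to 0.5
probs = defaultdict(lambda: defaultdict(lambda: 0.5))

def observe(prev, nxt, vocab):
    """One online update: compare prediction with reality, reduce error."""
    error = 1.0 - probs[prev][nxt]          # error for the symbol that occurred
    for sym in vocab:
        target = 1.0 if sym == nxt else 0.0
        probs[prev][sym] += LEARNING_RATE * (target - probs[prev][sym])
    return error

stream = ["ba", "di"] * 50                   # simple alternating input
vocab = {"ba", "di"}
errors = [observe(a, b, vocab) for a, b in zip(stream, stream[1:])]

# Early prediction errors are large, later ones small: the accumulated
# small modifications have gradually tuned the predictive model.
assert errors[0] > errors[-1]
assert errors[-1] < 0.05
```

Each input is processed once, fleetingly, exactly as the Now-or-Never bottleneck demands; only the small parameter changes persist.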
A number of computational models adhere to these principles: Learning involves creating a predictive model of the language, using online error-driven learning. Such models, limited though they are, may provide a starting point for creating an increasingly realistic account of language acquisition and processing. For example, a connectionist model that embodies these principles is the simple recurrent network (Altmann 2002; Christiansen & Chater 1999; Elman 1990), which learns to map from the current input onto the next element in a continuous sequence of linguistic (or other) input, and which learns, online, by adjusting its parameters (the “weights” of the network) to reduce the observed prediction error, using the back-propagation learning algorithm. Using a very different framework, in the spirit of construction grammar (e.g., Croft 2001; Goldberg 2006), McCauley and Christiansen (2011) recently developed a psychologically based, online chunking model of incremental language acquisition and processing, incorporating prediction to generalize to new chunk combinations. Exemplar-based analogical models of language acquisition and processing may also be constructed, which build and predict language structure online, incrementally creating a database of possible structures and dynamically using online computation of similarity to recruit these structures to process and predict new linguistic input.
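As a rough sketch of the Elman-style architecture just described, the following is our own minimal NumPy implementation (hidden-layer size, learning rate, and the toy "ba di gu" input are all invented for illustration; this is not any of the cited models):

```python
# Bare-bones simple recurrent network: predict the next symbol in a
# stream, learning online from each prediction error (one-step backprop).

import numpy as np

rng = np.random.default_rng(0)
symbols = ["ba", "di", "gu"]
V, H = len(symbols), 8                   # vocabulary and hidden sizes
Wxh = rng.normal(0, 0.5, (H, V))         # input -> hidden
Whh = rng.normal(0, 0.5, (H, H))         # context (previous hidden) -> hidden
Why = rng.normal(0, 0.5, (V, H))         # hidden -> output
lr = 0.1

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)                          # "context units" (copied-back state)
losses = []
stream = [0, 1, 2] * 200                 # "ba di gu ba di gu ..."
for cur, nxt in zip(stream, stream[1:]):
    x, t = one_hot(cur), one_hot(nxt)
    h_new = np.tanh(Wxh @ x + Whh @ h)   # hidden state uses prior context
    y = softmax(Why @ h_new)
    losses.append(-np.log(y[nxt]))       # prediction error (cross-entropy)
    # online error-driven weight updates, one small step per input
    dy = y - t
    dh = (Why.T @ dy) * (1 - h_new**2)
    Why -= lr * np.outer(dy, h_new)
    Wxh -= lr * np.outer(dh, x)
    Whh -= lr * np.outer(dh, h)
    h = h_new

# The deterministic sequence becomes predictable: loss falls over time.
assert np.mean(losses[:20]) > np.mean(losses[-20:])
```

Nothing is stored except the weights: each input is predicted, the error is used immediately, and the input itself is then discarded, in line with the Now-or-Never bottleneck.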
Importantly, prediction allows for top-down information to influence current processing across different levels of linguistic representation, from phonology to discourse, and at different temporal windows (as indicated by Fig. 2). We see the ability to use such top-down information as emerging gradually across development, building on bottom-up information. That is, children gradually learn to apply top-down knowledge to facilitate processing via prediction, as higher-level information becomes more entrenched and allows for anticipatory generalizations to be made.
In this section, we have argued that the child should not be viewed as a mini-linguist, attempting to infer the abstract structure of grammar, but as learning to process: that is, learning to alleviate the severe constraints imposed by the Now-or-Never bottleneck. Next, we discuss how chunk-based language acquisition and processing have shaped linguistic change and, ultimately, the evolution of language.
5. Language change is item-based
Like language, human culture constantly changes. We continually tinker with all aspects of culture, from social conventions and rituals to technology and everyday artifacts (see contributions in Richerson & Christiansen 2013). Perhaps language, too, is a result of cultural evolution – a product of piecemeal tinkering – with the long-term evolution of language resulting from the compounding of myriad local short-term processes of language change. This hypothesis figures prominently in many recent theories of language evolution (e.g., Arbib 2005; Beckner et al. 2009; Christiansen & Chater 2008; Hurford 1999; Smith & Kirby 2008; Tomasello 2003; for a review of these theories, see Dediu et al. 2013). Language is construed as a complex evolving system in its own right; linguistic forms that are easier to use and learn, or are more communicatively efficient, will tend to proliferate, whereas those that are not will be prone to die out. Over time, processes of cultural evolution involving repeated cycles of learning and use are hypothesized to have shaped the languages we observe today.
If aspects of language survive only when they are easy to produce and understand, then moment-by-moment processing will shape not only the structure of language (see also Hawkins 2004; O'Grady 2005), but also the learning problem that the child faces. Thus, from the perspective of language as an evolving system, language processing at the timescale of seconds has implications for the longer timescales of language acquisition and evolution. Figure 3 illustrates how the effects of the Now-or-Never bottleneck flow from the timescale of processing to those of acquisition and evolution.
Figure 3. Illustration of how Chunk-and-Pass processing at the utterance level (with the Cᵢ referring to chunks) constrains the acquisition of language by the individual, which, in turn, influences how language evolves through learning and use by groups of individuals on a historical timescale.
Chunk-and-Pass processing carves the input (or output) into chunks at different levels of linguistic representation at the timescale of the utterance (seconds). These chunks constitute the comprehension and production events from which children and adults learn and update their ability to process their native language over the timescale of the individual (tens of years). Each learner, in turn, is part of a population of language users that shape the cultural evolution of language across a historical timescale (hundreds or thousands of years): Language will be shaped by the linguistic patterns learners find easiest to acquire and process. And the learners will, of course, be strongly constrained by the basic cognitive limitation that is the Now-or-Never bottleneck – and, hence, through cultural evolution, linguistic patterns that can be processed through that bottleneck will be strongly selected. Moreover, if acquiring a language is learning to process, and processing involves incremental Chunk-and-Pass operations, then language change will operate through changes driven by Chunk-and-Pass processing, both within and between individuals. But this, in turn, implies that processes of language change should be item-based, driven by processing/acquisition mechanisms defined over Chunk-and-Pass representations (rather than, for example, being defined over abstract linguistic parameters, with diverse structural consequences across the entire language).
We noted earlier that a consequence of Chunk-and-Pass processing for production is a tendency toward reduction, especially of more frequently used forms, and this constitutes one of several pressures on language change (see also MacDonald 2013). Because reduction minimizes articulatory processing effort for the speaker but may increase processing effort for the hearer and learner, this pressure can in extreme cases lead to a communicative collapse. This is exemplified by a lab-based analogue of the game of “telephone,” in which participants were exposed to a miniature artificial language consisting of simple form-meaning mappings (Kirby et al. 2008). The initial language contained random mappings between syllable strings and pictures of moving geometric figures in different colors. After exposure, participants were asked to produce linguistic forms corresponding to specific pictures. Importantly, the participants saw only a subset of the language but nonetheless had to generalize to the full language. The productions of the initial learner were then used as the input language for the next learner, and so on for a total of 10 “generations.” In the absence of other communicative pressures (such as the avoidance of ambiguity; Grice 1967), the language collapsed into just a few different forms that allowed for systematic, albeit semantically underspecified, generalization to unseen items. In natural language, however, the pressure toward reduction is normally kept in balance by the need to maintain effective communication.
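The collapse under transmission can be caricatured in a few lines (our own toy parameters, loosely inspired by the iterated-learning design described above; this is not Kirby et al.'s actual procedure): each learner sees only half of the meaning-form pairs and fills the gaps with its most entrenched form.

```python
# Toy iterated-learning ("telephone") simulation: with no pressure
# against ambiguity, distinct forms collapse under repeated transmission.

import random
from collections import Counter

random.seed(1)
MEANINGS = list(range(27))   # e.g., 3 colors x 3 shapes x 3 motions

def transmit(language, generations=10):
    for _ in range(generations):
        # each learner observes only half of the meaning-form pairs
        seen = dict(random.sample(list(language.items()),
                                  k=len(MEANINGS) // 2))
        favourite = Counter(seen.values()).most_common(1)[0][0]
        # unseen meanings get the learner's most entrenched form
        language = {m: seen.get(m, favourite) for m in MEANINGS}
    return language

initial = {m: f"w{m}" for m in MEANINGS}   # fully distinct random forms
final = transmit(initial)

assert len(set(initial.values())) == 27
assert len(set(final.values())) <= 13      # systematic but underspecified
```

The final language still covers all meanings, but with far fewer forms: generalization has become systematic at the cost of semantic underspecification, mirroring the collapse described above.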
Expanding on the notion of reduction and erosion, we suggest that constraints from Chunk-and-Pass processing can provide a cognitive foundation for grammaticalization (Hopper & Traugott 1993). Specifically, chunks at different levels of linguistic structure – discourse, syntax, morphology, and phonology – are potentially subject to reduction. Consequently, we can distinguish between different types of grammaticalization, from discourse syntacticization and semantic bleaching to morphological reduction and phonetic erosion. Repeated chunking of loose discourse structures may result in their reduction into more rigid syntactic constructions, reflecting Givón's (1979) hypothesis that today's syntax is yesterday's discourse.Footnote 16 For example, the resultative construction He pulled the window open might derive from syntacticization of a loose discourse sequence such as He pulled the window and it opened (Tomasello 2003). As a further by-product of chunking, some words that occur frequently in certain kinds of construction may gradually become “bleached” of meaning and ultimately signal only general syntactic properties. Consider, as an example, the construction be going to, which was originally used exclusively to indicate movement in space (e.g., I'm going to Ithaca) but which is now also used as an intention or future marker when followed by a verb (as in I'm going to eat at seven; Bybee et al. 1994). Additionally, a chunked linguistic expression may further be subject to morphological reduction, resulting in further loss of morphological (or syntactic) elements. For instance, the demonstrative that in English (e.g., that window) lost the grammatical category of number (that, singular, vs. those, plural) when it came to be used as a complementizer, as in the window/windows that is/are dirty (Hopper & Traugott 1993).
Finally, as noted earlier, frequently chunked elements are likely to become phonologically reduced, leading to the emergence of new shortened grammaticalized forms, such as the phonetic erosion of going to into gonna (Bybee et al. 1994). Thus, the Now-or-Never bottleneck provides a constant pressure toward reduction and erosion across the different levels of linguistic representation, providing a possible explanation for why grammaticalization tends to be a largely unidirectional process (e.g., Bybee et al. 1994; Haspelmath 1999; Heine & Kuteva 2002; Hopper & Traugott 1993).
Beyond grammaticalization, we suggest that language change, more broadly, will be local at the level of individual chunks. At the level of sound change, our perspective is consistent with lexical diffusion theory (e.g., Wang 1969; 1977; Wang & Cheng 1977), suggesting that sound change originates with a small set of words and then gradually spreads to other words with a similar phonological make-up. The extent and speed of such sound change is affected by a number of factors, including frequency, word class, and phonological environment (e.g., Bybee 2002; Phillips 2006). Similarly, morpho-syntactic change is also predicted to be local in nature: what we might call “constructional diffusion.” Accordingly, we interpret the cross-linguistic evidence indicating the effects of processing constraints on grammatical structure (e.g., Hawkins 2004; Kempson et al. 2001; O'Grady 2005; see Jaeger & Tily 2011, for a review) as a process of gradual change over individual constructions, instead of wholesale changes to grammatical rules. Note, though, that because chunks are not independent of one another but form a system within a given level of linguistic representation, a change to a highly productive chunk may have cascading effects on other chunks at that level (and similarly for representations at other levels of abstraction). For example, if a frequently used construction changes, then constructional diffusion could in principle lead to rapid, and far-reaching, change throughout the language.
On this account, another ubiquitous process of language change, regularization, whereby representations at a particular linguistic level become more patterned, should also be a piecemeal process. This is exemplified by another of Kirby et al.'s (2008) game-of-telephone experiments, showing that when ambiguity is avoided, a highly structured linguistic system emerges across generations of learners, with morpheme-like substrings indicating different semantic properties (color, shape, and movement). Another similar, lab-based cultural evolution experiment showed that this process of regularization does not result in the elimination of variability but, rather, in increased predictability through lexicalized patterns (Smith & Wonnacott 2010). Whereas the initial language contained unpredictable pairings of nouns with plural markers, each noun became chunked with a specific marker in the final languages.
These examples illustrate how Chunk-and-Pass processing over time may lead to so-called obligatorification, whereby a pattern that was initially flexible or optional becomes obligatory (e.g., Heine & Kuteva 2007). This process is one of the ways in which new chunks may be created. So, although chunks at each linguistic level can lose information through grammaticalization and cannot regain it, a countervailing process exists by which complex chunks are constructed by “gluing together” existing chunks.Footnote 17 That is, in Bybee's (2002) phrase, “items that are used together fuse together.” For example, auxiliary verbs (e.g., to have, to go) can become fused with main verbs to create new morphological patterns, as in many Romance languages, in which the future tense is signaled by an auxiliary tacked on as a suffix to the infinitive. In Spanish, the future tense endings -é, -ás, -á, -emos, -éis, -án derive from the present tense of the auxiliary haber, namely, he, has, ha, hemos, habéis, han; and in French, the corresponding endings -ai, -as, -a, -ons, -ez, -ont derive from the present tense of the auxiliary avoir, namely, ai, as, a, avons, avez, ont (Fleischman 1982). Such complex new chunks are then subject to erosion (e.g., as is implicit in the example above, the Spanish for you (informal, plural) will eat is comeréis, rather than *comerhabéis; the first syllable of the auxiliary has been stripped away).
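The fusion-plus-erosion pattern in the Spanish example can be stated schematically (the paradigm is as cited above from Fleischman 1982; the code itself is merely our illustration of chunk fusion):

```python
# Spanish future tense as chunk fusion: infinitive + eroded endings
# derived from present-tense haber (he, has, ha, hemos, habéis, han).

FUSED_ENDINGS = ["é", "ás", "á", "emos", "éis", "án"]

def spanish_future(infinitive):
    """Fuse an infinitive chunk with the eroded auxiliary endings."""
    return [infinitive + e for e in FUSED_ENDINGS]

forms = spanish_future("comer")
assert forms == ["comeré", "comerás", "comerá",
                 "comeremos", "comeréis", "comerán"]
# 'you (informal, plural) will eat' is comeréis, not *comerhabéis:
assert "comerhabéis" not in forms
```

The regularity of the mapping is exactly what one expects if the historical source was a productive two-chunk sequence (infinitive + auxiliary) that fused and then eroded.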
Importantly, the present viewpoint is neutral regarding the extent to which children are the primary source of innovation (e.g., Bickerton 1984) or regularization (e.g., Hudson et al. 2005) of linguistic material, although constraints from child language acquisition likely play some role (e.g., in the emergence of regular subject-object-verb word order in the Al-Sayyid Bedouin Sign Language; Sandler et al. 2005). In general, we would expect that multiple forces influence language change in parallel (for reviews, see Dediu et al. 2013; Hruschka et al. 2009), including sociolinguistic factors (e.g., Trudgill 2011), language contact (e.g., Mufwene 2008), and the use of language as an ethnic marker (e.g., Boyd & Richerson 1987).
Because language change, like processing and acquisition, is driven by multiple competing factors, which are amplified by cultural evolution, linguistic diversity will be the norm. Accordingly, we would expect few, if any, “true” language universals to exist in the sense of constraints that can be explained only in purely linguistic terms (Christiansen & Chater 2008). Nonetheless, domain-general processing constraints are likely to significantly constrain the set of possible languages (see, e.g., Cann & Kempson 2008). This picture is consistent with linguistic arguments suggesting that there may be no strict language universals (Bybee 2009; Evans & Levinson 2009). For example, computational phylogenetic analyses indicate that word order correlations are lineage-specific (Dunn et al. 2011), shaped by particular histories of cultural evolution rather than following universal patterns as would be expected if they were the result of innate linguistic constraints (e.g., Baker 2001) or language-specific performance limitations (e.g., Hawkins 2009). Thus, the process of piecemeal tinkering that drives item-based language change is subject to constraints deriving not only from Chunk-and-Pass processing and multiple-cue integration but also from the specific trajectory of cultural evolution that a language follows. More generally, in this perspective, there is no sharp distinction between language evolution and language change: Language evolution is just the result of language change over a long timescale (see also Heine & Kuteva 2007), obviating the need for separate theories of language evolution and change (e.g., Berwick et al. 2013; Hauser et al. 2002; Pinker 1994).Footnote 18
6. Structure as processing
The Now-or-Never bottleneck implies, we have argued, that language comprehension involves incrementally chunking linguistic material and immediately passing the result for further processing, and production involves a similar cascade of Just-in-Time processing operations in the opposite direction. And language will be shaped through cultural evolution to be easy to learn and process by generations of speakers/hearers, who are forced to chunk and pass the oncoming stream of linguistic material. What are the resulting implications for the structure of language and its mental representation? In this section, we first show that certain key properties of language follow naturally from this framework; we then reconceptualize certain important notions in the language sciences.
6.1. Explaining key properties of language
6.1.1. The bounded nature of linguistic units
In nonlinguistic sequential tasks, memory constraints are so severe that chunks of more than a few items are rare. People typically encode phone numbers, number plates, postal codes, and Social Security numbers as sequences of between two and four digits or letters; recall deteriorates rapidly for unchunked item-sequences longer than about four elements (Cowan 2000), and recall of longer material typically breaks into short chunk-like phrases. Similar chunking processes are thought to govern nonlinguistic sequences of actions (e.g., Graybiel 1998). As we have argued previously in this article, the same constraints apply throughout language processing, from sound to discourse.
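The digit-grouping strategy just described amounts to a simple recoding scheme (our own trivial illustration; the two-to-four-item bound comes from the text above):

```python
# Recoding an unstructured digit string into chunks of a few items,
# the range within which recall stays reliable.

def chunk(seq, size=3):
    """Split a sequence into successive chunks of at most `size` elements."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# A ten-digit string becomes four short, recallable chunks:
assert chunk("8005551234") == ["800", "555", "123", "4"]
assert all(len(c) <= 4 for c in chunk("8005551234", size=4))
```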
Across different levels of linguistic representation, units also tend to have only a few component elements. Even though the nature of sound-based units in speech is theoretically contentious, all proposals capture the sharply bounded nature of such units. For example, a traditional perspective on English phonology would postulate phonemes, short sequences of which are grouped into syllables, with multisyllabic words being organized by intonational or perhaps morphological groupings. Indeed, the tendency toward few-element units is so strong that even a long nonsense word with many syllables, such as supercalifragilisticexpialidocious, is chunked successively, for example, as tentatively indicated:
[Image in the original: a tentative successive chunking of supercalifragilisticexpialidocious into nested few-element chunks.]
Similarly, agglutinating languages, such as Turkish, chunk complex multimorphemic words using local grouping mechanisms that include formulaic morpheme expressions (Durrant 2013). Likewise, at higher levels of linguistic representation, verbs normally have only two or three arguments at most. Across linguistic theories of different persuasions, syntactic phrases typically consist of only a few constituents. Thus, the Now-or-Never bottleneck provides a strong bias toward bounded linguistic units across various levels of linguistic representation.
6.1.2. The local nature of linguistic dependencies
Just as we have argued that Chunk-and-Pass processing leads to simple linguistic units with only a small number of components, so it produces a powerful tendency toward local dependencies. Dependencies between linguistic elements will primarily be adjacent or separated by only a few other elements. For example, at the phonological level, processes are highly local, as reflected by data on coarticulation, assimilation, and phonotactic constraints (e.g., Clark et al. 2007). Similarly, we expect word formation processes to be highly local in nature, which is in line with a variety of different linguistic perspectives on the prominence of adjacency in morphological composition (e.g., Carstairs-McCarthy 1992; Hay 2000; Siegel 1978). Strikingly, adjacency even appears to be a key characteristic of multimorphemic formulaic units in an agglutinating language such as Turkish (Durrant 2013).
At the syntactic level, there is also a strong bias toward local dependencies. For example, when processing the sentence “The key to the cabinets was …,” comprehenders experience local interference from the plural cabinets, although the verb was needs to agree with the singular key (Nicol et al. 1997; Pearlmutter et al. 1999). Indeed, individuals who are good at picking up adjacent dependencies among sequence elements in a serial-reaction time task also experience greater local interference effects in sentence processing (Misyak & Christiansen 2010). Moreover, similar local interference effects have been observed in production when people are asked to continue the above sentence after cabinets (Bock & Miller 1991).
More generally, analyses of Romanian and Czech (Ferrer-i-Cancho 2004) as well as Catalan, Basque, and Spanish (Ferrer-i-Cancho & Liu 2014) point to a pressure toward minimization of the distance between syntactically related words. This tendency toward local dependencies seems to be particularly strongly expressed in strict-word-order languages such as English, but somewhat less so in more flexible languages such as German (Gildea & Temperley 2010). However, the use of case marking in German may provide a cue to overcome this by indicating who does what to whom, as suggested by simulations of the learnability of different word orders with or without case marking (e.g., Lupyan & Christiansen 2002; Van Everbroeck 1999). This highlights the importance not only of distributional information (e.g., regarding word order) but also of other types of cues (e.g., involving phonological, semantic, or pragmatic information), as discussed previously.
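This pressure toward short dependencies can be made concrete with a toy metric. The sketch below sums the linear distance between each word and its syntactic head; the sentence continuation (“rusty”) and the head assignments are our own illustrative choices, not drawn from the corpus studies cited above.

```python
# Toy illustration of dependency-length minimization.
# heads[i] is the index of word i's syntactic head (the root points to itself).

def total_dependency_length(heads):
    """Sum of linear distances between each word and its head (root excluded)."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h != i)

# "The key to the cabinets was rusty" with hypothetical head indices:
# The->key, key->was, to->key, the->cabinets, cabinets->to, was = root, rusty->was
sentence = ["The", "key", "to", "the", "cabinets", "was", "rusty"]
heads = [1, 5, 1, 4, 2, 5, 5]

print(total_dependency_length(heads))  # → 10
```

On a metric like this, word orders that keep heads and their dependents close together score lower; the corpus analyses cited above find that attested orders tend toward such low scores.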
We want to stress, however, that we are not denying the existence of long-distance syntactic dependencies; rather, we are suggesting that our ability to process such dependencies will be bounded by the number of chunks that can be kept in memory at a given level of linguistic representation. In many cases, chunking may help to minimize the distance over which a dependency has to remain in memory. For example, the use of personal pronouns can facilitate the processing of otherwise difficult object relative clauses because they are more easily chunked (e.g., People [you know] are more fun; Reali & Christiansen 2007a). Similarly, the processing of long-distance dependencies is eased when they are separated by highly frequent word combinations that can be readily chunked (e.g., Reali & Christiansen 2007b). More generally, the Chunk-and-Pass account is in line with other approaches that treat processing limitations and complexity as primary constraints on long-distance dependencies, thus potentially providing explanations for linguistic phenomena such as subjacency (e.g., Berwick & Weinberg 1984; Kluender & Kutas 1993), island constraints (e.g., Hofmeister & Sag 2010), referential binding (e.g., Culicover 2013), and scope effects (e.g., O'Grady 2013). Crucially, though, as we argued earlier, the impact of these processing constraints may be lessened to some degree by the integration of multiple sources of information (e.g., from pragmatics, discourse context, and world knowledge) to support the ongoing interpretation of the input (e.g., Altmann & Steedman 1988; Heider et al. 2014; Tanenhaus et al. 1995).
6.1.3. Multiple levels of linguistic representation
Speech allows us to transmit a digital, symbolic code over a serial, analog channel using time variation in sound pressure (or using analog movements, in sign language). How might we expect this digital-analog-digital conversion to be tuned, to optimize the amount of information transmitted?
The problem of encoding and decoding digital signals over an analog serial channel is well studied in communication theory (Shannon 1948) – and, interestingly, the solutions typically adopted look very different from those employed by natural language. Crucially, to maximize the rate of information transfer, it is generally best to spread the message to be conveyed across the analog signal in a very nonlocal way. That is, rather than matching up portions of the information to be conveyed (e.g., in an engineering context, these might be the contents of a database) to particular portions of the analog signal, the best strategy is to encrypt the entire digital message using the entire analog signal, so that the message is coded as a block (e.g., MacKay 2003). But why is the engineering solution to information transmission so very different from that used by natural language, in which distinct portions of the analog signal correspond to meaningful units in the digital code (e.g., phonemes, words)? The Now-or-Never bottleneck provides a natural explanation.
A block-based code requires decoding a stored memory trace for the entire analog signal (for language, typically acoustic) – that is, the whole block. This is straightforward for artificial computing systems, where memory interference is no obstacle. But this type of block coding is, of course, precisely what the Now-or-Never bottleneck rules out. The human perceptual system must turn the acoustic input into a (lossy) compressed form right away, or else the acoustic signal is lost forever. Similarly, the speech production system cannot decide to send a single, lengthy analog signal and then successfully reel off the lengthy corresponding sequence of articulatory instructions, because this would vastly exceed our memory capacity for sequences of actions. Instead, the acoustic signal must be generated and decoded incrementally, so that the symbolic information to be transmitted maps, fairly locally, onto portions of the acoustic signal. Thus, to an approximation, whereas individual phonemes acoustically exhibit enormous contextual variation, diphones (pairs of phonemes) provide a fairly stable acoustic signal, as evidenced by their use in tolerably good speech synthesis and recognition (e.g., Jurafsky et al. 2000). Overall, then, each successive segment of the analog acoustic input must correspond to a part of the symbolic code being transmitted. This is not because of considerations of informational efficiency but because of the brain's processing limitations in encoding and decoding: specifically, the Now-or-Never bottleneck.
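The contrast between block coding and the incremental, local coding forced by the Now-or-Never bottleneck can be sketched as follows. This is a toy illustration only: the “analog signal” is a string, and the segment-to-symbol mapping is invented, loosely analogous to diphones. The point is that a block decoder must buffer the entire signal before emitting anything, whereas a local decoder emits a symbol as soon as each short segment is complete and can discard the raw signal immediately.

```python
# Toy contrast: block decoding vs. incremental, local decoding.
SEGMENT_TO_SYMBOL = {"aa": "p", "ab": "t", "ba": "k", "bb": "s"}  # hypothetical code

def block_decode(signal):
    # Must hold the *whole* signal in memory before decoding can begin.
    buffer = "".join(signal)
    return [SEGMENT_TO_SYMBOL[buffer[i:i + 2]] for i in range(0, len(buffer), 2)]

def incremental_decode(signal):
    # Decodes each segment as soon as it is complete; raw input is then discarded.
    pending, out = "", []
    for sample in signal:
        pending += sample
        if len(pending) == 2:
            out.append(SEGMENT_TO_SYMBOL[pending])  # chunk-and-pass upward
            pending = ""                            # the analog trace is gone
    return out

signal = "aabbba"
assert block_decode(signal) == incremental_decode(signal) == ["p", "s", "k"]
```

Both decoders recover the same symbols, but only the incremental one does so with a memory footprint of a single segment, which is the regime the Now-or-Never bottleneck imposes.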
The need rapidly to encode and decode implies that spoken language will consist of a sequence of short sound-based units (the precise nature of these units may be controversial, and may even differ between languages, but candidates include diphones, phonemes, moras, and syllables). Similarly, in speech production, the Now-or-Never bottleneck rules out planning and executing a long articulatory sequence (as in a block code used in communication technology); rather, speech must be planned incrementally, in a Just-in-Time fashion, requiring that the speech signal correspond to sequences of discrete sound-based units.
6.1.4. Duality of patterning
Our perspective has yet further intriguing implications. Because the Now-or-Never bottleneck requires that symbolic information be rapidly read off the analog signal, the number of such symbols will be severely limited – and, in particular, may be much smaller than the vocabulary of a typical speaker (many thousands or tens of thousands of items). This implies that the short symbolic sequences into which the acoustic signal is initially recoded cannot, in general, be bearers of meaning; instead, the primary bearers of meaning (lexical items and morphemes) will be composed out of these smaller units.
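The combinatorial arithmetic behind this point is straightforward: even a modest inventory of meaningless units yields vastly more possible composed forms than any speaker's vocabulary requires. A toy calculation, where the inventory size and length bound are purely illustrative assumptions:

```python
# How many word forms can a small inventory of meaningless sound units generate?
inventory = 30    # roughly phoneme-sized inventory (illustrative figure)
max_length = 5    # word forms of one to five units (illustrative bound)

forms = sum(inventory ** n for n in range(1, max_length + 1))
print(forms)  # → 25137930, dwarfing a vocabulary of tens of thousands of words
```

The asymmetry is the crux: a handful of recodable units suffices, combinatorially, to index an entire lexicon, so meaning need not (and, given the bottleneck, cannot) be carried by the units themselves.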
Thus, the Now-or-Never bottleneck provides a potential explanation for a puzzling but ubiquitous feature of human languages, including signed languages. This is duality of patterning: the existence of one or more levels of symbolically encoded sound structure (whether described in terms of phonemes, moras, or syllables) from which the level of words and morphemes (over which meanings are defined) is composed. Such patterning arises, in the present analysis, as a consequence of rapid online multilevel chunking in both speech production and perception. In the absence of duality of patterning, the acoustic signal corresponding, say, to a single noun could not be recoded incrementally as it is received (Warren & Marslen-Wilson 1987) but would have to be processed as a whole, thus dramatically overloading sensory memory.
It is, perhaps, also of interest to note that the other domain in which people process enormously complex acoustic input – music – also typically consists of multiple layers of structure (notes, phrases, and so on; see, e.g., Lerdahl & Jackendoff 1983; Orwin et al. 2013). We may conjecture that Chunk-and-Pass processing operates for music as well as language, thus helping to explain why our ability to process musical input spectacularly exceeds our ability to process arbitrary sequential acoustic material (Clément et al. 1999).
6.1.5. The quasi-regularity of linguistic structure
We have argued that the Now-or-Never bottleneck implies that language processing involves applying highly local Chunk-and-Pass operations across a range of representational levels, and that language acquisition involves learning to perform such operations. But, as in the acquisition of other skills, learning from such specific instances does not operate by rote but leads to generalization and hence modification from one instance to another (Goldberg 2006). Indeed, such processes of local generalization are ubiquitous in language change, as we have noted above. From this standpoint, we should expect the rule-like patterns in language to emerge from generalizations across specific instances (see, e.g., Hahn & Nakisa 2000, for an example of this approach to inflectional morphology in German); once entrenched, such rule-like patterns can, of course, be applied quite broadly to newly encountered cases. Thus, patterns of regularity in language will emerge locally and bottom-up, from generalizations across individual instances, through processes of language use, acquisition, and change.
We should therefore expect language to be quasi-regular across phonology, morphology, syntax, and semantics – an amalgam of overlapping and partially incompatible patterns, reflecting the variety of linguistic forms from which successive language learners generalize. For example, English past tense morphology famously has the regular –ed ending alongside a range of subregularities (sing → sang, ring → rang, spring → sprang, but fling → flung, wring → wrung, and even bring → brought); some verbs have the same present and past tense forms (e.g., cost → cost, hit → hit, split → split), whereas others differ wildly (e.g., go → went; am → was; see, e.g., Bybee & Slobin 1982; Pinker & Prince 1988; Rumelhart & McClelland 1986). This quasi-regular structure (Seidenberg & McClelland 1989) does indeed seem to be widespread throughout many aspects of language (e.g., Culicover 1999; Goldberg 2006; Pierrehumbert 2002).
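The layering of exceptions, subregularities, and a productive default can be caricatured computationally. The sketch below is a deliberately simplistic expository device in the spirit of the past-tense debate cited above, not an implementation of any of the cited models (and it ignores orthographic details such as consonant doubling):

```python
# A caricature of quasi-regular past-tense formation:
# entrenched exceptions first, then a subpattern, then the regular rule.

IRREGULARS = {"go": "went", "am": "was", "bring": "brought",
              "hit": "hit", "cost": "cost", "split": "split"}
SUBREGULAR = {"sing": "sang", "ring": "rang", "spring": "sprang",
              "fling": "flung", "wring": "wrung"}

def past_tense(verb):
    if verb in IRREGULARS:      # item-specific, fully entrenched forms
        return IRREGULARS[verb]
    if verb in SUBREGULAR:      # overlapping, partially incompatible subpattern
        return SUBREGULAR[verb]
    return verb + "ed"          # productive default, applied to novel verbs

print(past_tense("go"), past_tense("fling"), past_tense("walk"))  # went flung walked
```

The ordering of the lookups mirrors the claim in the text: regularity is the residue left after item-based knowledge has been consulted, rather than a rule applied uniformly from the top down.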
From a traditional, generative perspective on language, such quasi-regularities are puzzling: Natural language is assimilated, somewhat by force, to the structure of a formal language with a precisely defined syntax and semantics – and the ubiquitous departures from such regularities are mysterious. From the present standpoint, by contrast, the quasi-regular structure of language arises in rather the same way that a partially regular pattern of tracks is laid down across a forest, through the overlaid traces of countless agents each finding the path of local least resistance; each language processing episode tends to facilitate future, similar, processing episodes, just as an animal's choice of path facilitates the use of that path by the animals that follow.
6.2. What is linguistic structure?
Chunk-and-Pass processing can be viewed as having an interesting connection with traditional linguistic notions. In both production and comprehension, the language system creates a sequence of chunking operations, which link different linguistic units together across multiple levels of structure. That is, the syntactic structure of a given utterance is reflected in its processing history. This conception is reminiscent of previous proposals in which syntax is viewed as a control structure for guiding semantic interpretation (e.g., Ford et al. 1982; Kempson et al. 2001; Morrill 2010). For example, in describing his incremental parser-interpreter, Pulman (1985) noted, “Syntactic information is used to build up the interpretation and to guide the parse, but does not result in the construction of an independent level of representation” (p. 132). Steedman (2000) adopted a closely related perspective when introducing his combinatory categorial grammar, which aims to map surface structure directly onto logic-based semantic interpretations, given rich lexical representations of words that include information about phonological structure, syntactic category, and meaning: “… syntactic structure is merely the characterization of the process of constructing a logical form, rather than a representational level of structure that actually needs to be built …” (p. xi). Thus, in these accounts, the syntactic structure of a sentence is not explicitly represented by the language system but plays the role of a processing “trace” of the operations used to create or interpret the sentence (see also O'Grady 2005).
To take an analogy from constructing objects rather than sentences, the process by which the components of an IKEA-style flat-pack cabinet are combined provides a “history” (combine a board, handle, and screws to construct the doors; combine frame and shelf to construct the body; combine doors, body, and legs to create the finished cabinet). The history by which the cabinet was constructed may thus reveal the intricate structure of the finished item, but this structure need not be explicitly represented during the construction process. Thus, we can “read off” the syntactic structure of a sentence from its processing history, revealing the syntactic relations between various constituents (likely with a “flat” structure; Frank et al. 2012). Syntactic representations are computed neither in comprehension nor in production; instead, there is just a history of processing operations. That is, we view linguistic structure as processing history. Importantly, this means that syntax is not privileged but is only one part of the system – and it is not independent of the other parts (see also Fig. 2).
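The idea that structure is a trace of operations rather than a stored object can be illustrated with a toy chunker. This is purely expository (the unit names and the example chunking order are our own invention): the “parse” is nothing more than the ordered log of chunking steps, from which a tree-like description could be read off but is never stored as such.

```python
# Toy illustration: linguistic structure as processing history.
# Each step chunks a few units into a named unit; the "syntax" of the result
# is just the ordered log of operations, not a separately stored tree.

def chunk(log, name, parts):
    log.append((name, tuple(parts)))   # record the operation, then pass the chunk up
    return name

log = []
np1 = chunk(log, "NP", ["the", "key"])
pp = chunk(log, "PP", ["to", chunk(log, "NP", ["the", "cabinets"])])
subj = chunk(log, "NP", [np1, pp])
chunk(log, "S", [subj, "was", "rusty"])

print(log)   # the processing history *is* the structural description
```

Nothing in `log` is consulted to build later chunks; each operation uses only the chunk labels passed up so far, mirroring the claim that structure is recoverable from, but not represented during, processing.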
In this view, a rather minimal notion of grammar specifies how the chunks from which a sentence is built can be composed. There may be several, or indeed many, orders in which such combinations can occur, just as operations for furniture assembly may be carried out somewhat flexibly (but not completely without constraints – it might turn out that the body must be screwed together before a shelf can be attached). In the context of producing and understanding language, the process of construction is likely to be much more constrained: Each new “component” is presented in turn, and it must be used immediately or it will be lost. Moreover, viewing Chunk-and-Pass processing as an aspect of skill acquisition, we might expect that the precise nature of chunks may change with expertise: Highly overlearned material might, for example, gradually come to be treated as a single chunk (see Arnon & Christiansen, submitted, for a review).
Crucially, as with other skills, the cognitive system will tend to be a cognitive miser (Fiske & Taylor 1984), generally following a principle of least effort (Zipf 1949). As processing proceeds, there is an intricate interplay of top-down and bottom-up processing to alight on the message as rapidly as possible. The language system need only construct enough chunk structure so that, when combined with prior discourse and background knowledge, the intended message can be inferred incrementally. This observation relates to some interesting contemporary linguistic proposals. For example, from a generative perspective, Culicover (2013) highlighted the importance of incremental processing, arguing that the interpretation of a pronoun depends on which discourse elements are available when it is encountered. This implies that the linear order of words in a sentence (rather than hierarchical structure) plays an important role in many apparently grammatical phenomena, including weak crossover effects in referential binding. From an emergentist perspective, O'Grady (2015) similarly emphasized the importance of real-time processing constraints for explaining differences in the interpretation of reflexive pronouns (himself, themselves) and plain pronouns (him, them). The former are resolved locally, and thus almost instantly, whereas the antecedents of the latter are searched for within a broader domain (causing problems in acquisition because of a bias toward local information).
More generally, our view of linguistic structure as processing history offers a way to integrate the formal linguistic contributions of construction grammar (e.g., Croft 2001; Goldberg 2006) with the psychological insights from usage-based approaches to language acquisition and processing (e.g., Bybee & McClelland 2005; Tomasello 2003). Specifically, we propose to view constructions as computational procedures (Footnote 19) – specifying how to process and produce a particular chunk – where we take a broad view of constructions as involving chunks at different levels of linguistic representation, from morphemes to multiword sequences. A procedure may integrate several different aspects of language processing or production, including chunking acoustic input into sound-based units (phonemes, syllables), mapping a chunk onto meaning (or vice versa), incorporating pragmatic or discourse information, and associating a chunk with specific arguments (see also O'Grady 2005; 2013). As with other skills (e.g., Heathcote et al. 2000; Newell & Rosenbloom 1981), there will be practice effects, whereby the repeated use of a given chunk results in faster processing and reduced demands on cognitive resources and, with sufficient use, leads to a high degree of automaticity (e.g., Logan 1988; see Bybee & McClelland 2005, for a linguistic perspective).
In terms of our previous forest track analogy, the more a particular chunk is comprehended or produced, the more entrenched it becomes, resulting in easier access and faster processing; tracks become more established with use. With sufficiently frequent use, adjacent tracks may blend together, creating somewhat wider paths. For example, the frequent processing of simple transitive sentences, processed individually as multiword chunks, such as “I want milk” and “I want candy,” might first lead to a wider track involving the item-based template “I want X.” Repeated use of this template along with others (e.g., “I like X,” “I see X”) might eventually give rise to a more abstract transitive generalization along the lines of N V N (a highway, in terms of our track analogy). Similar accounts of the emergence of basic word order patterns have been proposed within both emergentist (e.g., O'Grady 2005; 2013; Tomasello 2003) and generative perspectives (e.g., Townsend & Bever 2001). Importantly, however, just as with generalizations in perception and motor skills, these grammatical abstractions are not explicitly represented but result from the merging of item-based procedures for chunking. Consequently, there is no representation of grammatical structure separate from processing. Learning to process is learning the grammar.
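The first step of this progression, from stored multiword chunks to an item-based template, can be sketched as a simple frequency-driven generalization. This is a toy procedure of our own devising, not the authors' model: chunks that share all but their final position are merged into a slot-based template once the variable position has been filled by enough distinct items (the threshold of three is arbitrary).

```python
from collections import defaultdict

# Toy item-based generalization: merge stored multiword chunks that share
# their initial words into a slot-based template like "I want X".

chunks = ["I want milk", "I want candy", "I want juice",
          "I like milk", "I see you"]

frames = defaultdict(set)
for chunk in chunks:
    *frame, last = chunk.split()       # split off the final, variable position
    frames[tuple(frame)].add(last)

# A frame becomes a template once its slot has been filled by >= 3 distinct items.
templates = {" ".join(f) + " X" for f, fillers in frames.items() if len(fillers) >= 3}
print(templates)  # → {'I want X'}
```

Further merging across templates (“I want X,” “I like X,” …) toward an abstract N V N pattern would follow the same logic at a higher level, which is the sense in which the “highway” emerges from overlapping tracks rather than being laid down in advance.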
7. Conclusion
The perspective developed in this article sees language as composed of a myriad of specific processing episodes, where particular messages are conveyed and understood. Like other action sequences, linguistic acts have their structure in virtue of the cognitive mechanisms that produce and perceive them. We have argued that the structure of language is, in particular, strongly affected by a severe limitation on human memory: the Now-or-Never bottleneck. Sequential information, at many levels of analysis, must rapidly be recoded to avoid being interfered with or overwritten by the deluge of subsequent material. To cope with the Now-or-Never bottleneck, the language system chunks new material as rapidly as possible at a range of increasingly abstract levels of representation. As a consequence, Chunk-and-Pass processing induces a multilevel structure over linguistic input. The history of the process of chunk building can be viewed as analogous to a shallow surface structure in linguistics, and the repertoire of possible chunking mechanisms and the principles by which they can be combined can be viewed as defining a grammar. Indeed, we have suggested that chunking procedures may be one interpretation of the constructions that are at the core of linguistic theories of construction grammar. More broadly, the Now-or-Never bottleneck promises to provide a framework within which to reintegrate the language sciences, from the psychology of language comprehension and production, to language acquisition, language change, and evolution, to the study of language structure.
ACKNOWLEDGMENTS
We would like to thank Inbal Arnon, Amui Chong, Brandon Conley, Thomas Farmer, Adele Goldberg, Ruth Kempson, Stewart McCauley, Michael Schramm, and Julia Ying, as well as seven anonymous reviewers, for comments on previous versions of this article. This work was partially supported by BSF grant number 2011107, awarded to MHC (and Inbal Arnon), and by ERC grant 295917-RATIONALITY, the ESRC Network for Integrated Behavioural Science, the Leverhulme Trust, and Research Councils UK Grant EP/K039830/1, awarded to NC.