What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English*

ADRIANA WEISLEDER; SANDRA R. WAXMAN

doi:10.1017/S0305000909990067

Published online by Cambridge University Press: 24 August 2009

ADRIANA WEISLEDER, Stanford University
SANDRA R. WAXMAN*, Northwestern University
* Address for correspondence: Sandra R. Waxman, Department of Psychology, Northwestern University, 2029 Sheridan Road, Evanston, IL 60208-2710; e-mail: s-waxman@northwestern.edu

Abstract

Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent–child corpora from the CHILDES database (three English; three Spanish), and analyzed the input when children were aged 2 ; 6 or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions and grammatical classes.

Type: Brief Research Report
Information: Journal of Child Language, Volume 37, Issue 5, November 2010, pp. 1089–1108

DOI: https://doi.org/10.1017/S0305000909990067
Copyright: © Cambridge University Press 2009

To acquire a human language, children must not only learn individual words, but must also discover the distinct kinds of words that are represented in their language (or grammatical categories, e.g. nouns, verbs, determiners) and how they map to meaning. Even within their first years, infants make significant headway in this arena. By age 0 ; 9, they distinguish between two very broad kinds of words: content words (e.g. nouns, verbs, adjectives) vs. function words (e.g. determiners, prepositions) (Shi, Werker & Morgan, 1999). By age 1 ; 1, they begin to make finer distinctions among the content words, teasing apart the grammatical form noun (e.g. ‘cat’) and mapping this form specifically to objects and object categories (e.g. cats). Over the next several months, they make finer distinctions still, teasing apart the forms adjective and verb, and mapping each to its associated range of meanings (properties and events, respectively) (Waxman & Lidz, 2006, for a review; Waxman & Booth, 2003).

How do infants accomplish this task? Because languages vary in the way that each grammatical form is marked in the ‘surface’ of the input (e.g. whether or not a modifier typically precedes or follows the noun, whether or not the nouns, adjectives and determiners are marked for grammatical gender), infants' accomplishments must rest upon an ability to glean grammatical form-class information from the surface structure of the utterances that they hear. But this claim – that the surface structure of a language provides reliable cues to grammatical category assignment – has been a controversial one. In the 1950s, structural linguists observed that words from the same grammatical category tend to appear in the same distributional environments (Harris, 1951). For example, words that appear with the morpheme -ed in English also tend to appear with the morpheme -s. Based on this observation, several researchers proposed that distributional regularities may serve as cues to grammatical form-class assignment (Maratsos, 1988; Maratsos & Chalkley, 1980; Kiss, 1973). However, this proposal was not endorsed universally. At issue was whether there is in fact sufficient structure in the language input to support the acquisition of grammatical categories and, if so, whether infants are in fact sensitive to the kinds of regularities present (Chomsky, 1965; Pinker, 1987).

For decades this challenge appeared insurmountable. In recent years, however, researchers using computational tools to examine the language input have documented that the distributional evidence available in naturalistic, child-directed speech may in fact offer strong cues to grammatical form-class assignment (Mintz, Newport & Bever, 2002; Redington, Chater & Finch, 1998; Cartwright & Brent, 1997). In a compelling recent demonstration, Mintz (2003) introduced the notion of frequent frames, a distributional pattern defined as two words that bracket one intervening word (e.g. I __ you). In an analysis of six corpora involving adult–child conversations, Mintz identified the most frequent frames in adults' speech. (See (1–6) below for some representative examples.) He then considered whether within each such frame, the intervening words tended to belong to the same grammatical category. For some frames (e.g. (1–3)), the intervening words were predominantly verbs; for others (e.g. (4–6)), the intervening words were predominantly nouns. This suggests that frequent frames constitute a distributional unit that could, in principle, support the acquisition of distinct grammatical classes. Moreover, recent work reveals that young children and adults are sensitive to distributional patterns like these (Gómez, 2002; Gómez & Maye, 2005; Mintz, 2002). Taken together, these recent demonstrations have breathed new life into the hypothesis that distributional information in the input could support the discovery of distinct grammatical forms.

(1) I __ it
(2) you __ to
(3) I __ you
(4) the __ and
(5) a __ on
(6) the __ is

However, evidence to this effect has thus far focused primarily on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked on the surface of utterances, the scarcity of cross-linguistic evidence represents a gap that is important to fill. Consider, for example, free word order languages like Turkish, where the sequences in which words can appear vary freely. In such languages, the relevant distributional units may not be sequences of co-occurrences among words (as in the frequent frames for English); instead they may be sequences of co-occurrences among sublexical morphemes (Mintz, 2008). In principle, then, the distributional units that emerge as central in one language may differ from those that emerge in another.

Interestingly, we need not look as far as Turkish to appreciate the importance of cross-linguistic evidence. Even in languages more closely related to English, certain linguistic features may have consequences on the clarity and force of the distributional evidence for grammatical form classes. For example, Romance languages like French, Spanish and Italian exhibit considerable homophony among key function words. Although homophony is also present in English, it is primarily evident among content words, as opposed to function words. In cases involving content words, frequent frames may help to disambiguate among candidate meanings (e.g. The brush is mine. vs. I brush your hair.). In Spanish, however, homophony is not only evident among content words, but is also common among determiners (la, los, las, una, unos, unas) and other function words. For example, los can function either as an article (e.g. los niños juegan ‘the children play’) or as an object pronoun (e.g. Ana los quiere aquí ‘Ana wants them here’); que can mean either ‘what’ or ‘that’; and como can mean either ‘how’, ‘as’ or ‘eat’. In cases involving homophony among function words, the consequences for a frequent frames approach may be considerable: because function words are so frequent in the input, they often serve as framing elements in frequent frames. This particular type of homophony could therefore have adverse consequences on the clarity of distributional cues (Pinker, 1987; Cartwright & Brent, 1997). Chemla, Mintz, Bernal & Christophe (in press) recently reported that even in the face of this kind of homophony, the distributional evidence in a frequent frames analysis of French remained robust. However, because the Chemla et al. analysis considered the input to only one French-acquiring child, additional cross-linguistic evidence is clearly warranted.

In addition to homophony, there are other linguistic phenomena that could have consequences on the clarity and force of a distributional analysis. Consider, for example, a linguistic phenomenon known as noun dropping (Torrego, 1987; Snyder, Senghas & Inman, 2001). Noun dropping refers to a process by which nouns are omitted from the surface of a sentence when their meaning is recoverable from context (e.g. Quiero una azul ‘I want a blue (one)’, El pequeño está dormido ‘The little (one) is sleeping’). Notice that noun dropping, which is more ubiquitous in some languages than others, is relevant to distributional analyses because, as a result of this syntactic process, nouns and adjectives often appear within the same frequent frames (e.g. following a determiner and preceding another element) (Waxman & Guasti, in press; Waxman, Senghas & Benveniste, 1997).

In view of these observations, our goal is to advance the cross-linguistic developmental evidence for distributional approaches in three key directions. First, we examine the distributional evidence available to young children acquiring Spanish and compare it to the evidence available to children acquiring English. Second, we consider the relative clarity of frequent frames for identifying the grammatical categories noun, verb and adjective. Third, we consider for the first time a new distributional environment. In addition to the frames described by Mintz (2003), in which two lexical items serve as framing elements (e.g. you __ it), we also consider phrase-final sequences in which the final utterance boundary serves as a framing element (e.g. the __.).

Our decision to consider these end-frames was motivated by evidence that for young learners in particular, ends of utterances have a privileged status (Slobin, 1973). Infants and young children are sensitive to the prosodic cues that signal phrase boundaries (Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright Cassidy, Druss & Kennedy, 1987). Thus, these prosodic cues might serve as framing elements. Moreover, in infant- and child-directed speech, key words are often placed in utterance-final position and tend to receive exaggerated pitch peaks and increased durations (Fernald & Mazzie, 1991). Finally, children more successfully identify words presented in utterance-final, as compared to utterance-medial, position (Fernald & McRoberts, 1993; Shady & Gerken, 1999). Put succinctly, because infants and young children are attentive to phrase boundaries (and especially to those occurring in utterance-final position), if utterance boundaries constitute framing elements, and if the ends of phrases contain information that is relevant to grammatical form (Gleitman & Wanner, 1982; Morgan & Newport, 1981), then end-frames may constitute a potent source of information. Thus, by including both mid-frames (following Mintz, 2003) and end-frames, we have an opportunity to consider the accuracy of a frequent frames approach when the frames occur in different locations within the utterance (mid- and end-frames), and when they are bounded by different kinds of framing elements (words and utterance boundaries).

METHOD

Input corpora

We selected six parent–child corpora from the CHILDES database (MacWhinney, 2000): three in Spanish (Irene (Ojea, 2000), Koki (Montes, 1987) and María (López-Ornat, Fernández, Gallo & Mariscal, 1994)), and three in English (Eve (Brown, 1973), Naomi (Sachs, 1983) and Nina (Suppes, 1974)). We analyzed the utterances of the adult speakers in all sessions in which the child speaker was 2 ; 6 or younger. The English corpora were among those previously examined by Mintz (2003). This ensured that our execution of the frequent frames analysis mirrored that reported by Mintz, and provided a point of comparison for Spanish.

Distributional analysis procedure

Our analytic procedure follows that of Mintz (2003).

Gathering the frames

Following Mintz (2003), we defined a frame as two linguistic elements with one word intervening. We considered two different types of frames. In the case of mid-frames (A__B), the two framing elements were words (denoted by A and B), and the intervening word (denoted by __ ) varied. This is the frame analyzed by Mintz (2003). In the case of end-frames (A__.), the first framing element was a word, the second an utterance-final boundary, and the intervening word varied. We segmented every adult utterance into three-element frames. For example, the utterance ‘Look at the doggie over there’ yielded five frames: four mid-frames (‘look __ the’, ‘at __ doggie’, ‘the __ over’, ‘doggie __ there’) and one end-frame (‘over __.’). We did not include any frames that crossed an utterance boundary.
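The segmentation step can be illustrated with a minimal sketch. It assumes whitespace tokenization and lowercasing, neither of which is specified in the procedure, and the names `extract_frames` and `BOUNDARY` are hypothetical labels introduced here for illustration only.

```python
from typing import List, Tuple

# Hypothetical marker for the utterance-final boundary (an assumption of this sketch).
BOUNDARY = "."

Frame = Tuple[str, str]  # (first framing element, second framing element)


def extract_frames(utterance: str) -> List[Tuple[Frame, str, str]]:
    """Return (frame, intervening_word, frame_type) triples for one utterance.

    Frames never cross an utterance boundary because each utterance
    is processed on its own.
    """
    words = utterance.lower().split()
    triples = []
    # Mid-frames: every window of three consecutive words, A __ B.
    for a, x, b in zip(words, words[1:], words[2:]):
        triples.append(((a, b), x, "mid"))
    # End-frame: the last two words plus the utterance-final boundary, A __ .
    if len(words) >= 2:
        triples.append(((words[-2], BOUNDARY), words[-1], "end"))
    return triples


frames = extract_frames("Look at the doggie over there")
for (a, b), x, kind in frames:
    print(f"{kind}-frame: {a} __ {b}   intervening word: {x}")
```

Applied to the example utterance, the sketch yields the same four mid-frames and one end-frame listed above.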

Selecting the frames

Next, we tabulated the frequency of each frame and selected for analysis the forty-five most frequent mid-frames and the forty-five most frequent end-frames.¹ These constitute our two groups of frequent frames.
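A sketch of the tabulation and selection step follows. The function name `select_frequent_frames` and the toy data are hypothetical; the handling of ties at the cut-off is not described in the procedure, so the sketch simply relies on the ordering returned by `Counter.most_common`.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def select_frequent_frames(frame_occurrences: Iterable[Hashable], n: int = 45) -> List[Hashable]:
    """Return the n most frequent frames, given one entry per frame occurrence."""
    counts = Counter(frame_occurrences)
    # most_common sorts by descending count; ties keep first-seen order (an assumption
    # of this sketch, not a documented tie-breaking rule of the analysis).
    return [frame for frame, _ in counts.most_common(n)]


# Toy input: each tuple is (first framing element, second framing element).
mid_frame_occurrences = [("the", "is")] * 3 + [("you", "the")] * 2 + [("a", "one")]
print(select_frequent_frames(mid_frame_occurrences, n=2))
```

In the analysis described here the selection would be run twice, once over mid-frame occurrences and once over end-frame occurrences, each with n = 45.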

Identifying the intervening words

We then listed all of the intervening words (both types (e.g. dog, cat, run) and tokens (e.g. every instance of dog, cat and run)) that appeared within each frequent frame; each list constituted a frame-based category.
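The distinction between types and tokens within a frame-based category can be made concrete with a small sketch. The names `build_frame_categories` and the toy occurrences are hypothetical; each occurrence is assumed to be a (frame, intervening word) pair.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Frame = Tuple[str, str]


def build_frame_categories(
    occurrences: Iterable[Tuple[Frame, str]],
    frequent_frames: Iterable[Frame],
) -> Dict[Frame, Counter]:
    """Map each frequent frame to a Counter of its intervening words."""
    keep = set(frequent_frames)
    categories: Dict[Frame, Counter] = defaultdict(Counter)
    for frame, word in occurrences:
        if frame in keep:
            categories[frame][word] += 1
    return dict(categories)


occurrences = [
    (("the", "is"), "dog"),
    (("the", "is"), "dog"),
    (("the", "is"), "cat"),
    (("i", "it"), "want"),  # not a frequent frame in this toy example
]
categories = build_frame_categories(occurrences, [("the", "is")])
for frame, words in categories.items():
    n_types = len(words)             # distinct intervening words
    n_tokens = sum(words.values())   # every instance of those words
    print(frame, dict(words), f"{n_tokens} tokens, {n_types} types")
```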

Identifying the grammatical category assignment of intervening words

We assigned each intervening word to a grammatical category (noun, verb, adjective, preposition, adverb, determiner, wh-word, conjunction or interjection). This step of the analysis was carried out by a native speaker of each language; a native Spanish speaker categorized all the words in the Spanish corpora and a native English speaker categorized those in the English corpora. To ensure that the same criteria were applied across the two languages, grammatical category assignments in each language were then checked by a Spanish–English bilingual. For any word that could be assigned to more than one grammatical category (e.g. in English, party is both a noun and verb; in Spanish, vino is both a noun and a verb), the corpus was consulted to identify the correct assignment. In every case, the corpus provided adequate evidence to assign the word's grammatical category.

Quantitative evaluation procedure

Our quantitative evaluation focuses on the accuracy, or consistency, of the frame-based categories. To compute ‘Accuracy’, we compared every pair of words that occurred within any frame (see Mintz (2003) and Redington et al. (1998) for full details). We identified each pair as either a ‘Hit’ (the two words were members of the same grammatical category) or a ‘False Alarm’ (the two words were members of different grammatical categories). We then calculated Accuracy by computing the proportion of Hits (Accuracy=Hits / (Hits+False Alarms)).
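The Accuracy formula can be sketched as follows. The sketch assumes that pairs are formed among tokens within the same frame-based category and then pooled across frames, and that each token is represented only by its grammatical-category label; the function name `accuracy` and the toy labels are hypothetical. Pairs are counted combinatorially rather than enumerated.

```python
from collections import Counter
from math import comb
from typing import List, Sequence


def accuracy(frame_categories: Sequence[Sequence[str]]) -> float:
    """Accuracy = Hits / (Hits + False Alarms), pooled over all frames.

    frame_categories holds, for each frame, one grammatical-category
    label per intervening-word token.
    """
    hits = 0
    false_alarms = 0
    for labels in frame_categories:
        all_pairs = comb(len(labels), 2)
        # Pairs whose two members share a grammatical category are Hits.
        same = sum(comb(count, 2) for count in Counter(labels).values())
        hits += same
        false_alarms += all_pairs - same
    if hits + false_alarms == 0:
        raise ValueError("no word pairs to evaluate")
    return hits / (hits + false_alarms)


# Toy example: one frame with three nouns and a verb, one frame with two verbs.
toy: List[List[str]] = [["noun", "noun", "noun", "verb"], ["verb", "verb"]]
print(accuracy(toy))  # 4 Hits and 3 False Alarms, i.e. 4/7
```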

In the analyses reported here, we focus on Accuracy for token frequencies. However, it is important to point out that several different analyses yielded the very same pattern of results. First, independent analysis on word types and tokens revealed the same pattern. Second, in addition to Accuracy, we also analyzed the ‘Completeness’ of the frame-based categories. This measure considers the proportion of word pairs from the same grammatical category that were grouped together in the same frame (see Mintz (2003) for details). In the face of these strongly converging measures, we decided to report the results on Accuracy for token frequencies because this most closely reflects our goal of determining whether frequent frames are equally consistent across different languages and distributional environments. Finally, note that an analysis that strives for high Accuracy (even at the expense of Completeness) would result in categories with high internal consistency. Once this is achieved, some of the categories could be merged (based, for example, on their degree of overlap) in order to achieve higher Completeness (Mintz, 2003).
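Completeness can be sketched in the same pairwise terms. The sketch assumes that a ‘Miss’ is a pair of same-category tokens that fell into different frames, so that Completeness = Hits / (Hits + Misses); this formula is one reading of the description above rather than a definition taken from the text, and the function name `completeness` is hypothetical.

```python
from collections import Counter
from math import comb
from typing import Sequence


def completeness(frame_categories: Sequence[Sequence[str]]) -> float:
    """Proportion of same-category token pairs that share a frame.

    Assumes Completeness = Hits / (Hits + Misses), where Hits + Misses is
    the number of same-category pairs over all categorized tokens.
    """
    # Hits: same-category pairs that sit inside one frame.
    hits = sum(
        comb(count, 2)
        for labels in frame_categories
        for count in Counter(labels).values()
    )
    # All same-category pairs, whether or not they share a frame.
    totals = Counter(label for labels in frame_categories for label in labels)
    same_category_pairs = sum(comb(count, 2) for count in totals.values())
    if same_category_pairs == 0:
        raise ValueError("no same-category pairs to evaluate")
    return hits / same_category_pairs


toy = [["noun", "noun", "noun", "verb"], ["verb", "verb"]]
print(completeness(toy))  # 4 Hits out of 6 same-category pairs, i.e. 2/3
```

The toy example shows the trade-off noted above: splitting the verbs across two frames leaves Accuracy within the second frame perfect but lowers Completeness, which merging overlapping categories would raise.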

Baseline categorization: comparison to chance

To obtain a baseline categorization measure, we used a Monte Carlo method. Specifically, we computed accuracy scores for random word categories that were generated for each corpus. To accomplish this task, the intervening words within all of the frame-based categories were randomly distributed to form ‘dummy’ categories, which matched the frame-based categories in size. This random shuffling of the intervening words was repeated 1000 times, with accuracy computed on each shuffle. Accuracy scores obtained from these 1000 shuffles provided a baseline against which to compare the results from the frame-based categories and compute significance levels. For example, if only 2 out of 1000 shuffles matched or exceeded the score obtained by the frame-based categorization method, the frame-based Accuracy score was said to be significantly above chance with a probability of 0·002.
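The shuffling baseline can be sketched as below. The sketch assumes that tokens are pooled across all frame-based categories of one corpus and redistributed into dummy categories of the original sizes, and that the significance level is the proportion of shuffles whose accuracy matches or exceeds the observed score. The names `shuffled_baseline` and `p_value`, the fixed seed, and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import comb
from typing import List, Sequence


def accuracy(frame_categories: Sequence[Sequence[str]]) -> float:
    """Hits / (Hits + False Alarms), pooled over frames (pairwise, token-based)."""
    hits = sum(comb(c, 2) for labels in frame_categories for c in Counter(labels).values())
    pairs = sum(comb(len(labels), 2) for labels in frame_categories)
    return hits / pairs


def shuffled_baseline(
    frame_categories: Sequence[Sequence[str]], n_shuffles: int = 1000, seed: int = 0
) -> List[float]:
    """Accuracy of random 'dummy' categories that match the real ones in size."""
    rng = random.Random(seed)
    sizes = [len(labels) for labels in frame_categories]
    pool = [label for labels in frame_categories for label in labels]
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(pool)  # redistribute all tokens at random
        dummy, start = [], 0
        for size in sizes:
            dummy.append(pool[start:start + size])
            start += size
        scores.append(accuracy(dummy))
    return scores


def p_value(observed: float, baseline_scores: Sequence[float]) -> float:
    """Proportion of shuffles that match or exceed the observed score."""
    return sum(score >= observed for score in baseline_scores) / len(baseline_scores)


toy = [["noun"] * 5, ["verb"] * 5]
observed = accuracy(toy)
baseline = shuffled_baseline(toy, n_shuffles=1000, seed=0)
print(observed, sum(baseline) / len(baseline), p_value(observed, baseline))
```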

RESULTS AND DISCUSSION

Table 1 offers an overview of the corpora for each child. Although there is within- and between-language variation in the size of the corpora, preliminary analyses ensured that there were no significant differences between the English and Spanish corpora in the total number of utterances analyzed, or in the number of types or tokens categorized by the frequent frames analysis (all p's >0·3). Moreover, for every child, the words categorized by this analysis comprised a large fraction of the corpus (see Table 1, last column). This is important because it reveals that the frequent frames analysis ‘captured’ the words that predominate in the child's input.

TABLE 1. Descriptive statistics for each corpus

^a # of tokens categorized in frequent frames/total # of tokens in the corpus.

^b Percentage of tokens in the corpus whose types were categorized by frequent frames.

(e.g. If ‘jacket’ is categorized only once by frequent frames, but it appears 35 times in the corpus, the ‘% represented’ would be 35 / total # of tokens in the corpus.)
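The two coverage measures defined in notes a and b can be sketched as follows. The function name `coverage` and the toy corpus are hypothetical; the sketch assumes that the corpus is available as a flat list of word tokens and that the categorized tokens are those that occurred as intervening words in frequent frames.

```python
from collections import Counter
from typing import Sequence, Tuple


def coverage(corpus_tokens: Sequence[str], categorized_tokens: Sequence[str]) -> Tuple[float, float]:
    """Return (proportion categorized, proportion represented).

    proportion categorized: tokens categorized in frequent frames / all tokens.
    proportion represented: all corpus tokens whose type was categorized at
    least once / all tokens.
    """
    total = len(corpus_tokens)
    corpus_counts = Counter(corpus_tokens)
    categorized_types = set(categorized_tokens)
    proportion_categorized = len(categorized_tokens) / total
    proportion_represented = sum(corpus_counts[t] for t in categorized_types) / total
    return proportion_categorized, proportion_represented


# Toy corpus of 100 tokens: 'jacket' occurs 35 times but is categorized only once.
corpus = ["jacket"] * 35 + ["the"] * 65
print(coverage(corpus, ["jacket"]))  # (0.01, 0.35)
```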

Comparing frequent frames (actual) to baseline (chance)

For each corpus, we compared the Accuracy score for the frame-based categories to the corresponding baseline measure (derived using the Monte Carlo method described above) (see Table 2). The Accuracy scores for all corpora were significantly higher than baseline (all p's <0·001). This documents that, in the input for each child, there is indeed consistent distributional information within frequent frames that converges on grammatical categories. This replicates Mintz (2003) and extends the work to a new frame-type (end-frames) and to a new language (Spanish).

TABLE 2. Accuracy scores (token frequency) for each frame-type in each corpus

Comparing across languages, frame-types and grammatical class

We next asked whether there were systematic differences in accuracy between the two languages, between the two types of distributional environments and between the grammatical classes. To address this question, we aggregated the frame-based categories for each language and frame-type² and calculated the accuracy score within each. We then categorized each frequent frame by the modal grammatical class of its intervening words (e.g. if the frame contained more nouns than words from any other grammatical class, it was classified as a Noun-frame). We submitted these accuracy scores to a three-way ANOVA: language (English vs. Spanish) by frame-type (mid-frame vs. end-frame) and by grammatical class (noun-frame vs. verb-frame vs. adjective-frame) (see Table 3). A main effect of language indicated that accuracy was higher in English (M=0·72) than Spanish (M=0·60) (F(1,209)=5·022, p=0·026, η_p²=0·023). A main effect for frame-type indicated that accuracy was higher for mid-frames (M=0·77) than for end-frames (M=0·55) (F(1,209)=13·374, p<0·001, η_p²=0·06). A main effect for grammatical class revealed higher accuracy for verb-frames (M=0·76) than for noun-frames (M=0·71), and higher accuracy for noun-frames than for adjective-frames (M=0·50) (F(2,209)=5·154, p=0·007, η_p²=0·047). All differences among these means were statistically reliable (Tukey's HSD, all p's <0·05) (see Table 3). A subsequent analysis based on the data from each individual child's corpus ensured that these effects were not an artifact of the particular corpora we selected or of aggregating data from different corpora.³
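The classification of a frame by the modal grammatical class of its intervening words can be sketched as follows. The function name `modal_class` and the toy labels are hypothetical, and the text does not say how ties were resolved or whether the count was over types or tokens; the sketch counts tokens and returns the class encountered first among those tied.

```python
from collections import Counter
from typing import Sequence


def modal_class(labels: Sequence[str]) -> str:
    """Return the most frequent grammatical class among a frame's tokens.

    Ties go to the class seen first (an assumption of this sketch).
    """
    if not labels:
        raise ValueError("frame has no intervening words")
    category, _ = Counter(labels).most_common(1)[0]
    return category


# Toy frame: mostly nouns, with one adjective and one verb.
frame_labels = ["noun", "noun", "adjective", "noun", "verb"]
print(f"{modal_class(frame_labels)}-frame")
```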

TABLE 3. Accuracy scores for each language, frame-type and grammatical class

* Significantly different from each other, p<0·05.

** Significantly different from each other, p<0·001.

These findings provide support for the hypothesis that the clarity of the distributional information available in frequent frames varies across languages, and within languages it varies across different distributional environments and grammatical form classes. In particular, the analyses reported here, coupled with a glance at Table 3, suggest that in both languages, frequent frames contain robust cues for identifying the grammatical forms noun and verb, but weaker cues for the form adjective. This outcome for adjectives, although consistent with our proposal, should be interpreted with some caution. Our analysis identified numerous noun- and verb-frames in each language, and the accuracy of these frames tended to be high. By contrast, we identified only a handful of adjective-frames (three and six, for Spanish and English, respectively). Because these also included a number of nouns and verbs, their accuracy was relatively low. Interestingly, and as predicted, although this pattern was evident in both languages, a glance at Table 3 suggests it was especially pronounced in Spanish.

GENERAL DISCUSSION

The current evidence provides support for the claim that the distributional information in frequent frames contains cues that could be useful to young children as they establish the main grammatical categories of their native language. This work extends previous evidence in three ways. First, it broadens the empirical base, examining the input available to children acquiring Spanish as their mother tongue, and comparing it to the input available to children acquiring English. We demonstrate that even in the face of linguistic features that may render the distributional evidence in Spanish less clear on the surface (including homophony and noun drop), frequent frames nonetheless offer robust cues to grammatical form class.

Second, this work casts a wider distributional net, considering not only frames that occur in utterance-medial positions and consist of only lexical items as framing elements (mid-frames), but also frames that occur in utterance final position and include utterance-final boundaries as framing elements (end-frames). We demonstrate for the first time that in both English and Spanish, end-frames carry distributional cues to grammatical form class. Perhaps not surprisingly, the distributional information for end-frames (which, by definition, are less constrained by their surrounding elements than are mid-frames) is less accurate than that for mid-frames. Yet end-frame information, however noisy, may be especially useful to young children (Slobin, 1973), as they are better able to recognize words that occur at the ends of utterances (Fernald & McRoberts, 1993; Shady & Gerken, 1999).

Third, to the best of our knowledge, this is the first investigation to consider the relative accuracy of the distributional evidence available in frequent frames for the discovery of each of these grammatical categories. We found that frequent frames contained robust cues for the grammatical forms noun and verb, but weaker cues for adjectives, and that this pattern, although evident in both languages, appeared to be more pronounced in Spanish than in English.

On the whole, there were more commonalities than differences between the two languages, suggesting that this linguistic unit (the frame) stands as a powerful source of information for children acquiring either English or Spanish. At the same time, differences between the languages did emerge, which are likely due to linguistic features of the input. For example, in Spanish (as in other Romance languages) there is considerable homophony among function words (e.g. the word la can function either as a determiner (la niña juega ‘the girl plays’) or as a clitic (ella la puso aquí ‘she put it(f) here’)). These function words, which are highly frequent, often emerge as framing elements, and because they are homophonous, they affect the clarity of the distributional cues. Consider, for example, the mid-frame la __ a, which occurred frequently in Spanish. When la was used as a determiner, this frame picked out primarily nouns (e.g. Lleva la muñeca a tu cuarto ‘Take the doll to your room’), but when la was used as a clitic, the very same frame picked out primarily verbs (e.g. No la vuelvas a tocar ‘Don't it-clitic go to touch again’). Examples like this, considerably more common in Spanish than English, compromised frame accuracy.

Noun-drop also appears to have compromised accuracy in Spanish relative to English, especially in cases in which determiners emerged as framing elements. Consider, for example, the end-frame un __., in which the intervening words included nouns (e.g. Dame un dulce ‘Give me a candy’) and adjectives (e.g. Eres un presumido ‘You are a vain (one)’). The distributional overlap between nouns and adjectives as a consequence of noun-drop was also evident, though less frequent, in mid-frames.

These observations suggest that certain features, including homophony and noun-drop, may indeed compromise the clarity of the distributional evidence available in frequent frames in Spanish. However, this is not to say that as a whole, the distributional evidence available to Spanish-acquiring children is weaker, less informative or impoverished, relative to English. After all, children acquiring Spanish and English exhibit comparable developmental milestones. For example, between ages 1 ; 9 and 2 ; 0, infants learning either English or Spanish begin to distinguish adjectives from nouns, mapping the former primarily to object properties (e.g. color, texture) and the latter to object categories (e.g. dog, animal) (Waxman, Braun & Weisleder, in prep). Instead, we suggest that children acquiring different languages will rely on different kinds of distributional information to establish the grammatical categories of their language.

More specifically, the particular distributional cues that are most informative in one language may be different from those that are most informative in another. For example, in fixed word order languages like English, distributional evidence based on co-occurrences of words may be quite informative. But, in languages with rich inflectional morphology (e.g. Spanish, French, Italian), we suspect that additional cues to grammatical form class may be gleaned from distributions of morphemic, phonetic or prosodic properties. For example, our results suggest that differentiating between the roles of two homophonous words (particularly when they are function words) might be important for grammatical class assignment. Therefore one potential avenue for future research may be to explore whether there are differences in the phonetic or prosodic characteristics of homophonic pairs.

It will also be informative to focus on developmental matters. There is considerable evidence documenting that features of parental input vary not only across languages but also across development. In particular, the complexity of infant-directed speech changes over the first few years. How might these developmental changes affect the clarity of the distributional evidence? Perhaps the earliest input, like the input we have analyzed here, offers clearer distributional evidence to support the discovery of grammatical form classes. However, it is also possible that the early input, which is characterized by short utterances and exaggerated intonational contours (Fernald & Simon, 1984), can facilitate the identification of individual words in the continuous speech stream, but contains relatively little information to support the discovery of distinct grammatical forms.

It will also be important in future work to ascertain how frequent the frequent frames must be, and what proportion of the tokens available in the corpus they must capture, if they are to be informative. For example, in each of the corpora analyzed here, fewer than 10% of the tokens were categorized in the frequent frames analysis. Although this may seem, at first glance, to cast doubt on the utility of a frequent frames approach, a more careful examination reveals that these word types make up 44–76% of the tokens in the corpus (see Table 1, last column). Thus, the words that were categorized are among the most commonly used words in the child's input, suggesting that frequent frames may offer information about grammatical categories for many of children's early words.

In sum, we have shown that there are distributional regularities in the linguistic input to children acquiring either English or Spanish that could, in principle, support the acquisition of distinct grammatical categories. Of course, because infants are sensitive to distributional regularities in domains other than language (Fiser & Aslin, 2002), the evidence presented here has broader implications for development. The challenge facing infancy researchers is to discover how infants make use of these regularities across development and across languages.

APPENDIX

Representative examples of noun-, verb- and adjective-frames for each language (English and Spanish) and frame-type (mid- and end-frames)

ENGLISH MID-FRAMES

the __ is (139 tokens, 78 types)

baby(2), bag(1), barn(1), basket(1), bear(2), blanket(1), book(1), bookcase(1), box(2), boy(3), brush(1), bug(3), bus(1), car(2), chicken(1), clip(1), cord(1), couch(1), cow(2), cream(1), crib(1), crumb(1), dinosaur(1), dog(4), doggie(3), doll(1), donkey(1), door(2), duck(4), egg(1), elbow(1), elephant(3), family(1), fish(2), floor(1), flower(1), fox(2), girl(2), gob(1), grass(1), head(1), horse(8), house(2), ice(1), juice(1), kangaroo(1), kite(1), kitty(3), lady(2), lamb(2), lipstick(1), lock(1), man(5), mommy(2), monkey(1), moon(6), mouse(3), neck(1), nest(1), nurse(1), ocean(1), paper(1), pig(2), powder(1), rabbit(4), radio(1), rest(1), roof(1), rooster(1), sleeper(1), smoke(2), squirrel(2), sun(6), this(1), tray(1), truck(4), window(1), zoo(1)

you __ the (446 tokens, 103 types)

arranging(1), at(1), ate(1), bite(1), boo(1), break(1), bring(7), broke(2), catch(3), chase(1), cleaning(1), close(6), closed(3), closing(3), cook(1), count(1), cover(1), crack(5), cracked(1), cracking(2), cut(1), cutting(4), do(1), draw(2), drawing(1), drink(1), drop(1), dropped(6), eat(1), eating(3), fed(4), feed(8), find(11), finish(1), fit(4), fix(3), for(3), found(2), get(14), give(6), giving(1), got(6), hang(1), have(3), hear(5), held(1), hide(1), hitting(1), hold(6), hugging(1), hurt(1), keep(1), knock(1), know(1), leave(1), let(1), like(27), loving(1), make(14), making(6), mean(1), moving(1), on(4), open(4), pat(2), patting(2), paying(1), pick(1), popped(1), pull(1), pulling(3), push(3), pushing(3), put(73), putting(21), read(9), reading(1), remember(4), roll(4), rolling(1), run(1), say(1), see(14), shut(2), smell(1), spilled(1), take(17), taking(4), tearing(2), think(4), throw(5), throwing(1), to(2), took(4), tore(1), turn(6), unscrew(1), unsticking(1), unwind(1), unwrapping(1), use(1), want(30), washing(1), with(4)

a __ one (50 tokens, 17 types)

big(5), black(2), blue(4), brown(2), chocolate(1), fine(1), grape(2), green(7), little(7), new(4), nice(3), nother(1), purple(5), red(2), touch(1), white(1), yellow(2)

SPANISH MID-FRAMES

la __ de (247 tokens, 121 types)

actuación_n(1), amiga_n(3), barba_n(1), basura_n(1), boca_n(1), bolsa_n(4), botella_n(1), busque_n(1), cabeza_n(2), caca_n(2), caja_n(1), cajita_n(1), cama_n(3), cámara_n(1), camisa_n(1), camiseta_n(1), canción_n(20), cancioncilla_n(1), cantabas_n(1), cantidad_n(1), cara_n(15), carita_n(2), casa_n(9), casita_n(12), chaqueta_n(1), cinta_n(1), cola_n(1), colita_n(1), comida_n(3), cosita_n(1), cremallera_n(1), cuna_n(2), cunita_n(2), electricidad_n(1), encima_adv(1), era_n(3), espalda_n(3), esponja_n(2), esquina_n(1), falda_n(1), flor_n(2), foto_n(2), gorra_n(1), granja_n(1), habitación_n(4), hermana_n(2), historia_n(7), hoja_n(2), hojita_n(1), hora_n(2), jefa_n(1), llave_n(1), madrastra_n(3), madre_n(1), maestra_n(2), mamá_n(12), manina_n(1), manito_n(3), mano_n(11), manta_n(1), manzanilla_n(1), merienda_n(1), mesita_n(1), mitad_n(2), muevas_v(2), musiquita_n(3), naricina_n(1), naricita_n(1), nariz_n(1), novia_n(1), oreja_n(1), orejina_n(1), paciencia_n(1), papita_n(2), parte_n(2), patita_n(1), película_n(5), pelota_n(2), piel_n(1), pierna_n(2), pieza_n(1), plusvalía_n(1), posición_n(1), puerta_n(1), pulsera_n(1), resolución_n(1), revista_n(4), rota_adj(1), sabes_v(1), señora_n(4), sensación_n(1), silla_n(2), tapa_n(2), tarta_n(2), tenemos_v(1), tienda_n(8), trata_v(1), última_adj(1), uña_n(2), vaca_n(2), vana_n(1), vida_n(1), voz_n(3), zapatilla_n(1)

le __ a (169 tokens, 42 types)

aviso_v(1), cantamos_v(1), cogiste_v(1), cuentas_v(2), cuentes_v(3), da_v(1), dabas_v(1), das_v(1), decía_v(1), decías_v(1), dice_v(3), dices_v(10), dices_v(9), digas_v(1), dijiste_v(8), doy_v(1), enseñas_v(1), gu_v(1), gusta_v(3), gustaban_v(1), gustan_v(1), hace_v(1), haces_v(1), hiciste_v(3), iba_v(1), ibas_v(3), pasa_v(4), pasaba_v(2), paso_v(2), pediste_v(1), pegas_v(1), perdio_v(2), pone_v(1), quieres_v(1), regalamos_v(1), toca_v(5), va_v(14), vamos_v(16), van_v(1), vas_v(60), ve_v(1), voy_v(5)

un __ de (122 tokens, 37 types)

amigo_n(1), bastoncillo_n(1), beso_n(1), bibi_n(1), bocata_n(1), cachito_n(1), cartel_n(1), censo_n(1), chanchito_n(1), cuento_n(1), daikiri_n(1), dibujito_n(2), galito_n(1), gnomo_n(1), hueso_n(2), huevo_n(1), juego_n(2), librito_n(2), lunar_n(1), osito_n(1), paquete_n(2), par_n(2), pedacito_n(4), pellejito_n(1), pendiente_n(2), poco_adj(30), poquitito_adj(1), poquito_adj(45), pupuyu_n(1), rojo_adj(1), solomillo_n(1), sorbito_n(3), tipo_n(1), trozo_n(1), vaso_n(1), zapato_n(2)

ENGLISH END-FRAMES

that __. (378 tokens, 158 types)

afterwards(1), airplane(2), alright(3), animal(4), apart(1), awful(1), baby(3), be(1), becca(1), better(3), bibbie(1), blanket(2), block(1), blouse(1), blue(1), book(5), bottle(3), box(3), boy(2), button(3), cake(1), called(15), card(2), carton(1), cereal(1), chair(1), chirp(1), clay(1), cookie(1), corner(2), cute(1), d(1), daddy(1), darling(1), dog(3), doggy(2), dolly(3), door(1), easterbunny(2), either(1), eve(9), first(1), fit(3), flower(1), foot(2), for(2), frasers(1), frosting(1), fun(1), funny(3), game(2), girl(2), go(3), good(3), gun(1), hand(1), hard(1), hat(1), help(1), here(2), hole(3), home(3), honey(6), horse(2), hot(2), house(3), hurt(2), hurts(1), in(1), ink(1), is(18), it(4), jar(1), jumped(1), kangaroo(1), kitty(1), knife(1), lamb(1), leila(1), letter(1), lion(1), long(1), makes(1), man(4), many(2), mean(1), means(2), mess(1), milk(1), mom(2), naomi(2), necessary(1), needs(1), nina(3), noise(6), nomi(24), off(2), okay(1), on(3), one(25), page(1), painful(1), papa(1), paper(2), part(3), pen(2), pencil(1), picture(13), piece(2), pillow(1), plate(2), please(1), pool(1), pot(1), present(2), pretty(1), puppet(1), puzzle(1), rabbit(2), racketyboom(2), rain(1), red(1), right(8), round(1), sarah(2), seat(1), shoe(2), side(2), song(1), sounds(1), soup(1), space(1), spell(1), spells(2), spoon(1), squeezes(1), stick(1), stool(1), story(2), stuff(1), sweetie(3), then(1), there(3), thing(2), tonight(1), too(2), tower(1), toy(1), train(2), tree(1), trip(1), two(1), valentine(2), water(1), way(14), what(1), window(2), you(1)

it __. (223 tokens, 152 types)

again(35), all(4), alone(7), along(2), anymore(3), apart(3), around(4), away(16), awhile(2), back(12), back(7), becca(1), before(1), belong(4), belongs(1), better(2), big(1), black(1), blue(3), break(2), bright(1), broke(3), broken(2), called(6), carefully(1), clean(1), cold(3), comes(3), cromer(1), cute(4), did(2), didn't(1), do(2), does(5), doesn't(1), doing(2), down(11), dried(1), drip(1), drop(1), either(3), eve(8), fantastic(1), feel(2), fell(1), first(6), fits(3), fixed(1), flies(1), fly(1), from(2), fun(8), funny(3), go(19), goes(7), going(3), goldie(1), gone(1), good(13), green(1), hard(4), here(11), hit(1), hmm(1), honey(12), horse(1), hot(2), hurt(6), hurts(3), in(24), into(2), is(115), isn't(1), itches(1), later(5), linda(1), look(1), louder(1), melted(1), moves(1), need(1), nina(1), no(2), nomi(27), nomi's(1), normally(1), now(4), off(40), okay(1), on(23), open(2), out(11), out(14), please(1), popped(1), pretty(1), rachel(1), raining(3), rattles(1), red(2), rest(1), right(2), rolled(1), rubs(1), sad(1), say(2), scary(1), shut(1), sideways(1), silly(1), sleeping(1), slowly(1), softly(1), somewhere(2), special(1), spin(1), stew(1), stop(1), sunny(1), sways(2), sweetheart(2), sweetie(1), swims(1), that(1), then(4), there(8), this(1), to(3), today(2), together(9), tomorrow(1), too(7), tore(1), untied(1), up(27), warm(1), was(5), wasn't(1), went(2), what(1), where(2), whistles(2), whole(1), will(3), with(2), work(1), works(2), would(1), yeah(4), yes(4), yet(3), yourself(4)

is __. (728 tokens, 120 types)

asleep(1), awake(1), baby(2), barking(1), better(1), bigger(2), billowing(1), black(1), blue(7), broken(6), called(7), clipclop(1), cold(3), colleen(1), coming(1), cool(2), cromer(2), crying(1), dancing(1), different(1), doing(1), dolly(3), drawing(1), eating(4), empty(2), eve(7), evecummings(1), fine(1), finished(1), fraser(2), frasers(1), fred(1), froggy(1), frosty(1), funny(1), furniture(1), good(1), grass(1), green(1), happening(1), hard(1), he(37), heavy(1), here(3), home(1), honey(4), hot(5), humm(1), hungry(1), indeed(2), inside(1), it(179), jackie(1), jenko(1), jim(1), jumping(1), kanga(1), left(1), lemon(2), light(1), like(1), little(1), locked(1), missing(1), mommys(1), next(1), nice(2), nina(2), nomi(9), now(1), nuzzling(1), ohio(3), ok(1), one(1), open(1), out(3), outside(1), painting(1), papa(5), racketyboom(1), raking(1), reading(1), ready(1), red(4), ricci(1), right(1), roo(1), running(1), saying(1), she(17), sleeping(6), smaller(1), snoopy(1), soap(1), soup(1), stuck(3), sugar(1), tea(1), that(212), there(4), this(80), time(1), timmy(1), tired(3), today(1), too(1), upsidedown(1), walking(1), weak(1), wearing(1), what(4), where(1), white(1), who(1), woopsie(1), working(1), yawning(1), yellow(4), yes(1), yours(1)

SPANISH END-FRAMES

los __. (369 tokens, 171 types)

abrimos_v(1), amiguetes_n(1), ángeles_n(1), animales_n(1), anises_n(1), autobuses_n(1), bambis_n(2), barcos_n(2), barriletes_n(3), bebes_n(3), bebitos_n(1), besos_n(1), bibis_n(2), bichos_n(1), bocatas_n(1), bracitos_n(3), brazos_n(2), caballitos_n(1), caballos_n(2), cables_n(1), cacharritos_n(1), cacharros_n(1), calcetines_n(3), carros_n(2), catalanes_n(1), cereales_n(1), chicos_n(1), chiquitos_n(1), ciervos_n(1), coches_n(2), colores_n(1), columpios_n(5), comen_v(1), como_v(1), conejitos_n(1), conejos_n(2), conoce_v(2), contamos_v(2), cordones_n(1), cristales_n(1), cucharros_n(1), cuento_v(1), cuentos_n(3), de_prep(3), deditos_n(3), dedos_n(1), demás_adj(13), descuidas_v(1), diablitos_n(1), días_n(2), dibujitos_n(1), dibujos_n(1), dientes_n(7), dos_adj(9), elefantes_n(2), elotes_n(1), elotitos_n(1), enemigos_n(1), enseño_v(1), fantasmas_n(1), fideos_n(1), gallegos_n(2), gallos_n(1), ganó_v(1), gato_n(1), guardamos_v(1), guardes_v(1), guardo_v(1), helados_n(1), hiciste_v(1), honguitos_n(6), huesos_n(1), jamases_n(1), juguetes_n(10), lavo_v(1), leotardos_n(2), libritos_n(5), mayores_adj(1), meto_v(1), mezclamos_v(2), mickeys_n(1), mimitos_n(1), mocos_n(2), moquetes_n(1), moquitos_n(1), moros_n(2), mosqueteros_n(2), mosquitos_n(1), muebles_n(2), muñecos_n(3), muñequitos_n(4), nenes_n(12), nervios_n(2), niñitos_n(1), niños_n(15), números_n(3), ojinos_n(3), ojitos_n(5), ojos_n(8), ositos_n(2), osos_n(3), otros_adj(3), pajarinos_n(2), pajaritos_n(1), palitos_n(1), palotes_n(1), pantalones_n(4), papás_n(3), papelitos_n(1), pasajeros_n(1), patitos_n(5), patos_n(2), peces_n(3), pedacitos_n(1), pedales_n(2), pegotitos_n(1), pellejitos_n(1), pelos_n(1), pequeñitos_adj(1), perdio_v(1), perritos_n(7), perros_n(1), pescaditos_n(4), petes_n(1), piececitos_n(1), pies_n(15), pifufos_n(5), pisos_n(1), pollitos_n(2), ponemos_v(1), pongas_v(1), poquís_n(1), portales_n(1), primos_n(1), que_pron(1), quesitos_n(1), quiere_v(1), quieres_v(1), quillitos_n(2), recoges_v(1), 
regalamos_v(1), regaló_v(1), regalo_v(1), regalos_n(1), reyesmagos_n(7), sacó_v(2), saltitos_n(1), sapos_n(1), seco_v(2), semaforos_n(1), señores_n(4), silloncillos_n(2), techos_n(1), tiene_v(1), titos_n(1), toquen_v(1), tumbamos_v(1), tuyos_pron(4), vagones_n(7), varones_n(1), vecinos_n(1), verá_v(1), ves_v(1), vestidos_n(1), videos_n(1), vio_v(1), viris_v(1), zapatitos_n(6), zapatos_n(8)

te __. (310 tokens, 134 types)

acalores_v(1), acercas_v(1), aclaras_v(2), acuerdas_v(39), advierto_v(1), ahogas_v(2), alejes_v(1), apetece_v(3), apuestas_v(2), atreves_v(1), ayudo_v(2), bajamos_v(1), bañamos_v(1), bañas_v(1), bañaste_v(2), busco_v(1), caes_v(4), caigan_v(1), caigas_v(1), cante_v(1), cantemos_v(1), cas_v(1), castiga_v(1), coja_v(1), conozco_v(1), costipas_v(1), cuente_v(2), cuento_v(1), da_v(2), decia(1), dejo(2), dice(4), dicen(2), diga(1), digo(13), dija(1), dijo(4), dio(1), diste(4), dolIa(1), doy(1), duele_v(3), eh_int(1), enfadabas_v(1), enfadas_v(3), enfadruscas_v(1), enseño_v(1), ensucias_v(1), entendí_v(1), enteras_v(1), entiende_v(1), entiendo_v(7), escapes_v(1), escondas_v(4), escuche_v(1), escucho_v(1), escurres_v(1), explico_v(1), expliques_v(1), gusta_v(27), gustan_v(4), gusto_v(3), hace_v(5), haga_v(1), hago_v(1), hizo_v(1), hundes_v(1), importa_v(2), lavan_v(1), levantas_v(1), levantaste_v(1), llamas_v(6), lleva_v(1), manchas_v(3), manches_v(2), mates_v(1), mato_v(1), moja_v(2), mojaste_v(1), mojes_v(1), molesta_v(2), moleste_v(1), muerda_v(1), ocurra_v(1), oí_v(2), oigo_v(1), oye_v(1), parece_v(15), parezca_v(1), pasa_v(11), pasaba_v(1), paso_v(1), pega_v(1), pegas_v(1), peinamos_v(1), peino_v(2), perdono_v(1), pinchar_v(1), pone_v(3), pongo_v(1), preocupes_v(2), puso_v(1), qué_wh(1), quemo_v(1), quiero_v(1), quito_v(1), quitó_v(1), rebelaste_v(1), ries_v(2), riño_v(1), sabes_v(2), sabrías_v(1), saco_v(1), sale_v(1), salió_v(1), seca_v(2), seco_v(1), seque_v(2), sequemos_v(1), sientas_v(3), sientes_v(1), tiraste_v(1), va_v(1), vas_v(2), vayas_v(1), ve_v(6), veamos_v(1), veía_v(1), ven_v(1), veo_v(4), vestimos_v(2), viste_v(2)

está __. (287 tokens, 158 types)

abajo_adv(1), abierto_adj(1), acá_adv(1), acabando_v(2), adónde_wh(1), agarrado_v(1), agusto_adj(1), ahí_adv(19), alto_adj(1), angelines_n(1), apagada_adj(1), apísima_adj(1), apuntando_v(1), aquía_adv(14), arregladita_adj(1), arregladito_adj(1), así_adv(1), bailando_v(1), bañando_v(5), batido_adj(1), bien_adv(18), bonito_adj(2), buenísimo_adj(1), bueno_adj(5), cabeza_n(1), calentito_adj(1), caliente_adj(6), cansada_adj(1), cansadito_adj(1), cariño_n(1), carlitos_n(2), carne_n(1), carnina_n(1), cerrada_adj(1), cerrado_adj(2), chica_adj(1), chupando_v(1), colonia_n(1), columpiando_v(1), comiendo_v(1), cuqui_n(1), dentro_adv(1), descalzada_adj(1), descalzo_adj(1), descansando_v(2), diciendo_v(3), disfrazado_adj(1), distraído_adj(1), donde_con(1), dormida_adj(1), dormido_adj(1), dura_adj(1), durmiendo_v(2), duro_adj(1), eh_int(3), en_prep(1), enfermita_adj(1), enseñasela_v(1), es_v(1), escuchando_v(1), esponja_n(1), esta_n(3), estornudando_v(1), fea_adj(2), fría_adj(1), frio_adj(1), grabando_v(1), gracias_int(1), gritando_v(2), grover_n(3), guapa_adj(4), guapo_adj(1), guardada_adj(1), guardadita_adj(1), gustando_v(1), hablando_v(1), haciendo_v(25), hay_v(1), hija_n(3), igual_adj(1), irene_n(3), jugando_v(4), koki_n(3), la_det(1), ladrando_v(1), limpio_adj(1), limpita_adj(3), limpito_adj(1), linda_adj(1), llorando_v(5), lloviendo_v(4), luis_n(2), mal_adv(1), mala_adj(1), malito_adj(1), mamá_n(2), mami_n(1), mano_n(1), maría_n(4), masticadin_adj(1), mejor_adj(1), merendando_v(1), mintiendo_v(1), mirando_v(1), mojado_adj(1), nada_n(1), nena_n(2), niña_n(5), no_adv(2), ocupado_adj(1), oscuro_adj(2), otra_adj(2), paco_n(1), papá_n(5), pedrin_n(1), peque_adj(1), pequeño_adj(2), perrita_n(1), peterpan_n(3), pieza_n(1), prohibido_adj(1), pulsera_n(1), qué_wh(7), quedando_v(2), quién_wh(1), quilli_n(1), quique_n(1), retorcido_adj(1), revista_n(1), rica_adj(4), rico_adj(12), rota_adj(2), rotita_adj(1), rotito_adj(1), roto_adj(2), sacando_v(1), santi_n(1), seco_adj(1), 
sentado_adj(3), solito_adj(2), sucio_adj(2), suficiente_adj(1), tampoco_adv(1), tarde_adv(4), terminamos_v(1), tito_n(1), tocando_v(1), todo_adj(3), tolumpiando_v(1), triste_adj(1), usado_adj(1), vacilando_v(1), vale_int(1), verdad_int(2), vestida_adj(1)

Footnotes

[*]

This research was supported by NIH R01 HD30410 to Waxman. Portions of this research were presented at the meetings of the Society for Research in Child Development (2007). We are indebted to E. Leddon for extensive discussions, and to M. Kaufman and T. Piccin for methodological expertise.

[1] All of the frames in the analyzed set surpassed a frequency threshold of 5% in proportion to the total number of frames in each corpus. Moreover, when we restricted the analysis to the set of twenty-five most frequent frames we obtained comparable results, suggesting that the results are robust even under more restricted conditions.

[2] We note here there were eight English frames that contained only three to five word-types. These frames were excluded from further analysis so that they would not artificially inflate Accuracy. Their inclusion would only increase the size of our effect.

[3] There was a high degree of consistency among the frequent frames in the three corpora in each language: 66% of the frequent frames were found in at least two of the three corpora (52% of mid-frames, 79% of end-frames). Moreover, when we submitted the accuracy scores from each corpus to a Language (English vs. Spanish) by Frame-type (mid-frames vs. end-frames) ANOVA, we replicated the main effects of language and frame-type. This provides further validation for aggregating over the corpora in each language. In addition, Accuracy in our samples was not correlated with either the number of utterances in the corpus, or the percentage of corpus categorized. This provides assurances that the differences between English and Spanish reported here cannot be attributed to either of these factors.

References

Brown, R. (1973). A first language: the early stages. Cambridge, MA: Harvard University Press.

Cartwright, T. A. & Brent, M. R. (1997). Syntactic categorization in early language acquisition: Formalizing the role of distributional analysis. Cognition 63(2), 121–70.

Chemla, E., Mintz, T. H., Bernal, S. & Christophe, A. (in press). Categorizing words using frequent frames: What cross-linguistic analyses reveal about distributional acquisition strategies. Developmental Science.

Chomsky, N. (1965). Aspects of the theory of syntax. Boston, MA: MIT Press.

Fernald, A. & McRoberts, G. (1993). Effects of prosody and word position on infants' lexical comprehension. Paper presented at the Boston University Conference on Language Development, Boston.

Fernald, A. & Mazzie, C. (1991). Prosody and focus in speech to infants and adults. Developmental Psychology 27(2), 209–221.

Fernald, A. & Simon, T. (1984). Expanded intonation contours in mothers' speech to newborns. Developmental Psychology 20, 104–113.

Fiser, J. & Aslin, R. N. (2002). Statistical learning of new visual feature combinations by infants. Proceedings of the National Academy of Sciences 99(24), 15822–26.

Gleitman, L. R. & Wanner, E. (1982). Language acquisition: The state of the art. In Wanner, E. & Gleitman, L. R. (eds), Language acquisition: The state of the art, 3–48. Cambridge: Cambridge University Press.

Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science 13(5), 431–36.

Gómez, R. L. & Maye, J. (2005). The developmental trajectory of nonadjacent dependency learning. Infancy 7(2), 183–206.

Harris, Z. S. (1951). Methods in structural linguistics. Chicago, IL: University of Chicago Press.

Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Wright Cassidy, K., Druss, B. & Kennedy, L. (1987). Clauses are perceptual units for young infants. Cognition 26(3), 269–86.

Kiss, G. R. (1973). Grammatical word classes: A learning process and its simulation. Psychology of Learning and Motivation 7, 1–41.

López-Ornat, S., Fernández, A., Gallo, P. & Mariscal, S. (1994). La Adquisición de la Lengua Española [The Acquisition of the Spanish Language]. Madrid: Siglo XXI.

MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk, 3rd edn, vol. 2: The Database. Mahwah, NJ: Lawrence Erlbaum Associates.

Maratsos, M. (1988). The acquisition of formal word classes. In Levy, Y., Schlesinger, I. M. & Braine, M. D. S. (eds), Categories and processes in language acquisition, 21–44. Hillsdale, NJ: LEA.

Maratsos, M. P. & Chalkley, M. A. (1980). The internal language of children's syntax: The ontogenesis and representation of syntactic categories. In Nelson, K. E. (ed.), Children's language, vol. 2, 127–214. New York: Gardner.

Mintz, T. H. (2002). Category induction from distributional cues in an artificial language. Memory and Cognition 30, 678–86.

Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition 90(1), 91–117.

Mintz, T. H. (2008). Distributional analysis as a method for categorizing words in infancy. Paper presented at the International Conference for Infant Studies, Vancouver, Canada.

Mintz, T. H., Newport, E. L. & Bever, T. G. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science 26(4), 393–424.

Montes, R. (1987). Secuencias de clarificación en conversaciones con niños [Clarification sequences in conversations with children]. Morphe 3–4: Universidad Autónoma de Puebla.

Morgan, J. L. & Newport, E. L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior 20, 67–85.

Ojea, A. (2000). Llinàs-Ojea corpus: Irene. In MacWhinney, B. (ed.), The CHILDES database.

Pinker, S. (1987). The bootstrapping problem in language acquisition. In MacWhinney, B. (ed.), Mechanisms of language acquisition, 399–442. Hillsdale, NJ: Lawrence Erlbaum Associates.

Redington, M., Chater, N. & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science 22(4), 425–69.

Sachs, J. (1983). Talking about the there and then: The emergence of displaced reference in parent–child discourse. In Nelson, K. E. (ed.), Children's language, 1–28. Hillsdale, NJ: Lawrence Erlbaum Associates.

Shady, M. & Gerken, L. (1999). Grammatical and caregiver cues in early sentence comprehension. Journal of Child Language 2, 163–75.

Shi, R., Werker, J. F. & Morgan, J. L. (1999). Newborn infants' sensitivity to perceptual cues to lexical and grammatical words. Cognition 72, B11–B21.

Slobin, D. I. (1973). Cognitive prerequisites for the acquisition of grammar. In Ferguson, C. A. & Slobin, D. I. (eds), Studies of child language development, 175–208. New York: Holt, Rinehart & Winston.

Snyder, W., Senghas, A. & Inman, K. (2001). Agreement morphology and the acquisition of Noun-drop in Spanish. Language Acquisition 9(2), 157–73.

Suppes, P. (1974). The semantics of children's language. American Psychologist 29, 103–114.

Torrego, E. (1987). Empty categories in nominals. Unpublished manuscript, University of Massachusetts, Boston.

Waxman, S. R. & Booth, A. E. (2003). The origins and evolution of links between word learning and conceptual organization: New evidence from 11-month-olds. Developmental Science 6(2), 130–37.

Waxman, S. R., Braun, I. E. & Weisleder, A. (in prep). The breadth of adjective learning in English- and Spanish-acquiring infants.

Waxman, S. R. & Guasti, M. T. (in press). Linking nouns and adjectives to meaning: New evidence from Italian-speaking children. Language Learning and Development.

Waxman, S. R. & Lidz, J. (2006). Early word learning. In Kuhn, D. & Siegler, R. (eds), Handbook of child psychology, 6th edn, vol. 2, 299–335. Hoboken, NJ: Wiley.

Waxman, S. R., Senghas, A. & Benveniste, S. (1997). A cross-linguistic examination of the noun-category bias: Its existence and specificity in French- and Spanish-speaking preschool-aged children. Cognitive Psychology 32(3), 183–218.

TABLE 1. Descriptive statistics for each corpus

TABLE 2. Accuracy scores (token frequency) for each frame-type in each corpus

TABLE 3. Accuracy scores for each language, frame-type and grammatical class
