INTRODUCTION
Children with significant early language delays around age 2 ; 0 are likely to display persisting developmental problems and difficulties in school (Shevell, Majnemer, Platt, Webster & Birnbaum, 2005; Rescorla & Alley, 2001). Early language delays have been associated with negative child outcomes such as grade retention, ongoing enrollment in special education services, academic problems in reading and math, as well as psychosocial and behavioral problems (McCabe & Marshall, 2006; Scarborough, 2001; NICHD, 2005; Hirsh-Pasek & Golinkoff, in press). Thus, assessment of early language skills prior to school entry is crucial to guiding prevention and intervention efforts.
Researchers have used multiple methods for assessing early language development (e.g. standardized instruments, parental reports and conversational interactions) (Roberts, Burchinal & Durham, 1999; Feldman, Dale, Campbell, Colborn, Kurs-Lasky, Rockette & Paradise, 2005). Parent report measures are often preferred over other measures since they are inexpensive to administer and do not require trained administrators (Pan, Rowe, Spier & Tamis-Lemonda, 2004; Hall & Segarra, 2007). Given that it is important to capture early language skills using reliable and easy-to-use measures, this longitudinal study examines the predictive validity of the MacArthur Communicative Developmental Inventory Short Form, a brief parent report vocabulary checklist used to assess toddlers' expressive vocabulary.
MacArthur Communicative Developmental Inventories (CDIs)
The CDIs are parent report instruments used to obtain information about children's language and communication skills (Fenson, Marchman, Thal, Dale, Reznick & Bates, 2007). Both long (CDI-LF) and short (CDI-SF) versions exist, although the CDI-LF has been more widely studied (e.g. Feldman, Dollaghan, Campbell, Kurs-Lasky, Janosky & Paradise, 2000; Feldman et al., 2005). CDI-LF has two versions, CDI: Words and Gestures, for children ages 0 ; 8 to 1 ; 4, and CDI: Words and Sentences, for children ages 1 ; 4 through 2 ; 6 (Fenson, Dale, Reznick, Bates, Thal & Pethick, 1994; Fenson et al., 2007). CDI-SF is available for children between the ages of 0 ; 8 and 1 ; 6 (Level I) and 1 ; 4 and 2 ; 6 (Level II) (Fenson et al., 2000).
The current research focuses on the CDI-SF Level II form that draws its items from the CDI-LF. The simulated correlations between CDI-SF Level II and the full CDI vocabulary production scale range between 0·90 and 0·95 (Fenson et al., 2000). There have been some attempts to study the validity of the CDI short forms (Corkum & Dunham, 1996; Pan et al., 2004). For example, Pan et al. (2004) found that the CDI-SF scores of low-income children were moderately associated with spontaneous speech measures, and predicted receptive vocabulary skills at age 3 ; 0. Corkum and Dunham (1996) reported a moderate correlation between CDI-SF scores at age 2 ; 0 and verbal IQ scores at age 4 ; 0. Given the promising psychometric properties of the CDI-SF, it is important that its predictive validity is further examined as research has only followed children to ages 2 ; 0 and 3 ; 0. No studies to date have followed children beyond preschool to examine how the CDI-SF: (a) relates to later language upon school entry; (b) predicts distinct aspects of language longitudinally; and (c) predicts a comprehensive language measure such as the Diagnostic Evaluation of Language Variation Test (DELV; Seymour, Roeper & DeVilliers, 2005). The present four-year longitudinal study fills these gaps and offers the promise of an easy-to-administer research tool with predictive validity.
Early language skills predict later language and literacy skills
Research demonstrates the continuity of language skills over time (Hart & Risley, 1995; Storch & Whitehurst, 2002; Scarborough, 2001; Dickinson, McCabe, Anastasopoulos, Peisner-Feinberg & Poe, 2003; NICHD, 2005; Hirsh-Pasek & Golinkoff, in press). Studies utilizing the CDI-LF show that parent reports of vocabulary skills may be useful indicators of language acquisition. For example, moderate to strong associations were found between concurrent measures of two-year-olds' CDI-LF vocabulary production and spontaneous vocabulary use (Dunham & Dunham, 1992; Fenson et al., 1994). Expressive vocabulary on the CDI-LF strongly correlated with concurrent measures of child expressive vocabulary (r=0·78; mean age=2 ; 1) (Ring & Fenson, 2000). Scores on the CDI-LF at age three correlated with scores on tests of cognition and receptive language at age three (Feldman et al., 2005). Studies also showed predictive results. CDI-LF performance at age two correlated with cognitive and receptive language skills at age three (Feldman et al., 2005). More recently, Lee (2011) found that total vocabulary size at age two measured by CDI-LF significantly predicted subsequent language achievement up to fifth grade.
Other studies have also shown long-term relations between early vocabulary skills and later language performance. Hart and Risley (1995) reported that three-year-olds' vocabulary skills significantly predicted their language competence at ages 9 ; 0 and 10 ; 0. The NICHD Early Child Care Research Network (2005) reported that an oral composite of expressive language and verbal comprehension at age 3 ; 0 was strongly correlated with expressive vocabulary and oral language composite scores at age 4 ; 6, which were in turn positively related to first grade expressive vocabulary skills. Given such longitudinal results between early CDI-LF scores and subsequent language outcomes, we would predict that CDI-SF would also significantly relate to later language skills.
While many research studies measure language with a composite score, Whitehurst and Lonigan (1998) and others (e.g. NICHD, 2005) suggest that researchers go beyond looking exclusively at global scores to capture specific relationships between different language skills. For example, while receptive vocabulary correlated moderately and positively with syntactic awareness in first grade (Tunmer, Herriman & Nesdale, 1988), expressive vocabulary size on the CDI-SF was associated with growth in parent report of child grammar skills (word and sentence combinations) at ages 2 ; 0 and 3 ; 0 (Dionne, Dale, Boivin & Plomin, 2003). A more complex relationship has been found between vocabulary and pragmatics (i.e. language use within communicative context). Although better social communication skills may be associated with increased use of vocabulary by children, children with good vocabulary skills may have difficulty with pragmatics. For example, late-talking children at age 2 ; 0 caught up with their age-matched peers at age 5 ; 0 in expressive grammar and vocabulary, while their weaknesses remained in a number of higher-level language areas including narrative skills and use of pragmatic cues (Girolametto, Wiigs, Smyth, Weitzman & Pearce, 2001). Thus, research shows that early language skills extend well beyond vocabulary in unique ways.
In this study, we first ask whether children's CDI-SF Level II expressive vocabulary scores (ages 1 ; 4–2 ; 6) predict language skills in kindergarten (ages 5 ; 6–6 ; 8). We hypothesize that early expressive vocabulary will relate to later vocabulary and related language domains, such as semantics and syntax, but not to pragmatic skills that focus on social uses of vocabulary and therefore are not a direct measure of vocabulary (Seymour et al., 2005). The acquisition of grammar and vocabulary are reciprocal processes (Dixon & Marchman, 2007; Harris, Golinkoff & Hirsh-Pasek, 2012) in that these are developing at the same time and build on each other. For example, infants aged 0 ; 8 have proven sensitive to common grammatical function morphemes (such as mes in French) that then enable them to segment the nouns that follow mes in the speech stream, and then to focus on their meaning (Shi & Lepage, 2008).
Our second question explores kindergarten language skills in some detail by utilizing a relatively new measure, the DELV, which provides specific information on distinct language skills (i.e. semantic, syntactic, pragmatic skills) (Seymour et al., 2005). We predicted that the CDI-SF would have stronger links with language as opposed to code-related literacy measures (e.g. letter-naming fluency) four years later. Given the length of time between the administration of measures, the inconsistent literature regarding the direct relationship between early language skills and later code-related skills (Whitehurst & Lonigan, 1998; Storch & Whitehurst, 2002), and the possible influence of mediators that were not included (e.g. preschool literacy skills), we expected positive but weak relationships between early expressive vocabulary and emergent literacy measures in kindergarten as opposed to somewhat stronger relationships between both sets of language measures.
METHODS
Participants
The sample was composed of parents who provided the early CDI-SF expressive vocabulary scores when they visited the language lab of the third author before their children were age 2 ; 7. These same children returned when they were in kindergarten, ages 5 ; 6 to 6 ; 8 (n=76, mean age: 6 ; 1, SD=0 ; 3). At the early vocabulary data collection, age-at-CDI ranged between 1 ; 5 and 2 ; 6 (M=1 ; 10; SD=0 ; 3). At the time of literacy and language follow-ups, 62% were six years old (n=47, range=6 ; 0–6 ; 8, M=6 ; 2, SD=0 ; 2) and 38% were five years old (n=29, range=5 ; 6–5 ; 11, M=5 ; 9, SD=0 ; 1). All children were assumed to be typically developing, as no parent reported any hearing, vision or other developmental problems at either time on demographic forms. More girls participated (55% were female; 45% were male) and 91% of the children were identified as Caucasian, 3% were African American, 1·5% were multiracial, and 3% were of other ethnicities. All parents were English speakers, came from middle- to upper-middle-income families, and the majority were married (99%). Seventy-nine percent of mothers and 67% of fathers reported that they were at least college graduates, while 8% of mothers and 21% of fathers reported having some college education.
Data-collection procedures
Children were administered a set of language and literacy measures by highly trained pairs of graduate students who coded the protocols separately. The inter-rater reliability was calculated at r=0·98 across all protocols. Coding discrepancies were resolved by referring back to children's audiotaped responses.
Measures
Vocabulary development: Time 1
The CDI-SF (Level II) contains 100 words; parents check each word that their child says. CDI-SF raw scores were used in all analyses. Reliability (i.e. Cronbach alphas ranging from 0·97 to 0·98), as well as content and concurrent validity of the CDI short forms are well established (Fenson et al., 2000).
Measures of language ability: Time 2
Raw scores were obtained from language measures at time 2. The Picture Vocabulary subtest from the Woodcock-Johnson Tests of Achievement-III measures expressive vocabulary skills, and requires identification of pictured objects at the single-word level (Woodcock, McGrew & Mather, 2001). The DELV Norm Referenced is a comprehensive speech and language test designed for children ages 4 ; 0 to 9 ; 11, which measures performance in syntax, semantics and pragmatics (Seymour et al., 2005). The DELV Syntax domain, composed of Wh-questions, Passives, and Articles subdomains, requires knowledge of how structures and meanings inter-relate. The DELV Semantics domain, composed of Verb Contrast, Preposition Contrast, Quantifiers, and Fast Mapping subdomains, measures the development of language skills related to word meanings. The DELV Pragmatics domain, categorized under Communicative Role-Taking, Short Narrative, and Question Asking subdomains, requires responses to communicative situations. The sum of raw scores across subdomains gives the domain score.
Code-related measures: Time 2
Letter naming fluency, decoding and word recognition skills:
DIBELS Letter Naming Fluency (LNF) requires the ability to name as many letters as possible on a page of random upper- and lower-case letters. The number of letters named correctly in one minute is the total score. The Letter–Word Identification subtest of the WJ-III Achievement Test (Woodcock et al., 2001) requires identification of letters and reading words. The number of correctly identified letters and correctly read words gives the total score.
Phonological awareness skills:
The Incomplete Words subtest from the Woodcock-Johnson Tests of Cognitive Abilities-III (Woodcock et al., 2001) requires listening to words with phonemes missing and identifying the complete words. DIBELS Phoneme Segmentation Fluency Test (PSF) measures the ability to segment three- and four-phoneme words into their individual phonemes (Good, Kaminski & Smith, 2002). The number of correct phonemes produced in one minute determines the final score. DIBELS Nonsense Word Fluency (NWF) is a standardized test of letter–sound correspondence and measures the ability to read nonsense words, or verbally produce the individual sound of each letter. The number of correct letter–sounds in one minute is the final score (Good et al., 2002).
RESULTS
Descriptive statistics
Means and standard deviations of raw scores obtained from all measures are shown in Table 1. Participants performed within the average range for all assessments given. CDI-SF Level II expressive vocabulary scores reported prior to age 3 ; 0 fell in the average range (i.e. 45th percentile for girls and 65th percentile for boys) (Fenson et al., 2000). Similarly, mean DIBELS subtest scores were at (PSF) or above (LNF & NWF) the 40th percentile according to national kindergarten benchmarks provided by the University of Oregon Center on Teaching and Learning (2008).
Table 1. Means and standard deviations of language and code-related raw scores

notes: n=76; CDI-SF Level II Expressive Voc. refers to MacArthur Communicative Developmental Inventory, Short Form, Level II scores for children ages 1 ; 4 through 2 ; 6 (Fenson et al., 2000). DIBELS LNF, PSF and NWF refer to Dynamic Indicators of Basic Early Literacy Skills – letter naming fluency, phoneme segmentation fluency and nonsense word fluency scores, respectively (Good et al., 2002). WJ-III Incomplete Words, Picture Vocabulary and Letter–Word Identification refer to incomplete words, picture vocabulary and letter–word identification subtests from the Woodcock–Johnson III tests of cognitive abilities and of achievement, respectively (Woodcock et al., 2001). DELV Syntax, DELV Semantics and DELV Pragmatics refer to syntax, semantics and pragmatics domains on the Diagnostic Evaluation of Language Variation, Norm Referenced Test (Seymour et al., 2005).
Not surprisingly, age-at-CDI and CDI-SF Level II expressive vocabulary scores had a moderate-to-strong correlation (r=0·64, p<0·01), which justified the use of partial correlations to remove the effect of age in correlations between the CDI and kindergarten measures (see Table 2). All the code-related skills in kindergarten were positively related to each other, with associations ranging from small to moderate-to-strong. Decoding and word recognition skills measured by the WJ-III LWID subtest significantly and positively correlated with DIBELS NWF and DIBELS LNF (r=0·81, and r=0·43, p<0·01, respectively). Language outcomes in kindergarten were also moderately and positively associated with each other. Expressive vocabulary scores measured by the WJ-III picture vocabulary subtest correlated with syntax, semantics and pragmatics ability on the DELV (r=0·49, 0·47, and 0·36 (p<0·01), respectively). The correlations between code-related outcomes in kindergarten and concurrent language outcomes ranged from small and non-significant to significant and moderate. Kindergarten phonemic awareness skills measured by the WJ-III Incomplete Words subtest were positively and moderately associated with all language outcomes except for pragmatics. Semantics was correlated with all code-related outcomes except for letter naming fluency. Age-at-kindergarten testing did not correlate significantly with any child-language or code-related outcomes in kindergarten (p>0·05).
Table 2. Partial and Pearson correlations among all variable raw scores

notes: n=76;
* p<0·05;
** p<0·01.
CDI refers to CDI-SF Level II expressive vocabulary score on the MacArthur Communicative Developmental Inventory, Short Form (Fenson et al., 2000). AGE refers to children's age at the time of CDI. In parentheses, partial correlations are given between CDI Level II expressive vocabulary scores and kindergarten outcomes when age-at-CDI is controlled. LNF, PSF and NWF refer to Dynamic Indicators of Basic Early Literacy Skills, DIBELS, letter naming fluency, DIBELS phoneme segmentation fluency, and DIBELS nonsense word fluency scores, respectively (Good et al., 2002). INC, LWID and PV are incomplete words, letter–word identification, and picture vocabulary subtests from Woodcock–Johnson III tests of cognitive abilities and of achievement, respectively (Woodcock et al., 2001). SYN, SEM and PR refer to syntax, semantics and pragmatics domains on the Diagnostic Evaluation of Language Variation, Norm Referenced Test, DELV (Seymour et al., 2005).
What is the relationship between CDI-SF Level II expressive vocabulary scores and language outcomes in kindergarten?
When the children's age-at-CDI was controlled, CDI-SF Level II expressive vocabulary scores correlated moderately with WJ-III picture vocabulary scores (r=0·41, p<0·01), DELV syntax (r=0·32, p<0·01), and DELV semantics (r=0·27, p<0·05) scores, but not with DELV pragmatics (r=0·16, p>0·05) scores in kindergarten (see Table 2).
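A partial correlation of this kind removes the linear effect of age-at-CDI from both variables before they are correlated. The following is a minimal sketch of that computation, not the analysis script used in the study: the function names are hypothetical and the eight sets of scores are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented values for eight children (NOT data from the study).
age_months = [17, 19, 21, 22, 24, 26, 28, 30]      # age at CDI, in months
cdi_score = [12, 20, 35, 30, 48, 55, 70, 66]       # CDI-SF raw score
picture_vocab = [14, 13, 17, 15, 18, 16, 20, 19]   # kindergarten outcome

print("zero-order r:", round(pearson_r(cdi_score, picture_vocab), 3))
print("partial r, age controlled:",
      round(partial_r(cdi_score, picture_vocab, age_months), 3))
```

The same value is obtained by regressing each variable on age and correlating the two sets of residuals, which is why the statistic is read as the association that remains once age is held constant.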
Two-step hierarchical regression models were calculated. Child's age-at-CDI was entered as the first block, and the CDI-SF expressive vocabulary score was entered as the second block. CDI-SF expressive vocabulary scores significantly and positively predicted WJ-III picture vocabulary scores (F(2, 75)=7·70, t=3·90, β=0·54, p<0·01), accounting for 17% of unique variance (R²=0·17). Age-at-CDI did not appear as a significant contributor to the overall variance. CDI-SF expressive vocabulary scores significantly and positively predicted DELV syntax scores in kindergarten (F(2, 75)=4·51, t=2·96, β=0·43, p<0·01), explaining 11% of unique variance in syntax performance (R²=0·11), and age-at-CDI did not appear as a significant contributor to the overall variance (see Table 3).
Table 3. Two-step hierarchical multiple regressions to predict kindergarten language outcomes

notes: n=76; β=standardized beta coefficients; * p<0·05; ** p<0·01. CDI Level II Exp Voc. refers to CDI-SF Level II expressive vocabulary score on the MacArthur Communicative Developmental Inventory, Short Form (Fenson et al., 2000). WJ-III Picture Vocabulary refers to Picture Vocabulary subtest from the WJ-III Tests of Achievement (Woodcock et al., 2001). DELV syntax, semantics and pragmatics refer to syntax, semantics, and pragmatics domain scores on the Diagnostic Evaluation of Language Variation, Norm Referenced Test, DELV (Seymour et al., 2005).
It is interesting that child's age-at-CDI accounted for 7% of the variance in kindergarten semantics performance. CDI-SF Level II expressive vocabulary scores accounted for an additional 7% of variance in kindergarten semantics (Unique R²=0·07), increasing the variance explained by the model to 14% (Model R²=0·14) (F(2, 75)=5·98, t=2·41, β=0·34, p<0·05) (see Table 3). A final hierarchical regression model indicated that CDI-SF expressive vocabulary scores did not account for any significant variance in DELV pragmatics after the variance accounted for by child's age-at-CDI was controlled (Model R²=0·05, p>0·05) (see Table 3).
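The two-step logic behind the unique and model R² values can be illustrated with a short sketch. It is offered under stated assumptions rather than as the authors' procedure: the data are invented, the function names are hypothetical, and it relies on the fact that, with one predictor per block, the R² added at step 2 equals the squared semipartial correlation between the outcome and the part of the CDI score that age does not explain.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(v, z):
    """Residuals of v after a simple linear regression of v on z."""
    n = len(v)
    mean_v, mean_z = sum(v) / n, sum(z) / n
    slope = (sum((a - mean_z) * (b - mean_v) for a, b in zip(z, v))
             / sum((a - mean_z) ** 2 for a in z))
    return [b - (mean_v + slope * (a - mean_z)) for a, b in zip(z, v)]

def hierarchical_r2(outcome, block1, block2):
    """Return (step-1 R2, unique R2 added by block2, model R2)."""
    r2_step1 = pearson_r(outcome, block1) ** 2
    # Squared semipartial correlation: outcome vs. block2 with block1 removed.
    r2_unique = pearson_r(outcome, residuals(block2, block1)) ** 2
    return r2_step1, r2_unique, r2_step1 + r2_unique

# Invented values for eight children (NOT data from the study).
age_months = [17, 19, 21, 22, 24, 26, 28, 30]   # block 1: age at CDI
cdi_score = [12, 20, 35, 30, 48, 55, 70, 66]    # block 2: CDI-SF raw score
semantics = [9, 11, 10, 13, 12, 15, 14, 17]     # kindergarten outcome

step1, unique, model = hierarchical_r2(semantics, age_months, cdi_score)
print("step-1 R2 (age only):", round(step1, 3))
print("unique R2 (CDI-SF):  ", round(unique, 3))
print("model R2:            ", round(model, 3))
```

With more than one predictor per block, a general least-squares routine would be needed; the shortcut here applies only to the one-predictor-per-block case described in the text.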
What is the relationship between CDI-SF Level II expressive vocabulary scores and code-related outcomes in kindergarten?
When the child's age-at-CDI was controlled, CDI-SF Level II expressive vocabulary scores had low-to-moderate significant correlations with WJ-III letter–word identification (r=0·27, p<0·05), and DIBELS nonsense word fluency scores (r=0·26, p<0·05), but no significant correlations with DIBELS letter naming (r=0·16, p>0·05), DIBELS phoneme segmentation (r=0·12, p>0·05), and WJ-III incomplete words scores (r=0·14, p>0·05) in kindergarten (see Table 2).
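As a rough check on which of these partial correlations clear the 0·05 threshold with n=76, the sketch below applies Fisher's z-transformation. This is an approximation swapped in for illustration; the text does not state which significance test was used, and the function name is hypothetical. For a partial correlation with k controlled variables, the transformed value has a standard error of approximately 1/√(n−3−k).

```python
import math

def fisher_z_p_value(r, n, n_controlled=0):
    """Approximate two-tailed p-value for a (partial) correlation via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3 - n_controlled)
    # Two-tailed tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

# Partial correlations reported in the text; n=76, one controlled variable (age).
reported = [
    ("WJ-III letter-word identification", 0.27),
    ("DIBELS nonsense word fluency", 0.26),
    ("DIBELS letter naming", 0.16),
    ("DIBELS phoneme segmentation", 0.12),
    ("WJ-III incomplete words", 0.14),
]
for label, r in reported:
    p = fisher_z_p_value(r, n=76, n_controlled=1)
    verdict = "p<0.05" if p < 0.05 else "p>0.05"
    print(f"{label}: r={r:.2f}, approx. {verdict}")
```

Under this approximation the pattern matches the one reported: the two larger coefficients fall below 0·05 and the three smaller ones do not.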
Additional analyses conducted indicated that the CDI-SF scores did not account for any variance in phonemic awareness skills measured by WJ-III incomplete words, or other code-related skills measured by the DIBELS letter naming fluency and phoneme segmentation fluency scores (p>0·05). Similarly, the CDI-SF scores did not significantly predict WJ-III letter–word identification (p=0·06) or DIBELS nonsense word fluency performances (p=0·07).
DISCUSSION
Early expressive vocabulary skills measured by parental reports on the CDI-SF significantly predicted expressive vocabulary, syntax and semantics, as measured by standardized direct assessment of these skills four years later, explaining 17%, 11% and 7% of the variance in those skills, respectively. These results extend previous research in three major ways. First, they support the use of the CDI-SF in longitudinal research. Second, contrary to most available research with young children, oral language at time 2 was assessed using separate measures for syntax, semantics and pragmatics to seek long-term relationships. Third, continuity of language skills was demonstrated over a four-year period, which has not been previously done using the CDI-SF (Whitehurst & Lonigan, 1998; Storch & Whitehurst, 2002; Lee, 2011). These results reveal the stability of facets of language development for individual children over a large swath of time.
We found that early expressive vocabulary before age 2 ; 7 accounted for a significant, but modest amount of variance in syntax and semantics in kindergarten. This finding is consistent with previous research on how performance in syntax and semantics is related to vocabulary knowledge (Tunmer et al., 1988; Marchman, Martinez-Sussmann & Dale, 2004; Dixon & Marchman, 2007), as well as upholding theoretical accounts of the reciprocal nature of semantic and syntactic development (Bates, Dale & Thal, 1995; Harris et al., 2012). Despite the fact that children encountered a wide range of experiences over the many months between the first vocabulary assessment and the standardized DELV tests, early vocabulary skill remained a predictor of later language. The finding that early expressive vocabulary significantly predicted syntax four years later extends the work of Dionne et al. (2003) in which expressive vocabulary concurrently related to syntactic ability at ages 2 ; 0 and 3 ; 0; and vocabulary at age 2 ; 0 strongly contributed to vocabulary and grammar at age 3 ; 0. Tunmer et al. (1988) also positively linked concurrent aspects of receptive vocabulary and syntactic ability in first-graders. The positive and significant association between early vocabulary skills and later semantic abilities found here is consistent with Brackenbury and Pye's (2005) arguments that new word acquisition and storage is one aspect of semantic processing. These findings further suggest that word learning ‘feeds’ on itself (Smith, 2000).
Our results show that the continuity between early vocabulary and language skills over time is not supported in the case of all language domains. Expressive vocabulary scores assessed in the second year of life predicted picture vocabulary, syntax and semantics performance four years later while not appearing as a significant predictor of pragmatic skills. This difference is likely due to the nature of the outcome measure in that the developers of the DELV noted that the scoring of the pragmatic domain is not based on the use of specific vocabulary or particular syntactic structures. Rather, it is based on social uses of vocabulary while conversing. Pragmatics had lower correlations with expressive language than semantics and syntax in several DELV validity studies (Seymour et al., 2005). Therefore, our finding that early vocabulary skills did not predict pragmatic ability while predicting syntax and semantics abilities provides indirect support for the factor structure of the DELV. Most test validations are prospective but here is an instance of a retrospective validation, showing that DELV scores relate in theoretically meaningful ways to a measure of language given four years prior.
Relationship between early expressive vocabulary and code-related emergent literacy skill
Despite some modest correlations, early expressive vocabulary scores did not account for statistically significant variance in code-related skills in kindergarten. These findings confirmed our hypothesis that the direct relationship between early expressive vocabulary and emergent literacy in kindergarten would be positive but weak. While CDI-SF scores did not significantly correlate with code-related skills such as letter naming, phoneme segmentation, and phonological awareness, they did weakly correlate with more advanced skills including letter–sound correspondence (r=0·26, p<0·05) and word recognition and decoding (r=0·27, p<0·05). Letter naming and phonological awareness skills are rather distinct skills from vocabulary, but having a strong vocabulary may help children in many ways. These correlational findings are consistent with those reported by the NICHD study (2005), in which language skills at age 3 ; 0 predicted letter–word identification and expressive picture vocabulary scores at age 4 ; 6. These results are also similar to Lee's (2011) findings that children with a larger expressive vocabulary size measured by the long form CDI at age 2 ; 0 outperformed their peers who had a smaller vocabulary size at age 2 ; 0 in decoding and word recognition skills up to fifth grade.
Although not a primary focus of the study, an examination of the language and reading readiness skills in kindergarten indicated that certain code-related skills (e.g. phonemic awareness and letter–word identification) had moderate-to-strong correlations with concurrent language skills (e.g. picture vocabulary, syntax and semantics). This finding is in line with research that indicates that knowledge in the semantic and syntactic domain makes an important contribution to literacy as reading tasks become more complex (Snowling, Bishop & Stothard, 2000), and code-breaking takes a back seat to the comprehension of text (Dickinson, Golinkoff & Hirsh-Pasek, 2010; Nation & Snowling, 2004). Thus, semantic and syntactic knowledge may be critical as children switch from learning to read to reading to learn.
The main purpose of this study was to validate the CDI-SF by linking it to a comprehensive set of language and literacy measures in kindergarten. It also offers new data about the long-term relations between early vocabulary and later semantics and syntax. Previous studies by Rescorla and Alley (2001) and Heilmann, Weismer, Evans and Hollar (2005) showed that children who are late in acquiring vocabulary are more likely to have language difficulties later on. It is interesting that the results here emerged with a sample that was restricted to primarily Caucasian middle- and upper-middle-income families with average-performing children. We would predict that if the sample included more heterogeneity – both in terms of social class and possible language delay – these relationships might be even stronger.
To better understand the factors that influence language development over time, future research might include the CDI-SF and other variables such as parent–child literacy interactions, because these interactions directly contribute to children's language competence (Whitehurst & Lonigan, 1998). As this study showed, in the second year vocabulary alone measured by the CDI-SF is a strong positive predictor of later language skills in kindergarten (ages 5 ; 6–6 ; 8) and only a weak predictor of emergent literacy skills. Furthermore, the CDI-SF prediction was specific to the syntax and semantics domains and not to the pragmatics domain, supporting research that vocabulary learning and syntax acquisition are reciprocal processes. These results demonstrate the critical importance of early vocabulary skills for children's multi-component linguistic development years later, while showing that the CDI-SF provides a valid indicator of children's vocabulary skills in early childhood.