The ability to refer emerges early. Around their first birthdays, infants use pointing to refer to objects in their surroundings, either to request them or to inform others about them (Liszkowski, Carpenter, Henning, Striano, & Tomasello, 2004). With the advent of language, a wider range of referential strategies becomes available, and children face the challenge of referential choice; that is, they have to select the linguistic form that best fits a given communicative situation.
Without doubt, the most challenging test of referential ability is the production of narratives, in which referents are removed in time and space. While narrative production is common to all cultures and is regularly attempted by young children (Miller & Sperry, 1988), this skill takes years to develop and is particularly prone to disruption in cases of language disorder (Norbury, Gemmell, & Paul, 2014).
From the moment children start telling narratives, they show the basic skills and motivation to adapt referring expressions to their interlocutors’ informational needs. However, adult-like proficiency is very slow to develop, and the quality of referring expressions is affected by a range of syntactic, semantic, and pragmatic factors (Hickmann & Hendriks, 1999). Children find it particularly challenging to manage the reintroduction of characters in a narrative and are likely to use under-informative expressions when doing so. Menig-Peterson (1975) analyzed three- and four-year-olds’ recounts of personal past experiences and found that children used appropriate introductions more often when retelling the event to a naive interlocutor than to a knowledgeable one. However, only the four-year-olds provided more appropriate reintroductions for the naive interlocutor than for the knowledgeable one. Similarly, Power and Dal Martello (1986) asked five-year-olds to tell the same story to two naive listeners, one after the other, and found that children mistakenly used definite articles for first mentions more often during second narrations (60%) than during first narrations (39%).
Given the extended developmental trajectory of referential communication, it has become important to identify the particular experiences that drive learning. Conversational breakdowns, in which children's ambiguous references are followed by their interlocutors’ clarification requests, have been identified as a rich arena in which children learn not only about reference (Ateş-Şen & Küntay, 2015) but also about perspective taking (Carmiol & Vinden, 2013; Lohmann & Tomasello, 2003). Robinson and Robinson (1985) found that asking for clarification on one trial of a referential communication task led five-year-olds to become more informative on subsequent trials. Matthews, Lieven, and Tomasello (2007) had children play a sticker game in which they requested stickers that were out of their reach from an experimenter who had access to a dense array of stickers. Children's ability to describe the stickers appropriately improved most when they experienced multiple conversational breakdowns, in which their conversational partner asked for clarification following their ambiguous descriptions of the stickers. Using the same scenario, Matthews, Butcher, Lieven, and Tomasello (2012) found that two- and four-year-olds learned to produce appropriate descriptions of the intended stickers faster after receiving specific feedback (e.g., “Do you need the dad or the boy?”) from their conversational partner than after receiving general feedback (e.g., “Who do you need?”). Whereas the former question models appropriate descriptions of the referent and its distractors, the latter only conveys a lack of understanding.
Another effective strategy is simply to provide the wrong sticker during such referential communication tasks. Nilsen and Mangal (2012) found that feedback in the form of an incorrect sticker following children's ambiguous descriptions of the target sticker led children to produce more appropriate repairs. This kind of feedback was more effective than other kinds, such as explicit statements of misunderstanding (“I don't know which one you mean”) or vague feedback (“Huh?”).
Although these findings demonstrate that experimenter feedback can promote the development of referential communication, the studies were all conducted in artificial experimental settings, using a traditional referential communication paradigm in which referents are in the here and now. Very few studies have tested which experiences promote narrative development. Moreover, we do not know whether the feedback hypothesized to be helpful resembles what children hear from caregivers in real-life interaction (cf. Davidson & Snow, 1996), as opposed to what they hear from an unfamiliar experimenter following a script.
Study 1 addressed whether mothers ask their young children for clarification when listening to their narratives and, if so, how. It also tested whether caregiver feedback facilitates narrative production. Children looked at a picture book with an experimenter and were then asked to tell the story to their mother, who was a naive interlocutor. There were two conditions: one in which the mother was asked to interact with her child as she normally would (feedback condition) and one in which she was asked simply to encourage her child without asking questions (no-feedback condition). We identified the types of feedback mothers gave and analyzed whether children were able to repair their initial, ambiguous descriptions effectively when their mothers asked them for clarification. In Study 2, we then tested, in a more controlled experiment, whether the feedback strategies the mothers used were effective in promoting referential development; here, a researcher provided the same kinds of feedback that mothers had been found to provide.
Study 1
Method
Participants
Thirty three-year-olds (M = 3 years and 7 months, SD = 3 months, 16 girls) and 30 five-year-olds (M = 5 years and 6 months, SD = 4 months, 22 girls) with no language, speech, or auditory difficulties participated with their mothers. Mother–child dyads were Costa Rican and middle-class, and spoke Spanish as their native language. Two experimenters visited them at home. Thirteen dyads were excluded from the final sample for the following reasons: recorder malfunction (1), the child turning off the recorder (1), the child not telling the stories (6), and the mother being unable to follow the instructions during the experimental conditions (5).
Materials
Children completed the Vocabulario sobre Dibujos Woodcock-Muñoz Subtest (Woodcock, Muñoz-Sandoval, Ruef, & Alvarado, 2005) at the beginning of the session. In addition, two wordless picture books were created, each telling a story about the activities of two children and an adult (all of the same gender, so that pronominal reference could be ambiguous). One book involved a visit to a park and the other a visit to a fair. The plots of the stories (see Table 1) were adapted from previous studies (Karmiloff-Smith, 1986; Wagner, Kako, Amick, Carrigan, & Liu, 2005). New books were created for this study to ensure that none of the children already knew the stories and to control the number of events and characters across stories.
Table 1 Story plots for Study 1

Design and procedure
This study crossed three factors, with children's vocabulary as a covariate. The factors were age (three- vs. five-year-olds) and order of conditions (feedback first vs. feedback second) as between-subjects variables, and condition (no-feedback vs. feedback) as a within-subjects variable. Each condition included a familiarization phase and an elaboration phase. During familiarization, mothers stayed in a separate room while experimenter 1 (E1) introduced the book to the child and asked the child to describe each page (“What is happening here?”). E1 did not provide any information about the story. During elaboration, the mother came into the room and E1 asked the child to tell the story in the book to the mother; E1 then left the room with the book. After the retelling, the whole procedure was repeated with a different book in the other experimental condition. Mother–child conversations were audiotaped and subsequently transcribed in the CHAT format (MacWhinney, 2000).
The experimental manipulation took place during the elaboration phase. While E1 showed the book to the child, experimenter 2 (E2) instructed the mother on how to talk about the story with her child. For the feedback condition, mothers were instructed: “Talk to your child about the story in the book the way you usually do when s/he talks to you about something you don't already know about.” For the no-feedback condition, mothers were instructed: “Just listen to the child's story about the book, only making small comments such as ‘uhum?’, ‘yes?’, and ‘really?’ to encourage him/her to continue.” Piloting with three mothers suggested that parents could follow these instructions. In the main study, however, parents differed in how well they did so during the no-feedback condition: some restricted themselves to back-channeling, some offered encouragement (e.g., “that's interesting” and “really?”), and others asked clarification questions. We considered back-channeling and encouragement acceptable. However, five mothers were unable to refrain from asking questions, and these dyads were therefore excluded, as mentioned above. During the feedback condition, parents also provided elaborative comments and questions about the stories. Because this study focused on clarification requests rather than elaboration, these utterances were not analyzed. Order of conditions and story presentation were counterbalanced within each age group.
Coding
Maternal clarification requests
Utterances in the transcripts were first classified as either clarification requests or ‘other’. The ‘other’ category included all comments (e.g., “The brother was feeling bad”) and questions (e.g., “Was the brother doing something nice?”) that did not aim to clarify information provided by the child; given the purpose of this study, these were not analyzed further. Clarification requests were further classified into one of four types: (1) global requests for clarification, in which the mother signals a general lack of understanding without stating the specific piece of information she is trying to clarify (e.g., child: “Two girls went to the amusement park and one lost her balloon.” mother: “What's that?”); (2) general requests for clarification, in which the mother uses a wh-question to clarify the child's reference to a character without indicating potential answers (e.g., child: “He fell down and he is helping him up.” mother: “Who helped him up?” child: “The brother”); (3) specific requests for clarification, in which the mother provides specific information for the child to confirm or deny (e.g., child: “There was a child, a Grandpa and the other child fell down.” mother: “How many children were there?” child: “A big one and a small one.” mother: “Did the big one or the small one fall?” child: “The small one”); and (4) recasts, statements in which the mother paraphrases the child's previous utterances in a more informative way, integrating pieces of information that had not been integrated before (e.g., child: “The small boy got ice cream.” mother: “So, there was a big boy and a small boy”).
Children's descriptions of the characters
Mentions of characters in the events described in Table 1 were the unit of analysis. Children received a score of 1 when a character was uniquely described (e.g., “The girl with braids had her balloon fly away”) and a score of 0 when the character was ambiguously described (e.g., “[Someone's] balloon flew away”) or not mentioned. During the feedback condition, children's repairs after maternal feedback were also coded (e.g., child: “A girl lost her balloon and cried.” mother: “Which girl?” child: “The girl in braids”). Two examiners independently coded 25% of the conversations. Agreement was good (all Cohen's κ > ·81), and discrepancies were discussed and resolved before the analyses.
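Cohen's κ, the agreement statistic reported above, corrects the raw proportion of agreement between two coders for the agreement expected by chance from each coder's own label frequencies: κ = (p_o - p_e)/(1 - p_e). The Python sketch below only illustrates that formula; the function name `cohens_kappa` and the ten example codes are hypothetical and are not drawn from the study's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labelled the same items (hypothetical helper)."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of each coder's marginal proportions.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical label")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codes for ten character mentions (1 = unique, 0 = ambiguous or absent).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```

In this toy example the coders agree on 80% of items, but because each coder assigns a score of 1 to six of ten items, 52% agreement would be expected by chance, so κ is considerably lower than the raw agreement.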
Results
Maternal clarification requests
Mothers produced an average of six requests to clarify children's ambiguous descriptions during the feedback condition (see Table 2). The mean frequencies of global (t(58) = 0·60, p = ·58), general (t(58) = 1·72, p = ·09), and specific clarification requests (t(58) = 1·38, p = ·17) and of recasts (t(58) = 1·47, p = ·15) did not differ significantly as a function of age. Specific clarification requests were the most frequent type (49·61%).
Table 2 Descriptive statistics for types of maternal strategies to clarify children's ambiguous descriptions of the characters during the feedback condition

Children's descriptions
Children varied considerably in how often they referred to characters and in whether they did so ambiguously or clearly. Although the number of events and characters was held constant across the two stories, children in both age groups produced fewer informative descriptions for one story (fair) than for the other (park) (see Figure 1). Mean vocabulary scores were significantly higher for five-year-olds (M = 22·93, SD = 1·74) than for three-year-olds (M = 20·90, SD = 2·32; t(58) = –3·83, p < ·001).
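The vocabulary comparison can be recomputed from the reported summary statistics. The sketch below assumes a Student's t test with pooled variances; the text does not state which variant was used, and with 30 children per group Welch's test would give the same t but different degrees of freedom. The function name is hypothetical.

```python
import math


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples Student's t (equal variances) from group summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df


# Vocabulary summary statistics reported above (three-year-olds first).
t, df = pooled_t_from_summary(20.90, 2.32, 30, 22.93, 1.74, 30)
print(f"t({df}) = {t:.2f}")  # t(58) = -3.83
```

Run as written, this reproduces the t(58) = –3·83 reported above, which suggests the summary statistics and the test are mutually consistent.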

Fig. 1. Percentage of absent or ambiguous vs. informative descriptions as a function of story, age, and experimental condition in Study 1.
We fitted a mixed-effects logistic regression model to investigate the effects of age, condition, order of conditions, and children's vocabulary on whether descriptions uniquely identified the characters in each event (scored as 1; otherwise 0). All models included random intercepts for children and events. Model building started with a model that included every fixed effect and only the following theoretically relevant interactions: (a) Condition × Age and (b) Condition × Order. Subsequent models were obtained by successively removing interactions and/or fixed effects that, according to the log-likelihood ratio test, did not improve the fit of the model to the data. At each step, the term to be removed was chosen from those that did not reach significance (p < ·05) within the current model. For instance, the Condition × Order interaction was removed between Model 1 and Model 2 because it was not significant within Model 1. Table A1 in the Appendix specifies the model-testing sequence for Study 1.
A log-likelihood ratio test indicated that the model including main effects of condition, age, and order of conditions and the Condition × Age interaction provided the best fit to the data (see Table 3). This model (Model 3 in Table A1) fitted the data better than a model without the Condition × Age interaction (Model 4; χ²(1) = 3·92, p = ·04). As can be seen in Figure 1, the interaction reflects the fact that three-year-olds were more affected by condition than five-year-olds: the odds of uniquely identifying characters were 2·13 times higher in the feedback condition than in the no-feedback condition for the three-year-olds, and 1·26 times higher for the five-year-olds.
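Each step of the model-selection procedure compares two nested models by a likelihood ratio test: twice the difference in their log-likelihoods is referred to a χ² distribution whose degrees of freedom equal the number of parameters removed (one in the comparisons reported here). The Python sketch below illustrates only that computation. The log-likelihood values are hypothetical placeholders rather than those behind Table A1; in practice they would come from whatever software fitted the mixed models, and the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 via the complementary error function,
    # df = 2 via the exponential distribution.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df_diff: int):
    """Compare nested models: returns the LR statistic and its chi-square p value."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_diff)


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_reduced=-100.0, loglik_full=-98.0, df_diff=1)
print(f"chi2(1) = {stat:.2f}, p = {p:.3f}")
```

The closed-form recurrence avoids any dependency beyond the standard library; a statistics package's chi-square survival function would serve equally well.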
Table 3 Models for Study 1 and Study 2

Notes. Approximate 95% confidence intervals are reported in brackets. * Odds ratios for the interaction were calculated following the procedure described in Chen (2003) and are reported in the text.
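The age-specific odds ratios for condition reported in the Results (2·13 for three-year-olds, 1·26 for five-year-olds) come from combining the condition coefficient with the Condition × Age interaction on the log-odds scale. The Python sketch below shows one standard way of doing this, assuming dummy-coded condition and age (0/1) and Wald-type 95% intervals whose standard error includes the covariance between the two coefficients. It is not claimed to reproduce every detail of the procedure in Chen (2003); the function name and all coefficient values are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile, about 1.96


def condition_odds_ratio(b_condition, b_interaction, age,
                         var_condition, var_interaction, cov_condition_interaction):
    """Odds ratio (feedback vs no-feedback) at a dummy-coded age level, with a Wald 95% CI."""
    # Log odds ratio for condition at this age level: b_c + age * b_i.
    log_or = b_condition + age * b_interaction
    # Var(b_c + age * b_i) = Var(b_c) + age^2 Var(b_i) + 2 age Cov(b_c, b_i)
    variance = var_condition + age ** 2 * var_interaction + 2 * age * cov_condition_interaction
    se = math.sqrt(variance)
    return math.exp(log_or), math.exp(log_or - Z_95 * se), math.exp(log_or + Z_95 * se)


# Placeholder coefficients and (co)variances, not the estimates in Table 3.
for age in (0, 1):
    or_, lower, upper = condition_odds_ratio(0.5, -0.3, age, 0.04, 0.09, -0.02)
    print(f"age={age}: OR = {or_:.2f} [{lower:.2f}, {upper:.2f}]")
```

With age coded 0, the odds ratio is simply exp of the condition coefficient; for the other age group the interaction shifts it, and ignoring the covariance term would misstate the width of that group's interval.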
Discussion: Study 1
This study demonstrated that, in elicited mother–child conversations, mothers provide their children with feedback and children are able to respond to it, thereby improving the quality of their referring expressions. This beneficial effect of feedback was particularly strong for the three-year-olds. Mothers used both clarification requests and recasts to scaffold children's narratives. The most common form of feedback was the specific clarification request, which essentially modeled a referring expression that the child could then reuse. Thus, this kind of feedback is not only the best strategy identified for driving children's learning in lab-based referential communication tasks but also one that mothers commonly use to clarify ambiguous descriptions during children's storytelling.
The critical question now is whether experiencing the kind of feedback mothers used improves children's ability to recount narratives over the longer term (rather than just facilitating repair of the current narrative in the moment), especially for three-year-olds, the age group that benefited most from feedback. This is difficult to gauge in more natural settings such as the one used in Study 1, because the questions parents ask depend on their child's language. For example, parents can only ask specific clarification requests if the child has already provided a minimum amount of information about the potential referents. We therefore designed an experimental study that tested the potential functions of maternal feedback for three-year-olds.
Findings from Study 1 indicated that parents’ feedback served two main functions. First, clarification requests signaled to the child that the parent had not understood. Second, specific clarification requests and recasts provided children with models of the referring expressions they could use to be more informative. To test whether either or both of these aspects of feedback are effective in improving narrative production in the longer term, we ran an experimental study in which an experimenter gave different types of feedback to different groups of children, who were then retested one to three days later on their ability to narrate the same stories a second time to a new, naive experimenter. The feedback group received general clarification requests (signaling that the referring expression had not been understood), the modeling group received models of how to refer to characters in the narrative (but no explicit indication of lack of comprehension), and the control group received no training. The quality of the referring expressions in the second narration was then assessed.
Study 2
Method
Participants
Sixty three-year-olds (M = 3 years and 9 months, SD = 4 months, 25 girls) with no language, speech, or auditory difficulties participated. All children were Costa Rican, middle-class, and spoke Spanish as their native language. Children were individually tested in a separate room in their preschool.
Materials
Children completed the Vocabulario sobre Dibujos Woodcock-Muñoz Subtest (Woodcock et al., 2005). Two wordless animated stories about the activities of two children and an adult (all of the same gender) were created for the study and shown on a portable computer (see Table 4). Animations were used because they have been found to be easier than books for three-year-olds to follow (Smeets & Bus, 2014). Because differences in emotional valence could have explained the difference between stories in the number of informative descriptions children produced in Study 1 (see McDermott Sales, Fivush, & Peterson, 2003), the plots of the Study 1 stories were modified so that the two stories had the same emotional valence.
Table 4 Story plots for Study 2

Design and procedure
Children were randomly assigned to one of three experimental groups: control, feedback, or modeling. Children's vocabulary was included as a covariate.
Three different experimenters visited each child on two occasions. Familiarization and training took place during Session 1. E1 sat next to the child and watched an animated story with him/her, pausing after each event in the narrative to ask the child to describe what was happening. Once E1 and the child had finished the story, E2 (a naive interlocutor) came into the room and training took place according to the child's experimental group. Each child completed two stories during both training and post-test.
Control group
E2 said she would like to hear the story the child had just seen with E1. E2 sat on the other side of the computer and asked the child to tell her the story while she filled out some forms. E2 explained that she could not see the story, so the child needed to tell it as well as he/she could. E1 sat next to the child and paused the animation so that the child could tell the story to E2 event by event. While the child told the story, E2 responded with back-channeling (e.g., “uhum?”, “yes?”, and “really?”) but did not give any feedback in the form of clarification requests or statements.
Feedback group
The experiment was introduced to the child as in the control group. Instead of back-channeling, however, E2 asked the child to clarify ambiguous references to the characters. This feedback was contingent on the informativeness of each initial description. E2's feedback started at the most general level, in order to clarify the agent of a specific action in the story (e.g., child: “[Someone] dropped his/her ice cream.” E2: “Who dropped his/her ice cream?”). It moved to a more specific level when the child's description of the characters allowed it (e.g., child: “The girl dropped her ice cream.” E2: “Which girl dropped her ice cream?”, when the child had initially introduced two girls in the story). Because children rarely provided enough information for the experimenter to ask specific questions, it was decided that all children would receive only general feedback. Thus, this condition tested the effect of asking for clarification without providing any model of what to say. The same procedure was repeated with a second story.
Modeling group
The experiment was introduced to the child in the same way. During training, E2 entered the room and sat in front of the computer, next to the child. In contrast to the two previous conditions, the child and E2 both had visual access to the animations. For every screen except the first, E2 asked the child to describe what was happening (“What is happening here?”). After the child's description of each event, E2 provided a description that uniquely identified the characters (see Table 4), with confirmatory intonation. Thus, this condition did not signal any misunderstanding but simply provided children with adequate means of describing referents. Order of presentation of the two stories was counterbalanced across experimental groups to control for possible effects on children's performance.
Post-test took place during Session 2, one to three days after training. E1 and E3 (a new, naive interlocutor) visited the child. E1 and the child sat next to each other in front of the computer. E3 sat on the other side of the computer and explained that she would like to hear the stories the child had seen with E1 in the previous session, because she had not seen them, but that she needed to stay on the other side of the table and fill out some forms while listening.
Coding
Children's narratives during the post-test were audiotaped and transcribed in the CHAT format (MacWhinney, 2000). Children's mentions of the characters in the events described in Table 4 were the unit of analysis. Children received a score of 1 when the characters of an event were uniquely described (e.g., “The girl in red dropped the ice cream”) and a score of 0 when they were ambiguously described (e.g., “[Someone] dropped their ice cream”) or not mentioned. Scores from both stories were combined to give an overall score of unambiguously described referents. Two examiners independently coded 25% of the conversations. Agreement was good (all Cohen's κ > ·77), and discrepancies were discussed and resolved before the analyses.
Results
Children's vocabulary scores at baseline did not differ significantly across groups (M = 20·62, SD = 2·48, F(2,57) = 0·19, p = ·83). This variable was nonetheless included in the models as a control. As can be seen in Figure 2, children rarely provided informative referring expressions during post-test, but tended to do so more often in the modeling group.

Fig. 2. Percentage of absent or ambiguous vs. informative descriptions as a function of experimental condition in Study 2.
To investigate the effects of experimental group, order of stories, and vocabulary on children's ability to produce informative descriptions, we fitted a mixed-effects logistic regression model to the data, using the same model-building strategy, model-selection criteria, and random-effects structure as in Study 1. Table A2 in the Appendix specifies the model-testing sequence for Study 2. The model that best fitted the data included group and vocabulary as predictors. This model (Model 2 in Table A2) fitted the data significantly better than a model including group only (χ²(1) = 4·94, p = ·02). The odds of uniquely identifying the characters were 4·23 times higher for the modeling group than for the control group, whereas the control and feedback groups did not differ significantly (see Table 3). Moreover, each one-unit increase in vocabulary score multiplied the odds of uniquely identifying the characters by 1·15.
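Because the odds ratio for vocabulary is expressed per unit, it compounds multiplicatively over larger differences in score. As a worked illustration only, taking the rounded odds ratio of 1·15 and, as an example difference, the baseline standard deviation of 2·48 points reported above, a child scoring one SD higher on vocabulary would have roughly 1·41 times the odds of uniquely identifying a character, holding group constant:

```latex
\frac{\operatorname{odds}(v + \Delta)}{\operatorname{odds}(v)}
  = e^{\beta_{\text{vocab}}\,\Delta}
  = 1.15^{\Delta},
\qquad
\Delta = 2.48 \;\Rightarrow\; 1.15^{2.48} = e^{2.48 \ln 1.15} \approx e^{0.347} \approx 1.41
```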
Discussion: Study 2
Children in the control and feedback groups did not differ in the number of uniquely identifying descriptions they produced during post-test. In contrast, children in the modeling group, who heard appropriate descriptions during training (but received no requests to clarify their utterances), produced significantly more informative descriptions of characters than children in the control group.
It is important to note that the same stories were used at training and post-test, so children in the modeling group could have been repeating the descriptions they had heard (albeit after a delay of one to three days) without a deep understanding of the need to switch descriptions. That is, they could have been imitating the experimenter's style of reference without fully knowing why (Bannard, Klinger, & Tomasello, 2013). The high rate of ambiguous descriptions at post-test suggests that doing so was not trivially easy. Even though the modeling group performed best, with 34% of events described with informative reference at post-test, the remaining 66% were either described with ambiguous (often null) references (e.g., “Se cae” ‘[someone] falls’) or not mentioned at all. Thus, children learned from the experimenter's models a means of describing some events more informatively and chose to use it, but this did not lead to ceiling performance. This is perhaps because children had no need to attend to the adult's model during training, as it had no consequence for their completion of the task at the time. Nonetheless, this form of scaffolding was significantly more effective than general clarification requests, which highlighted the problem but not the solution.
It is likely that narrative development would benefit from multiple exposures to the same narrative, with children gradually internalizing reference strategies at each exposure. Such repeated experiences potentially occur quite frequently in real life if, for example, an exciting event occurs and the child witnesses people talking about it on multiple occasions. While this idea has not been tested directly, work on book reading has shown that repeated exposure to the same story is effective for vocabulary learning (see Horst, Parsons, & Bryan, 2011; Sénéchal, 1997; Wilkinson & Houston-Price, 2013).
These results make it plausible that children learn much about reference by imitation (Snow, 1981), only building up deeper insight into why descriptions are needed with age. The idea here is that children grasp a global need to be informative in storytelling but not necessarily why disambiguation is needed in any one instance. Instead, they learn strategies that they have observed or learnt to be effective given the global goal (Matthews et al., 2012). Whether this learning would generalize to novel narratives is an important outstanding question. Answering it will help to clarify whether children learn anything more from imitation than the terms needed to tell that specific narrative effectively.
In relation to this question, there was evidence that children's vocabulary contributed to their performance during post-test. This replicates previous findings (Nilsen & Mangal, 2012) and fits with the conclusion that a major barrier to children's production of narratives is the ease with which they can retrieve relevant lexical (and syntactic) devices; that is, their problems are not limited to a lack of perspective taking or of desire to communicate effectively (Norbury & Bishop, 2003). Since the current stories required only one descriptor to distinguish informatively between characters, and the necessary terms should have been within the grasp of these children, an outstanding question is why retrieving relevant vocabulary and syntactic constructions is such a challenge, and why modeling is such an effective form of support. While further studies will be needed to resolve this, one practical implication in the short term is that this study supports educational and clinical practices that include modeling as a strategy for scaffolding language development.
General Discussion
Parents provide scaffolding for children's production of narratives by signaling comprehension problems and modeling possible referring expressions. The most common form of feedback, the specific clarification request, combines these strategies. The models it provides are the active ingredients that help children learn. Children may have a global sense of the need to use these models without understanding their specific function at first (Matthews et al., 2012). Future research should explore the potentially powerful role of imitation when children have a clear goal and adopt adults’ means of achieving it, at first without fully understanding these means (Klinger, Mayor, & Bannard, 2016; Want & Harris, 2002; Whiten, McGuigan, Marshall-Pescini, & Hopper, 2009).
Appendix
Table A1 Model-testing sequence for Study 1

Table A2 Model-testing sequence for Study 2
