The effect of perceptual availability and prior discourse on young children's use of referring expressions

DANIELLE MATTHEWS; ELENA LIEVEN; ANNA THEAKSTON; MICHAEL TOMASELLO

doi:10.1017/S0142716406060334

Published online by Cambridge University Press: 14 July 2006

DANIELLE MATTHEWS, University of Manchester
ELENA LIEVEN, University of Manchester and Max Planck Institute for Evolutionary Anthropology
ANNA THEAKSTON, University of Manchester
MICHAEL TOMASELLO, Max Planck Institute for Evolutionary Anthropology

Abstract

Choosing appropriate referring expressions requires assessing whether a referent is “available” to the addressee either perceptually or through discourse. In Study 1, we found that 3- and 4-year-olds, but not 2-year-olds, chose different referring expressions (noun vs. pronoun) depending on whether their addressee could see the intended referent or not. In Study 2, in more neutral discourse contexts than previous studies, we found that 3- and 4-year-olds clearly differed in their use of referring expressions according to whether their addressee had already mentioned a referent. Moreover, 2-year-olds responded with more naming constructions when the referent had not been mentioned previously. This suggests that, despite early social–cognitive developments, (a) it takes time to master the given/new contrast linguistically, and (b) children understand the contrast earlier based on discourse, rather than perceptual context.

Type: Articles
Information: Applied Psycholinguistics, Volume 27, Issue 3, July 2006, pp. 403–422

DOI: https://doi.org/10.1017/S0142716406060334
Copyright: 2006 Cambridge University Press

One of the first uses to which young children put language is reference (Bruner, 1983). However, because any one entity may be referred to in a variety of ways (depending on the perspective the speaker or addressee has or may want to confer upon it), children are faced with the question of referential choice. We focus here on the need to choose referring expressions that a cooperative listener could reasonably understand in a given context. In linguistic terms, this means using a form that conveys the appropriate level of accessibility or “givenness” for the addressee (Ariel, 1988, 1990; Givón, 1983; Gundel, Hedberg, & Zacharski, 1993). Gundel et al. (1993) have proposed six cognitive statuses relevant to the form of referring expressions in natural language discourse. They relate these forms to a givenness hierarchy (adapted and reproduced as Figure 1) and argue that when a speaker uses a given form s/he assumes that the associated cognitive status is met for the addressee, as are all the other cognitive statuses lower down the hierarchy. When the necessary conditions for the use of more than one form are met, Grice's Maxim of Quantity accounts for the actual distribution and interpretation of forms (i.e., use of expressions that are just informative enough).

Figure 1. Gundel's givenness hierarchy and associated forms in English.

Making appropriate referential choices obviously requires social–cognitive skills. The child must assess whether something is accessible for the listener based mainly on (a) the perceptual context or (b) the immediately preceding discourse context. Studies of both social–cognitive and linguistic development suggest early sensitivity to these demands. For example, there is evidence to suggest that infants are already aware that others may or may not visually perceive objects. Brooks and Meltzoff (2002) found that 12-month-olds turned significantly more often to an object when an adult had turned toward it with eyes open than with eyes closed (see also Meltzoff, 2004). Similarly, Moll and Tomasello (2004) found that when an adult excitedly attended to an object that the child could not see (due to a barrier), infants as young as 12 months old would actively get up and move to look behind the barrier.

By the age of 2, children have been shown to go beyond this understanding that others may see something different from what they themselves see, to understanding what that something is. In a recent study (Moll & Tomasello, in press) children sat in view of two objects, one of which had an occluder behind it. Children of 24 months (but not 18 months) were good at noticing that people on the opposite side of the occluder from them would not be able to see this second object. Thus, when an adult entered the room and asked “Where is the other toy? … Can you give it to me?” the 2-year-olds fetched and handed over the toy that was occluded for the adult (but not for the child). They apparently assumed that the adult must have been referring to the toy behind the occluder and not the one out in the open that she could see and presumably have fetched for herself. These children thus demonstrate both impressive pragmatic inferencing and what are commonly referred to as Level 1 perspective taking skills: understanding that you may see an object that another person does not and vice versa (Flavell, 1974, 1992). We can thus assume that by the time children begin to systematically use pronouns to refer to things they have the skills necessary to assess whether referents are perceptually available to their interlocutor or not. Whether understanding that something is perceptually available to someone directly translates into understanding that this same thing is known to someone is more debatable (O'Neill, 1996). However, there is evidence to suggest that children understand that seeing leads to knowing at least by the age of 3 (Pratt & Bryant, 1990).

On the linguistic side of development, studies of preferred argument structure also suggest very early sensitivity to the shared perceptual and discourse context (Allen & Schröder, 2003; Clancy, 2003; Maslen, 2005; Serratrice, 2004). For example, Clancy (2003) found that two Korean children aged 1 year, 8 months (1;8) through to 2;10 clearly followed a preferred argument structure pattern of reference and systematically introduced new referents with full nouns. Similarly, Allen and Schröder's (2003) longitudinal study of four Inuktitut-speaking children (aged 2;3–3;6) found that full lexical referring expressions were more likely to encode new referents than were less informative affixal forms. However, Clancy points out that this sensitivity to referential form interacts with lexical choice and grammatical role. Thus, the children she studied commonly introduced new referents with lexical-referring expressions and a small set of intransitive verbs. Equally, a small set of transitive verbs were generally used to talk about actions that the child or caregiver performed on an inanimate object, giving rise to elliptical or pronominal forms for the animate, given subject and lexical forms for the new, acted-upon object. Consequently, it is difficult to establish if and when the children became aware of the informational status of different referring expressions as separate from the restricted lexical and grammatical contexts in which they were used (cf. Karmiloff-Smith, 1981). Therefore, although the naturalistic observation of preferred argument structure in 2-year-old children suggests sensitivity to referential choice, experimental studies are needed to define precisely when and how this is mastered.

In an experimental study of gesture and reference, O'Neill (1996) found that 2-year-olds gesture differently according to what is given or new for others.¹

¹ Although given/new distinctions are evident in infants' earliest communicative attempts, these tend to be based on what is given (certain/manipulated by) or new for the child, not their addressee (Bates, 1976; Greenfield, 1979; Greenfield & Smith, 1976; Ochs & Schieffelin, 1979).

Children aged 2;7 tailored their request to a parent for a toy placed on a high shelf according to whether both the parent and the child saw the toy placed on the shelf or only the child saw the toy moved (while the parent was out of the room). In cases when the parent had not seen where the toy had been put, the child named and gestured to the toy and its location more often than when the parent had seen and thus already knew this information. Younger children (2;3) gestured more often to a desired sticker when their parents had not seen its location than when they had. Campbell, Brooks, and Tomasello (2000) used a similar methodology to investigate the factors that might affect children's linguistic reference (comparing null reference, pronouns, and full nouns). Asking children aged 2;6 and 3;6 to describe an event they had just witnessed revealed that children did not choose referential forms on the basis of whether their addressee had also witnessed the event or not. However, this might have been due to the fact that at the moment when the adult asked her question, the relevant referent was perceptually available (the child thus needed to remember that the adult had not seen the referent when the event was performed and to inhibit deictic reference). In the current first study, children were required to refer to a character that the addressee could not see at the time of asking. We also manipulated whether the child could see it at the time of the question or not.

In contrast to perceptual availability, previous experiments have found a strong effect of discourse context on children's use of referring expressions. Campbell et al. (2000) found that the youngest children they tested (2;6) made more full noun references in response to the generic question “What happened?” than to the specific question “What did X do?” to which children tended to respond with more pronouns and null references. Wittek and Tomasello (2006) found similar effects for German children aged 2;6 but not 2;0. These results suggest a strong influence of prior discourse on subsequent reference. One problem with this interpretation, though, is that questions are a particularly strong form of discourse context. It might be that children simply rely on highly routinized knowledge that to the question “What did X do?” we answer “VERB” or “PRONOUN VERB.” Alternatively, children might be more generally sensitive to the fact that prior mention of a single referent in recent discourse makes subsequent mention with a pronoun most felicitous. To see if this is the case, in the current second study an adult (who could not see the target referent) either had or had not just mentioned the referent before she asked the same “What happened?” question.

The main aim of the current studies, therefore, was to investigate children's use of referring expressions as a function of perceptual availability of the referent for the addressee and prior mention in discourse. We did this by presenting children with videos of various characters (e.g., a clown) acting out intransitive events (e.g., jumping) and asking them what they could see happening. In the first study we focused on perceptual availability, and in the second study we focused on discourse availability. We were also interested in whether one of these factors was somehow easier for the children to respond to than the other. To help us assess this, the very same children that participated in Study 1 also participated in Study 2.

STUDY 1: THE EFFECT OF PERCEPTUAL AVAILABILITY ON REFERENTIAL CHOICE

In Study 1 we manipulated the perceptual availability of referents for both the child and the addressee by asking children questions about events on a video that (a) the addressee could either see or could not see and (b) the child could either still see happening or could no longer see. We were thus interested in whether and at what age children would appreciate when their addressee could not see the video screen and use full nouns in these conditions in contrast to conditions where their perceptual scene was shared and thus permitted deictic reference. We also attempted to manipulate perceptual availability for the child to see if children might be more likely to appreciate inaccessibility for an addressee when they too did not have visual access to the referent.

Method

Participants

One hundred one normally developing, monolingual, English-speaking children were included in the study (48 boys, 52 girls). There were 31 2-year-olds (range = 2;1–2;10, mean age = 2;6), 33 3-year-olds (range = 3;0–3;10, mean age = 3;5), and 37 4-year-olds (range = 4;0–4;11, mean age = 4;6). A further 15 2-year-olds were not included either because they (a) did not complete the testing session or (b) did not say anything in the first condition presented. The children were tested in the Max Planck Child Study Centre, Manchester, or in a quiet area in their nursery or primary school in the Manchester area.

Materials and design

Videos were made of four characters (a witch, a fairy, a clown, and Father Christmas) individually acting out four verbs (eating, jumping, crying, falling over). Each video was made up of a total of 15 clips of different combinations of actors and actions. These 15 clips were divided into three blocks of 5 clips each, and three versions of the video were made to counterbalance the order of presentation of these blocks.

In Study 1, only the first two blocks of any video served as stimuli (the third block was used in Study 2). These were employed to test the child's use of referring expressions according to whether or not Experimenter 1 (E1, the addressee) and the child (the speaker) could see the video when it was being described. For one of the blocks E1 sat with the child and asked him/her what was happening in the video (i.e., they could both see the video). For the other block, E1 sat on the other side of the television so that she could not see the video and the child, who could see the video, could not see her face. There were thus two within-subjects conditions based on whether the experimenter–addressee could see the video: addressee can see and addressee can't see. Half of the children participated in the addressee can see condition first and the other half participated in the addressee can't see condition first. In addition, children were randomly assigned to one of two between-subjects conditions based on whether the child could see the video playing when asked to describe it. Half the children were asked to describe the video while it was playing (the video playing condition). The other half were asked to wait until each video clip had finished and then to describe what had happened (the video stopped condition). Thus, for half the children the scene was perceptually available as they described it, whereas for the other half no perceptual cues were available at the time of description.

To summarize, children were asked to describe 10 video clips to test their sensitivity to the visual availability of the video for the addressee (within subjects) and themselves, the speaker (between subjects).
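As a rough illustration of the design summarized above, the trial structure for one child could be laid out as follows. This is a minimal sketch: the function name, field names, and condition labels are hypothetical, and it leaves out the three video versions because the source does not say which block orders they used.

```python
from itertools import product

# Condition labels follow the text; the data layout is an illustrative assumption.
ADDRESSEE_CONDITIONS = ("addressee can see", "addressee can't see")  # within subjects
SPEAKER_CONDITIONS = ("video playing", "video stopped")              # between subjects
CLIPS_PER_BLOCK = 5


def build_schedule(first_addressee_condition, speaker_condition):
    """Return the 10 Study 1 trials for one child as a list of dicts."""
    if first_addressee_condition not in ADDRESSEE_CONDITIONS:
        raise ValueError("unknown addressee condition")
    if speaker_condition not in SPEAKER_CONDITIONS:
        raise ValueError("unknown speaker condition")
    # The second block uses whichever addressee condition was not presented first.
    second = next(c for c in ADDRESSEE_CONDITIONS if c != first_addressee_condition)
    schedule = []
    for block, addressee in enumerate((first_addressee_condition, second), start=1):
        for clip in range(1, CLIPS_PER_BLOCK + 1):
            schedule.append({
                "block": block,
                "clip": clip,
                "addressee": addressee,
                "speaker": speaker_condition,
            })
    return schedule


# The four cells a child could fall into: (first addressee condition, speaker condition).
DESIGN_CELLS = list(product(ADDRESSEE_CONDITIONS, SPEAKER_CONDITIONS))
```

Each child thus contributes five clips per addressee condition, while the speaker condition stays fixed across all ten clips.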

Procedure

Before the main testing session began, the child and E1 sat together in front of the television screen and E1 explained that they were going to watch a video with lots of funny people doing all sorts of things. E1 explained that sometimes the people might do something fun like jumping, and that she really liked jumping so the child should tell her if anybody did that. Equally, she explained that some people might do something sad like crying, and the child should tell her if somebody did anything sad too. After giving examples of what might happen and encouraging the child to report on this, E1 showed the child a short introduction video with all the characters on it. Each character appeared in turn on the screen waving. E1 introduced each character by saying, for example, “Look, there's the clown! He's waving! The clown's waving at you!” Each introduction referred to the character by name and then referred to his waving once with a pronoun and then once with a noun. E1 then replayed the introduction video and asked the child if s/he could name the characters. E1 helped the child to name the characters if necessary and/or agreed on different names (e.g., the man).

After the introduction session E1 explained that they would watch the video and the child should tell her everything that s/he could see. If the first test condition required E1 to sit opposite the child, then E1 explained that she was going to go and do some work behind the TV and that she would like to know what happens in the video but that she could not see it. E1 spent some time making it clear that she could not see the video screen in this case, and she sat such that the child could not see her face from the other side of the TV.

Experimenter 2 (E2) sat in a corner away from the television table. Once E1 and the child were ready, E2 remotely started the video and E1 asked the child what was happening and what s/he could see (in the video stopped condition E1 waited till the video clip had played before asking). If the child didn't respond E2 rewound the video and replayed the clip. If the second presentation still did not elicit a response the next clip was played. Once the first five clips had been presented E1 explained the change in conditions. Depending on whether she was already sat with the child, she either said that she had to go and sit behind the TV to do some work now or she said that she had finished her work and would like to come and sit with the child now as the video looked fun. Again, in the case of going to sit behind the TV, it was made clear that she would not now be able to see the video. The second condition then proceeded as did the first. Throughout the experiment, both “What happened?” and “What did you see?” questions were always used as previous studies had found that asking “What happened?” alone did not elicit very many responses.

Transcription and coding in Studies 1 and 2

The children's utterances were coded for the informational status of any referring expressions using four categories: full noun, pronoun, null, and no response. Referring expressions were coded as full noun (e.g., “The clown”) or pronoun (e.g., “he”). Cases where a verb was used with no subject were coded as null (e.g., “jumping”) to contrast with cases where no response was given at all (no response). Rare uses of the referring expressions somebody and someone were coded as full nouns (because somebody can felicitously be used to refer to an inaccessible referent).²

² In Study 1, 11 children (one 2-year-old, nine 3-year-olds, and one 4-year-old) gave a total of 27 responses where the only referring expression was somebody or someone (this constitutes less than 5% of the responses coded as informative). The terms “the jumping one” and “the bouncing one” were used once each. These terms were considered uninformative, but because both were followed immediately with the referring expression “Father Christmas,” the responses were coded as informative, noun-only constructions. In Study 2, one 2-year-old and one 3-year-old, both in the no-noun condition, responded with the term “someone.” Removing all of these responses had no significant effect on the results in either study.

In cases where more than one referring expression was used in a response, the most informative expression was coded (e.g., the response “She's eating. Fairy having apple.” would be coded as full noun). This coding decision was made because we were interested in whether the child communicated information about the referent to the adult in an informative manner. The presence of any full noun anywhere in the response was deemed to make this response informative as far as the referent is concerned.
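The "most informative expression" rule can be sketched in a few lines of Python. This is an illustration and not the authors' coding procedure: the function name is hypothetical, and the relative ranking of null below pronoun is an assumption, since the text only specifies that a full noun anywhere in the response wins.

```python
# Ranking from least to most informative. The numeric values, and the placement
# of "null" below "pronoun", are illustrative assumptions.
INFORMATIVENESS = {"no response": 0, "null": 1, "pronoun": 2, "full noun": 3}


def code_response(expression_codes):
    """Return one informativeness code for a response.

    expression_codes lists the code of each referring expression (or subjectless
    verb) found in the response; an empty list means the child said nothing.
    """
    if not expression_codes:
        return "no response"
    # Pick the code with the highest rank in INFORMATIVENESS.
    return max(expression_codes, key=INFORMATIVENESS.__getitem__)
```

For the example in the text, “She's eating. Fairy having apple.” contains a pronoun and a full noun, so `code_response(["pronoun", "full noun"])` yields `"full noun"`.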

In addition to coding for informativeness of referring expression, the data were coded for construction type. This provided a measure of (a) how complex the children's descriptions were and (b) whether they used a verb to describe the event or not. Utterances were divided into six categories, with responses that fitted more than one category being assigned to the first category occurring in the following list: noun–verb (“The clown is jumping”), anaphor–verb (“The clown. He is jumping” or “He's jumping, the clown.”), pronoun–verb (“He's jumping”), verb alone (“Jumping”), noun alone (“The clown”), or no response. The order of this list was based on the complexity of the construction, the use of a verb, and the informativeness of the noun phrase. The categories noun–verb and anaphor–verb are the most complex and informative. Equally complex but less informative are responses of the form pronoun–verb. The following category, “verb alone,” indicates that the child did mention the event with a verb but not in a complex construction (no mention of the referent). The category “noun alone” indicates that the child did not mention the event at all, but did mention the referent. The final category indicates that the child did not respond at all. Note that in this coding scheme if a child responded with both a verb-alone construction and a noun-alone construction (e.g., “Crying. A clown.”), this would be coded as verb alone.³
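The first-match rule for construction type lends itself to a short sketch. The category order is taken from the list above; the function name and the use of plain hyphens in the labels are illustrative choices, and deciding which categories an utterance fits is assumed to have been done by a human coder beforehand.

```python
# Categories in the priority order given in the text (first match wins).
CONSTRUCTION_PRIORITY = (
    "noun-verb",
    "anaphor-verb",
    "pronoun-verb",
    "verb alone",
    "noun alone",
    "no response",
)


def code_construction(candidate_categories):
    """Return the highest-priority category among those a response fits."""
    candidates = set(candidate_categories)
    unknown = candidates - set(CONSTRUCTION_PRIORITY)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    for category in CONSTRUCTION_PRIORITY:
        if category in candidates:
            return category
    # Nothing fitted, so the child did not respond.
    return "no response"
```

A response such as “Crying. A clown.” fits both verb alone and noun alone, so `code_construction(["noun alone", "verb alone"])` returns `"verb alone"`, matching the note at the end of the paragraph.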

³ In practice, responses coded as “verb alone” contained both a noun-alone construction and a verb-alone construction in 17% of cases in Study 1 and in 37% of cases in Study 2. For each study, a 3 (Age) × 2 (Condition) ANOVA was run with this response type (responses with both a verb alone and a noun alone) as the dependent variable to check that responses that would have fitted the two categories were not more likely in one experimental condition than another. There were no significant effects or interactions in either study (all ps ≥ .47).

In both Studies 1 and 2 both experimenters transcribed the child's responses as they occurred. Agreement was high: 98% of transcriptions yielded the same coding categories (Cohen's κ = 0.956). However, in some cases the children's utterances were hard to hear (or were genuinely unclear in the case of schwas used before a verb), which led to some discrepancies or blank transcriptions. For all of these cases, two independent coders listened to an audio recording of the test session. If both coders agreed with one of the experimenters' on-line transcriptions then the utterance was coded accordingly. Otherwise, the utterance was rejected. The first of these coders also coded all of the transcripts for informativeness and construction type. The second coder then coded nine transcripts (three for each age group). Intercoder agreement was very high at 99% (Cohen's κ = 0.987). The only discrepancy found was resolved.
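For readers unfamiliar with the agreement statistic reported above, Cohen's κ corrects raw percentage agreement for the agreement expected by chance. The sketch below uses the standard formula κ = (p_o − p_e) / (1 − p_e); the function name is hypothetical, and it is not a reconstruction of the authors' own computation or data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over labels.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Because κ discounts chance agreement, it is normally somewhat lower than the raw percentage, as with the 98% agreement and κ = 0.956 reported for the on-line transcriptions.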

Results

An overview of the distribution of response types can be seen from Table 1. Because some children were more talkative than others, the mean proportion of referring expressions that were full nouns (denominator = full nouns + pronouns) was calculated as a measure of how informative children's referring expressions were and used as the dependent measure in statistical tests.⁴

⁴ We report the results of statistical tests run on raw proportions here. Arcsine transformations were performed on all the proportions and equivalent analyses run on these data with no significant difference in outcome.
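The dependent measure and the transformation mentioned above can be made concrete with a short sketch. The function names are hypothetical. Two points are assumptions: the arcsine transformation is taken to be the common arcsine-square-root form, which the source does not spell out, and a child with no full nouns or pronouns is returned as `None`, since the source does not say how such cases were handled.

```python
import math


def proportion_full_nouns(codes):
    """Proportion of a child's referring expressions that were full nouns.

    codes is a list of per-response codes ("full noun", "pronoun", "null",
    "no response"). The denominator is full nouns + pronouns, as in the text.
    """
    full_nouns = codes.count("full noun")
    pronouns = codes.count("pronoun")
    denominator = full_nouns + pronouns
    if denominator == 0:
        return None  # undefined: the child produced no full nouns or pronouns
    return full_nouns / denominator


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion (assumed form)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))
```

Null and no-response codes drop out of the denominator, which is what makes the measure insensitive to how talkative a child was.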

One problem that we encountered was that the children often responded when the video was playing in the video-stopped speaker condition. Consequently, our manipulation of whether or not the child could see the video when describing it was not successful, and we cannot assess the impact of speaker condition on children's responses. To check that it was legitimate to collapse across speaker conditions when assessing the effect of addressee condition, a 3 (Age) × 2 (Addressee Condition) × 2 (Speaker Condition) analysis of variance (ANOVA) was run with the proportion of referring expressions that were full nouns as the dependent variable. This test included all the children's responses (no matter when they were uttered), and revealed no effect of speaker condition nor any significant interactions with this factor. This shows that the differing instructions given in speaker conditions did not significantly affect responses, and thus collapsing across these conditions to investigate the effect of the addressee conditions alone is justified.⁵

⁵ Further tests compared the use of referring expressions in the video-stopped condition according to whether the children, in fact, gave their responses while the video was playing or once it had stopped. This revealed no significant effects. Comparing only those responses uttered while the video was stopped in the video stopped condition with those in the video playing condition also revealed no significant effects.

The effect of visual availability for the addressee on referential choice

To test whether or not the children were more likely to use more informative referring expressions when their addressee could not see the video, a 3 (Age) × 2 (Addressee Condition) × 2 (Order of Presentation of Conditions) ANOVA was conducted with the mean proportion of full nouns used as the dependent variable. There was a significant age by addressee condition interaction, F (2, 89) = 4.847, p = .01, η² = 0.098, and a borderline effect of addressee condition, F (1, 89) = 2.833, p = .093, η² = 0.031, but no other significant effects or interactions. These results are illustrated in Figure 2. Pairwise comparisons revealed that 3- and 4-year-old children were significantly more likely to use an informative referring expression when their addressee could not see the screen than when the addressee could see (p = .05, p = .011), whereas the 2-year-olds were slightly more likely to use informative referring expressions when the addressee could see, although this difference was not significant.

Figure 2. The mean proportion of referring expressions that were full nouns as a function of age and addressee condition. Error bars represent standard errors.

It may appear surprising that the 4-year-olds were no more likely than the 2-year-olds to use full nouns exclusively when the addressee could not see the video. However, a broader analysis of all the response types given by the children at different ages shows the children's responses did become more complex with age. Figure 3 illustrates that the older children were more likely to express the referent in an intransitive construction that informs the addressee about both the referent and the event in which it is participating. They were especially more likely to use a pronoun–verb construction than the younger children. In contrast, the younger children were more likely simply to name the referent in the video or to use a verb in isolation or not to respond at all. The increase with age in the proportion of pronoun–verb constructions is reflected in the above analysis of referring expressions in a higher overall proportion of pronouns used by older children.

Figure 3. The different response types as a percentage of all responses given at 2, 3, and 4 years in the visual presence conditions.

To analyze the effect of addressee conditions on all the different response constructions, a 3 (Age) × 2 (Addressee Condition) ANOVA was performed for each of the response types (except anaphor–verb responses, which were too infrequent) with mean proportion responses as the dependent variables. The ANOVA on noun–verb responses revealed a significant Age × Addressee Condition interaction, F (2, 98) = 5.211, p = .007, η² = 0.096, and significant main effects of condition, F (1, 98) = 8.067, p = .005, η² = 0.076, and of age, F (2, 98) = 7.079, p = .001, η² = 0.126. Pairwise comparisons revealed that 3- and 4-year-olds were significantly more likely to use noun–verb constructions when their addressee could not see the video than when she could (p = .001, p = .012), whereas this difference was not significant at 2 years. The ANOVA on pronoun–verb responses revealed a significant Age × Addressee Condition interaction, F (2, 98) = 3.886, p = .024, η² = 0.073, and significant main effects of age, F (2, 98) = 3.517, p = .033, η² = 0.067, and condition, F (1, 98) = 4.308, p = .041, η² = 0.042, such that pronoun–verb responses were provided significantly more often in the can-see condition. Pairwise comparisons showed that this difference was significant at 3 and 4 years only (p = .048, p = .006). The ANOVA on verb-alone responses revealed a borderline effect of condition, F (1, 98) = 2.951, p = .089, η² = 0.029, and a significant main effect of age, F (2, 98) = 6.195, p = .003, η² = 0.112. There were no significant interactions. The 3-year-olds were significantly more likely to use verbs without a subject in the can-see condition than in the can't-see condition (p = .029), whereas this difference was not significant at 2 or 4 years. The ANOVA on noun-alone responses revealed an effect of age only, F (2, 95) = 12.005, p < .001, η² = 0.197. This demonstrates that the older children gave significantly fewer responses of this type.

To summarize, the 2-year-olds' responses did not differ significantly according to addressee conditions. The 3-year-olds were more likely to use informative, noun–verb responses when their addressee could not see the video, whereas when their addressee could see they tended to give more pronoun–verb or verb-alone responses. The 4-year-olds gave more informative, noun–verb responses when their addressee could not see and more pronoun–verb responses when she could.

Discussion of Study 1

The main conclusion to be drawn from Study 1 is that whether or not an addressee can see what is being referred to does not appear to affect 2-year-olds' choice of referring expression. However, perceptual availability for the addressee does have some effect on children's referring expressions from the age of 3. More precisely, the 3- and 4-year-olds were more likely to use noun–verb responses when their addressee could not see to what they were referring. In contrast, when their addressee could see, the 4-year-olds gave more pronoun–verb responses (as is appropriate), whereas 3-year-olds tended to give either pronoun–verb or verb-alone responses in this condition. It would appear that, from the age of 3, children begin to provide an appropriate form when a referent is inaccessible for the addressee (i.e., to differentially use full nouns). As they get older, children also begin to use the appropriate form when the referent is given (i.e., to use pronouns in place of null reference).

Unfortunately, the manipulation of perceptual availability for the child (video stopped/playing conditions) was not successful. Future experiments could improve on this by using videos in which an animated action appeared for only a couple of seconds. Then, to manipulate perceptual availability for the child, either the animation could be frozen and remain on screen (available) or disappear and leave a blank screen (unavailable). This would make it impossible to respond at an inappropriate time for the given condition.

Having established that children aged 3 and above begin to use referring expressions differently according to perceptual availability for the addressee, the question we now turn to is at what age the same children would be affected by prior mention in discourse.

STUDY 2: THE EFFECT OF PRIOR DISCOURSE ON REFERENTIAL CHOICE

In Study 2 we tested this by varying whether or not the person asking a general “What happened?” question had previously mentioned the referent with a full noun. Thus, children sat with a first adult who commented on the character in the video; then a second adult, who could not see the video, asked what was happening. In one condition this second adult had overheard the name of the character involved and remarked, for example, “Was that the clown? Oh! What happened?” whereas in the other condition she had not overheard and asked “That sounds like fun! What happened?” If children are genuinely sensitive to previous mention in discourse, rather than just reacting to different question types, then we would expect them to respond to the first question with a pronoun (plus verb) and to the second with a full noun (plus verb).