Introduction
One of the core aspects of human communication revolves around the choice of linguistic expressions for referent identification, i.e., the use of proper names (e.g., Laura), noun phrases (NPs; e.g., the girl, my sister, my sister's car) and pronouns (e.g., she, them, someone) to talk about entities in the world. Adults, and, to some extent, preschool and school-age children are sensitive to a number of structural, semantic and discourse-pragmatic constraints when it comes to producing referential expressions in a communicative context (see Serratrice & Allen, 2015, for an overview of the acquisition of reference).
Despite a general sensitivity to the aforementioned constraints, there are individual differences in the extent to which both adults and children rely on perspective-taking skills to process and produce referential expressions. Taking the perspective of a conversational partner requires the inhibition of one's own perspective and a shift to that of the addressee. Recent work on adult speakers (Ryskin, Benjamin, Tullis & Brown-Schmidt, 2015; Wardlow, 2013), and some emerging work in child and adolescent speakers (Nilsen & Graham, 2009; Nilsen, Varghese, Xu & Fecica, 2015; Torregrossa, 2017; Wardlow & Heyman, 2016), has identified executive function skills, particularly working memory (WM), and cognitive control, i.e., the ability to resolve a conflict by inhibiting an irrelevant response and promoting relevant information, as significant predictors of individual variation in referential communication success. The use of a referential expression implies a choice, for example a pronoun vs. an NP. This choice arises from the selection between different options and, at least in some cases, it is the outcome of the resolution of a conflict between competing alternatives. For example, if the speaker and the addressee have different levels of access to a target referent, their mental representations will not entirely overlap. The onus is on the speaker to inhibit a potentially egocentric perspective and promote an addressee-friendly perspective that will maximise the chances of convergence between the mental representations of both speaker and addressee. This can translate into choosing a more informative NP (e.g., the tall girl), as opposed to a more reduced and less informative expression (e.g., she).
Because conflict monitoring and resolution depend on the inhibition of irrelevant information, the promotion of relevant information, or both, we will adopt the term cognitive control to include both the inhibition and the promotion aspects of the process (Teubner-Rhodes, Mishler, Corbett, Andreu, Sanz-Torrent, Trueswell & Novick, 2016).
WM refers to the ability to store and manipulate information, and it has been connected to perspective-taking and referential choice in at least two ways. Firstly, it underpins the storage and updating of the interlocutor's perspective and the comparison of that perspective with one's own to check for convergence (Nilsen & Bacso, 2017; Wardlow, 2013). Secondly, it may be implicated in the use of feedback when one of the interlocutors explicitly signals a mismatch between their perspective and that of their conversational partner. Higher verbal WM capacity has been shown to correlate positively with 5- and 6-year-olds' ability to use an adult's non-verbal feedback to produce a discourse-appropriate referential expression (Wardlow & Heyman, 2016).
A parallel line of research has singled out bilingual speakers – both older adults and children – as having an advantage in the same executive function skills of cognitive control that are associated with referential choice (Bialystok & Martin, 2004; Morales, Calvo & Bialystok, 2013). Whether bilinguals genuinely have superior WM skills compared to monolinguals is, however, not yet clear. Some studies report no difference between bilingual and monolingual children (Barbosa, Jiang & Nicoladis, 2017; Bialystok, Luk & Kwan, 2005; Engel de Abreu, 2011), whereas others report an advantage for bilingual children (Morales et al., 2013).
In the present study we combine these two independent lines of inquiry to investigate how degrees of exposure to and use of English and another home language, language proficiency in English, and executive function skills (cognitive control and verbal WM) predict the choice of linguistic expressions in a referential communication task in monolingual and bilingual children between the ages of 5 and 7. In the task we manipulated a linguistic factor (the discourse mention of a competitor to the target referent) and a non-linguistic factor (the visual presence of a competitor to the target referent) to provide new evidence on the sources of contextual information used by children in reference production. Previous work has focused on children's use of deictic expressions in referential communication tasks (e.g., Nilsen & Graham, 2009), while we were specifically interested in children's use of anaphoric expressions to refer to a previously mentioned antecedent.
Research including bilingual children has sometimes neglected to take into account the SES profile of participants. This is an important limitation, as SES is known to be predictive of both language and of cognitive skills. In the present study we therefore included a measure of SES in our analyses.
Constraints on referential choice
Adult speakers are sensitive to a number of structural and discourse-pragmatic constraints in their referential choices. They tend to use more pronouns for referents that are in subject position (Arnold, 2001) and/or in sentence-initial position (Järvikivi, van Gompel, Hyönä & Bertram, 2005), or for referents that are topics (Anderson, Garrod & Sanford, 1983). Conversely, competent speakers tend to use more informative referential expressions (e.g., proper names and indefinite NPs) when the referent is new to the discourse (Gordon, Hendrick, Ledoux & Yang, 1999), or when the use of a pronoun might lead to potential ambiguity (Arnold, 2008). Adult speakers generally can take the perspective of their listener into account, and they choose their referential expressions accordingly. Perspective-taking is predicated upon the ability to distinguish between what is in the common ground (Clark, 1992), and therefore shared knowledge between speaker and listener, and what is in the privileged ground, i.e., knowledge that is only accessible to the speaker. The common ground can be established perceptually, i.e., when it includes referents that are visually accessible to both interlocutors, and/or linguistically via the use of discourse-appropriate referential expressions.
Competent adult speakers typically engage in modelling their addressee's perspective to produce a referential expression that is optimal for their conversational partner (Hendriks, Englert, Wubs & Hoeks, 2008). In essence the assumption is that competent speakers maintain their own mental representation of their addressee's mental representation. However, the extent to which these meta-representations always require an effortful and intentional commitment on the part of the speaker, and whether they necessarily rely on explicit Theory of Mind skills, is debated in the literature (Horton & Brennan, 2016).
Even before they have a fully developed Theory of Mind, three-year-olds are already at least partly sensitive to the same constraints that regulate referential choice in adult speakers (see Allen, Hughes & Skarabela, 2015, for a review). Pre-school children are more likely to omit arguments, or use reduced expressions, when they are part of the common ground either through joint attention (Skarabela, 2007), previous linguistic mention (Allen & Schröder, 2003; Clancy, 2003; Guerriero, Oshima-Takane & Kuriyama, 2006; Stephens, 2015), or prior mention and/or perceptual availability (Campbell, Brooks & Tomasello, 2000; De Cat, 2011; Matthews, Lieven, Theakston & Tomasello, 2006; Rozendaal & Baker, 2010; Salazar Orvig, Marcos, Morgenstern, Hassan, Leber-Marin & Parès, 2010a, 2010b).
At the same time, children are notoriously less capable than adults when it comes to taking their listener's perspective into account and to adjusting their referential choices accordingly. This has been observed in production studies in pre-schoolers (De Cat, 2011, 2015), in five-year-olds (Theakston, 2012), and in six-year-olds (Serratrice, 2008) when children need to provide a referential expression, and up to adolescence in comprehension where participants need to make a choice between potential referents (Dumontheil, Küster, Apperly & Blakemore, 2010).
Individual variation in perspective-taking skills: cognitive control and verbal WM
It is becoming increasingly apparent that there are individual differences in the degree of perspective-taking abilities, and that this variation may correlate with the ability to interpret referential expressions in discourse-pragmatically appropriate ways (Brown-Schmidt, 2009; Lin, Keysar & Epley, 2010; Ryskin et al., 2015). Studies on adults have focused on the relationship between perspective-taking abilities (indexed by referential choice) and cognitive control and WM (two core components of executive function). There is some additional evidence that cognitive control also plays a role in perspective-taking and referential interpretation in pre-school children. In two referential communication studies with three- and five-year-olds, Nilsen and Graham (2009) reported that performance on a cognitive control task significantly predicted comprehension accuracy for both the younger and the older children. However, neither WM nor cognitive control was predictive of accuracy in a production task in which the five-year-olds had to provide a disambiguating adjective to identify a referent in the privileged ground condition. Nilsen and Graham (2009) speculated that this non-significant finding could be due to the fact that their measure for assessing children's perspective taking (i.e., the number of adjectives in the common ground condition) was not sufficiently sensitive to reveal the impact of cognitive control.
Some of the adult studies point to a positive correlation between cognitive control skills and perspective-taking abilities in the online interpretation (Brown-Schmidt, 2009; Lin et al., 2010) and production of referential expressions (Wardlow, 2013), but others have failed to replicate this finding with monolingual and bilingual adults in a spatial perspective-taking task (Ryskin, Brown-Schmidt, Canseco-Gonzalez, Yiu & Nguyen, 2014), and with children with ADHD in a referential communication task (Nilsen, Mangal & Macdonald, 2013).
Verbal WM has also been recently linked to individual differences in perspective-taking skills in the production of referential expressions in monolingual adults (Wardlow, 2013). Referential choice requires the speaker to focus on those conceptual features that make the target different from potential competitors that may or may not be accessible to the addressee. This evaluation process relies on the storage in memory of the features of the target and it additionally requires a comparison with the features of the competitors. This is a complex set of operations that involves both the storage and the manipulation of information. In essence these demands are comparable to those of a WM task where the information must be retained in memory while being subjected to additional operations. Adopting a computational modelling approach, Hendriks (2016) has argued for individual differences in WM capacity and processing speed as predictors of informativity in referential choice. Hendriks (2016) reports on a series of computational simulations where the manipulation of WM capacity in the network led to significant differences in the use of pronouns vs. NPs to refer back to a potentially ambiguous antecedent (van Rij, 2012). In the low WM model there was a significantly higher proportion of underspecified and underinformative pronouns than in the high WM model where more pragmatically adequate NPs were used.
The role of verbal WM has not yet been explored in connection with referential choice in bilingual children. In monolingual children, Nilsen and Graham (2009) did not find WM to be predictive, possibly because of the relatively low task demands, but Wardlow and Heyman (2016) found it to be positively correlated with 5- and 6-year-olds’ ability to benefit from adult non-verbal feedback in a referential production task. Children with higher WM improved their use of discourse-appropriate referential expressions in the course of the experiment when they received feedback that they were being uninformative. In a sample of monolingual German-speaking 8- to 10-year-olds Torregrossa (2017) also found a positive correlation between WM – indexed by backward-digit-span scores – and the discourse-appropriate use of demonstrative pronouns in a story-telling task. In the light of Wardlow's (2013) preliminary findings with adult speakers, Torregrossa's (2017) findings with 8- to 10-year-olds, and the results in the feedback condition for the 5- and 6-year-olds in Wardlow and Heyman's (2016) study, it is theoretically interesting to test whether the relationship between choice of referring expressions and verbal WM generalizes to bilingual child speakers.
The role of language experience, language proficiency, and SES
A parallel but independent line of research has shown, albeit not uncontroversially (see Valian, 2015), that cognitive control is one area in which bilinguals may have an advantage over monolinguals (Bialystok, 2015). If bilingual children do have an advantage when it comes to inhibiting information that is in their privileged ground and promoting information in the common ground, and if this kind of cognitive control is conducive to referential communication, it follows that bilingual children should, in principle, be more successful in choosing discourse-appropriate linguistic expressions in a referential communication task that requires cognitive control. To date, no studies have directly investigated whether individual differences in cognitive control and WM confer an advantage to young bilinguals when it comes specifically to referential choice. The literature on referential expressions in bilingual children and adults has principally focused on the issue of cross-linguistic influence, and on whether the interpretation of third person pronouns is affected in a null-subject language when the other language has obligatory overt subjects (Serratrice & Hervé, 2015). More recently some studies with infants and young children have reported a bilingual advantage for sensitivity to referential cues (Fan, Liberman, Keysar & Kinzler, 2015; Liberman, Woodward, Keysar & Kinzler, 2017).
Although superior cognitive control skills may put bilingual children in a privileged position in terms of perspective-taking and referential choice, other factors must also be considered as predictors of discourse-appropriate linguistic choices. The bilingual language experience is, by its very nature, distributed across languages, and – at least in relative terms – bilingual children receive proportionally less input in each language than monolingual children. Although relative amount of exposure is only an indirect and imperfect approximation of input quantity (Carroll, 2017; De Houwer, 2014; Hurtado, Grüter, Marchman & Fernald, 2014), it has repeatedly been shown to correlate robustly with measures of language proficiency (Hoff, Welsh, Place & Ribot, 2014; Unsworth, 2013).
It is plausible to expect a positive correlation between overall language skills and the ability to select discourse-appropriate referring expressions. Hence, whatever advantage superior cognitive control skills might confer to bilinguals when it comes to referential choice, if any, it may be offset by lower language proficiency when compared to monolingual children. Ryskin et al. (2014) make a similar claim to account for the lack of a bilingual advantage in a spatial perspective-taking task with adults. Some evidence that language proficiency may play a role comes from a referential communication study (Fan et al., 2015) which also included measures of language proficiency (receptive vocabulary), cognitive control, and fluid intelligence, in a group of monolingual 5-year-olds and two groups of age-matched children who were either bilingual, or exposed to a multilingual environment. The only significant effect was that of group, with both the bilingual and multilingual exposure children outperforming the monolinguals. Crucially the three groups did not differ in terms of receptive vocabulary, and therefore it remains to be seen whether bilinguals with lower language skills than monolinguals might be adversely affected in a linguistic task.
Another variable that may potentially affect children's linguistic and cognitive performance is SES. SES is a complex construct and it is considered a proxy for access to a range of economic, educational and occupational resources (Hauser & Warren, 1997; McLoyd, 1998). Although there is a vast and expanding literature on the relationship between SES and language and cognitive development, attributing a causal role to SES in child development is not straightforward because SES is a multifaceted notion and so are language and cognition (Duncan & Magnuson, 2012). For example, SES has been shown to affect vocabulary size but not utterance length (Hoff-Ginsberg, 1998), grammar but not pragmatic development (Wells, 1986), and the effects are greater for expressive than receptive vocabulary (Snow, 1999).
In monolinguals the complex relationship between linguistic and cognitive development and SES is well documented (Hackman & Farah, 2009; Hackman, Gallop, Evans & Farah, 2015). When it comes to bilingual children, there is inevitably an added layer of complexity. In bilingual populations SES also has a predictive role on language and cognitive skills, although it is often not easy to tease apart the relative contribution of bilingualism and SES. In many studies there are significant cultural differences between the bilingual and the monolingual groups, and the immigrant status of the bilinguals may present an additional confound. A number of studies have recently tried to disentangle SES from bilingualism (Calvo & Bialystok, 2014; Carlson & Meltzoff, 2008) and the main finding seems to be that both bilingualism and SES independently account for the variance observed in linguistic and cognitive tasks. The relationship between SES, bilingualism, and language and cognitive performance is however complex (Gathercole, Kennedy & Thomas, 2015) and is mediated by language exposure, age and the specific aspect of language (e.g., vocabulary vs. grammar), or of non-verbal cognition being tested.
The present study
To date, the relationship between perspective-taking skills, cognitive control, verbal WM, and referential choice has mostly been studied in the context of online comprehension. Studies investigating the predictive role of executive function skills in production have reported mixed results (Nilsen & Graham, 2009; Wardlow, 2013; Ryskin et al., 2015; Torregrossa, 2017; Wardlow & Heyman, 2016).
The first aim of the present study is to test whether cognitive control, as measured by the Simon task, and verbal WM, as measured by backward digit recall, are predictive of referential choice in a production task in which child participants need to build a complex situation model and identify a target referent in settings in which we manipulate the presence of discourse and visual competitors. The prediction is that the Simon task score and the backward digit recall score will correlate positively with the informativeness of the participants’ referential choices.
The second aim of the present study is to investigate the contribution of language experience to perspective-taking abilities and referential choice. English-speaking monolingual children and bilingual children with varying degrees of exposure to a language other than English (henceforth the home language) are therefore included in the study. Language experience is conceptualized here both in terms of cumulative amount of exposure and use of the home language (Bilingual Profile Index, BPI; De Cat, Gusnanto & Serratrice, 2017; De Cat & Serratrice, under review), and in terms of language proficiency as measured by the Articles sub-test of the Diagnostic Evaluation of Language Variation (Seymour, Roeper & de Villiers, 2003), a dialect-neutral assessment for 4- to 9-year-olds that minimizes the effects of language exposure differences in bilingual and bicultural children. We expect that children with better language proficiency – which is in turn likely to be predicted by the amount of exposure and use of English – will be more sensitive to the presence of discourse and visual competitors. It is also conceivable that language experience and language proficiency would interact, such that bilingual children might display an advantage only if their English proficiency falls within the range of their monolingual counterparts – as shown by Fan et al. (2015).
Finally, studies of perspective-taking skills have typically investigated the comprehension and use of NPs containing disambiguating size or colour adjectives (e.g., the small duck, the red square) that directly pick out an entity in a visual display and are therefore not anaphoric (e.g., Nilsen & Graham, 2009; Wardlow & Heyman, 2016). In contrast, in the present study we are focusing on the use of anaphoric expressions, i.e., third person pronouns vs. NPs, and on how the discourse and visual contexts determine the choice of a referential expression for a target referent in the presence of one or two antecedents that may be either visually present, linguistically mentioned, both, or neither.
The experiment is modelled on the studies in Fukumura, van Gompel and Pickering (2010) with monolingual adult participants where they manipulated the linguistic mention and the visual presence of a competitor to a target referent. Although Fukumura et al. (2010) did not address this issue, the use of an NP in conditions in which a pronoun is ambiguous should – at least partly – be predicted by cognitive control and verbal WM. Those participants that are more successful at inhibiting their egocentric perspective, and have better WM resources to deal with a complex scene, should be those that are sensitive to the presence of a discourse and visual referent that is in competition with the target.
Our prediction is that, if – similarly to adults – children are sensitive to both the linguistic and the non-linguistic features of the context in creating a discourse model, they will produce more informative referential expressions, i.e., full NPs (e.g., the princess, the cowboy) when the competitor is previously mentioned and when it is visually present.
SES will be included as a predictor in the analyses alongside measures of language proficiency, language exposure and use, cognitive control and verbal WM, to assess the contribution that these child-internal factors might make to the use of anaphoric expressions in a demanding language production task.
Methods
Participants
After receiving ethical approval for the study by the University Research Ethics Committee of the second author's institution, children were recruited in state primary schools in the North of England. The final sample included 172 children attending year 1 or year 2 of primary school (between the ages of 5 and 7), all of whom were schooled exclusively in English. Half of the children (N = 87) were also exposed to a language other than English at home; these children will be referred to as bilinguals. In this study we adopted a broad definition of bilingualism that reflects the typical situation of many classrooms in the UK where children are classified as learners of English as an Additional Language (EAL) if ‘a first language, where it is other than English, is recorded where a child was exposed to the language during early development and continues to be exposed to this language in the home or in the community.’ (DfE School Census Guide 2016-2017, p.63). Because of this inclusionary criterion, the children in our bilingual group had a wide range of exposure (as low as 9%) to 28 different home languages: Punjabi (21% of bilingual participants), Urdu (17%), Arabic (9%), French (8%), Spanish (6%), Bengali, Cantonese, Catalan, Dutch, Farsi, Greek, Hindi, Italian, Kurdish, Mandarin, Marathi, Mirpuri, Nepalese, Pashto, Polish, Portuguese, Shona, Somali, Swedish, Tamil, Telugu, Thai, Tigrinya (languages with no percentage indicator accounted for less than 5% of the sample). Our bilingual group was therefore deliberately heterogeneous to capture the variability of children who are currently considered as bilingual (EAL learners) in multilingual classrooms in the UK, and to capitalise on the notion of bilingualism as a continuous measure.
Measures
In addition to the main referential communication task that is the object of this study, we collected information on the children's SES, on their exposure and use of English and the home language, and tested their proficiency in English, their verbal WM and their cognitive control skills.
Socio-economic Status (SES).
The children came from schools in a range of different catchment areas to ensure variation in SES. We collected information on parental education and occupation via questionnaires. Children were allocated an SES score on the basis of the highest level of occupation or education in the household (either mother or father). Education was coded on a five-point scale (none, primary, secondary, further, university), and the occupational data was coded according to the reduced method of the UK National Statistics socio-economic classification. We used the reversed occupational data scores to make the interpretation of the association with the educational level data more transparent, so that a higher value represents an advantage. As expected there was a strong association between the two measures (χ²(4, N = 174) = 83.57, p < 0.0001). We also found a weak but significant negative correlation between level of bilingualism as measured by the children's cumulative amount of exposure and use measured by the Bilingual Profile Index – as described below – and SES as measured by parental occupation (r = −.25, p = 0.0009).
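The household-level coding described above can be illustrated with a minimal sketch. It is not the authors' procedure: the function names are hypothetical, the occupational class codes are assumed to run from 1 (most advantaged) upwards, and the number of occupational classes is left as a parameter because the text does not state it.

```python
# Five-point education scale as listed in the text (index + 1 gives the score).
EDUCATION_LEVELS = ["none", "primary", "secondary", "further", "university"]


def education_score(level):
    """Map an education label to 1..5, where a higher value is an advantage."""
    return EDUCATION_LEVELS.index(level) + 1


def reversed_occupation_score(class_code, n_classes):
    """Reverse an occupational class code so that a higher value is an advantage.

    Assumption: class 1 is the most advantaged class and n_classes the least.
    """
    if not 1 <= class_code <= n_classes:
        raise ValueError("class code out of range")
    return n_classes + 1 - class_code


def household_scores(parents, n_classes):
    """Return the highest education and occupation scores in the household.

    parents: list of (education_label, occupational_class_code) tuples,
    one per parent.
    """
    education = max(education_score(e) for e, _ in parents)
    occupation = max(reversed_occupation_score(c, n_classes) for _, c in parents)
    return education, occupation
```

For a household in which one parent has secondary education and class code 2, and the other has a university degree and class code 4, `household_scores([("secondary", 2), ("university", 4)], n_classes=5)` takes the university score for education and the reversed score of class 2 for occupation.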
Language exposure and use
We used a parental questionnaire to estimate the bilingual children's relative amount of exposure and use of English and of the home language. The questionnaire, which includes both current and cumulative estimates of the amount of exposure and use, is modelled on the BiLEC (Unsworth, 2013). The parents (usually the mother) completed the questionnaire in English, Bengali, Punjabi or Urdu with the help of a bilingual assistant. They were asked to quantify the amount of their child's current exposure and use of the two languages on a typical school day, at weekends, and during holiday periods. School days were divided into slots of one hour before and after school during which children were exposed predominantly to English. It is possible that children may have used the home language with some same-language peers at school but because parents – and not teachers – were asked to complete the questionnaire, we did not have access to this information and we conservatively assumed that during school hours children only heard and used English. Parents were asked about all of the child's interlocutors, and to estimate on a five-point scale how often they addressed the child in the home language (never, rarely, half of the time, usually, always). We later converted the scores into discrete percentage bands ranging from 0 (never) to 100% (always). Parents were also asked to recall age of first exposure to English. To calculate the current relative amount of exposure to English and the home language for a given child we extrapolated the number of hours that the child spends with each interlocutor on a yearly basis, and we multiplied this figure by the percentage of time the child used either English or the home language with each interlocutor.
The percentages for each of the child's interlocutors were added and then divided by the total number of hours of interaction pooled for all interlocutors; if several interlocutors were present at the same time, the estimate was divided by the number of interlocutors for the relevant time window. The result was a percentage expressing the relative amount of input for English and the home language. We applied the same method to the calculation of a relative measure of the child's output, i.e., use of English or the home language. For the cumulative amount of input/output in each language we first calculated the number of months of home language use only, i.e., before children were exposed to English (this was 0 for the simultaneous bilingual children); we then multiplied the number of months of bilingual exposure by the proportion of current input/output. The resulting figure is the total number of months equivalent to full-time exposure to the home language.
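The calculation just described can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' script: the intermediate percentage bands (25% for "rarely", 75% for "usually") are assumed, since the text only fixes the endpoints and "half of the time"; yearly hours are assumed to have already been shared out among co-present interlocutors; and the cumulative figure is assumed to be the sum of the home-language-only months and the weighted bilingual months. All function names are hypothetical.

```python
# Assumed conversion of the five-point scale into proportions of home language.
HOME_LANGUAGE_SHARE = {
    "never": 0.0,
    "rarely": 0.25,            # assumption: band not given in the text
    "half of the time": 0.5,
    "usually": 0.75,           # assumption: band not given in the text
    "always": 1.0,
}


def relative_home_share(interlocutors):
    """Proportion (0..1) of current input that is in the home language.

    interlocutors: list of (yearly_hours, frequency_label) pairs, where
    yearly_hours is the extrapolated time spent with that interlocutor.
    """
    total_hours = sum(hours for hours, _ in interlocutors)
    if total_hours == 0:
        return 0.0
    # Weight each interlocutor's home-language share by the hours spent together.
    home_hours = sum(hours * HOME_LANGUAGE_SHARE[label]
                     for hours, label in interlocutors)
    return home_hours / total_hours


def cumulative_home_months(months_home_only, months_bilingual, current_home_share):
    """Months equivalent to full-time exposure to the home language.

    Assumption: months before exposure to English count in full, and months
    of bilingual exposure are weighted by the current home-language share.
    """
    return months_home_only + months_bilingual * current_home_share
```

As a usage example, a child who spends 1000 hours a year with a parent who always uses the home language, 1000 hours with a teacher who never does, and 500 hours with a sibling who uses it half of the time has a home-language share of 1250 / 2500 = 0.5. With 24 months of home-language-only exposure followed by 36 months of bilingual exposure, the cumulative figure is 24 + 36 × 0.5 = 42 months.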
The use of parental questionnaires to collect information on quantity and quality of child-directed input has obvious limitations and has lately come under critical scrutiny (Carroll, Reference Carroll2017). Although we acknowledge the constraints of this data collection method, we are also confident that it is a pragmatic solution whose validity and robustness have been repeatedly confirmed (De Houwer, Reference De Houwer2017; Paradis, Reference Paradis2017).
Current and cumulative measures of input and output in the home language were highly correlated in our sample (current input and output: r = .90, p < 0.0001; cumulative input and output: r = .95, p < 0.0001). Because we wanted to use both dimensions of the language experience as predictors in our analysis but needed to avoid collinearity for modelling purposes, we used Principal Component Analysis (PCA) to decorrelate the two measures and create a composite score of cumulative input and output which we call the Bilingual Profile Index (BPI, De Cat et al., Reference De Cat, Gusnanto and Serratrice2017; De Cat & Serratrice, under review). The PCA of cumulative input and cumulative output yielded two principal components, the first of which captured 98% of the variability (given the strength of the correlation between the two cumulative measures). The BPI scores correspond to the loadings of that first component, reversed (so that a higher score corresponds to more experience in the home language) and aligned with a score of 0 for monolinguals. The BPI can be interpreted as a cumulative and gradient measure of a bilingual child's experience of their home language, effectively close to the number of full-time months of exposure corrected for any imbalance between exposure and use. The range of the BPI in our sample is from 0 to 96.
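To illustrate how a composite of this kind can be derived, the sketch below computes the first principal component of two correlated measures using the closed-form eigendecomposition of a 2 × 2 covariance matrix. It is not the BPI computation itself: the data are hypothetical, the sign is oriented so that higher scores mean more home language experience, and "aligned with a score of 0 for monolinguals" is interpreted here, as an assumption, as projecting the raw values so that a child with zero input and zero output scores 0.

```python
import math


def composite_scores(xs, ys):
    """Project two measures onto their first principal component.

    Returns (scores, share_of_variance). Scores are computed from the raw
    values, so a child with 0 on both measures receives a score of 0.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(d * d for d in dx) / (n - 1)
    c = sum(d * d for d in dy) / (n - 1)
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector belonging to the larger eigenvalue
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Orient the component so that higher scores mean more experience
    if vx + vy < 0:
        vx, vy = -vx, -vy
    scores = [x * vx + y * vy for x, y in zip(xs, ys)]
    return scores, lam1 / (lam1 + lam2)


# Hypothetical cumulative input and output (in months); first child is monolingual.
input_months = [0, 10, 20, 30, 40, 50]
output_months = [0, 8, 18, 33, 41, 49]
scores, share = composite_scores(input_months, output_months)
```

With two measures this strongly correlated, nearly all of the variance falls on the first component, which is why a single composite can stand in for both.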
Language proficiency
We used the Articles sub-test of the Diagnostic Evaluation of Language Variation – DELV (Seymour et al., Reference Seymour, Roeper and De Villiers2003) as a measure of language proficiency in English, the language of schooling. The DELV is a language assessment of syntax, semantics, pragmatics and phonology for children between the ages of 4 and 9. This test was specifically developed to neutralize dialectal differences and it focuses on language structures that are common to all children from English-speaking backgrounds regardless of the particular variety of English they speak. We chose the Articles sub-test as an independent measure of language proficiency as it taps into some of the same discourse-pragmatic skills that are required for the appropriate use of referential expressions.Footnote 1
Verbal working memory (WM)
We used the Backward Digit Span task from the Wechsler Intelligence Scales for Children (Wechsler, 1991) as a proxy measure for children's verbal WM capacity. The backward digit span was administered according to the WISC-III UK instructions: for each digit span the experimenter administered two trials, regardless of whether the first trial was passed or failed, and discontinued the test after failure on both trials of any item. Backward digit recall is one of three complex memory span measures (the other two being listening recall and counting recall) that in a confirmatory analysis were shown to load onto one single factor by Gathercole, Pickering, Ambridge and Wearing (Reference Gathercole, Pickering, Ambridge and Wearing2004). Unlike forward digit recall, which only requires the storage and immediate recall of a sequence of spoken items and taps into the phonological loop, backward digit recall implies both the phonological loop, for the storage of items, and the central executive, for the additional processing in the reversing of the digits.
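The administration rule can be sketched as follows. The discontinuation rule follows the description above; counting the number of trials passed is an assumption made for illustration, and the function names are hypothetical.

```python
def is_correct_backward(presented, response):
    """A response is correct if it reproduces the presented digits in reverse order."""
    return list(response) == list(reversed(presented))


def backward_span_trials_passed(items):
    """Count passed trials, stopping after both trials of an item are failed.

    `items` is a list of (trial_1_passed, trial_2_passed) pairs ordered by
    increasing span length. Both trials of an item are always administered.
    """
    passed = 0
    for first, second in items:
        passed += int(first) + int(second)
        if not (first or second):
            break  # discontinue: both trials of this item were failed
    return passed


# Hypothetical child: the last item would not have been administered.
record = [(True, True), (True, False), (False, False), (True, True)]
trials_passed = backward_span_trials_passed(record)
```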
Cognitive control
Children were administered a computer-based version of the Simon task (Simon & Wolf, 1963) programmed and run via E-Prime. The Simon task is considered a complex response inhibition task (Garon, Bryson & Smith, Reference Garon, Bryson and Smith2008) because it involves moderate WM demands in addition to the inhibition of a prepotent response. Participants need to hold a rule in mind (press the left button when you see x, press the right button when you see y), respond according to this rule (physically press the key), inhibit a prepotent response when the rule changes and respond accordingly (press left button when you see y, press the right button when you see x).
The Simon task is one of many complex inhibition tasks that have been used in the developmental literature to measure children's ability to inhibit a prepotent response while responding to a salient conflicting response option (see Garon et al., Reference Garon, Bryson and Smith2008 for a comprehensive review). With specific reference to the bilingual-monolingual comparison, previous studies have shown that bilingual children outperform monolingual peers only in tasks that assess the interference suppression component of cognitive control (Bialystok & Shapero, Reference Bialystok and Shapero2005; Qu, Low, Zhang, Li & Zelazo, 2016), but not in tasks that assess response inhibition alone (Martin-Rhee & Bialystok, Reference Martin-Rhee and Bialystok2008).
Children sat in front of a 15.6” computer screen and used an E-Prime serial response button box with colour-coded buttons (red on the left and green on the right). Children started with 8 practice trials followed by 48 test trials; there was no neutral condition in which the coloured square would appear in the middle of the screen. Accuracy and Reaction Times (RTs) were automatically recorded by E-Prime. The index of cognitive control abilities used as a predictor in the present study corresponds to the modelled score in the Simon task, i.e., children's score adjusted for age, SES, bilingual experience (indexed by the BPI), and accuracy at the previous trial.Footnote 2 These correspond to the significant predictors of a Cox Proportional Hazard regression analysis, as reported in detail in De Cat et al. (Reference De Cat, Gusnanto and Serratrice2017). The Cox PH model captures response accuracy and speed within the same analysis, so the resulting score combines both aspects of children's performance.
Table 1 provides descriptive statistics for the monolingual and bilingual groups.
Table 1. Bilingual and monolingual participants by gender, age, SES, and cumulative language exposure and use (bilinguals only)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_tab1.png?pub-status=live)
Materials and experimental design
Following the design of the studies in Fukumura et al. (Reference Fukumura, van Gompel and Pickering2010), the experiment manipulated the visual presence and the linguistic mention of a competitor to a target referent in a 2 × 2 design in four conditions: competitor present and mentioned, competitor present and not mentioned, competitor absent and mentioned, competitor absent and not mentioned. There were five items in each of the four conditions and ten filler items. Each experimental item consisted of a set of two coloured photographs of iconic Playmobil characters (e.g., fireman, cowboy, ghost, queen), while the fillers included coloured geometric shapes and animals. Both the first and the second photograph in the experimental set always included the target referent (e.g., a fireman). In the competitor present conditions another referent of the same gender also appeared in both photographs (e.g., a fireman and a pirate). Half of the experimental items contained characters of feminine gender, and the position of the target and the competitor was counterbalanced throughout the experiment.
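As an illustration, the crossing of the two factors and the resulting item counts can be enumerated as below. The variable names are hypothetical; the counts are those given in the text.

```python
from itertools import product

ITEMS_PER_CONDITION = 5
N_FILLERS = 10

# Cross visual presence with linguistic mention of the competitor (2 x 2).
conditions = [
    {"visual_competitor": visual, "discourse_competitor": discourse}
    for visual, discourse in product([True, False], repeat=2)
]

n_experimental = len(conditions) * ITEMS_PER_CONDITION
n_trials = n_experimental + N_FILLERS
```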
See Figures 1 and 2 for examples of experimental items in the competitor visually present or absent conditions, and the Appendix for a full set of experimental and filler items.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_fig1.png?pub-status=live)
Fig. 1. First picture in the no visual competitor conditions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_fig2.png?pub-status=live)
Fig. 2. First picture in the visual competitor conditions.
The first photograph in each set was presented alongside a digitally recorded sentence spoken by a female native speaker of Northern British English. The sentence was a passive whose subject contained a genitive phrase where the possessor was the animate target referent and the possessum was an inanimate entity (e.g., The fireman's bed has been made). In the conditions in which the competitor was mentioned it appeared in the passive's by-phrase (e.g., The fireman's bed has been made by a pirate).
The rationale for embedding the target referent as the possessor in a genitive phrase (e.g., The fireman in The fireman's bed) was to reduce its accessibility and thus generally decrease the likelihood that participants would only ever use pronouns in their continuation. It also allowed us to tease apart sentence-initial position from topichood. Like Fukumura et al. (Reference Fukumura, van Gompel and Pickering2010) we also wanted to ensure that the bias towards using a pronoun for a highly salient subject antecedent would not completely obliterate the role of the visual context. The photographs were embedded in a PowerPoint presentation. The second picture appeared after the first had disappeared off the screen and was accompanied by the pre-recorded prompt “And now…”.
Procedure
The children were tested on school premises. Two female experimenters took part in the task; experimenter A sat next to the participant; the participant sat in front of a laptop computer and the two were separated by a divider so they could not see what the other was looking at but they could see each other. Experimenter B introduced the task to the participant as a communication game and explained that the aim was to give instructions to experimenter A so that she could re-create the scenes in the child's pictures with the toys that she was given by experimenter B. Experimenter B pressed the space bar on the child's laptop on each trial to start the experiment and to move on to the next item. Before the experiment started there were two practice trials with feedback. No children had to be discarded for not understanding the task. At the start of each trial experimenter B pressed the space bar and the first picture appeared on the computer screen accompanied by the pre-recorded linguistic description (e.g., “The fireman's bucket has been filled (by a musician)”) lasting an average of 4000 ms. The space bar was pressed again at the end of the sentence and the target picture would appear accompanied by the prompt “And now…”. This was the participant's cue to start giving directions to experimenter A to arrange the toys to recreate the scene that the child would describe (e.g., And now the fireman/he/the man is carrying the bucket). Experimenter A had the same toys that were present in the child's picture. When the participant had completed their instruction they looked round the divider to see whether the experimenter's toy arrangement matched the photograph on their computer screen. The experimenter remained in their seat, showed the participant their toys and asked “Like that?”. Whenever the participant used an under-informative pronoun, experimenter A always chose the competitor to give the participant indirect feedback about their level of underinformativity.
Transcription and scoring
Participants’ instructions to experimenter A were digitally recorded and transcribed using CHAT for CHILDES (MacWhinney, Reference MacWhinney2000); utterances were later imported into Excel and coded for the following features: mention of target referent (1 = target referent; 0 = competitor); label used (repeated name from the preamble sentence, e.g., the king; an alternative label in the same semantic field– e.g., the prince instead of the king; an alternative label that only matched the referent in gender, e.g., the man instead of the king, the lady instead of the dentist); discourse integration (1 = pronouns and definite NPs anaphorically referring to the target referent – e.g., he/she/the queen; 0 = indefinite pronouns – e.g., somebody – and indefinite NPs – e.g., a man – that do not make clear anaphoric reference to the target).
The “discourse integration” coding operates a binary distinction between anaphoric and non-anaphoric expressions; the “label used” coding provides a more fine-grained distinction within different types of anaphoric referential expressions. While the king, the prince, the man are all definite NPs, they vary along a continuum of disambiguating information. We deliberately chose stereotypical and easily identifiable referents for the experimental items (i.e., king, fireman, astronaut, queen, nurse, etc.). To be maximally informative in the task, participants should ideally have used the label that was provided in the preamble description associated with the first photograph in the experimental pair. Using a different and less informative label might lead to potential ambiguity that would, in turn, increase as a function of the label's lack of informativeness. So, in the case of a label in the same semantic field (e.g., prince instead of king) the likelihood of ambiguity would not be as high as in the case of a highly underspecified definite NP like the man that would give experimenter A only a vague cue to select the appropriate target toy to reconstruct the scene, and would be just as underinformative as a third person or an indefinite pronoun.
Results
Table 2 provides descriptive statistics for the results of the DELV Articles sub-test (language proficiency), the backward digit recall task (verbal WM) and the Simon task (cognitive control) for the monolingual and the bilingual groups. Note that the scales are different for the three measures. For the DELV, it is accuracy proportion from 0 to 1; for the backward digit recall it is the number of accurately recalled digits from 0 to 4 (as a score), and for the Simon task it is an index of cognitive control adjusted for age, SES, bilingual experience and accuracy at the previous trial; negative scores indicate better cognitive control skills.
Table 2. Language proficiency, WM and cognitive control scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_tab2.png?pub-status=live)
A linear regression model fitted using the lme4 package (version 1.1.11) in R (version 3.2.4) to the overall score in the DELV Articles sub-test showed that performance was negatively correlated with the BPI (t(168) = −2.90; p = 0.004): as expected, bilingual children performed more poorly than monolinguals overall, and greater exposure to and use of the home language was correlated with lower proficiency scores. There was no significant effect of the BPI in the verbal WM task (t(181) = −0.29; p = 0.77). For the Simon task the results of a Cox Proportional Hazard regression model showed a near-significant effect of group (χ²(1) = 3.8, p = 0.05) and a significant effect of home language experience over and above the effect of group, as the BPI was a positive predictor (χ²(1) = 12.13, p = 0.0005). There was however no significant interaction between bilingualism and cue congruency, and hence no Simon effect in the strict sense (in line with previous studies).
We conducted three analyses to address the role of cognitive control, verbal WM, cumulative home language exposure and use, SES, and language proficiency on the children's use of referential expressions. In the first analysis, following Fukumura et al. (Reference Fukumura, van Gompel and Pickering2010), our DV only included exact repetitions of the target referent named in the context sentence vs. the use of third person pronouns. Two further analyses were necessary to capture the broader picture. In the second analysis, we included all referential expressions that made anaphoric reference to the target and investigated their informativeness by creating a binary DV: (1) underinformative expressions: third person singular pronouns (e.g., he/she) and underinformative definite NPs – e.g., the man instead of the king, the lady instead of the queen; and (2) definite NPs that were either exact repetition of the definite NP in the preamble sentence, or semantically related labels (e.g., the prince instead of the king, the singer instead of the musician).
The third analysis identifies the factors that predict lack of discourse integration. We used a two-way distinction between indefinites signalling a lack of anaphoric discourse integration (i.e., indefinite NPs and indefinite pronouns), and pronouns and definite NPs that made anaphoric reference to the target.
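The three dependent variables can be summarised in a small coding sketch. The category labels and the function name are hypothetical, and the choice of 1 for the more informative or anaphoric value is made here for illustration; `None` marks a response that falls outside a given analysis.

```python
# Expression types that make definite, anaphoric reference to the target.
DEFINITE = {"repeated_np", "semantic_np", "gender_only_np", "pronoun"}


def code_response(expression_type):
    """Return the value of the dependent variable in each of the three analyses."""
    # Analysis 1: exact repetition of the preamble NP vs. third person pronoun.
    analysis_1 = {"repeated_np": 1, "pronoun": 0}.get(expression_type)
    # Analysis 2: informative definite NPs vs. underinformative expressions.
    analysis_2 = {
        "repeated_np": 1,
        "semantic_np": 1,
        "gender_only_np": 0,
        "pronoun": 0,
    }.get(expression_type)
    # Analysis 3: discourse integration (definite vs. indefinite expressions).
    if expression_type in DEFINITE:
        analysis_3 = 1
    elif expression_type == "indefinite":
        analysis_3 = 0
    else:
        analysis_3 = None  # e.g., bare nouns
    return {"analysis_1": analysis_1, "analysis_2": analysis_2, "analysis_3": analysis_3}
```

For example, a semantically related label such as the prince for the king is excluded from the first analysis, counts as informative in the second, and counts as integrated in the third.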
We fitted generalized linear mixed models using the lme4 package (version 1.1.15) in R (version 3.4.4). The models were fitted incrementally by adding predictors one by one and retaining them only if they improved the model fit, yielding a significant reduction in AIC and a significant R-squared value, with model comparison estimated by likelihood ratio tests (Baayen, Reference Baayen2008). In each of the three analyses we treated item as a random factor; participant was not included as a random factor because it would compete with the fixed factors capturing participant-related variables such as the BPI, SES or proficiency. We tested for the significance of the following fixed factors: the presence/absence of a discourse or a visual competitor, the Simon task score (cognitive control), the backward digit recall score adjusted for age and proficiency (verbal WM), the DELV Articles sub-test score (language proficiency), the BPI score (cumulative home language use and exposure), the SES score, and age (in months). Age and Simon task scores were centered to facilitate the interpretation of the models. The following interactions were also tested in all analyses: visual competitor × discourse competitor (yielding the 4 experimental conditions), discourse competitor × each participant-related predictor (BPI, SES, WM, cognitive control), visual competitor × each participant-related predictor (BPI, SES, WM, cognitive control), BPI × SES, BPI × proficiency, WM × proficiency. Gender was added as a covariate. Age correlated strongly with other participant-related predictors and could therefore not be included in the models without resulting in lack of convergence. In the following we report the optimal models.
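The logic of retaining a predictor only when it improves fit can be sketched for the simplest case, in which two nested models differ by a single parameter. The log-likelihood values below are hypothetical and the function names are illustrative; this is not the lme4 output for the models reported here.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: lower values indicate better fit."""
    return 2 * n_parameters - 2 * log_likelihood


def likelihood_ratio_test_1df(loglik_simple, loglik_complex):
    """Likelihood ratio test for nested models differing by one parameter.

    For one degree of freedom the chi-square survival function reduces to
    erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    statistic = 2 * (loglik_complex - loglik_simple)
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2))
    return statistic, p_value


# Hypothetical log-likelihoods before and after adding one predictor.
loglik_without, loglik_with = -1500.0, -1494.0
statistic, p_value = likelihood_ratio_test_1df(loglik_without, loglik_with)
keep_predictor = p_value < 0.05 and aic(loglik_with, 4) < aic(loglik_without, 3)
```

Comparisons involving more than one additional parameter would need the general chi-square distribution rather than this one-degree-of-freedom shortcut.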
To be consistent with the protocol in Fukumura et al. (Reference Fukumura, van Gompel and Pickering2010) we excluded references to the competitor. The total number of data points expected, given the number of participants (172) and items (20), was 3440; there were 66 non-responses, so the actual number was 3374. We excluded 86 items from all analyses because of reference to the competitor, or because the utterance was (partly) unintelligible. We also excluded a problematic experimental item (N = 115), for a total of 201 items, i.e., 6% of the data.
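The counts in this paragraph can be checked with a few lines of arithmetic. The variable names are illustrative, and taking the 3374 observed responses as the denominator for the percentage is an assumption; the text does not state the base explicitly.

```python
n_participants = 172
n_items = 20
expected = n_participants * n_items            # data points expected
non_responses = 66
observed = expected - non_responses            # data points actually collected

excluded_competitor_or_unintelligible = 86
excluded_problematic_item = 115
excluded_all_analyses = excluded_competitor_or_unintelligible + excluded_problematic_item

# Assumed denominator: observed responses rather than expected data points.
percent_excluded = 100 * excluded_all_analyses / observed
```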
In the first analysis, the repeated name was expected to feature as the subject in the first sentence that participants produced to describe the second picture in the experimental item. As in Fukumura et al. (Reference Fukumura, van Gompel and Pickering2010) we excluded a further 155 tokens where the target referent was indefinite or lacked a determiner, as well as 310 tokens that were not exact repetitions of the named referent. Altogether, 19% of the data was excluded from the first analysis. The remaining responses included a total of 1766 NPs and 942 pronouns.
The dependent variable was the likelihood of producing a definite NP (as opposed to a pronoun) to identify the target referent in the second picture of the experimental items. We used logistic regression to model the probability (in terms of logits) associated with the values of the dependent variable. NP use was predicted by the visual presence of a competitor (z = 3.21, p < .001), and there was a negative correlation between the BPI and NP use (z = −3.47, p < .001) showing that bilingual children with more exposure to the home language produced fewer NPs. There was a significant interaction between the Simon task score and the presence of a discourse competitor (z = 2.09, p < .05) indicating that sensitivity to the presence of a discourse competitor was positively associated with better cognitive control skills. The interaction between WM and language proficiency was also significant (z = 8.39, p < 0.001); children with better WM capacity and better proficiency produced more NPs. The model did not converge with the addition of age as a continuous predictor. Including a binary predictor for age (5- and 6-year-olds) resulted in a significantly worse model fit in this and in all subsequent analyses.
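Because the model coefficients are expressed in logits, converting them to probabilities requires the inverse logit function. The sketch below shows the conversion; the intercept and coefficient are hypothetical values chosen for illustration, not estimates from the model reported above.

```python
import math


def inverse_logit(logit):
    """Convert a log-odds value into a probability."""
    return 1 / (1 + math.exp(-logit))


def predicted_probability(intercept, coefficients, predictors):
    """Probability of a definite NP for a given set of predictor values."""
    logit = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return inverse_logit(logit)


# Hypothetical coefficients, for illustration only.
coefficients = {"visual_competitor": 0.4}
p_without = predicted_probability(0.5, coefficients, {"visual_competitor": 0})
p_with = predicted_probability(0.5, coefficients, {"visual_competitor": 1})
```

A positive coefficient raises the predicted probability of NP use, but the size of the change on the probability scale depends on the baseline logit.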
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_fig3.png?pub-status=live)
Fig. 3. Partial effects of visual and discourse competitors on the likelihood of using a full NP, as per the mixed-effect model for Analysis 1. (0 indicates no competitor).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_fig4.png?pub-status=live)
Fig. 4. Partial effects of bilingualism (LEFT), English proficiency (CENTER) and working memory (RIGHT) on the likelihood of producing a full NP, as per the optimal mixed-effect model for Analysis 1. The likelihoods are shown for the experimental condition in which there is no competitor (visual or discourse). Intercept adjustments for each condition are shown in the figure.
To investigate whether there was indeed a trade-off between language proficiency and language experience that may disadvantage bilingual children, we compared the use of NPs in bilingual and monolingual children who performed above and below the monolingual mean on the DELV. In this additional analysis the visual presence of a competitor remained significant (z = 3.19, p < .001), as did the main effect of verbal WM (z = 3.93, p < .0001). Language experience and language proficiency were significant predictors. Monolingual children as a group used more NPs (z = 3.35, p < .001) and all children with language proficiency above the mean also used more NPs (z = 9.35, p < .001). There was a significant interaction between the Simon task score and the presence of a discourse competitor (z = 2.12, p = .03). Further, there was an interaction between language experience (monolingual/bilingual) and language proficiency (below/above the monolingual mean) (z = −2.15, p = .03) whereby monolingual children below the language proficiency mean used more NPs than bilingual children below the language proficiency mean. For children above the language proficiency mean there was no difference as a function of language experience, as shown in Figure 5.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_fig5.png?pub-status=live)
Fig. 5. The interaction between language experience (monolingual, bilingual) and language proficiency on the production of NPs (analysis 1).
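The proficiency threshold used in this additional analysis can be sketched as follows. The scores are hypothetical and the field names are illustrative; treating a score exactly equal to the mean as "not above" is an assumption, since the text does not say how ties were handled.

```python
def split_by_monolingual_mean(children):
    """Flag each child as above or not above the monolingual mean DELV score."""
    mono_scores = [c["delv"] for c in children if c["group"] == "monolingual"]
    threshold = sum(mono_scores) / len(mono_scores)
    flagged = [{**c, "above_mean": c["delv"] > threshold} for c in children]
    return threshold, flagged


# Hypothetical DELV accuracy proportions.
sample = [
    {"group": "monolingual", "delv": 0.6},
    {"group": "monolingual", "delv": 0.8},
    {"group": "bilingual", "delv": 0.5},
    {"group": "bilingual", "delv": 0.9},
]
threshold, flagged = split_by_monolingual_mean(sample)
```

Note that the threshold is computed from the monolingual children only and then applied to both groups.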
As children used NPs other than the repeated name in their story continuation, in a second set of logistic regression analyses, we investigated the level of informativity of the label used to identify the target referent. The dependent variable included all the referential expressions that children used to identify a target referent where there was evidence of an attempt at discourse integration; we therefore excluded all bare nouns, indefinite NPs and indefinite pronouns (155 items), with 8.3% of data excluded in total. The dependent variable was binary and had two levels: (1) underinformative expressions – third person singular pronouns and less informative definite NPs (e.g., the man; the lady), and (2) more informative definite NPs (repeated NPs from the preamble, semantic substitutions, e.g., the prince for the king). Using the WM score where language proficiency and age were partialled out did not allow the model to converge; we therefore used the raw WM score. The optimal model shows that children were more informative in the presence of a visual competitor (z = 2.15, p = .03), while the mention of a discourse competitor had no significant effect (z = −1.15, p = .25). The interaction between WM and language proficiency was a significant predictor of informativity (z = 9.59, p < .001), while none of the other predictors made a significant contribution to the model.
As we did earlier, we repeated this analysis including the mean monolingual language proficiency as a threshold to investigate a potential language proficiency disadvantage for bilingual children in the production of informative NPs. The effect of visual competitor was significant (z = 2.14, p = 0.03), as was the effect of WM (z = 4.88, p < .001). Similarly to what we found in the first set of analyses, monolingual children (z = 3.56, p < 0.001) and children with language proficiency above the monolingual mean (z = 9.51, p < 0.001) produced significantly more informative NPs. The significant interaction between language proficiency and language experience (z = −2.18, p = 0.03) showed once again that there was no difference as a function of language experience for children whose proficiency was above the monolingual mean, but for those below the mean threshold monolinguals produced more informative NPs.
Our third and final set of analyses investigated the possible causes for not encoding the target referent with a definite NP or a pronoun (which resulted in exclusion from the first and the second analyses). This third analysis revealed whether children were able to integrate the discourse information provided in the preamble – where the target was introduced with or without a competitor – and the target in their own scene description. The dependent variable was the definiteness of the target expression used, a proxy measure for discourse integration. Only bare nouns were excluded (44 items), on top of the items excluded from all analyses. The excluded items amounted to 7.3% of the data in total. In this logistic regression analysis, the coefficients indicate the likelihood of using a definite expression, thereby integrating the target expression with the preceding discourse without discriminating further between more informative full NPs and less informative pronouns. Very few items displayed lack of discourse integration: 3% in monolinguals and 4% in bilinguals.
The presence of a visual competitor adversely affected discourse integration (z = −2.87, p < .001); children were more likely to use an indefinite expression, rather than a definite NP, when a competitor was visually present. More exposure to the home language also negatively affected the production of definite expressions in bilingual children (z = −2.96, p < .001). Children with better cognitive control skills (z = 3.14, p < .001) and boys (z = 2.89, p < .01) were more likely to produce a referential expression that connected the target description to the previous discourse. Finally, the significant interaction between the visual presence of a competitor and of its discourse mention (z = 2.26, p < .05) indicates that children were more likely to introduce the target referent anew in the presence of a visual competitor (and even more so when the competitor had also been introduced in the discourse).
We repeated this final analysis including a language proficiency threshold, as we did previously, and we confirmed a significant negative effect of the presence of a visual competitor (z = −2.94, p < .001), a significant positive effect of cognitive control (z = 3.08, p < .001), and a significant effect of gender, with boys outperforming girls (z = −2.88, p < .001). There was a significant interaction between the presence of a discourse competitor and cognitive control skills (z = 2.34, p < .01), with children with better cognitive control skills producing more NPs in the presence of a linguistically mentioned competitor. No other main effects or interactions were significant.
Discussion
The aim of this study was to investigate whether 5- to 7-year-old children, with or without exposure to another language in addition to English, can use both discourse and visual information in a complex referential communication task. Cognitive control skills, verbal WM, language proficiency, language exposure and use, and SES were investigated as predictors of the choice of discourse-appropriate anaphoric expressions in the task.
The role of cognitive control and WM in referential choice
With the exception of analysis 2, cognitive control – as indexed by the Simon task score – was a significant predictor of NP use. In analyses 1 and 3 – when a language proficiency threshold is introduced as a predictor – better cognitive control predicted sensitivity to the presence of a discourse competitor. In analysis 3, better cognitive control also predicted discourse integration in the absence of the additional language proficiency threshold.
Within the context of the current experiment, the manipulation of the presence and discourse mention of a competitor to the target referent unpredictably varied the need to resolve a referential conflict. In the condition in which the target had no linguistic or perceptual competition, no conflict arose. However, in the remaining three conditions the discourse and/or perceptual presence of a competitor created a referential conflict. The resolution of this conflict required the children to both inhibit the preferred choice of a pronoun for a recently mentioned target referent, and to use a more informative referential expression (a NP) instead for the benefit of their addressee. The unpredictability of an upcoming potential referential conflict necessitated a level of monitoring that we hypothesised would correlate with their cognitive control abilities as indexed by the performance on the Simon task.
We never found an interaction between language experience and cognitive control in the prediction of NP use, suggesting that cognitive control abilities conferred an advantage to both groups of children independently of bilingualism, contrary to our initial hypothesis. This could be because the bilingual advantage for cognitive control abilities in this group of children was modest (albeit significant, see also De Cat et al., Reference De Cat, Gusnanto and Serratrice2017). In our predictions we also hypothesised that whatever bilingual advantage there might be in cognitive control might be offset by bilingual children's lower proficiency skills. We did find, at least in analyses 1 and 3, that the degree of exposure to and use of the home language negatively correlated with NP use before controlling for language proficiency. In an additional set of analyses we investigated whether keeping language proficiency constant for the monolingual and the bilingual children might mitigate the proficiency disadvantage against the bilinguals. Using the mean performance of the monolingual children on the language proficiency task, we split the groups above and below the monolingual mean, and we did repeatedly find that those bilingual children who had language proficiency skills above the monolingual mean were no different from their monolingual counterparts in the use of informative NPs. They were however no better, as might be expected on the assumption of a bilingual advantage in cognitive control. The reason for this lack of bilingual advantage, once proficiency was controlled for, is likely to stem from the heterogeneity of our bilingual group. We deliberately had very broad selection criteria for the bilingual children in our recruitment schools so that we could include all of the children that were classified in the UK education system as having English as an additional language (EAL learners).
This resulted in children who differed vastly in the cumulative amount of input and output and in the range of languages spoken. As our understanding of the bilingual cognitive advantage is progressively refined we now know that a large number of variables, both at the level of the individual bilingual speakers and at the level of the tasks used (Mishra et al., 2012), can significantly affect the presence of said advantage. Among other things, language distance, interactional situations – i.e., the degree to which bilinguals use their two languages on a daily basis in conversational contexts (Green & Abutalebi, Reference Green and Abutalebi2013) – and immigrant status have all been shown to potentially play a role on the presence of a bilingual cognitive advantage (Bialystok, Reference Bialystok2017). In our sample we had a large range of typologically different languages that are more or less closely related to English (e.g., Swedish vs. Cantonese), and we did not collect information on children's daily pattern of interactional contexts, i.e., whether they were more likely to find themselves in single-language situations, dual-language situations, or in contexts with a high density of code-switching (see Green & Abutalebi, Reference Green and Abutalebi2013, for the role of interactional contexts on cognitive control). In the absence of this information we can therefore only speculate as to the precise nature of the lack of a bilingual advantage.
In relation to the experimental manipulations of the competitor, cognitive abilities did not predict sensitivity to the presence of a visual competitor, presumably because of young children's very high sensitivity to visual cues (which was unaffected by any participant-related factor), but they did interact with the discourse mention of a competitor. This correlation between cognitive control and choice of NP in the presence of a discourse competitor suggests that children with better conflict monitoring abilities could inhibit the prepotent response to use a pronoun for a referent that was highly salient to them and choose a more informative NP instead for the benefit of their addressee.
The significant effect of verbal WM in interaction with proficiency in analyses 1 and 2 indicates that in this linguistically complex referential communication task, children with a higher WM capacity and better language proficiency were more successful at using either a repeated, definite NP (analysis 1) or more informative expressions (analysis 2) for their listener. The lack of a significant effect for WM in analysis 3 shows that WM capacity did not correlate with discourse integration in more general terms.
Although both definite NPs and pronouns are anaphoric devices that refer back to an antecedent in the common ground, the use of pronouns in the absence of shared common ground suggests a lack of perspective-taking. In that case, the pronoun is anaphorically appropriate for the speaker but not for the listener. Choosing a referential expression purely from one's own privileged ground clearly does not necessitate the complex evaluation of two different scenarios (the speaker's and the listener's) and as such does not engage the same WM skills that are necessary when multiple points of view are considered. If children are using pronouns inappropriately because they are only considering the privileged ground, they are not making the “costly effort” of simultaneously considering their addressee's perspective, an attempt that would pose higher demands on their WM.
Support for the role of verbal WM in the production of expressions in referential communication tasks with child speakers comes from two studies (Torregrossa, 2017; Wardlow & Heyman, 2016) that included an independent measure of verbal WM in a referential production task in school-age children. Wardlow and Heyman (2016) investigated how feedback affects children's use of underinformative expressions (i.e., NPs lacking a disambiguating size adjective) and the role that WM plays in predicting their ability to actually use feedback to improve their perspective-taking and consequently use discourse-appropriate expressions for the benefit of a naïve instruction-follower. In their study WM was positively correlated with the use of a modifier (e.g., big in the big triangle) only in the feedback condition, although – despite the lack of a significant correlation in the no feedback condition – there was no significant difference in the strength of the two correlation coefficients. This suggests that WM does facilitate children's reliance on feedback to increase their awareness of which referential expressions are needed in the absence of shared common ground. At the same time this result does not exclude the possibility that WM might be implicated in perspective-taking skills and the use of discourse-appropriate referential expressions more widely. In contrast with Wardlow and Heyman's (2016) study – where children were only required to provide a definite NP with or without a modifying size adjective – and Nilsen and Graham (2009) – who did not find a predictive relationship in their production study – our sentence-level referential communication tasks were considerably more complex both visually and linguistically. 
The linguistic and perceptual complexity of the present experiment is likely to have been more taxing in terms of WM skills, which may explain our positive finding. From a computational point of view, Hendriks (2016) has recently made the case for the crucial role of WM in tracking referents and in the choice of referring expressions.
Language proficiency and WM interacted in analyses 1 and 2 to predict the use of a repeated definite NP (analysis 1) and the informativeness of referring expressions (analysis 2), but in analysis 3 there was no contribution of either WM or language proficiency. Children with a better mastery of definiteness distinctions in English (as indexed by the DELV Articles sub-test) were more likely to use a maximally informative referring expression. Higher proficiency was also likely to reflect children's ability to parse the preamble sentence and, although we did not have an independent measure of vocabulary, there is reason to expect that they were also more likely to have larger vocabularies that would include the referential labels used in the experiment (e.g., fireman, astronaut) or semantically related alternatives (e.g., the prince instead of the king). In analysis 3, proficiency did not appear to make a significant contribution, suggesting that it does not affect general discourse integration abilities in the age group studied here. More interestingly, when language proficiency was controlled for across the bilingual and the monolingual groups, the bilingual disadvantage disappeared. Once bilingual children functioned within the monolingual range they were just as adept as their monolingual counterparts in this complex referential communication task.
In addressing the first two aims of our study we can conclude that cognitive control and WM positively correlate with the ability to use informative referential expressions in a task that taps into the use of anaphoric devices. In particular, conflict monitoring interacted with the presence of a discourse competitor, the more demanding of the two experimental manipulations. The effect of bilingualism on referential abilities (as indexed in this task) is complex. On the one hand, it conferred a disadvantage: children with reduced experience in English generally used less informative labels for the target referent. On the other hand, bilingual children were no different from monolinguals once they were operating above the monolingual mean in terms of proficiency.
Building a situation model: the impact of competitors (from discourse or visual modalities)
At least two studies have previously used a referential communication task with children and measures of cognitive control skills and WM to explore the role of individual differences in perspective-taking and referential choice (Nilsen & Graham, 2009; Wardlow & Heyman, 2016). Neither of these studies however assessed the extent to which children can use anaphoric referential expressions in a sentential context; instead participants were simply required to use a colour or size adjective to disambiguate a referent for the benefit of a naïve listener. Our task was considerably more demanding. In addition to manipulating the linguistic mention and the visual presence of a competitor, our task also required children to parse a sentence containing an antecedent (e.g., The astronaut) that was embedded as the possessor in a genitive ’s-phrase (e.g., The astronaut's bike has been found (by a boy). And now… THE ASTRONAUT is cycling) and hence was not the syntactic subject of the sentence. The intended effect of not using a subject antecedent was to reduce the accessibility of the referent in the discourse. The reduced linguistic saliency of the target referent was also meant to increase the likelihood that the visual competitor – when present – would become part of the situation model. This expectation was based on studies on adults, who have been shown to take visual information into account (Fukumura et al., 2010), but only when the visual competitor is sufficiently salient (Arnold & Griffin, 2007). Finally, none of the previous studies addressed the role of bilingual language experience in referential communication.
A number of studies have investigated children's sensitivity to the discourse status of the referent and its visual availability to the addressee (Campbell et al., 2000; Demir, So, Özyürek & Goldin-Meadow, 2012; Graf, Theakston, Lieven & Tomasello, 2015; Matthews et al., 2006; Serratrice, 2008, 2013). By crossing linguistic mention and visual presence of a competitor in this study's design, we have been able to assess the relative and joint contribution of both factors to the speaker's discourse model.
Across our three analyses the repeated finding is that children were strongly influenced by the presence of a visual competitor, but much less by that of a discourse competitor. When looking at a scene with only one visually available referent, children were less likely to use a full NP than when two referents were visually present. In contrast, whether or not a discourse competitor had been mentioned in the preamble significantly affected NP use only in children with higher cognitive control skills. The lack of a significant interaction between the two experimental conditions in the first analysis shows that the mention of a discourse competitor did not increase the likelihood of NP use significantly above and beyond what was driven by the visual presence of a referent alone. This result differs from the findings for adult speakers by Fukumura et al. (2010), where both the visual presence and discourse mention of a competitor significantly affected the use of NPs, and where a trend towards an interaction suggested that the effects of linguistic mention and visual context were not independent. Children at the ages tested here appear to be much more sensitive to the visual modality than the discourse modality (De Cat, 2015). Taking the latter into account appears to have demanded a greater cognitive effort, as indicated by the significant interaction with the Simon score.
An additional factor that may explain the challenge of discourse mention for these children is the complexity of the preamble sentence, as discourse competitors were introduced in the by-phrase of a passive construction. The minimal assumption underlying the creation of a discourse model is that the linguistic input must be parsed and meaningfully understood, i.e., syntactic and thematic roles must be assigned as relevant. An agent appearing in a by-phrase is not as salient as an agent appearing in subject position (usually corresponding to the topic in English), or a patient appearing in object position (usually in focus) in a canonical active sentence. It is therefore possible that the syntactic position in which the competitor appeared decreased its salience so much that it became unlikely to interfere in any meaningful way with the saliency of the target referent. We know that English-speaking children have some difficulties with full passives into the early school years; truncated adjectival passives are comprehended and produced earlier than full actional passives including an agent in the by-phrase (Maratsos, Fox, Becker & Chalkley, 1985) and syntactic priming of full passives does not have long-lasting effects a week after training in 5-year-olds (Kidd, 2012). It may be that the NP in the by-phrase was not fully parsed in our task, or only superficially so in some form of shallow processing, further reducing the likelihood that it could be incorporated into the discourse model and lead to referential competition with the target. However, we did not find an interaction between proficiency and discourse competitor, which would be expected if our parsing hypothesis were along the right lines.
The finding that only the children who had better cognitive control skills produced more NPs when a competitor was mentioned speaks to the role of conflict monitoring skills in referential production. It also adds to the results of corpus studies, which have shown that even pre-school children use a more informative referential expression and/or omit fewer arguments when a referent has more than one potential antecedent (Allen, 2000; Clancy, 1992; Serratrice, 2005). The artificiality of our experimental task and the associated cognitive demands made it harder for children to be able to demonstrate these skills.
In contrast, and similarly to what has been found for adults, the salient visual presence of a competitor, whether it was linguistically mentioned or not, did affect children's use of NPs. This is evidence that, even in the absence of linguistic mention, a referent can become part of the discourse model for children as it does for adults. However, the lack of an interaction between visual and discourse information in the children's case is likely to be due to the primacy of visual information (De Cat, 2015).
Conclusion
The findings of this study point to a significant role of cognitive control, verbal WM capacity and language proficiency in accounting for individual differences in the choice of anaphoric referential expressions in both bilingual and monolingual children. They also shed some light on the complex interaction between cognitive control, language experience, and language proficiency. Given the heterogeneity of our sample we are at present not in a position to say what other factors that are integral to the bilingual language experience can further modulate this interaction. We deliberately chose a heterogeneous but representative sample of bilingual children in the kind of multilingual classroom that is nowadays common in many English-speaking countries. The downside of this approach is that we could not isolate and control for specific variables such as language distance, immigration status, and different types of interactional contexts. Future research should address these factors more systematically to further refine our understanding of how language experience shapes both the cognitive and linguistic dimensions of bilingual speakers.
Author ORCIDs
Ludovica Serratrice, 0000-0001-5141-6186
Acknowledgements
This research was funded by a grant from the Leverhulme Trust (RPG-2012-633), which is gratefully acknowledged. Special thanks to Sanne Berends for leading on data collection and coding, and to Furzana Shah for assistance with the data collection. Many thanks to Arief Gusnanto for statistical consultancy, and to the many schools who opened their doors to our project, to the participating children for their enthusiasm and their efforts, and to their parents for filling in lengthy questionnaires.
Appendix
The by-phrase in parentheses was included as part of the experimental sentence when the item was presented in the competitor-mentioned condition.
List of experimental and filler items
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210119073730145-0852:S1366728918000962:S1366728918000962_tabU1.png?pub-status=live)