Schilbach et al.'s stimulating target article, proposing the centrality of real-time encounters to social cognition, tallies with work from the field of linguistics that exposes the intricacy of spoken interaction and assigns it a dominant role in shaping both social cognition and overall language structure (Enfield & Levinson 2006). In this commentary, I point out some grammatical features of particular interest for social cognition, in the hope of developing subtler overall hypotheses about interactive social cognition as it plays out in verbal interaction.
Any comprehensive theory of social cognition, in neuroscience as in linguistics, obviously needs to draw on third-, first-, and second-person elements. With regard to the third person, the episodic side concerns how languages characterize events: grammatical categories encoding who benefits from an event, how obligations are created, whether actions are volitional, and whether agents achieve their goals. In addition, many languages grammatically encode more durable information about social relationships, such as kinship relations (Evans 2003) or the different types of possession and group affiliation relevant to social reasoning.
With regard to first-person accounts, the empathetic representation of others' experiences, beliefs, and intentions – staple theory of mind – is enabled by many grammatical devices. These include complement-taking attitude operators (John believes that …) and constructions that represent beliefs and intentions as (fictitiously) quoted speech: “he plans/wants to go to the river” becomes “he ‘I will go to the river’ saying-does” in many Papuan languages (Reesink 1993). Of particular interest here, given that the representation of others' psychological states is mediated and hypothetical, and hence never truly “first person,” are the many languages that have evolved means of representing the mental worlds of others through two person-perspectives at once (Evans 2006). Examples include the “logophoric pronouns” found in many West African languages, which present a third person's first-person perspective.
Though the grammar-derived metaphor of person categories employed by Schilbach et al. is useful and should generate fertile new research angles, as we pass to the second person we must note one caveat, where categories derived from European grammars might lead us astray. For many of the phenomena in Schilbach et al.'s article, we are really dealing with the interaction between two participants, a first and a second person, rather than with a second person per se; the exception is the special case covered in section 3.1.1 under the rubric of “Being addressed as you.”¹ Many non-Indo-European languages have four person categories rather than three, adding a “first-person inclusive” to denote the union of speaker and addressee. Projecting grammatical metaphors more precisely would draw our attention to possible differences between truly second-person and first-person inclusive phenomena in speaker-addressee interaction.
Languages are abundantly sensitive to the complexities of interpersonal interaction, which requires the simultaneous juggling of: (a) the alternating roles of two people as speaker and addressee, whereby “I becomes you in the address of the one who in turn designates himself as I” (Benveniste 1971, pp. 224–25); (b) footing between participants, such as intimacy or formality conditioning the choice between pronouns like German du and Sie; (c) management of mutual attention; and (d) the dynamics that follow from asymmetries in who knows what. I focus below on just (c) and (d).
Regarding (c), a phenomenon that is only beginning to come to linguists' attention is the category of engagement, which encodes the speaker's assessment of how far the hearer's attention is currently locked in with their own. Engagement can apply to either events or entities. In the Colombian language Andoke (Landaburu 2007), the choice of grammatical auxiliary encodes whether or not the speaker judges that the addressee is attending to the event being described. In Turkish (Özyürek & Kita, n.d., unpublished manuscript), there is a three-term demonstrative set: bu and o, like “this” and “that,” encode close to versus far from the speaker in situations where joint attention is already established, while an extra term, şu, is reserved for situations where joint attention is still being established.
Regarding (d), knowledge asymmetries between speaker and hearer – Heritage's (2012) “epistemic gradient” – are a potent driver of interactive cognitive coordination, realized most centrally through what Karcevski (1941) called “ignorative-deictic” systems. Such a system typically pairs a question word (where? when?) with a deictic response (there! then!) to adjust knowledge representations during interaction. As these English examples show, many languages exhibit tight formal resemblances between “ignorative” (≈ interrogative) and “deictic” forms. In English, the pairings are limited (there is no rhyming deictic counterpart of who or which). But in other languages, such as Japanese and Tamil, perfect formal proportions run through extensive systems organized around different epistemic domains (including many – like “in which manner” and “which side” – that are not obviously lexicalized in English). Japanese is particularly informative here, because its deictic series regularly opposes three values: near the speaker (k-initial, e.g., kore “this one”), near the addressee (s-initial: sore “that one near you”), and near neither (a-initial: are “that one [near neither of us]”), with a matching d-initial ignorative (dore “which one?”). We do not yet know whether this shapes different attentional strategies in English and Japanese demonstrative use.
Some epistemic asymmetries reflect the difference between what is subjectively knowable (e.g., feeling lonely) and what can be known by observation (e.g., giving outward signs of feeling lonely). Many languages, Japanese among them, employ different grammatical constructions for these two types. Interestingly, as interactants pass from statement (“I am lonely”) to question (“Are you lonely?”), the locus of “subjective authority” passes to the addressee, sanctioning the basic “private predicate” form for the second person (while it is no longer applicable to the first).
Ultimately, we must seek a model of social cognition that is equally informed by neuroscience and by linguistics. Studies of diverse grammatical systems and how they are used have the advantage of drawing on the variety of cognitively congenial systems evolved by different communities through time, systems that, by hypothesis, may reconfigure the brains of their speakers in subtly varying ways. This variation should form the subject matter for a second generation of second-person neuroscience, one that includes interaction with language structure as well as interaction with addressees.