We enthusiastically applaud the call for a second-person neuroscience as described in the target article by Schilbach et al., and are excited for the new insights this avenue of research will bring. In this commentary, we expand on the target article in two ways. First, we suggest increased emphasis on characterizing the differences in neural processing during an interaction as compared to observation. Second, we elaborate on the potential importance of this research to our understanding of autism.
Schilbach et al. argue that engagement in interpersonal interaction fundamentally changes cognitive and neural processing as compared to such processing during observation alone. For example, interpersonal interaction may recruit additional neural regions or systems that are not engaged during third-person observation. The authors present compelling preliminary evidence to support their theory, as well as many suggestions for future directions. Here we discuss several notable differences between observing a person and engaging with a person, which can make isolating the interaction component difficult. First, engagement with another involves a contingency (or back and forth) between participants rather than passive perception, and as such includes an element of action. Second, and related to the first, when a response is required (as is more common in interaction than in observation), attentional demands may be higher. Third, the stimuli used to elicit the feeling of being in an interaction have different low-level characteristics from those that signal no interaction. A final possibility is that there is something special about being engaged with another that goes beyond the simple differences described above.
Controlling for these differences is important, and although Schilbach et al. devote attention to this problem, they do not address whether stimulus characteristics secondary to the interaction could drive differences in neural processing. For example, in one study participants were presented with a face, oriented either towards or away from them, that made either communicative or arbitrary facial movements (Schilbach et al. 2006). Ventral medial prefrontal cortex (vMPFC) and amygdala regions were recruited to a greater extent for the communicative facial expressions directed towards the participant, but these regions may be sensitive to direct gaze and facial movement independent of social engagement. While these expressions are typically encountered in the context of an interaction, they are also seen in movies, TV, and pictures when the viewer is (presumably) detached. Although interaction is often a more ecologically valid social situation, it remains an open question how, once other factors are controlled for, interaction fundamentally changes the neural correlates of social processing.
We have begun to address this question (Redcay et al. 2010) by borrowing a method from developmental psychology (e.g., Kuhl et al. 2003; Murray & Trevarthen 1985) in which participants are engaged in a simple, highly scripted interaction that is conducted either via live video feed (“face-to-face”) or via video recording. The recorded conditions included one in which the same video from the live interaction was repeated and one in which a video of the experimenter from a different interaction was played. Crucially, participants were told to continue to play along in the recorded conditions even though the experimenter would not be able to see or hear them. These controls allowed for an examination of brain regions that were recruited during an interaction and that could not be accounted for by differences in stimulus properties. Comparison of live and recorded conditions revealed that largely the same set of brain regions was engaged in both. For example, robust recruitment was seen in the posterior superior temporal sulcus (STS) during both live and recorded conditions, which is not surprising given the STS's role in human action perception (e.g., Pelphrey et al. 2004; Saxe et al. 2004). Interestingly, the live condition showed increased activation of the posterior STS, and this activation extended more posteriorly into the temporo-parietal junction (TPJ), a region associated with theory of mind processing. Thus, this study offers support for differences in the magnitude of activation in brain systems between live, contingent interaction and non-contingent interaction when stimulus characteristics are held constant, and some support for fundamental differences in the brain systems recruited.
Future studies that control for action, attention, and stimulus characteristics (in addition to those proposed by Schilbach et al.) will be critical for disentangling where the “bookends” (sect. 4) begin and end; in other words, for determining how second- and third-person approaches to social cognition differentially affect neural patterns of activation.
Characterizing these “bookends” is especially important for understanding autism, a developmental disorder characterized by impairments in social interaction, particularly in the intentional coordination of attention with others, or joint attention (e.g., Charman 2003; Mundy & Newell 2007). However, offline laboratory-based tasks often fail to find deficits in joint attention behaviors (Nation & Penny 2008; Redcay et al. 2012). Similarly, tasks tapping into belief representations reveal fairly typical performance (e.g., Senju et al. 2009) and even typical neural patterns of activation (e.g., Dufour et al. 2012). One possibility is that these third-person studies fail to capture the challenges that a real-time social interaction poses for a person with autism. A recent study of ours (Redcay et al., in press) compared patterns of activation during a real-time joint attention game between high-functioning adults with autism and typical adults. Whereas typical adults demonstrated selective recruitment of the left posterior STS and dorsal medial prefrontal cortex (dMPFC) during joint, as compared to solo, attention, the participants with autism showed a pattern of reduced selectivity due to both hypoactivity during the joint conditions and hyperactivity in the solo condition. These data suggest a failure to modulate these brain regions according to whether the task required a social interaction. Importantly, the differential effects of second- versus third-person interaction might vary between typical and atypical populations, or change throughout development.
This presents a major challenge to our understanding of the neurobiology of social processing in autism, but we are optimistic that a continued second-person neuroscience approach will reveal the mechanisms underlying real-world social difficulties in autism.