Introduction
Schizophrenia is a psychiatric disorder associated with several abnormalities at the cognitive, behavioral and brain levels (Shenton et al. 2001; Wible et al. 2009). Recently, social cognition impairments have been reported in schizophrenia, including deficits in emotional prosody perception as observed in both behavioral and functional magnetic resonance imaging investigations (Shaw et al. 1999; Edwards et al. 2001; Kucharska-Pietura et al. 2005; Bozikas et al. 2006; Hoekert et al. 2007; Shea et al. 2007; Leitman et al. 2011). An association between inefficient prosody processing in schizophrenia and clinical symptomatology that included both positive (Rossell & Boundy, 2005; Shea et al. 2007) and negative symptoms (e.g. Leitman et al. 2005) was noted in several studies, while other studies did not find this association (e.g. Kucharska-Pietura et al. 2005). In addition, recent studies have highlighted the contributions of sensory auditory abnormalities (e.g. pitch perception) to prosody dysfunction in schizophrenia (e.g. Leitman et al. 2010).
Emotional prosody represents the non-verbal vocal expression of emotion and its perception is a multi-stage process (Schirmer & Kotz, 2006; Wildgruber et al. 2006; Paulmann & Kotz, 2008a, b; Paulmann et al. 2009). According to the three-stage model of prosody processing developed by Schirmer & Kotz (2006), the first stage (around 100 ms) is related to the sensory processing of the acoustic signal. It is followed by the detection of emotionally salient acoustic cues (around 200 ms), and by the cognitive evaluation of the emotional significance of the vocal information (after 300 ms). The first stage of emotional prosody processing is mediated by the bilateral secondary auditory cortex (e.g. Rauschecker, 1998; Hart et al. 2003), and is indexed by the N100 component (Paulmann & Kotz, 2008b; Paulmann et al. 2009). In fact, a recent study demonstrated that the sensitivity to emotional salience can occur as early as 100 ms and is indexed by N100 amplitude modulated by the emotional valence of non-verbal vocalizations (Liu et al. 2012).
The second stage recruits temporal areas, including the superior temporal gyrus and the anterior temporal sulcus (Kotz et al. 2003; Mitchell et al. 2003; Grandjean et al. 2005). Electrophysiologically, the P200 component was found to index the detection of emotional salience from speech stimuli (Paulmann & Kotz, 2008b; Paulmann et al. 2009). Finally, the last stage recruits frontal areas, including the inferior frontal gyrus and the orbito-frontal cortex (Buchanan et al. 2000; Gandour et al. 2003; Wildgruber et al. 2005). While the third stage cannot be probed with event-related potential (ERP) methodology, behavioral data can shed some light on these integrative processes (Paulmann & Kotz, 2008b; Paulmann et al. 2009). Importantly, these stages are reciprocally connected, in the sense that sensory stages make an impact on higher-order processes, and top-down mechanisms (e.g. attention) may modulate sensory processes, as demonstrated in both healthy subjects and schizophrenia patients (e.g. Ethofer et al. 2006; Leitman et al. 2010, 2011).
At perceptual and physical levels, emotional prosody is instantiated by intensity, pitch (fundamental frequency, f0), speech rhythm (duration of syllables and pauses) and voice quality/timbre (Schirmer & Kotz, 2006; Wildgruber et al. 2006). Each emotion seems to have a particular acoustic profile (Banse & Scherer, 1996). There are only two published ERP studies of prosody processing using a ‘naturalistic’ design, where sentences are delivered with either neutral or emotional intonation without introducing a discrepancy between sentence fragments, or between the message and the tone with which it is delivered (Paulmann & Kotz, 2008b; Paulmann et al. 2009). In both studies the ERP prosody effects were found within the P200 latency window, with no late latency components found sensitive to prosody changes.
In spite of consistently reported deficits in emotional prosody discrimination in schizophrenia, no ERP studies of prosody processing have been conducted in this population. In the present study we used a ‘naturalistic’ paradigm, similar to the studies by Paulmann & Kotz (2008b) and Paulmann et al. (2009), to investigate the temporal course of emotional prosody processing in schizophrenia. In line with previous studies in non-clinical subjects, we expected the prosody effects to be indexed by N100 and P200. Given the functional significance of these two components, we expected to probe the first two stages of prosody processing. Behavioral data were expected to provide an indirect probe of the third stage of prosody processing.
We used both sentences with semantic content (SSC) and ‘pure prosody’ sentences (PPS) where the semantic content was ‘unintelligible’. Prosodic information that is carried in a speech signal by the dynamic combination of different acoustic parameters normally co-exists with semantic information. However, it is not fully understood how lexical and supra-segmental features of the speech signal may interact to convey emotion and how such processes differ from the processes of extracting emotional information from a speech signal that carries supra-segmental information alone, in healthy controls and in schizophrenia.
Our primary hypothesis was that schizophrenia patients would show deficits at all three stages of prosody processing. Our secondary hypothesis was that the first two stages of prosody processing, indexed by N100 and P200, would be more impaired in the patient group in the SSC condition. Based on consistent reports of disrupted sensory processing of prosodic acoustic cues in schizophrenia (Leitman et al. 2005, 2011) and of abnormalities in processing language-specific cues (e.g. Niznikiewicz et al. 2010), we expected that prosodic abnormalities would be more pronounced for more complex speech stimuli containing lexical–semantic cues than for stimuli relying on purely prosodic cues.
Specifically, given previous reports of difficulties in prosody discrimination in schizophrenia that have been related to sensory abnormalities (e.g. pitch perception), we hypothesized a lack of ERP differentiation between different prosody types, as indexed by similar P200 amplitude to the three types of emotional prosody (Paulmann & Kotz, 2008b). Additionally, we expected that difficulties in evaluating the emotional significance of speech stimuli would be reflected in reduced accuracy in schizophrenia when compared with healthy control individuals.
Finally, given previous studies suggesting an association between deficits in emotional prosody discrimination and positive symptomatology in general (Poole et al. 2000), as well as with auditory verbal hallucinations in particular (Rossell & Boundy, 2005; Shea et al. 2007) or negative symptomatology (Leitman et al. 2005, 2010), we predicted that N100 and P200 amplitude would be associated with clinical scores.
Method
Participants
A total of 15 right-handed males diagnosed with chronic schizophrenia and 15 healthy male controls matched for age, handedness and parental socio-economic status participated in the experiment (Table 1). Comparison subjects were recruited through advertisements in local newspapers.
Data are given as mean (standard deviation) or number of participants. n.a., Non-applicable; WAIS, Wechsler Adult Intelligence Scale; IQ, intelligence quotient; PANSS, Positive and Negative Syndrome Scale; SAPS, Scale for the Assessment of Positive Symptoms; SANS, Scale for the Assessment of Negative Symptoms.
a Hollingshead Two-Factor Index of Social Position (Hollingshead, 1965).
b Wechsler (1997).
c Kay et al. (1987).
* Significant.
The inclusion criteria were: English as first language; right-handedness (Oldfield, 1971); no history of neurological illness; no history of a Diagnostic and Statistical Manual of Mental Disorders, fourth edition, text revision (DSM-IV) diagnosis of drug or alcohol abuse (APA, 2000); verbal intelligence quotient (IQ) above 75 (Wechsler, 1997); and no hearing, vision or upper body impairment. For healthy controls, an additional exclusion criterion was a history of psychiatric disorder in oneself or in first-degree relatives. Patients were diagnosed (and healthy controls screened) using the Structured Clinical Interview for DSM-IV for Axis I (First et al. 2002) and Axis II (First et al. 1995) disorders.
Before participating in the study, all participants had the procedures fully explained to them and read and signed an informed consent form, following Harvard Medical School and Veterans Affairs Boston Healthcare System guidelines.
Stimuli
Stimuli were 228 auditory sentences presented in a pseudo-randomized order: 114 sentences with semantic content spoken by a female speaker of American English with training in theatre techniques, and 114 ‘pure prosody’ versions derived from them (see below). The recordings were made in a quiet room with an Edirol R-09 recorder and a CS-15 cardioid-type stereo microphone (Edirol, USA), with a sampling rate of 22 kHz and 16-bit quantization.
All sentences had neutral semantic content, similar syntactic structure (subject + verb + object) and length (four words), and all started with a proper noun (see Appendix Fig. A1). A total of 38 sentences were spoken with happy, 38 with angry, and 38 with neutral intonation. Auditory stimuli were acoustically analysed using Praat (version 5.0.43; P. Boersma and D. Weenink, The Netherlands; http://www.fon.hum.uva.nl/praat/). Mean pitch, intensity and duration were compared across conditions (Fig. 1). There were no differences between emotional categories in mean intensity. However, significant differences across emotional categories were found for: minimum pitch (p < 0.0001), maximum pitch (p < 0.0001), mean pitch (p < 0.0001) and mean duration (p < 0.0001).
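For readers who wish to reproduce this type of acoustic profiling, a minimal sketch using the parselmouth Python interface to Praat is given below. The original measurements were made in Praat itself; the directory layout and file names here are hypothetical.

```python
# Minimal sketch of the acoustic measurements reported above, using the
# parselmouth interface to Praat (hypothetical paths; the original
# analysis was run in Praat 5.0.43 itself).
import glob

import parselmouth
from parselmouth.praat import call

def acoustic_profile(wav_path):
    snd = parselmouth.Sound(wav_path)
    pitch = snd.to_pitch()          # f0 contour
    intensity = snd.to_intensity()  # intensity contour (dB)
    return {
        "min_pitch_hz": call(pitch, "Get minimum", 0, 0, "Hertz", "Parabolic"),
        "max_pitch_hz": call(pitch, "Get maximum", 0, 0, "Hertz", "Parabolic"),
        "mean_pitch_hz": call(pitch, "Get mean", 0, 0, "Hertz"),
        "mean_intensity_db": call(intensity, "Get mean", 0, 0, "energy"),
        "duration_s": snd.get_total_duration(),
    }

# One profile per sentence; the per-category profiles can then be
# compared across the angry, happy and neutral recordings.
profiles = {f: acoustic_profile(f) for f in glob.glob("stimuli/angry/*.wav")}
```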
A total of 18 subjects (11 female) not participating in the ERP experiment assessed the emotional valence of the sentences. Angry sentences were rated as ‘angry’ by 94.08% (s.d. = 7.72), happy sentences were rated as ‘happy’ by 91.44% (s.d. = 7.54), and neutral sentences were rated as ‘neutral’ by 99.87% (s.d. = 0.85) of subjects. Only sentences appropriately rated by at least 90% of participants were included in the stimuli list (SSC condition). For the synthesis of sentences used in the PPS condition, we followed the procedure described in Pinheiro et al. (2011) (see Fig. 1). In order to ensure that the PPS condition was indeed devoid of semantic content, 10 volunteers listened to the PPS and none of them was able to identify semantic meaning in these sentences. These same volunteers (six female) rated the emotional valence of PPS. Angry PPS sentences were rated as ‘angry’ by 55% of subjects, happy PPS sentences were rated as ‘happy’ by 65% of subjects, and neutral PPS sentences were rated as ‘neutral’ by 89% of subjects.
Procedure
Each participant was seated comfortably at a distance of 100 cm from a computer monitor in a sound-attenuating chamber. The experimental session was divided into two blocks (block 1 – SSC; block 2 – PPS), each containing 114 pseudorandomized sentences (38 of each emotional prosody type). All sentences were presented binaurally through headphones at a sound level comfortable for each subject, and were not repeated during the experiment. Stimulus presentation, timing of events and recording of subjects' responses were controlled by the SuperLab Pro software package (2008; http://www.superlab.com/). Before each experimental block, participants were given brief training with feedback to make sure that they understood the instructions and became familiar with the task and the response box.
Before each sentence onset, a fixation cross was presented centrally on the screen for 1000 ms; it remained there during sentence presentation to minimize eye movements. At the end of the sentence, the cross was replaced by a blank screen and, 1500 ms later, a question mark appeared for 5 s (see Fig. 1 for an illustration of the task structure). Participants were asked to decide whether the sentence was spoken with a neutral, positive or negative intonation by pressing one of three keys (the order of the keys was counterbalanced across subjects). Each response key was marked with an emoticon to minimize working memory demands. A short pause was provided after every 57 sentences. No feedback was provided. The experimental session lasted 45 min.
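The trial structure can be summarized in code. The study itself used SuperLab Pro; the sketch below is a PsychoPy rendering of the same timing parameters, with hypothetical stimulus file names and response keys.

```python
# Illustrative PsychoPy rendering of one trial (the study used SuperLab
# Pro; stimulus file and response keys here are hypothetical).
from psychopy import core, event, sound, visual

win = visual.Window(color="black")
fixation = visual.TextStim(win, text="+")
question = visual.TextStim(win, text="?")

def run_trial(wav_path):
    fixation.draw(); win.flip()
    core.wait(1.0)                      # 1000 ms fixation before sentence onset
    sentence = sound.Sound(wav_path)
    fixation.draw(); win.flip()         # cross stays on during the sentence
    sentence.play()
    core.wait(sentence.getDuration())
    win.flip()                          # blank screen
    core.wait(1.5)                      # 1500 ms blank before the prompt
    question.draw(); win.flip()
    return event.waitKeys(maxWait=5.0,  # question mark shown for up to 5 s
                          keyList=["1", "2", "3"])

response = run_trial("stimuli/happy_01.wav")
```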
Data acquisition and analysis
Electroencephalogram (EEG) recording procedure
The EEG was recorded with 64 electrodes mounted on a custom-made cap (Electro-Cap International, USA), according to the modified expanded 10–20 system (American Electroencephalographic Society, 1991), using the BioSemi ActiveTwo system. The EEG was acquired in continuous mode at a digitization rate of 512 Hz, with a bandpass of 0.01 to 100 Hz. Data were re-referenced offline to the mathematical average of the mastoids. Horizontal and vertical electro-oculograms were recorded for eye movement and blink detection and rejection, via electrodes placed on the left and right temples and one below the left eye. Electrode impedances were kept below 5 kΩ.
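As an illustration of these acquisition and re-referencing parameters, a minimal MNE-Python sketch is given below; file and mastoid channel names are hypothetical, and the original data were processed in different software (see the next section).

```python
# Sketch of loading, filtering and re-referencing one BioSemi recording
# with MNE-Python (hypothetical file and mastoid channel names).
import mne

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)
raw.filter(l_freq=0.01, h_freq=100.0)             # 0.01-100 Hz bandpass
raw.set_eeg_reference(ref_channels=["M1", "M2"])  # average of the mastoids
```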
EEG data analysis
The EEG data were processed using BrainVision Analyzer software (Brain Products GmbH, Germany). EEG epochs containing eye blinks or movement artifacts exceeding ±100 μV were not included in individual ERP averages. After artifact rejection, at least 75% of trials per condition per subject entered the analyses. Separate ERPs for each condition were created for each participant. Averages were computed using a 200-ms pre-stimulus baseline, time-locked to sentence onset and spanning the length of a sentence (1800 ms). This approach was adopted following the existing ERP studies on emotional prosody processing using non-spliced sentences (Paulmann & Kotz, 2008b; Paulmann et al. 2009), to ensure the comparability of our results with previous studies. Additionally, as a careful examination of the distribution of f0 (which carries the majority of the prosodic information – e.g. Scherer, 1995; Banse & Scherer, 1996) illustrates, the first 300 ms of the sentence included all major f0 shifts contributing to prosody recognition.
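Continuing the sketch above, the epoching, baseline correction, artifact rejection and per-condition averaging described here could be expressed in MNE-Python as follows (event codes are hypothetical).

```python
# Sketch of epoching time-locked to sentence onset, with a 200-ms
# baseline, rejection of epochs exceeding +/-100 microvolts, and
# per-condition averaging (hypothetical event codes).
import mne

events = mne.find_events(raw)
event_id = {"neutral": 1, "happy": 2, "angry": 3}
epochs = mne.Epochs(raw, events, event_id,
                    tmin=-0.2, tmax=1.8,      # baseline to sentence length
                    baseline=(-0.2, 0.0),
                    reject=dict(eeg=100e-6),  # +/-100 microvolt criterion
                    preload=True)
evokeds = {name: epochs[name].average() for name in event_id}
```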
After inspection of the grand average waveforms, and following previous ERP studies of emotional prosody processing (Paulmann & Kotz, 2008b; Paulmann et al. 2009), the N100 and P200 were selected for analysis. The N100 was measured as the most negative data point between 100 and 200 ms post-stimulus, and the P200 as the most positive data point between 200 and 300 ms. Since maximal effects were observed at fronto-central sites (Fig. 2), consistent with previous reports (Paulmann & Kotz, 2008b), N100 and P200 were measured at frontal (Fz, F3/4) and central (Cz, C3/4) electrodes, and all of these electrodes were entered into the analyses reported below.
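These peak measures translate directly into code; a sketch continuing the MNE-Python example above, restricted to the analysed fronto-central channels, is given below.

```python
# Sketch of the peak measures: N100 as the most negative point between
# 100 and 200 ms, P200 as the most positive point between 200 and
# 300 ms, at the fronto-central channels analysed.
channels = ["Fz", "F3", "F4", "Cz", "C3", "C4"]

def peak_amplitudes_uv(evoked):
    ev = evoked.copy().pick(channels)
    n100 = ev.copy().crop(tmin=0.10, tmax=0.20).data.min(axis=1)  # per channel
    p200 = ev.copy().crop(tmin=0.20, tmax=0.30).data.max(axis=1)
    return {ch: (n * 1e6, p * 1e6)  # convert volts to microvolts
            for ch, n, p in zip(ev.ch_names, n100, p200)}

peaks = {name: peak_amplitudes_uv(ev) for name, ev in evokeds.items()}
```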
Statistical analyses
ERP and behavioral data
For ERP data, multivariate analyses of variance (MANOVAs) were computed on N100 and P200 peak amplitudes using SPSS 20.0 (SPSS Corp., USA), with group as a between-subjects factor and sentence condition (SSC, PPS) and emotion (neutral, happy, angry) as within-subjects factors; significant interactions were followed up for SSC and PPS separately. Within-group comparisons in both SSC and PPS served to characterize patterns of ERP differences in emotion processing in each group that were not captured in between-group comparisons. Finally, in order to test for hemispheric differences, we conducted an exploratory analysis of hemisphere (left: F3, C3; right: F4, C4) effects; the MANOVA model was identical to that described above, with hemisphere as an additional within-subjects factor. Accuracy data were subjected to MANOVAs with sentence condition and emotion as within-subjects factors, and group as a between-subjects factor.
For all significant main effects and interactions, pairwise comparisons were run using the Sidak correction.
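The analyses reported here were run in SPSS. As a simplified illustration of one follow-up model (emotion within subjects and group between subjects for a single sentence condition, with Sidak-corrected pairwise comparisons), a sketch using the pingouin package is given below; the long-format column names are hypothetical.

```python
# Simplified illustration of one follow-up model: a mixed ANOVA on N100
# amplitude for one sentence condition, with emotion within subjects and
# group between subjects, plus Sidak-corrected pairwise comparisons.
# Column names in the long-format data frame are hypothetical.
import pandas as pd
import pingouin as pg

df = pd.read_csv("n100_ssc_long.csv")  # columns: subject, group, emotion, n100
aov = pg.mixed_anova(data=df, dv="n100", within="emotion",
                     subject="subject", between="group")
posthoc = pg.pairwise_tests(data=df, dv="n100", within="emotion",
                            subject="subject", between="group",
                            padjust="sidak")
print(aov)
print(posthoc)
```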
Correlational analyses
Spearman's ρ correlations were computed in an exploratory analysis of the relationship between N100 and P200 amplitudes (at frontal and central electrodes) in both the SSC and PPS conditions and Positive and Negative Syndrome Scale subscale scores (Kay et al. 1987). In addition, we correlated N100 and P200 amplitudes with mean chlorpromazine-equivalent dosage and with illness duration, to test for effects of medication and chronicity, and with behavioral data, to test for the association between early sensory processes and behavioral indices of prosody recognition. All significance tests were two-tailed with a preset α level of 0.05.
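A minimal sketch of one such exploratory correlation, using scipy and hypothetical variable names:

```python
# Sketch of one exploratory Spearman correlation (hypothetical data
# frame with per-patient P200 amplitude and PANSS delusion scores).
import pandas as pd
from scipy.stats import spearmanr

patients = pd.read_csv("patients.csv")  # columns: p200_happy_ssc, panss_delusions
rho, p = spearmanr(patients["p200_happy_ssc"], patients["panss_delusions"])
print(f"Spearman rho = {rho:.2f}, two-tailed p = {p:.3f}")
```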
Results
ERP results
N100 amplitude
The omnibus MANOVA yielded a main effect of sentence condition (F(1,28) = 8.451, p < 0.01) and emotion (F(2,27) = 9.126, p < 0.01). Planned comparisons showed that N100 was more negative in SSC than in PPS (p < 0.01), and for happy relative to neutral sentences (p < 0.01). No hemispheric differences were observed (p > 0.05). Importantly, a group × sentence condition × emotion interaction was found (F(2,27) = 0.628, p = 0.051). We followed this interaction with MANOVAs for each sentence condition separately, and then for each emotion in each sentence condition separately.
In between-group comparisons for SSC, a main effect of group was found (F(1,28) = 5.101, p = 0.032); N100 was less negative in patients than in healthy controls. In particular, groups marginally differed for angry SSC (F(1,28) = 4.152, p = 0.051) and tended to differ for happy SSC (F(1,28) = 3.325, p = 0.079). No main effect of group or interaction involving the group factor was revealed in between-group comparisons for PPS.
In within-group comparisons for SSC, an emotion effect was observed in healthy controls (F(2,13) = 3.516, p = 0.024) – N100 was more negative for neutral relative to angry sentences – but not in the schizophrenia group (p > 0.05). Within-group comparisons for PPS showed an emotion effect both in healthy controls (F(2,13) = 41.922, p < 0.001) and in schizophrenia (F(2,13) = 8.615, p < 0.01). In both groups, N100 was less negative for neutral relative to both happy and angry prosody.
P200 amplitude
The omnibus MANOVA revealed a main effect of sentence condition (F(1,28) = 5.026, p = 0.033) and emotion (F(2,27) = 9.342, p < 0.01). No hemispheric differences were observed (p > 0.05). Importantly, emotion interacted with group (F(2,27) = 4.775, p = 0.017) and an effect of group approached significance (F(1,28) = 3.799, p = 0.061). We followed this interaction with separate MANOVAs for each emotion type. A group × sentence condition interaction was observed for angry prosody (F(1,28) = 4.60, p = 0.041); a more positive P200 for angry SSC was found in the schizophrenia group relative to healthy controls (p = 0.015). Additionally, a group effect was observed for happy prosody (F(1,28) = 5.897, p = 0.022); P200 was more positive in the schizophrenia group relative to healthy controls for both sentence conditions.
In within-group comparisons for SSC, an emotion effect was observed in both the healthy control (F(2,13) = 27.406, p < 0.001) and schizophrenia (F(2,13) = 5.452, p = 0.019) groups. However, differences were observed in the way the two groups extracted emotional salience from the acoustic signal. In healthy controls, P200 was more positive for neutral relative to both happy and angry sentences, and more positive for happy relative to angry sentences. In patients, P200 was more positive for both happy (p = 0.022) and neutral (p = 0.033) relative to angry prosody in SSC.
Within-group analyses for PPS showed an emotion effect in both groups (healthy controls: F(2,13) = 6.997, p < 0.01; schizophrenia group: F(2,13) = 13.332, p < 0.01). In spite of a larger P200 for happy PPS in schizophrenia, the profile of emotion effects was similar in both groups; P200 was less positive for neutral relative to both happy and angry prosody.
Behavioral results
The overall MANOVA showed an effect of sentence condition (F(1,28) = 137.439, p < 0.001) and emotion (F(2,27) = 17.012, p < 0.001); fewer correct responses were observed for PPS relative to SSC, and for angry relative to both happy (p < 0.001) and neutral (p < 0.01) sentences.
Groups differed in the number of correct responses (F(1,28) = 6.746, p = 0.015), with fewer correct responses in the schizophrenia group (mean = 24.71, s.d. = 8.49) than in healthy controls (mean = 29.7, s.d. = 5.49). When group performance was contrasted directly for each emotion type in each sentence condition with one-way ANOVAs, results showed higher error rates in the patient group for angry sentences in the SSC condition (F(1,28) = 7.515, p = 0.011) and neutral sentences in the PPS condition (F(1,28) = 6.367, p = 0.018), as well as a trend for reduced accuracy for angry sentences in the PPS condition (F(1,28) = 3.600, p = 0.068) (Fig. 3).
Correlations between ERP and behavior results
Correlations were found between N100 and P200 amplitudes and behavioral performance in both schizophrenia patients and healthy controls. In schizophrenia patients, a less negative N100 amplitude for neutral SSC was associated with higher error rates for neutral SSC, and a less positive P200 amplitude for angry SSC was associated with higher error rates for angry SSC. In healthy controls, a less negative N100 amplitude for angry PPS was associated with higher error rates for angry PPS.
Correlations between clinical scales and electrophysiological and behavioral results
Significant correlations were found between delusion scores and P200 amplitude for happy SSC and PPS: higher scores on the delusions scale correlated with more positive P200 amplitude for happy prosody in both the SSC and PPS conditions (Fig. 4). No correlations were found between ERP data and negative symptomatology, or between behavioral performance and clinical symptomatology (p > 0.05). In addition, N100 and P200 amplitudes were not modulated by medication dosage or illness duration (p > 0.05).
Discussion
Overall results
The results of this study provided insights both into the way prosody is processed in normal populations and into the differences between healthy and schizophrenia individuals. Both behavioral and electrophysiological results suggested that neutral prosody and emotional prosody are processed differently and that the presence of semantic information influences the way prosody is processed. Importantly, group differences spanned all three stages of prosody processing (Schirmer & Kotz, 2006), and interacted with the semantic status of sentences.
N100: sensory processing of the acoustic signal
The first component found to be sensitive to both prosodic manipulations and group membership was the N100, suggesting that sensitivity to prosodically relevant features exists already at the N100 level (see also Liu et al. 2012). This is the first study to report this result, for prosody rather than emotional sound processing, in both normal and schizophrenia groups: in previous ERP studies of prosody processing, the N100, though present, was not formally analysed.
A reduced N100 amplitude in SSC was observed in schizophrenia. Reduced N100, reflecting compromised function and structure of the auditory cortex in schizophrenia, has been reported in numerous studies (Force et al. 2008; Rosburg et al. 2008; Turetsky et al. 2009). Here, we report, for the first time, reduced N100 to complex prosodic cues. Importantly, the N100 reduction was not present across the board for all stimuli but was dependent on sentence condition and emotion, with N100 reduced in patients relative to healthy controls specifically for angry SSC. This finding fits with previous evidence demonstrating reduced activation in the limbic system for negative emotional sounds in schizophrenia patients with chronic auditory hallucinations (Kang et al. 2009). The low reactivity to anger in normal speech may be related to the experience of hallucinations and/or delusions, which often have an angry content and form (e.g. Garety et al. 2011). Even though no significant correlation was found between N100 amplitude for angry prosody and clinical symptoms, this relationship should be explored with a larger sample.
In contrast, patients' N100 amplitude in PPS did not differ from that of healthy controls. Note that, in PPS, the lexical–semantic information was absent due to the systematic distortion of the acoustic signal (see the Method section above). Thus, the specific properties of this sentence condition might have contributed to the pattern of observed differences.
Additionally, in patients, N100 amplitude did not distinguish between emotional and neutral prosody in SSC, even though a more negative N100 to emotional relative to neutral prosody in PPS was observed in patients, similarly to healthy controls. The finding of N100 group differences for normal speech supports previous studies indicating sensory contributions to impaired recognition of emotional prosody in schizophrenia, in particular deficits in pitch perception abilities (Leitman et al. 2010). However, the PPS findings qualify this conclusion by pointing to a dynamic interplay between bottom-up and top-down processes (e.g. memory predictions about an incoming stimulus) during sensory discrimination (Chandrasekaran et al. 2009; Diekhof et al. 2009; Krishnan et al. 2009; Schadow et al. 2009; Kumar et al. 2011; Marmel et al. 2011). Specifically, they point to the influence of top-down processes on the sensory processing of prosodic cues, suggesting that lexical–semantic information present in a non-distorted acoustic signal may interact with early sensory processes, making it more difficult for schizophrenia patients to process normal speech relative to a ‘pure prosody’ speech signal.
Analysed separately, healthy controls showed more negative N100 amplitude to neutral relative to emotional (i.e. angry) prosody in SSC, and to emotional (both happy and angry) relative to neutral prosody in PPS [the finding of increased N100 for neutral relative to emotional prosody in SSC is similar to the result reported by Liu et al. (2012) for non-verbal emotional vocalizations]. Given the role of N100 as an index of sensory processing and evidence suggesting that its amplitude is modulated by stimulus physical properties, the differentiation observed at this early stage is probably related more to differences in the sensory signal characteristics than to valence processing per se. Previous studies have shown increased N100 amplitude for higher-intensity auditory stimuli relative to lower-intensity stimuli (Connolly, 1993; Gonsalvez et al. 2007). In addition, f0 was found to modulate N100 amplitude and latency (Stufflebeam et al. 1998; Seither-Preisler et al. 2006).
P200: deriving emotional significance from acoustic cues
A larger P200 amplitude for angry SSC, and for happy SSC and PPS, was observed in schizophrenia patients when compared with healthy controls, suggesting abnormal detection of emotional salience in emotional auditory stimuli. The abnormalities found in happy prosody processing in both SSC and PPS may indicate a specific difficulty in extracting emotional salience from happy stimuli, as proposed previously (a deficit in positive emotion perception; Loughland et al. 2002).
Of note, a positive relationship was observed between abnormal processing of happy prosody in both SSC and PPS and delusion scores. The observed correlations are consistent with reports of impaired affect perception related to delusions (e.g. Rossell et al. 2010). Given that P200 amplitude is modulated by attention (decreased P200 amplitude seems to be related to increased attention; Crowley & Colrain, 2004), the increased P200 amplitude to positive social information in schizophrenia patients with higher delusion scores may represent a decreased capture of attention by happy prosody, contrasted with increased attention to perceived threat. A complementary hypothesis is that extracting emotional salience from an acoustic signal associated with happy prosody is more difficult than for angry prosody. This hypothesis is based on more recent studies suggesting an association between P200 amplitude and required cognitive effort (Lenz et al. 2007). The observed correlations thus support the hypothesis of a relationship between prosodic deficits and positive symptoms (Poole et al. 2000; Rossell & Boundy, 2005; Shea et al. 2007) and, in particular, the role of misattribution of emotional salience in delusions (Holt et al. 2006).
Analysed separately, healthy controls showed more positive P200 for neutral relative to emotional prosody in SSC, similarly to Paulmann & Kotz (2008b). This supports previous studies suggesting that, at the P200 level, auditory information is segregated into neutral and emotional prosody based on the salience of acoustic features (Schirmer & Kotz, 2006; Liu et al. 2012). However, differently from Paulmann & Kotz (2008b), a valence-specific effect was found: P200 was more positive for happy relative to angry prosody. This result suggests that the processes associated with deriving emotional significance from acoustic cues indexed by P200 distinguished between specific kinds of emotion. Differences in stimuli (e.g. number of emotion types) and task instructions may have contributed to the discrepancy between our findings and those of Paulmann & Kotz (2008b). In that study, participants did not explicitly judge the prosody type, but made decisions on a probe word that followed each prosodic sentence. The difference between implicit and explicit emotional prosody recognition may have contributed to the different results.
In healthy controls, emotion effects were also observed in PPS: P200 was more positive for emotional relative to neutral prosody. Similar P200 effects were reported by Sauter & Eimer (2010) and Liu et al. (2012) for non-semantic emotional vocalizations. Functionally, P200 is at the interface between sensory and cognitive processing and indexes initial categorization processes. The different profiles of categorizing neutral versus emotional prosody observed in SSC and PPS corroborate the idea that sensory cues are used differently in the two sentence conditions. They also support the idea that emotional salience is not derived from a single acoustic cue, but from a specific configuration of acoustic parameters that form an emotional auditory object or gestalt (Laukka, 2005; Schirmer & Kotz, 2006; Paulmann & Kotz, 2008b). The relative importance of each prosodic parameter is likely to change in the absence of intelligible semantic information.
Analysed separately, schizophrenia patients again revealed a pattern of prosody processing that differed from healthy controls and depended on the presence of semantic information. In contrast to healthy controls, in SSC, a similarly increased P200 amplitude was observed for neutral and happy sentences relative to angry sentences. Studies with healthy subjects suggest that P200 amplitude is modulated by both pitch (lower pitch associated with higher P200 amplitude) and attention (the higher the attention, the lower the P200 amplitude) (for a review, see Crowley & Colrain, 2004). Given the pattern of observed differences, it is likely that the differential engagement of attentional resources by each prosody type contributed to the specific P200 effects in each group. Thus, healthy controls may have deployed more attentional resources to both emotional stimuli, resulting in the reduced P200. In contrast, in schizophrenia, a reduction in P200 amplitude was selectively observed only for angry sentences which, as mentioned above, may be related to more attentional resources and/or less cognitive effort devoted to angry, relative to both happy and neutral, affect in SSC. These data may indicate that the processing of neutral and happy prosody is more difficult in schizophrenia, as suggested previously (Loughland et al. 2002; Holt et al. 2006; Seiferth et al. 2008), especially for speech signals carrying both prosodic and semantic information.
Behavioral response: cognitive evaluation of emotional significance
At the third stage of emotional prosody processing, the listener integrates the information provided by the physical properties of the stimuli with the meaning conveyed by linguistic (e.g. semantic) information so that cognitive judgements can be made (Schirmer & Kotz, 2006). As demonstrated by Paulmann & Kotz (2008b) and Paulmann et al. (2009), ERP data were insensitive to this stage of analysis. However, as suggested by Paulmann et al. (2009), behavioral results were informative in this respect. In all subjects, emotional recognition was better in SSC than in PPS. The absence of a memory representation for PPS may have increased task demands and made it more difficult to distinguish between different prosody types (Kotz et al. 2003). As suggested by previous studies (Kotz & Paulmann, 2007; Paulmann et al. 2009), the increased recognition rates for SSC over PPS may indicate that the availability of the semantic channel confers an emotional processing advantage and that semantics cannot be ignored even when the task focuses on emotional prosody only. In between-group comparisons, schizophrenia individuals committed more errors irrespective of sentence condition or emotional category, as reported in other studies (Bozikas et al. 2006; Shea et al. 2007). Thus, by the time a subject was ready to make a response, abnormalities were evident in processing all types of sentences and emotions.
In schizophrenia, an association was found between less negative N100 amplitude for neutral SSC and higher error rates in neutral SSC recognition. This may suggest that a decreased level of attentiveness to neutral SSC (indexed by a less negative N100) is related to higher error rates in recognizing neutral prosody in SSC. Additionally, a less positive P200 for angry SSC correlated with higher error rates for angry prosody recognition in SSC, an emotion for which patients showed higher error rates when compared with happy prosody discrimination. The finding that reduced P200 was associated with higher error rates for angry prosody recognition in SSC suggests that the ability to disengage attention from negative stimuli (which would be indexed by increased P200 amplitude) may, in fact, result in better performance (Green et al. 2003).
These associations suggest that abnormalities at early processing stages are indeed related to later decision stages, thus supporting the contribution of sensory abnormalities to neutral and angry prosody recognition in schizophrenia.
Together, these findings point to group differences in the integration of information derived from early perceptual analyses (N100 and P200) with the conceptual knowledge of emotions on which cognitive evaluation depends. Thus, abnormalities in the initial processes of extracting acoustic information and assigning emotional salience to an utterance might set the stage for difficulties in assigning emotional meaning and integrating it with semantic content, leading to errors in emotion recognition. Importantly, our data suggest reciprocal relationships between higher-order processes and sensory-based operations in bringing about prosody dysfunction in schizophrenia.
Limitations and future directions
The limitations include a chronic, medicated schizophrenia sample, raising the issue of a potential medication effect. However, the correlations between medication dosage and illness duration, on the one hand, and ERP and behavioral responses, on the other, were not significant. Also, deficits in the processing of emotional prosody have been observed in first-episode patients (Haskins et al. 1995; Edwards et al. 2001) and in children with schizophrenia (Baltaxe & Simmons, 1995), and tend to occur independently of medication (Kerr & Neale, 1993; Ross et al. 2001), arguing against the possibility that medication, duration of illness or institutionalization contributed to the results.
There is some evidence suggesting that deficits in emotional prosody perception tend to be dependent on schizophrenia subtype (Shea et al. 2007). Future studies should address emotional prosody processing in different schizophrenia subgroups.
Only males were studied in the current investigation, and some studies suggest that emotional perception deficits are worse in male than in female patients (Bozikas et al. 2006; Scholten et al. 2008). Therefore, future studies should include women.
Conclusions
Our findings elucidate, for the first time, the interaction between sensory and higher-order processes in bringing about prosody processing abnormalities in schizophrenia using ERP methodology. This is the first study to provide electrophysiological evidence of emotional prosody processing abnormalities in this population.
These findings have important implications for understanding social interactions in schizophrenia, since difficulties in decoding emotions from prosodic cues may contribute to difficulties in social reciprocity (Brekke et al. 2005) and to poor outcome (Green et al. 2000; Leitman et al. 2011).
Appendix
Acknowledgements
This work was supported by a doctoral grant (no. SFRH/BD/35882/2007) and a research grant (no. PTDC/PSI-PCL/116626/2010) from Fundação para a Ciência e a Tecnologia (FCT, Portugal) awarded to A.P.P., and by two grants from the National Institute of Mental Health (no. RO1 MH 040799 awarded to R.W.M. and no. RO3 MH 078036 awarded to M.A.N.). We gratefully acknowledge all the participants of this study. We are also grateful to Elizabeth Thompson and Israel Molina for their help with data acquisition.
Declaration of Interest
None.