
Can the Acoustic Analysis of Expressive Prosody Discriminate Schizophrenia?

Published online by Cambridge University Press:  02 November 2015

Francisco Martínez-Sánchez*
Affiliation:
Universidad de Murcia (Spain)
José Antonio Muela-Martínez
Affiliation:
Universidad de Jaén (Spain)
Pedro Cortés-Soto
Affiliation:
Universidad de Jaén (Spain)
Juan José García Meilán
Affiliation:
Universidad de Salamanca (Spain)
Juan Antonio Vera Ferrándiz
Affiliation:
Universidad de Murcia (Spain)
Amaro Egea Caparrós
Affiliation:
Universidad de Murcia (Spain)
Isabel María Pujante Valverde
Affiliation:
Universidad de Murcia (Spain)
*Correspondence concerning this article should be addressed to Francisco Martínez-Sánchez. Dpto. de Psicología Básica y Metodología. Facultad de Psicología. Universidad de Murcia. Campus Universitario de Espinardo. 30100. Murcia (Spain). E-mail: franms@um.es

Abstract

Emotional states, attitudes and intentions are often conveyed by modulations in the tone of voice. Impaired recognition of emotions from tone of voice (receptive prosody) has been described as a characteristic symptom of schizophrenia. However, the ability to express non-verbal information in speech (expressive prosody) has been understudied. This paper describes a useful technique for quantifying the degree of expressive prosody deficits in schizophrenia, using a semi-automatic method, and evaluates this method’s ability to discriminate between patient and control groups. Forty-five medicated patients with a diagnosis of schizophrenia were matched with thirty-five healthy comparison subjects. Production of expressive prosodic speech was analyzed using variation in fundamental frequency (F0) measures on an emotionally neutral reading task. Results revealed that patients with schizophrenia exhibited significantly more pauses (p < .001), were slower (p < .001), and showed less pitch variability in speech (p < .05) and fewer variations in syllable timing (p < .001) than control subjects. These features have been associated with «flat» speech prosody. Signal processing algorithms applied to speech were shown to be capable of discriminating between patients and controls with an accuracy of 93.8%. These speech parameters may have diagnostic and prognostic value and therefore could be used as a dependent measure in clinical trials.

Type
Research Article
Copyright
Copyright © Universidad Complutense de Madrid and Colegio Oficial de Psicólogos de Madrid 2015 

The difficulty in identifying and expressing emotional states is one of the major symptoms of schizophrenia as well as a predictor of social adjustment (Brazo, Beaucousin, Lecardeur, Razafimandimby, & Dollfus, 2014; Hooker & Park, 2002; Kee, Green, Mintz, & Brekke, 2003). This “flat” affective state is present in at least 66% of patients with schizophrenia (Trémeau et al., 2005).

Difficulty in identifying the emotional state of others from their facial expression (Kohler, Walker, Martin, Healey, & Moberg, 2010) and from their tone of voice (Hoekert, Kahn, Pijnenborg, & Aleman, 2007) has been well established as one of the main predictors of deterioration in all phases of the disorder: from the first episode (Horan et al., 2012), through its chronic course (Green et al., 2012), and even in high-risk states (Allott et al., 2014).

These difficulties are part of the alterations in social cognition (described as the ability to construct representations of the relationship between oneself and others, and to use them flexibly in behavior regulation; Adolphs, 2001), along with social perception, theory of mind, social knowledge and attributional style, which have been consistently linked to schizophrenia (Couture, Penn, & Roberts, 2006; Green & Horan, 2010; Kring & Ellis, 2013; Penn, Sanna, & Roberts, 2008).

Similarly, emotional expressive ability is also impaired (Cohen, Kim, & Najolia, 2013), although there is evidence (Kring & Moran, 2008) of a dissociation between expressiveness and emotional experience, since flat affect does not necessarily entail a reduction in emotional experience.

Prosody has been studied to a lesser extent than the ability to identify and express emotions facially. In speech, listeners perceive not only the changes in melody produced by variations in the frequency of opening and closing of the vocal cords, but also changes in rhythm, speed, intonation, pauses, intensity and other spectral alterations. These are perceived by the listener as melodic variations and interpreted subjectively as paralinguistic signals, essential for understanding and interpreting the utterance and for identifying the emotional and motivational state of the speaker (Patel, 2008).

Receptive prosody studies have shown that patients who are exposed to voice samples conveying different emotions, and are then asked which emotion they heard, show marked difficulty in identifying those emotions. An extensive meta-analysis (Hoekert et al., 2007), comprising twenty articles with a total of 663 patients, reported a significant effect size (d = –1.24) when comparing the performance of schizophrenic participants and controls.

Regarding expressive prosody, the studies that have collected voice samples and asked participants to encode different emotional states have concluded that there is a marked difficulty in expressing emotions verbally (Putnam & Kring, 2007). Hoekert et al.’s (2007) meta-analysis also found a significant effect size (d = –1.11) across the eleven studies reviewed, with data from 186 patients.

Most of the studies on expressive prosody have used tasks in which participants were asked to encode various emotional states perceived in voice (Hoekert et al., 2007); fewer studies have asked their participants to spontaneously narrate sad, happy and angry events they have experienced (Alpert, Rosenberg, Pouget, & Shaw, 2000; Shaw et al., 1999). These methods have, in our view, reduced the ecological validity of the study of expressive prosody, since high-intensity discrete emotional states (anger, sadness, fear, etc.) are rarely expressed in colloquial language in the manner described in these studies. In everyday social interaction, the affective state, intentions and attitudes are constantly and inexorably present in the tone of voice, though less explicitly. An alternative method is used by Cohen et al. (Cohen, Iglesias, & Minor, 2009; Cohen & Hong, 2011), consisting of analyzing the prosody of verbal responses triggered by the presentation of emotional stimuli from the “International Affective Picture System” (Lang, Bradley, & Cuthbert, 2005).

In contrast to these methods, only three studies have used emotionally neutral readings (Cohen, Alpert, Nienow, Dinzeo, & Docherty, 2008; Dickey et al., 2012; Leentjens, Wielaert, Harskamp, & Wilmink, 1998). To evaluate non-emotional expressive prosody in schizophrenia, this paper presents a potentially useful technique, based on an emotionally neutral text, that has proven useful in quantifying prosody in other disorders (Martínez-Sánchez, Meilán, Pérez, Carro, & Arana, 2012). The procedure consists of the semi-automatic analysis of the variations in the trajectory and perceived pitch height of the fundamental frequency (of the vocal cords’ vibrations) of the vocalic syllable nuclei, as this is the point of greatest loudness, using a purely acoustic basis to extract the harmonic peak without the need for phonetic segmentation. The data derived from the behavior of the F0 in the intensity of vocalic segments yield a complete melodic pattern of the speaker that shows significant changes in tone, both upward (prosodic peaks) and downward (prosodic valleys), within the syllabic nucleus, as well as between different nuclei (see Annex I for a description of the prosodic parameters used).

This procedure has many advantages over previously used prosodic analyses, since it increases the reliability and validity of results, speeds up the production of prosodic parameters and minimizes the influence of the coding skills of the subject, as it uses an emotionally neutral text. It also increases its ecological validity, as it is closer to the colloquial language used in everyday interactions. Finally, the procedure does not require phonetic segmentation, which virtually eliminates any errors the experimenter could commit in the process of quantification, as well as any differences in estimation between various experimenters.

In the present work, this procedure is used to objectively quantify the deficits in expressive prosody in schizophrenia, as well as to assess its power to discriminate between groups. It is hypothesized that, compared with the control group, the group of schizophrenia patients will show a significantly flatter prosodic profile, characterized by less variability in the dynamics and trajectory of the vocalic nuclei and in voice intensity, as well as by a greater number of pauses.

Method

Statistical design

A cross-sectional, analytical, observational and retrospective design was used.

Participants

A sample of 80 participants, divided into two groups, was recruited: 45 patients diagnosed with schizophrenia and 35 asymptomatic controls.

The group of patients (M age = 39.49, SD = 10.89; 71.1% male) was recruited from various Mental Health units of the Andalusian Health Service of the province of Jaen (Therapeutic and Rehabilitation Community of the Jaen area and the Mental Health Clinics of Martos and Andujar). All participants were evaluated using the clinical version of the Structured Clinical Interview (First, Spitzer, Gibbon, & Williams, 1996), following the criteria established in the DSM-IV-TR (2000). The average duration of the disorder, from its initial diagnosis, was 21.17 years (SD = 5.65), the mean number of relapses was 3.47 (SD = 1.89) and the mean time elapsed since the last relapse was 45.88 days (SD = 25.67). The average dose in chlorpromazine equivalent units was 669.88 mg/day (SD = 559.31).

Meanwhile, the control group (M age = 35.34, SD = 10.48; 62.9% male) was matched with the patient sample for age, sex and educational level, and was extracted from the same social environment as the patient sample. They had no history of mental or neurological disorders or drug or alcohol abuse, which were considered exclusion criteria.

Materials and Procedure

The Brief Psychiatric Rating Scale (BPRS; Lukoff, Nuechterlein, & Ventura, 1986) was used in the Spanish validation by Peralta and Cuesta (1994). The 0–4 point response range was used, as it increases inter-rater reliability (Bech, Larsen, & Andersen, 1988). The scale is composed of 18 items, to be administered by an experienced therapist after a semi-structured interview (15–25 minutes in length). Each item is scored on a Likert-type scale with 5 levels of intensity, where 0 represents the absence of the symptom and 4 represents extreme severity.

To record speech, a professional Fostex FR-2LE recorder was used, with a resolution of 24 bits and a 48 kHz sampling rate, together with a cardioid AKG D3700S microphone. Samples were edited using the Praat acoustic voice analysis program, version 5.1.42 (Boersma & Weenink, 2013). Annex I contains the definition of the various parameters used.

The study was conducted between June and December of 2013. All participants were adequately informed in order to sign their consent according to the protocols of the Bioethics Committees of the participating institutions. This study complied with the ethical principles of the Declaration of Helsinki for medical research involving human subjects. The procedure was performed in a single session, initially collecting socio-demographic information (age, marital status, etc.) and clinical data (age at onset of symptoms and diagnosis, duration of illness, number of admissions, time since last admission, etc.) as well as administering the BPRS. The doses of chlorpromazine equivalent units ingested by patients were also registered. Subsequently, speech recordings were made.

The task entailed reading the first paragraph (405 syllables) of “Don Quijote” by Miguel de Cervantes. The recordings were performed in a silent room (but not acoustically isolated), placing the microphone 8 cm away and at a 45° angle from the participant’s mouth in order to prevent any aerodynamic noise.

To quantify prosodic patterns, the automatic prosodic transcription of the recordings was performed using the algorithms implemented by Mertens (2004) in the Praat program (Boersma & Weenink, 2013). The prosodic speech profile was estimated by analyzing the variations in the trajectory and perceived pitch height (prosodic peaks and valleys) of the F0 of the voiced vocalic syllable nuclei, within a window around the intensity peak delimited at –3 dB to the left and –9 dB to the right, in order to represent the melodic movements perceived by the human ear. The left limit (–3 dB) eliminates most of the microprosodic disturbances and stylizes the beginning of the syllable, while the right limit (–9 dB) preserves the variations in tone of accented vowels.
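
The delimitation of a syllable nucleus around its intensity peak can be illustrated with a minimal Python sketch. This is not the implementation by Mertens (2004); the function name `nucleus_bounds`, the frame-wise list of intensity values and the sample contour are hypothetical and serve only to show how asymmetric –3 dB and –9 dB limits behave.

```python
def nucleus_bounds(intensity_db, left_drop=3.0, right_drop=9.0):
    """Return (left, peak, right) frame indices of a syllable nucleus.

    Simplified illustration: starting from the intensity peak, the nucleus
    is extended leftwards while intensity stays within `left_drop` dB of the
    peak, and rightwards while it stays within `right_drop` dB of the peak.
    """
    if not intensity_db:
        raise ValueError("intensity contour is empty")
    # Index of the frame with the highest intensity.
    peak = max(range(len(intensity_db)), key=lambda i: intensity_db[i])
    peak_db = intensity_db[peak]

    left = peak
    while left > 0 and intensity_db[left - 1] >= peak_db - left_drop:
        left -= 1

    right = peak
    last = len(intensity_db) - 1
    while right < last and intensity_db[right + 1] >= peak_db - right_drop:
        right += 1

    return left, peak, right


# Hypothetical intensity contour (dB) for one syllable, one value per frame.
contour = [50, 56, 58, 60, 59, 55, 52, 49]
bounds = nucleus_bounds(contour)
print(bounds)
```

With this contour the left limit stops one frame before the peak, while the more permissive right limit keeps three frames after it, which mirrors the asymmetry described in the text.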

In this paper, a detection range for the F0 of 65–650 Hz was established for 0.005 s windows, and the following thresholds were set for the automatic segmentation and stylization algorithm: Glissando = 0.32/T², DG = 30, dmin = 0.05. To determine the presence of a vowel, a threshold of 0.32/T² semitones was allocated, where T is the duration of the vowel in seconds. If the rate of change of pitch is higher than the threshold defined by the perceptual values of voice detection, a value proportional to the glissando threshold (continuous sliding of the melodic line within the same syllable) was assigned, whereas if the value is lower than the threshold, the median of the voice sample analyzed was assigned. It should be noted that while the standard psychoacoustic threshold for isolated vowels is G = 0.16/T², voice flow is rarely linear during natural speech; hence, the value assigned in the present study has been shown to model voice variations more adequately, especially in automatic transcription.
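
As a worked illustration of the glissando threshold, the following Python sketch compares the rate of pitch change within a nucleus against G = 0.32/T² and against the standard value G = 0.16/T². It assumes that the threshold is expressed in semitones per second, as is conventional for this psychoacoustic measure; the function names and the example values are hypothetical and do not come from the study.

```python
def glissando_threshold(duration_s, coefficient=0.32):
    """Glissando threshold G = coefficient / T**2 (assumed unit: ST/s)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return coefficient / duration_s ** 2


def is_audible_glide(pitch_change_st, duration_s, coefficient=0.32):
    """True if the pitch movement exceeds the glissando threshold.

    pitch_change_st: pitch change across the nucleus, in semitones.
    duration_s: duration of the nucleus, in seconds.
    """
    rate = abs(pitch_change_st) / duration_s  # semitones per second
    return rate > glissando_threshold(duration_s, coefficient)


# Hypothetical 0.2 s nucleus: the threshold is about 8 ST/s with 0.32/T**2
# and about 4 ST/s with the standard 0.16/T**2.
print(is_audible_glide(4.0, 0.2))                    # 20 ST/s exceeds both
print(is_audible_glide(1.0, 0.2))                    # 5 ST/s, below 0.32/T**2
print(is_audible_glide(1.0, 0.2, coefficient=0.16))  # 5 ST/s, above 0.16/T**2
```

The last two calls show the practical effect of the higher coefficient: a small pitch movement that would count as a glide under the threshold for isolated vowels is treated as a level tone under the value used here.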

Data analyses

The IBM SPSS (version 21) statistical package was used. Student’s t test for independent samples was used to define the differential prosodic profile between groups, and Pearson correlations were used to assess the relationship between variables. Finally, a discriminant analysis was performed to assess the ability of the prosodic variables to classify subjects into the two groups.
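
For readers unfamiliar with the first of these tests, a minimal Python sketch of Student’s t statistic for two independent samples (pooled variance) is given below. The analyses in this study were run in SPSS; the function name `student_t` and the pause counts are hypothetical and are not study data.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Return (t, df) for Student's t test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # unbiased sample variance
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical number of pauses per reading (illustrative values only).
patients_pauses = [62, 70, 58, 75, 66]
controls_pauses = [41, 45, 39, 50, 44]
t_value, df = student_t(patients_pauses, controls_pauses)
print(f"t({df}) = {t_value:.2f}")
```

The p value would then be obtained from the t distribution with `df` degrees of freedom, which is not computed in this sketch.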

Results

Different statistical tests were conducted to check for statistically significant intergroup differences in the sociodemographic variables. No differences were observed for age (t(79) = –1.76; p = .091), educational level (measured in months of schooling; t(79) = 1.74; p = .085), or the distribution of sex per group (χ² = .611; p = .434).
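
The sex-distribution test can be reproduced approximately from the reported percentages. The sketch below assumes that 71.1% of 45 patients corresponds to 32 men and 62.9% of 35 controls to 22 men (counts inferred by rounding, not stated in the text) and computes Pearson’s chi-square without continuity correction; the function name is hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p is the upper-tail probability for 1 df.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the survival function reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: patients, controls. Columns: male, female (inferred counts).
sex_by_group = [[32, 13], [22, 13]]
chi2, p_value = chi_square_2x2(sex_by_group)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")  # chi2 = 0.611, p = 0.434
```

Under these assumed counts the result agrees with the reported values (χ² = .611; p = .434).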

The mean comparisons show significant intergroup differences in all the prosodic variables except those dependent on the fundamental frequency (F0) (Table 1). The main prosodic parameters (valleys, prosodic dynamics, inter- and intra-syllabic and phonation trajectories) of the schizophrenia patient group were significantly lower than those of the control group (Figure 1), showing, in general, speech that was prosodically sparser and melodically flatter than that of controls (Figure 2). Moreover, patients spent more time performing the task, made a greater proportion of pauses and exhibited a significantly lower voice intensity, accentuating the perception of dysprosody.

Table 1. Descriptive statistics and mean comparison tests of the analyzed prosodic parameters

Note: dB = Decibels; Hz = Hertz; F0 = Fundamental frequency; ST/s= Semitones/second.

Figure 1. Prosodic trajectories of syllable nuclei for both groups.

Figure 2. Example of prosogram from a patient with schizophrenia and from a control participant.

As expected, education level significantly affected the performance of the task in the control group. For example, in the schizophrenia group, the higher the educational level, the lower the time spent on reading the text and the lower the proportion of pauses performed. It should be noted that for a test to be used for screening, it must be scarcely sensitive to the effects of education in the pathological group; the poor correlation between prosodic variables and educational level in the patient group for this task can be appreciated in Table 2.

Table 2. Correlations between prosodic variables and months of schooling

Note: + p < .006 (Bonferroni’s correction).

The number of years of chronicity of the disorder was the clinical variable most strongly associated with prosodic variables in the schizophrenia patients group: the greater the number of years elapsed since diagnosis, the smaller the intrasyllabic (r = –.377; p = .028) and phonation (r = –.422; p = .013) trajectories. Similarly, the time elapsed since the last relapse correlated with the phonation trajectory (r = .404; p = .018). Drug treatment did not induce significant changes in prosody, although a trend was observed suggesting that the higher the dose of chlorpromazine equivalents, the less time was spent on the reading task (r = –.280; p = .063) and the fewer pauses were made (r = –.251; p = .091). Moreover, the BPRS scores were not significantly correlated with the prosodic variables, although the positive symptoms scale score correlated negatively with the intensity of voice (r = –.346; p = .021). The scale’s item that evaluates blunted affect did not correlate significantly with any prosodic parameter; its highest correlation was with the “Intrasyllabic trajectory” variable (r = –.219; p = .118). No other significant correlations were found between the total scores of the BPRS scale and the prosodic parameters evaluated, or between these parameters and the scores obtained in the positive and negative symptoms subscales.

Finally, a discriminant analysis was performed to assess the ability of the prosodic parameters studied to distinguish between subjects in the two groups. The canonical discriminant function explained 100% of the variance (canonical correlation = .828; Wilks’ λ = .314; χ²(14) = 82.16; p < .001), with Intensity, Intrasyllabic trajectory, Total length, Phonation trajectory and Pause rate being the variables with the greatest discriminating power (with standardized canonical discriminant function coefficients of .630, .541, –.403, .365 and –.344, respectively). Conversely, the variables dependent on the fundamental frequency (M F0 = –.003; F0 Range = –.026; F0 SD = .077) showed the least discriminatory power. Overall, the discriminant analysis allowed for the correct classification of 93.8% of the original data (Table 3); in the cross-validation analysis, 87.5% of cases were correctly classified.
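
The reported discriminant statistics are mutually consistent, as the following Python sketch suggests. It relies on two standard relations for a single discriminant function: Wilks’ λ = 1 − Rc², and Bartlett’s chi-square approximation χ² = −[N − 1 − (p + g)/2] ln λ. The values N = 80 cases, p = 14 predictors (taken from the degrees of freedom) and g = 2 groups are assumptions drawn from the text, and the function names are hypothetical.

```python
import math


def wilks_lambda(canonical_r):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 - canonical_r ** 2


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda."""
    factor = n_cases - 1 - (n_predictors + n_groups) / 2
    return -factor * math.log(lam)


lam = wilks_lambda(0.828)
chi2 = bartlett_chi_square(lam, n_cases=80, n_predictors=14, n_groups=2)
print(f"lambda = {lam:.3f}, chi2 = {chi2:.2f}")
```

Starting from the rounded canonical correlation of .828, the sketch recovers λ ≈ .314 and a chi-square value within rounding error of the reported 82.16.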

Table 3. Percentage of correctly classified cases in the discriminant analysis

Note: 93.8% of the original cases were classified correctly.

Discussion

The objective of this research was to study the differential expressive prosodic patterns in a group of schizophrenic patients and in a group of asymptomatic subjects using an unemotional text as the reading task. The results show marked intergroup differences, with the clinical group exhibiting slow, low-intensity speech with many pauses. Moreover, this group’s dysprosody is characterized by scarce tone changes, both within the syllable nucleus and between adjacent syllabic nuclei. All these characteristics yield “emotionally flattened” speech (Trémeau, 2006).

These results concur with those obtained in several studies. Firstly, the low dynamics of both the inter- and intra-syllabic nuclei observed in our results, although not previously studied, are the result of the scarce changes in F0. This is consistent with previous studies (Alpert et al., 2000) reporting that schizophrenic speech is characterized by few inflections. Secondly, the large number of pauses has also been previously identified as a characteristic of the disease (Alpert, Kotsaftis, & Pouget, 1997; Clemmer, 1980; Cohen & Elvevåg, 2014; Cohen, Mitchell, & Elvevåg, 2014). They can be attributed to the reduction in verbal fluency and in the discrimination of phonemes, processes that are altered in schizophrenia (Johnson-Selfridge & Zalewski, 2001; Kugler & Caudrey, 1983). Thirdly, the voice signal is less intense (Pascual, Solé, Castillón, Abadía, & Tejedor, 2005), which also accentuates the perception of dysprosodic speech. Finally, these subjects take longer to complete the task, evidencing the limited cognitive resources available to perform a complex task such as reading (Cohen, McGovern, Dinzeo, & Covington, 2014; Melinder & Barch, 2003).

The parameters dependent on the fundamental frequency (mean, standard deviation and range) showed no differences between groups. This is not unexpected, as even though these measures contribute to variations in tone, they are microprosodic measures, as opposed to the suprasegmental parameters studied, which have been shown to be significant intergroup discriminators.

The observed differences in the variations in the syllabic dynamics of the vocalic nucleus are especially noteworthy, as they show a differential prosodic profile in both groups regarding their melodic slopes. Thus, although the percentage of upward vocalic nuclei (expressed by prosodic peaks) did not differ between the groups, the downward vocalic nuclei (prosodic valleys) were significantly lower in the group with schizophrenia. In the Spanish language, declarative intonation is typically downward, fully explaining the above-mentioned, while the interrogative intonation is, however, generally upward, presenting an incomplete utterance (Cantero, 2003). As the text used in the task was a declarative text, the clinical group’s intonation was clearly inadequate, a fact that the listener may perceive as discordant with the meaning of the sentence being read.

The clinical variables have proven to be of little significance in expressive prosody. Our results show a correlation between the years of chronicity of the disorder and the impairment of intrasyllabic and phonation trajectories. Although there is a lack of data from other investigations for comparison, it is known that the ability to identify facial and prosodic expressions declines with the duration of illness (Kucharska-Pietura, David, Masiak, & Phillips, 2005; Silver, Goodman, Knoll, Isakov, & Modai, 2005; Silver, Shlomo, Turner, & Gur, 2002). Drug treatment did not induce changes in prosody beyond a non-significant trend in the duration of the task and the number of pauses; likewise, Hoekert et al.’s (2007) meta-analysis does not verify the existence of any significant relationships between these variables.

The relationship between the increase in positive symptoms and the reduction in the intensity of the voice is paradoxical, because one would expect negative symptoms to be related to a lower intensity of voice (considering it an indicator of flat prosody), as reported by Cohen and Hong (2011), although their results did not reach statistical significance, while positive symptoms would be expected to correlate with an increase in verbal fluency. It is known that positive symptoms correlate with difficulty in discriminating fundamental frequency changes in sentences with emotional content (Matsumoto et al., 2006). Similarly, patients who experience hallucinations and delusions show alterations when identifying changes in the tone of voice (Johns et al., 2001).

Finally, the scores of the BPRS subscales did not yield any significant correlations with prosody. The limitations of these scales are well known (Cohen & Elvevåg, 2014; Nicholson, Chapman, & Neufeld, 1995). Even though they are able to identify differences of up to six standard deviations when comparing negative symptoms of patients and controls (Emmerson et al., 2009), they are relatively insensitive to changes in the patient’s condition and they induce response biases that make it difficult for even trained evaluators to notice specific aspects of behavior related to alogia and blunted affect within the patient’s speech (Alpert, Shaw, Pouget, & Lim, 2002). On the other hand, our results coincide with those reported by Cohen et al. (2013), who found no relationship between the symptoms of a schizophrenic group and variables related to expressive prosody (number of pauses and variability of F0).

While the slow performance of the task (duration and rate of pauses) is conditioned by educational level in both groups (though to a greater extent in the clinical group), the variables related to the trajectory of the syllabic nuclei exhibit no relationship to educational level in either group, showing that the prosody results are scarcely sensitive to cultural level, in line with the results reported by Leentjens et al. (1998).

The procedure used has proven to be, in our view, a valid and reliable alternative for accurately recording non-emotional expressive prosody in colloquial speech, as its results can be related to patients’ everyday interactions with their environment. The small number of existing studies using non-emotional stimuli to assess expressive prosody in schizophrenia is surprising (Cohen et al., 2008; Dickey et al., 2012; Leentjens et al., 1998).

Furthermore, the procedure provides numerous advantages over those traditionally employed. It allows the degree of dysprosody to be quantified objectively in a fast and non-intrusive way, without causing discomfort to the patient. Additionally, the acoustic analyses are highly sensitive to changes in the voice; therefore, they are potentially useful in the study of the evolution of the disorder and in evaluating drug treatments. The discriminating ability achieved (93.8%) is higher than that achieved by Kliper et al. (80.95%; Kliper, Vaizman, Weinshall, & Portuguese, 2010).

This study has several limitations. Firstly, patients were medicated, making it impossible to assess the differential effect of drug treatment on prosody; however, the correlations obtained seem to rule out any connection. Secondly, although several variables that can modulate the results (age, education level, etc.) have been evaluated, it is necessary to expand the number of variables that can influence the results. Thirdly, the possible existence of differences in prosody dependent on the demand of the task should be researched, since the cognitive demand required while reading a neutral text is very different from that required by speech in response to stimuli of an emotional character. The differences in the demands imposed by the tasks can potentially explain the disparity in the results obtained in various investigations, since cognitive resources particularly determine speech production in patients with schizophrenia.

Future studies, with an extended sample, may shed more light on the utility of this procedure and its clinical implications, especially in longitudinal studies. The results obtained support the use of the procedure; however, the refinement of acoustic analysis algorithms should be sought in order to achieve higher levels of discriminatory power between groups.

Annex I. Definition of the parameters used

References

Adolphs, R. (2001). The neurobiology of social cognition. Current Opinion in Neurobiology, 11, 231–239. http://dx.doi.org/10.1016/S0959-4388(00)00202-6
Allott, K. A., Schäfer, M. R., Thompson, A., Nelson, B., Bendall, S., Bartholomeusz, C. F., … Amminger, G. P. (2014). Emotion recognition as a predictor of transition to a psychotic disorder in ultra-high risk participants. Schizophrenia Research, 153, 25–31. http://dx.doi.org/10.1016/j.schres.2014.01.037
Alpert, M., Kotsaftis, A., & Pouget, E. R. (1997). At issue: Speech fluency and schizophrenic negative signs. Schizophrenia Bulletin, 23, 171–177.
Alpert, M., Rosenberg, S. D., Pouget, E. R., & Shaw, R. J. (2000). Prosody and lexical accuracy in flat affect schizophrenia. Psychiatry Research, 97, 107–118. http://dx.doi.org/10.1016/S0165-1781(00)231-6
Alpert, M., Shaw, R. J., Pouget, E. R., & Lim, K. O. (2002). A comparison of clinical ratings with vocal acoustic measures of flat affect and alogia. Journal of Psychiatry Research, 36, 347–353. http://dx.doi.org/10.1016/S0022-3956(02)00016-X
Bech, P., Larsen, J. K., & Andersen, J. (1988). The BPRS: Psychometric Developments. Psychopharmacology Bulletin, 24(1), 117–121.
Boersma, P., & Weenink, D. (2013). Praat: Doing phonetics by computer (5.1.42 version). [Computer software]. Amsterdam, The Netherlands: University of Amsterdam. Retrieved from http://www.praat.org/
Brazo, P., Beaucousin, V., Lecardeur, L., Razafimandimby, A., & Dollfus, S. (2014). Social cognition in schizophrenic patients: The effect of semantic content and emotional prosody in the comprehension of emotional discourse. Frontiers in Psychiatry, 5, 120. http://dx.doi.org/10.3389/fpsyt.2014.00120
Cantero, F. J. (2003). Fonética y didáctica de la pronunciación [Phonetics and Didactics of pronunciation]. In Mendoza, A. (Ed.), Didáctica de la Lengua y la Literatura para Primaria [Didactics of the language and literature for elementary]. (pp. 124). Madrid, Spain: Pearson.Google Scholar
Clemmer, E. J. (1980). Psycholinguistic aspects of pauses and temporal patterns in schizophrenic speech. The Journal of Psycholinguistic Research, 9, 161185. http://dx.doi.org/10.1007/BF01067469 Google Scholar
Cohen, A. S., & Elvevåg, B. (2014). Automated computerized analysis of speech in psychiatric disorders. Current Opinion in Psychiatry, 27, 203209. http://dx.doi.org/10.1097/YCO.0000000000000056 Google Scholar
Cohen, A. S., & Hong, S. L. (2011). Understanding constricted affect in schizotypy through computerized prosodic analysis. Journal of Personality Disorders, 25, 478491. http://dx.doi.org/10.1521/pedi.2011.25.4.478 Google Scholar
Cohen, A. S., Alpert, M., Nienow, T. M., Dinzeo, T. J., & Docherty, N. M. (2008). Computerized measurement of negative symptoms in schizophrenia. Journal of Psychiatry Research, 42, 827836. http://dx.doi.org/10.1016/j.jpsychires.2007.08.008 CrossRefGoogle ScholarPubMed
Cohen, A. S., Iglesias, B., & Minor, K. S. (2009). The neurocognitive underpinnings of diminished expressivity in schizotypy: What the voice reveals. Schizophrenia Research, 109, 3845. http://dx.doi.org/10.1016/j.schres.2009.01.010 CrossRefGoogle ScholarPubMed
Cohen, A. S., Kim, Y., & Najolia, G. M. (2013). Psychiatric symptom versus neurocognitive correlates of diminished expressivity in schizophrenia and mood disorders. Schizophrenia Research, 146, 249253. http://dx.doi.org/10.1016/j.schres.2013.02.002 Google Scholar
Cohen, A. S., McGovern, J. E., Dinzeo, T. J., & Covington, M. A. (2014). Speech deficits in serious mental illness: A cognitive resource issue?. Schizophrenia Research, 160, 173179. http://dx.doi.org/10.1016/j.schres.2014.10.032 Google Scholar
Cohen, A. S., Mitchell, K. R., & Elvevåg, B. (2014). What do we really know about blunted vocal affect and alogia? A meta-analysis of objective assessments. Schizophrenia Research, 159, 533538. http://dx.doi.org/10.1016/j.schres.2014.09.013 Google Scholar
Couture, S. M., Penn, D. L., & Roberts, D. L. (2006). The functional significance of social cognition in schizophrenia: A review. Schizophrenia Bulletin, 32, S44S63. http://dx.doi.org/10.1093/schbul/sbl029 Google Scholar
Dickey, C. C., Vu, M. A. T., Voglmaier, M. M., Niznikiewicz, M. A., McCarley, R. W., & Panych, L. P. (2012). Prosodic abnormalities in schizotypal personality disorder. Schizophrenia Research, 142, 2030. http://dx.doi.org/10.1016/j.schres.2012.09.006 Google Scholar
Emmerson, L. C., Ben-Zeev, D., Granholm, E., Tiffany, M., Golshan, S., & Jeste, D. V. (2009). Prevalence and longitudinal stability of negative symptoms in healthy participants. International Journal of Geriatric Psychiatry, 24, 14381444. http://dx.doi.org/10.1002/gps.2284 Google Scholar
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. W (1996). Structured Clinical Interview for DSM-IV Axis I Disorders—Clinician Version (SCID-CV). New York, NY: New York State Psychiatric Institute, Biometrics Research Department.Google Scholar
Green, M. F., & Horan, W. P. (2010). Social cognition in Schizophrenia. Current Directions in Psychological Science, 19, 243248. http://dx.doi.org/10.1177/0963721410377600 Google Scholar
Green, M. F., Bearden, C. E., Cannon, T. D., Fiske, A. P., Hellemann, G. S., Horan, W. P., … Nuechterlein, K. H (2012). Social cognition in schizophrenia, Part 1: Performance across phases of illness. Schizophrenia Bulletin, 38, 854864. http://dx.doi.org/10.1093/schbul/sbq171 Google Scholar
Hoekert, M., Kahn, R. S., Pijnenborg, M., & Aleman, A. (2007). Impaired recognition and expression of emotional prosody in schizophrenia: Review and meta-analysis. Schizophrenia Research, 96, 135145. http://dx.doi.org/10.1016/j.schres.2007.07.023 Google Scholar
Hooker, C., & Park, S. (2002). Emotion processing and its relationship to social functioning in schizophrenia patients. Psychiatry Research, 112, 4150. http://dx.doi.org/10.1016/S0165-1781(02)00177–4 CrossRefGoogle ScholarPubMed
Horan, W. P., Green, M. F., DeGroot, M., Fiske, A., Hellemann, G. S., Kee, K., … Nuechterlein, K. H. (2012). Social cognition in schizophrenia, Part 2: 12-month stability and prediction of functional outcome in first-episode patients. Schizophrenia Bulletin, 38, 865872. http://dx.doi.org/10.1093/schbul/sbr001 Google Scholar
Johns, L. C., Roseell, S., Frith, C., Ahmad, F., Hemsley, D., Kuipers, E., & McGuire, P. K. (2001). Verbal self-monitoring and auditory verbal hallucinations in patients with schizophrenia. Psychological Medicine, 31, 705715. http://dx.doi.org/10.1017/S0033291701003774 Google Scholar
Johnson-Selfridge, M., & Zalewski, C. (2001). Moderator variables of executive functioning in schizophrenia: Meta-analytic findings. Schizophrenia Bulletin, 27, 305316. http://dx.doi.org/10.1093/oxfordjournals.schbul.a006876 Google Scholar
Kee, K. S., Green, M. F., Mintz, J., & Brekke, J. S. (2003). Is emotion processing a predictor of functional outcome in schizophrenia? Schizophrenia Bulletin, 29, 487497. http://dx.doi.org/10.1093/oxfordjournals.schbul.a007021 Google Scholar
Kliper, R., Vaizman, Y., Weinshall, D., & Portuguese, S. (2010). Evidence for depression and dchizophrenia in speech prosody. Athens, Greece: Proceedings of ISCA Tutorial and Research Workshop on Experimental Linguistics.Google Scholar
Kohler, C. G., Walker, J. B., Martin, E. A., Healey, K. M., & Moberg, P. J. (2010). Facial emotion perception in schizophrenia: A meta-analytic review. Schizophrenia Bulletin, 36, 10091019. http://dx.doi.org/10.1093/schbul/sbn192 Google Scholar
Kring, A. M., & Ellis, O. (2013). Emotion deficits in people with schizophrenia. Annual Review of Clinical Psychology, 9, 409433. http://dx.doi.org/10.1146/annurev-clinpsy-050212-185538 Google Scholar
Kring, A. M., & Moran, E. K. (2008). Emotional response deficits in schizophrenia: Insights from affective science. Schizophrenia Bulletin, 34, 819834. http://dx.doi.org/10.1093/schbul/sbn071 CrossRefGoogle ScholarPubMed
Kucharska-Pietura, K., David, A. S., Masiak, M., & Phillips, M. L. (2005). Perception of facial and vocal affect by people with schizophrenia in early and late stages of illness. The British Journal of Psychiatry, 187, 523528. http://dx.doi.org/10.1192/bjp.187.6.523 Google Scholar
Kugler, B. T., & Caudrey, D. J. (1983). Phoneme discrimination in schizophrenia. The British Journal of Psychiatry, 142, 5359. http://dx.doi.org/10.1192/bjp.142.1.53 Google Scholar
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (2005). International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical Report A-6. Gainesville, FL: University of Florida.Google Scholar
Leentjens, A. F. G., Wielaert, S. M., Harskamp, F. V., & Wilmink, F. W. (1998). Disturbances of affective prosody in patients with Schizophrenia; a cross sectional study. The Journal of Neurology, Neurosurgery, & Psychiatry, 64, 375378. http://dx.doi.org/10.1136/jnnp.64.3.375 Google Scholar
Lukoff, D., Nuechterlein, K. H., & Ventura, J. (1986). Manual for the Expanded Brief Psychiatric Rating Scale (BPRS). Schizophrenia Bulletin, 12, 594602.Google Scholar
Martínez-Sánchez, F., Meilán, J. J., Pérez, E., Carro, J., & Arana, J. M. (2012). Patrones prosódicos en sujetos con la Enfermedad de Alzheimer [Prosodic profiles of patients suffering from Alzheimer’s Disease]. Psicothema, 24(1), 1621.Google Scholar
Matsumoto, K, Samson, G. T., O’Daly, O. D., Tracy, D. K, Patel, A. D., & Shergill, S. S. (2006). Impaired prosodic discrimination in patients with schizophrenia. British Journal of Psychiatry, 89, 180181.Google Scholar
Melinder, M. R. D., & Barch, D. M. (2003). The influence of a working memory load manipulation on language production in schizophrenia. Schizophrenia Bulletin, 29, 473485. http://dx.doi.org/10.1093/oxfordjournals.schbul.a007020 Google Scholar
Mertens, P. (2004). Le prosogramme: Une transcription semi-automatique de la prosodie [The prosogram: Semi-automatic transcription of prosody]. Cahiers de l'Institut de Linguistique de Louvain, 30, 725. http://dx.doi.org/10.2143/CILL.30.1.519212 Google Scholar
Michelas, A., Faget, C., Portes, C., Lienhart, A. S., Boyer, L., Lançon, C., & Champagne-Lavau, M. (2014). Do patients with schizophrenia use prosody to encode contrastive discourse status? Frontiers in Psychology, 5, 755. http://dx.doi.org/10.3389/fpsyg.2014.00755 Google Scholar
Nicholson, I. R., Chapman, J. E., & Neufeld, R. W. J. (1995). Variability in BPRS definitions of positive and negative symptoms. Schizophrenia Research, 17, 177185. http://dx.doi.org/10.1016/0920-9964(94)00088-P Google Scholar
Pascual, S., Solé, B., Castillón, J. J., Abadía, M. J., & Tejedor, M. J. (2005). Prosodia afectiva y reconocimiento facial y verbal de la emoción en la esquizofrenia [Expressive prosody and facial and verbal emotional recognition in schizophrenia]. Revista de Psiquiatría de la Facultad de Medicina de Barcelona, 32, 179183.Google Scholar
Patel, A. D. (2008). Music, language and the brain. Oxford, UK: Oxford University.Google Scholar
Penn, D. L., Sanna, L. J., & Roberts, D. L. (2008). Social perception in schizophrenia: An overview. Schizophrenia Bulletin, 34, 408411.Google Scholar
Peralta, V., & Cuesta, M. J. (1994). Validación de la escala de los síndromes positivo y negativo (PANSS) en una muestra de esquizofrénicos españoles [Standardization of the Positive and Negative Syndrome Scale (PANSS) in a sample of Spanish schizophrenic patients]. Actas Luso-Españolas de Neurología y Psiquiatría, 22, 171177.Google Scholar
Putnam, K. M., & Kring, A. M. (2007). Accuracy and intensity of posed emotional expressions in unmedicated schizophrenia patients: Vocal and facial channels. Psychiatry Research 151, 6776. http://dx.doi.org/10.1016/j.psychres.2006.09.010 Google Scholar
Shaw, R. J., Dong, M., Lim, K. O., Faustman, W. O., Pouget, E. R., & Alpert, M. (1999). The relationship between affect expression and affect recognition in schizophrenia. Schizophrenia Research, 37, 245250. http://dx.doi.org/10.1016/S0920-9964(98)00172–8 Google Scholar
Silver, H., Goodman, C., Knoll, G., Isakov, V., & Modai, L. (2005). Schizophrenia patients with a history of severe violence differ from nonviolent schizophrenia patients in perception of emotions but not cognitive function. Journal of Clinical Psychiatry, 66, 300308. http://dx.doi.org/10.4088/JCP.v66n0305 Google Scholar
Silver, H., Shlomo, N., Turner, T., & Gur, R. C. (2002). Perception of happy and sad facial expressions in chronic schizophrenia: Evidence for two evaluative systems. Schizophrenia Research, 55, 171177. http://dx.doi.org/10.1016/S0920-9964(01)00208–0 Google Scholar
Trémeau, F. A. (2006). Review of emotion deficits in schizophrenia. Dialogues in Clinical Neuroscience, 8, 5970.CrossRefGoogle ScholarPubMed
Trémeau, F., Malaspina, D., Duval, F., Correa, H., Hager-Budny, M., Coin-Bariou, L., … Gorman, J. M. (2005). Facial expressiveness in patients with schizophrenia compared to depressed patients and nonpatient comparison subjects. The American Journal of Psychiatry, 162(1), 92101. http://dx.doi.org/10.1176/appi.ajp.162.1.92 Google Scholar
Table 1. Descriptive statistics and mean comparison tests of the analyzed prosodic parameters

Figure 1. Prosodic trajectories of syllable nuclei for both groups.

Figure 2. Example of prosogram from a patient with schizophrenia and from a control participant.

Table 2. Correlations between prosodic variables and months of schooling

Table 3. Percentage of correctly classified cases in the discriminant analysis