
Do you hear it now? A native advantage for sarcasm processing*

Published online by Cambridge University Press:  10 March 2015

SARA PETERS*
Affiliation:
Newberry College
KATHRYN WILSON
Affiliation:
Winthrop University
TIMOTHY W. BOITEAU
Affiliation:
University of South Carolina
CARLOS GELORMINI-LEZAMA
Affiliation:
INECO Foundation, Buenos Aires
AMIT ALMOR
Affiliation:
University of South Carolina
*Address for correspondence: Sara A. Peters, Psychology Department, Newberry College, 2100 College St, Newberry, SC 29108; sara.ann.peters@gmail.com

Abstract

Context and prosody are the main cues native-English speakers rely on to detect and interpret sarcastic irony within spoken discourse. The importance of each type of cue for detecting sarcasm has not been fully investigated in native speakers and has not been examined at all in adult English learners. Here, we compare the extent to which native-English speakers and Arabic-speaking English learners rely on contextual and prosodic cues to identify sarcasm in spoken English, situating these findings within current cross-linguistic effects literature. We show Arabic speakers utilize the cues to a different extent than native speakers: they tend not to utilize prosodic information, focusing on contextual semantic information. These results help clarify the relative weight of contextual and prosodic cues in native-English speakers and support theories that suggest that prosody and emotion could transfer separately in second language learning such that one could transfer while the other does not.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2015 

During spoken language communication, the prosody a speaker uses is one factor that can impact the interpretation of their message. Prosody combines with the semantic information from the preceding discourse context, affecting the listener's interpretation of the utterance. One case where the interaction of context and prosody plays a crucial role in spoken language is the use of sarcastic irony (referred to colloquially as sarcasm), wherein speakers may intentionally choose to alter the prosody to change the message of the utterance. Gibbs (1986, p. 1), citing the Oxford English Dictionary, notes that irony is “the use of words to express something other than and especially the opposite of the literal meaning of a sentence”, while sarcasm is further defined as having a “bitter, caustic” tone “usually directed against an individual”. As such, sarcasm can be thought of as an instance of irony that involves an emotional aspect, possibly one of aggression.

In English, sarcasm is used in both written and spoken language, and has been argued to serve a variety of purposes dependent on the needs of the speaker or writer (Capelli, Nakagawa & Madden, 1990; Weingartner & Klin, 2005, 2009). These purposes include conveying the speaker's or author's ideas or feelings, censuring an individual (Kreuz & Glucksberg, 1989; Jorgensen, Miller & Sperber, 1984), serving a face-saving function (Brown & Levinson, 1978; Jorgensen, 1996), and signaling intimacy (Slugoski & Turnbull, 1998). In addition, sarcasm has been found to make a statement more memorable (Gibbs, 1986). Successful detection and comprehension of sarcasm is therefore required to resolve the correct meaning of utterances in a variety of contexts. Within these contexts, a communicator's statement may at first appear ambiguous and unexpected given previous discourse and, without correct resolution, the interpreter may not comprehend the message correctly.

Sarcasm is also often used as a type of humor, and the nature of sarcasm is such that the literal meaning of the message is often contrary to the intended meaning. It is precisely this inherent property of sarcasm that presents the major challenge for comprehenders in terms of correct message resolution. One important theoretical account that can be used to frame sarcasm interpretation is Relevance Theory (Sperber & Wilson, 1986, 1995). According to Relevance Theory, comprehension follows the path of “least resistance” by adopting the least costly interpretation of the input that is deemed sufficiently relevant. Successful interpretation of sarcasm can therefore be facilitated by a combination of cues that will make this interpretation the easiest explanation for the available information. In spoken-language comprehension, the combination of a context that affords a sarcastic interpretation, sarcastic prosody, and an utterance that adds no useful information unless interpreted sarcastically can make listeners easily choose the sarcastic interpretation. This approach therefore suggests that the previous context, the prosody of the input and the literal meaning of the input being interpreted jointly determine the difficulty of reaching a sarcastic interpretation. One implication of this view is that when the context and the literal meaning of the input render a sarcastic interpretation less costly than a literal interpretation, sarcastic prosody may not need to be very pronounced or may not be needed at all for listeners to interpret the input as sarcasm. Similarly, when the context and literal meaning of the input appear similarly relevant under both literal and sarcastic interpretations, or when the literal interpretation is more relevant, a stronger prosodic cue will be necessary in order to ensure the interpretation of the input as sarcastic.

Previous work has examined the influence of combined context and prosody cues on comprehension of sarcasm in English. Woodland and Voyer (2011) asked participants to rate how sarcastic a spoken sentence such as “Aren't you smart” sounded following contexts that described a successful outcome, e.g., “You're working on a difficult problem for an assignment with a friend and you solve the problem. Your friend says. . .” (“Positive” contexts), and contexts that described a failure to achieve the desired outcome, e.g., “You're working on an easy problem for an assignment with a friend and you can't solve the problem. Your friend says. . .” (“Negative” contexts). They found that following a Positive context, participants expected the speaker to sound sincere, and following a Negative context, sarcastic. This is compatible with several of the possible pragmatic functions of sarcasm described earlier, such as expressing the speaker's feelings, censuring an individual, and signaling intimacy. However, while this study offered information on how listeners might respond in an explicit sarcasm judgment task with clear expectations about its use, it did not offer information regarding sarcasm detection during normal language processing. This is further confounded by the fact that Woodland and Voyer's (2011) items were presented partially using synthesized speech.

Spoken language

Returning specifically to prosody, prior work has reported that several aspects appear important for native-English speakers when determining whether a spoken utterance is sarcastic or not. Rockwell (2000) noted that, in English, the identification of sarcasm is primarily dependent on three aspects of prosody: slower tempo (speech rate), greater intensity (amplitude), and lower pitch level. More recently, Cheang and Pell (2008) reaffirmed the importance of these cues, but noted that intensity appears to be the most important cue. Native English speakers’ reliance on these cues to correctly interpret the meaning of sarcastic utterances has both theoretical and practical implications.

Previous work has also noted that the prosody cues described above are relatively salient for English speakers and are picked up at an earlier age than the contextual cues for sarcasm (e.g., children reporting “It sounded mean”, Bryant & Fox Tree, 2002; Capelli et al., 1990; Nakassis & Snedeker, 2002). As normal adult English speakers are typically able to integrate context and prosody for sarcasm resolution, this suggests that children and adults may rely on different strategies to resolve the sarcastic utterance. This could be because children have a more limited vocabulary, forcing them to rely more heavily on other available cues. Another possible reason may be that children do not possess the working memory capacity to integrate context and prosody as they learn the nuances of the English language (Gathercole, Pickering, Ambridge & Wearing, 2004; Hitch, Towse & Hutton, 2001), forcing them to rely more heavily on only one source of information. This suggests that English-speaking children may find it easier to focus on the prosodic information in the message, rather than utilize additional resources to try to integrate contextual information in order to detect sarcasm or sarcastic irony that may be present. If the difference between children and adult native speakers is not a function of a developmental stage, but instead a function of resource availability and partial knowledge of meaning, similar differences could exist between adult non-native speakers and native English-speaking children. Like children, non-native speakers may only possess partial knowledge of meaning, and because L2 processing may pose higher resource demands, they, also like children, have only limited resources for the interpretation of sarcasm. We next turn to discuss in more detail the issues non-native speakers face when interpreting sarcasm.

Non-native speakers & second language acquisition

Neither the original work presented in Rockwell (2000) nor Cheang and Pell (2008) addressed the applicability of these findings beyond native-English speakers (hereafter NS). However, existing work in the field of second language acquisition (hereafter SLA) can be applied to the question of sarcasm interpretation across languages. Models of SLA typically differ in the number and order of linguistic and conceptual strategies hypothesized to be transferred from the source to target language, or, in cases of more than two known languages, between all known languages. This transfer could occur in any direction (L1 → L2, L2 → L1, or bidirectionally) and between any number of spoken languages (e.g., when L2 or L3 influence L1, etc.), and is generally termed “cross-linguistic influence” (hereafter CLI). Different models argue that CLI can vary from partial (little dependence between languages) to full (utilizing the majority of strategies and conceptual information from one language when processing another) (Schwartz & Sprouse, 1994, 1996; Håkansson, Pienemann & Sayehli, 2002; Jarvis & Pavlenko, 2008).

Earlier work on the processing of prosody in non-native English speakers (hereafter NNS) suggests that NNS rely on different cues than NS to resolve sarcasm in English. For example, like the previously mentioned work on child populations, work with NNS has reported differences in the processing of prosody generally (Dupoux, Pallier, Sebastian & Mehler, 1997). A portion of this work has shown differences between NS and NNS (and by L1) in areas such as stress identification (Dupoux et al., 1997; Cutler, Dahan & van Donselaar, 1997). This evidence supports the argument that NNS may not be able to utilize prosody information to the same extent as adult native speakers. Since the utilization of prosody is a key aspect of the comprehension of sarcasm in spoken language, it is possible this will put the NNS at an immediate disadvantage when identification of sarcasm depends on prosody rather than context. One area of study within the literature on CLI that is particularly applicable to the present question is phonological transfer.

Discussion of phonological transfer by Jarvis & Pavlenko (2008) covers several different levels at which phonological information from L1 can transfer to L2 or L3 and back. Jarvis & Pavlenko review work on the transfer of strategies dealing with the patterned sequences of segmental information and suprasegmental information. They describe work by Broselow (1992) with Arabic speakers of different dialects (Egyptian and Iraqi), who applied consonant cluster reduction patterns that differed from one another when transferring to an English L2 (Broselow, 1992, as cited in Jarvis & Pavlenko, 2008). Such differences in processing phonological information could translate into difficulties in processing prosody as well, which in turn could lead to differences in processing sarcastic prosody between NNS and NS. In particular, if NNS need to focus more available processing resources on overriding an initial phonological pattern interpretation in order to apply the correct L2 rules of phonological processing, they may not have sufficient resources to interpret the subtle nuances of sarcastic prosody.

As sarcasm processing necessitates knowledge of the emotional state of the speaker, perception of emotional information (for NNS in their L2) is also important in order for the correct message to be perceived. As discussed above, some emotional information, including some aspects of sarcasm, could be expressed in the phonological and prosodic features of an utterance, and NNS may be able to utilize this information. Indeed, Pell, Monetta, Paulmann & Kotz (2009) found that monolingual Argentine Spanish speakers were able to reliably decode vocally expressed emotions in the speech of unfamiliar languages (using English, German and Arabic speakers). However, a more recent study by Cheang and Pell (2011) found that monolingual speakers of English and Cantonese were not able to detect phonological cues for sarcasm in the speech of the other language (Cantonese and English respectively). It thus appears that detecting sarcasm in L2 is not always possible on the basis of phonology and prosody.

Another area of CLI that has clear implications in the current work is the influence of emotional resonance. As previously mentioned, sarcastic irony can be viewed as reflecting emotion on the part of the speaker, and used as a form of humor in some cases. Effects of emotion in L1 vs. L2 have been documented in memory for emotional words. For example, Ayçiçeği-Dinn & Caldwell-Harris (2009) found that emotional words are recalled more easily than non-emotional words but only when they are presented in L1. Additional work by Caldwell-Harris & Ayçiçeği-Dinn (2009) found that Turkish L1 speakers tended to experience greater emotional resonance for messages in their L1 than their L2 (English). In that study, when listening to emotional words spoken in English vs. Turkish, participants showed a smaller galvanic skin response, which was taken as an indication of lower emotional resonance (Caldwell-Harris & Ayçiçeği-Dinn, 2009).

Differences between emotional resonance in L1 and L2 go beyond language and memory processing and can also affect decision-making. For example, Keysar, Hayakawa and An (2012) showed that subjects made different decisions depending on whether information was presented to them in L1 or L2. These authors concluded that decision-making in L2 reduces emotional biases present in L1. Dewaele (2009) also found that age of acquisition modulated expressions of emotion in L2 and L3, such that the earlier the age of acquisition of either L2 or L3, the more likely a speaker was to express emotion using this language. Research conducted with Russian-speaking immigrants also found that the extent of emotions evoked by L1 was influenced by age of arrival to the U.S., perceived proficiency in their L1 and L2, as well as perceived emotionality of the two languages (Caldwell-Harris, Staroselsky, Smashnaya & Vasilyeva, 2012). Specifically, emotional responses to L1 were reduced with earlier immigration, suggesting these variables may modulate the CLI associated with emotional resonance. Pavlenko (2005) provides a comprehensive overview of how learner characteristics such as age of exposure to L2 and length of time in an immersive environment of different languages affect the processing of emotion at multiple levels. Pavlenko shows that discerning intended emotion in L2 is not bound simply by linguistic factors, but also depends on L1 background and learner characteristics.

Finally, a more recent review by Pavlenko (2012) focuses on whether affective processing in general in bilingual speakers takes a “disembodied cognition” form. Pavlenko (2012) examined work from multiple bilingual research paradigms, including clinical, cognitive, and neuroimaging studies. The main conclusions of this review were that, in line with the work above, L1 had an advantage for affective words and processing, and L2 showed a decrease in how automatically this information is processed. Pavlenko therefore suggested that emotion-laden words may see an “advantage” in processing when presented in L1 (they may resonate more clearly), while the L2 “advantage” may lie in the reduced automaticity of processing negative stimuli (and, as a result, reduced interference from the emotion). This work also suggests that while L2 speakers may learn the semantics of an L2, they may not process affect in an L2 the same way as in L1, resulting in the differences reported in the literature. We will return to this idea in the general discussion.

In summary, the previous work done in SLA on CLI in the areas of phonology, prosody, and emotion leads to several possibilities for why we should expect NNS to show difficulty when interpreting sarcasm in an English L2. Previous research on phonological transfer has suggested that according to L1 and even L1 dialect, L2 sound information may be interpreted differently (see Jarvis & Pavlenko, 2008, for review). Phonological transfer could be responsible for the lack of ability to discern sarcasm in an unfamiliar language based on prosody (Cheang & Pell, 2011). On the other hand, other work found that in many cases emotion identification in the sound pattern does transfer (Pell et al., 2009). It is therefore possible that, for sarcasm detection in English L2, phonological and emotional information would provide conflicting cues.

Summary and current work

The main goal of the present study is to compare native English speakers and NNS in their utilization of context and prosody for detecting sarcasm, and to explain these differences in the context of the CLI literature. Specifically, this study aims to (1) verify whether NS indeed rely on both prosody and context cues for detecting sarcasm, as theorized in previous work (Woodland & Voyer, 2011); and (2) offer further insight on how any differences between NS and NNS in the identification of sarcasm can be explained by previous work on CLI, and by the resource availability and proficiency hypotheses mentioned previously. To our knowledge, these issues have not been examined within the context of spoken language comprehension.

Because transfer from L1 is an important factor in SLA, we decided to focus our investigation on English learners from a single L1 background. We chose to use native speakers of Arabic, a population to which we had ready access and from which we could test a sufficient number of speakers in order to make valid statistical inferences and comparisons with the English-speaking population. We note that our design could similarly be applied to participants with other L1 backgrounds, as long as there is a sufficiently large group from each L1 background.

In the current study, all the non-native English speakers were Arabic L1 speakers (hereafter Arabic speakers) who spoke an Arabian Peninsula Middle-Eastern dialect of Arabic (additionally, see Table 3 for speaker countries of origin). While there have been profiles of sarcastic prosody in some languages, namely Mexican Spanish and French (Rao, 2013; Loevenbruck, Ben Jannet, D'Imperio, Spini & Champagne-Lavau, 2013), to our knowledge, to date there have been no studies on the prosodic profile of Arabic sarcasm. The results of both the Mexican Spanish (eliciting utterances from speakers and analyzing them) and French (analyzing corpus data) analyses found that, as in English, speech rate decreased and stressed syllable length increased for the sarcastically relevant portions of the utterances (Rao, 2013; Loevenbruck et al., 2013; Cheang & Pell, 2008).

Although there have been no studies on sarcastic prosody in Arabic, a few studies have looked at pragmatically unmarked prosody in Arabic as well as emotional transfer. Previous work comparing the comprehension of spoken Hebrew in L1 Arabic speakers and L1 Hebrew speakers found that although the Arabic speakers could detect anger in the prosody of the Hebrew speech, they judged the extent of anger to be smaller than the L1 Hebrew speakers did (Amir, Almogi & Gal, 2004). Additionally, there are both some similarities and important distinctions between Arabic and English prosody. For example, both Arabic and English show word-final lengthening and utterance-final lengthening (de Jong & Zawaydeh, 1999). On the other hand, the role of prosody in the two languages is not identical. For example, in Arabic, a language which employs interrogative particles, sentence-level prosodic contour is less crucial for identifying the sentence as a question (Ramsay & Mansour, 2008). Thus, while Arabic speakers make general use of prosody in their language similar to English speakers, the particular prosodic cues are not identical. Importantly, research has identified that sarcastic humor does occur in Arabic, but these investigations have primarily been focused on sentiment analyses of texts such as Twitter (Rafaee & Rieser, 2014). The present work therefore tests whether Arabic speakers could use the same cues as English speakers to identify sarcasm when it is presented in spoken English.

Experiment

Previous work had indicated that contextual information contributes to determining how well a sarcastic statement fits within a conversation (Woodland & Voyer, 2011). Additionally, previous work had identified aspects of prosody that English speakers rely on to identify sarcasm (Rockwell, 2000; Cheang & Pell, 2008). The current experiment focused on determining whether English and Arabic speakers comprehended and interpreted discourses differently when presented with multiple combinations of contextual and prosodic information. To determine if this was the case, three-sentence discourses were created. The first introduced two actors and an action, the second featured either a Positive (action had a successful/desirable outcome) or Negative (action had an unsuccessful/undesirable outcome) Context, and the final sentence used either a Sincere or Sarcastic Prosody when reviewing the actor's actions, but was otherwise identical (see Table 1 for sample experimental items in all conditions).

Table 1. Two sample experimental items in each possible condition.

For an experimental item, after hearing all three sentences in a given contextual and prosody combination, the participants were presented with a comprehension question. The answers to these comprehension questions were based on participants’ interpretation of an actor's intention within the discourse as either sincere or sarcastic. Thus, answers to these questions allow a direct assessment of the effect of the Context and the Prosody on sarcastic interpretation. This is an important distinction between the present study and Woodland and Voyer's (2011) work, which explicitly asked for a rating of sarcasm rather than an interpretation of the discourse as a whole. In our work, participants were tested for their actual comprehension and were not asked to make an explicit meta-linguistic judgment about the likelihood of a sarcastic interpretation.

Crossing the Context and Prosody cues within the discourse (introducing a situation as the Context, and finishing with a character's appraisal of the other's actions) created four different types of conversations that were presented to participants for evaluation and could be compared between the English- and Arabic-speaking populations, resulting in a 2 (Context) × 2 (Prosody) × 2 (Language group) design. This method allowed measures of reliance on context and prosody when the two pieces of information regarding the final statement's sincerity either agreed (Positive context, Sincere prosody; Negative context, Sarcastic prosody) or conflicted (Positive context, Sarcastic prosody; Negative context, Sincere prosody).
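The crossing described above can be made concrete with a short sketch. The Python below is only an illustration of the 2 × 2 × 2 layout and of which cells count as agreeing or conflicting; the names `design_cells` and `cue_relation` and the dictionary layout are hypothetical and are not taken from the study's materials.

```python
from itertools import product

# Factor levels as named in the text.
CONTEXTS = ("Positive", "Negative")
PROSODIES = ("Sincere", "Sarcastic")
GROUPS = ("English", "Arabic")

# Interpretation each cue level points toward, following the text:
# a Positive context and a Sincere prosody both suggest a sincere reading.
CONTEXT_CUE = {"Positive": "sincere", "Negative": "sarcastic"}
PROSODY_CUE = {"Sincere": "sincere", "Sarcastic": "sarcastic"}


def cue_relation(context, prosody):
    """Return 'agree' when both cues favor the same reading, else 'conflict'."""
    same = CONTEXT_CUE[context] == PROSODY_CUE[prosody]
    return "agree" if same else "conflict"


def design_cells():
    """Enumerate every cell of the Context x Prosody x Language group design."""
    return [
        {"group": g, "context": c, "prosody": p, "cues": cue_relation(c, p)}
        for g, c, p in product(GROUPS, CONTEXTS, PROSODIES)
    ]


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

Within each language group, two of the four Context by Prosody cells are labeled as agreeing and two as conflicting, which is the contrast the design is meant to expose.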

Other NNS factors such as proficiency differences, age of acquisition, and differences in years in an immersive English environment have also been hypothesized to affect transfer strategies and responses, as previously introduced when discussing the CLI of emotion (Harley & Hart, 1997; Pavlenko, 2005). Although these were not the focus of our study, we nevertheless collected these data from our participants so that we could rule out the role of these factors as confounds. Specifically, we recorded several measures of English proficiency, which we included as covariates in our analyses. We nevertheless only recruited subjects who already had conversational skills in English, as well as good enough comprehension that they were able to understand our task.

Before describing the Main Study in detail, we present the method and results of a short Pilot Study that verified our items using a native-English speaking population.

Pilot Study method

Before deploying the items in the main experimental design, the items were normed with a sample (n = 15) of native English-speaking students from the University of South Carolina's Department of Psychology participant pool. These participants were excluded from participating in the Main Study. In this norming study, participants were asked to rate the second and third sentences of the experimental items separately as to how sincere they sounded on a scale of 1–5 (1 being “very sincere”, 5 being “very insincere”). The Sentence 2 items consisted of the Positive Contexts and Negative Contexts, while the Sentence 3 items were spoken in either a Sarcastic Prosody or a Sincere Prosody.

Items

Thirty-two experimental items were created. The first sentence introduced one character with a certain goal. The second sentence described an event that varied by condition (either Positive Context or Negative Context). In the Positive Context, the event resulted in a positive outcome for achieving the desired goal. In the Negative Context, the event resulted in a negative outcome for achieving the desired goal. The third sentence then described an evaluative statement made by a second character, which was stated in either a Sincere Prosody or Sarcastic Prosody. In the Main Study, these short discourses were all followed by comprehension questions to the participants. In order to answer these questions, participants had to decide if a character's evaluations of another's actions were honest. Thirty-two filler items were also created. The filler items all contained descriptions of events similar to the experimental items, but did not describe events that were clearly positive or negative and were recorded in neutral prosody. In the comprehension questions that followed the filler items, participants were asked whether a piece of information related to the content of sentences 1–3 was true or not. To answer the questions, participants had to integrate information from multiple sentences. This was done to encourage, and to verify, that participants understood the content of the items and the task (see sample filler items in Table 2). Each filler item had only one version that was presented to all the participants.

Table 2. Two sample filler items.

All items were then recorded by a female native speaker of English (S.A.P.). The discourses were recorded in their entirety using a sincere tone. Questions were not recorded, as they were presented as text in the Main Study.

To control for as many factors as possible, we used the Praat software (Boersma, 2001) to create a second, Sarcastic Prosody version of Sentence 3 by changing the speech signal to exhibit the acoustic cues for sarcasm in English (alteration of tempo, intensity and pitch). We based this method on the previous studies of the acoustic characteristics of sarcasm in English by Rockwell (2000) and Cheang and Pell (2008). For example, to create the Sarcastic Prosody version of the utterance “Angie thanked John for doing such a great job” from its Sincere Prosody recording, the phrase “such a great job” was adjusted to 0.9 times (i.e., 90% of) the original pitch, its duration was increased by 30%, and its intensity was increased and contoured such that the stressed syllable of the first stressed word of the phrase was given a 2.5 dB multiplier, which sloped down to a 2 dB multiplier at the end of the sentence.
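To make the arithmetic of this manipulation concrete, the following Python sketch computes the target values for one phrase. It is not the Praat procedure used in the study. It assumes that the pitch adjustment is a scaling factor of 0.9, that the intensity "multiplier" is a gain expressed in decibels and applied to sample amplitude, and that the slope from 2.5 dB to 2 dB is linear; all function names are hypothetical.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def gain_contour(n_samples, start_db=2.5, end_db=2.0):
    """Amplitude multipliers for a linear dB ramp from start_db to end_db."""
    if n_samples < 2:
        return [db_to_amplitude(start_db)] * n_samples
    step = (end_db - start_db) / (n_samples - 1)
    return [db_to_amplitude(start_db + i * step) for i in range(n_samples)]


def apply_contour(samples, start_db=2.5, end_db=2.0):
    """Scale each audio sample by the corresponding point of the contour."""
    gains = gain_contour(len(samples), start_db, end_db)
    return [s * g for s, g in zip(samples, gains)]


def sarcastic_targets(duration_s, mean_pitch_hz,
                      pitch_factor=0.9, duration_increase=0.30):
    """Target duration and mean pitch after the assumed adjustments."""
    return duration_s * (1 + duration_increase), mean_pitch_hz * pitch_factor


if __name__ == "__main__":
    # Hypothetical phrase: 1.0 s long with a mean pitch of 200 Hz.
    print(sarcastic_targets(1.0, 200.0))
    print(gain_contour(5))
```

Under these assumptions, a 2.5 dB gain corresponds to an amplitude factor of roughly 1.33 and a 2 dB gain to roughly 1.26, so the contour is a modest boost that tapers slightly toward the end of the sentence.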

Overall, participants in this experiment rated the sincerity of two versions of Sentence 2 (Positive Context and Negative Context) and two versions of Sentence 3 (Sincere Prosody and Sarcastic Prosody). The two sentences from each item were presented separately and each could appear at any time during the experiment and not immediately preceding or following the other sentence in the item. Each participant rated only one version of each sentence resulting in each participant rating a total of 64 experimental items (32 Sentence 2, 32 Sentence 3), which were mixed with 64 filler sentences (Sentences 2 & 3 for the 32 fillers).

Apparatus

The experiment was presented on a Dell PC running E-Prime 1.2 software (Schneider, Eschman & Zuccolotto, 2002). Participants were issued a pair of Ultrasone HFI-2000 headphones to use for the experiment, and instructed not to adjust the volume of the presentation, which remained constant for all participants.

Procedure

Participants arrived in the lab and, after indicating informed consent, received instructions verbally on the task and put on headphones. They then were presented with instructions once again on the computer screen.

Participants were instructed to rate how sincere sentences sounded, on a scale of 1–5, with 1 corresponding to “very sincere”, 2 corresponding to “sincere”, 3 corresponding to “neutral”, 4 corresponding to “insincere” and 5 corresponding to “very insincere”. Each time participants rated a sentence they heard, they were presented with the scale, on which the direction of sincerity was indicated. Sentences were presented individually and in a random order, which differed by participant in order to control for any ordering effects. A “Thank You” screen signaled the end of the task. The experiment lasted approximately 20 minutes.

Pilot Study results

All analyses were carried out using the R statistical software package (v.3.1.1; R Core Team, 2014), and the lme4 mixed-effects models package (Bates, Maechler, Bolker & Walker, 2014). Of the fifteen participants tested, one participant was removed due to failure to follow task instructions, as indicated by a low mean rating score of < 1.9 for the entire experiment. Mean rating response for the remaining n = 14 participants across the experiment was M = 2.61, with se = .11.

Ratings were first analyzed to determine whether ratings for filler items differed significantly from sincere items. It was found that the ratings for filler Sentences 2 & 3 did not differ significantly from experimental items in the sincere condition for Sentences 2 & 3 (t = .24, p > .81), and so the remainder of the analysis was restricted to the experimental items. Mean ratings for conditions were as follows, with standard error in parentheses: Sentence 2, Positive Context M = 2.225 (.05), Sentence 2, Negative Context M = 3.20 (.06), Sentence 3, Sincere Prosody M = 2.12 (.05), and Sentence 3, Sarcastic Prosody M = 2.87 (.07).

Ratings for Sentences 2 and 3 were analyzed separately using mixed-effects models [1], which included random intercepts for subjects and items (Baayen, 2008). This method was used as an alternative to traditional F1 & F2 analyses, as recommended by Baayen, Davidson & Bates (2008) and Baayen (2008) due to its increased sensitivity.

(1) \begin{eqnarray} {Y_{ij}} &=& {\beta _{0ij}} + {\beta _{1ij}}^*{\rm{Discourse}}\,{\rm{Condition}}\nonumber\\ &&+ {\beta _{2i}}^*{\rm{Subject}} + {\beta _{3j}}^*{\rm{Item}} + {\varepsilon _{ij}} \end{eqnarray}

Within each model, the outcome Yij reflects an individual subject's (i) performance on a specific item (j), expressed as a deviation from the group sample mean. It is estimated from the group mean in the baseline Sincere condition (β 0ij), adjusted by the effect of condition (coefficient estimate β 1ij; Discourse Condition: 0 = Sincere, 1 = Sarcastic), with random intercepts for subject (β 2i) and item (β 3j) and a random error term for each subject-item combination (ε ij).
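To make the structure of model [1] concrete, the following minimal Python sketch assembles a single predicted rating from its components, leaving out the residual error term. Only the condition coefficient (.94, the Sentence 2 estimate reported in the Pilot Study results) comes from the text; the baseline value, the subject and item offsets, and the function name are hypothetical and chosen purely for illustration.

```python
def predicted_rating(intercept, condition_effect, condition,
                     subject_offset=0.0, item_offset=0.0):
    """Combine the terms of model [1], omitting the residual error."""
    if condition not in (0, 1):
        raise ValueError("condition must be 0 (Sincere) or 1 (Sarcastic)")
    return intercept + condition_effect * condition + subject_offset + item_offset


# Hypothetical baseline and random-intercept offsets (not taken from the study).
BASELINE = 2.2
SUBJECT_OFFSET = 0.3   # this subject rates slightly higher than average
ITEM_OFFSET = -0.1     # this item is rated slightly lower than average

sincere = predicted_rating(BASELINE, 0.94, 0, SUBJECT_OFFSET, ITEM_OFFSET)
sarcastic = predicted_rating(BASELINE, 0.94, 1, SUBJECT_OFFSET, ITEM_OFFSET)
print(round(sincere, 2), round(sarcastic, 2))
```

Because the subject and item offsets are the same in both calls, the difference between the two predictions equals the condition coefficient, which is the quantity the model isolates.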

For Sentence 2, items that had a Negative Context were rated as less sincere (higher on the scale) than corresponding Positive Context versions, with model results (β 1 *Sarcasm = .94, se = .09, and t = 11.12, with p < .001). It is important to remember that the prosodic characteristics of the Negative Context items did not differ from those of the Positive Context items; instead, the context itself was the only cue. Thus, the difference shown by our participants must reflect a top-down effect whereby the suggestion of a failed goal raises expectation of insincerity. For Sentence 3, Sarcastic Prosody items were rated as less sincere than corresponding Sincere Prosody versions, as demonstrated by model results (β 1 *Sarcasm = .74, se = .09, and t = 8.12, with p < .001). P-values for both analyses were estimated using Markov chain Monte Carlo (MCMC) estimation. It is important to note that while it may appear as if the Sarcastic Prosody items were not identified as strongly sarcastic, the participants were tasked with rating sincerity on a relatively arbitrary scale. Thus, the magnitude of the rating is not very informative about the strength of the manipulation. Instead, the strength of the manipulation can be evaluated by looking at the statistical measure of effect size. The estimated Cohen's d measures for Sentence 2 (estimated d = 0.77) and for Sentence 3 (estimated d = 0.60) show that the differences are considered medium to large by standard interpretation of effect size, which explains their reliability even at small subject numbers (Cohen, Cohen, West & Aiken, 2003). Thus, the fact that there was a strong and significant difference based solely on the prosody or context manipulation indicates that English speakers are very adept at identifying items as Negative Context or Sarcastic Prosody when compared to the absence of the manipulation, and that for English speakers this effect is not at all subtle.
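The text does not state how the Cohen's d values were estimated. As a point of reference only, the sketch below computes the textbook pooled-standard-deviation form of d for two sets of ratings. The ratings shown are hypothetical and are not the study's data, and the function name is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d (group_b minus group_a) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


# Hypothetical 1-5 sincerity ratings (higher = less sincere).
sincere_ratings = [1, 2, 2, 3, 2, 3, 1, 2]
sarcastic_ratings = [2, 3, 3, 4, 3, 4, 2, 3]
d = cohens_d(sincere_ratings, sarcastic_ratings)
print(round(d, 2))
```

Expressing the difference in standard-deviation units is what makes it independent of the arbitrary 1-5 scale, which is the point the paragraph above makes about rating magnitude.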

In summary, the results of this Pilot Study confirmed that our items elicited the desired effect in native English speakers in a robust fashion. This indicates that NS are sensitive to both context and prosody in evaluating sincerity and sarcasm. With this information, we proceeded to the Main Study.

Main Study Method

Participants

Twenty-five English speakers were recruited from the University of South Carolina Department of Psychology's participant pool and participated in the experiment in exchange for credit for a course of their choice. None of them participated in the Pilot Study. Arabic speakers (n = 27) were recruited through the English Programs for Internationals of the University of South Carolina, and took part in the experiment as part of a speaking and listening lab session. All Arabic-speaking participants had been placed in levels 4–6 (out of 6) in the program's courses, meaning that they were considered high beginner/low-intermediate to high-intermediate/low-advanced speakers (n = 15 Intermediate, n = 12 Advanced). The function of the experiment was later explained to the Arabic speakers as part of a class lesson.

Materials

Participants listened to 3-sentence spoken discourses in the forms presented previously in Table 1, and responded to a comprehension question. Thus, in the experimental items, the first sentence introduced an actor and action (see Table 1 above), the second sentence introduced a Context (either Positive or Negative), and the third sentence presented a second actor's feeling about the action (stated using either Sarcastic or Sincere prosodic cues). Critically, the comprehension questions following the experimental items probed the interpretation of the discourse such that the answer to these questions revealed whether the participant interpreted the discourse as sarcastic or as sincere.

As mentioned in the Pilot Study, 32 filler items were also constructed as 3-sentence spoken discourses, which were followed by a comprehension question. Here, the comprehension question had a clear yes or no answer based on determining whether or not a statement was true regarding the discourse (see Table 2 above). The filler items were constructed such that there was an equal ratio of yes and no answers for the fillers. This created a design in which participants heard 32 experimental items (8 in each of the combinations of Positive/Sincere, Positive/Sarcastic, Negative/Sincere, and Negative/Sarcastic), as well as 32 filler items where the answers to the questions were pre-set in a 1:1 yes/no ratio.

To eliminate possible confounds related to gender bias, the gender of actors and secondary characters was approximately equal across the experiment. In the experimental items, the initial actor was female 47% of the time, and the secondary actor in the sentence was female in 53% of cases. In the experimental items, if the initial actor was female, the secondary actor in the sentence was male (11 instances), female (1 instance) or other (a parent or gender-free, 2 instances). If the initial actor was male, the second actor was female (16 instances), or other (2 instances). Additionally, in the case of filler items, females were introduced first 53% of the time, and males 47%. This ensured that, over the course of the experiment, the first actor had a roughly 50:50 chance of being male or female. Each participant heard only one version of each experimental item (n = 32) as well as all of the fillers (n = 32) for a total of 64 items. The items were presented in random order for each participant to eliminate any ordering effects.
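The text states that each participant heard one version of each of the 32 experimental items, eight per condition, in random order, but it does not describe how versions were assigned to participants. One common way to obtain such a design is a Latin-square rotation across presentation lists. The sketch below assumes that approach; the function names and the list index are hypothetical.

```python
import random

CONDITIONS = [
    ("Positive", "Sincere"),
    ("Positive", "Sarcastic"),
    ("Negative", "Sincere"),
    ("Negative", "Sarcastic"),
]


def build_list(list_index, n_items=32):
    """Assign each item one condition by rotating through CONDITIONS."""
    return [(item, CONDITIONS[(item + list_index) % len(CONDITIONS)])
            for item in range(n_items)]


def presentation_order(list_index, n_fillers=32, seed=None):
    """Mix experimental items with fillers and shuffle for one participant."""
    trials = [("experimental", item, cond)
              for item, cond in build_list(list_index)]
    trials += [("filler", item, None) for item in range(n_fillers)]
    random.Random(seed).shuffle(trials)
    return trials


trials = presentation_order(list_index=1, seed=42)
print(len(trials))
```

With four lists (list_index 0 to 3), every item appears in every condition exactly once across lists, while each individual list contains eight items per condition.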

Procedure

English NS participants arrived in the lab and, after indicating informed consent, put on headphones and listened to the 3-sentence discourses. After participants heard each discourse, they answered a comprehension question regarding the second character's interpretation of the actions of the first, which tested the participants’ own interpretation of the discourse under the varying conditions. Comprehension questions had Yes or No answers which were recorded using the keypad, where a “Yes” answer was indicated with a numerical answer of “1” (and indicated a Positive-Sincere interpretation of the discourse), and “No” with a “2” (which indicated a Negative-Sarcastic interpretation). The experiment took approximately 35 minutes.

Arabic speakers were tested in groups at the language lab, and also filled out a questionnaire about their language background prior to participating; they otherwise followed the same procedures as the English speakers. The summary of the Arabic speakers’ questionnaire responses is listed in Table 3, which also includes demographic information.

Table 3. Averages, standard errors, and ranges for demographic data collected for Arabic speakers.

Predictions

Following previous work for NS and considering multiple CLI influences for adult Arabic speakers, we made the following predictions based on the fact that both input cues should lead to the same resolution:

  1. For the conditions when context and prosody matched, we hypothesized that English and Arabic speakers’ patterns of interpretation should be similar (see upper left, lower right quadrants of Table 1 for example):

    a. If Sentence 2 describes a Positive context and Sentence 3 has Sincere prosody, the answer to the comprehension question for both populations should be “Yes”.

    b. If Sentence 2 describes a Negative context and Sentence 3 has Sarcastic prosody, the answer to the comprehension question should be “No”.

  2. For the mixed conditions, answers should vary according to the relative weight listeners give each of the cues. For example, if Sentence 2 describes a Positive context and Sentence 3 is heard in a Sarcastic prosody (see lower left quadrant of Table 1 for example), then a “No” answer would indicate a sarcastic interpretation of the discourse by the listener, demonstrating a greater weight placed on the prosody than on context. If, on the other hand, Sentence 2 has a negative context and the prosody of Sentence 3 is sincere (see upper right quadrant of Table 1 for example), then a “No” in this case would demonstrate a sarcastic interpretation once again, but one made favoring context over prosody.

    a. Due to the lack of previous research on the role of context and prosody in English speakers, we do not have a clear prediction for the English speakers in the mixed conditions.

    b. For the Arabic speakers, based on previous work examining the identification of emotion by NNS in an unfamiliar L2, we predicted prosody should be used to signal positive and negative emotion, and lead to the corresponding interpretation (Pell et al., 2009).

Main Study Results

The percentages of “Yes” answers to the comprehension questions for the English speakers and the Arabic speakers are shown in Table 4, to facilitate discussion of the results.

Table 4. Percent of “Yes” answers to comprehension questions for each language group per condition. Standard error reported in parentheses.

We performed two statistical analyses of these results using logistic regression mixed-effects models, which included random intercepts for subjects and items (Baayen, 2008). The model results, corresponding to the model presented in [2], thus tested the initial research question (how do English speakers’ interpretations of sarcasm differ from Arabic speakers’?). In the first analysis, the main factors were set up as a series of contrasts within an additive model: Discourse Condition, coded using two variables by condition, Sentence (S2 or S3) and Context (S2: Positive (0) or Negative (1); S3: Sincere (0) or Sarcastic (1)), and native Language (Arabic speaker (0) or native-English speaker (1)). In the second analysis, main effects of these variables were tested using a separate set of logistic regression models in which the main effects were coded with Helmert coding, and which contrasted the full model (containing all variables) against models containing a particular effect of interest (e.g., sentence condition). Thus, the individual main effect tests of significance (sentence condition and language) are also reported, as contrasts between a full model and the main effect, with the interaction tested similarly. The equations for this analysis were similar to [2], but as noted, coding for variables differed. This allows us to model the results of each language group within each condition specifically (analysis 1, Table 5), while also testing the strength of these effects independently for further clarity (analysis 2, Table 6). The effects reported for both sets of models are significant when subjects and items have been included as having random effects, with p-values estimated using MCMC estimation.

Table 5. Fixed effects results for logistic regression mixed-effects model (Yes answers coded 1, No 0). The baseline conditions consisted of Arabic speakers, Positive Context and Sincere Prosody, with Context, Sentence and Language as factors. Results are presented by contrast condition.

Table 6. Logistic regression model contrast results of full model vs. model containing only one main effect, or interaction. Results were obtained using Helmert coding of main effects.

The model presented below uses the same nomenclature as outlined in [1]. For the analysis, the reference group for the model included the Positive-Sincere condition performance of Arabic speakers. Thus, changes per outcome (Yij ) are predicted given a particular Discourse Condition and Language. Additionally, since logistic regression is being employed, the outcome can be interpreted as the likelihood of answering “Yes” given the particular Discourse Condition and Language.

(2) \begin{eqnarray} \begin{array}{l} {Y_{ij}} = {\beta _{0i}} + {\beta _{1i}}^*{\rm{S2Cond}} + {\beta _{2i}}^*{\rm{S3Cond}}\\ \quad\quad + {\beta _{3i}}^*{\rm{FirstLanguage}} + {\beta _{4i}}{\rm{^*S2Cond^*S3Cond}}\\ \quad\quad + {\beta _{5i}}^*{\rm{S2Cond}}^*{\rm{FirstLanguage}}\\ \quad\quad + {\beta _{6i}}^*{\rm{S3Cond}}^*{\rm{FirstLanguage }} \\ \quad\quad + {\beta _{7i}}^*{\rm{S2Cond}}^*{\rm{S3Cond}}^*{\rm{FirstLanguage }}\\ \quad\quad + {\rm{ }}{\beta _{8i}}^*{\rm{Subject}} + {\beta _{9j}}^*{\rm{Item}} + {\varepsilon _{ij}} \end{array}\end{eqnarray}

Full model results for the English vs. Arabic speakers can be viewed in Table 5. Additionally, as indicated above, a second analysis tested the main effects of the variables with contrasts using Helmert coding, and analysis information can be viewed directly in Table 6.

Within the mixed-effects model, we note the following results, in terms of significant differences between conditions. First, there was a main effect of Context (Table 6, Contrast 1, p < 0.001), such that overall, Negative Contexts were less likely to be later perceived as generating a sincere interpretation (β 1 = −0.131, p < 0.001). When referring to the model we present, the main effect can be seen as an adjustment to the base model of the logistic regression, given the predictor being tested. In the case of our model, the Negative Context (β 1 = −0.131) lowers the likelihood of a Sincere interpretation and adjusts the Intercept of the model (β 0 = 0.762 minus 0.131 = 0.631 likelihood, p < 0.001). This would combine with the remaining conditions present in the discourse for a full interpretation. Second, there was a main effect of L1 (Table 6, Contrast 2, p < 0.001), such that English speakers interpreted Positive discourses as more Sincere (our baseline model included Arabic speaker mean ratings, in a Positive Context). Here, the Intercept is adjusted to include the predictor of English L1, such that β 0 = 0.762 plus β 3 = 0.131 gives a 0.893 likelihood of a Sincere interpretation, given these conditions (p < 0.001). Additionally, interactions tested within the model indicate that Prosody, which does not have a main effect, does operate differently within the L1s tested (Table 6, Contrast 3, p < 0.001). Specifically, in the baseline condition (Positive Context) given Sarcastic Prosody, English speakers are more likely to respond with a sarcastic interpretation (β 0 = 0.762 minus β 6 = 0.118 gives 0.644, p = 0.03). Again, it is necessary to combine this effect with the remaining predictors (Table 5) to arrive at the fully calculated probability. There was no significant interaction of Context and Prosody for the Arabic speakers (β 4 = −0.072, n.s.) nor was there a significant three-way interaction between Context, Prosody and Language (β 7 = −0.033, n.s.).
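The adjustments described above amount to adding the relevant coefficient to the intercept. The short sketch below reproduces that arithmetic using the values reported in the text. The dictionary keys and the function name are hypothetical labels, and the simple additive reading follows the text's own description rather than a general account of how logistic-regression coefficients are interpreted.

```python
INTERCEPT = 0.762  # baseline: Arabic speakers, Positive Context, Sincere Prosody

# Coefficient values as reported in the text.
COEFFICIENTS = {
    "negative_context": -0.131,             # beta 1
    "english_l1": 0.131,                    # beta 3
    "sarcastic_prosody_x_english": -0.118,  # beta 6
}


def adjusted_likelihood(*terms):
    """Add the named coefficients to the baseline intercept."""
    return round(INTERCEPT + sum(COEFFICIENTS[t] for t in terms), 3)


for term in COEFFICIENTS:
    print(term, adjusted_likelihood(term))
```

Passing several term names at once sums the corresponding coefficients, which mirrors the remark that an effect must be combined with the remaining predictors to obtain the full value for a given condition.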

We now describe our results in greater detail separately for each group and then compare between the groups. When testing to see whether effects were different from chance, we used χ2 tests when necessary.

English

  1. For the situations in which context and prosody agreed, English speakers showed the predicted effects: For the Positive contexts with Sincere prosody, the English speakers answered Yes an overwhelming 96.5% of the time (greater than chance, χ2(1) = 86.49, p < 0.001). For the Negative contexts with Sarcastic prosody, they answered Yes only 18.0% of the time (lower than chance, χ2(1) = 40.96, p < 0.001).

  2. For the situations in which context and prosody disagreed, as noted we did not have clear a priori predictions for the English speakers. For the Positive contexts with Sarcastic prosody, English speakers answered Yes 90.0% of the time (greater than chance, χ2(1) = 64.00, p < 0.001). For the Negative contexts with Sincere prosody, they answered Yes only 35.0% of the time (lower than chance, χ2(1) = 9.00, p < 0.003).

  3. Overall then, English speakers rated the Negative conditions as more sarcastic, regardless of prosody (greater than chance, χ2(1) = 89.97, p < 0.001), and in situations where context and prosody did not agree, they relied more strongly on context but were sensitive to prosody as well. This is evidenced by their decision to answer Yes more often in the Positive-Sarcastic condition (90%), as compared to the Negative-Sincere condition (35%).

Arabic

  1. For the situations in which context and prosody agreed, Arabic speakers did not fully show the predicted effects: While for the Positive contexts with Sincere prosody, the Arabic speakers answered Yes 76.44% of the time (greater than chance, χ2(1) = 27.96, p < 0.001), which is consistent with the predictions, they also answered Yes most of the time (61.06%) for the Negative contexts with Sarcastic prosody, in contrast to the prediction (also greater than chance, χ2(1) = 4.89, p < 0.05).

  2. For the situations in which context and prosody disagreed, again, we predicted that, following the pattern of emotion identification in the CLI literature, prosody would be relied on when identifying sarcasm in an L2. A differential effect of emotion was partially supported by the results, in the case of a Negative context and Sincere prosody. For the Positive contexts with Sarcastic prosody, Arabic speakers answered Yes 81.73% of the time (greater than chance, χ2(1) = 40.27, p < 0.001). For the Negative contexts with Sincere prosody, they answered Yes 62.98% of the time (greater than chance, χ2(1) = 6.74, p < 0.01), suggesting they considered the Sincere interpretation more likely, given a mismatch between Negative context and Sincere prosody.

  3. Overall, Arabic speakers rated the different conditions based on context while underutilizing prosody (as previously noted, the Context by Prosody interaction within the model in Table 5 is not significant, β 4 = −0.072, n.s.). Moreover, in all conditions, Arabic speakers’ answers reflected a bias towards interpreting all discourses as sincere rather than sarcastic, at a greater than chance rate (χ2(3) = 14.78, p < 0.003).
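The χ2(1) values reported above for the individual conditions can be reproduced, to two decimal places, by a goodness-of-fit test that compares the percentages of Yes and No answers with a 50/50 split, treating the percentages as counts out of 100. The text does not describe the computation, so this reading is an inference from the reported numbers, and the function name is hypothetical.

```python
def chi_square_vs_chance(percent_yes):
    """Goodness-of-fit statistic for a Yes/No split against 50/50,
    treating percentages as counts out of 100 (an assumption)."""
    observed = (percent_yes, 100.0 - percent_yes)
    expected = 50.0
    return sum((o - expected) ** 2 / expected for o in observed)


# (percent Yes, chi-square reported in the text) for each group and condition.
REPORTED = {
    ("English", "Positive-Sincere"): (96.5, 86.49),
    ("English", "Negative-Sarcastic"): (18.0, 40.96),
    ("English", "Positive-Sarcastic"): (90.0, 64.00),
    ("English", "Negative-Sincere"): (35.0, 9.00),
    ("Arabic", "Positive-Sincere"): (76.44, 27.96),
    ("Arabic", "Negative-Sarcastic"): (61.06, 4.89),
    ("Arabic", "Positive-Sarcastic"): (81.73, 40.27),
    ("Arabic", "Negative-Sincere"): (62.98, 6.74),
}

for (group, condition), (percent, reported) in REPORTED.items():
    computed = round(chi_square_vs_chance(percent), 2)
    print(group, condition, computed, reported)
```

The overall tests, χ2(1) = 89.97 for the English speakers and χ2(3) = 14.78 for the Arabic speakers, are not covered by this sketch.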

We also examined whether these findings were a result of differences among the Arabic speakers in any of the individual difference measures that were collected during testing. These variables included: gender, age, years of instruction in English, age at which they were first exposed to English, months spent in an immersive English environment, and EPI entrance exam scores (a standardized proficiency measure). This background information was foremost utilized to ensure that the majority of the participants were at an intermediate level, and averages and ranges are reported in Table 3. As there were only 2 females within our sample, we did not test for a gender effect.

Using these as variables within further analyses (a model containing the variable being tested (β 1 ) as well as S2 context (β 2 ), S3 prosody (β 3 ) and the interaction of S2*variable being tested (β 4 ) and S3*variable being tested (β 5 ) with the random effects used in the previous analyses), we found that the original results of the Arabic speakers were influenced by the age at which the participant was exposed to English (β 1 = −0.021, p = 0.008, β 4 = 0.023, p = 0.022, with this only influencing context scores). It can be noted that age of exposure effects have previously been linked to emotional response, and this finding is in line with the literature. As noted earlier, these effects appear to decrease with later age of exposure and acquisition of L2 (Dewaele, 2009, 2013; Pavlenko, 2005; Caldwell-Harris et al., 2012). Additionally, total proficiency scores on the EPI entrance exam also influenced the results of the speakers (β 1 = 0.002, p = 0.03, β 4 = −0.002, p = 0.02, again, with this only influencing context scores). While the listening component of the entrance exam approached significance (p = 0.061) when tested individually, the adjustment to overall scores did not merit including the subscales of the entrance exam scores separately. It should be noted, however, that the listening subscale did show the most variability among the Arabic speakers for the variables collected. As can be seen from the β values and p-values, the effects of these variables on the use of contextual information were relatively small.
However, since most covariates were nonsignificant predictors within the model when tested individually (p’s > .05), and those that were significant only interacted with responses according to S2 condition, our results still must primarily be interpreted with regard to how participants utilized the input they were given: context and prosody. Overall, this analysis reaffirms that the most important predictor of relying on prosody in the interpretation of sarcasm in this experiment is simply whether a participant was a native speaker of English. Implications of these findings are discussed below.

General Discussion

The main goal of this study was to compare native and non-native English speakers in their ability to utilize context and prosody to interpret sarcasm. Our results show clearly that, when context and prosody match, NS do much better than the Arabic-speaking NNS in identifying sarcasm and sincerity of speech. Within our paradigm, NS relied on both context and prosody to interpret sarcasm but paid more attention to context than prosody when the two cues conflicted. In contrast, the Arabic speakers appeared to have relied exclusively on context in all cases. As is suggested by the pilot study, it may be the mention of a failed goal in the negative context sentences that elicits a top-down expectation of insincerity in both the NS and NNS. Only the NS, however, utilize a subsequent prosodic cue to confirm a sarcastic interpretation. The prosody cues are always ignored by the NNS and are ignored by the NS if they mismatch the context-based expectation. Interestingly, the preference for context over prosody in the NS appears to be a reversal of how English speakers have been reported to initially acquire the ability to identify sarcasm, placing greater importance on prosody than on context (Bryant & Fox Tree, 2002; Capelli et al., 1990; Nakassis & Snedeker, 2002). We now turn to discuss our results in the context of the current literature on CLI, resource availability, and proficiency.

Fitting within CLI

Our prosody results could reflect negative transfer CLI of phonological information; it is possible that in their L1, Arabic speakers do not rely on prosody as heavily as English speakers to interpret emotion. As a full profile of prosody use in the Arabic language has not been completed, this provides a possible future area of investigation. In terms of previous work in the CLI of emotion, our results indicate that sarcasm cannot be detected as easily as other forms of emotion, regardless of whether the NNS have a background in English learning or not (Pell et al., 2009; Cheang & Pell, 2011). Work in English using ERP has found that when emotional prosody and semantic violations are combined, as in the case of the mismatched conditions in the current experiment, the elicited ERP response is more negative (Kotz & Paulmann, 2007). This is in contrast to when only a prosody violation occurs, in which case the resulting ERP pattern spike is more positive (Kotz & Paulmann, 2007). This suggests that sarcasm processing may be different from the processing of emotional prosody in general and may help account for some of the apparent inconsistencies in the CLI literature on emotional transfer and sarcasm (Pell et al., 2009; Cheang & Pell, 2011).

Resource Availability

As previously noted, there were differences between the native English and Arabic speakers in the use of prosody to determine sarcastic interpretations. In cases of mismatch, NS appear to heavily utilize the context presented before the prosody information but are also able to consider the prosody cues once they become available. Native English speakers thus appear to use the prosody information to confirm context in an information mismatch situation, not simply following a “good-enough” application of the lexical semantic information present in the contexts (Ferreira, Bailey & Ferraro, 2002), but engaging in active additional processing to confirm an already-made interpretation.

In terms of resource availability, this shows that NS have sufficient resources to allow them to keep in mind the information from the context, while also processing and incorporating the information provided by prosody. In contrast, it is possible that our mainly intermediate L1 Arabic speakers (n = 15 Intermediate, n = 12 Advanced), while having an understanding of vocabulary and conversational skills, may not yet have sufficient resources to allow them to revise an interpretation they already derived on the basis of context using the more subtle nuances of sarcastic prosody. The resource availability hypothesis is also compatible with the literature regarding sarcasm identification in native-English speaking children (Bryant & Fox Tree, 2002; Capelli et al., 1990; Nakassis & Snedeker, 2002).

Proficiency

Turning to a possible proficiency-based explanation of our results, we note that, if the difference between the English and Arabic speakers was solely a reflection of different proficiency, we would have expected the Arabic speakers to try to utilize prosody information to detect at least emotional cues to reinforce a context-based negative interpretation. As noted, identification of emotion has been previously demonstrated to have cross-linguistic transfer capabilities (Pell et al., 2009; Cheang & Pell, 2011). However, to more directly address this concern, and since we did have multiple levels of proficiency within our NNS group, we tested the role of proficiency in our models and found that it did not have a significant effect (p > .05). Thus, reliance on transfer strategies related to proficiency alone is not a likely explanation for our results unless all of our participants happened to have too low proficiency to show an effect of prosody. While this is a possibility that would have to be directly addressed in a different study, our results are still important in showing that among the proficiency ranges included in our study, prosody cues played no role in sarcastic interpretation.

Other possibilities

Another possibility is that our participants did not have sufficient exposure to the use of prosody to express sarcasm in spoken English. Although we cannot rule out this possibility, we note that the results of the Arabic speakers’ entrance exams in listening (see Table 3) suggest that they did have intermediate listening skills as well, and should have had ample exposure to prosody in English. Thus, our results reflect either some difference between English and Arabic in the prosodic expression of sarcasm or a general disregard of prosodic cues when modifying an already existing context-based interpretation.

Finally, we should note that despite the differences we identified, our Arabic speakers were clearly engaged in the task and capable of identifying sarcasm, as they did show differences between the Positive and Negative contexts. This shows they used the context but had difficulty with incorporating the prosodic cues.

While our research clearly leaves open some important questions, our results do contribute to a field in which little is known regarding prosodic processing when combined with other CLI influences, such as emotion. In particular, our results indicate that models of SLA need to account for the reasons prosody was utilized by the NS but not by the NNS to modify an existing context-based representation. Models of SLA that consider the role of multiple variables, both linguistic (such as phonological information) and cognitive (such as resources and emotional resonance), in CLI appear to have a better chance of explaining our data.

Irony is used in a variety of contexts, and in order to process spoken sarcasm in English effectively, comprehenders rely on multiple input streams (context and prosody). Also, the correct identification of sarcasm appears reliant on properly recognizing the emotion carried in both prosody and context. This may be especially difficult for L2 English learners as previous work has already identified that individuals have greater emotional resonance in their L1, unless L2 (or L3) exposure occurs at a young age (Dewaele, 2009, 2013; Pavlenko, 2005; Caldwell-Harris et al., 2012). Whether providing additional context (such as richer, more culturally appropriate situations) or exaggerated prosody can overcome this are questions that remain open for investigation. Regardless, it appears that the information identified by native-English speakers in work by Rockwell (2000) and Cheang and Pell (2008) is alone insufficient for NNS for the proper identification of sarcasm in spoken language.

Footnotes

*

We would like to thank the University of South Carolina English Proficiency for Internationals Program for generously assisting us in participant recruitment. This work was partially supported by grants NIH R21AG030445 and NSF BCS0822617.

References

Amir, N., Almogi, B.-C., & Gal, R. (2004, March). Speech Prosody: Perceiving prominence and emotion in speech – a cross lingual study. (pp. 375–378). Nara, Japan.
Ayçiçeği-Dinn, A., & Caldwell-Harris, C. L. (2009). Emotion-memory effects in bilingual speakers: A levels-of-processing approach. Bilingualism: Language and Cognition, 12 (3), 291–303.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge, UK: Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7. http://CRAN.R-project.org/package=lme4
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5 (9/10), 341–345.
Broselow, E. (1992). Nonobvious transfer: On predicting epenthesis errors. In Gass, S. & Selinker, L. (Eds.), Language transfer in language learning (pp. 71–86). Amsterdam: Benjamins.
Brown, P., & Levinson, S. C. (1978). Politeness: Some universals in language use. Cambridge, UK: Cambridge University Press.
Bryant, G. A., & Fox Tree, J. E. (2002). Recognizing verbal irony in spontaneous speech. Metaphor and Symbol, 17 (2), 99–117.
Caldwell-Harris, C. L., & Ayçiçeği-Dinn, A. (2009). Emotion and lying in a non-native language. International Journal of Psychophysiology, 71 (3), 193–204.
Caldwell-Harris, C. L., Staroselsky, M., Smashnaya, S., & Vasilyeva, N. (2012). Emotional resonances of bilinguals’ two languages vary with the age of arrival: The Russian-English bilingual experience in the U.S. In Wilson, P. (Ed.), Dynamicity in Emotion Concepts. Frankfurt: Peter Lang.
Capelli, C., Nakagawa, N., & Madden, C. (1990). How children understand sarcasm: The role of context and intonation. Child Development, 61 (6), 1824–1841.
Cheang, H., & Pell, M. (2008). The sound of sarcasm. Speech Communication, 50, 366–381.
Cheang, H., & Pell, M. D. (2011). Recognizing sarcasm without language: A cross linguistic study of English and Cantonese. Pragmatics & Cognition, 19 (2), 203–223.
Cohen, J., Cohen, P., West, S., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences 3rd Ed. Mahwah, NJ: Routledge.
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40 (2), 141–201.
de Jong, K., & Zawaydeh, B. A. (1999). Stress, duration, and intonation in Arabic word-level prosody. Journal of Phonetics, 27, 3–22.
Dewaele, J.-M. (2009). The effect of age of acquisition on self-perceived proficiency and language choice among adult multilinguals. Eurosla Yearbook, 9, 245–268.
Dewaele, J.-M. (2013). Emotions in multiple languages 2nd Ed. Basingstoke: Palgrave Macmillan.
Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A destressing deafness in French? Journal of Memory and Language, 36, 406–421.
Ferreira, F., Bailey, K. G. D., & Ferraro, V. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11 (1), 11–15.
Gathercole, S. E., Pickering, S. J., Ambridge, B., & Wearing, H. (2004). The structure of working memory from 4 to 15 years of age. Developmental Psychology, 40, 177–190.
Gibbs, R. (1986). On the psycholinguistics of sarcasm. Journal of Experimental Psychology: General, 115 (1), 3–15.
Håkansson, G., Pienemann, M., & Sayehli, S. (2002). Transfer and typological proximity in the context of L2 processing. Second Language Research, 18, 250–273.
Harley, B., & Hart, D. (1997). Language aptitude and second language proficiency in classroom learners of different starting ages. Studies in Second Language Acquisition, 19, 379–400.
Hitch, G. J., Towse, J. N., & Hutton, U. (2001). What limits children's working memory span? Theoretical accounts and applications for scholastic development. Journal of Experimental Psychology: General, 130, 184–198.
Jarvis, S., & Pavlenko, A. (2008). Crosslinguistic influence in language and cognition. New York, NY: Routledge.
Jorgensen, J. (1996). The functions of sarcastic irony in speech. Journal of Pragmatics, 26, 613–634.
Jorgensen, J., Miller, G., & Sperber, D. (1984). Test of the mention theory of irony. Journal of Experimental Psychology: General, 113 (1), 112–120.
Keysar, B., Hayakawa, S., & An, S. (2012). The foreign language effect: Thinking in a foreign tongue reduces decision biases. Psychological Science, 23, 661–668.
Kotz, S. A., & Paulmann, S. (2007). When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Research, 1151, 107–118.
Kreuz, R., & Glucksberg, S. (1989). How to be sarcastic: The echoic reminder theory of verbal irony. Journal of Experimental Psychology: General, 118 (4), 374–386.
Loevenbruck, H., Ben Jannet, M. A., D'Imperio, M., Spini, M., & Champagne-Lavau, M. (2013). Prosodic cues of sarcastic speech in French: slower, higher, wider. In Proceedings of the 14th Annual conference of the International Speech Communication Association, pp. 1470–1474. Lyon, France.
Nakassis, C., & Snedeker, J. (2002). Beyond sarcasm: Intonation and context as relational cues in children's recognition of irony. In Greenhill, A., Hughs, M., Littlefield, H., Walsh, H. (Eds.), Proceedings of the Twenty-Sixth Boston University Conference on Language Development, pp. 429–440. Somerville, MA: Cascadilla Press.
Pavlenko, A. (2005). Emotions and multilingualism. Cambridge, UK: Cambridge University Press.
Pavlenko, A. (2012). Affective processing in bilingual speakers: Disembodied cognition? International Journal of Psychology, 47 (6), 405–428.
Pell, M. D., Monetta, L., Paulmann, S., & Kotz, S. A. (2009). Recognizing emotions in a foreign language. Journal of Nonverbal Behavior, 33, 107–120.
R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org
Rafaee, E., & Rieser, V. (2014). An Arabic Twitter corpus for subjectivity and sentiment analysis. In Proceedings of the 9th International Conference on Language Resources and Evaluation, pp. 2268–2273. Reykjavik, Iceland.
Ramsay, A., & Mansour, H. (2008). Towards including prosody in a text-to-speech system for modern standard Arabic. Computer Speech & Language, 22 (1), 84–103.
Rao, R. (2013). Prosodic consequences of sarcasm versus sincerity in Mexican Spanish. Concentric: Studies in Linguistics, 39 (2), 33–59.
Rockwell, P. (2000). Lower, slower, louder: Vocal cues of sarcasm. Journal of Psycholinguistic Research, 29 (5), 483–495.
Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime reference guide. Pittsburgh, PA: Psychology Software Tools, Inc.
Schwartz, B. D., & Sprouse, R. A. (1994). Word order and nominative case in nonnative language acquisition: A longitudinal study of (L1 Turkish) German interlanguage. In Hoekstra, T. & Schwartz, B. D. (Eds.), Language acquisition studies in generative grammar: papers in honor of Kenneth Wexler from the 1991 GLOW workshops, pp. 317–368. Philadelphia, PA: John Benjamins.
Schwartz, B. D., & Sprouse, R. A. (1996). L2 cognitive states and the full transfer/full access model. Second Language Research, 12 (1), 40–72.
Slugoski, B. R., & Turnbull, W. (1998). Cruel to be kind and kind to be cruel: Sarcasm, banter, and social relations. Journal of Language and Social Psychology, 7 (2), 101–120.
Sperber, D., & Wilson, D. (1986). Relevance: Communication and Cognition. Cambridge, MA: Harvard University Press.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and Cognition: 2nd Edition. Cambridge, MA: Blackwell Publishers Inc.
Weingartner, K., & Klin, C. (2005). Perspective taking during reading: An on-line investigation of the illusory transparency of intention. Memory & Cognition, 33 (1), 48–58.
Weingartner, K., & Klin, C. (2009). Who knows what? Maintaining multiple perspectives during reading. Scientific Studies of Reading, 13 (4), 275–294.
Woodland, J., & Voyer, D. (2011). Context and intonation in the perception of sarcasm. Metaphor and Symbol, 26, 227–239.

Table 1. Two sample experimental items in each possible condition.

Table 2. Two sample filler items.

Table 3. Averages, standard errors, and ranges for demographic data collected for Arabic speakers.

Table 4. Percent of “Yes” answers to comprehension questions for each language group per condition. Standard error reported in parentheses.

Table 5. Fixed effects results for logistic regression mixed-effects model (Yes answers coded 1, No 0). The baseline conditions consisted of Arabic speakers, Positive Context and Sincere Prosody, with Context, Sentence and Language as factors. Results are presented by contrast condition.

Table 6. Logistic regression model contrast results of full model vs. model containing only one main effect, or interaction. Results were obtained using Helmert coding of main effects.