Introduction
The processing of affective language in bilinguals has become a topic of interest in psycholinguistics during the last decade (for reviews, see Caldwell-Harris, 2014; Jończyk, 2016; Pavlenko, 2012). The main objective of that line of research has been to elucidate whether the native language (L1) has a stronger emotional resonance than the non-native language (L2).
Different methodological approaches have been followed in the attempt to answer that question: i) introspective studies that assess bilinguals’ perception of emotionality in their respective languages (e.g., Dewaele, 2004, 2008; Dewaele, Lorette, Rolland & Mavrou, 2021); ii) cognitively oriented studies where the effects of the emotional content of words (defined in relation to the dimensions of valence, ranging from pleasant to unpleasant, and arousal, ranging from exciting to calming) in the two languages are examined with different experimental paradigms, such as the lexical decision task (e.g., Ferré, Anglada-Tort & Guasch, 2018; Ponari, Rodríguez-Cuadrado, Vinson, Fox, Costa & Vigliocco, 2015), the emotional Stroop task (e.g., Eilola & Havelka, 2011; Sutton, Altarriba, Gianico & Basnight-Brown, 2007), the affective priming paradigm (e.g., Degner, Doycheva & Wentura, 2012) or memory tasks (e.g., Ayçiçeği-Dinn & Caldwell-Harris, 2009; Ferré, García, Fraga, Sánchez-Casas & Molero, 2010); iii) psychophysiological studies, where physiological markers of arousal (mostly the skin conductance response, SCR) are recorded when words that differ in emotional content (including, for instance, swearwords or taboo words) are presented in L1 and L2 (e.g., Baumeister, Foroni, Conrad, Rumiati & Winkielman, 2017; Caldwell-Harris, Tong, Lung & Poo, 2011; Eilola & Havelka, 2011; see also Iacozza, Costa & Duñabeitia, 2017 and Toivo & Scheepers, 2019, for studies recording the pupillary response); and iv) electrophysiological and neuroimaging studies, where the time course of emotional processing is recorded (e.g., Conrad, Recio & Jacobs, 2011; Jończyk, Boutonnet, Musiał, Hoemann & Thierry, 2016; Wu & Thierry, 2012), or the neural areas involved in such processing are identified (e.g., Hsu, Jacobs & Conrad, 2015) while bilingual speakers are presented with L1 and L2 words.
Overall, previous research in this area has documented a reduced effect of emotional content when bilinguals process linguistic information in their L2. Concretely, the effects of affective content are either observed only in L1 (e.g., Degner et al., 2012) or, more commonly, they are also found in the L2, but with a smaller magnitude (e.g., Baumeister et al., 2017; Colbeck & Bowers, 2012; García-Palacios, Costa, Castilla, Del Río, Casaponsa & Duñabeitia, 2018; Ivaz, Costa & Duñabeitia, 2016; Toivo & Scheepers, 2019), or the effects appear later than in L1 (Conrad et al., 2011; Opitz & Degner, 2012), with some studies indicating that this reduced emotional effect is especially true for negative content (e.g., Jończyk et al., 2016; Wu & Thierry, 2012). There are, however, other studies that report an effect of the same magnitude in both languages (e.g., Eilola, Havelka & Sharma, 2007; Eilola & Havelka, 2011; Ferré et al., 2010; Ferré, Sánchez-Casas & Fraga, 2013; Ponari et al., 2015), and even others that have found a larger emotional effect in L2 than in L1 (Ayçiçeği-Dinn & Caldwell-Harris, 2009; Caldwell-Harris et al., 2011).
Such discrepancies might be explained by methodological differences between the studies, such as the type of bilinguals tested, or variables related to the L2 like age of acquisition, proficiency level or the type of acquisition context (Caldwell-Harris, 2014; Pavlenko, 2012). This last variable is very relevant if one considers that the acquisition of the L1 is closely linked to sensory and affective experiences while the acquisition of the L2 commonly takes place in a classroom environment, devoid of such experiences. When L2 words are learned exclusively in an academic setting, they are often disembodied (Pavlenko, 2012): although the speakers know that the meaning of those L2 words is emotional, they may not experience an affective reaction in response to them (i.e., bilinguals know the words, but do not feel them).
The emotional (positive and negative) words and neutral words included in previous research in this area have been commonly selected from normative studies, where native speakers rate a large set of words in terms of their affective properties, most often valence and arousal. Subjective norms are available for a variety of languages, such as English (e.g., Warriner, Kuperman & Brysbaert, 2013); Spanish (e.g., Guasch, Ferré & Fraga, 2016; Stadthagen-Gonzalez, Imbault, Perez Sanchez & Brysbaert, 2016); European Portuguese -EP- (e.g., Soares, Comesaña, Pinheiro, Simões & Frade, 2012); French (e.g., Monnier & Syssau, 2014); German (e.g., Võ, Conrad, Kuchinke, Urton, Hofmann & Jacobs, 2009); Polish (e.g., Imbir, 2015); Croatian (e.g., Ćoso, Guasch, Ferré & Hinojosa, 2019); Finnish (e.g., Eilola & Havelka, 2010); Italian (e.g., Montefinese, Ambrosini, Fairfield & Mammarella, 2014); Dutch (e.g., Moors, De Houwer, Hermans, Wanmaker, Van Schie, Van Harmelen, De Schryver, De Winne & Brysbaert, 2013); or Chinese (e.g., Lin & Yao, 2016). When bilingual experiments are designed, those words are commonly translated into the other language involved in the study and their valence and arousal levels are assumed to be the same in both languages. In order to know whether this assumption is correct, it is necessary to examine whether the subjective perception of the affective properties of words varies between the L1 and the L2.
This is the main goal of the present study, where we use, for the first time, an approach that allows us to distinguish between the meaning of the words and the feelings they produce. This approach also enables us to test Pavlenko's (2012) proposal, according to which bilinguals might know the meaning of L2 emotional words, but not feel them.
Only a few studies to date have compared L1 and L2 affective ratings. Some of them included such ratings as a complement to an experimental task; for instance, Harris (2004) recorded the SCR of Spanish–English bilinguals while they rated the unpleasantness of a set of L1 and L2 words and expressions (taboo words, neutral words, sexual words, childhood reprimands, endearments and insults). There were differences between L1 and L2 in the SCR and in the unpleasantness ratings elicited by only one type of expression: namely, childhood reprimands. Similar results regarding affective ratings were obtained by Caldwell-Harris et al. (2011) with a group of Mandarin–English bilinguals, who were asked to rate the emotional intensity of the same types of words as in Harris (2004). Of note, ratings of emotional intensity of endearments were similar in Mandarin and English, despite the fact that SCR was larger for the former. Winskel (2013), in turn, asked a group of Thai–English bilinguals to provide pleasantness (valence) ratings for a small set of Thai and English words (20 negative and 20 neutral words) and found no difference between L1 and L2 ratings. Interestingly, the same participants performed an emotional Stroop task that yielded an emotional effect, but one restricted to L1 words. The studies by Caldwell-Harris et al. (2011) and Winskel (2013) suggest that self-reported affective ratings do not always match other behavioral or physiological measures.
Other studies have focused on affectivity ratings per se, without including any additional task. For instance, Vélez-Uribe and Rosselli (2019) collected valence ratings for positive, negative and neutral words, as well as for taboo words. Spanish–English bilinguals performed the ratings in both languages. The results revealed that valence ratings in the more dominant language (English) were more positive for positive and taboo words and more negative for negative words when compared to valence ratings in the less dominant language (Spanish). In a similarly oriented study, Garrido and Prada (2021) collected ratings for positive, negative and neutral words, as well as for taboo words from European Portuguese (EP)–English bilinguals. The dimensions rated in this study were valence, emotional intensity, and familiarity. The results showed that positive words were rated as more positive in L1 (EP) than in L2 (English), negative words were rated as more negative in L1 than in L2, and taboo words were rated as more negative and more emotionally intense in L1 than in L2.
Harris (2004), Caldwell-Harris et al. (2011), Winskel (2013), Vélez-Uribe and Rosselli (2019), and Garrido and Prada (2021) all used a within-participants design; the same bilinguals rated the affective properties of words in both languages. This is the most suitable design for reducing the effects of individual differences, but it may be that presenting words in the two languages also affects the participants’ ratings. Concretely, ratings may have been influenced by bilinguals’ beliefs about differences in emotional intensity between the two languages (i.e., they may have produced lower affective ratings for L2 words than for L1 words because they consider that the non-native language is less emotional than the native language). This would increase the difference between L1 and L2 ratings. Conversely, due to the bilingual characteristics of the task, participants might be in a bilingual mode (Grosjean, 2010), which might enhance the activation of the L1 translation equivalents when performing the L2 ratings, and reduce the difference between L1 and L2 ratings.
In order to avoid these possible confounds, it would be better to ask bilinguals to perform ratings only in their L2, and compare them to ratings from native speakers for those same words. This was the approach followed by Imbault, Titone, Warriner, and Kuperman (2020), who collected valence ratings for a large set of English words from L2 speakers living in Canada and compared those ratings with those of native speakers (Warriner et al., 2013). In line with the results of Garrido and Prada (2021), L2 speakers produced attenuated ratings (i.e., less positive and negative values) in comparison to native speakers of English. Such reduction in emotionality in L2 was observed for both positive and negative words, but it was more salient for negative words. They also found that the differences in affective ratings between L1 and L2 words were smaller for highly frequent words, and for bilinguals who had spent more time living in Canada and were more proficient in English.
The overall conclusion from the reviewed rating studies seems to be that L2 words are perceived as less emotionally intense than L1 words. No conclusions can be drawn, however, regarding the proposal of Pavlenko (2012), who suggested that bilinguals might know the emotional meaning of L2 words, but not feel it. The reason for this is that the criterion used by participants to do their ratings is not clear. Focusing on valence, which is the variable of interest here, and depending on the instructions used, participants may rely either on their semantic knowledge about the affective content of the stimulus (e.g., “murder” refers to something bad), or on the affective response elicited by the stimulus (e.g., the word “murder” evokes a negative feeling). Even a mix of both approaches is possible. The studies reviewed previously did not use identical instructions. Imbault et al. (2020) and Vélez-Uribe and Rosselli (2019) instructed participants to rate the feeling the words produced in them. Caldwell-Harris et al. (2011) asked participants to think of a real situation where the words/expressions were used and to rate how emotional that use was for them. Garrido and Prada (2021), in turn, instructed participants to rate the extent to which they thought the word generated positive or negative feelings. Finally, participants in Harris's (2004) and Winskel's (2013) studies were asked to rate the words for unpleasantness/pleasantness.
Pavlenko's (2012) proposal may be tested by instructing participants explicitly, when they perform their ratings, to focus either on the meaning of words or on the feelings they evoke. Although some previous studies have asked participants to focus on feelings (Caldwell-Harris et al., 2011; Imbault et al., 2020; Vélez-Uribe & Rosselli, 2019), the type of instructions used in others (Garrido & Prada, 2021; Harris, 2004; Winskel, 2013) does not allow us to know if participants relied on meanings or on feelings. A systematic study comparing these two types of instructions has never been carried out with words. This approach, which has been recently used with pictures (e.g., Itkes, Kimchi, Haj-Ali, Shapiro & Kron, 2017), is the one followed in the present study.
Itkes and Kron (2019) recently pointed out the confusion that can be created by using the term valence to refer to two different things. Researchers have used valence to refer both to the emotional response produced by the stimuli (affective valence) and to the semantic knowledge about their affective content (semantic valence). However, they are not the same. Affective response refers to a physiological (i.e., autonomic nervous system activity), bodily (facial expression and bodily posture), and subjective (feelings) change in response to a particular event (Itkes & Kron, 2019). Semantic valence, by contrast, refers to general conceptual knowledge about the affective properties of objects or events (e.g., prison or violence refer to negative things). Ratings collected in normative studies may reflect either the affective response or the semantic valence, depending on how participants understand the task. If they understand that the task is about how positive or negative the stimulus is, they will probably rely mostly on their semantic knowledge. By contrast, when they understand that they have to report their feelings, it is more likely that their ratings will be based on the feelings provoked by the stimulus. Simply asking participants about the stimulus or about their feelings may not be sufficient to make the difference between them clear. For this reason, Itkes and Kron (2019) developed a set of instruction protocols where, after the difference between affective valence and semantic valence was explained to the participants, they were asked to focus on one of these two aspects to perform their ratings. They also provided evidence of the validity of these ratings.
In particular, they found that ratings obtained with feeling-focused instructions better predicted the physiological response of participants (facial electromyography, heart rate, and electrodermal change) than those obtained with knowledge-focused instructions (Hamzani, Mazar, Itkes, Petranker & Kron, 2020). Furthermore, whereas affective valence ratings were attenuated with the repeated presentation of the stimuli (i.e., there was habituation), semantic ratings for valence were not attenuated (Itkes et al., 2017). These findings reveal that affective and semantic valence can be dissociated empirically and that, when participants are instructed to focus on one of these two aspects, they can do so.
In this study, we adopted the procedure of Itkes and colleagues with the aim of examining the impact of language on the evaluation of the affective and semantic valence of words. To that end, we collected ratings for a set of English words from native speakers of English as well as from EP–English bilinguals. We asked participants to provide positivity and negativity ratings for those words, either focusing on the feelings the words produced in them or on their meaning. Considering the results of previous rating studies, we expected differences between native speakers of English and bilinguals, and, in particular, we expected valence ratings to be more extreme for native speakers. Furthermore, in accordance with Pavlenko (2012), we expected these differences to be larger for feeling-focused ratings than for knowledge-focused ratings. Considering that, in some cases, experimental differences in emotional effects between L1 and L2 have been restricted to negative words (Jończyk et al., 2016; Wu & Thierry, 2012), we expected such differences in ratings to be more pronounced for negative words than for positive words.
A second goal of our study was to explore whether, if there were indeed a difference, it would depend on the type of emotional word involved. Emotional words are not a homogeneous set. There are emotion terms (e.g., happiness or fear) that refer directly to emotions (i.e., they denote an emotion), which we may call “emotion words”. Words can also provoke intense emotional reactions even though they do not denote an emotion (e.g., murder), by virtue of their connotative meaning. These may be called “emotion-laden words” (Pavlenko, 2008), which are perceived by the speakers as positive or negative (e.g., party or murder, respectively). Psycholinguistic research on emotional word processing in both native speakers and bilinguals has not commonly distinguished between these two types of words (see Hinojosa, Moreno & Ferré, 2020, for a review), although recent attempts have been made to explore this distinction (e.g., Martin & Altarriba, 2017; Wang, Shangguan & Lu, 2019). The few rating studies conducted in bilinguals have not distinguished between emotion words and emotion-laden words (Garrido & Prada, 2021; Imbault et al., 2020; Vélez-Uribe & Rosselli, 2019; Winskel, 2013). This can be, however, a relevant distinction for the purposes of this work. Emotion words directly refer to emotions, while emotion-laden words probably become affectively loaded through associative mechanisms (e.g., emotion-laden words could be associated with an emotion as a result of the direct experience of the speaker, and/or through his/her indirect exposure to information about the world).
In consequence, the affective content of emotion-laden words may be more prone to individual differences and culture-specific effects than that of emotion words, depending more on the affective experiences linked to the acquisition and use of those words. If that is the case, the differences between L1 and L2 ratings explored here may be larger for emotion-laden words than for emotion words. This is the first time that this issue has been investigated.
Method
Participants
A total sample of 194 participants took part in this study. Among them, 117 participants were native speakers of English and 77 participants were EP–English bilinguals. The native speakers of English were students at the University of Southern Mississippi (United States of America). They were raised in the USA (with two exceptions who were raised in other English-speaking countries), had English as their mother tongue (two of them had an additional mother tongue), and were currently living in the USA. The bilingual participants were recruited at the Universidade do Minho (Portugal). They were all raised in Portugal (with four exceptions, raised in Brazil), had EP as their mother tongue, and were currently living in Portugal. Most of them (88.31%) learned English at school. All participants electronically signed an informed consent form before starting the experiment and received academic credit for their participation. The study was conducted with the approval of the Ethics Committee for Human Research of the University of Minho (Ref: CEICSH 052/2019) and of the University of Southern Mississippi's Institutional Review Board (Ref: IRB-20-383).
Participants from the two groups were randomly assigned to one of the instruction protocols: feeling-focused or knowledge-focused ratings; thus generating four groups of participants. A group of 62 native speakers of English (Mean age = 20.65 years, SD = 6.27, 83.87% females) were assigned to the knowledge-focused instructions, and another group of 55 native speakers of English (Mean age = 21.16 years, SD = 6.09, 80.00% females) were assigned to the feeling-focused instructions. Similarly, a group of 42 EP–English bilinguals (Mean age = 21.60 years, SD = 4.97, 85.71% females) were assigned to the knowledge-focused instructions, and another group of 35 EP–English bilinguals (Mean age = 21.86 years, SD = 6.43, 91.43% females) were assigned to the feeling-focused instructions.
All participants started the study by taking the LexTALE English test, developed by Lemhöfer and Broersma (2012; http://www.lextale.com/takethetest.html) to measure English proficiency. This test consists of a lexical decision task where participants have to indicate whether or not a string of letters is an existing English word. LexTALE provides a proficiency score that is interpreted as follows, in relation to the Common European Framework of Reference for Languages proficiency levels: participants with scores between 80 and 100 are considered advanced users (C1-C2), those with scores between 60 and 79 are upper intermediate users (B2), and those with scores of 59 and below are lower intermediate (B1) or basic users (A2) (Lemhöfer & Broersma, 2012, Table 9).
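The score bands just described can be written down as a small lookup, given here only as an illustrative sketch. The function name `lextale_band` is hypothetical, and the handling of fractional scores between 79 and 80 (treated here as upper intermediate) is an assumption, since the text only lists whole-number ranges.

```python
def lextale_band(score: float) -> str:
    """Map a LexTALE score (0-100) to the proficiency band described in the text.

    Assumption: fractional scores between 79 and 80 fall in the B2 band.
    """
    if not 0 <= score <= 100:
        raise ValueError("LexTALE scores range from 0 to 100")
    if score >= 80:
        return "advanced (C1-C2)"
    if score >= 60:
        return "upper intermediate (B2)"
    return "lower intermediate (B1) or basic (A2)"
```

For instance, a participant who reported a score of 72 would be placed in the upper intermediate band.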
Participants also completed a language background questionnaire after entering their ratings. Table 1 shows the mean scores obtained by the four groups of participants in the LexTALE test as well as the data collected from the language background questionnaire. Participants estimated their age of acquisition of English in years, and their proficiency in English (listening, reading, speaking, and writing) on a 7-point scale (1 = very poor to 7 = native-like). They also estimated their daily use of English (EP–English bilinguals also estimated their use of EP).
Table 1. Mean and standard deviation (in parentheses) of the LexTALE score and of the language background questionnaire data in the four groups of participants

Table 1 shows that native speakers of English acquired English early in life, and self-rated their proficiency in English as almost perfect. Importantly, the two groups of native speakers did not differ significantly in terms of age or any of the variables presented in Table 1 (all ps > .136). EP–English bilinguals acquired English during childhood, studied English for about ten years, their self-rated English proficiency in the four skills was above the middle point of the scale, and they scored in the upper intermediate range as measured by LexTALE. All bilingual participants declared using EP more than English in their daily life, but their percentage of use of English throughout the day was not negligible. Importantly, the two groups of bilinguals did not differ significantly in age or any of the aforementioned variables (all ps > .150).
Statistical comparisons revealed that native English speakers and EP–English bilinguals did not differ in age, t(192) = 0.94, p = .347, but, as intended, they differed in all the other variables examined (all ps < .001): native speakers obtained better scores in the LexTALE test, acquired English earlier, perceived themselves as more proficient, and used English to a greater extent than bilinguals.
Participants were also asked in the background questionnaire about their knowledge of other languages. 31.62% of the native speakers of English and 68.83% of the EP–English bilinguals indicated that they were familiar with other languages. However, their self-rated proficiency in those languages was low (M = 2.79 and M = 3.19 in a 1-to-7 scale for native speakers of English and EP–English bilinguals, respectively). None of the native speakers of English reported having knowledge of Portuguese.
Materials
The critical stimuli set consisted of 160 English words (see Appendix A), distributed as follows: half of them were emotional words and the other half were neutral words. The emotional words were divided into emotion words and emotion-laden words. Furthermore, half of them had a positive valence whereas the other half had a negative valence. Therefore, there were 20 positive emotion words (e.g., love), 20 negative emotion words (e.g., envy), 20 positive emotion-laden words (e.g., dream), 20 negative emotion-laden words (e.g., exile) and 80 neutral words (e.g., notion).
Words were selected from the English affective norms of Warriner et al. (2013). We first identified a potential set of emotion words in that dataset, and, to be sure that they were perceived as words referring to emotions, relied on the emotional prototypicality ratings of their translation equivalents in a Spanish database (Pérez-Sánchez, Stadthagen-Gonzalez, Guasch, Hinojosa, Fraga, Marín & Ferré, 2021). We used this dataset because it is the most recent and largest for this type of ratings. We focused on emotion words (nouns and adjectives) with prototypicality ratings higher than 3 (on a 5-point scale). From this pool, we selected 20 positive and 20 negative words, according to the valence values in the Warriner et al. (2013) database. A word was considered to be positive if it had a valence rating higher than 6, and it was considered to be negative if it had a valence rating lower than 4 (on a 9-point scale). Positive and negative emotion words differed in valence, t(38) = 29.36, p < .001, but they were matched in a set of affective and psycholinguistic variables: arousal (Warriner et al., 2013), concreteness (Brysbaert, Warriner & Kuperman, 2014), Zipf value (a measure of word frequency, Van Heuven, Mandera, Keuleers & Brysbaert, 2014), and length taken as number of letters (all ps > .164; see Table 2). In addition, as one set of participants consisted of EP–English bilinguals, the orthographic similarity between the English words and their Portuguese translations was also controlled.
For this purpose, we relied on the Normalized Levenshtein Distance (NLD), calculated with NIM (Guasch, Boada, Ferré & Sánchez-Casas, 2013). Positive and negative emotion words were matched in NLD, t(38) = 0.48, p = .635. Finally, both conditions consisted of 75% nouns and 25% adjectives.
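To illustrate the kind of measure involved, the following is a minimal sketch of an orthographic similarity score based on the Levenshtein distance. It is not the NIM implementation: the normalization by the length of the longer string, the conversion to a similarity proportion (1 minus the normalized distance), and the function names are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def orthographic_similarity(word: str, translation: str) -> float:
    """Similarity proportion in [0, 1]; 1.0 means identical spellings.

    Assumption: the distance is normalized by the longer word's length.
    """
    longest = max(len(word), len(translation))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(word.lower(), translation.lower()) / longest
```

Under these assumptions, identical spellings yield 1.0 and words sharing no aligned letters yield 0.0, so cognates across English and Portuguese would receive values towards the upper end of the range.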
Table 2. Mean and standard deviation (in parentheses) of the affective and psycholinguistic characteristics of the stimuli

Note. Valence and arousal rated in a 9-point scale. Concreteness rated in a 5-point scale. NLDport refers to the proportion of orthographic similarity between the English words and their Portuguese translation equivalents.
Words from the emotion-laden and neutral conditions were also taken from the norms of Warriner et al. (2013) and consisted of 75% nouns and 25% adjectives in each condition as well. The valence criteria used to select emotion-laden positive and negative words were the same as for the emotion words. Words were considered as neutral if they had a valence rating between 4 and 6 (see Table 2). As intended, emotion-laden positive and negative words differed in valence, t(38) = 28.49, p < .001, but not in the other variables (all ps > .279). Furthermore, positive emotion-laden words were matched in all the variables with positive emotion words (all ps > .120), and negative emotion-laden words were matched in all the variables with negative emotion words (all ps > .326).
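The valence cut-offs used for stimulus selection can be summarized as a simple rule, sketched below. The function name `valence_category` is hypothetical, and two details are assumptions: that the 9-point scale runs from 1 to 9, and that the boundary values 4 and 6 count as neutral.

```python
def valence_category(valence: float) -> str:
    """Classify a normative valence rating using the cut-offs described in the text.

    Assumptions: the 9-point scale runs from 1 to 9, and ratings of exactly
    4 or 6 are treated as neutral.
    """
    if not 1 <= valence <= 9:
        raise ValueError("valence ratings are expected on a 1-to-9 scale")
    if valence > 6:
        return "positive"
    if valence < 4:
        return "negative"
    return "neutral"
```

A word with a normative valence of 7.2 would thus be a candidate for the positive conditions, and one with a valence of 5.1 for the neutral condition.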
Finally, the set of 80 neutral words differed from emotional words (emotion words and emotion-laden words considered together) only in arousal, t(158) = 9.18, p < .001. There were no significant differences for the other variables (all ps > .198). The emotional words included both positive and negative words, so the average value of their valences was similar to that of neutral words. However, when comparing the valence of positive and negative words separately against the valence of neutral words, significant differences were observed in both comparisons (both ps < .001).
Procedure
The instructions for the ratings were adapted from Itkes et al. (2017; see Appendices B and C for the complete instructions). Participants were first presented with an explanation of the distinction between knowing and feeling regarding words. Then, the particular instructions for each version of the ratings were provided. For the knowledge-focused version, participants were instructed to focus on word meaning to rate how positive/negative the word was. It was explained to participants that they should provide ratings based on the valence of the word, not on their internal feelings (see Appendix B). Conversely, in the feeling-focused version, participants were instructed to focus on the internal feelings evoked by the words in order to rate them in terms of positivity and negativity. It was explained to participants that they were to provide ratings based on their internal feelings, not on their knowledge of the affective meaning of the words (see Appendix C). All participants completed the questionnaire online.
The procedure was the same for all participant groups. First, participants read and electronically signed the informed consent form. Then, they were prompted to take the online version of the English LexTALE test (Lemhöfer & Broersma, 2012), and to report the score obtained. Each participant was then randomly assigned to one of the conditions and the corresponding set of instructions was presented on the screen (see Appendices B and C). The following screen explained the use of the scales to provide ratings. Following the procedure of Itkes et al. (2017), we used two sliding scales for each word in which participants could indicate values ranging from 0 to 10. The first one was a positivity scale, and the second one was a negativity scale. In the knowledge-focused version, 0 meant that the word had no positive meaning at all or no negative meaning at all, while 10 meant that the word had an extremely positive meaning or an extremely negative meaning (for the positivity and negativity scales, respectively). In the feeling-focused version, 0 meant that the word elicited no positive feeling at all or no negative feeling at all, while 10 meant that the word produced an extremely positive feeling or an extremely negative feeling (for the positivity and negativity scales, respectively). Participants could also indicate that they did not know the word.
After the instructions, the 160 words were presented in a different random order for each participant, one word per screen, together with the positive and the negative scales (see Figure 1).
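The per-participant randomization described above could be sketched as follows. This is only an illustration: the function name, the seeding by participant ID, and the placeholder word labels are assumptions, not details reported in the study.

```python
import random

def presentation_order(words, participant_id):
    """Return a participant-specific random order of the word list."""
    # Assumption: seeding by participant ID makes each order reproducible.
    rng = random.Random(participant_id)
    order = list(words)  # copy so the master list is left untouched
    rng.shuffle(order)
    return order

# Placeholder labels standing in for the 160 stimuli (not the real words).
words = [f"word{i:03d}" for i in range(160)]
order_p1 = presentation_order(words, participant_id=1)
order_p2 = presentation_order(words, participant_id=2)
```

Each participant sees all 160 words exactly once, and different participants receive different orders.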

Fig. 1. Layout of the rating screen.
In order to progress to the next item, participants had to provide a response on each scale or indicate that they did not know the meaning of the word. After rating all the words on the list, participants completed the language background questionnaire described in the Participants section.
Results
Three sets of analyses were performed: 1) an analysis that included all the words, where emotional words (including emotion words and emotion-laden words) were compared to neutral words; 2) an analysis restricted to emotional words that focused on the comparison between emotion words and emotion-laden words; and 3) an analysis restricted to emotional words that focused on the comparison between positive and negative words. In all cases, we carried out two analyses of variance (ANOVA), one by participants and the other by items. Significant interactions were subjected to post-hoc Bonferroni tests. When needed, the effect size was calculated using Cohen's d (Cohen, 1988). When the interactions were significant both by participants and by items (or when they were close to significance by participants, and significant by items), the statistical data provided with respect to Bonferroni tests (i.e., p and d values) correspond to the analysis by participants. In the few cases where the interactions reached statistical significance only in the analysis by items, the statistical data reported correspond to that analysis. Finally, all the means and standard deviations that appear in the tables, in the figures, and in the main text refer to the analyses by participants.
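As an illustration of the effect size measure mentioned above, the following sketch computes Cohen's d for two independent groups. The text does not state which variant of d was used, so the pooled-standard-deviation form is an assumption here, and the ratings are made up.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Made-up ratings for illustration only.
d_example = cohens_d([2, 4, 6], [1, 3, 5])
```

With these toy values the two means differ by 1 and the pooled standard deviation is 2, so d is 0.5.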
1- Analyses including all the words
We carried out two analyses of variance (ANOVA by participants and by items) on positivity ratings. The factors included in the analyses were emotionality (emotional words vs. neutral words), group (native speakers of English vs. EP–English bilinguals) and type of rating (knowledge-focused vs. feeling-focused: know vs. feel hereafter). Emotionality was treated as a within-group factor in the analysis by participants and as a between-groups factor in the analysis by items. Group and type of rating were treated as between-groups factors in the analysis by participants and as within-group factors in the analysis by items. The same two ANOVAs were carried out on negativity ratings. Table 3 shows the mean and standard deviation of the affective ratings across conditions.
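The difference between the two analyses lies in how trial-level ratings are averaged before the ANOVA. The sketch below shows that aggregation step only. The record layout, the field names, and the two example items are hypothetical and are not taken from the study's data file.

```python
from collections import defaultdict
from statistics import mean

def cell_means(trials, keys, value="positivity"):
    """Average `value` within every combination of the given keys."""
    cells = defaultdict(list)
    for trial in trials:
        cells[tuple(trial[k] for k in keys)].append(trial[value])
    return {cell: mean(vals) for cell, vals in cells.items()}

# Hypothetical trial-level records (made-up items and ratings).
trials = [
    {"participant": "p1", "group": "native", "rating": "know", "item": "joy", "emotionality": "emotional", "positivity": 8},
    {"participant": "p1", "group": "native", "rating": "know", "item": "table", "emotionality": "neutral", "positivity": 3},
    {"participant": "p2", "group": "native", "rating": "know", "item": "joy", "emotionality": "emotional", "positivity": 6},
    {"participant": "p2", "group": "native", "rating": "know", "item": "table", "emotionality": "neutral", "positivity": 1},
    {"participant": "p3", "group": "bilingual", "rating": "feel", "item": "joy", "emotionality": "emotional", "positivity": 5},
    {"participant": "p3", "group": "bilingual", "rating": "feel", "item": "table", "emotionality": "neutral", "positivity": 2},
]

# By participants: one mean per participant and emotionality level
# (emotionality varies within participants).
by_participants = cell_means(trials, ["participant", "emotionality"])

# By items: one mean per item, group and type of rating
# (group and type of rating vary within items).
by_items = cell_means(trials, ["item", "group", "rating"])
```

In the by-participants table each participant contributes one value per emotionality level, whereas in the by-items table each word contributes one value per group and type of rating, which is why the within/between status of the factors is reversed across the two analyses.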
Table 3. Mean and standard deviation (in parentheses) of the affective ratings across conditions

Positivity ratings
The analyses of positivity ratings revealed a main effect of the three factors involved: positivity ratings were higher in native speakers (M = 3.77, SD = 1.48) than in bilinguals (M = 3.33, SD = 1.72), F1(1, 190) = 6.44, p = .012, ηp² = 0.03, F2(1, 158) = 98.93, p < .001, ηp² = 0.39, and they were higher in the know condition (M = 4.18, SD = 1.41) than in the feel condition (M = 2.92, SD = 1.53), F1(1, 190) = 49.20, p < .001, ηp² = 0.21, F2(1, 158) = 776.05, p < .001, ηp² = 0.83. The ratings were higher in emotional words (M = 3.93, SD = 1.17) than in neutral words (M = 3.26, SD = 1.86), although in this case the difference was only significant in the analysis by participants, F1(1, 190) = 75.81, p < .001, ηp² = 0.29, F2(1, 158) = 3.39, p = .067, ηp² = 0.02. The interaction between group and type of rating was close to significance in the analysis by participants and reached statistical significance in the analysis by items, F1(1, 190) = 3.86, p = .051, ηp² = 0.02, F2(1, 158) = 206.84, p < .001, ηp² = 0.57. This interaction, which is displayed in Figure 2a, indicates that the difference between native speakers and bilinguals was significant in the feel condition (p = .002), but not in the know condition (p = .674). Furthermore, the difference in positivity ratings between the feel condition and the know condition was significant in both groups of participants (both ps < .001), but the effect size calculated using Cohen's d (Cohen, 1988) was larger for bilinguals (d = 1.14) than for native speakers of English (d = 0.69). Finally, the interaction between emotionality and type of rating was also significant, F1(1, 190) = 11.52, p < .001, ηp² = 0.06, F2(1, 158) = 28.78, p < .001, ηp² = 0.15. That interaction indicates that both emotional and neutral words were rated as more positive in the know condition than in the feel condition (both ps < .001), the effect size being larger for emotional words (d = 0.93) than for neutral words (d = 0.88). It also indicates that emotional words were rated as more positive than neutral words in both the know and the feel condition (both ps < .001), with the effect size being larger in the feel condition (d = 0.65) than in the know condition (d = 0.31).
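For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to two of the F values reported above. The function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of group on positivity ratings, analysis by participants.
eta_group_f1 = round(partial_eta_squared(6.44, 1, 190), 2)

# Main effect of type of rating on positivity ratings, analysis by items.
eta_rating_f2 = round(partial_eta_squared(776.05, 1, 158), 2)
```

Both values round to the figures given in the text (0.03 and 0.83, respectively).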

Fig. 2a and 2b. Analysis with all the words: Interaction between group and type of rating in positivity ratings (2a) and negativity ratings (2b).
Negativity ratings
The analyses of negativity ratings revealed a similar pattern of findings. The three factors involved had a significant effect. Indeed, negativity ratings were higher in native speakers of English (M = 3.47, SD = 1.63) than in bilinguals (M = 2.85, SD = 1.59), F1(1, 190) = 13.98, p < .001, ηp² = 0.07, F2(1, 158) = 175.46, p < .001, ηp² = 0.53; they were higher in the know condition (M = 3.68, SD = 1.47) than in the feel condition (M = 2.70, SD = 1.67), F1(1, 190) = 35.03, p < .001, ηp² = 0.16, F2(1, 158) = 323.80, p < .001, ηp² = 0.67, and they were higher in emotional words (M = 4.12, SD = 1.26) than in neutral words (M = 2.34, SD = 1.48), F1(1, 190) = 826.07, p < .001, ηp² = 0.81, F2(1, 158) = 25.39, p < .001, ηp² = 0.14. The interaction between group and type of rating was also significant, although only in the analysis by items, F1(1, 190) = 1.15, p = .285, ηp² = 0.01, F2(1, 158) = 56.48, p < .001, ηp² = 0.26. This interaction, which is displayed in Figure 2b, indicates that, although negativity ratings were higher in native speakers than in bilinguals, and in the know condition than in the feel condition (all ps < .001), the effect size of the difference between native speakers and bilinguals was larger in the feel condition (d = 0.39) than in the know condition (d = 0.16). On the other hand, the effect size of the difference between the feel condition and the know condition was larger for bilinguals (d = 0.51) than for native speakers of English (d = 0.33).
2- Analyses restricted to emotional words: Comparison between Emotion words and Emotion-laden words
We carried out two ANOVAs (by participants and by items) on positivity ratings, restricted to positive words. The factors were as follows: type of emotional word (emotion words vs. emotion-laden words), group (native speakers of English vs. EP–English bilinguals) and type of rating (know vs. feel). Type of emotional word was treated as a within-group factor in the analysis by participants and as a between-groups factor in the analysis by items. Group and type of rating were treated as between-groups factors in the analysis by participants and as within-group factors in the analysis by items. The same two ANOVAs were carried out on negativity ratings, focusing now on negative words.
Positivity ratings
There was a main effect of the three factors involved: positivity ratings for positive words were higher in native speakers (M = 7.36, SD = 1.68) than in bilinguals (M = 6.62, SD = 2.39), F1(1, 190) = 9.80, p = .002, ηp² = 0.05, F2(1, 38) = 50.30, p < .001, ηp² = 0.57; they were higher in the know condition (M = 7.78, SD = 1.41) than in the feel condition (M = 6.24, SD = 2.29), F1(1, 190) = 41.88, p < .001, ηp² = 0.18, F2(1, 38) = 724.06, p < .001, ηp² = 0.95, and they were higher in positive emotion words (M = 7.45, SD = 1.93) than in positive emotion-laden words (M = 6.68, SD = 2.04), F1(1, 190) = 159.62, p < .001, ηp² = 0.46, F2(1, 38) = 4.83, p = .034, ηp² = 0.11. The interaction between group and type of rating was close to significance in the analysis by participants and reached statistical significance in the analysis by items, F1(1, 190) = 3.66, p = .057, ηp² = 0.02, F2(1, 38) = 111.01, p < .001, ηp² = 0.74. This interaction, which is displayed in Figure 3a, indicates that the difference between native speakers of English and bilinguals was significant in the feel condition (p = .001), but not in the know condition (p = .371). It also indicates that, although positivity ratings were higher in the know condition than in the feel condition for both native speakers of English and bilinguals (both ps < .001), the effect size was larger for bilinguals (d = 1.01) than for native speakers (d = 0.74).

Fig. 3a and 3b. Analysis with emotional words: Interaction between group and type of rating in positivity ratings for positive words (3a) and in negativity ratings for negative words (3b).
Negativity ratings
There was a main effect of group: negativity ratings were higher in native speakers of English (M = 7.40, SD = 1.86) than in bilinguals (M = 6.70, SD = 2.22), F1(1, 190) = 8.24, p = .005, ηp² = 0.04, F2(1, 38) = 64.38, p < .001, ηp² = 0.63. The effect of type of rating was also significant: ratings were higher in the know condition (M = 7.94, SD = 1.30) than in the feel condition (M = 6.19, SD = 2.32), F1(1, 190) = 48.18, p < .001, ηp² = 0.20, F2(1, 38) = 351.79, p < .001, ηp² = 0.90. The effect of type of emotional word reached statistical significance only in the analysis by participants, F1(1, 190) = 56.19, p < .001, ηp² = 0.23, F2(1, 38) = 1.92, p = .174, ηp² = 0.05, indicating that participants rated emotion words as more negative (M = 7.31, SD = 2.02) than emotion-laden words (M = 6.94, SD = 2.05). The interaction between group and type of rating was significant too, although in this case only in the analysis by items, F1(1, 190) = 0.86, p = .356, ηp² = 0.004, F2(1, 38) = 17.11, p < .001, ηp² = 0.31. This interaction, which is displayed in Figure 3b, indicates that negative words were perceived as more negative by native speakers of English than by bilinguals in both the know condition and the feel condition (both ps < .001), but the effect size was larger in the feel condition (d = 0.99) than in the know condition (d = 0.54). On the other hand, both native speakers of English and bilinguals rated negative words as more negative in the know condition than in the feel condition (both ps < .001), with the effect size being larger in bilinguals (d = 2.00) than in native speakers (d = 1.75).
Finally, the interaction between group and type of emotional word was significant only in the analysis by participants, F1(1, 190) = 6.07, p = .015, ηp² = 0.03, F2(1, 38) = 1.54, p = .223, ηp² = 0.04. This interaction, which is displayed in Figure 4, indicates that negativity ratings were higher for native speakers of English than for bilinguals, both for emotion words (p = .020) and for emotion-laden words (p = .001), the effect size being larger in emotion-laden words (d = 0.41) than in emotion words (d = 0.29). It also indicates that negativity ratings were higher for emotion words than for emotion-laden words, both in native speakers of English and in bilinguals (both ps < .001), with a higher effect size in bilinguals (d = 0.55) than in native speakers of English (d = 0.14).

Fig. 4. Analysis with emotional words: Interaction between group and type of emotional word in negativity ratings for negative words.
3- Analyses restricted to emotional words: Comparison between positive words and negative words
The above analyses reveal a similar pattern of findings with positive words (i.e., those with positivity ratings) and negative words (i.e., those with negativity ratings). A direct comparison between them is needed, however, to examine the effects of valence (i.e., positive vs. negative). To that end, a single bipolar valence score (positive rating minus negative rating) for each word was computed, following the procedure of Itkes et al. (2017). A positive bipolar valence score for a given word indicates that the positivity rating is higher than the negativity rating for that word. Conversely, a negative bipolar valence score indicates that the positivity rating for that word is lower than the negativity rating. We carried out an ANOVA on this variable including these factors: valence (positive word vs. negative word), group (native speakers of English vs. EP–English bilinguals) and type of rating (know vs. feel). Valence was treated as a within-group factor in the analysis by participants and as a between-groups factor in the analysis by items, while group and type of rating received the same treatment as in the above analyses. The results revealed a main effect of valence, F1(1, 190) = 2333.45, p < .001, ηp² = 0.92, F2(1, 78) = 1461.78, p < .001, ηp² = 0.95, showing, as expected, that the bipolar valence score was higher for positive words (M = 5.93, SD = 1.87) than for negative words (M = -6.35, SD = 1.99).
The three two-way interactions reached statistical significance: the interaction between valence and type of rating, F1(1, 190) = 29.17, p < .001, ηp² = 0.13, F2(1, 78) = 346.11, p < .001, ηp² = 0.82; between group and type of rating, F1(1, 190) = 2.94, p = .088, ηp² = 0.02, F2(1, 78) = 6.47, p = .013, ηp² = 0.08, and the interaction between group and valence, F1(1, 190) = 2.49, p = .116, ηp² = 0.01, F2(1, 78) = 15.32, p < .001, ηp² = 0.16, although in the last two cases, the interaction only reached statistical significance in the analysis by items. Interestingly, these two-way interactions were qualified by a three-way interaction between group, type of rating, and valence, which was significant only in the analysis by items, F1(1, 190) = 0.99, p = .321, ηp² = 0.01, F2(1, 78) = 20.10, p < .001, ηp² = 0.20 (see Figure 5). In order to interpret this interaction, we analyzed separately the bipolar valence scores for knowledge-focused and feeling-focused instructions. The analysis focused on the know condition revealed a main effect of valence, F2(1, 78) = 1536.31, p < .001, ηp² = 0.95, indicating that the overall bipolar valence score was higher for positive words (M = 6.52, SD = 1.81) than for negative words (M = -6.92, SD = 1.42). The analysis focused on the feel condition again revealed a significant valence effect, F2(1, 78) = 1235.75, p < .001, ηp² = 0.94, with higher values for positive words (M = 5.14, SD = 1.70) than for negative words (M = -5.56, SD = 1.25). A significant interaction between group and valence was also present in the feel condition, F2(1, 78) = 33.01, p < .001, ηp² = 0.30. This interaction indicates that the bipolar valence score was higher for positive words than for negative words in both groups of participants (both ps < .001), with the effect size being larger in native speakers of English (d = 7.99) than in bilinguals (d = 6.86). It also indicates that native speakers of English produced more extreme scores than bilinguals for both positive and negative words (both ps < .001); that is, a bipolar valence score further from zero indicates that a positive word is considered more positive, or that a negative word is considered more negative. In this case, the effect size was larger for negative words (d = 0.50) than for positive words (d = 0.46).
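The bipolar valence score used in these analyses is simply the positivity rating minus the negativity rating for a given word. A minimal sketch, with made-up ratings on the 0 to 10 scales and an illustrative function name:

```python
def bipolar_valence(positivity, negativity):
    """Bipolar valence score: positivity rating minus negativity rating."""
    return positivity - negativity

# Made-up ratings on the two 0-10 scales, for illustration only.
score_positive_word = bipolar_valence(positivity=8, negativity=1)
score_negative_word = bipolar_valence(positivity=0, negativity=9)
```

Because both scales run from 0 to 10, the score ranges from -10 to 10, and values further from zero correspond to more extreme ratings in either direction.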

Fig. 5. Analysis with emotional words: Interaction between group, type of rating and valence in the bipolar valence score.
Discussion
In this study we examined whether there is a difference between native and second-language speakers regarding affective ratings of words. Native speakers of English and EP–English bilinguals provided positivity and negativity ratings for a set of emotional and neutral English words. Among the emotional words, there were emotion words and emotion-laden words. Half of the participants were asked to focus on the feelings elicited by each word to do the ratings, while the other half were asked to focus on their knowledge of the word's meaning. The results were clear-cut: native speakers produced more intense affective ratings than bilinguals, and that difference was larger when participants focused on their feelings than when they focused on the meaning of words. Regarding the comparison between emotion words and emotion-laden words, the effects were rather modest and restricted to negative words, where the difference in ratings between native speakers of English and EP–English bilinguals was larger for emotion-laden words than for emotion words.
In a review of the literature on affective language processing in bilinguals, Pavlenko (2012) suggested that bilinguals might know the emotional meaning of L2 words, but not feel it. The results of the few affective rating studies conducted so far largely support that proposal (Garrido & Prada, 2021; Imbault et al., 2020; Vélez-Uribe & Rosselli, 2019; but see Winskel, 2013, for contrasting findings and Harris, 2004, and Caldwell-Harris et al., 2011, for effects restricted to a particular type of word, namely childhood reprimands). However, as stated in the introduction, the procedure used in those studies does not allow us to conclude whether the differences between native speakers’ and bilinguals’ ratings are mostly related to the meaning of the words or to the feelings they evoke. Winskel (2013) and Harris (2004) asked participants to rate words’ pleasantness and unpleasantness, respectively. With these ambiguous instructions, the aspect on which participants focused to provide their ratings cannot be known. Similarly, the instructions by Garrido and Prada (2021) are somewhat ambiguous because they asked participants to rate the extent to which they thought the word generated positive or negative feelings. It is not clear from these instructions if participants actually focused on the feelings the words produced in them (feeling-focused ratings) or on their knowledge about the capacity of the words to evoke some feelings (knowledge-focused ratings). The studies by Caldwell-Harris et al. (2011), Imbault et al. (2020), and Vélez-Uribe and Rosselli (2019), by contrast, had less ambiguous instructions, asking participants to pay attention to the feelings that the words produced in them.
It is not necessary to explicitly focus on one's feelings to be affected by the emotional content of words. In fact, there is ample evidence of physiological, electrophysiological, and neuroimaging effects of emotional words in tasks where participants’ attention was focused on aspects other than the feelings evoked by words, such as in word recognition tasks (e.g., Conrad et al., 2011; Degner et al., 2012; Sulpizio, Toti, Del Maschio, Costa, Fedeli, Job & Abutalebi, 2019; Toivo & Scheepers, 2019). However, as pointed out by Caldwell-Harris et al. (2011), when it comes to bilingual emotional processing, asking bilinguals to rate words or expressions for pleasantness/valence or emotional intensity may not be the most suitable approach, because they can rely on rote cultural knowledge of the meaning of the word. The ratings most sensitive to language-learning history seem, by contrast, to be those obtained by asking participants about the personal emotional feelings evoked by the stimuli (i.e., feeling-focused ratings). However, this is not enough to test the proposal made by Pavlenko (2012). To that end, it is necessary to compare these feeling-focused ratings with ratings focused on the meaning of words. This comparison, which has not been made before, is what we have done in the present study. Importantly, the few studies conducted with images using this distinction (Hamzani et al., 2020; Itkes & Kron, 2019) show that ratings obtained with feeling-focused instructions are better predictors of the physiological response of participants than those obtained with knowledge-focused instructions. Taking this into consideration, applying that distinction to the field of emotional word processing in the native and the non-native language is a promising methodological approach to better characterize such processing.
The most relevant finding in this study is the interaction between group and type of rating. This interaction indicates that the difference between bilinguals’ and native speakers’ ratings is more pronounced in the feeling-focused condition than in the knowledge-focused condition. It also indicates that the difference between the feeling-focused condition and the knowledge-focused condition is larger for bilinguals than for native speakers. This result goes beyond what previous research has indicated (Caldwell-Harris et al., 2011; Garrido & Prada, 2021; Harris, 2004; Imbault et al., 2020; Vélez-Uribe & Rosselli, 2019), because it demonstrates that, although differences between native speakers and bilinguals in affective ratings are observed both when they focus on their knowledge of the words and when they focus on the feelings the words produce, they are larger in the latter case. These findings are in line with Pavlenko's (2012) proposal, suggesting that, although bilinguals know the emotional meaning of words in the second language, they may not feel them with the same intensity as native speakers. This is the case at least for bilinguals who have acquired their L2 mostly in instructional settings, like the participants of this study. This difference in the feelings produced by L2 words might be reduced or even disappear when bilinguals have learnt the L2 by immersion, in which case they may have the chance to associate the foreign words with sensory and affective experiences.
Apart from the main result found here, we also expected some differences between the various types of emotional words involved in the study. Although the literature about emotional word processing has not commonly distinguished between emotion words and emotion-laden words, there is evidence of differences in their processing, both in the native language (e.g., Kazanas & Altarriba, 2015) and in the non-native language of bilingual speakers (Zhang, Wu, Yuan & Meng, 2020). The findings of these studies seem to suggest that emotion words have an advantage in activating emotions over emotion-laden words (see also Wu & Zhang, 2020, for an overview). In this study, we predicted that the difference in affective ratings between native speakers of English and EP–English bilinguals might be larger for emotion-laden words than for emotion words because the affective content of emotion-laden words probably comes from their association with affective experiences, which makes it more prone to individual and cultural differences.
Our results supported that prediction in relation to negative words, but not to positive words. In this last case, only a main effect of type of emotional word emerged, which was of the same magnitude for native speakers as for bilinguals. Therefore, these results suggest that, when it comes to positive words, whether their affective load comes from their denotative meaning (i.e., emotion words) or from associative mechanisms (i.e., emotion-laden words) does not modulate the difference in affective ratings between native speakers and bilinguals. By contrast, when it comes to negative words, the distinction between emotion words and emotion-laden words matters: the difference in rating intensity between native speakers and bilinguals is larger for the latter, as revealed by the interaction between group and type of emotional word. This suggests that the overlap between the meanings of (and the feelings produced by) negative emotion words across languages is larger than that of negative emotion-laden words.
There is no clear explanation at present for the different pattern of findings between positive and negative words in relation to the emotion word/emotion-laden word distinction. It is worth noting that the three-way interaction obtained in the direct comparison between positive and negative words (i.e., the analyses on the bipolar valence score) also revealed a slight difference between these two types of words in feeling-focused ratings. Specifically, native speakers produced a more differentiated bipolar valence score (i.e., more positive ratings for positive words and more negative ratings for negative words, indicating more extreme scores) than bilingual speakers. Importantly, although this was true for all the words, the difference between native speakers and bilingual speakers was larger for negative words than for positive words. This result is in line with previous findings showing larger differences in affective processing between L1 and L2 with negative words in comparison to positive words (Jończyk et al., 2016; Wu & Thierry, 2012). Our results also suggest that this is especially evident when participants focus on the feelings produced by the words (i.e., the difference between positive and negative words was not observed with knowledge-focused ratings). However, considering the subtle effects found here in relation to this issue, more research is needed to examine the effects of valence on bilinguals’ affective ratings and their possible interaction with the type of emotional word (emotion words vs. emotion-laden words).
To sum up, we have collected affective ratings for words and conducted the first systematic comparison between feeling-focused and knowledge-focused instructions. The results show that bilinguals produce attenuated affective ratings in their L2 in comparison to native speakers. This difference is more pronounced when participants focus on the feelings produced by the words than when they focus on their meaning. These findings provide both theoretical and methodological contributions to the literature on the subject. From a theoretical point of view, they inform us about the perception of affective language in bilingual speakers, highlighting the relevance of the distinction between feelings and meanings. From a methodological point of view, they show the need to incorporate this distinction in the normative studies from which researchers obtain their experimental stimuli. An important direction for future studies would be to provide additional validity to the distinction between feeling-focused ratings and knowledge-focused ratings by using psychophysiological measures, as has been done with images. Future research should also examine in more depth the relevance of the distinction between different types of emotional words (i.e., emotion words vs. emotion-laden words; positive vs. negative words) in relation to affective processing in bilinguals.
Competing interests
The authors declare none.
Data availability
The data that support the findings of this study are openly available in figshare at https://doi.org/10.6084/m9.figshare.19182248.v1.
Acknowledgements
This study was supported by the Ministerio de Ciencia, Innovación y Universidades of Spain (PID2019-107206GB-I00, RED2018-102615-T), and by the Universitat Rovira i Virgili (2019PFR-URV-B2-32). It was also supported by the Foundation for Science and Technology (FCT) through the Portuguese State Budget (UIDB/01662/2020).
Appendix A. List of critical stimuli used in the study, divided by condition (sorted by alphabetical order):

Appendix B. Instructions for the knowledge-focused questionnaires:
“In this questionnaire you will be presented with a set of words. We are asking you to report what you think about the emotional meaning of these words. Some words have an emotional meaning (i.e., they refer to something positive or negative) while other words do not. We want to make a distinction between what you think and what you feel: what you think about the meaning of a word and how that word makes you feel inside. For instance, imagine you saw on the screen the word GUN. It is possible that you read that word, and know that it has a negative emotional meaning (i.e., it means something negative), but you don't really feel anything inside. On the other hand, you may read the word GUN and feel strong emotions inside.
In this questionnaire we are asking you to report what you think about the meaning of the word, and not about your feelings. Please report what you think about the emotional meaning of the word, the degree to which it is positive and/or negative.”
Appendix C. Instructions for the feeling-focused questionnaires:
“In this questionnaire you will be presented with a set of words. We are asking you to report about your feelings; about what you feel inside while reading each word. Reporting about your feelings is not always easy. Sometimes it is not exactly clear what to report when asked to report about your feelings. In these instructions we will explain precisely what we mean when we ask you to report about your feelings.
Let's begin with an example. Imagine you read in the screen the word GUN. We want to make a distinction between what you FEEL and what you THINK: how the word makes you FEEL INSIDE and what you THINK ABOUT the meaning of the word. For example, you may read the word GUN and feel strong emotions inside. However, you may know that a word has a negative meaning, but you don't really feel anything inside when reading it. That is to say, you may know that guns are negative but you yourself don't feel anything when reading that word.
In this questionnaire we are asking you to report how the word actually made you feel -- if your feelings when reading the word were positive and/or negative.
We are not asking what you think about the meaning of the word, your opinion about the word, how you expected yourself to feel, or what you thought we expected you to feel; just the actual feelings that you experienced while reading the word – how you actually felt.
Compare yourself to a smoke detector that lights up when it detects smoke. In the same way think of yourself as a feelings detector. Similar to a smoke detector that lights up when it detects smoke, the feelings detector lights up when it detects feelings.”