
Palatal is for happiness, plosive is for sadness: evidence for stochastic relationships between phoneme classes and sentiment polarity in Hungarian

Published online by Cambridge University Press:  23 September 2022

Réka Benczes*
Affiliation:
Department of Communication and Media, Corvinus University of Budapest, Budapest, Hungary
Gábor Kovács
Affiliation:
Department of Communication and Media, Corvinus University of Budapest, Budapest, Hungary
*Corresponding author. Email: reka.benczes@uni-corvinus.hu

Abstract

The past couple of decades have seen a substantial increase in linguistic research that highlights the non-arbitrariness of language, as manifested in motivated sound–meaning correspondences. Yet one of the challenges of such studies is that there is a relative paucity of data-driven analyses, especially in the case of languages other than English, such as Hungarian, even though the proportion of at least partially motivated words in Hungarian vocabulary is substantial. We address this gap by investigating the relationship between Hungarian phoneme classes and positive/negative sentiment based on 3,023 word forms retrieved from the Hungarian Sentiment Lexicon. Our results indicate that positive polarity word forms tend to contain more vowels, front vowels, continuants, fricatives, palatals, and sibilants. On the other hand, negative sentiment polarity words tend to have more rounded vowels, plosives, and dorsal consonants. While our analysis provides strong evidence for a set of non-arbitrary form–meaning relationships, effect sizes also reveal that such associations tend to be fairly weak tendencies, and therefore sentiment polarity cannot be derived from the relative frequencies of phoneme classes in a deterministic fashion.

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1. Introduction

1.1. Background

Over a century ago, the Swiss linguist Ferdinand de Saussure (1915/1959) established the doctrine of the arbitrariness of the linguistic sign, meaning that the relationship between a signifier (i.e., the form, or sound shape, of a word) and the signified (i.e., the meaning or concept that the word refers to) is mostly arbitrary; there is nothing about the sequence [tri:] that would allow a non-English speaker to figure out that it denotes a perennial plant with a trunk. According to this view, there is no similarity or resemblance between language and reality; it is arbitrariness that allows us to make use of and exploit the full range of possibilities in language (Hockett, 1960; Lupyan & Winter, 2018). Yet while the Saussurean notion of the arbitrary sign has been a constant in linguistic theorizing, the past couple of decades have seen a substantial increase in linguistic research that downplays arbitrariness and highlights motivation [Footnote 1] as an organizing principle in language (e.g., Benczes, 2019; Cuyckens et al., 2003; De Cuypere, 2008; Dingemanse et al., 2015, 2020; Jakobson & Waugh, 1979; Nänny & Fischer, 1999; Perniss et al., 2010; Perniss & Vigliocco, 2014; Simone, 1995).

Motivation in language comes to the fore in the form of sound–meaning correspondences, that is, non-arbitrary relationships between particular sounds and the properties of a referent. Obvious examples include onomatopoeia: words that “resemble” the sound they represent, as in the case of baa, click or woof (Assaneo et al., 2011). Further examples of sound–meaning pairings include so-called phonesthemes, that is, recurring sound sequences that are not contrastive morphemes but can nevertheless be associated with a particular meaning. For example, the gl- phonestheme in English can be found in the initial segment of words connected to vision and light (Bergen, 2004), as in gleam, glimmer, glitter, glisten, glow, and so on.

Research on sound–meaning correspondences has provided plenty of evidence for non-arbitrariness across a wide range of phenomena in language (Schmidtke et al., 2014). Sound–meaning correspondences have been shown to affect language evolution (Johansson et al., 2021; Vinson et al., 2021) and to play a key role in language acquisition (Asano et al., 2015; Imai & Kita, 2014; Imai et al., 2008; Laing, 2014; Maurer et al., 2006; Perry et al., 2015, 2018). In light of the growing and converging body of evidence, Dingemanse et al. (2015) have proposed that arbitrariness can account only partially for the various relationships that can exist between form and meaning, implying that non-arbitrariness is a complementary (and basic) principle in language that facilitates word learning and communication on account of perceptuomotor analogies.

1.2. Sound symbolism research across languages

One particular area that testifies to non-arbitrariness is sound symbolism, that is, the linkage of certain vowels, consonants, and suprasegmentals with visual, tactile, perceptual (such as size or shape) or other sensory properties (Hinton et al., 1994). Over the decades, a substantial amount of research has emerged on a wide variety of languages, especially English (for an overview, see Benczes, 2019), but also including American Indian languages (Aoki, 1994; Jacobsen, 1994; Nichols, 1971; Silverstein, 1994), African languages (Childs, 1994), Australian languages (Alpher, 1994; Haynie et al., 2014), Chinese (Karlgren, 1962; Lapolla, 1994), Japanese (Hamano, 1994), Korean (Kim, 1977), Dutch (Klamer, 2002), Danish (Jespersen, 1918), French (Peterfalvi, 1970), German (Hilmer, 1914), Modern Greek (Joseph, 1994), Finnish (Austerlitz, 1994) and American Sign Language (Thompson et al., 2020), to name but a few (and see Hinton et al., 1994; Taylor & Taylor, 1965, for further examples). This prompted Lahti et al. (2014, p. 335) to propose that sound symbolism can “appear in many, if not all, of the world’s languages, spoken in places ranging from India to Latin America, from Japan to Africa.” Thanks to this ever-growing body of research there is ample evidence to claim that the sound symbolic qualities for at least some features of language are most probably universal, such as the linkage of certain sound qualities to (1) size (going all the way back to Sapir, 1929, and see also Preziosi & Coane, 2017; Shinohara & Kawahara, 2010; Ultan, 1978; and most recently Winter & Perlman, 2021); (2) shape (see the now classic study of Köhler, 1929; and see also Aveyard, 2012; Knoeferle et al., 2017; Ramachandran & Hubbard, 2001); and (3) proximity/distance (e.g., Johansson & Zlatev, 2013; Tanz, 1971; Woodworth, 1991).

Despite this massive body of evidence, a number of challenges have been identified by Westbury et al. (2018) in their extensive review of the available sound symbolic literature. One such challenge is the problem of restricted data sets, that is, the tendency of researchers to base their observations on specific subsets of language, without relying on larger data sets. In the past years there has been an evident upsurge in investigations building on large-scale and representative vocabularies, demonstrating sound and meaning correspondence effects across thousands of words; see, for example, Monaghan et al. (2014) or Westbury et al. (2018). Yet there is still a relative paucity of comprehensive, large-scale research on languages other than English (with the notable exceptions of Adelman et al., 2018 and Ullrich et al., 2016).

This problem is also acute in the case of Hungarian, although the proportion of motivated (partially iconic) words in the vocabulary is substantial, as in other Uralic languages (Pomozi, 2015). The lack of large-scale, data-driven analyses can be traced back to the marginalized status of sound symbolism within Hungarian linguistics (see Molnár, 1993 for an overview). Generally speaking, the available Hungarian literature tends to consider sound symbolism as a stylistic, and not as a linguistic, issue (Székely, 2015, p. 17), which means that the focus has mainly been on its use and function in a very restricted subset, namely, poetic language (Fónagy, 1959, 1960, 1961; Molnár, 1993; Szathmári, 1970, 1980). Thus, limited observations and sporadic conclusions have been drawn on the sound symbolic nature of certain sounds. Tsur (2006), for example, has noted that front vowels in Hungarian can be linked with closeness, while back vowels symbolize distance. Fónagy (1959, 1961) has pointed out that /k/, /t/ and /r/ are more typical of poems with an “aggressive” tone, while /l/, /m/ and /n/ are more frequent in poems with a “tender” tone (see also Boda & Porkoláb, 2013). Molnár (1993) attempted to link individual phonemes with 16 semantic dimensions that included physical, perceptual, and emotional features, such as light–heavy, soft–hard, warm–cold, peaceful–aggressive, happy–sad, beautiful–ugly; these rankings were then tested on poetic texts. Perhaps not surprisingly, Molnár (1993, p. 150) concluded that there were “substantial differences” among language users when it came to the perception and comprehension of sound symbolism. More recent studies on Hungarian sound symbolism (cf. Kádár & Szilágyi, 2015) similarly draw on relatively restricted data sets, such as onomatopoeic verbs (Benő & Szilágyi, 2015), a set of verb clusters (Dimény, 2018; Szili, 2015), nonce words (Elsen et al., 2021), or texture vocabulary (Winter et al., 2022).

1.3. Aims and scope of present study

Given the scarcity of large-scale investigations in languages other than English, the present article expands existing research by applying an inferential statistical analysis to a database of Hungarian word forms, unrestricted by syntactic category or semantic domain. We focus on one particular semantic dimension, namely positive/negative polarity, with the help of sentiment analysis. Despite the widespread application of sentiment analysis in general (see Cambria et al., 2017; Lei & Liu, 2021), and its overall suitability for sound symbolic research, as it is able to link a binary semantic feature to a phonological property, academic articles are relatively few within a sound symbolic context. One prominent example is Igarashi et al. (2013), who demonstrated that the semantic polarity of unknown onomatopoeic words in Japanese can be relatively accurately estimated by relying on N-gram features of phonetic representation and the consonant category features of phonetic symbols. Following Russell’s (1980) model of emotions, the term “valence” is also used in cognitive science and/or the psycholinguistic literature for positive/negative emotional properties (e.g., Auracher et al., 2010; Myers-Schulz et al., 2013). Emotional qualities are also discussed under “affect” (e.g., Schmidtke et al., 2014) and “affective iconicity” (e.g., Aryani et al., 2018). Nevertheless, we will use the term “(sentiment) polarity” in the present article, which is the conventional label in sentiment analysis research (Igarashi et al., 2013). [Footnote 2]

The investigation of the relationship between the sound structure and the emotional character of words (also referred to as “emotional sound symbolism”; see Adelman et al., 2018) is by no means new in the academic literature. Its roots can be traced back to the study of human emotional vocalization, that is, the study of how vocal cues are able to carry emotion information (Gendron et al., 2014; Sauter et al., 2010). As Phillips and Majid (2011, p. 1) point out, emotional vocalizations were most probably the principal form of communication in the prelinguistic era and emotions can very well be “a hotbed of sound symbolism.”

A variety of emotional qualities have been within the focus of attention in recent investigations, demonstrating that sound–meaning correspondences can be established between individual sounds and emotional qualities. For example, /i:/ is more positive and /o:/ is more negative (Rummer & Schweppe, 2019; Yu et al., 2021) and /l/ is associated with pleasantness and /r/ with unpleasantness (Whissell, 1999). Correspondences have also emerged between the relative position of phonemes within a word and emotional quality (Adelman et al., 2018; Louwerse & Qu, 2017), as well as phoneme classes and emotional qualities. With regard to this latter aspect, happiness has been found to be associated with plosive sounds and sadness with nasal sounds (Auracher et al., 2010). Short vowels, voiceless consonants and hissing sibilants have been found, however, to make words sound more “negative” (Aryani et al., 2018). The present study contributes to this line of research by focusing on the relationship between natural classes of Hungarian phonemes and positive/negative polarity. The article thus aims to investigate the degree to which the relative frequencies of phonological classes in a non-Indo-European language, Hungarian, are stochastically related to the positive/negative sentiment distinction, and the degree to which the sound symbolic property of phoneme classes might correspond to such patterns already established in international literature. Our research questions are the following:

  1. RQ1: Is sentiment polarity significantly related to the relative frequency of any of the well-established natural classes of phonemes?

  2. RQ2: If significant relationships can be detected, which particular natural classes are associated with negative/positive sentiment polarity?

  3. RQ3: To what extent can sentiment polarity be predicted from the word form by a model that uses the relative frequencies of several natural classes as predictors?

In line with the results of emotional sound symbolism research cited above, we expect that many of the examined phonological variables are related to the positive/negative sentiment distinction, and these differences are highly unlikely to have arisen by chance—thus highlighting the potentiality of predictable correspondences between the sound shape of a word and its sentiment.

2. Material and methods

2.1. The database

All analyses presented in this article are based on the content of the Hungarian Sentiment Lexicon (HSL), developed by the company Precognox and made freely available for non-commercial use (Szabó, 2015). The HSL was created on the basis of WordNet Affect (Strapparava & Valitutti, 2004) and contains 1,748 positive and 5,940 negative words. The raw data for this study was therefore compiled independently of us, and, most importantly, the assignment of semantic polarity categories to individual words was originally accomplished by a series of well-documented procedures purely based on the semantic content of the words and the semantic relationships between items in the WordNet database. The subsequent conversion of the WordNet Affect word lists into Hungarian by Precognox was a multistage process. Initially, an automatic, machine-based translation of the English word lists was derived. This material was then manually checked and corrected, and finally the lists were expanded utilizing the content of a set of Hungarian thesauri. For our purposes, it is particularly important to note that the original sentiment polarities were preserved at each stage of this process and thus could not be affected by the phonemic content of the corresponding Hungarian word forms.

A major advantage of basing our analysis on the data in an already available sentiment lexicon is that the potential threat of selection bias can be effectively eliminated in this manner. A closer inspection of the content of HSL, however, revealed another potential source of bias: in addition to word stems, HSL contains a large number of compound words, derived words, and inflected forms, and some morphemes recur as components of such complex items multiple times. Because such (free and bound) morphemes may already be associated with a positive or negative sentiment polarity, their repeated occurrence in the data set may artificially inflate the stochastic relationships between their phonemic make-up and sentiment. To control for this unwanted effect, we decided to limit our analysis to the word stems in HSL. After reducing HSL to its intersection with a database of Hungarian word stems (Elekfi, 1994), we removed a single item, tbc (“tuberculosis”), which, due to its exceptional spelling, could not be handled by our algorithm that converted the graphemic representation of all words into a phonemic code. Our final data set contains a total of 3,023 word forms, the majority of which (73.6%, i.e., 2,224 items) carry negative sentiment, while the remaining 26.4% (799 items) belong to the positive category.

2.2. Converting letters to phonemes

In order to capture some important attributes of the phonemic make-up of the items, we first had to transform the original orthographic forms of the words into a phonological representation in which each phoneme corresponds to one and only one character. In Hungarian orthography the mappings between letters and phonemes are fairly regular, although some consonant sounds are represented by digraphs: for example, sz is used to denote a single phoneme /s/. Long consonants (geminates) are denoted by doubling the consonant letter; in the case of digraphs (within word stems) only the first character is doubled, resulting in trigraphs such as ssz denoting /sː/ or /ss/ (depending on the analysis). Non-native letters whose occurrence is limited to foreign loan words (q, w and x) also have standard phonemic equivalents. Therefore, the underlying phonemic representation of a word can be restored quite reliably from its written form by applying a series of replacements. In our case, this was accomplished by an Excel function (programmed in VBA) which carried out the following substitutions in all words in all positions (in each case the equivalent IPA symbols are shown between slashes after our own notation utilizing capital letters): ccs → CC /ʧː/, ggy → GG /ɟː/, lly → jj /jː/, nny → NN /ɲː/, ssz → SS /sː/, tty → TT /cː/, zzs → ZZ /ʒː/, dzs → JJ /ʤː/ intervocalically or word-finally or J /ʤ/ otherwise, cs → C /ʧ/, gy → G /ɟ/, ly → j /j/, ny → N /ɲ/, sz → S /s/, ty → T /c/, zs → Z /ʒ/, x → kS /ks/, qu → kv /kv/, w → v /v/. In line with the arguments advanced by Siptár (1994, 2016), we did not consider dz as a digraph corresponding to an underlying phoneme and analysed it as a /d/+/z/ sequence.

It is important to point out that the above procedure is not perfect, as it is unable to resolve ambiguities that may arise in compound words at morpheme boundaries: for instance, in a compound word like pácsó (pác + só; “pickling salt”) the above algorithm will misanalyse the cs character sequence as a digraph “cs” representing the /ʧ/ phoneme, whereas in fact it corresponds to two monographs, “c” and “s”, representing the /ʦʃ/ phoneme sequence with an underlying morpheme boundary in between. However, because our data was limited to word stems, such issues could not arise.
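The substitution series described above can be illustrated with a minimal Python sketch. It is not the authors' VBA function: the name `to_phonemic`, the rule tables, and the rule order (geminate trigraphs first, then dzs, then digraphs and non-native letters) are assumptions made here for illustration, and vowels are simply kept as their orthographic letters.

```python
import re

VOWELS = "aáeéiíoóöőuúüű"

# Assumed ordering: geminate trigraphs are rewritten before the digraphs they contain.
TRIGRAPHS = [("ccs", "CC"), ("ggy", "GG"), ("lly", "jj"), ("nny", "NN"),
             ("ssz", "SS"), ("tty", "TT"), ("zzs", "ZZ")]
DIGRAPHS = [("cs", "C"), ("gy", "G"), ("ly", "j"), ("ny", "N"),
            ("sz", "S"), ("ty", "T"), ("zs", "Z")]
NON_NATIVE = [("x", "kS"), ("qu", "kv"), ("w", "v")]


def to_phonemic(word: str) -> str:
    """Rewrite an orthographic stem in the capital-letter phonemic notation."""
    s = word.lower()
    for src, dst in TRIGRAPHS:
        s = s.replace(src, dst)
    # dzs: long (JJ) between vowels or word-finally, short (J) otherwise.
    s = re.sub(rf"(?<=[{VOWELS}])dzs(?=[{VOWELS}])", "JJ", s)
    s = re.sub(r"dzs$", "JJ", s)
    s = s.replace("dzs", "J")
    for src, dst in DIGRAPHS + NON_NATIVE:
        s = s.replace(src, dst)
    return s


for w in ["abszolút", "zsepi", "hosszú", "dzsem", "pácsó"]:
    print(w, "->", to_phonemic(w))
```

The last example reproduces the compound-boundary limitation noted in the text: a purely string-based rule set has no way of knowing that the c and s of pácsó belong to different morphemes.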

Another feature of Hungarian orthography which we needed to address is the occasional occurrence of consonant sequences that appear to violate the principle of voice assimilation in Hungarian. Voice assimilation is a phonological process that applies within words as well as across all morpheme and word boundaries. The principle states that two consecutive obstruents cannot differ in terms of voicing: if such a situation arises through suffixation, compound formation or in connected speech across a word boundary, the first consonant changes its voicing so that it matches the second consonant. [Footnote 3] The orthographic form of some Hungarian word stems (typically loan words) contains voiced + voiceless or voiceless + voiced obstruent sequences, although Hungarians pronounce such words in accordance with the voice assimilation principle: for instance, the surface realization of abszolút (“absolute”) contains a /p/ rather than a /b/ sound: /ɒpsoluːt/. An interesting theoretical question is whether such stems underlyingly contain obstruent clusters that match the written form (e.g., /bs/) or clusters that match the surface representations (e.g., /ps/). While both of these solutions would result in the same phonetic output (after assimilation takes place), certain tendencies in the formation of diminutives and slang words suggest that the underlying representation is equivalent to the form actually pronounced. For instance, the slang term medzsó (“toy model car”), derived from the brand name Matchbox, has preserved the voiced phoneme suggested by the pronunciation, /ʤ/, rather than its voiceless counterpart: /mɛʤbɒks/ → /mɛʤoː/. Examples like zsepi, the diminutive form of zsebkendő (“handkerchief,” literally: “pocket cloth”; /ʒɛb/ + /kɛndøː/ → /ʒɛpkɛndøː/ → /ʒɛpi/), indicate that the same principle applies even to compound nouns. In line with these considerations, we analysed all obstruent clusters that were incongruent in terms of voicing as orthographic peculiarities, and therefore our algorithm applies Hungarian voice assimilation rules iteratively (as described in Siptár, 1998, pp. 331–335).
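As a rough illustration of what iterative application might look like, the following Python sketch performs regressive voicing assimilation on strings already written in the capital-letter notation. It is a simplification and not the authors' algorithm: the function name `assimilate` and the pair tables are hypothetical, and the special treatment of /v/ (undergoes but does not trigger assimilation) and of /h/ and c /ʦ/ (trigger devoicing but are left unchanged) is an assumption based on standard descriptions of Hungarian, not on details given in the text.

```python
# Voiced -> voiceless obstruent pairs in the capital-letter notation
# (S = /s/, s = /ʃ/, Z = /ʒ/, C = /ʧ/, J = /ʤ/, T = /c/, G = /ɟ/).
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f",
           "z": "S", "Z": "s", "J": "C", "G": "T"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}

# Assumption: /v/ does not trigger voicing of a preceding obstruent.
VOICED_TRIGGERS = set(DEVOICE) - {"v"}
# Assumption: c /ʦ/ and h trigger devoicing but are not themselves rewritten.
VOICELESS_TRIGGERS = set(VOICE) | {"c", "h"}


def assimilate(code: str) -> str:
    """Scan right to left so that a change can propagate through longer clusters."""
    chars = list(code)
    for i in range(len(chars) - 2, -1, -1):
        cur, nxt = chars[i], chars[i + 1]
        if nxt in VOICELESS_TRIGGERS and cur in DEVOICE:
            chars[i] = DEVOICE[cur]
        elif nxt in VOICED_TRIGGERS and cur in VOICE:
            chars[i] = VOICE[cur]
    return "".join(chars)


for code in ["abSolút", "Zebkendő", "meCbokS", "hatvan"]:
    print(code, "->", assimilate(code))
```

Scanning from the end of the word means each consonant is compared with the already-updated segment to its right, which is one simple way to obtain the iterative behaviour in a single pass.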

After all necessary grapheme-to-phoneme replacements and the application of voicing assimilation rules, the resulting form was regarded as an underlying phonemic code and was used as the basis of all further analysis.

2.3. Identifying phonemic attributes potentially related to sentiment polarity

The idea of sound symbolism in language, including non-controversial phenomena such as onomatopoeia or literary examples, is based on the intuitive observation that the acoustic properties and/or the physical movements involved in the articulation of certain phonemes (or more precisely: phones) invoke distinct associations related to other forms of perceptual experience. In pedagogical grammars (see, e.g., Nádasdy, 2006), palatals are often described as “soft,” velar plosives as “hard,” and even in mainstream phonology we find labels such as “liquids” for /r/ and /l/. In Hungarian descriptive grammar, front and back vowels are traditionally called “high” and “low,” respectively. It is beyond the scope of our study to test whether, and to what extent, such impressions reflect common agreement. These observations, however, indicate that in an exploratory study such as ours, it is potentially fruitful to examine whether the relative frequencies of particular classes of phonemes (rather than the occurrence of individual phonemes) are stochastically related to certain semantic attributes, such as sentiment.

Our analysis is directly based on the classification of Hungarian phonemes suggested by Siptár (1998). In this framework, each class represents a group of phonemes which are similar to one another in terms of their place and/or manner of articulation, acoustic properties and their involvement in phonological processes. The classes are derived from a basic set of elements, each of which may be present or absent in a particular phoneme’s representation. The elements are denoted by capital letters (vowels: I, U, A; consonants: P, Y, K, Z, N, L, R, H, S). In addition, two root node labels are used to distinguish obstruents from sonorants.

We measured the phonological make-up of the words through a set of 19 variables, each representing the proportion of phonemes belonging to a given class relative to the total number of phonemes, vowels or consonants in the word. We derived 13 of our phonological variables from Siptár’s (1998) system of elements, and in the case of consonants we added five more classes (viz. voiced consonants, plosives, fricatives, affricates and dentals), which do not directly correspond to elements in this system but are nevertheless traditionally recognized categories and are also present in the tabular classification of consonants presented by Siptár (1998, p. 328). Finally, we added a variable to represent the proportion of vowels relative to the total number of phonemes in the word. Because vowels and consonants are complementary categories, it is sufficient to include only one of these classes in the analysis. For the same reason, we included the proportion of sonorants (relative to the total number of consonants) as a variable, but excluded the complementary category of obstruents. The only semantic variable used in the study was sentiment polarity (positive vs. negative), taken directly from the HSL (Szabó, 2015). Sentiment polarity was also the only qualitative variable in the study.

The full list of our 20 variables is shown in Table 1. For labelling the variables, we use the standard notation for (conditional) probabilities: for instance, P(A|V) means the probability that a randomly selected phoneme from a given word is non-high, given that the phoneme is a vowel.
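To make the notation concrete, here is a small Python sketch that computes two of these proportions, P(V) and P(I|V), for a word in the phonemic code. It is an illustration only: the function name `vowel_proportions` is hypothetical, each character of the code is counted as one phoneme (so a geminate counts twice, which is an assumption), and the front-vowel set used for the element I is the traditional one (e, é, i, í, ö, ő, ü, ű) rather than a listing taken from Table 1.

```python
VOWELS = set("aáeéiíoóöőuúüű")
FRONT_VOWELS = set("eéiíöőüű")  # assumed membership of the class carrying element I


def vowel_proportions(code: str) -> dict:
    """Return P(V) and P(I|V) for one word; None where the denominator is zero."""
    vowels = [ch for ch in code if ch in VOWELS]
    p_v = len(vowels) / len(code) if code else None
    p_front = (sum(ch in FRONT_VOWELS for ch in vowels) / len(vowels)
               if vowels else None)
    return {"P(V)": p_v, "P(I|V)": p_front}


print(vowel_proportions("Zepi"))     # zsepi: all vowels are front
print(vowel_proportions("apSolút"))  # abszolút: all vowels are back
```

Both example words are harmonic, so P(I|V) takes one of its two extreme values, which is the behaviour that vowel harmony produces for most stems.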

Table 1. List of variables

2.4. Statistical analyses

Visual inspection of histograms showed that all variables deviated considerably from normality. Considering the nature of these variables, this result is not at all unexpected. For instance, because of the principle of vowel harmony, the value of P(I|V) is equal to 0 or 1 in the majority of Hungarian word stems (because the vowels within a word form tend to be uniformly back or front). This phenomenon yields a bimodal distribution for P(I|V). Similarly, because rounded vowels are considerably less frequent than unrounded ones, the values of P(U|V) are concentrated near 0, giving a right-skewed distribution, and because non-high vowels are more common than high vowels, the values of P(A|V) are concentrated near 1, giving a left-skewed distribution.

Due to these irregular distributional properties of the phonological variables, we chose to use a series of Mann–Whitney U-tests to address RQ1 and RQ2. The Mann–Whitney procedure is a distribution-free (non-parametric) test. While it technically does not serve the purpose of comparing means, it is often viewed as an alternative to the independent samples t-test, and because it does not rely on the assumption of normality, deviations from the normal distribution do not pose a threat to its validity (Sheskin, 2004, pp. 423–452). Because a series of 19 independent tests was conducted at this stage, we applied the Bonferroni correction and used the significance level of $\alpha = 0.05/19 \approx 0.00263$ to keep the familywise error rate under control. In this manner, the probability that at least one Type I error is made in the whole series of tests is kept below 5%.
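The 5% bound can be spelled out as a short worked step using the union bound. The symbols $E_i$ (the event that test $i$ produces a Type I error) and $m = 19$ (the number of tests) are notation introduced here for illustration:

$$
\Pr\Bigl(\bigcup_{i=1}^{m} E_i\Bigr) \;\le\; \sum_{i=1}^{m} \Pr(E_i) \;\le\; m \cdot \frac{0.05}{m} \;=\; 0.05, \qquad \frac{0.05}{19} \approx 0.00263 .
$$

The inequality holds whether or not the individual tests are independent, which is why the correction is conservative when the tested variables are correlated.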

To determine the extent to which sentiment polarity is predictable from our phonological variables (RQ3), we conducted a stepwise binary logistic regression analysis. [Footnote 4] Specifically, we used a forward procedure based on likelihood ratios. The significance levels for entry and removal were set at the conventional levels of 0.05 and 0.10, respectively. The range of phonological variables included in the stepwise procedure as potentially significant predictors was restricted to the subset of variables that have been shown to be significantly related to sentiment polarity by the Mann–Whitney U-tests. As stepwise regression is known to routinely overfit (e.g., Mundry & Nunn, 2008), the full data set was randomly partitioned into a training subset (70% of the words), which was used to construct the model, and a testing subset (the remaining 30% of the words), which was used for evaluating the predictive power of the model. This way, we were able to validate the model on a set of words which were not used for constructing the model. Finally, the model was evaluated by means of its receiver operating characteristic (ROC) curve, which provides both a graphical plot and a numerical measure describing what a binary classifier model can achieve (Fawcett, 2006). We conducted all statistical analyses with IBM SPSS Statistics, Version 27.0 (IBM Corp., 2020). The raw data set, the SPSS syntax and the original output are available online (the link is provided in the Data Availability Statement at the end of the article).
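The partitioning and ROC evaluation steps can be sketched in plain Python as follows. This is not the SPSS procedure used in the study: the function names, the fixed random seed, and the toy scores are all hypothetical, and the model-fitting step itself is omitted.

```python
import random


def split_70_30(items, seed=0):
    """Randomly partition items into a 70% training and a 30% testing subset."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points.

    labels: 1 for positive, 0 for negative; both classes must be present.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Items sharing a score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


train, test = split_70_30(range(3023))
print(len(train), len(test))

toy_scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical predicted probabilities
toy_labels = [1, 0, 1, 0]          # hypothetical true polarities
print(auc(roc_points(toy_scores, toy_labels)))
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two polarity classes.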

3. Results

The Mann–Whitney U-tests revealed that nine out of the 19 phonological variables were significantly related to sentiment polarity at the Bonferroni-corrected significance level of $p < 0.00263$. Thus, the analysis provides strong evidence that nearly half of the phonological variables are related to the positive/negative sentiment distinction. Table 2 lists all significant associations with their common language effect sizes (f; see McGraw & Wong, 1992, and footnote 5), observed significance levels (p-values) and some examples to illustrate each effect. The inequality symbols (< and >) indicate whether a given class of phonemes occurs more commonly in negative or positive words.

Table 2. Significant associations between phonological variables and sentiment polarity as revealed by a series of Mann–Whitney U-tests

The pattern of results therefore indicates that positive polarity word forms tend to contain more vowels, front vowels, continuants, fricatives, palatals and sibilants. On the other hand, negative sentiment polarity words tend to have more rounded vowels, plosives and dorsal consonants. The remaining 10 phonological variables were not found to be significantly related to sentiment. Even for the phoneme classes that have been found to be significantly related to sentiment polarity, the common language effect sizes range between 0.540 and 0.581, indicating that the analysis revealed a set of rather weak tendencies. For instance, for all possible positive–negative word pairs selected from the database, the relative proportion of front vowels is higher in the positive polarity word than in the negative polarity word in 58.1% of the cases.
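The pairwise interpretation of the common language effect size can be made concrete with a short Python sketch. The function name and the toy proportions are hypothetical. Counting ties as half a pair is an assumption made here for illustration; under that convention the measure equals the Mann–Whitney U statistic divided by the total number of pairs.

```python
def common_language_effect_size(group_a, group_b) -> float:
    """Share of (a, b) pairs in which a exceeds b; a tie counts as half a pair."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical relative frequencies of one phoneme class in a handful of words.
positive_words = [0.6, 0.3]
negative_words = [0.5, 0.2]
f = common_language_effect_size(positive_words, negative_words)
print(f)
```

A value of 0.5 would mean that the phoneme class is equally likely to be more frequent in either member of a positive–negative pair, so values between 0.540 and 0.581 lie only slightly above chance.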

The nine phonological variables listed in Table 2 were entered in a stepwise binary logistic regression procedure as potential independent variables that might turn out to be useful predictors of sentiment polarity, the dependent variable. The model was fitted to a randomly selected subset of the HSL database comprising 70% of the words. In its final iteration, the model included four predictors, which are (in the order of inclusion): P(V), P(I|V), P(Y|C), and P(K|C). This means that once these predictors are present in the model, the inclusion of any one of the remaining five candidate predictors (that is, P(U|V), P(H|C), P(plosive|C), P(fricative|C), or P(S|C)) cannot significantly improve the model any further due to their shared variance with other variables already in the model. Each of the four predictors retained in the model is, however, significant ($p \le 0.014$ in each case), even in the presence of all the other predictors, indicating that they make unique contributions to predicting sentiment polarity. Our regression model therefore takes complex information about the phonological form of a word and can use that information to estimate the probability that the word’s meaning is associated with positive (vs. negative) sentiment. The estimated parameters as well as the odds ratios associated with each predictor are summarized in Table 3. It must be noted that, due to the nature of the independent variables, the odds ratios correspond to the estimated multiplicative change in the odds that a word form falls in the positive sentiment category when the relative frequency of the given phoneme class increases from 0 to 100%.
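Because the predictors are proportions, an odds ratio defined over the full 0 to 100% range can be rescaled to a more realistic change. The Python sketch below uses a hypothetical coefficient (not one of the estimates in Table 3) and relies only on the standard logistic regression identity that a change of Δ in a predictor multiplies the odds by exp(b·Δ).

```python
import math


def odds_multiplier(coefficient: float, change_in_proportion: float) -> float:
    """Multiplicative change in the odds of positive sentiment for a change in one predictor."""
    return math.exp(coefficient * change_in_proportion)


# Hypothetical coefficient (assumption): corresponds to an odds ratio of 4 for a 0 -> 1 change.
b = math.log(4.0)
print(odds_multiplier(b, 1.0))   # full change from 0% to 100%
print(odds_multiplier(b, 0.10))  # change of 10 percentage points
```

Under this hypothetical coefficient, a 10-percentage-point increase multiplies the odds by 4 raised to the power 0.1, which is far smaller than the headline odds ratio might suggest.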

Table 3. Estimated coefficients and odds ratios of the logistic regression model predicting the probability of positive sentiment polarity from the relative frequencies of four phonemic classes in the word form

The ROC curve shown in Fig. 1 was generated using our testing subset, which comprised 30% of the items that were not used in constructing the model. The curve remains continuously above the diagonal, indicating that at any threshold setting the true positive rate will exceed the false positive rate.

Fig. 1. The Receiver Operating Characteristic (ROC) Curve of a Binary Logistic Regression Model Predicting Sentiment Polarity from the Relative Frequencies of Four Natural Phonological Classes in the Word Form

The area under the curve (AUC) turned out to be 0.632, which indicates that if the model is presented with a randomly selected positive–negative word pair, the value assigned to the positive word will be higher than that assigned to the negative word in 63.2% of the cases (Fawcett, 2006). Therefore, in the majority of cases, the model will correctly determine which of the two words has positive sentiment, purely on the basis of the phonological form of the words. The ROC analysis also revealed that this success rate is significantly different ($p < 0.00001$) from the 50% level one would expect under random guessing. Other measures of effect size, viz. Cox and Snell $R^2 = 0.041$ and Nagelkerke $R^2 = 0.060$, also indicate that even taken together, the phonological predictors in the model can account for only a small proportion of the variation of sentiment polarity. This result is fully in line with the effect sizes for the individual predictors (reported in Table 2), and also with our expectations: we contend that the position of arbitrariness has been advocated by linguists for decades because the form–meaning associations in language are weak tendencies that can only be reliably detected by modern data analysis methods used on large data sets (for an analogous conclusion drawn from an analysis of sound symbolism related to the semantic feature of size, see Winter & Perlman, 2021). Therefore, while it is evident that word meaning cannot be derived from the phonological form of a word in a deterministic fashion, our analysis provides strong evidence for a set of weak stochastic relationships between certain aspects of word form and one particular semantic attribute, viz. sentiment polarity.
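How an ROC curve and its AUC arise from model scores can be sketched in a few lines of Python. The scores and labels below are hypothetical, the function names are illustrative, and the sketch assumes that both classes are present in the data; it is not a reproduction of the SPSS output.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels are 1 for positive and 0 for negative items; an item is classified
    as positive when its score is at or above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing classified positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under (FPR, TPR) points ordered by non-decreasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical predicted probabilities of positive sentiment and true labels.
scores = [0.9, 0.7, 0.6, 0.4]
labels = [1, 0, 1, 0]
auc = area_under_curve(roc_points(scores, labels))
print(auc)
```

In this toy example, three of the four possible positive–negative pairs are ranked correctly, and the trapezoidal area agrees with that proportion, which mirrors the pairwise reading of the AUC given above.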

4. Discussion

One of the challenges of sound symbolic research is that there is a relative paucity of investigations building on larger data sets (Westbury et al., 2018). This particular limitation is especially acute in the case of languages other than English, including the language that is the focus of the present research, that is, Hungarian. In response to this challenge, we have conducted a statistics-based analysis of the sentiment polarity of thousands of Hungarian words. Our results indicate that positive polarity word forms tend to contain more vowels, front vowels, continuants, fricatives, palatals and sibilants. On the other hand, negative sentiment polarity words tend to have more rounded vowels, plosives and dorsal consonants. In the following we will discuss the implications from the point of view of (1) Hungarian sound symbolic research; and (2) general studies on emotional sound symbolism.

4.1. Implications for Hungarian sound symbolic research

The findings partially corroborate Fónagy’s (1961) observation on the emotional tone of Hungarian poetic text and sound symbolic quality. More specifically, Fónagy found that poems containing plosives are perceived as aggressive, which is supported by our finding as well: plosives occurred more frequently in negative sentiment words. Comparisons to further patterns detected in previous research on Hungarian sound symbolism were not possible because (1) these studies focus on individual sounds (such as Molnár, 1993; Tsur, 2006); (2) they concentrate on particular sound schemas (such as syllable structure as in Benő & Szilágyi, 2015); or (3) the meanings that are associated with the sound structure are too specific (pertaining to a group of verbs, such as those connected to speaking, as in Szili, 2015). Our findings also provide empirical, statistics-led support for the idea that sound symbolic relationships in Hungarian are not restricted to the realm of poetry, as has generally been assumed in the academic literature (Székely, 2015, p. 17), but might be very much a part of everyday language use. Last but not least, the data question the subjective nature (cf. Boda & Porkoláb, 2013; Molnár, 1993) of Hungarian sound–meaning correspondences, at least in the case of particular classes of Hungarian phonemes and the positive/negative sentiment distinction. Our results suggest that the sentiment of a particular text might be predicted on the basis of particular phonological features.

4.2. Implications for emotional sound symbolism research

If, however, the results of the present research implicate (at least to some degree) the predictability of sound symbolism in Hungarian, then the question necessarily arises whether the stochastic relationships that we have identified are language-specific or can be found in other languages as well. Due to the rather varied methodology and data that have been applied in previous studies, any comparison can offer only highly tentative conclusions, and only in the case of some phoneme classes. Nevertheless, our results do corroborate the tendency to associate positivity with /i:/ and negativity with /o:/ (Rummer & Schweppe, 2019): front and non-rounded vowels were associated with positive sentiment, whereas back and round vowels were associated with negative sentiment. The tendency for words containing plosives to carry negative sentiment partially complements the findings of Sidhu et al. (2019), who found that names with voiceless stops (as compared to sonorants) are considered less agreeable.

Some of our results run counter to patterns of emotional sound symbolism established elsewhere: plosives in our dataset occurred more frequently in negative sentiment words, while Auracher et al.’s (2010, p. 21) cross-cultural analysis detected “positive feelings associated with high activation” in the case of plosive sounds. Note that Auracher et al. (2010) were also aware of the contradiction between their results and those of Fónagy (1961) with respect to plosive sounds, but argued that Fónagy’s “aggressive” versus “tender” distinction can be correlated with active and passive states, respectively, and that from this vantage point Fónagy’s results agree with plosives’ “arousal axis” (Auracher et al., 2010, p. 21). Nasality (linked with sadness in Auracher et al., 2010) was not a significant predictor of sentiment polarity in our dataset. We did not detect any of the tendencies cited in Aryani et al. (2018) either (i.e., the linkage of hissing sibilants with negativity). In fact, sibilants in the Hungarian data occurred more frequently with positive sentiment words (note that unlike Aryani et al., 2018, we did not make a distinction among sibilants).

One of the most intriguing results of our study concerns the dorsal consonants, which were associated more frequently with negative sentiment. Generally, consonants produced at the back of the throat have been characterized in the literature as being unpleasant and aggressive (cited in Elsen, 2017; Fónagy, 1963; Thorndike, 1945a, 1945b), possibly because they require more effort to produce. In Whissell’s (1999, p. 43) view, aggression associated with such sounds is due to the fact that they “share some of the muscular responses characteristic of the negative and active emotions of disgust and anger.” The negative sentiment associated with dorsal consonants chimes well with the negative sentiment associated with back vowels, which can then be contrasted with the positive sentiment that we have found to be associated with sounds produced at the front of the mouth cavity (such as front vowels or palatals).

We do not wish to claim that the similarities discussed above demonstrate the existence of possibly universal sound–meaning correspondences, but tendencies can be detected in the case of at least some phoneme classes. It thus seems that the conclusion we can draw is somewhat trivial: certain sound–meaning relationships may be language-specific, while others fit into patterns established in other languages (such as the more frequent occurrence of front vowels (and consonants) with positive sentiment words, as opposed to the more frequent occurrence of back vowels and dorsal consonants with negative sentiment words). However, the fact that nearly half of the phonological variables explored in our study have been found to be significantly related to sentiment polarity indicates that sound–meaning relationships are not likely to be sporadic phenomena confined to a restricted set of phonological features. Therefore, our results raise the possibility that non-arbitrariness is realized in the form of complex multiple mappings, in which each semantic feature is associated with several phonological characteristics and vice versa. Such patterns are consistent with explanations based on the articulatory hypothesis (Garrido & Godinho, 2021; Körner & Rummer, 2021; Zajonc et al., 1989).

5. Conclusions

The present study provides novel insights into Hungarian sound symbolism by offering statistics-based results on the relationship between particular classes of Hungarian phonemes and the positive/negative sentiment distinction. The analysis provides strong evidence that many of the phonological variables we examined are related to the positive/negative sentiment distinction, and these differences are highly statistically significant. Such findings highlight the potential for predictable correspondences between the sound shape of a word and its sentiment.

The results conform, albeit only partially, to patterns established in previous research on emotional sound symbolism, yet the small effect sizes also indicate that the links between phoneme classes and sentiment are no more than a set of fairly weak tendencies. This means that our results are primarily important from a theoretical perspective. Similar analyses targeting a set of further semantic distinctions (such as animate/inanimate, natural/artificial, physical/abstract, hard/soft, etc.) will be necessary to yield more detailed and precise insights into Hungarian sound–meaning correspondences.

Acknowledgments

We would like to thank the Department of Language Technology at the Research Institute for Linguistics (Hungarian Academy of Sciences), in particular Tamás Váradi and Bálint Sass, who made the database containing the entries in Elekfi’s Dictionary of Hungarian Inflections (Elekfi, 1994) available to us.

Data availability statement

The raw data, the SPSS syntax used for our analyses, and the results (SPSS output) are available online at the following URL: https://osf.io/v7f9d/?view_only=1455645a004c4948a3000418042c3082.

Competing interests

The authors declare none.

Footnotes

1 In the article we use the term motivation in its Saussurean interpretation, according to which it refers to a non-arbitrary relationship between form and meaning.

2 Valence – as opposed to sentiment polarity – is a graded phenomenon. Measuring sentiment via a graded scale would be an important extension of the present study, but currently no Hungarian lexical database exists containing graded measures of valence for a large number of lexical items.

3 For a detailed description of voice assimilation, including the asymmetric behaviour of /v/ and /x/, see Siptár and Törkenczy (2000, pp. 75–94).

4 The literature on stepwise methods is to some extent controversial, and several authors have stressed the point that the potential caveats associated with stepwise search procedures may outweigh their benefits. Mundry and Nunn (2008), for instance, advise against the use of stepwise model selection altogether due to the inflated Type I error rates. In the present study, we follow the guidelines of Hair et al. (2014), who mitigate such concerns by concluding that “[t]hese potential issues do not suggest that sequential search methods should be avoided, just that the researcher must realize the issues (pro and con) involved in their use” (p. 186). In particular, we are aware that the final model produced is likely to be just one of a set of potential models with similar levels of predictive power because stepwise procedures are highly affected by multicollinearity between the predictors. Therefore, our conclusions focus on the overall utility of the final model (i.e., the extent to which sentiment can be predicted from the relative frequencies of phoneme classes) and we do not draw theoretical inferences from the composition of the model per se.

5 Admittedly, Cohen’s d is a more widely used measure of effect size for the difference between two independent samples than the common language effect sizes we provide here. However, Cohen’s d is a measure closely associated with the Independent Samples t-test and is based on the assumptions of normality and homogeneity of variance. Because the distributional properties of our data indicate that those assumptions are violated, neither the t-test nor Cohen’s d would provide valid results.

References

Adelman, J. S., Estes, Z., & Cossu, M. (2018). Emotional sound symbolism: Languages rapidly signal valence via phonemes. Cognition, 175, 122–130.
Alpher, B. (1994). Yir-Yiront ideophones. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 161–177). Cambridge University Press.
Aoki, H. (1994). Symbolism in Nez Perce. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 15–22). Cambridge University Press.
Aryani, A., Conrad, M., Schmidtke, D., & Jacobs, A. (2018). Why is “piss” ruder than “pee”? The role of sound in affective meaning making. PloS One, 13(6), e0198430.
Asano, M., Imai, M., Kita, S., Kitajo, K., Okada, H., & Thierry, G. (2015). Sound symbolism scaffolds language development in preverbal infants. Cortex, 63, 196–205.
Assaneo, M. F., Nichols, J. I., & Trevisan, M. A. (2011). The anatomy of onomatopoeia. PloS One, 6(12), e28317.
Auracher, J., Albers, S., Zhai, Y., Gareeva, G., & Stavniychuk, T. (2010). P is for happiness, N is for sadness: Universals in sound iconicity to detect emotions in poetry. Discourse Processes, 48(1), 1–25.
Austerlitz, R. (1994). Finnish and Gilyak sound symbolism – the interplay between system and history. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 249–260). Cambridge University Press.
Aveyard, M. E. (2012). Some consonants sound curvy: Effects of sound symbolism on object recognition. Memory & Cognition, 40(1), 83–92.
Benczes, R. (2019). Rhyme over reason: Phonological motivation in English. Cambridge University Press.
Benő, A. & Szilágyi, S. N. (2015). Hangzásséma és motiváltság a hangutánzó és hangulatfestő igéink körében [Sound schema and motivation among our onomatopoeic and imitative verbs]. In Kádár, E. & Szilágyi, S. N. (Eds.), Motiváltság és nyelvi ikonicitás [Motivation and linguistic iconicity] (pp. 43–57). Erdélyi Múzeum-Egyesület.
Bergen, B. K. (2004). The psychological reality of phonaesthemes. Language, 80(2), 290–311.
Boda, I. K. & Porkoláb, J. (2013). Hang-és színszimbolika a poétikai kommunikációban [Sound and colour symbolism in poetic communication]. Alkalmazott Nyelvészeti Közlemények, 8(2), 87–96.
Cambria, E., Das, D., Bandyopadhyay, S., & Feraco, A. (Eds.) (2017). A practical guide to sentiment analysis. Springer International Publishing.
Childs, G. T. (1994). African ideophones. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 178–204). Cambridge University Press.
Cuyckens, H., Berg, T., Dirven, R., & Panther, K.-U. (Eds.) (2003). Motivation in language: Studies in honor of Günter Radden. John Benjamins.
De Cuypere, L. (2008). Limiting the iconic: From the metatheoretical foundations to the creative possibilities of iconicity in language. John Benjamins.
Dimény, H. (2018). Sound symbolism and meaning patterns: The case of Hungarian verbs. Roczniki Humanistyczne, 66(11), 47–57.
Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen, M. H., & Monaghan, P. (2015). Arbitrariness, iconicity and systematicity in language. Trends in Cognitive Sciences, 19(10), 603–615.
Dingemanse, M., Perlman, M., & Perniss, P. (2020). Construals of iconicity: Experimental approaches to form–meaning resemblances in language. Language and Cognition, 12(1), 1–14.
Elekfi, L. (1994). Magyar ragozási szótár [Dictionary of Hungarian inflections]. MTA Nyelvtudományi Intézet.
Elsen, H. (2017). The two meanings of sound symbolism. Open Linguistics, 3(1), 491–499.
Elsen, H., Németh, R., & Kovács, L. (2021). The sound of size revisited: New insights from a German–Hungarian comparative study on sound symbolism. Language Sciences, 85, 101360.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Fónagy, I. (1959). A költői nyelv hangtanából [On the phonology of poetic language]. Akadémiai Kiadó.
Fónagy, I. (1960). A hang és szó hírértéke a költői nyelvben [The information value of sound and word in poetic language]. Nyelvtudományi Közlemények, 62, 73–100.
Fónagy, I. (1961). Communication in poetry. Word, 17, 194–218.
Fónagy, I. (1963). Die Metaphern in der Phonetik. Ein Beitrag zur Entwicklungsgeschichte des wissenschaftlichen Denkens. The Hague.
Garrido, M. V. & Godinho, S. (2021). When vowels make us smile: The influence of articulatory feedback in judgments of warmth and competence. Cognition & Emotion, 35(5), 837–843.
Gendron, M., Roberson, D., van der Vyver, J. M., & Barrett, L. F. (2014). Cultural relativity in perceiving emotion from vocalizations. Psychological Science, 25(4), 911–920.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014). Multivariate data analysis (7th ed.). Pearson Education Limited.
Hamano, S. (1994). Palatalization in Japanese sound symbolism. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 148–157). Cambridge University Press.
Haynie, H., Bowern, C., & LaPalombara, H. (2014). Sound symbolism in the languages of Australia. PloS One, 9(4), e92852.
Hilmer, H. (1914). Schallnachahmung, Wortschoepfung und Bedeutungswandel. Helle.
Hinton, L., Nichols, J., & Ohala, J. J. (1994). Introduction: Sound symbolic processes. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 1–12). Cambridge University Press.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 89–96.
IBM Corp. (2020). IBM SPSS Statistics for Windows, Version 27.0. IBM Corp.
Igarashi, T., Sasano, R., Takamura, H., & Okumura, M. (2013). Use of sound symbolism in sentiment classification. Journal of Natural Language Processing, 20(2), 183–200.
Imai, M. & Kita, S. (2014). The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philosophical Transactions of the Royal Society B, 369, 20130298.
Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early verb learning. Cognition, 109, 54–65.
Jacobsen, W. H. (1994). Nootkan vocative vocalism and its implications. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 23–39). Cambridge University Press.
Jakobson, R. & Waugh, L. R. (1979). The sound shape of language. Indiana University Press.
Jespersen, O. (1918). Nogle men-ord. In Studier tillegnade Esaias Tegner (pp. 49–55). Lund.
Johansson, N., Carr, J. W., & Kirby, S. (2021). Cultural evolution leads to vocal iconicity in an experimental iterated learning task. Journal of Language Evolution, 6(1), 1–25.
Johansson, N. & Zlatev, J. (2013). Motivations for sound symbolism in spatial deixis: A typological study of 101 languages. Public Journal of Semiotics, 5(1), 3–20.
Joseph, B. D. (1994). Modern Greek ts: Beyond sound symbolism. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 222–236). Cambridge University Press.
Kádár, E. & Szilágyi, S. N. (Eds.) (2015). Motiváltság és nyelvi ikonicitás [Motivation and linguistic iconicity]. Erdélyi Múzeum-Egyesület.
Karlgren, B. (1962). Sound symbolism in Chinese. Hong Kong.
Kim, K. (1977). Sound symbolism in Korean. Journal of Linguistics, 13, 67–75.
Klamer, M. (2002). Semantically motivated lexical patterns: A study of Dutch and Kambera expressives. Language, 78, 258–287.
Knoeferle, K., Li, J., Maggioni, E., & Spence, C. (2017). What drives sound symbolism? Different acoustic cues underlie sound-size and sound-shape mappings. Scientific Reports, 7(1), 1–11.
Köhler, W. (1929). Gestalt psychology (2nd ed.). Liveright.
Lahti, K., Barrett, R., & Webster, A. K. (2014). Ideophones: Between grammar and poetry. Pragmatics and Society, 5(3), 335–340.
Laing, C. E. (2014). A phonological analysis of onomatopoeia in early word production. First Language, 34(5), 387–405.
Lapolla, R. J. (1994). An experimental investigation into phonetic symbolism as it relates to Mandarin Chinese. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 130–147). Cambridge University Press.
Lei, L. & Liu, D. (2021). Conducting sentiment analysis. Cambridge University Press.
Louwerse, M. & Qu, Z. (2017). Estimating valence from the sound of a word: Computational, experimental, and cross-linguistic evidence. Psychonomic Bulletin & Review, 24(3), 849–855.
Lupyan, G. & Winter, B. (2018). Language is more abstract than you think, or, why aren’t languages more iconic? Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752), 20170137.
Maurer, D., Pathman, T., & Mondloch, C. J. (2006). The shape of boubas: Sound-shape correspondences in toddlers and adults. Developmental Science, 9, 316–322.
McGraw, K. O. & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.
Molnár, I. T. (1993). A magyar beszédhangok szubjektív elemi szimbolikája [The subjective fundamental symbolism of Hungarian speech sounds]. Akadémiai Kiadó.
Monaghan, P., Shillcock, R. C., Christiansen, M. H., & Kirby, S. (2014). How arbitrary is language? Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1651), 20130299.
Mundry, R. & Nunn, C. L. (2008). Stepwise model fitting and statistical inference: Turning noise into signal pollution. The American Naturalist, 173(1), 119–123.
Myers-Schulz, B., Pujara, M., Wolf, R. C., & Koenigs, M. (2013). Inherent emotional quality of human speech sounds. Cognition & Emotion, 27(6), 1105–1113.
Nádasdy, Á. (2006). Background to English pronunciation. Nemzeti Tankönyvkiadó.
Nänny, M. & Fischer, O. (Eds.) (1999). Form miming meaning: Iconicity in language and literature. John Benjamins.
Nichols, J. (1971). Diminutive consonant symbolism in Western North America. Language, 47, 826–848.
Perniss, P., Thompson, R. L., & Vigliocco, G. (2010). Iconicity as a general property of language: Evidence from spoken and signed languages. Frontiers in Psychology, 1, 227.
Perniss, P. & Vigliocco, G. (2014). The bridge of iconicity: From a world of experience to the experience of language. Philosophical Transactions of the Royal Society B, 369, 20130300.
Perry, L. K., Perlman, M., & Lupyan, G. (2015). Iconicity in English and Spanish and its relation to lexical category and age of acquisition. PloS One, 10(9), e0137147.
Perry, L. K., Perlman, M., Winter, B., Massaro, D. W., & Lupyan, G. (2018). Iconicity in the speech of children and adults. Developmental Science, 21(3), e12572.
Peterfalvi, J. M. (1970). Recherches expérimentales sur le symbolisme phonétique. American Journal of Psychology, 65, 439–473.
Phillips, W. & Majid, A. (2011). Emotional sound symbolism. In Field manual (Vol. 14, pp. 1618). Max Planck Institute for Psycholinguistics.Google Scholar
Pomozi, P. 2015. A magánhangzó-harmónia ikonikus szerepéről egyes finnugor nyelvekben [On the iconic role of vowel harmony in various Finno-Ugric languages]. In: Kádár, E. & Szilágyi, S. N. (Eds.), Motiváltság és nyelvi ikonicitás [Motivation and linguistic iconicity] (pp. 6887). Erdélyi Múzeum-Egyesület.Google Scholar
Preziosi, M. A. & Coane, J. H. (2017). Remembering that big things sound big: Sound symbolism and associative memory. Cognitive Research: Principles and Implications, 2(1), 121.Google ScholarPubMed
Ramachandran, V. S. & Hubbard, E. M. (2001). Synaesthesia: A window into perception, thought and language. Journal of Consciousness Studies, 8, 334.Google Scholar
Rummer, R. & Schweppe, J. (2019). Talking emotions: vowel selection in fictional names depends on the emotional valence of the to-be-named faces and objects. Cognition and Emotion, 33(3), 404416.CrossRefGoogle ScholarPubMed
Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 11611178.CrossRefGoogle Scholar
Sapir, E. (1929). A study in phonetic symbolism. Journal of Experimental Psychology, 12, 225239.CrossRefGoogle Scholar
Saussure, F. d. (1915/1959). Course in general linguistics. Bally, C, Sechehaye, A., & Riedlinger, A. (Eds.), Baskin, W. (Trans.). McGraw Hill.Google Scholar
Sauter, D. A., Eisner, F., Ekman, P., & Scott, S. K. (2010). Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proceedings of the National Academy of Sciences, 107(6), 24082412.CrossRefGoogle ScholarPubMed
Schmidtke, D., Conrad, M., & Jacobs, A. M. (2014). Phonological iconicity. Frontiers in Psychology, 5, 80.CrossRefGoogle ScholarPubMed
Sheskin, D. J. (2004). Handbook of parametric and nonparametric statistical procedures. CRC Press.Google Scholar
Shinohara, K. & Kawahara, S. (2010). A cross-linguistic comparison of sound symbolism: Images of size. Proceedings of the Annual Meeting of the Berkeley Linguistics Society, 36(1), 396–410.
Sidhu, D. M., Deschamps, K., Bourdage, J. S., & Pexman, P. M. (2019). Does the name say it all? Investigating phoneme-personality sound symbolism in first names. Journal of Experimental Psychology: General, 148(9), 1595.
Silverstein, M. (1994). Relative motivation in denotational and indexical sound symbolism of Wasco-Wishram Chinookan. In Hinton, L., Nichols, J., & Ohala, J. J. (Eds.), Sound symbolism (pp. 40–60). Cambridge University Press.
Simone, R. (Ed.) (1995). Iconicity in language. John Benjamins.
Siptár, P. (1994). A mássalhangzók [The consonants]. In Kiefer, F. (Ed.), Strukturális magyar nyelvtan, 2. kötet: Fonológia [Structural Hungarian grammar, Vol. 2: Phonology] (pp. 183–272). Akadémiai Kiadó.
Siptár, P. (1998). Hangtan [Phonology]. In Kiss, K. É., Kiefer, F., & Siptár, P. (Eds.), Új magyar nyelvtan [New Hungarian grammar]. Osiris Kiadó.
Siptár, P. (2016). A mássalhangzók [The consonants]. In Kiefer, F. (Ed.), Strukturális magyar nyelvtan, 2. kötet: Fonológia (javított digitális kiadás) [Structural Hungarian grammar, Vol. 2: Phonology, revised digital edition]. Akadémiai Kiadó. Retrieved from: https://mersz.hu/hivatkozas/m26smny2_book1
Siptár, P. & Törkenczy, M. (2000). The phonology of Hungarian. Oxford University Press.
Strapparava, C. & Valitutti, A. (2004). WordNet-Affect: An affective extension of WordNet. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC) (pp. 1083–1086). Universidade Nova de Lisboa.
Szabó, M. K. (2015). Egy magyar nyelvű szentimentlexikon létrehozásának tapasztalatai és dilemmái [Experiences and dilemmas of the creation of a Hungarian sentiment lexicon]. In Gecső, T. & Sárdi, C. (Eds.), Nyelv, kultúra, társadalom: Segédkönyvek a nyelvészet tanulmányozásához 177 [Language, culture and society: Handbooks for the study of linguistics (p. 177)]. Tinta Könyvkiadó.
Szathmári, I. (1970). A hangszimbolikáról [On sound symbolism]. Néprajz és Nyelvtudomány, 14, 75–91.
Szathmári, I. (1980). A hangszimbolika a magyar népballadákban [Sound symbolism in Hungarian folk ballads]. In Ortutay, Gy. (Ed.), Népi kultúra – népi társadalom [Folk culture – Folk society] (pp. 299–331). Akadémiai Kiadó.
Székely, Zs. (2015). A motiváció kérdése a nyelvészetben [Motivation in linguistics]. In Kádár, E. & Szilágyi, S. N. (Eds.), Motiváltság és nyelvi ikonicitás [Motivation and linguistic iconicity] (pp. 11–22). Erdélyi Múzeum-Egyesület.
Szili, K. (2015). Beszél vs. csacsog. Adalékok a motiváció egy sajátos fajtájához [Beszél vs. csacsog: Notes on a particular type of motivation]. In Kádár, E. & Szilágyi, S. N. (Eds.), Motiváltság és nyelvi ikonicitás [Motivation and linguistic iconicity] (pp. 58–67). Erdélyi Múzeum-Egyesület.
Tanz, C. (1971). Sound symbolism in words relating to proximity and distance. Language and Speech, 14(3), 266–276.
Taylor, I. K. & Taylor, M. M. (1965). Another look at phonetic symbolism. Psychological Bulletin, 64, 413–427.
Thompson, B., Perlman, M., Lupyan, G., Sevcikova Sehyr, Z., & Emmorey, K. (2020). A data-driven approach to the semantics of iconicity in American Sign Language and English. Language and Cognition, 12(1), 182–202.
Thorndike, E. L. (1945a). The association of certain sounds with pleasant and unpleasant meaning. Psychological Review, 52, 143–149.
Thorndike, E. L. (1945b). On Orr’s hypotheses concerning the front and back vowels. The British Journal of Psychology, 36(1), 10–14.
Tsur, R. (2006). Size–sound symbolism revisited. Journal of Pragmatics, 38(6), 905–924.
Ullrich, S., Kotz, S. A., Schmidtke, D. S., Aryani, A., & Conrad, M. (2016). Phonological iconicity electrifies: An ERP study on affective sound-to-meaning correspondences in German. Frontiers in Psychology, 7, 1200.
Ultan, R. (1978). Size-sound symbolism. In Greenberg, J. (Ed.), Universals of human language. Volume II: Phonology (pp. 525–568). Stanford University Press.
Vinson, D., Jones, M., Sidhu, D. M., Lau-Zhu, A., Santiago, J., & Vigliocco, G. (2021). Iconicity emerges and is maintained in spoken language. Journal of Experimental Psychology: General, 150(11), 2293–2308.
Westbury, C., Hollis, G., Sidhu, D. M., & Pexman, P. M. (2018). Weighing up the evidence for sound symbolism: Distributional properties predict cue strength. Journal of Memory and Language, 99, 122–150.
Whissell, C. (1999). Phonosymbolism and the emotional nature of sounds: Evidence of the preferential use of particular phonemes in texts of differing emotional tone. Perceptual and Motor Skills, 89(1), 19–48.
Winter, B. & Perlman, M. (2021). Size sound symbolism in the English lexicon. Glossa: A Journal of General Linguistics, 6(1), 79.
Winter, B., Sóskuthy, M., Perlman, M., & Dingemanse, M. (2022). Trilled /r/ is associated with roughness, linking sound and touch across spoken languages. Scientific Reports, 12(1), 1–11.
Woodworth, N. L. (1991). Sound symbolism in proximal and distal forms. Linguistics, 29(2), 273–300.
Yu, C. S. P., McBeath, M. K., & Glenberg, A. M. (2021). The gleam-glum effect: /i:/ versus /ʌ/ phonemes generically carry emotional valence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 47(7), 1173–1185.
Zajonc, R. B., Murphy, S. T., & Inglehart, M. (1989). Feeling and facial efference: Implications of the vascular theory of emotion. Psychological Review, 96(3), 395–416.
Table 1. List of variables

Table 2. Significant associations between phonological variables and sentiment polarity as revealed by a series of Mann–Whitney U-tests

Table 3. Estimated coefficients and odds ratios of the logistic regression model predicting the probability of positive sentiment polarity from the relative frequencies of 10 phonemic classes in the word form

Fig. 1. The Receiver Operating Characteristic (ROC) Curve of a Binary Logistic Regression Model Predicting Sentiment Polarity from the Relative Frequencies of Four Natural Phonological Classes in the Word Form