1 Introduction
Spontaneous speech contains various kinds of disfluencies, including the use of fillers. Like all disfluencies, fillers can be studied from different angles and with different goals (e.g. Levelt 1989, Shriberg 2001, Clark & Fox Tree 2002, Merlo & Mansur 2004, Lease & Johnson 2006). Generally, fillers are vocalizations or lexical items used to fill gaps in discourse (Fox Tree 2002, O'Connell & Kowal 2005). They may mark not only ‘I can't find the right word’ situations but ‘I am going to try to find it’ ones, too. They may solicit help from a co-participant, and they may also indicate that, even if a co-participant helps to find the word, the current speaker intends to continue speaking once the local problem is solved. They are indicative of covert speech planning problems: they signal that speech control is faltering, even though the speaker does not produce an overt error (Howell & Au-Yeung 2002). They are false alerts since the repair has taken place in the planning process (Levelt 1983). Fillers are also imbued with interactional meaning: they have implications for turn-taking practices, and they raise interesting questions about how current speakers signal (and current listeners recognize) aspects of interaction such as completion of various kinds, including syntactic, prosodic and pragmatic completion. Fillers and filled pauses may also provide a unique window on sentence processing in general because they show what ambiguities are relevant at a certain point in the utterance (Bailey & Ferreira 2007). The use of fillers may also suggest that the speaker does not know how to continue, or is unable to recall a particular word.
The organization of narratives is not pre-planned; it emerges, and each part of the speech signal relates to several functions simultaneously (cf. Clark 1994, Local 2003).
In the examples below, from Hungarian, the speakers seem to hesitate about how to continue or how to put their thoughts into grammatical forms. In example (1), the speaker openly states that he will restart, while in (2) the speaker rephrases the utterance (see footnote 1). (A glossary of the grammatical-category annotations used in this paper will be found in the appendix.)
(1)
(2)
In examples (3)–(4), the speakers seem to be searching for words. In (3), the word fizetés ‘salary’ is the target word, while in (4), the speaker seems to be looking for the word megfelelés ‘adequacy’.
(3)
(4)
Evidence on fillers comes from several large corpora of spontaneous speech. For example, a large spontaneous speech corpus compiled in Japan provides evidence that fillers are the most frequent disfluencies for both females and males (Watanabe, Hirose, Den & Minematsu 2008). When speakers discover some problem in planning spontaneous speech, they try to suspend speaking while repairing the hitch. At the same time, they are supposed to be planning what to say even during the repair procedure. As a result, they will produce filler words, prolong parts of words, make repetitions, hesitate, etc. In English they will use uh or um (cf. Local 2004, O'Connell & Kowal 2005), while in Hungarian they will commonly use the hesitation marker öö [ø] (Gósy 2005a). These types of disfluencies vary according to the speaker, the cognitive activities speakers are engaged in, and a number of other psycholinguistic and sociolinguistic variables.
Besides vocalizations or filled pauses like um or öö, many languages use a variety of lexical items, repetitions and phrases to gain time when there is a problem (grammatical, phonological or other) in speech planning (e.g. Curl, Local & Walker 2006). These lexical items are variously called fillers (filler words), hesitations, filled pauses or discourse markers (e.g. I think, well, you know, besides, by the way). Some of the fillers used in spontaneous dialogues and narratives are in fact classed as discourse markers by some researchers. Indeed, the distinction is not clear-cut, and the same item may have different functions in different contexts. The terms used and the classifications employed usually reflect the theoretical approach adopted by the author (cf. Fraser 1999, Hentschel & Weydt 2002, Dér 2006). Discourse markers are generally defined as words or phrases that mark a boundary in discourse. However, there are broader and narrower definitions than that (see e.g. Fraser 1999, Schourup 1999, Fox Tree & Schrock 2002, Louwerse & Mitchell 2003, Müller 2005). In general, they signal interrelations between earlier and later segments of the discourse, providing linkage between utterances.
We will regard an item as a filler when there is no grammatical or semantic reason for its presence in a given context. We may hypothesize that a filler is an overt indicator of some speech planning problem (see e.g. Jefferson 1974, Schegloff, Sacks & Jefferson 1977, Jasperson 2002). In addition, speakers may choose the wrong word, change their minds about what they want to say, or fail to find the right word.
Fillers come from various word classes, and are language-specific. The original function of the words that become fillers in a language gradually changes as they lose both their primary semantics and their original function in the utterance or as their original function and primary semantics become obscured. This obscuring of the original function is related to increasing frequency of use in another function – in our case, the filler function. Words of high frequency undergo more adjustments, and register the effects of sound change more rapidly than low-frequency words (Bybee 2001, 2003; Pierrehumbert 2001). In her recent investigation of the effect of frequency on the durations of homophones in spontaneous speech, Gahl (2008) found that form frequency or the frequency of particular combinations of segments would be insufficient for predicting which forms shorten. She suggests that lemma frequency (that is, frequency indexed by information about a word's meaning and syntactic properties) is a determinant of word duration. When people begin to use a word frequently in the filler function, the new function may affect the articulation of the word (including its durations) in the given function, while its articulation in its original function remains unaltered. Depending on the speech situation, the same item may have different functions and articulations even with the same speaker. Different holistic gestural and acoustic templates are associated with different word meanings and functions.
In Hungarian, there are several words coming from various word classes that have probably had a filler function for several centuries. The word úgymond (from the collocation úgy ‘so’ + mond ‘says’) can be found in records from the 15th and 16th centuries, where its initial steps towards grammaticalization can be traced. The history of úgymond exhibits a series of changes. The filler function is particularly strongly evidenced in spontaneous Hungarian speech in the middle of the 20th century (Dömötör 2008). The meaningless sound sequence izé [izeː], whose origin is unclear, began to be used frequently in the second half of the 20th century as a placeholder word for nouns and verbs, and then, taking advantage of the rich morphology of the Hungarian language, derivations were formed for the replacement of practically all content words (e.g. izét (n): izé+acc, izélnek (v): izé+verbal suffix+3pl., or izés (adj): izé+adjectival suffix; cf. Fabulya 2007). The pseudo-word izé can be found in written Hungarian as early as the 19th century, and it was also included in one of the first dictionaries. It is interesting to note that the meaning of izé, as described in that dictionary, is quite an accurate description of the filler function: ‘this word occurs when the speaker is not able or does not want to say a word’ (Czuczor & Fogarasi 1865: 161). Before long it came to be stigmatized, presumably as a reaction to its excessive use. It allegedly placed a burden on listeners, who were supposed to identify a speaker's intention without the benefit of hearing a key lexical item, because the speaker either deliberately avoided using the required word or simply could not access it fast enough.
About 60 years ago the word combination azt mondja [ɔst moɲɟɔ] ‘(he) says that’ (third-person subjects are dropped in Hungarian) was used frequently with a similar function. This change of function resulted in a change in pronunciation: the two words coalesced into one, [ɔsoɲɟɔ]. This finding is based on a large speech corpus representing language use in the 1950s (Gósy & Gyarmathy 2008). At that time, the hesitation marker [ø] represented only about 2% of all the disfluencies.
Today the use of azt mondja as a filler is relatively rare. Various other words took over its function, such as egyébként ‘anyway’, tulajdonképpen ‘as a matter of fact’, lényegében ‘essentially’, szóval ‘so’, hát ‘well’, akkor ‘then, that time’, etc. The hesitation marker [ø] is a relative newcomer in Hungarian. Current research shows that in spontaneous speech this hesitation marker accounts for about one-third of all disfluencies (Gósy 2005a, Horváth 2009), which is consistent with the ratio for similar markers found in other languages (e.g. Duez 1982, Misono & Kiritani 1990, Clark & Fox Tree 2002, Maekawa 2003, Zhao & Jurafsky 2005, Fehringer & Fry 2007, Leeuw 2007). Compared with data from the 1950s, an increase can be seen in favor of the [ø] sound. Indeed, the use of [ø] seems to have superseded all other fillers in present-day spontaneous Hungarian independently of the speakers’ ages or social status.
The story of the pseudo-word izé used as a filler seems to repeat itself: letters to newspaper editors often complain about the excessive use of the hesitation marker [ø] (Adamikné Jászó 2004, Horváth 2009). Yet fillers and filled pauses get people's attention and hence aid comprehension (Brennan & Schober 2001, Fox Tree 2001, Ferreira & Bailey 2004, O'Connell & Kowal 2005, Corley, MacGregor & Donaldson 2007; for Hungarian Markó 2004, Gósy 2005b).
This paper will report on two words in present-day Hungarian that are apparently undergoing a similar process of development, with an increasing number of speakers using them as fillers. One of them is the conjunction tehát, which has two meanings: ‘consequently’ and ‘that is’, while the other is the pronoun ilyen, which also has two meanings: ‘like this’ and ‘such’, and which can also have determiner function. The filler function of these words is not listed even in the most recent dictionaries of Hungarian. Since these words do not show the properties of discourse markers, i.e. they do not mark a boundary and contextual analysis did not identify any function of relating the interpretation of preceding and following segments (see e.g. Schegloff 1982, Schachter, Christenfeld, Ravina & Bilous 1991, Fraser 1999), we will regard them as fillers in their new function. This change is closely related to grammaticization (grammaticalization) in the sense that frequent repetition tends to deplete lexical meaning, often leading to phonological change (cf. Bybee 2003). Bybee also claims that as a result of frequent repetition, the elements involved in the grammaticization process become more and more independent of the morphemes they were originally composed of. She considers discourse markers to be products of grammaticization processes (2003: 618; for Hungarian Dér 2004, 2006; Dér & Markó 2010).
The word tehát does not occur as a filler in the spontaneous speech of the older generations as frequently as with younger speakers, and the word ilyen does not occur in the spontaneous speech of older speakers as a filler at all: it is used in this function only by young adults. The question arises whether a simple functional change is taking place in the use of these words or whether functional change is accompanied by other changes, too. It may be assumed that change in function is reflected in changes in the pronunciation and the acoustic structure of the words in question, with speakers unconsciously signaling the change in function. Therefore, the present study was designed to test the hypothesis that the articulation of the words tehát and ilyen in their filler function is different from their articulation in their respective traditional functions as conjunction and pronoun.
2 Subjects, materials, methods
2.1 The Hungarian Spontaneous Speech Corpus
Twenty spontaneous narratives from BEA, the Hungarian Spontaneous Speech Corpus (see footnote 2), were selected for the research. The speakers were native adult speakers of Hungarian (9 females and 11 males) from Budapest (ages ranging from 22 years to 30 years). The recorded narratives, with a total duration of 533 minutes (8.9 hours), were submitted to analysis. The total duration of the speech material was 267 minutes for females and 266 minutes for males. Average speaking time per subject was 29.6 minutes for females and 24.1 minutes for males. The topics of the narratives were related to the subjects’ work, family and hobbies, and there was a selection of topics of current interest relevant to the subjects’ age and everyday lives (e.g. the issue of capital punishment, changes in higher education, attitudes to organ donation, protection of animals by law). The narratives were rarely interrupted by the interlocutor; this happened only when the interlocutor thought it was necessary to help the speaker continue, e.g. when the speaker did not continue, apparently waiting for a response from the interlocutor. The interlocutor was the same person in all cases.
The material selected contained 535 occurrences of the word tehát and 195 occurrences of the word ilyen.
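The per-subject averages above follow directly from the reported totals. A minimal sketch of the arithmetic, using only the figures given in this section (the variable names are illustrative); the unrounded means are roughly 29.67 and 24.18 minutes, so the values in the text appear to be truncated rather than rounded:

```python
# Totals reported in Section 2.1 (minutes of speech, number of speakers).
speaking_minutes = {"female": 267, "male": 266}
n_speakers = {"female": 9, "male": 11}

total_minutes = sum(speaking_minutes.values())
total_hours = total_minutes / 60

# Mean speaking time per subject in each group.
mean_minutes = {
    group: speaking_minutes[group] / n_speakers[group]
    for group in speaking_minutes
}

print(total_minutes, round(total_hours, 1))
for group, minutes in mean_minutes.items():
    print(group, f"{minutes:.2f}")
```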
2.2 Method
The digital recordings were submitted to acoustic-phonetic analysis using Praat (Boersma & Weenink 2004). The duration of the words and the frequency values of the first two formants of the vowels /ɛ aː i/ were analyzed. The duration of the word tehát was measured as the interval between the beginning of the release of the initial voiceless stop and the beginning of the release of the final stop (Figure 1a). The duration of the closure was not considered in the case of the initial voiceless stop because it could not be measured after silent pauses and in utterance-initial positions. The total duration of ilyen was measured from the first glottal pulse of the vowel [i] to the last glottal pulse of the nasal sound (Figure 1b). In some cases (depending on the phonetic context), other cues, such as formant discontinuity or differences in intensity, were also used as segmentation cues. The corresponding spectrographic, intensity and waveform displays were consulted and auditory analysis was also used where necessary.
The formant values were measured at the midpoint of total vowel duration. The F1 and F2 midpoints were estimated using visual inspection of wideband spectrograms and narrowband fast Fourier transforms (FFT). The total word durations, vowel durations and two formants of all vowels were analyzed for each token, yielding altogether more than 4,800 measurements. To test statistical significance, analysis of variance (ANOVA), regression analysis, discriminant analysis and independent sample t-tests were used (SPSS, version 8.0) as well as three-dimensional Euclidean distance measures. The confidence level was set at the conventional 95%.
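The midpoint measurement and the distance measure can be illustrated with a short sketch. It is not the procedure used in the study, where formants were read off by visual inspection in Praat. The formant track, the token values and the choice of (word duration, F1, F2) as the three dimensions are all assumptions made for illustration; the text does not state which three dimensions entered the distance measure.

```python
import math

def midpoint_sample(track, start, end):
    """Return the (time, F1, F2) sample closest to the vowel's temporal midpoint."""
    midpoint = (start + end) / 2
    return min(track, key=lambda sample: abs(sample[0] - midpoint))

# Hypothetical formant track for one vowel: (time in s, F1 in Hz, F2 in Hz).
track = [
    (0.10, 650, 1400),
    (0.12, 700, 1350),
    (0.14, 720, 1330),
    (0.16, 690, 1360),
    (0.18, 640, 1420),
]
_, f1_mid, f2_mid = midpoint_sample(track, start=0.10, end=0.18)

# Assumed dimensions for the distance: (word duration in ms, F1 in Hz, F2 in Hz).
token_a = (180, f1_mid, f2_mid)
token_b = (150, 640, 1390)  # hypothetical second token
distance = math.dist(token_a, token_b)

print(f1_mid, f2_mid, round(distance, 1))
```

Because the three dimensions are on different scales, an unscaled distance is dominated by the formant values; a real analysis would probably normalize each dimension first.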
3 Results
3.1 Context-dependent functions of words
The function of each word in the spontaneous speech corpus was defined by analyzing the actual context semantically, syntactically and prosodically (see footnote 3; Figure 2). Two different functions were identified in the occurrences of the word tehát: the conjunction function (with two meanings, ‘consequently’ and ‘that is’), and the filler function. Similarly, two different functions were identified in the case of ilyen: pronoun (‘like this’ and ‘such’) and filler. We analyzed each example carefully, and found that tehát often occurred in a function that we could not identify with any of the terms used in the literature. Therefore, we decided to use a new term, phatic marker, however risky that may be. The reasons for this decision will be described below.
The phatic function is related to contact: it allows the speaker to check the channel (Jakobson 1960). Occurrences of tehát were identified as fillers when (i) they occurred inserted in the narrative and (ii) context analysis confirmed some problem in the speaker's speech planning process (see the examples below). However, in many cases, tehát did not fit this description. In initial position, its function seemed to be to test and clear the channel before starting or continuing to talk. In final position, it seemed to suggest ‘so that was about all, I suspend speaking now, the channel is open for you if you want to comment’. In both positions, then, it seemed to have a phatic function, i.e. to test, maintain, or close the channel of communication. The occurrences of the word tehát used in the phatic function show a heavy positional dependence: they occur either at the very beginning of the narrative, or at the very end of a part of the narrative. Accordingly, we shall refer to them as initial phatic marker and final phatic marker. The latter may be similar to the use of you know in English in the role of sequence closing (Local 2007).
As indicated above, speakers sometimes discontinued their narratives, and in such cases the interlocutor tried to encourage them by uttering response tokens like the English well, ühüm, I see, go on, yes, I agree, etc. (cf. O'Keeffe & Adolphs 2008). In some of these cases, the speakers produced the word tehát as the last word before silence, meaning, as described above, ‘that was all at the moment’. After their interlocutor reassured them and gave them some support, in a number of cases the speakers began with the word tehát, followed by a pause, and then they continued their narratives (see footnote 4). This is similar to some occurrences of well in English in the same position (e.g. Jucker 1993). Work by Schegloff (2007) shows that well also projects a turn that will mark something as problematic in, e.g., the presuppositions of a question.
There are 482 tokens in the two main functions for tehát (313 tokens in the filler function and altogether 169 tokens in the conjunction function with two meanings). Fifty-three tokens were regarded as phatic markers. Sixteen tokens (30.18%) of the phatic markers occurred at the initial position (phatic initial markers), while the majority of them (69.82%) occurred in final position (phatic final markers). Since the conjunction tehát may have two meanings, ‘consequently’ and ‘that is’, the 169 tokens used in the conjunctive function were subdivided into two groups, with 44 tokens having the former meaning and 125 the latter. Table 1 shows the number of the words analyzed according to function and gender.
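The token counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the number of final phatic markers (37) is not stated in the text and is derived here as 53 − 16. The computed initial share is about 30.19%, which the text reports as 30.18%.

```python
# Token counts for tehát by function, as reported in Section 3.1.
tehat_counts = {
    "filler": 313,
    "conjunction 'consequently'": 44,
    "conjunction 'that is'": 125,
    "phatic marker (initial)": 16,
    "phatic marker (final)": 53 - 16,  # derived here; not stated in the text
}

total = sum(tehat_counts.values())
conjunction = (tehat_counts["conjunction 'consequently'"]
               + tehat_counts["conjunction 'that is'"])
main_functions = tehat_counts["filler"] + conjunction
phatic = (tehat_counts["phatic marker (initial)"]
          + tehat_counts["phatic marker (final)"])
initial_share = 100 * tehat_counts["phatic marker (initial)"] / phatic

print(total, main_functions, conjunction, phatic, f"{initial_share:.2f}%")
```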
In the filler function, the two words (tehát and ilyen) occurred between two syntactically complete utterances in 92.77% of all cases. There were 27 instances (7.23%) where tehát and ilyen occurred at syntactically incomplete places (cf. Schegloff 2007). Before analyzing the segmental characteristics of these words, we conducted an analysis of prosodic features. The articulation tempo was defined as the rate at which the speech sounds of the words in question were produced. Three words preceding and another three words following tehát and ilyen were measured and their articulation tempi were compared to those of the target words. (Note that average word length in Hungarian spontaneous speech is around three syllables.) The words tehát and ilyen were not articulated faster (or slower) than other words in their immediate neighborhood. Melodic contours and F0-values within, and in the immediate environment of, the two words were also analyzed. However, the data did not show any difference depending on the function of the words. This means that in our material the occurrences of tehát and ilyen did not carry any intonational prominence. Sentence stress analysis yielded no difference depending on the function, either. Correlation analysis of articulation tempi, fundamental frequency values and the function of the words did not show any significant relationship. The prosodic analysis, including pausing characteristics, of 57 Hungarian words carried out by Dér & Markó (2010) yielded the same results in this respect. Their study showed differences concerning the use of sentence stress for some of the words they analyzed, but not in the case of tehát, which corresponds to our results.
We can conclude that in our material the prosodic organization did not show any large differences depending on function or context. Therefore, our analysis focused on the segmental characteristics of the words. Similarly, a study of hesitation markers (e.g. [ø]) in Hungarian showed no differences in the vicinity of major discourse boundaries or dependence on syntactic positions (Horváth 2009), which is, however, inconsistent with the findings in other languages, such as Dutch (Swerts 1998).
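The tempo comparison can be sketched as follows. This is an illustration only: the word list is hypothetical, tempo is taken as sounds per second, and pooling the six neighboring words into one rate is an assumption, since the text does not say how the neighbors' tempi were aggregated.

```python
def articulation_tempo(n_sounds, duration_s):
    """Articulation tempo as speech sounds per second."""
    return n_sounds / duration_s

def neighbourhood_tempo(words, target_index, window=3):
    """Pooled tempo of up to `window` words on each side of the target word."""
    before = words[max(0, target_index - window):target_index]
    after = words[target_index + 1:target_index + 1 + window]
    neighbours = before + after
    total_sounds = sum(n for n, _ in neighbours)
    total_time = sum(d for _, d in neighbours)
    return total_sounds / total_time

# Hypothetical stretch of speech: (number of sounds, duration in s) per word.
# Index 3 is the target, e.g. a three-sound realization of tehát.
words = [(6, 0.40), (5, 0.35), (7, 0.50), (3, 0.21), (6, 0.42), (4, 0.30), (8, 0.55)]

target_tempo = articulation_tempo(*words[3])
context_tempo = neighbourhood_tempo(words, target_index=3)
print(f"{target_tempo:.2f} vs {context_tempo:.2f} sounds/s")
```

In this made-up example the two rates coincide, which corresponds to the situation the text describes: the target word is articulated neither faster nor slower than its neighbors.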
The word tehát is abbreviated in most occurrences to tát. This form retains the initial and final stops, while the articulation of the vowel in between corresponds to that of the phoneme /aː/. Let us look at a few examples of the different functions. Examples (5)–(8) show tehát functioning as filler.
(5)
(6)
(7)
(8)
In (5), the speaker seems to be uncertain about how to put her thought into words. She is not satisfied with the words between the two filled pauses (at ‘that it is not’). Therefore after the second filled pause she uses the word tehát (tát) to gain time to find the desired expression. Example (6) shows two occurrences of the word tehát (tát) signaling two presumably different planning problems. In the first case, tát signals that the case-ending -ra in the previous lexical word weboldalakra was not correct. After producing the word weboldalakra, he inserts the word tát and continues, without a break and without repeating the whole word, with the correct suffix (-hoz). Then again he needs something to help him continue, so he produces the word tát again and then finishes the utterance. In (7), the speaker realizes that it was not the verb jár ‘go’ that he wanted to use but the verb corresponding to ‘get there’. Therefore he interrupts himself and then immediately continues without pausing, using a different grammatical structure, which contains a new verb and a slightly modified meaning. In (8), the speaker uses tát as a filler, sometimes together with other conjunctions, which provides further proof that he is either having a speech-planning problem or is trying to solve the problem. It may also signal that the speaker intends to continue speaking once the local problem is solved. This example shows tát after two adjacent conjunctions, illetve ‘that is’ and hogy ‘that’.
Examples (9)–(12) show tehát functioning as a phatic marker.
(9)
(10)
(11)
(12)
In (9), having obtained some support from the interlocutor, the speaker continues his narrative starting with the word tehát, signaling that he is ready to speak again. In (10), the word tehát in phatic initial marker function is followed by a hesitation marker (öö), and then the speaker continues her narrative, taking up where she left off before the interlocutor's back-channel support. Example (11) is the last utterance in a 1.7-minute uninterrupted narrative. The speaker lapses into silence after pronouncing the word tát, signaling a momentary end of his narrative or waiting for some response word from the interlocutor. Example (12) contains a final tehát that has the function of clearing the channel for a possible response by the interlocutor. Finally, (13)–(14) illustrate tát in its function as a conjunction.
(13)
(14)
Two different functions were identified for the word ilyen [ijɛn] (filler – 135 tokens – and pronoun meaning ‘like this’ or ‘such’ – 60 tokens). The filler function of this word seems to be a new phenomenon in spontaneous Hungarian and occurs only in the speech of the younger generation, as opposed to tehát, which is used as a filler by speakers of various ages. The similar English word so has been analyzed and shown to occur in turn-initial, turn-final and various other positions in a range of functions and meanings (Local 2007). It may even occupy a ‘stand-alone’ position with some variation in its phonetic design (Local & Walker 2005). Examples (15)–(16) show ilyen functioning as a filler.
(15)
(16)
In (15), the speaker seems to have some problem in explaining his job. He inserts the filler ilyen twice while looking either for an appropriate word or for an understandable phrase for his work. In (16), it appears that the speaker has trouble in selecting the right word, although it is not clear whether the gap where the filler appeared is a consequence of difficulty in activating the desired word or in selecting the appropriate word.
Finally, examples (17)–(18) illustrate ilyen in its ‘regular’ pronominal function.
(17)
(18)
3.2 Occurrences
Subjects produced altogether 1.36 tokens of tehát and ilyen per minute, of which 1.03 were tehát and 0.33 ilyen. Speakers uttered 0.93 words per minute that were analyzed as fillers: 0.68 tehát and 0.25 ilyen per minute. The mean rate for the former was 25.5 words per subject (min.: 9, max.: 83), while for the latter it was 15.5 words per subject (min.: 6, max.: 32). There was a considerable difference between female and male subjects. Males used both words more frequently as fillers (Table 2). Although total speaking time was almost the same for females and males, the number of subjects differed between the two groups, so the average duration of spontaneous speech per subject differed by gender. Male subjects thus used both filler words at a higher rate: a higher number of tehát and ilyen tokens occurred during a shorter speaking time per subject. This fact will be shown to have some consequences for articulation and will be taken into consideration when seeking explanations for the differences between females and males in our study.
The frequency of the words tehát and ilyen varies according to function. The word tehát occurs most frequently in the filler function and least frequently as a phatic marker. As a conjunction, it has a higher frequency in the meaning ‘that is’ than in the meaning ‘consequently’ (Figure 3). The word ilyen was used as a filler in 69.23% of cases and as a pronoun in 30.77% of cases.
3.3 Syllabic structures
In the majority of cases, the original phonological form of tehát /tɛhaːt/ appears in spontaneous speech as a single closed syllable (in 90.66% of all cases). This form retains the initial and final stops, while the articulation of the vowel varies. It may be pronounced as [tat], [taːt], [tət] or sometimes [tɛt], and shows some variation across functions and subjects. The interrelation between frequency of use and loss of speech sound or sounds within the word has also been confirmed for high-frequency words in English (Bybee 2000).
The word ilyen tends to retain both of its syllables (in 93.3% of all cases), and its stressed vowel does not show any deviation from its original quality: it is always front, high, and unrounded. However, the vowel in the unstressed position, corresponding to the phoneme /ɛ/, is realized by a variety of vowel qualities from [ə] to [ɛ]. The word may also be articulated as a monosyllable, retaining the approximant [j] and the vowel [ɛ] accompanied by the nasal consonant, resulting in something like [jɛn].
Although both analyzed words are originally disyllabic in Hungarian, they show different tendencies in preserving their original syllabic structures. Tehát tends to lose one of its syllables, with the sounds merging into a monosyllable (tehát – tát), while ilyen seems to preserve both syllables independently of its function. The word tehát as a filler occurs as a monosyllable in more than 90% of all cases, while ilyen occurs in the filler function as a disyllabic word in more than 92% of all cases. The disyllabic word form tehát occurs in 13.63% and in 10.4% of all the tokens in the two conjunction functions. It also retains the two syllables when it is used to keep the floor (30.18% of all tokens in the phatic marker function), while the disyllabic form accounts for only 4.79% of all fillers (Figure 4).
The situation is totally different with the syllabic structure of the word ilyen. It appears mostly in its original disyllabic form. The monosyllables occurred in 7.41% of all fillers and in 5.0% in all pronouns in our corpus (Figure 4).
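The percentages in Sections 3.1 and 3.3 can be checked against one another. The sketch below converts each reported share into a token count by rounding to the nearest whole token (this conversion is an assumption; the underlying counts are not given in the text) and then recomputes the overall shares. It yields about 90.65% monosyllabic tehát and 93.33% disyllabic ilyen, close to the reported 90.66% and 93.3%.

```python
# (total tokens, reported percentage of DISYLLABIC forms) for tehát by function.
tehat_disyllabic_share = {
    "filler": (313, 4.79),
    "conjunction 'consequently'": (44, 13.63),
    "conjunction 'that is'": (125, 10.4),
    "phatic marker": (53, 30.18),
}
# (total tokens, reported percentage of MONOSYLLABIC forms) for ilyen by function.
ilyen_monosyllabic_share = {"filler": (135, 7.41), "pronoun": (60, 5.0)}

def count_from_share(n_tokens, percent):
    """Recover a whole-token count from a reported percentage (assumed rounding)."""
    return round(n_tokens * percent / 100)

tehat_total = sum(n for n, _ in tehat_disyllabic_share.values())
tehat_disyllabic = sum(count_from_share(n, p) for n, p in tehat_disyllabic_share.values())
tehat_mono_share = 100 * (tehat_total - tehat_disyllabic) / tehat_total

ilyen_total = sum(n for n, _ in ilyen_monosyllabic_share.values())
ilyen_mono = sum(count_from_share(n, p) for n, p in ilyen_monosyllabic_share.values())
ilyen_disyllabic_share = 100 * (ilyen_total - ilyen_mono) / ilyen_total

print(f"tehát monosyllabic: {tehat_mono_share:.2f}%")
print(f"ilyen disyllabic: {ilyen_disyllabic_share:.2f}%")
```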
The reason for the difference in the syllabic structure of the analyzed words in spontaneous speech, presumably, has to do with their phonetic structures. In the case of tehát, intervocalic /h/ is frequently vocalized (the phonetic correlates of /h/ are absent), and phonemically long /aː/ is often shortened and centralized (Gósy 2004). If these processes occur at the same time, then there is an opportunity for the articulatory gestures of the vowels [ɛ] and [aː] to be merged, and this is almost always the case. At the moment we do not see any systematic shortening in the case of ilyen.
3.4 Acoustic-phonetic analysis of tehát
It was hypothesized that the word tehát would show differences not only with regard to the number of syllables but also with regard to the overall duration of the words. Figures 5a–c show the function-dependent acoustic structures (by means of waveforms and spectrograms) of the word tehát produced by the same female speaker in the conjunction, filler and phatic marker functions.
3.4.1 Durations
Statistical analysis showed significant differences in duration depending on function (one-way ANOVA: F(3,531) = 56.792, p < .001) both for females and males (females: F(3,190) = 26.416, p < .001; males: F(3,337) = 30.793, p < .001). The post-hoc Tukey tests did not reveal significant differences between the durations of the two conjunctions, either with females or with males. Table 3 summarizes duration according to function and gender.
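For readers unfamiliar with the test, a one-way ANOVA compares between-group and within-group variance. The sketch below is a plain-Python illustration, not the SPSS procedure used in the study, and the durations are hypothetical. It also shows where the degrees of freedom of the pooled test come from: four functions and 535 tokens give (3, 531).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical word durations in ms for three functions (not data from the study).
durations_ms = {
    "filler": [150, 160, 170, 155],
    "conjunction": [200, 215, 205, 220],
    "phatic marker": [260, 250, 270, 255],
}
f_stat, df_b, df_w = one_way_anova(list(durations_ms.values()))
print(f"F({df_b},{df_w}) = {f_stat:.2f}")

# Degrees of freedom for the pooled test reported above: 4 functions, 535 tokens.
pooled_df = (4 - 1, 535 - 4)
print(pooled_df)
```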
The articulation of the word tehát (such as [tɛɦaːt], [taːt], [tat] or [tət]) in the filler function was completed in a shorter time than the articulation of tehát in both the conjunction and the phatic marker functions, irrespective of the number of syllables (Figure 6). In the two conjunction functions, irrespective of meaning, tehát showed no significant differences in duration: its values fell between the values of the filler and of the phatic marker functions. This is probably because speakers are often uncertain about what they are going to say next and how it will relate to what they have already said; they may choose to present the next utterance as an explanation for, or as a consequence of, the preceding one. This might explain why there was no significant durational difference found between the two conjunction meanings. There were no significant durational differences between males and females, either. Although the values show some variation between females and males, the tendency is the same in both groups.
3.4.2 Formants
The formants of the stressed vowel ([ɛ]) in disyllabic forms of tehát were similar in all cases (with no significant differences; see Table 4 for the values). This means that the stressed vowel quality does not change across functions.
The frequency values of the first two formants of the realizations of the phoneme /aː/ were analyzed in both the monosyllabic and the disyllabic forms. The relatively wide range of F1-values (Table 5) can be explained by the fact that there seems to be a shift from the original [aː] toward [a], [ɛ] and the [ə] vowels. The female subjects’ articulation showed greater variability in relation to the first formants than that of male subjects. The first formant of the vowels corresponding to the phoneme /aː/ in the words tehát and tát showed slight differences depending on function: the values were higher in the conjunction functions irrespective of gender, while they were lower in filler and phatic marker functions. This finding indicates that the first formants in the vowels in the filler function are more centralized. Analysis of variance confirmed that there is a significant difference in the first formant values depending on function (for females: F(3,193) = 5.098, p < .002; for males: F(3,340) = 6.001, p < .001). The post-hoc Tukey tests revealed that there was a significant difference between the F1-values for fillers and conjunctions in the males’ pronunciation, while there are significant differences between fillers and conjunctions as well as between conjunctions and the phatic marker function in the case of females (for males: p < .001; for females: p < .002 and p < .029).
The second formant values (Table 6) also seem to be indicators of the new functions of the words under investigation, since they show clear differences depending upon function (one-way ANOVA for females: F(3,190) = 8.332, p < .001 and for males: F(3,340) = 23.347, p < .001). The post-hoc Tukey tests confirmed that there are significant differences in the second formant values between fillers and conjunctions in females’ articulation (p < .003 and p < .001). However, no such difference was found for the other functions. This means that the second formant frequencies do not show wide variation either between the two conjunction meanings or between the filler functions and the phatic marker functions. The male data are somewhat different: here the second formant frequencies provide a more marked indication of function. The post-hoc Tukey tests revealed significant differences between almost all pairs of functions (p-values between fillers and conjunctions meaning ‘that is’: p = .001, between phatic markers and conjunctions meaning ‘that is’: p = .001, between fillers and conjunctions ‘consequently’: p = .006 and between phatic markers and conjunctions ‘consequently’: p = .031). No significant difference could be confirmed, however, either between the functions of filler and phatic marker or between the two conjunction meanings (‘consequently’ and ‘that is’). In sum, the actual vowel quality of /aː/ in tát is determined both by its first and second formant values.
The vowels in the filler and the phatic marker functions in both the females’ and the males’ articulation become more centralized, while those in the conjunction function approach the characteristic formant structure of the Hungarian vowel [aː]. This difference is confirmed by the values of both formants (Figure 7).
Figure 8 demonstrates the F1 and F2 interrelations for realizations of the phoneme /aː/ in the two functions that yielded significant differences, in both females and males, namely fillers and conjunctions. The ellipses in the figure illustrate the range of the first two formants while the dots represent the medians.
The original vowel [aː] appears to be losing its characteristic properties and turning into a schwa-like vowel. Recent research on casual speech in Hungarian (Beke 2008) shows that the mean of the first formant frequency values of [aː] is 630 Hz for males and 710 Hz for females (standard deviations: 65 Hz and 86 Hz). Mean F2 is 1340 Hz for males and 1710 Hz for females (standard deviations: 81 Hz and 148 Hz). There is no mid central schwa-like vowel (either as a phoneme or an allophone) in Hungarian, but it appears in spontaneous speech replacing various vowels both in stressed and unstressed positions.
The new function of tehát shows values that deviate from those of expected realizations. The expected acoustic-phonetic parameters appear in the articulation of the word in its traditional function (as a conjunction). Figure 9 shows the dendrogram (i.e. a tree for visual classification of similarity) based on the Euclidean distance measures of the first two formants of the vowels in the functions analyzed. Fillers and phatic markers on the one hand, and the two conjunctions on the other, are closer to each other in terms of squared Euclidean distance. (The similarity can be defined by means of transformation from covariance matrices to vectors in Euclidean space, which allows the use of the squared Euclidean distance measure, cf. Amir & Amir 2007.)
Articulation is transformed into comparable values related to the distances of the phonetic patterns analyzed in the different functions. The measures support our previous findings that articulations in the two conjunction functions are almost the same and, likewise, articulations in the filler and phatic marker functions are close to each other.
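The grouping shown by the dendrogram can be illustrated with a small sketch. It is a simplification under stated assumptions: the (F1, F2) values are hypothetical placeholders rather than the measured means, each function is represented by a single mean vector instead of the covariance-based transformation mentioned above, and single-linkage clustering is chosen here only for illustration (the linkage method of the original analysis is not stated).

```python
from itertools import combinations


def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def linkage_distance(c1, c2, points):
    """Single linkage: smallest distance between members of two clusters."""
    return min(squared_euclidean(points[a], points[b]) for a in c1 for b in c2)


def single_linkage_merges(points):
    """Return the merge sequence as (members_a, members_b, distance) tuples."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are currently closest.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], points),
        )
        merges.append((sorted(c1), sorted(c2), linkage_distance(c1, c2, points)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# HYPOTHETICAL mean (F1, F2) values in Hz for the vowel in each function.
centroids = {
    "filler": (520.0, 1500.0),
    "phatic marker": (535.0, 1490.0),
    "conjunction 'that is'": (620.0, 1380.0),
    "conjunction 'consequently'": (612.0, 1392.0),
}

for left, right, dist in single_linkage_merges(centroids):
    print(left, "+", right, "at squared distance", dist)
```

The order of the merges corresponds to the branching of a dendrogram: pairs joined at small distances are the most similar. Working on raw Hz values gives F2 more weight than F1 simply because its range is wider; whether and how the original analysis rescaled the formants is not known from the text.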
3.4.3 Durations and formants
Taking into consideration all the data obtained we can see that both duration and formant frequencies show differences depending on the actual function of the word tehát. Fillers and phatic markers can be found at the opposite ends of a time scale, while realizations of the original conjunction function are located in between. Most fillers are monosyllabic, characterized by short duration and a schwa-like vowel. Phatic markers (including disyllabic realizations, which represent about one-third of the total data) are characterized by relatively long duration and a schwa-like vowel. In contrast, the duration of conjunctions is longer than that of fillers and shorter than that of phatic markers. The formants of /aː/ realizations are more or less characteristic of the F1 and F2 patterns for the realizations of the same phoneme in Hungarian casual speech. Based on these findings we can conclude that our acoustic-phonetic measurements seem to confirm the assumption that a change is taking place in the articulation of the word tát (tehát) when it is used in a new function in present-day Hungarian spontaneous speech. Since there is a shift toward a centralized pronunciation of the /aː/ realizations, the question arises whether there is hypoarticulation in the cases of the fillers.
The hyper-hypo theory (Lindblom 1990) focuses on how the speech production mechanism adapts its performance in response to changing perceptual demands. Lindblom proposes that speakers continually monitor how clear their articulation should be, in view of the information assumed to be shared by the speaker and the listener. In our case, the speakers seem to reduce articulatory effort when the words do not have lexical status. With reference to this theory we might assume that the centralization tendency of the vowels in the fillers can be understood as hypoarticulation. The coefficients of a regression model show to what extent, on average, each variable predicts the outcome when all factors are taken into account (the ‘enter’ method was used). Vowel duration, as a variable, is expected to affect the formants of the vowels differently depending on whether the vowel occurs in a filler or in a conjunction. The results show that duration has a slight but significant effect on F1, but not on F2, in both functions for both females and males. The correlation differs slightly depending upon function (r = .281 for the females and r = .333 for the males in the case of fillers, while r = .370 for the females and r = .397 for the males in the case of conjunctions). These slight differences show that vowel duration and F1 are rather independent in the fillers, while they are more closely interrelated in the conjunctions. The centralization of the vowel does not seem to be a product of short duration. The above correlations suggest that about 30% of the occurrence of centralized vowels in the filler function (both in females and males) and 37% (females) and close to 40% (males) in the conjunction function are related to durational shortening. These data allow us to assume that most cases of centralization of the vowel in fillers cannot be explained by hypoarticulation.
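The percentages in the preceding paragraph are read off the correlation coefficients themselves. A complementary and commonly used measure, offered here only as a hedged illustration and not as part of the original analysis, is the coefficient of determination r², i.e. the share of variance in F1 that is linearly associated with vowel duration. The sketch below applies it to the four r values reported above; the helper name `shared_variance` is introduced for illustration.

```python
# Correlations between vowel duration and F1, as reported in the text.
reported_r = {
    ("filler", "female"): 0.281,
    ("filler", "male"): 0.333,
    ("conjunction", "female"): 0.370,
    ("conjunction", "male"): 0.397,
}


def shared_variance(r):
    """Coefficient of determination for a simple linear relation."""
    return r ** 2


for (function, gender), r in reported_r.items():
    print(f"{function:12s} {gender:6s} r = {r:.3f}  r^2 = {shared_variance(r):.3f}")
```

On this reading the association between duration and F1 is, if anything, weaker than the r values suggest, which points in the same direction as the conclusion that most cases of centralization in fillers cannot be explained by durational shortening.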
Crucially, although tokens of the filler and phatic marker functions of tehát are at durational extremes, their vowels share qualitative similarity that sets them apart from the linguistic function as a conjunction.
Since some of our subjects provided only a few tokens, a discriminant analysis was performed. This analysis is a technique for classifying observations into predefined classes on the basis of a set of variables; in our case the two classes are fillers and conjunctions. The results show that in the case of females, 74.5% of all fillers were classified correctly as fillers on the basis of word duration and F1 (adding F2 to the model, 75.5% of all fillers fall into the predicted class, while no change was found with the class of conjunctions). Of all conjunctions, 65.6% were classified correctly. Similar results were found with male data: 75.4% of all fillers were classified as fillers on the basis of word duration and F1 (adding F2 to the model, the ratio was 75.8%), while 63.8% of all conjunctions were classified correctly (with F2 added, the ratio was 64.8%). In all cases, more conjunctions were classified as fillers than fillers as conjunctions. On the basis of these results we assumed that word duration and F1 are of crucial importance in characterizing the filler and the conjunction functions. To support this view, a ROC (Receiver Operating Characteristic) curve analysis was also performed. The ROC separation curve shows how well the word duration and the first two formants of the vowel predict whether the target word is a filler or a conjunction. The ROC curve analysis of females’ data showed that the function-dependent separation of the tokens based on word duration accounted for 74% of all cases, that based on F1 accounted for 65%, and that based on F2 accounted for 72.2% of all cases. The results are similar with male data: the function-dependent separation of the tokens based on word duration accounted for 70.5% of all cases, that based on F1 accounted for 62.1% and that based on F2 accounted for 73.8% of all cases.
The data support the primacy of word duration, which can be interpreted as the most convincing predictor of the change in articulation depending on function. However, both formants are also of crucial importance in separating the tokens into fillers and conjunctions.
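The separation figures from the ROC analysis can be made concrete with a small sketch. It assumes, and this is an assumption rather than something stated in the text, that the reported percentages correspond to the area under the ROC curve (AUC), which equals the probability that a randomly chosen filler is ranked as more filler-like than a randomly chosen conjunction. The durations are hypothetical placeholders and the function name `roc_auc` is introduced for illustration.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# HYPOTHETICAL word durations in ms for the two classes.
filler_ms = [140, 160, 175, 190, 230]
conjunction_ms = [180, 220, 240, 260, 275]

# Fillers tend to be shorter, so the negated duration serves as the
# "filler-likeness" score: a shorter word receives a higher score.
auc = roc_auc([-d for d in filler_ms], [-d for d in conjunction_ms])
print(f"AUC for word duration: {auc:.2f}")
```

An AUC of 0.5 would mean that the variable does not separate the two functions at all, while 1.0 would mean perfect separation. The same function can be applied to F1 or F2 values to compare the predictors one by one.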
3.5 Acoustic-phonetic analysis of ilyen
3.5.1 Durations
The duration of the word ilyen shows significant differences depending on its two functions. In the function of filler, it is longer than in the function of pronoun (Table 7).
Statistical analysis showed significant differences between word durations depending on function (F(1,194) = 7.971, p < .005). Most notable, however, is not the relatively small difference in average duration, but rather the greater variation exhibited in the duration of the filler tokens (see Figure 4). Female and male data were analyzed separately. The results confirmed a statistically significant difference between the two functions in both groups (independent-samples t-test for females: t(65) = 2.251, p < .028; for males: t(126) = 2.044, p < .044). However, there were no gender-based differences.
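The independent-samples t-test used above can be sketched as follows. This is a minimal illustration assuming the classical pooled-variance (Student) form, whose degrees of freedom are the total number of tokens minus two; the duration values are hypothetical placeholders and the function name `students_t` is introduced for illustration.

```python
from math import sqrt
from statistics import mean, variance


def students_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    return t_stat, df


# HYPOTHETICAL durations of ilyen in ms; the filler tokens vary more widely.
filler_ms = [210, 260, 190, 320, 280]
pronoun_ms = [200, 215, 205, 220, 210]

t_stat, df = students_t(filler_ms, pronoun_ms)
print(f"t({df}) = {t_stat:.3f}")
```

When the two groups differ markedly in variability, as the filler and pronoun tokens appear to do, a variant that does not pool the variances (Welch's test) is often preferred; which variant was used in the original analysis is not stated.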
The durational data support our assumption that a synchronic change is taking place in the function of ilyen, and this functional change is inducing a change in the temporal patterns associated with the word (Figure 10).
3.5.2 Formants
The formants of the stressed vowel ([i]) were alike in all cases (with no significant differences; see Table 8 for the values). Likewise, no significant difference was found in the first formants of the realizations of the phoneme /ɛ/, for either females or males.
F2 of the unstressed vowel ([ɛ]) showed no significant difference with females. However, its pronunciation by males showed variation depending on function. This altered articulation was confirmed by the significant changes found in second formant values for the vowel [ɛ] (F(1,127) = 15.533, p < .001), as shown in Figure 11.
The unstressed vowel /ɛ/ as pronounced by males is closer to schwa in the filler function, while in the pronoun function it conforms to the characteristics of the second formant values of the Hungarian vowel [ɛ].
The second formant values of the reduced vowel in tát fillers were compared with those of the vowel [ɛ] in the word ilyen in its filler function. Our assumption was that the schwa vowels in the two analyzed words (tát and ilyen) would not be alike for at least two reasons. Firstly, because they replace different vowels – the former a back and the latter a front one – and because there is also a difference in tongue height in the articulation of the two vowels. Secondly, the vowel realizations of /aː/ (in monosyllables) are in stressed position while those for /ɛ/ are in unstressed position. Recent investigations demonstrated that there is some variability in the schwa-like vowels replacing the different back vowels in spontaneous Hungarian (Beke 2008). The reduction of [aː] in tát fillers differs significantly from the reduced vowel quality of [ɛ] in ilyen fillers, which is confirmed by a comparison of their second formant values (t(293) = 11.606, p < .001).
4 Conclusions
The BEA Hungarian Spontaneous Speech Corpus enabled us to conduct a study on a shift from lexical function to filler function in two words. The functional shift seemed to be confirmed by altered pronunciation, i.e. by the finding that there was a significant difference in word duration and both first and second formant values depending on function. Such changes occur when the number of speakers using a word in a new function increases (Bybee 2001, Pierrehumbert 2001; for Hungarian: Markó 2004), and it is exactly this that is happening in the present case. The present results show that a language change is in progress in spontaneous Hungarian speech. This language change has several characteristics: the increased frequency of a conjunction and a pronoun in a function which is definitely not their original function, the word duration difference of the tokens depending on function, and differences in vowel pronunciation depending, again, on function. Given the high variance in fillers, it might be reasonable to assume a prosodic component. However, no difference was found concerning the prosodic organization depending on function in the present material. The spread of the change seems to be gradual. Firstly, the word tehát used as a filler has an extremely high frequency in present-day Hungarian, with no age limit to its use in this new function. In contrast, the word ilyen is used as a filler by the younger generation but not by older speakers. Secondly, in the case of tehát, articulation depends on its function, which was confirmed by analyses of word duration as well as first and second formant frequencies. Although the word ilyen also shows slight function-dependent variation in its durational patterns, the differences between formant frequencies in the vowels are not so convincing. There were no significant differences in stressed vowels.
Formant frequencies in the realizations of the unstressed /ɛ/ phoneme showed no function-dependence with females. However, the values of second formants turned out to be significantly different in the filler and pronoun functions with male subjects. Considering the fact that our young male subjects used this word more frequently than did our female subjects, it can be said that the articulatory gestures performed to produce the unstressed vowel appear to be undergoing a change and can be expected to become manifest across gender and across age.
We assume that the original function of the word tehát allows for an easy transition to the filler function. This conjunction might have become a connective whose role was to connect constituents, sometimes fragmented ones, to larger constituents during speech production. At this stage, the word seems to be losing its connective function and is beginning to be used as a simple filler without any task of connecting constituents during speaking.
Use in the filler function results in reduced duration in the case of tehát and in increased duration in the case of ilyen, compared to their counterparts used in lexical functions. A tentative resolution of this apparent contradiction can be offered here, pending a more controlled type of analysis than was feasible in the present corpus-based study. The word ilyen, in its lexical function, is either a pronoun (when used on its own) or a determiner (when followed by a content word). When it is followed by a content word (e.g. a noun), most of the speech planning is focused on the latter. In that case, articulation of the unstressed pronoun takes a shorter time. When it takes longer, it definitely signals speech-planning problems. A similar durational tendency was found in the articulation of English the. Normally, it is pronounced with a reduced schwa vowel, but when there is a problem in speech planning, it is pronounced as ‘thee’ (cf. Jefferson 1974, Fox Tree & Clark 1997). Corpus data of spontaneous speech sometimes contradict experimental findings. Such contradictions can be reconciled by assuming that reduction processes are word-specific and context-specific (cf. Gahl 2008: 491).
The final question is why speakers seem to need new forms to fill gaps in their speech production process. Apparently, speakers try to avoid using the common Hungarian hesitation marker, [ø], because of the negative impression it gives to many people (cf. Adamikné Jászó 2004, Horváth 2009), and they try to replace it with something that is less conspicuous and less open to criticism. Therefore, we strongly believe that tehát and ilyen compete with öö, and are taking over the role of hesitation marker. Our material reveals a tendency for this take-over, whether or not it is a conscious strategy on the part of the speakers. The frequency of the hesitation marker is inversely related to that of the fillers tehát and ilyen. One of our subjects, for instance, produced 96 [ø] hesitations and used tehát 29 times in the filler function, while another subject produced ‘only’ 51 [ø] hesitations and 43 tehát fillers. Our data support the assumption that these two ‘trouble signals’ (and possibly some other filler words) alternate in present-day Hungarian. Their changing ratios might indicate that a shift is in progress from [ø] hesitations toward the use of a conventional conjunction (tehát, in the form of tát) in a new function. Lexical items that are perceived as ‘real’ words (as distinct from meaningless izé or [ø]) are less suspicious. It takes time for the listener to identify them as fillers, and they may escape detection altogether. The use of the ‘new fillers’ is catching on fast. Both of them are highly successful in fulfilling their new role in spontaneous speech. Using tehát, originally a conjunction, looks like a good strategy to adopt when problems arise in spontaneous speech. The original meaning of this conjunction often appears to fit the actual context, making its new function less conspicuous.
Although it might in fact be an intruder signaling speech-planning problems, this is not evident to the listener. By the time the message following the word tát is decoded, the listener might not be able to recall the actual function (or semantics) or reflect on whether tát was used as a conjunction or as a device to gain more time for speech planning. Since the majority of tát fillers occur at syntactically complete places, they may easily escape the listener's attention. These factors might explain the successful spreading of this word in its new function. In principle, the same applies to the pronoun ilyen.
The findings confirmed our hypothesis on function-dependent articulation of the two words under study, leading to the conclusions that: (i) acoustic-phonetic analysis may shed light on possible synchronic changes taking place in a language, and (ii) we may make a general prediction of some pronunciation modifications for words used frequently as fillers or discourse markers, particularly in their durations (cf. also the findings of Gahl 2008 and for Hungarian, Dér & Markó 2010). As Local (2003: 336) points out, if we are to understand the workings of phonetic detail and its variability, we need to relate the phonetic details of utterances in spontaneous speech to various categories and levels of analysis.
Acknowledgements
We would like to thank first of all Dr Adrian Simpson, Dr Pál Heltai and András Beke for their help in preparing this paper, and three anonymous referees for their helpful comments on its earlier versions. This work was supported by the OTKA project No. 78315.