
Acoustic correlates of rhythm in New Zealand English: A diachronic study

Published online by Cambridge University Press:  30 March 2012

Jacqui Nokes
Affiliation:
University of Canterbury
Jennifer Hay
Affiliation:
University of Canterbury

Abstract

This paper reports on a large-scale diachronic investigation into the timing of New Zealand English (NZE), which points to changes in its rhythmic structure. The Pairwise Variability Index (PVI) was used to measure the mean variation in duration, intensity, and pitch of successive vowels in the speech of over 500 New Zealanders, born between 1851 and 1988. Normalized vocalic PVIs for duration have reduced over time, after allowing for changes in speech rate, supporting existing findings that stressed and unstressed vowels are less differentiated by duration in modern NZE than in other varieties of English. Rhythmically, syllable duration may be playing a reduced role in signalling prominence in NZE. This is supported by the finding that there have been contemporaneous changes in pitch and intensity variation. We discuss external and internal influences on the timing of NZE, including contact with Māori, the emergence of Māori English, and diachronic vowel shift.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2012

INTRODUCTION

New Zealand English (NZE) is commonly reported to be more “syllable-timed” than other varieties of English (Holmes & Ainsworth, 1997; Szakay, 2006; Warren, 1998). This finding tends to be based on investigation of the durational variability of a modest number of consecutive vocalic sequences, produced by a small set of speakers of contemporary NZE. In this paper, we take advantage of the availability of a large diachronic corpus to conduct the first large-scale investigation into speakers' use of variation in duration across the history of NZE. We also investigate concurrent changes in pitch and intensity variation.

SPEECH RHYTHM

Speech rhythm is the patterning of prominent elements in spoken language, as perceived by the listener. Prominence is achieved by manipulating a variety of acoustic-phonetic parameters, such as duration, intensity, intonation contours, and pitch, which are used to create cyclical prosodic patterns (Kohler, 2009a). Cross-linguistically, different strategies are used by different languages to emphasize syllables (Dauer, 1983). The use of a prosodic feature for rhythmic purposes is affected by its other linguistic functions within a language. For example, Fant, Kruckenberg, and Nord (1991) found that syllable stress correlated with lengthening in English and Swedish to a greater degree than in French, which demarcates prosodic phrase boundaries using final syllable lengthening (Fletcher, 1991). Languages such as Sāmoan and Māori with phonemic length distinctions appear to rely less on prolonging syllables for accentual purposes (see also Roach, 1982). It is expected that contact between languages that employ different strategies to realize prominence will affect one or both varieties, and indeed evidence has been produced to show that Māori has undergone rhythm and vowel quality changes as a result of contact with English (Harlow, Keegan, King, Maclagan, & Watson, 2009; Maclagan, Watson, King, Harlow, Thompson, & Keegan, 2009). There has been speculation that Māori, in turn, has affected the timing of NZE, which is often perceived to be more syllable-timed than other varieties of English (Holmes & Ainsworth, 1997; Szakay, 2006; Warren, 1998).

QUANTIFYING SPEECH RHYTHM

Speech rhythm is commonly described using the stress-timed versus syllable-timed dichotomy, developed to categorize speech rhythm cross-linguistically (Abercrombie, 1967; Classe, 1939; Pike, 1946). Abercrombie (1967) developed the ideas of Pike and Classe by grouping English, Russian, and Arabic as languages that regularize the occurrence of prominent syllables, primarily by compressing or deleting intervening unstressed syllables. He contrasted such stress-timed languages with syllable-timed languages such as French, Telugu, and Yoruba which, it was proposed, regularize the duration of syllables, resulting in asynchronous prominences or stresses. This schema was expanded to include mora-timing for languages such as Japanese (Han, 1964; Ladefoged, 1975) and Māori (Bauer, 1993); subsequently other languages have been classified as belonging to one or another category (Dauer, 1983:56 gave a referenced selection). However, attempts to demonstrate the regularity of stressed feet or syllables in languages so categorized have not been successful (see Arvaniti, 2009; Cutler, 1991; Grabe & Low, 2002; and Lehiste, 1980, for reviews). It is clear that the perception of rhythm is not based solely on timing, and conversely, that timing is affected by considerations other than rhythm. These include the word position and complexity of syllables (Beckman, 1992; Dauer, 1983; Fant, 2004; Lehiste, 1980); the syllabic composition of the stress foot (Lehiste, 1980); the register and genre of an utterance (Arvaniti, 2009); and the oratory ability and communicative intent of the speaker (Kohler, 2009b). The failure to demonstrate categorical isochrony by stress, syllable, or mora cross-linguistically has been one motivation for the proposal that languages fall on a continuum between the two extremes (Dauer, 1983; Grabe & Low, 2002; Roach, 1982), and for the development of more sophisticated metrics to index rhythmic timing.

Rhythm metrics in current use include the Pairwise Variability Index (PVI), an index of mean difference in a given acoustic measure across successive linguistic units; VarcoV, defined as the standard deviation of vowel duration divided by mean vowel duration; %V, defined as the vocalic proportion of an utterance duration; and ΔC, defined as the standard deviation of consonantal duration. Ramus, Nespor, and Mehler's (1999) study of English, Polish, Dutch, French, Spanish, Italian, Catalan, and Japanese concluded that combined %V and ΔC scores best supported distinct stress-, syllable-, and mora-timing categories. However, Polish was found to have “mixed” rhythm. Grabe and Low (2002) calculated normalized vocalic PVI and raw consonantal PVI for 18 languages, and compared the resulting language groupings with traditional rhythm classifications. Rhythmically prototypical languages were classified as expected, but nine languages fell in between traditional categories, including Polish. Arvaniti (2009) presented preliminary results from a comparative study of timing metrics across English, German, Italian, Korean, Greek, and Spanish. Arvaniti, Ross, and Ferjan (preliminary findings, cited in Arvaniti, 2009) calculated PVIs, Varcos, ΔC and ΔV, and %V for each language, and compared the resulting groupings. Arvaniti concluded that not only do different measures produce different categorizations for nonprototypical languages, but different elicitation methods and utterance constructions result in varying scores for a single language. In sum, then, the theory that languages have a categorical rhythm type based on timing is largely discredited, and the various rhythm metrics in current use fall short of satisfactorily providing a comprehensive typology of rhythm cross-linguistically.
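As an illustration of the interval-based metrics defined above, the following is a minimal Python sketch, not the procedure of any of the cited studies. The function name `rhythm_metrics`, the input layout (a list of `("V" or "C", duration)` pairs), and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute %V, delta-C and VarcoV for one utterance.

    `intervals` is a hypothetical list of (kind, duration_in_seconds)
    pairs, where kind is "V" (vocalic) or "C" (consonantal).
    """
    vocalic = [dur for kind, dur in intervals if kind == "V"]
    consonantal = [dur for kind, dur in intervals if kind == "C"]
    total = sum(vocalic) + sum(consonantal)
    return {
        # %V: vocalic proportion of the utterance duration, as a percentage.
        "percent_V": 100 * sum(vocalic) / total,
        # Delta-C: standard deviation of consonantal durations.
        # Assumption: population SD; a sample SD would be equally defensible.
        "delta_C": pstdev(consonantal),
        # VarcoV: SD of vowel duration divided by mean vowel duration.
        "varco_V": pstdev(vocalic) / mean(vocalic),
    }


# Invented durations, for illustration only.
example = [("C", 0.1), ("V", 0.1), ("C", 0.1), ("V", 0.3)]
metrics = rhythm_metrics(example)
```

VarcoV is often reported multiplied by 100; the sketch follows the definition given in the text and leaves it as a plain ratio.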

RHYTHM AND DURATION IN STRESS-TIMED LANGUAGES

Nevertheless, what Beckman (1992:458) called the “persistent metaphor” of stress- versus syllable-timing remains a popular shorthand for characterizing the rhythmic properties of languages, seemingly indexing a perceivable difference in the patterning of prominent syllables (Ramus, Dupoux, & Mehler, 2003). White, Mattys, Series, and Gage (2007) demonstrated that listeners could perceive differences between two different languages, and between accents within a language, if timing measures for the languages diverged sufficiently. White et al. synthesized sasa syllables from the speech of four speakers for each of three different British English varieties and for Castilian Spanish. Utterance-initial syllables and final stressed syllables were excised, and pitch was leveled to remove additional prosodic cues. Significantly different scores for VarcoV and %V predicted higher-than-chance accuracy in tasks where the participants were asked to decide whether two varieties were the same or different. Timing differences are therefore used by listeners to discriminate between languages. Note, though, that variation in intensity was not controlled for.

Ultimately, the question of whether timing is a fundamental cause of perceptual differences in rhythm, or merely a surface realization of the underlying structural differences that influence prosody, remains open for debate (Fenk & Fenk-Oczlon, 2006). What does seem clear is that in languages traditionally considered stress-timed, such as English, the duration of linguistic units is a key parameter in rhythmic structure. Stressed vowels tend to be more fully articulated in English and consequently have a longer duration than their unstressed counterparts. In contrast, unstressed vowels are typically reduced or deleted. Other factors held equal, greater vowel length will give rise to a percept of syllable stress, and thus rhythmic prominence, in English. The alternation of stressed and unstressed vowels leads to variation in the duration of successive vowels. Perhaps the term “stress-timed” is best employed to differentiate languages that rely heavily on temporal manipulation of syllables for emphasis, from languages that rely more on other acoustic-phonetic parameters as a cue for prominence (traditionally “syllable-timed”).

In this sense, there is some evidence that NZE is less stress-timed than other varieties of English are. Anecdotal observations that NZE speakers produce more peripheral vowels in unstressed syllables than speakers of British English (Hay, Maclagan, & Gordon, 2008) have been supported by quantitative studies. Warren (1998) examined 3508 syllables from the speech of newscasters and found that there was less variation in syllable duration in NZE than in British English (BE). While the percentage of full vowels was similar for the Pākeha[Footnote 1] English (PE) speakers and the BE speakers, syllabic PVI measures of Pākeha speech were intermediate between BE and Māori English speakers. Additionally, Warren found an inverse relationship between speech rate and full vowel production in NZE, which contrasted with the slower rate and high level of reduction in the BE sample. The hypothesis that contact with (mora-timed) Māori has affected NZE receives support from the finding that speakers of Māori English produce more fully articulated vowels than speakers of Pākeha English do (Holmes & Ainsworth, 1996, 1997; Szakay, 2006; Warren, 1998). It was posited that this results from featural transference from Māori, where vowel length is phonemic (Bauer, 1997). Szakay (2006) elicited speech from NZ Māori and Pākeha speakers, and analyzed a total of 3281 vocalic segments. Her data showed that both younger Māori and younger Pākeha speakers produce more syllable-timed speech than their older counterparts do, suggesting a progressive change in the timing of NZE. While not statistically significant, Szakay's data also showed a tendency for female Māori to be less syllable-timed than male Māori are, and for female Pākeha to be more syllable-timed than male Pākeha are. If convergence under the influence of contact is occurring, females in both ethnic groups are likely to be leading the change (Labov, 2001:501). To explore the possibility of a historical change in rhythmic structure, we extended the work of Szakay (2006) and Warren (1998) by carrying out a large-scale survey of the variation in segmental duration in the speech of New Zealanders born over the past century and a half.

MEASURING DURATIONAL VARIATION

The PVI was developed specifically to characterize variation between successive speech units and thus is well suited to exploring vowel length variation due to the alternation of stressed and unstressed syllables. The PVI has several advantages: It is a general measure that can be applied to duration, intensity, or pitch over segments, syllables, or feet, and so can be extended to incorporate other acoustic correlates of rhythm. It can be normalized to reduce the effect of local fluctuations in speech rate. It captures the distribution of variation more effectively than a simple average, which can result in homogenization. We elected to measure PVI at the segmental level, calculating normalized PVIs for vowels and raw PVIs for consonants, using the formulae in Figure 1 (adapted from Grabe & Low, 2002; consonantal PVI is multiplied by 1000 instead of 100 to give a measure in milliseconds).

Figure 1. Where n is the number of (vocalic or consonantal) segments, d_{i+1} is the duration of the (i + 1)th segment, and d_i is the duration of the ith segment. Multiplication of normalized PVI by 100 is arbitrary for readability: the quantity is an index with no units. Raw PVI is measured in milliseconds, as it is calculated from durations in seconds and multiplied by 1000.
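The formulae of Figure 1 are not reproduced in this text. The following is a reconstruction, assuming the standard definitions of Grabe and Low (2002), written with the caption's symbols and the scaling factors stated in the text (100 for the normalized index, 1000 for the raw index, with durations in seconds):

```latex
\[
\mathrm{nPVI} = 100 \times \frac{1}{n-1} \sum_{i=1}^{n-1}
  \left| \frac{d_i - d_{i+1}}{(d_i + d_{i+1})/2} \right|
\qquad
\mathrm{rPVI} = 1000 \times \frac{1}{n-1} \sum_{i=1}^{n-1}
  \left| d_i - d_{i+1} \right|
\]
```

For example, two successive vowels of 100 ms and 300 ms differ by 200 ms and have a mean of 200 ms, so they contribute 200/200 = 1 to the normalized sum; if they were the only pair, the nPVI would be 100.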

Normalized PVI is used to neutralize the effect of speech rate on vowel duration. For consonants, manner of articulation (e.g., the presence of voicing) and syllable onset or coda composition seemed as likely as speech rate to affect duration. As these factors could not be controlled for without introducing considerable complexity, it was deemed preferable to make our initial calculations with raw numbers. A pilot run across all speakers, prior to quality assurance procedures, showed minimal change in consonantal PVI over time. Consequently, the remainder of the study focuses on vocalic PVI.
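The effect of normalization can be illustrated with a short Python sketch. It assumes the standard Grabe and Low (2002) definitions; the function names `npvi` and `rpvi` and the durations are invented for the example, and this is not the Praat script used in the study.

```python
def npvi(durations):
    """Normalized PVI: mean of |a - b| / mean(a, b) over successive pairs, times 100."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def rpvi(durations_s):
    """Raw PVI: mean absolute difference of successive durations.

    Durations are given in seconds; the factor 1000 yields milliseconds.
    """
    pairs = list(zip(durations_s, durations_s[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return 1000 * sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical vowel durations (seconds), and the same sequence spoken
# at half the rate (every segment twice as long).
vowels = [0.12, 0.06, 0.15, 0.05]
slow = [2 * d for d in vowels]

same_index = abs(npvi(vowels) - npvi(slow)) < 1e-9       # nPVI is rate-invariant
raw_doubles = abs(rpvi(slow) - 2 * rpvi(vowels)) < 1e-9  # rPVI scales with rate
```

A uniform change in tempo leaves the normalized index unchanged but scales the raw index, which is why the raw measure is more exposed to global speech-rate differences.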

Recently, authors have argued that either the syllable or foot, or both, are preferred units for the characterization of rhythm across languages (Kohler, 2009b; Nolan & Asu, 2009). Definitions of foot and syllable remain challenging cross-linguistically, however, and reliance on manual segmentation necessarily restricts the range of speakers and speech units that can be analyzed. The examination of durational variation based on segmental measurements has the advantages of relative simplicity and reproducibility, with the added bonus that the robust automatic segmentation procedures now available allow for large-scale investigations of speech corpora. There is, then, a case for piloting a study of rhythm in a stress-timed language by investigating segmental timing.

The generality of the PVI measure also allows for investigation of other reflexes of prominence; indeed, it was applied to intensity variation in its debut (Low, 1998). If prominence is created by language-specific bundles of prosodic variables, then a change in the rhythm profile of a language is likely to result not from the simple addition or subtraction of a variable, but from a more fluid shift in the weighting of the variables employed, just as a vowel shift affects surrounding vowels by altering perceptual boundaries. If segment duration is playing a reduced role in cuing stress in NZE, we might expect to observe changes in the contribution of other factors to rhythmic prominence. Intensity and pitch are two key variables that affect the perception of rhythm in languages (Arvaniti, 2009; Kohler, 2009c); their variability in running speech can be indexed using the PVI, in the same way as for duration. Given that these measures vary continuously over larger speech units, such an approach represents an even greater oversimplification than the reduction of timing to a pairwise measure, but we emphasize again that this is intended as a preliminary, quantitative approach. It will not provide a comprehensive characterization of speech rhythm. Rather, it can uncover useful information about synchronic and diachronic trends, which can then be pursued with targeted qualitative analysis.

RESEARCH QUESTIONS

There are four key questions to be explored in this preliminary study of diachronic change in the rhythm of NZE. First, is there evidence of a reduction in variation of vowel durations in NZE over time? We have some evidence that contemporary speakers of NZE produce more full vowels than their BE-speaking counterparts do. Presumably this difference has arisen since the arrival of British settlers in New Zealand, and thus the progress of the change should be observable in the speech of New Zealanders born since then.

Second, do social factors have any bearing on the progress of any change in timing? Szakay's (2006) data suggested that a move away from stress-timing in NZE may be more advanced among females than males. Using our corpora, we can consider the effects of gender on any change; additionally, our contemporary corpus has social class information that may offer further insights into social factors facilitating a rhythmic timing change.

Third, is there evidence that such a reduction in variability is due to an increase in the occurrence of fully articulated vowels? Evidence for this proposal remains anecdotal thus far. It has also been observed that speakers of NZE speak more quickly than speakers of other varieties of English. Warren's (1998) data show a relationship between the occurrence of full vowels and speech rate in NZE. As faster speech is associated with greater gestural efficiency, if New Zealanders' speech rate has increased, we might expect to see a reduction in duration variability, but one resulting from a general trend to more reduced vowels, rather than more full vowels. These two trends are in tension.

Fourth, if duration is becoming less significant in signaling prominence in NZE, is there evidence that either pitch or intensity, or both, are becoming more important? Szakay's (2006) measures of pitch suggested that while mean pitch is an ethnic category indicator, pitch variability is not. In theory, this makes it more available for manipulation by speakers to signal prominence. The role of intensity in the percept of rhythm has not been examined in NZE; other than Low's (1998) study of Singaporean English, there have been few applications of the PVI to investigate intensity variation in other dialects or languages (though see Ballard, Robin, McCabe, & McDonald, 2010, for clinical application in dysprosody treatment evaluation).

METHODOLOGY

Raw speech data was drawn from three corpora in the Origins of New Zealand English project (‘ONZE’: Gordon, Maclagan, & Hay, 2007): the Mobile Unit, with New Zealanders born between 1851 and 1904 (MU); the Intermediate Archive, with speakers born between 1882 and 1963 (IA); and the Canterbury Corpus (CC), with speakers born between 1926 and 1987. The majority of speakers are European-descended and born in the South Island of New Zealand. The MU recordings were carried out in the mid-1940s, with speakers aged 43 to 96 years (mean age = 71.0); IA speakers were recorded in the early to mid-1990s, aged 31 to 99 years (mean age = 76.0); and CC recordings have been ongoing since the mid-1990s, and cover speakers aged 18 to 68 years (mean age = 37.5). Consequently, there is a very strong negative correlation between a speaker's date of birth (DOB) and age at the time of interview (Spearman's rank correlation rho = −.9488, p < .000), such that those later-born tend to be younger than those earlier-born. We acknowledge that this must be taken into consideration when discussing diachronic change based on DOB, as there is evidence that speech rate may decrease for older speakers, particularly with declining health (Ramig, 1983). It is also known that fundamental frequency declines during adulthood, a trend that reverses in later years amongst males so that their pitch increases in old age (Reubold, Harrington, & Klebera, 2010). As separation of DOB and speaker age is problematic, we present our models using DOB to index change over time, with caveats as necessary in the discussion. Given that age-based changes in speech rate may differ across dialects (Jacewicz, Fox, O'Neill, & Salmons, 2009), follow-up work on factors affecting speech rate in NZE is a priority for this line of research.

Speech from recorded interviews was automatically segmented, using the Hidden Markov Model Toolkit, or HTK (Young, Evermann, Hain, Kershaw, Moore, Odell, Ollason, Povey, Valtchev, & Woodland, 2002) to carry out phoneme alignment in ONZE Miner (Fromont & Hay, 2008). In speech recognition applications, HTK uses probabilistic mathematical models to predict the underlying phonemes represented by observed acoustic data. This process involves several steps. A sound file is first manually transcribed and time-aligned into lines of speech. Transcripts are uploaded to ONZE Miner, which produces phonemic labels for the main speaker's utterances, based on ONZE Miner's orthographic-phonemic dictionary entries (built from CELEX [Baayen, Piepenbrock, & Gulikers, 1996] with supplementary entries provided by hand). Routines provided by HTK are then used to time-align phoneme labels with the sound files. ONZE Miner's dictionary contains multiple phonemic entries for many words that vary in their pronunciation or suffer reduction in running speech. For example, the word ‘and’ has the entries /ænd/, /ənd/, /ən/, /nˌ/, /mˌ/ and /ŋˌ/. During the alignment process, the best phonemic representation is inferred from the acoustic signal. The dictionary is a work-in-progress and by no means exhaustive, but the ability to select the most likely pronunciation where phonemic alternates are available improves the fidelity of the transcription considerably and caters to some extent for reduction in running speech. The alignment process is iterative: All speech files for a particular speaker are analyzed in a training phase that establishes speaker-specific acoustic parameters, then forced alignments are carried out in a second pass through the files.

Initially, PVI measures were calculated on “successive pairs of vowels” (Low, Grabe, & Nolan, 2000:382). This was later defined as vocalic and intervocalic intervals (Grabe & Low, 2002), used in order to avoid subjective division of the acoustic signal into phonemic segments. As the HTK segmentation is based on speaker-specific, statistical mapping of acoustic features to phonemic transcriptions, we investigated the use of segmental, rather than intervallic, PVI measures. If segmentation is reliable and objective, this approach is more aligned with the idea that prominence in English is primarily driven by differences in the realization of syllable nuclei. Calculating PVI using vocalic intervals that may extend over multiple syllables and words is undesirable. For example, in a phrase like “the red fire engine” (/ðə ˌrɛd ˈfaɪ.ə ˌɛn.ʤən/),[Footnote 2] an intervallic approach would compare the lengths of the successive vocalic intervals /ɛ/ and /aɪəɛ/, then /aɪəɛ/ with /ə/ and so on. The segmental approach compares /aɪ/ with /ə/, then /ə/ with /ɛ/, which is intuitively preferable. Comparisons of segmental and intervallic normalized PVI (nPVI) are given in the appended Table A1; while PVI is inflated by the inclusion of polysyllabic feet, our statistical models were found to be robust over both measures.[Footnote 3] Hence, while all statistics reported in this paper are based on segmental PVI, they do not rely solely on this method of calculation. There is a stronger argument for using intervallic PVI for intrasyllable consonant clusters, but we leave an empirical evaluation of the best methodology for intervocalic segments to a later study.
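The difference between the two approaches can be sketched in Python for the “red fire engine” example. The phone list, its durations, and the function names are hypothetical; only the grouping logic is of interest.

```python
# Hypothetical aligned phones for "the red fire engine":
# (label, is_vowel, duration in seconds). Durations are invented.
PHONES = [
    ("ð", False, 0.04), ("ə", True, 0.05),
    ("r", False, 0.06), ("ɛ", True, 0.11), ("d", False, 0.05),
    ("f", False, 0.09), ("aɪ", True, 0.16), ("ə", True, 0.05),
    ("ɛ", True, 0.10), ("n", False, 0.06),
    ("ʤ", False, 0.08), ("ə", True, 0.05), ("n", False, 0.07),
]


def vowel_segments(phones):
    """Segmental approach: every vowel phone is its own unit."""
    return [(label, dur) for label, is_vowel, dur in phones if is_vowel]


def vocalic_intervals(phones):
    """Intervallic approach: adjacent vowel phones are merged into one
    interval, even across syllable and word boundaries."""
    intervals = []
    prev_was_vowel = False
    for label, is_vowel, dur in phones:
        if is_vowel and prev_was_vowel:
            last_label, last_dur = intervals[-1]
            intervals[-1] = (last_label + label, last_dur + dur)
        elif is_vowel:
            intervals.append((label, dur))
        prev_was_vowel = is_vowel
    return intervals


segment_labels = [label for label, _ in vowel_segments(PHONES)]
interval_labels = [label for label, _ in vocalic_intervals(PHONES)]
```

With this input the segmental units are /ə ɛ aɪ ə ɛ ə/, whereas the intervallic units are /ə/, /ɛ/, /aɪəɛ/, /ə/: the long merged interval is what produces the large pairwise differences discussed in the text.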

Using the automatically aligned segment labels, initial calculations of raw consonantal PVI (rPVI(C)) and normalized vocalic PVI (nPVI(V)) were carried out in Praat (Boersma & Weenink, 2010). All vowel segments, including those in phrase-final syllables, were included. This produced nPVI(V) scores between 52.5 and 83.6, with a median value of 66.1. Six speakers were then selected for hand-correction of the phoneme segments, as shown in Table 1. These speakers were chosen to represent a compact sample of ages, sexes, recording formats, and initial nPVI(V) values.

Table 1. Speakers selected for hand-correction of automatic phoneme segmentations. A male and a female speaker were chosen from each corpus used in the analysis, to represent a range of nPVI(V) scores, birthdates, and number of phonemes (consonants plus vowels) automatically segmented. %Adjust gives the percentage of total intervals that were subject to boundary adjustments during the hand-correction process

Overall, the automatic segmentations were good, but shorter speech samples and overlapping speakers or noise, such as laughter, created problems for the HTK forced alignments. The speakers TC and myp01-1b, with the lowest segment counts, required the most segment boundary adjustments, followed by fyn94-20b, whose recordings frequently included overlapping speech. Other factors giving rise to incorrect segmentation were errors in transcription (such as spelling mistakes or poorly aligned line breaks), incorrectly selected dictionary entries (where more than one phonemic representation might be valid, e.g., /hwɪtʃ/ or /wɪtʃ/), unstressed vowels that were deleted in the signal but retained in the phonemic transcription, and transitions between vowels and nasals. A Praat (Boersma & Weenink, 2010) script was used to calculate the number of segments amended for each speaker. In most cases, there is a clear relationship between the number of observations, i.e., total segments, for a speaker, and the extent of the correction required. Speaker fyn94-20b is an exception. Her recordings include significant amounts of overlapping speech and laughter, which caused localized disruption of segmentation accuracy despite the large number of segments.

Following hand-correction, the nPVI values for the selected speakers were recalculated, along with standard deviations for each speaker's vowel length. Based on the assumption that outlying (i.e., excessively high) values were due to segmentation errors, a ceiling value for vowel duration was calculated for each speaker based on the standard deviation. Due to the skewed nature of the distribution of vowel durations, the ceiling value was calculated as the speaker's mean vowel length plus four times the standard deviation.[Footnote 4] Revised calculations of vocalic PVI excluded any vowel segment that exceeded the speaker's ceiling value. Table 2 compares the original vocalic PVIs, the PVIs after hand-correction, and the PVIs after outliers were automatically discarded, for the six hand-corrected speakers.
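The ceiling rule can be sketched as follows. This is a minimal Python illustration rather than the authors' Praat script; the function names, the choice of sample (rather than population) standard deviation, and the durations are assumptions.

```python
from statistics import mean, stdev


def duration_ceiling(vowel_durations, n_sd=4):
    """Speaker-specific ceiling: mean vowel duration plus n_sd standard deviations.

    Assumption: sample standard deviation; the text does not say which was used.
    """
    return mean(vowel_durations) + n_sd * stdev(vowel_durations)


def drop_outliers(vowel_durations, n_sd=4):
    """Keep only vowel segments that do not exceed the speaker's ceiling."""
    ceiling = duration_ceiling(vowel_durations, n_sd)
    return [d for d in vowel_durations if d <= ceiling]


# Invented durations (seconds) for one speaker: 60 plausible vowels plus one
# 2-second "vowel" of the kind a misaligned boundary might produce.
speaker_vowels = [0.08] * 30 + [0.12] * 30 + [2.0]
cleaned = drop_outliers(speaker_vowels)
```

How pairs that straddle an excluded segment were treated is not stated in the text, so the sketch stops at filtering and does not recompute the PVI.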

Table 2. Vocalic nPVI figures. Original is based on automatically segmented speech; Hand-corrected is based on segments with manually adjusted boundaries; and Autocorrected is based on automatically segmented speech with outlying (overly large) values removed. The last column compares the autocorrected, autosegmented nPVI to the hand-corrected nPVI

The use of a ceiling value for vowel duration for each speaker brings the nPVI within 1.7% of the hand-corrected values, a reasonable level of accuracy. All speakers bar one have a slightly underestimated nPVI value using the automatic segmentation with outliers removed. The large number of corrections for TC is at odds with the apparent accuracy of that speaker's automatic segmentation and suggests that, for this speaker at least, the HTK-aligned boundaries were simply consistently offset from the manually judged ones.

Further quality assurance was carried out by removing speakers with the fewest vowel and consonant segments from the remaining speaker cohort (n = 25 or ~5%, C + V < 2000), as smaller speech samples give poorer segmentation accuracy. The final dataset comprised 506 New Zealander speakers, who between them produced 1.6 million vowel segments. We then calculated the nPVI(V) across the cohort, using the calculated ceiling value for each speaker to exclude outliers. Variability indices for intensity and pitch were also calculated, giving a set of three PVIs for each speaker.

Intensity PVI was calculated using the difference in mean intensity for successive vowel segments, while pitch PVI was based on the difference in maximum pitch across successive vowel segments. Both measures were normalized to temper local variation, and in the case of intensity, to make some compensation for an inability to calibrate for different ambient sound pressure levels (SPL) across recordings. Pitch calculation used the cross-correlation method in Praat, as recommended in the Praat documentation for calculations over short time windows (Boersma & Weenink, 2010), with a range of 75–300 Hz for males, and 100–500 Hz for females. All other settings were standard. As for durational PVI, speaker-specific ceilings for vowel length were applied, and outlying segments were discarded from calculations. Speakers with low numbers of vocalic segments were omitted. Only speakers with fewer than 10% undefined values for pitch were retained for analysis. This resulted in a considerable reduction of the sample size: intensity analyses were carried out for 504 speakers, but only 354 speakers met the criteria for pitch analyses. Statistical analyses and modeling were carried out using the statistical software package R (R Development Core Team, 2010).
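A hedged Python sketch of the pitch screening step follows. The per-vowel maximum pitch values are assumed to have been extracted already (for instance, from Praat), with `None` standing for an undefined value; the function names and the decision to skip any pair containing an undefined value are assumptions, since the text does not describe how such pairs were handled.

```python
def undefined_fraction(values):
    """Proportion of vowel segments whose measure is undefined (None)."""
    return sum(v is None for v in values) / len(values)


def npvi_defined(values):
    """Normalized PVI over successive pairs in which both values are defined.

    Assumption: pairs containing an undefined value are simply skipped.
    """
    pairs = [(a, b) for a, b in zip(values, values[1:])
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no fully defined pairs")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def pitch_npvi_or_none(max_pitch_hz, max_undefined=0.10):
    """Return the pitch nPVI, or None if the speaker fails the screening rule
    (retained only with fewer than 10% undefined pitch values)."""
    if undefined_fraction(max_pitch_hz) >= max_undefined:
        return None
    return npvi_defined(max_pitch_hz)


# Invented per-vowel maximum pitch values in Hz.
retained = pitch_npvi_or_none([200.0, 100.0] * 10 + [None])
excluded = pitch_npvi_or_none([200.0, None, 180.0, None])
```

The same `npvi_defined` function could be applied to per-vowel mean intensity values, since the index is agnostic about the measure it summarizes.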

RESULTS 1: DURATIONAL VARIATION

We first consider pairwise durational variation. Our sample of vocalic PVI has a bell-shaped distribution (see Appendix) but is not strictly Gaussian. The data shows a gradual and significant reduction in vocalic PVI over the history of NZE (Figure 2) – or, in traditional terms, a move toward increased syllable timing (Spearman's rho = −.3086, p < .0000).

Figure 2. Plot of vocalic nPVI versus speaker's year of birth. Speakers from the ONZE corpora (Gordon et al., 2007). The line is a LOWESS scatterplot smoother using locally weighted regression.
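For readers unfamiliar with the smoother named in the caption, the following simplified Python sketch fits a weighted straight line around each point, using tricube weights over the nearest neighbors. It omits the robustness iterations of the full LOWESS procedure, and the `frac` parameter and example data are assumptions for illustration; it is not the routine used to draw the figures.

```python
def lowess(xs, ys, frac=0.5):
    """Simplified locally weighted linear regression (no robustness passes)."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbors considered per local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical birth years and nPVI(V) values.
years = [1860, 1880, 1900, 1920, 1940, 1960, 1980]
values = [72.0, 70.5, 69.0, 66.0, 65.5, 62.0, 61.0]
smooth = lowess(years, values, frac=0.6)
```

Larger values of `frac` give a smoother curve, since each local fit then draws on more of the data.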

The overall mean nPVI(V) was 65.3, with a range of 51.5 to 82.5. By way of comparison, Grabe and Low (2002:Appendix) calculated an nPVI(V) of 57.2 for a speaker of BE in their original study.

The Effects of Speech Rate on Durational Variation

Turning to observations of speech rate (Figure 3), we find a positive, almost linear correlation between DOB and syllable rate across the corpora, which levels off for speakers born in the 1970s and 1980s (Spearman's rho = .2864, p < .0001). A good deal of the reduction in vocalic nPVI is linked to this increase in speech rate, despite normalization, which is intended to address this issue (Spearman's rho = −.5955, p < .0001). The qualitative observations that more unstressed vowels are fully articulated in NZE (Hay et al., 2008; Warren, 1998) are unexpected given this higher syllable rate. Ainsworth's (1993) auditory analysis of newsreaders actually found a similar rate of full vowels (about 20%) in speech from commercial NZ radio stations and speech from the BBC World News, even though the BBC speech rate was around one syllable per second slower than the NZ commercial radio stations' speech rates (Warren, 1998; using the same data). The relationship between vowel realization and speech rate is not straightforward, as Warren observed, but there is growing evidence that faster speech reduces pairwise durational variation, particularly for languages traditionally labelled stress-timed (Krull & Engstrand, 2003; Szakay, 2008).

Figure 3. The left panel is a plot of speech rate versus speaker's year of birth. The right panel is a plot of vocalic nPVI against speech rate. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Evidence that an increased syllable rate (which may at least in part be driven by the decreasing average age of speakers across our corpora) is not solely responsible for the observed reduction in nPVI(V) comes from the different patterns in rate change by sex (Figure 4). The absolute starting value and the increase in rate over time are similar for both male and female cohorts. A Wilcoxon rank sum test revealed no significant difference in the speech rate of the sexes (p = .1095). Speech rates level out for males born from the 1950s on, but in Figure 5, we see that male speakers born in this period have progressively reducing nPVI(V) values. In contrast, the increase in speech rate for females is continuous over the period, while there is very little movement in nPVI(V) until the 1930s.
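The rank sum comparison of the two sexes can be sketched as follows. This Python version uses the normal approximation with a correction for ties and no continuity correction, so its p-values may differ slightly from those of a statistical package that uses exact calculations or a continuity correction. The function name and the sample rates are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test via the normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(a + b)
    rank_of, seen = {}, 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = seen + (t + 1) / 2  # mean rank of a group of ties
        seen += t
    w = sum(rank_of[v] for v in a)   # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2        # equivalent Mann-Whitney U statistic
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)   # undefined if every value is tied
    p = erfc(abs(z) / sqrt(2))       # two-sided p-value
    return u, z, p


# Hypothetical syllable rates (syllables per second) for two groups.
female_rates = [4.9, 5.1, 5.3, 5.0, 5.4]
male_rates = [5.0, 5.2, 4.8, 5.5, 5.1]
u, z, p = rank_sum_test(female_rates, male_rates)
```

A large p-value, as reported in the text, indicates that the ranks of the two groups are well interleaved.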

Figure 4. Plot of speech rate versus speaker's year of birth, split by gender. Female speakers are shown in the left panel, and male speakers in the right panel. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 5. Plot of vocalic nPVI versus speaker's year of birth, split by gender. Female speakers are shown in the left panel, and male speakers in the right panel. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Because speech rate is an important predictor of nPVI(V), a linear regression model was constructed in R to test the factors affecting it. Data from MU and IA was combined to improve the sample size, as the elicitation parameters were similar (MU + IA: n = 131; CC: n = 375). The binary variable bCorpus was included as a predictor for the full cohort, along with DOB and Gender. Only DOB reached significance (p = .0085). In a model for CC only, which included Class and a quadratic term for DOB, only DOB and DOB² were significant predictors. Neither model, however, had any useful explanatory value (R² = .0805 and .0368, respectively). If DOB was replaced by its highly correlated variable Age Interviewed, model explanatory value was reduced (R² = .0722 and .0317, respectively).
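To make the notion of explanatory value concrete, the sketch below fits an ordinary least squares model with a linear and a quadratic birth-year term and reports R² and adjusted R². It is a plain-Python stand-in for a regression routine, written under stated assumptions: the data is invented, the helper names are hypothetical, and birth year is centered and rescaled here purely to keep the quadratic term numerically well behaved.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ...; return (coefficients, R^2, adjusted R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p)
    return beta, r2, adj


# Hypothetical speakers: birth year and syllables per second.
dob = [1900, 1920, 1940, 1960, 1980]
rate = [4.6, 4.9, 5.0, 5.3, 5.2]
centered = [(d - 1940) / 10 for d in dob]  # assumed rescaling, see lead-in
rows = [[c, c ** 2] for c in centered]     # linear and quadratic DOB terms
beta, r2, adj_r2 = ols(rows, rate)
```

An R² near zero, as in the models reported above, means the predictors account for very little of the variation in speech rate even when individual coefficients reach significance.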

Social Factors and Durational Variation

When the data is split by sex (Figure 5), we find females have a lower starting nPVI(V) than males do. There is a smaller downward trend in the nPVI(V) of the females; males have undergone the greatest reduction over the period evaluated. If we assume a typical S-curve for a change in progress (Chambers, 2004), the fact that males are in a steeper part of the curve is consistent with the hypothesis that the change in rhythmic timing is more advanced amongst women. This only holds if we assume a similar starting value of nPVI(V) at some point prior to the time period covered by our dataset. An anonymous reviewer points out that the data also supports a hypothesis of male-led change, if women had a lower initial nPVI(V) and did not begin to shift until the 1920s. When the 20th century-born CC speakers are isolated (n = 375), the negative correlation between date of birth and nPVI(V) is no longer significant for females (n = 188), while the negative correlation for males holds (Spearman's rho = −.1825, p = .0124, n = 187).

CC speakers are tagged as professional or non-professional (NP) speakers, giving a gross binary categorization of social class. There is a decline in the vocalic nPVI of non-professional speakers across the 20th century. NPs start from a higher baseline than professionals, suggesting that the change is more advanced amongst the latter, who also have shorter mean vowel durations (Wilcoxon rank sum p = .0416, n = 184). The decrease amongst NPs is significant (Spearman's rho = −.2115, p = .0033, n = 191). Combining the factors of sex and class, NP males have had the greatest and most consistent decrease in nPVI(V), from the highest starting point. Professional males and NP females born in the 1950s and 1960s pattern together, showing a slight increase in durational variation versus earlier and later speakers. There has been no change in the speech of professional females.

To more clearly establish the effect of different social factors, a Classification and Regression Tree (CART) analysis, which recursively partitions the data in order to produce a best-fit predictive model for the outcome variable, was carried out on the CC data. The resulting regression tree is reproduced in Figure 6 and shows that for the fastest speakers (SyllablesSec > 5.654) speech rate is the overriding determinant of nPVI(V); in other words, in fast speech durational variation varies inversely with syllable rate.
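The core step of recursive partitioning can be illustrated with a single split. The sketch below searches one numeric predictor for the threshold that minimizes the summed squared error of the two resulting groups, which is the usual criterion for regression trees. A full tree repeats this search within each group and across all predictors. The function names and data are hypothetical, and the source does not state which implementation was used.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total SSE) for the best binary split x < threshold."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between neighboring distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical speakers: syllables per second and nPVI(V).
syllables_sec = [4.8, 5.0, 5.7, 5.9]
npvi_v = [70.0, 68.0, 60.0, 58.0]
threshold, cost = best_split(syllables_sec, npvi_v)
```

In this toy data the chosen threshold separates slower, more variable speakers from faster, less variable ones, which mirrors the first partition described above.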

Figure 6. Recursive regression tree showing predictors for durational variability, measured as nPVI(V). Speakers from the Canterbury Corpus, ONZE Project (Gordon et al., 2007).

Once again, we note this effect obtains despite normalization of the metric. Class partitions mid-rate speakers (5.125 ≤ SyllablesSec < 5.654), with professionals having a lower nPVI(V) than NPs. Amongst the slowest speakers (SyllablesSec < 5.125), the latest-born have moderate durational variability, while Gender divides the rest into less-variable females and more-variable males. This suggests that there are two sets of principles at work: physiological/articulatory factors, which reduce acoustic distinctions between stressed and unstressed vowels as speech rate increases and segment length decreases; and social factors, which are overridden in faster speech and only come into play when the need for strict efficiency in articulation is relaxed.

Register and Durational Variation

Arvaniti (2009) discussed the effect that manner of elicitation can have on the value of timing metrics. Level of formality and degree of spontaneity can affect both speech rate and care in articulation (though not always in the same way across speakers). The three different corpora included in the study were recorded for different purposes. Both the MU and the IA data were elicited by professional interviewers: the former for radio broadcast, the latter primarily for oral history projects. The interviewer was not personally known to the speakers, and some speakers had prepared for their interview by writing down or giving considerable thought to what they wanted to say. In contrast, CC speakers were interviewed by students, usually a friend or family member, in informal circumstances. It is valid to ask whether a difference in register exists between the corpora. If so, this might be related to the change in speech rate and by implication nPVI(V). In the model for speech rate discussed earlier, however, no effect of corpus was obtained. There is no evidence that the increase in speech rate across the ONZE speakers can be attributed to the different circumstances of data collection in the corpora, though it is very likely that the difference between MU/IA and CC in average age of the interviewed speakers has at least some effect.

Modeling Durational Variation

All of the factors observed to affect variation in vocalic durations were combined in a linear regression model to predict nPVI(V). The first model is for the entire cohort of analyzed speakers and does not include social class. The coefficients and an analysis of variance (ANOVA) for the model are given in Tables 3 and 4.

Table 3. Coefficients of linear regression model to predict the value of nPVI(V) for speakers in the ONZE corpora (Gordon et al., 2007). Adjusted R² = .3347

Table 4. Analysis of variance for linear regression model to predict the value of nPVI(V) for speakers in the ONZE corpora (Gordon et al., 2007)

As expected, an increase in speech rate (SyllablesSec) results in a significant reduction in variability of vowel durations, as all vowels are shortened. Higher values for birth year reduce variability, and this effect is increased for males. Males start from a higher baseline for nPVI(V) than females do, which is reflected in the larger coefficient for Gender = M.

The second model is based on the CC data only and includes social class. Coefficients and the results of an ANOVA for the model are given in Tables 5 and 6. Similar effects obtain, with the addition that a higher social class further reduces nPVI(V). An interaction between DOB and Gender is also present, such that later-born males have lower variability.

Table 5. Coefficients of linear regression model to predict the value of nPVI(V) for speakers in the Canterbury Corpus, ONZE Project (Gordon et al., 2007). Adjusted R² = .2962

Table 6. Analysis of variance for linear regression model to predict the value of nPVI(V) for speakers in the Canterbury Corpus, ONZE Project (Gordon et al., 2007)

Articulation of unstressed vowels

Is the data congruent with the proposal that New Zealanders produce more fully articulated vowels in unstressed speech? As our data does not partition stressed and unstressed vowels, we can only approach this question obliquely. There is a reduction in mean vowel length over time in NZE, partly as a result of increased speech rate. However, over and above the effect of speech rate, DOB is a highly significant and sizeable predictor of shorter mean vowel length. A comparison of effect size can be obtained by multiplying the mean speech rate of 5.1 syllables/sec, and the mean birth year of 1945, with their respective coefficients in Table 7 for a CC speaker. DOB clearly has the most substantial effect.

\[
\begin{aligned}
\text{Mean vowel length} &= 0.3068 + (5.1 \times -0.0099) + (1945 \times -0.0001) + 0 \\
&= 0.0618 \text{ or } 61.8 \text{ milliseconds}
\end{aligned}
\]
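The comparison of the two contributions can be checked term by term. The short Python sketch below reproduces the arithmetic using only the figures quoted in the text and the equation. Reading the final zero as the corpus term for a CC speaker is an interpretation, and the variable names are illustrative.

```python
intercept = 0.3068
rate_term = 5.1 * -0.0099    # mean speech rate times its coefficient
dob_term = 1945 * -0.0001    # mean birth year times its coefficient
corpus_term = 0.0            # assumed: the zero term for a CC speaker

mean_vowel_length = intercept + rate_term + dob_term + corpus_term

print(round(rate_term, 4))           # contribution of speech rate
print(round(dob_term, 4))            # contribution of birth year
print(round(mean_vowel_length, 4))   # predicted mean vowel length
```

The birth-year term (about −0.19) is nearly four times the size of the speech-rate term (about −0.05), which is the basis for the statement that DOB has the more substantial effect in this comparison.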

Table 7. Coefficients of linear regression model to predict mean vowel duration in milliseconds. Adjusted R² = .5328. Speakers from the ONZE Project (Gordon et al., 2007)

Coefficient significance is given in Table 8. This reduction in duration is not suggestive of a general move to a higher proportion of fully articulated vowels in NZE. A future comparison of durative difference and vowel quality between stressed and unstressed syllable nuclei may provide further insight into this question.

Table 8. Analysis of variance for linear regression model to predict mean vowel duration in milliseconds, data from the ONZE Project (Gordon et al., 2007)

DOB and bCorpus are not independent (see Methodology), so variance inflation factors were calculated for each parameter in the model (Table 9). While there is clearly collinearity, the magnitude of inflation was not deemed to invalidate the model given the sample size, R², and the model's descriptive intent (O'Brien, 2007). Rather, exclusion of either bCorpus or DOB results in a model that greatly overstates the effect of the remaining predictor.
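A variance inflation factor for predictor j is 1 / (1 − R²), where R² comes from regressing predictor j on the remaining predictors. With only two predictors this reduces to 1 / (1 − r²), with r their Pearson correlation. The Python sketch below covers only that two-predictor case, with invented data and hypothetical names, to show how a strong dependence between birth year and corpus membership inflates the factor.

```python
def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1 / (1 - r ** 2)  # undefined if the predictors are perfectly correlated


# Hypothetical speakers: birth year and a binary corpus indicator
# (0 = earlier corpora, 1 = later corpus), which track each other closely.
dob = [1860, 1880, 1900, 1930, 1950, 1970]
b_corpus = [0, 0, 0, 1, 1, 1]
vif = vif_two_predictors(dob, b_corpus)
```

A factor of 1 indicates no inflation; the further the value rises above 1, the more the standard errors of the two coefficients are inflated by their shared variance.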

Table 9. Variance inflation measures for factors in model to predict mean vowel duration

RESULTS 2: PITCH AND INTENSITY VARIATION

To address our final research question, regarding a shift in rhythm quality, we now turn to intensity and pitch PVIs, which are plotted against DOB in Figure 7.

Figure 7. Plot of vocalic nPVI versus speaker's year of birth. Variation in mean vocalic intensity is shown on the left (n = 501), and variation in maximum vocalic pitch is shown on the right (n = 354). Speakers from the ONZE Project (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

The increase in intensity nPVI with DOB is significant overall (Spearman's rho = .1295, p = .0037, n = 501) and for males alone (Spearman's rho = .1588, p = .0116, n = 252), but not for females. The effect is centered in the earlier speaker cohort (MU + IA: Spearman's rho = .1588, p = .0116, n = 131), as no significant relationship between intensity nPVI and DOB is found in CC. There is no direct correlation between intensity nPVI and speech rate.

Historically, there is a near-significant pitch nPVI decline from earlier- to later-born male speakers (Spearman's rho = −.1464, p = .0584, n = 168). In contrast, if CC speakers are considered separately (Figure 8), there is a significant increase in pitch variability from earlier- to later-born female speakers (Spearman's rho = .1873, p = .0322, n = 131).

Figure 8. Plot of vocalic nPVI for maximum pitch versus speaker's year of birth, split by gender. Speakers from the Canterbury Corpus, ONZE Project (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Males appear to be following the females after a two-decade lag (Figure 8); drilling down in the data reveals that the reversal is occurring amongst the youngest NP males, but no further significant correlations arise due to the recent change in direction of the historic downward trend. There is no correlation between speech rate and pitch nPVI. Pitch nPVI is weakly but significantly affected by both mean intensity (Spearman's rho = .1508, p < .0044) and intensity nPVI (Spearman's rho = .1252, p < .0184), extending findings that as speakers increase their vocal intensity, their fundamental frequency also rises (e.g., Buekers & Kingma, 1997; Gramming et al., 1988; Jessen, Köster, & Gfroerer, 2005).

The data suggests that the decline in durational variability in NZE may have been compensated for in different ways by the sexes, with earlier-born males using intensity and later-born females using pitch to bolster the contrast between stressed and unstressed syllables. We note, however, that there are interactions among duration, pitch, and intensity that are independent of DOB and consider this further in the next section.

RELATIONSHIPS AMONG VOWEL PITCH, INTENSITY, AND DURATION VARIABILITY

Can a shift in the composition of rhythm in NZE be identified? If there is a shift, we must further consider to what extent it could be considered compensatory reconfiguring of the acoustic-phonetic components of stress, and to what extent it simply follows from physiological-articulatory principles. Intensity variation has been shown to increase over the period that durational variation has declined. However, at an individual level, speakers' duration and intensity nPVIs are positively correlated. This paradoxical result is partially resolved by observing that there has been a gradual decoupling of the two measures across the history of NZE. The correlation between duration and intensity variability is strongest in the earliest corpus (MU: Spearman's rho = .4209, p = .0011; IA: Spearman's rho = .2517, p = .0305; CC: Spearman's rho = .1417, p = .0061). In CC, amongst twentieth century speakers, positive correlations are only significant for professionals, and for women. Thus while a decrease in durational nPVI has historically gone hand-in-hand with a decrease in intensity nPVI, for certain demographics this no longer holds. Tentatively, we speculate that this may result from a compensatory increase in the use of intensity variation for rhythmic purposes amongst male speakers.

Durational nPVI and pitch nPVI analyses are nonsignificant but suggestive: In CC, amongst NPs, the two trend together for males (who have a near-significant negative correlation between pitch nPVI and DOB) and oppose for females (who have a significant positive correlation between pitch nPVI and DOB). Over time this implies a related decrease in duration and pitch variability amongst men, contrasting with a compensatory increase in pitch variability amongst women as durational PVI has decreased. This difference between the sexes is discussed further below.

To help simplify and visualize the various interactions of duration, intensity, and pitch in NZE, a principal components analysis (PCA) was carried out.

PRINCIPAL COMPONENTS ANALYSIS

The full dataset was used, discarding speakers with more than 10% undefined pitch measurements (retained n = 354). The cohort was then split by sex, and a PCA carried out on eight continuous measures: nPVI.V, rPVI.C, PVI.Int, Mean.Int, PVI.Pitch, Mean.Pitch, DOB, and SyllablesSec. Only the first three components account for a greater than average proportion of variance in the data (F: [29.6%, 19.5%, 13.9%] > 12.5%; M: [32.3%, 20.1%, 13.2%] > 12.5%). The first two components, accounting for about half the variance in the data, are plotted in Figure 9, while the composition and relative importance of the components are shown in Table 10.
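The retention rule used here can be made explicit. With eight standardized measures, each contributes on average 1/8 of the total variance, so a component is kept only if it explains more than 12.5%. The Python sketch below applies that threshold to the proportions quoted above. It does not perform the PCA itself, and the variable names are illustrative.

```python
n_measures = 8
threshold = 100 / n_measures  # average share of variance per measure, in percent

# Proportions of variance (percent) for the first three components, as reported.
reported = {
    "F": [29.6, 19.5, 13.9],
    "M": [32.3, 20.1, 13.2],
}

# Every listed component should exceed the average share.
retained = {sex: [v for v in shares if v > threshold]
            for sex, shares in reported.items()}

# Combined share of the first two components, which are the ones plotted.
first_two = {sex: round(shares[0] + shares[1], 1)
             for sex, shares in reported.items()}
```

Summing the rounded figures gives 49.1% for females and 52.4% for males, within rounding of the 49.0% and 52.3% given in the caption to Figure 9.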

Figure 9. Bi-plot of first two principal components from PCA of duration, intensity and pitch measures. Speakers from the ONZE Project (Gordon et al., 2007). The left panel shows the first two components for females (accounting for 49.0% of variance); the right panel shows the first two components for males (accounting for 52.3% of variance).

Table 10. Results of principal components analysis of rhythm metrics for female speakers (a) and (b) and male speakers (c) and (d). Data for speakers in the ONZE corpora (Gordon et al., 2007)

There are three main poles running through the data. The nPVI.V, rPVI.C ↔ SyllablesSec pole confirms findings that speech rate and durational variability work against one another, with a consistent effect amongst male and female speakers. The PVI.Int ↔ Mean.Int pole shows that variability in intensity moves in the opposite direction to average intensity, again consistently for all speakers. The third pole involves DOB and pitch measures, and differs by sex: for females, mean pitch and DOB move together, while for males they move in opposite directions. Pitch nPVI also opposes DOB for males, while for females it appears to align with durational measures in opposing speech rate. For both genders, the DOB ↔ pitch pole is obliquely in the same direction as both speech rate and intensity nPVI, and obliquely opposed to durational nPVIs.

The finding that women's mean pitch increases if they are born later, while later-born men have lower mean pitch and pitch variation, is almost certainly a result of the age composition in our corpora. The third component, shown in Table 10, highlights the sex difference in metric changes, with PVI.Int, PVI.Pitch, and Mean.Pitch moving in the same direction as DOB for females, but in the opposite direction to DOB for males.

In sum, we suggest that a shift in the balance of timing, intensity, and pitch may be in progress in the rhythm of NZE, but we acknowledge that questions remain about the contribution of the age profile of our corpora to the observed trends.

CONTACT-BASED INFLUENCES ON THE RHYTHM OF NZE

Returning to the research questions posed earlier, we find that there has been a change in the timing of NZE as it has evolved, and this timing change may be responsible for the perception of NZE as more syllable-timed than other varieties of English. While an increasing speech rate is a major driver of the change, it is not the sole cause, as shown by the significance of DOB in our models. Analysis of social factors suggests that the change has been led by women, and that it has reached maturity amongst the most recently born female speakers. Non-professionals, particularly males, have followed. This class effect is suggestive of change from above.

A qualitative investigation of the effects of stress on the NZE vowel space is necessary to flesh out the findings here, but certainly the quantitative evidence does not lend support to the idea that New Zealanders in our corpora are producing more fully articulated vowels in unstressed syllables. This runs contrary to theories that contact with te reo Māori (the Māori language spoken by indigenous New Zealanders) may have affected the rhythmic qualities of NZE. There are also other considerations that weaken such a theory. First, the long-short distinction in the vowel system of Māori has been diminishing over its recorded history (Harlow et al., 2009; King et al., 2009). If vowel length is becoming less critical for linguistic meaning in Māori, we suppose that restrictions on the use of duration as a stress variable might be relaxed. Indeed, NZE appears to be having a greater effect on the rhythm of Māori than the reverse. Vowel reduction, which was previously unattested in Māori, has been increasingly noted in recent years (King et al., 2009). These changes may be contributing to perceived changes in the rhythm of Māori over time (Ibid.).

Further, while revitalization efforts since the 1980s have increased Māori's social (Te Puni Kōkiri, 2010) and political standing, a lack of support during the early and mid-twentieth century saw a substantial shift toward English by ethnic Māori, so that the number of fluent te reo Māori speakers declined dramatically. It was during this period of decline, while the influence of te reo Māori and the visibility of those speaking it were at their lowest, that the change in durational PVI was at its height. According to the New Zealand Government, the first national Māori language survey in 1973 estimated the proportion of ethnic Māori able to hold a conversation in te reo at just 18%. While the proportion rose to 24% in the 2006 general census (Ministry of Social Development, New Zealand Government, 2010), it still represents only 3.5% of all New Zealanders. Furthermore, the speakers in the ONZE corpora were almost all from the South Island of New Zealand. Socially and politically more conservative than the North Island, the South Island also has, historically (McLintock, 1966) and presently (Statistics New Zealand, 2010), a far lower proportion of ethnic Māori than New Zealand's north has.

While direct contact with te reo Māori seems an unlikely source for rhythm changes in NZE, the idea that the timing of NZE is converging on features of Māori English provides a plausible vector for indirect contact effects. Māori English (ME) is spoken by New Zealanders who are integrated into ethnic Māori culture, in certain professions such as the armed forces (Harlow et al., 2009; Hay et al., 2008), and in stereotypically masculine domains such as certain sports (Maclagan, King, & Gillon, 2008). Commonly, speakers of ME do not speak Māori; the dialect substitutes as a linguistic marker of ethnicity (Holmes, 2005; King, 1995, 1999). ME is more syllable-timed than NZE is (Bauer, 1997) and uses fewer reduced vowels (Holmes & Ainsworth, 1996). Szakay (2008) found that ME speakers had a higher overall mean pitch than PE speakers did, but there was no difference in pitch range or standard deviation between the two groups.

ME has only recently been recognized as a bona fide dialect of English (Richards, 1970), and it is now increasingly visible in New Zealand: ME is used by national and regional radio announcers, by university students and lecturers, and on television (King, 1995, 1999). It is not clear, however, that ME has the social cachet to drive linguistic change in NZE. Speakers of ME were rated “warm” in Vaughan and Huygens's (1990) study, but on most personality and status measures, ME speakers have been ranked lower than speakers of other dialects of English in New Zealand (Holmes, Murachver, & Bayard, 2001; Robertson, 1994; Vaughan & Huygens, 1990). Nevertheless it is the primary dialect for many younger ethnic Māori (King, 1995, 1999) among whom it marks group membership, and its masculine and sporting associations may give it a degree of covert prestige amongst non-Māori youth. Meyerhoff (1994) noted a rise in use of the discourse tag “eh,” characteristic of ME speakers, amongst Pākeha youth. Covert prestige could provide a social motivation for the observed changes in pitch and duration variability, particularly for NP males.

The data presented in this paper, however, does not fit well with this proposal. Over time, mean pitch amongst male speakers has been decreasing, widening the gap between speakers of ME and PE. Pitch variation has likewise decreased amongst males, which would suggest divergence from ME. The late reversal in this trend is more plausibly a result of influence from ME, given that its timing coincides with Māori language revival; note however that this has been led by females. It has not happened in tandem with the decrease in durational PVI, which has taken place through the twentieth century and has in fact levelled out amongst the same female speakers. Is there any social basis for females taking the lead in assimilating qualities associated with the rhythm of ME? One possible explanation is the overwhelming majority of females in teaching and administrative roles in early childhood and primary education sectors. These sectors are leaders in Māori language use and education amongst non-Māori New Zealanders. Te Whāriki, the New Zealand government's early childhood curriculum policy statement, mandates the support and active encouragement of biculturalism, including support of te reo Māori (Ministry of Education, 2010). Women employed in, and involved voluntarily with, early childhood education may provide a point of origin for socially driven effects in the rhythmic qualities of NZE.

STRUCTURAL INFLUENCES ON THE RHYTHM OF NZE

Another possibility suggests itself as a driver for the observed change in timing in NZE, namely that changes in the realization of certain vowels in NZE have had a flow-on effect on its rhythm. The NZE short front vowel shift (Hay et al., 2008; Langstrof, 2006; Watson, Maclagan, and Harrington, 2000) has involved the raising of the TRAP (see footnote 5) and DRESS vowels. The KIT vowel has mid-centralized to maintain contrast with DRESS, to the extent that the KIT and schwa vowels are indistinguishable in the speech of many New Zealanders. As vowel reduction in English commonly results in centralized, schwa-like realizations, stressed vowels that are more central may not be strongly differentiated from their unstressed versions. This offers a plausible alternative for the source of reduction in durational variation: rather than more fully articulated vowels in unstressed syllables, there may be an increasing number of shorter, centralized vowels in stressed syllables. Langstrof (2009) found that, during the intermediate period of NZE, KIT reduced in duration and was no longer subject to allophonic length alternations before voiced and unvoiced plosives. This change was seen in the speech of later-born females, identified as the innovators in the front vowel shift (Langstrof, 2006). Furthermore, Maclagan and Hay (2007) found that the DRESS vowel was also shortening as it raised, before both voiced and unvoiced plosives. The FLEECE vowel was shown to be diphthongizing to maintain contrast with DRESS; even so, its length was holding before voiceless plosives and reducing before voiced plosives. Women have been found to lead in DRESS-raising, and Maclagan, Gordon, and Lewis (1999) showed that vowel innovation in NZE was more advanced amongst higher social classes. The social patterning in the vowel shift fits with the observed change in durational PVI, and both changes occurred over the same time window.

Table 11 shows that KIT, DRESS, and FLEECE are three of the five most frequent vowels in NZE. In terms of raw frequency, nearly one-third of the vowels occurring in spoken NZE have been shown to have reduced in duration. Can this be reconciled with earlier findings that NZE speakers produce a higher proportion of fully articulated vowels than speakers of other Englishes? Yes, because measures of reduction can only be calibrated against the articulation of stressed vowels, and if these are centralized or shortened, then further reduction is limited. The natural limit of reduction is of course ellipsis, a phenomenon that our data does not address but which may well occur to a lesser degree in NZE, and thereby contribute to perceptions of more “full” vowels.

Table 11. Vowel frequency in NZE, all occurrences including stressed and unstressed vowels from speech in the ONZE project (Gordon et al., 2007). Segmentation routines (from the HTK, Young et al., 2002) in ONZE Miner (Fromont & Hay, 2008) automatically selected the most likely phonemic representation for each manually transcribed word, from alternative phonemic entries in ONZE Miner's dictionary (incorporating CELEX, Baayen et al., 1996)

If such an explanation were correct, it would have interesting implications for the understanding of rhythmic timing. Languages' phonotactic structure and phonetic qualities may play a greater part in determining timing differences than previously thought. There have already been links made between allowable syllable complexity and rhythm differences between languages (Dauer, 1983; Fenk & Fenk-Oczlon, 2006; Mehler, Christophe, & Ramus, 2000), but this work has generally focused on the effect of complex onsets and codas. The role of vowel quality, as well as the degree of vowel reduction, perhaps deserves further scrutiny cross-linguistically.

SEX DIFFERENCES IN RHYTHMIC CHANGES

Decreasing durational nPVI may have been driven by changes in the vowel system of NZE, with females leading the change and males following – a commonplace pattern for linguistic innovation. Changes in other nPVI measures diverge for the different sexes. Intensity differentials between neighboring vowels have increased over time for males, with nonprofessionals showing the largest change. On the other hand, female speakers seem to be increasing their pitch variation, perhaps to compensate for the loss of durational stress. A further question raised by the data is then: Why would females compensate for reducing durational nPVI by increasing pitch variation, while males compensate by increasing intensity variation? Aside from social effects, we speculate that the different compensatory patterns may have a physiological basis. The lower pitch and more compact vowel space of males are in part due to their having physically larger and heavier vocal tract structures than females. This creates differences in the dynamics of speech production between the sexes (Simpson, 2001). Females appear to be able to make faster pitch changes than males (Xu & Sun, 2002), for whom a reduction in pitch nPVI may be an automatic tendency following from faster speech rates. As always, physiological tendencies may be reinforced or moderated by sociocultural norms and individual proclivities, and this may be behind the recent increases in the pitch variability of later-born males.

CONCLUSIONS

Methodologically, this study has broken new ground by utilizing automated techniques to amass rhythmic timing data on a far larger scale than has previously been possible. This has enabled us to examine predictors of timing variability within a dialect. Developing quality assurance techniques is essential to obtaining principled and reliable data that can support robust statistics on the factors involved in language change.

We have shown that vocalic nPVI measures approximate a Gaussian distribution in NZE. In this dialect at least, variation in timing is strongly associated with speech rate, even when metrics are normalized. Further, as NZE has evolved as a dialect, vocalic nPVI has reduced, with the change led by female speakers. Thus, vocalic nPVI is neither synchronically nor diachronically fixed. Based on social patterning and concurrent timing, the change may result from shorter vowel realizations in NZE, following vowel shift, rather than from contact effects. These observations support a perceived difference in the rhythm of NZE compared to other Englishes, but they need to be supplemented with qualitative analysis of vowel properties.

The nPVI measure has some significant limitations. It is extremely localized, and as such a reduction in nPVI does not take into account larger metrical domains over which intensity and pitch in particular may vary. While comparisons between intervallic and intersegmental nPVI were made for normalized vowel durations, they were not carried out for other measures. There is no reason to assume that pitch and intensity variation would be limited to the segmental level. Future directions for this research include improving principled removal of outliers from automatically segmented speech; analysis of the duration and quality of stressed versus unstressed vowels across the ONZE corpora; and methodological comparisons of segmental versus other partitioning for the measurement of pitch, intensity, and intervocalic nPVIs.

APPENDIX

COMPARISON OF SEGMENTAL AND INTERVALLIC PVI

The normalized vocalic distributions for nPVI appear Gaussian (Figure A1), but are not normally distributed (Shapiro-Wilk: intervallic W = 0.9896, p = 0.0011; segmental W = 0.9917, p = 0.0052). The raw PVI measures used for consonants are positively skewed across the sample.

Figure A1. Histograms of durational PVI distributions (n = 506), contrasting measurement by segment and by interval for both vowels and consonants. Speakers from the Canterbury Corpus, ONZE Project (Gordon et al., 2007).

Table A1 provides statistics on an index of intervallic to segmental PVI. It can be seen that the distributions are similar for both vowels and consonants, but the magnitude of the intervallic measure is considerably larger for consonants than it is for vowels.

Table A1. Index of intervallic and segmental PVI measures for vowels and consonants. The index is calculated as PVI-I/PVI-S × 100. Note that the consonantal PVIs are unnormalized
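
As an illustration of the index in Table A1, the following Python sketch derives segmental and intervallic duration sequences from one labelled segment list and forms the ratio PVI-I/PVI-S × 100. It rests on assumptions that are not stated in this section: that an "interval" is a run of adjacent segments of the same class merged into one duration, and that the normalized PVI follows the standard Grabe and Low (2002) formulation. The function names and the toy durations are hypothetical and are not drawn from the ONZE data.

```python
from itertools import groupby


def npvi(durations):
    """Normalized PVI: mean of |a - b| / mean(a, b) over successive pairs, x 100."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def segmental(segments, cls="V"):
    """Durations of every individual segment of the given class."""
    return [dur for label, dur in segments if label == cls]


def intervallic(segments, cls="V"):
    """Durations of intervals: adjacent same-class segments merged (assumed definition)."""
    return [
        sum(dur for _, dur in run)
        for label, run in groupby(segments, key=lambda seg: seg[0])
        if label == cls
    ]


def interval_index(pvi_i, pvi_s):
    """Index of intervallic to segmental PVI, as in the Table A1 caption."""
    return pvi_i / pvi_s * 100


# Toy utterance: (class, duration in ms); the two adjacent vowels form one interval.
segments = [("C", 60), ("V", 90), ("V", 40), ("C", 70),
            ("V", 50), ("C", 80), ("V", 120)]

pvi_s = npvi(segmental(segments))
pvi_i = npvi(intervallic(segments))
print(round(interval_index(pvi_i, pvi_s), 1))
```

The same index function applies to the raw (unnormalized) consonantal PVIs, since it only takes the ratio of two already-computed values.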

Footnotes

1. Pākeha is the Māori word for European/non-Māori.

2. Note that NZE is nonrhotic.

3. It could be argued that the tighter segmental measure is the more rigorous, as any change is correspondingly smaller in magnitude and therefore must be more consistent in direction, more pervasive, or both in order to achieve significance.

4. There is considerable scope to improve our methodology for the principled exclusion of outliers in highly right-skewed distributions. This will be addressed in subsequent work (see Brys, Hubert, & Struyf, 2004).

5. Using the names of Wells's (1982) lexical sets to identify vowels.

REFERENCES

Abercrombie, D. (1967). Elements of general phonetics. Edinburgh: Edinburgh University Press.
Ainsworth, H. (1993). Rhythm in New Zealand English. Unpublished manuscript. Victoria University, Wellington, New Zealand.
Arvaniti, A. (2009). Rhythm, Timing and the Timing of Rhythm. Phonetica 66:46–63.
Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1996). CELEX2. Philadelphia: Linguistic Data Consortium.
Ballard, K. J., Robin, D. A., McCabe, P., & McDonald, J. (2010). A Treatment for Dysprosody in Childhood Apraxia of Speech. Journal of Speech, Language, and Hearing Research 53:1227–1245.
Bauer, W. (1993). Maori. London: Routledge.
Bauer, W. (1997). The Reed reference grammar of Māori. Auckland: Reed.
Beckman, M. E. (1992). Evidence for speech rhythms across languages. In Tohkura, Y., Vatikiotis-Bateson, E., & Sagisaka, Y. (eds.), Speech perception, production and linguistic structure. Tokyo: Ohmsha.
Boersma, P., & Weenink, D. (2010). Praat: doing phonetics by computer. [Computer program]. Version 5.1.31, retrieved April 4, 2010 http://www.praat.org.
Brys, G., Hubert, M., & Struyf, A. (2004). A robust measure of skewness. Journal of Computational and Graphical Statistics 13:996–1017.
Buekers, R., & Kingma, H. (1997). Impact of phonation intensity upon pitch during speaking: A quantitative study in normal subjects. Logopedics Phoniatrics Vocology 22(2):71–77.
Chambers, J. K. (2004). Patterns of variation including change. In Chambers, J. K., Trudgill, P., & Schilling-Estes, N. (eds.) The handbook of language variation and change. Oxford: Blackwell. 349–372.
Classe, A. (1939). The Rhythm of English prose. Oxford: Blackwell.
Cutler, A. (1991). Linguistic rhythm and speech segmentation. In Sundberg, J., Nord, L., & Carlson, R. (eds.) Music, language, speech and brain. Basingstoke: MacMillan Press. 157–166.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11:51–62.
Fant, G. (2004). Speech acoustics and phonetics: Selected writings. Dordrecht: Kluwer.
Fant, G., Kruckenberg, A., & Nord, L. (1991). Acoustic correlates of rhythmical structures in text reading. Journal of Phonetics 19:351–365.
Fenk, A., & Fenk-Oczlon, G. (2006). Crosslinguistic Computation and a rhythm-based classification of languages. In Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., & Gaul, W. (eds.) From data and information analysis to knowledge engineering. Berlin: Springer.
Fletcher, J. (1991). Rhythm and final lengthening in French. Journal of Phonetics 19:193–212.
Fromont, R., & Hay, J. (2008). ONZE Miner: The development of a browser-based research tool. Corpora 3:173–193.
Gordon, E., Maclagan, M., & Hay, J. (2007). The ONZE Corpus. In Beal, J. C., Corrigan, K. P., & Moisl, H. (eds.) Models and methods in the handling of unconventional digital corpora: Volume 2 diachronic corpora. Hampshire: Palgrave.
Grabe, E., & Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis. In Gussenhoven, C. & Warner, N. (eds.), Laboratory phonology 7. Ossining, NY: Mouton de Gruyter. 515–546.
Gramming, P., Sundberg, J., Ternström, S., Leanderson, R., & Perkins, W. H. (1988). Relationship between changes in voice pitch and loudness. Journal of Voice 2:118–126.
Han, M. S. (1964). Duration of Korean vowels. Studies in the Phonology of Asian Languages II. Los Angeles: University of Southern California.
Harlow, R., Keegan, P., King, J., Maclagan, M., & Watson, C. (2009). The changing sound of the Māori language. In Stanford, J. N. & Preston, D. R. (eds.) Quantitative Sociolinguistic studies of indigenous minority languages. Amsterdam: John Benjamins. 129–152.
Hay, J., Maclagan, M., & Gordon, E. (2008). New Zealand English. Edinburgh: Edinburgh University Press.
Holmes, J. (2005). Using Māori English in New Zealand. International Journal of the Sociology of Language 172:91–115.
Holmes, J., & Ainsworth, H. (1996). Syllable-timing and Māori English. Te Reo 39:75–84.
Holmes, J., & Ainsworth, H. (1997). Unpacking the research process: Investigating syllable-timing in New Zealand English. Language Awareness 6:32–47.
Holmes, K., Murachver, T., & Bayard, D. (2001). Accent, appearance, and ethnic stereotypes in New Zealand. New Zealand Journal of Psychology 30(2):79–86.
Jacewicz, E., Fox, R. A., O'Neill, C., & Salmons, J. (2009). Articulation rate across dialect, age, and gender. Language Variation and Change 21(2):233–256.
Jessen, M., Köster, O., & Gfroerer, S. (2005). Influence of vocal effort on average and variability of fundamental frequency. International Journal of Speech, Language and the Law 12:174–213.
King, J. (1995). Māori English as a solidarity marker for te reo Māori. New Zealand Studies in Applied Linguistics 1:51–59.
King, J. (1999). Talking bro: Māori English in the university setting. Te Reo 42:19–38.
King, J., Harlow, R., Watson, C., Keegan, P., & Maclagan, M. (2009). Changing pronunciation of the Māori Language: Implications for revitalization. In Reyhner, J. & Lockard, L. (eds.) Indigenous Language revitalization: Encouragement, guidance & lessons learned. Flagstaff, AZ: Northern Arizona University. 85–96.
Kohler, K. J. (2009a). Whither speech rhythm research? Phonetica 66:5–14.
Kohler, K. J. (2009b). Rhythm in speech and language. Phonetica 66:29–45.
Kohler, K. J. (2009c). Rhythm in speech and language. [Online supplementary Powerpoint presentation.] Retrieved November 24, 2009 from www.karger.com/pho.
Krull, D., & Engstrand, O. (2003). Speech rhythm – intention or consequence? Cross-language observations on the hyper/hypo dimension. Phonum 9:133–136. Retrieved May 25, 2011 http://www.ling.umu.se/fonetik2003/.
Labov, W. (2001). Principles of linguistic change: Social factors. Boston, MA: Blackwell.
Ladefoged, P. (1975). A course in phonetics. New York: Harcourt Brace & Jovanovich.
Langstrof, C. (2006). Acoustic evidence for a push-chain shift in the intermediate period of New Zealand English. Language Variation and Change 18:141–164.
Langstrof, C. (2009). On the role of vowel duration in the New Zealand English front vowel shift. Language Variation and Change 21:437–453.
Lehiste, I. (1980). Phonetic manifestation of syntactic structure in English. Annual Bulletin of the Institute of Logopaedics and Phoniatrics. Tokyo: University of Tokyo. 1–27.
Low, E. L. (1998). Prosodic prominence in Singaporean English. Doctoral dissertation, University of Cambridge.
Low, E. L., Grabe, E., & Nolan, F. (2000). Quantitative characterisations of speech rhythm: Syllable-timing in Singapore English. Language and Speech 43:377–401.
Maclagan, M., Gordon, E., & Lewis, G. (1999). Women and sound change: Conservative and innovative behaviour by the same speakers. Language Variation and Change 11:19–41.
Maclagan, M., & Hay, J. (2007). Getting fed up with our feet: Contrast maintenance and the New Zealand English ‘short’ front vowel shift. Language Variation and Change 19:1–25.
Maclagan, M., King, J., & Gillon, G. (2008). Māori English. Clinical Linguistics & Phonetics 22:658–670.
Maclagan, M., Watson, C., King, J., Harlow, R., Thompson, L., & Keegan, P. (2009). Investigating Changes in the rhythm of Māori over time. Brighton, UK: 10th Annual Conference of the International Speech Communication Association, 6-10 Sep 2009. Interspeech 2009. 1531–1534.
McLintock, A. (1966). Geographical distribution of population – An Encyclopaedia of New Zealand. Reproduced in Te Ara – the Encyclopedia of New Zealand. Retrieved December 2, 2010. http://www.TeAra.govt.nz/en/1966/population/5.
Mehler, J., Christophe, A., & Ramus, F. (2000). How infants acquire language: some preliminary observations. In Marantz, A., Miyashita, Y. & O'Neil, W. (eds.) Image, language, brain: Papers from the First Mind Articulation Project Symposium. Cambridge, MA: MIT Press.
Meyerhoff, M. (1994). Sounds Pretty Ethnic, eh?: A Pragmatic Particle in New Zealand English. Language in Society 23:367–388.
Ministry of Education, New Zealand Government. (2010). Biculturalism in Te Whāriki. ECE Educate. Retrieved December 12, 2010. http://www.educate.ece.govt.nz/learning/exploringPractice/BiculturalPractice/BiculturalismTeWhariki.aspx.
Ministry of Social Development, New Zealand Government. (2010). The Social Report 2010. Retrieved November 29, 2010. http://www.socialreport.msd.govt.nz/cultural-identity/maori-language-speakers.html.
Nolan, F., & Asu, E. L. (2009). The Pairwise Variability Index and coexisting rhythms in language. Phonetica 66:64–77.
O'Brien, Robert M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity 41:673–690.
Pike, K. (1946). The Intonation of American English. 2nd ed. Ann Arbor: University of Michigan Press.
R Development Core Team. (2009). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org.
Ramig, L. A. (1983). Effects of physiological ageing on speaking and reading rates. Journal of Communication Disorders 16:211–226.
Ramus, F., Dupoux, E., & Mehler, J. (2003). The psychological reality of rhythm classes: Perceptual studies. Proceedings of the 15th International Congress of Phonetic Sciences. Barcelona: Casual Productions. 337–342.
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition 73:265–292.
Reubold, U., Harrington, J., & Kleber, F. (2010). Vocal aging effects on F0 and the first formant: A longitudinal analysis in adult speakers. Speech Communication 52:638–651.
Richards, J. (1970). The language factor in Māori schooling. In Ewing, J. & Shallcrass, J. (eds.) Introduction to Māori education: Selected readings. Wellington: New Zealand University Press. 122–132.
Roach, P. (1982). On the distinction between ‘stress-timed’ and ‘syllable-timed’ languages. In Crystal, D. (ed.), Linguistic Controversies. 73–79.
Robertson, Shelley. (1994). Identifying Māori English: A study of ethnic identification, attitudes and phonetic features. Unpublished MA thesis. Wellington: Victoria University of Wellington.
Simpson, A. P. (2001). Dynamic consequences of differences in male and female vocal tract dimensions. Journal of the Acoustical Society of America 109:2153–2164.
Szakay, A. (2006). Rhythm and pitch as markers of ethnicity in New Zealand English. In Warren, P. & Watson, C. I. (eds.) Proceedings of the 11th Australian International Conference on Speech Science & Technology. 421–426.
Szakay, A. (2008). Ethnic dialect identification in New Zealand: The role of prosodic cues. Saarbrücken: VDM Verlag Dr. Müller.
Vaughan, G., & Huygens, I. (1990). Sociolinguistic stereotyping in New Zealand. In Bell, A. & Holmes, J. (eds.) New Zealand ways of speaking English. Wellington: Victoria University Press.
Warren, P. (1998). Timing patterns in New Zealand English rhythm. Te Reo 4:80–93.
Watson, C., Maclagan, M., & Harrington, J. (2000). Acoustic evidence for vowel change in New Zealand English. Language Variation and Change 12:51–68.
Wells, J. C. (1982). Accents of English I: An introduction. Cambridge: Cambridge University Press.
White, L., Mattys, S., Series, L., & Gage, S. (2007). Rhythm metrics predict rhythmic discrimination. In Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, 6–10th August 2007. http://eis.bris.ac.uk/~pslsw/White_Mattys_Series_Gage_2007.pdf
Xu, Y., & Sun, X. (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America 111:1399–1413.
Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., & Woodland, P. (2002). The HTK book. Revised for HTK Version 3.2. http://htk.eng.cam.ac.

Figure 1. Where n is the number of (vocalic or consonantal) segments, d_{i+1} is the duration of the (i + 1)th segment, and d_i is the duration of the ith segment. Multiplication of normalized PVI by 100 is arbitrary for readability: the quantity is an index with no units. Raw PVI is measured in seconds, as it is calculated from durations in milliseconds.
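
The equation that this caption describes did not survive as text. The Python sketch below assumes the standard Grabe and Low (2002) formulation, which matches the variables named in the caption: raw PVI is the mean absolute difference between successive durations, and normalized PVI divides each difference by the mean of the pair before averaging and multiplying by 100. The function names are hypothetical, durations are assumed to be positive, and the toy values are not taken from the ONZE data.

```python
from typing import Sequence


def raw_pvi(durations: Sequence[float]) -> float:
    """Raw PVI: mean |d_{i+1} - d_i| over the n - 1 successive pairs (input units)."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two segments")
    return sum(abs(nxt - cur) for cur, nxt in pairs) / len(pairs)


def normalized_pvi(durations: Sequence[float]) -> float:
    """Normalized PVI: each difference is scaled by the pair mean, then x 100."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two segments")
    scaled = [abs(nxt - cur) / ((cur + nxt) / 2) for cur, nxt in pairs]
    return 100 * sum(scaled) / len(scaled)


# Toy vowel durations in ms: strictly alternating long/short versus perfectly even.
alternating = [100, 50, 100, 50]
even = [80, 80, 80]

print(raw_pvi(alternating))
print(round(normalized_pvi(alternating), 2))
print(normalized_pvi(even))
```

Because each pair is scaled by its own mean, doubling every duration (a crude stand-in for slower speech) leaves the normalized value unchanged while doubling the raw value.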

Table 1. Speakers selected for hand-correction of automatic phoneme segmentations. A male and a female speaker were chosen from each corpus used in the analysis, to represent a range of nPVI(V) scores, birthdates, and number of phonemes (consonants plus vowels) automatically segmented. %Adjust gives the percentage of total intervals that were subject to boundary adjustments during the hand-correction process

Table 2. Vocalic nPVI figures. Original is based on automatically segmented speech; Hand-corrected is based on segments with manually adjusted boundaries; and Autocorrected is based on automatically segmented speech with outlying (overly large) values removed. The last column compares the autocorrected, autosegmented nPVI to the hand-corrected nPVI

Figure 2. Plot of vocalic nPVI versus speaker's year of birth. Speakers from the ONZE corpora (Gordon et al., 2007). The line is a LOWESS scatterplot smoother using locally weighted regression.

Figure 3. The left panel is a plot of speech rate versus speaker's year of birth. The right panel is a plot of vocalic nPVI against speech rate. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 4. Plot of speech rate versus speaker's year of birth, split by gender. Female speakers are shown in the left panel, and male speakers in the right panel. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 5. Plot of vocalic nPVI versus speaker's year of birth, split by gender. Female speakers are shown in the left panel, and male speakers in the right panel. Speakers from the ONZE corpora (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 6. Recursive regression tree showing predictors for durational variability, measured as nPVI(V). Speakers from the Canterbury Corpus, ONZE Project (Gordon et al., 2007).

Table 3. Coefficients of linear regression model to predict the value of nPVI(V) for speakers in the ONZE corpora (Gordon et al., 2007). Adjusted R² = .3347

Table 4. Analysis of variance for linear regression model to predict the value of nPVI(V) for speakers in the ONZE corpora (Gordon et al., 2007)

Table 5. Coefficients of linear regression model to predict the value of nPVI(V) for speakers in the Canterbury Corpus, ONZE Project (Gordon et al., 2007). Adjusted R² = .2962

Table 6. Analysis of variance for linear regression model to predict the value of nPVI(V) for speakers in the Canterbury Corpus, ONZE Project (Gordon et al., 2007)

Table 7. Coefficients of linear regression model to predict mean vowel duration in milliseconds. Adjusted R² = .5328. Speakers from the ONZE Project (Gordon et al., 2007)

Table 8. Analysis of variance for linear regression model to predict mean vowel duration in milliseconds, data from the ONZE Project (Gordon et al., 2007)

Table 9. Variance inflation measures for factors in model to predict mean vowel duration

Figure 7. Plot of vocalic nPVI versus speaker's year of birth. Variation in mean vocalic intensity is shown on the left (n = 501), and variation in maximum vocalic pitch is shown on the right (n = 354). Speakers from the ONZE Project (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 8. Plot of vocalic nPVI for maximum pitch versus speaker's year of birth, split by gender. Speakers from the Canterbury Corpus, ONZE Project (Gordon et al., 2007). Lines are LOWESS scatterplot smoothers using locally weighted regression.

Figure 9. Bi-plot of first two principal components from PCA of duration, intensity and pitch measures. Speakers from the ONZE Project (Gordon et al., 2007). The left panel shows the first two components for females (accounting for 49.0% of variance); the right panel shows the first two components for males (accounting for 52.3% of variance).

Table 10. Results of principal components analysis of rhythm metrics for female speakers (a) and (b) and male speakers (c) and (d). Data for speakers in the ONZE corpora (Gordon et al., 2007)
