INTRODUCTION
New Zealand English (NZE) is commonly reported to be more “syllable-timed” than other varieties of English (Holmes & Ainsworth, 1997; Szakay, 2006; Warren, 1998). This finding tends to be based on investigation of the durational variability of a modest number of consecutive vocalic sequences, produced by a small set of speakers of contemporary NZE. In this paper, we take advantage of the availability of a large diachronic corpus to conduct the first large-scale investigation into speakers' use of variation in duration across the history of NZE. We also investigate concurrent changes in pitch and intensity variation.
SPEECH RHYTHM
Speech rhythm is the patterning of prominent elements in spoken language, as perceived by the listener. Prominence is achieved by manipulating a variety of acoustic-phonetic parameters, such as duration, intensity, intonation contours, and pitch, which are used to create cyclical prosodic patterns (Kohler, 2009a). Cross-linguistically, different strategies are used by different languages to emphasize syllables (Dauer, 1983). The use of a prosodic feature for rhythmic purposes is affected by its other linguistic functions within a language. For example, Fant, Kruckenberg, and Nord (1991) found that syllable stress correlated with lengthening in English and Swedish to a greater degree than in French, which demarcates prosodic phrase boundaries using final syllable lengthening (Fletcher, 1991). Languages such as Sāmoan and Māori with phonemic length distinctions appear to rely less on prolonging syllables for accentual purposes (see also Roach, 1982). It is expected that contact between languages that employ different strategies to realize prominence will affect one or both varieties, and indeed evidence has been produced to show that Māori has undergone rhythm and vowel quality changes as a result of contact with English (Harlow, Keegan, King, Maclagan, & Watson, 2009; Maclagan, Watson, King, Harlow, Thompson, & Keegan, 2009). There has been speculation that Māori, in turn, has affected the timing of NZE, which is often perceived to be more syllable-timed than other varieties of English (Holmes & Ainsworth, 1997; Szakay, 2006; Warren, 1998).
QUANTIFYING SPEECH RHYTHM
Speech rhythm is commonly described using the stress-timed versus syllable-timed dichotomy, developed to categorize speech rhythm cross-linguistically (Abercrombie, 1967; Classe, 1939; Pike, 1946). Abercrombie (1967) developed the ideas of Pike and Classe by grouping English, Russian, and Arabic as languages that regularize the occurrence of prominent syllables, primarily by compressing or deleting intervening unstressed syllables. He contrasted such stress-timed languages with syllable-timed languages such as French, Telugu, and Yoruba which, it was proposed, regularize the duration of syllables, resulting in asynchronous prominences or stresses. This schema was expanded to include mora-timing for languages such as Japanese (Han, 1964; Ladefoged, 1975) and Māori (Bauer, 1993); subsequently other languages have been classified as belonging to one or the other category (Dauer, 1983:56 gave a referenced selection). However, attempts to demonstrate the regularity of stressed feet or syllables in languages so-categorized have not been successful (see Arvaniti, 2009; Cutler, 1991; Grabe & Low, 2002; and Lehiste, 1980, for reviews). It is clear that the perception of rhythm is not based solely on timing, and conversely, that timing is affected by considerations other than rhythm.
These include the word position and complexity of syllables (Beckman, 1992; Dauer, 1983; Fant, 2004; Lehiste, 1980); the syllabic composition of the stress foot (Lehiste, 1980); the register and genre of an utterance (Arvaniti, 2009); and the oratory ability and communicative intent of the speaker (Kohler, 2009b). The failure to demonstrate categorical isochrony by stress, syllable, or mora cross-linguistically has been one motivation for the proposal that languages fall on a continuum between the two extremes (Dauer, 1983; Grabe & Low, 2002; Roach, 1982), and for the development of more sophisticated metrics to index rhythmic timing.
Rhythm metrics in current use include the Pairwise Variability Index (PVI), an index of mean difference in a given acoustic measure across successive linguistic units; VarcoV, defined as the standard deviation of vowel duration divided by mean vowel duration; %V, defined as the vocalic proportion of an utterance duration; and ΔC, defined as the standard deviation of consonantal duration. Ramus, Nespor, and Mehler's (1999) study of English, Polish, Dutch, French, Spanish, Italian, Catalan, and Japanese concluded that combined %V and ΔC scores best supported distinct stress-, syllable-, and mora-timing categories. However, Polish was found to have “mixed” rhythm. Grabe and Low (2002) calculated normalized vocalic PVI and raw consonantal PVI for 18 languages, and compared the resulting language groupings with traditional rhythm classifications. Rhythmically prototypical languages were classified as expected, but nine languages fell in between traditional categories, including Polish. Arvaniti (2009) presented preliminary results from a comparative study of timing metrics across English, German, Italian, Korean, Greek, and Spanish. Arvaniti, Ross, and Ferjan (preliminary findings, cited in Arvaniti, 2009) calculated PVIs, Varcos, ΔC and ΔV, and %V for each language, and compared the resulting groupings. Arvaniti concluded that not only do different measures produce different categorizations for nonprototypical languages, but different elicitation methods and utterance constructions result in varying scores for a single language. In sum then, the theory that languages have a categorical rhythm type based on timing is largely discredited, and the various rhythm metrics in current use fall short of satisfactorily providing a comprehensive typology of rhythm cross-linguistically.
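The interval-based metrics defined above can be illustrated with a minimal Python sketch. It is not drawn from any of the cited studies: the function name, the input layout (a list of labelled durations in seconds), the use of the population standard deviation, and the scaling of VarcoV by 100 are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute %V, delta-C and VarcoV for one utterance.

    `intervals` is a hypothetical list of (kind, duration_s) tuples,
    where kind is "V" (vocalic) or "C" (consonantal).
    """
    vowels = [d for kind, d in intervals if kind == "V"]
    consonants = [d for kind, d in intervals if kind == "C"]
    total = sum(vowels) + sum(consonants)
    return {
        # Vocalic proportion of the utterance duration, as a percentage.
        "percent_V": 100 * sum(vowels) / total,
        # Standard deviation of consonantal durations (population SD assumed).
        "delta_C": pstdev(consonants),
        # SD of vowel duration divided by mean vowel duration
        # (the factor of 100 is an assumed scaling convention).
        "varco_V": 100 * pstdev(vowels) / mean(vowels),
    }


# Toy durations, invented purely for illustration.
toy = [("C", 0.05), ("V", 0.10), ("C", 0.15), ("V", 0.20)]
print(rhythm_metrics(toy))
```

Because each metric summarizes the same durations differently, the same utterance can look more or less variable depending on the measure chosen, which is consistent with the divergent groupings reported above.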
RHYTHM AND DURATION IN STRESS-TIMED LANGUAGES
Nevertheless, what Beckman (1992:458) called the “persistent metaphor” of stress- versus syllable-timing remains a popular shorthand for characterizing the rhythmic properties of languages, seemingly indexing a perceivable difference in the patterning of prominent syllables (Ramus, Dupoux, & Mehler, 2003). White, Mattys, Series, and Gage (2007) demonstrated that listeners could perceive differences between two different languages, and between accents within a language, if timing measures for the languages diverged sufficiently. White et al. synthesized sasa syllables from the speech of four speakers for each of three different British English varieties and for Castilian Spanish. Utterance-initial syllables and final stressed syllables were excised, and pitch was leveled to remove additional prosodic cues. Significantly different scores for VarcoV and %V predicted higher-than-chance accuracy in tasks where the participants were asked to decide whether two varieties were the same or different. Timing differences are therefore used by listeners to discriminate between languages. Note though that variation in intensity was not controlled for.
Ultimately, the question of whether timing is a fundamental cause of perceptual differences in rhythm, or merely a surface realization of the underlying structural differences that influence prosody, remains open for debate (Fenk & Fenk-Oczlon, 2006). What does seem clear is that in languages traditionally considered stress-timed, such as English, the duration of linguistic units is a key parameter in rhythmic structure. Stressed vowels tend to be more fully articulated in English and consequently have a longer duration than their unstressed counterparts. In contrast, unstressed vowels are typically reduced or deleted. Other factors held equal, a longer vowel length will give rise to a percept of syllable stress, and thus rhythmic prominence, in English. The alternation of stressed and unstressed vowels leads to variation in the duration of successive vowels. Perhaps the term “stress-timed” is best employed to differentiate languages that rely heavily on temporal manipulation of syllables for emphasis, from languages that rely more on other acoustic-phonetic parameters as a cue for prominence (traditionally “syllable-timed”).
In this sense, there is some evidence that NZE is less stress-timed than other varieties of English are. Anecdotal observations that NZE speakers produce more peripheral vowels in unstressed syllables than speakers of British English (Hay, Maclagan, & Gordon, 2008) have been supported by quantitative studies. Warren (1998) examined 3508 syllables from the speech of newscasters and found that there was less variation in syllable duration in NZE than in British English (BE). While the percentage of full vowels was similar for the Pākeha [Footnote 1] English (PE) speakers and the BE speakers, syllabic PVI measures of Pākeha speech were intermediate between BE and Māori English speakers. Additionally, Warren found an inverse relationship between speech rate and full vowel production in NZE, which contrasted with the slower rate and high level of reduction in the BE sample. The hypothesis that contact with (mora-timed) Māori has affected NZE receives support from the finding that speakers of Māori English produce more fully articulated vowels than speakers of Pākeha English do (Holmes & Ainsworth, 1996, 1997; Szakay, 2006; Warren, 1998). It was posited that this results from featural transference from Māori, where vowel length is phonemic (Bauer, 1997). Szakay (2006) elicited speech from NZ Māori and Pākeha speakers, and analyzed a total of 3281 vocalic segments. Her data showed that both younger Māori and younger Pākeha speakers produce more syllable-timed speech than their older counterparts do, suggesting a progressive change in the timing of NZE. While not statistically significant, Szakay's data also showed a tendency for female Māori to be less syllable-timed than male Māori are, and for female Pākeha to be more syllable-timed than male Pākeha are.
If convergence under the influence of contact is occurring, females in both ethnic groups are likely to be leading the change (Labov, 2001:501). To explore the possibility of a historical change in rhythmic structure, we extended the work of Szakay (2006) and Warren (1998) by carrying out a large-scale survey of the variation in segmental duration in the speech of New Zealanders born over the past century and a half.
MEASURING DURATIONAL VARIATION
The PVI was developed specifically to characterize variation between successive speech units and thus is well suited to exploring vowel length variation due to the alternation of stressed and unstressed syllables. The PVI has several advantages: It is a general measure that can be applied to duration, intensity, or pitch over segments, syllables, or feet, and so can be extended to incorporate other acoustic correlates of rhythm. It can be normalized to reduce the effect of local fluctuations in speech rate. It captures the distribution of variation more effectively than a simple average, which can result in homogenization. We elected to measure PVI at the segmental level, calculating normalized PVIs for vowels and raw PVIs for consonants, using the formulae in Figure 1 (adapted from Grabe & Low, 2002; consonantal PVI is multiplied by 1000 instead of 100 to give a measure in milliseconds).
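For reference, the normalized and raw PVI can be written out as follows. This is a sketch based on the standard Grabe and Low formulation rather than a reproduction of the figure itself; it assumes m successive segments with durations d_k measured in seconds, with the factor of 1000 in the raw index following the adaptation described above.

```latex
\mathrm{nPVI} = 100 \times \left[ \sum_{k=1}^{m-1}
  \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right| \Big/ (m-1) \right]

\mathrm{rPVI} = 1000 \times \left[ \sum_{k=1}^{m-1}
  \left| d_k - d_{k+1} \right| \Big/ (m-1) \right]
```

In the normalized index, each pairwise difference is divided by the mean duration of the pair, so a uniform speeding up or slowing down of both segments leaves that term unchanged.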
Normalized PVI is used to neutralize the effect of speech rate on vowel duration. For consonants, manner of articulation (e.g., the presence of voicing) and syllable onset or coda composition seemed as likely as speech rate to affect duration. As these factors could not be controlled for without introducing considerable complexity, it was deemed preferable to make our initial calculations with raw numbers. A pilot run across all speakers, prior to quality assurance procedures, showed minimal change in consonantal PVI over time. Consequently, the remainder of the study focuses on vocalic PVI.
Recently, authors have argued that either the syllable or foot, or both, are preferred units for the characterization of rhythm across languages (Kohler, 2009b; Nolan & Asu, 2009). Definitions of foot and syllable remain challenging cross-linguistically however, and reliance on manual segmentation necessarily restricts the range of speakers and speech units that can be analyzed. The examination of durational variation based on segmental measurements has the advantages of relative simplicity and reproducibility, with the added bonus that the robust automatic segmentation procedures now available allow for large-scale investigations of speech corpora. There is, then, a case for piloting a study of rhythm in a stress-timed language by investigating segmental timing.
The generality of the PVI measure also allows for investigation of other reflexes of prominence; indeed, it was applied to intensity variation in its debut (Low, 1998). If prominence is created by language-specific bundles of prosodic variables, then a change in the rhythm profile of a language is likely to result not from the simple addition or subtraction of a variable, but from a more fluid shift in the weighting of the variables employed, just as a vowel shift affects surrounding vowels by altering perceptual boundaries. If segment duration is playing a reduced role in cuing stress in NZE, we might expect to observe changes in the contribution of other factors to rhythmic prominence. Intensity and pitch are two key variables that affect the perception of rhythm in languages (Arvaniti, 2009; Kohler, 2009c); their variability in running speech can be indexed using the PVI, in the same way as for duration. Given that these measures vary continuously over larger speech units, such an approach represents an even greater oversimplification than the reduction of timing to a pairwise measure, but we emphasize again that this is intended as a preliminary, quantitative approach. It will not provide a comprehensive characterization of speech rhythm. Rather, it can uncover useful information about synchronic and diachronic trends, which can then be pursued with targeted qualitative analysis.
RESEARCH QUESTIONS
There are four key questions to be explored in this preliminary study of diachronic change in the rhythm of NZE. First, is there evidence of a reduction in variation of vowel durations in NZE over time? We have some evidence that contemporary speakers of NZE produce more full vowels than their BE-speaking counterparts do. Presumably this difference has arisen since the arrival of British settlers in New Zealand, and thus the progress of the change should be observable in the speech of New Zealanders born since then.
Second, do social factors have any bearing on the progress of any change in timing? Szakay's (2006) data suggested that a move away from stress-timing in NZE may be more advanced among females than males. Using our corpora, we can consider the effects of gender on any change; additionally, our contemporary corpus has social class information that may offer further insights into social factors facilitating a rhythmic timing change.
Third, is there evidence that such a reduction in variability is due to an increase in the occurrence of fully articulated vowels? Evidence for this proposal remains anecdotal thus far. It has also been observed that speakers of NZE speak more quickly than speakers of other varieties of English. Warren's (1998) data show a relationship between the occurrence of full vowels and speech rate in NZE. As faster speech is associated with greater gestural efficiency, if New Zealanders' speech rate has increased, we might expect to see a reduction in duration variability, but one resulting from a general trend to more reduced vowels, rather than more full vowels. These two trends are in tension.
Fourth, if duration is becoming less significant in signaling prominence in NZE, is there evidence that either pitch or intensity, or both, are becoming more important? Szakay's (2006) measures of pitch suggested that while mean pitch is an ethnic category indicator, pitch variability is not. In theory, this makes it more available for manipulation by speakers to signal prominence. The role of intensity in the percept of rhythm has not been examined in NZE; other than Low's (1998) study of Singaporean English, there have been few applications of the PVI to investigate intensity variation in other dialects or languages (though see Ballard, Robin, McCabe, & McDonald, 2010, for clinical application in dysprosody treatment evaluation).
METHODOLOGY
Raw speech data was drawn from three corpora in the Origins of New Zealand English project (‘ONZE’: Gordon, Maclagan, & Hay, 2007): the Mobile Unit, with New Zealanders born between 1851 and 1904 (MU); the Intermediate Archive, with speakers born between 1882 and 1963 (IA); and the Canterbury Corpus (CC), with speakers born between 1926 and 1987. The majority of speakers are European-descended and born in the South Island of New Zealand. The MU recordings were carried out in the mid-1940s, with speakers aged 43 to 96 years (mean age = 71.0); IA speakers were recorded in the early to mid-1990s, aged 31 to 99 years (mean age = 76.0); and CC recordings have been ongoing since the mid-1990s, and cover speakers aged 18 to 68 years (mean age = 37.5). Consequently, there is a very strong negative correlation between a speaker's date of birth (DOB) and age at the time of interview (Spearman's rank correlation rho = −.9488, p < .000), such that those later-born tend to be younger than those earlier-born. We acknowledge that this must be taken into consideration when discussing diachronic change based on DOB, as there is evidence that speech rate may decrease for older speakers, particularly with declining health (Ramig, 1983). It is also known that fundamental frequency declines during adulthood, a trend that reverses in later years amongst males so that their pitch increases in old age (Reubold, Harrington, & Klebera, 2010). As separation of DOB and speaker age is problematic, we present our models using DOB to index change over time, with caveats as necessary in the discussion. Given that age-based changes in speech rate may differ across dialects (Jacewicz, Fox, O'Neill, & Salmons, 2009), follow-up work on factors affecting speech rate in NZE is a priority for this line of research.
Speech from recorded interviews was automatically segmented, using the Hidden Markov Model Toolkit, or HTK (Young, Evermann, Hain, Kershaw, Moore, Odell, Ollason, Povey, Valtchev, & Woodland, 2002) to carry out phoneme alignment in ONZE Miner (Fromont & Hay, 2008). In speech recognition applications, HTK uses probabilistic mathematical models to predict the underlying phonemes represented by observed acoustic data. This process involves several steps. A sound file is first manually transcribed and time-aligned into lines of speech. Transcripts are uploaded to ONZE Miner, which produces phonemic labels for the main speaker's utterances, based on ONZE Miner's orthographic-phonemic dictionary entries (built from CELEX [Baayen, Piepenbrock, & Gulikers, 1996] with supplementary entries provided by hand). Routines provided by HTK are then used to time-align phoneme labels with the sound files. ONZE Miner's dictionary contains multiple phonemic entries for many words that vary in their pronunciation or suffer reduction in running speech. For example, the word ‘and’ has the entries /ænd/, /ənd/, /ən/, /nˌ/, /mˌ/ and /ŋˌ/. During the alignment process, the best phonemic representation is inferred from the acoustic signal. The dictionary is a work-in-progress and by no means exhaustive, but the ability to select the most likely pronunciation where phonemic alternates are available improves the fidelity of the transcription considerably and caters to some extent for reduction in running speech. The alignment process is iterative: All speech files for a particular speaker are analyzed in a training phase that establishes speaker-specific acoustic parameters, then forced alignments are carried out in a second pass through the files.
Initially, PVI measures were calculated on “successive pairs of vowels” (Low, Grabe, & Nolan, 2000:382). This was later defined as vocalic and intervocalic intervals (Grabe & Low, 2002), used in order to avoid subjective division of the acoustic signal into phonemic segments. As the HTK segmentation is based on speaker-specific, statistical mapping of acoustic features to phonemic transcriptions, we investigated the use of segmental, rather than intervallic, PVI measures. If segmentation is reliable and objective, this approach is more aligned with the idea that prominence in English is primarily driven by differences in the realization of syllable nuclei. Calculating PVI using vocalic intervals that may extend over multiple syllables and words is undesirable. For example, in a phrase like “the red fire engine” (/ðə ˌrɛd ˈfaɪ.ə ˌɛn.ʤən/) [Footnote 2], an intervallic approach would compare the lengths of the successive vocalic intervals /ɛ/ and /aɪəɛ/, then /aɪəɛ/ with /ə/ and so on. The segmental approach compares /aɪ/ with /ə/, then /ə/ with /ɛ/, which is intuitively preferable. Comparisons of segmental and intervallic normalized PVI (nPVI) are given in the appended Table A1; while PVI is inflated by the inclusion of polysyllabic feet, our statistical models were found to be robust over both measures [Footnote 3]. Hence, while all statistics reported in this paper are based on segmental PVI, they do not rely solely on this method of calculation. There is a stronger argument for using intervallic PVI for intrasyllable consonant clusters, but we leave an empirical evaluation of the best methodology for intervocalic segments to a later study.
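The contrast between the two approaches can be made concrete with a small Python sketch using the “the red fire engine” example. The function names, the segment labelling, and all durations are hypothetical and chosen only for illustration; this is not the Praat procedure used in the study.

```python
def npvi(durations):
    """Normalized PVI over successive pairs of durations."""
    pairs = list(zip(durations, durations[1:]))
    if not pairs:
        raise ValueError("need at least two durations")
    # Each pairwise difference is divided by the mean of the pair.
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def vocalic_segments(segments):
    """Segmental approach: every vowel segment counts separately."""
    return [d for kind, d in segments if kind == "V"]


def vocalic_intervals(segments):
    """Intervallic approach: adjacent vowels merge into one interval."""
    intervals = []
    prev_was_vowel = False
    for kind, d in segments:
        if kind == "V":
            if prev_was_vowel:
                intervals[-1] += d  # extend the current vocalic interval
            else:
                intervals.append(d)
        prev_was_vowel = kind == "V"
    return intervals


# Hypothetical durations (seconds) for "the red fire engine".
phrase = [
    ("C", 0.04), ("V", 0.05),                 # the
    ("C", 0.06), ("V", 0.10), ("C", 0.05),    # red
    ("C", 0.08), ("V", 0.15), ("V", 0.05),    # fire (two vowel segments)
    ("V", 0.10), ("C", 0.05),                 # en-
    ("C", 0.07), ("V", 0.05), ("C", 0.06),    # -gine
]

print(len(vocalic_segments(phrase)), npvi(vocalic_segments(phrase)))
print(len(vocalic_intervals(phrase)), npvi(vocalic_intervals(phrase)))
```

With these toy values, the six vowel segments collapse into four vocalic intervals, because the three adjacent vowels spanning “fire engine” are merged into a single long interval.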
Using the automatically aligned segment labels, initial calculations of raw consonantal PVI (rPVI(C)) and normalized vocalic PVI (nPVI(V)) were carried out in Praat (Boersma & Weenink, 2010). All vowel segments below each speaker's ceiling duration, including those in phrase-final syllables, were included. This produced nPVI(V) scores between 52.5 and 83.6, with a median value of 66.1. Six speakers were then selected for hand-correction of the phoneme segments, as shown in Table 1. These speakers were chosen to represent a compact sample of ages, sexes, recording formats, and initial nPVI(V) values.
Overall, the automatic segmentations were good, but shorter speech samples and overlapping speakers or noise, such as laughter, created problems for the HTK forced alignments. The speakers TC and myp01-1b, with the lowest segment counts, required the most segment boundary adjustments, followed by fyn94-20b, whose recordings frequently included overlapping speech. Other factors giving rise to incorrect segmentation were errors in transcription (such as spelling mistakes or poorly aligned line breaks), incorrectly selected dictionary entries (where more than one phonemic representation might be valid, e.g., /hwɪtʃ/ or /wɪtʃ/), unstressed vowels that were deleted in the signal but retained in the phonemic transcription, and transitions between vowels and nasals. A Praat (Boersma & Weenink, 2010) script was used to calculate the number of segments amended for each speaker. In most cases, there is a clear relationship between the number of observations, i.e., total segments, for a speaker, and the extent of the correction required. Speaker fyn94-20b is an exception. Her recordings include significant amounts of overlapping speech and laughter, which caused localized disruption of segmentation accuracy despite the large number of segments.
Following hand-correction, the nPVI values for the selected speakers were recalculated, along with standard deviations for each speaker's vowel length. Based on the assumption that outlying (i.e., excessively high) values were due to segmentation errors, a ceiling value for vowel duration was calculated for each speaker based on the standard deviation. Due to the skewed nature of the distribution of vowel durations, the ceiling value was calculated as the speaker's mean vowel length plus four times the standard deviation [Footnote 4]. Revised calculations of vocalic PVI excluded any vowel segment that exceeded the speaker's ceiling value. Table 2 compares the original vocalic PVIs, the PVIs after hand-correction, and the PVIs after outliers were automatically discarded, for the six hand-corrected speakers.
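The ceiling rule described above (mean vowel length plus four standard deviations, computed per speaker) can be sketched in Python as follows. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def duration_ceiling(durations, n_sd=4):
    """Per-speaker ceiling: mean vowel duration plus n_sd standard deviations.

    Uses the sample standard deviation (an assumption).
    """
    return mean(durations) + n_sd * stdev(durations)


def drop_outliers(durations, n_sd=4):
    """Discard any vowel segment whose duration exceeds the ceiling."""
    ceiling = duration_ceiling(durations, n_sd)
    return [d for d in durations if d <= ceiling]


# Toy data: thirty plausible vowels and one implausibly long segment,
# of the kind a misplaced boundary might produce.
vowels = [0.1] * 30 + [2.0]
kept = drop_outliers(vowels)
print(len(vowels), len(kept))
```

One detail this sketch leaves open is how the pairwise calculation treats the gap left by a discarded segment, for example whether the vowels on either side are then compared with each other; the text does not specify this.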
The use of a ceiling value for vowel duration for each speaker brings the nPVI within 1.7% of the hand-corrected values, a reasonable level of accuracy. All speakers bar one have a slightly underestimated nPVI value using the automatic segmentation with outliers removed. The large number of corrections for TC is at odds with the apparent accuracy of that speaker's automatic segmentation and suggests that, for this speaker at least, the HTK-aligned boundaries were simply consistently offset from the manually judged ones.
Further quality assurance was carried out by removing speakers with the fewest vowel and consonant segments from the remaining speaker cohort (n = 25 or ~5%, C + V < 2000), as smaller speech samples give poorer segmentation accuracy. The final dataset comprised 506 New Zealander speakers, who between them produced 1.6 million vowel segments. We then calculated the nPVI(V) across the cohort, using the calculated ceiling value for each speaker to exclude outliers. Variability indices for intensity and pitch were also calculated, giving a set of three PVIs for each speaker.
Intensity PVI was calculated using the difference in mean intensity for successive vowel segments, while pitch PVI was based on the difference in maximum pitch across successive vowel segments. Both measures were normalized to temper local variation, and in the case of intensity, to make some compensation for an inability to calibrate for different ambient sound pressure levels (SPL) across recordings. Pitch calculation used the cross-correlation method in Praat, as recommended in the Praat documentation for calculations over short time windows (Boersma & Weenink, 2010), with a range of 75–300 Hz for males, and 100–500 Hz for females. All other settings were standard. As for durational PVI, speaker-specific ceilings for vowel length were applied, and outlying segments were discarded from calculations. Speakers with low numbers of vocalic segments were omitted. Only speakers with fewer than 10% undefined values for pitch were retained for analysis. This resulted in a considerable reduction of the sample size: intensity analyses were carried out for 504 speakers, but only 354 speakers met the criteria for pitch analyses. Statistical analyses and modeling were carried out using the statistical software package R (R Development Core Team, 2010).
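A minimal Python sketch of how the same pairwise index might be applied to pitch is given below. It assumes one maximum-pitch value per vowel segment, with `None` standing for an undefined pitch measurement; the function names and the decision to skip any pair containing an undefined value are assumptions, as the text does not describe how such pairs were handled. The same `normalized_pvi` helper could be applied to per-vowel mean intensity.

```python
def normalized_pvi(pairs):
    """Normalized PVI over an explicit list of (a, b) value pairs."""
    if not pairs:
        raise ValueError("no usable pairs")
    return 100 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


def pitch_is_usable(max_pitch_by_vowel, max_undefined=0.10):
    """Retain a speaker only if fewer than 10% of pitch values are undefined."""
    undefined = sum(1 for p in max_pitch_by_vowel if p is None)
    return undefined / len(max_pitch_by_vowel) < max_undefined


def pitch_npvi(max_pitch_by_vowel):
    """Pitch nPVI over successive vowels, skipping pairs with an undefined
    member (an assumed policy)."""
    values = max_pitch_by_vowel
    pairs = [
        (a, b)
        for a, b in zip(values, values[1:])
        if a is not None and b is not None
    ]
    return normalized_pvi(pairs)


# Toy per-vowel maximum pitch values in Hz, invented for illustration.
speaker = [110, 120, None, 115, 130, 125, 118, 122, 119, 121, 117]
if pitch_is_usable(speaker):
    print(round(pitch_npvi(speaker), 2))
```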
RESULTS 1: DURATIONAL VARIATION
We first consider pairwise durational variation. Our sample of vocalic PVI has a bell-shaped distribution (see Appendix) but is not strictly Gaussian. The data shows a gradual and significant reduction in vocalic PVI over the history of NZE (Figure 2) – or, in traditional terms, a move toward increased syllable timing (Spearman's rho = −.3086, p < .0000).
The overall mean nPVI(V) was 65.3, with a range of 51.5 to 82.5. By way of comparison, Grabe and Low (2002:Appendix) calculated an nPVI(V) of 57.2 for a speaker of BE in their original study.
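For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks, which is why it suits data that are not strictly Gaussian. The pure-Python sketch below illustrates the calculation on invented values; it is not the R code used for the analyses, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented birth years and nPVI(V) scores, for illustration only.
dob = [1860, 1890, 1920, 1950, 1980]
npvi_v = [70.1, 68.4, 69.0, 64.2, 62.5]
print(round(spearman_rho(dob, npvi_v), 3))
```

A negative value indicates that later-born speakers tend to have lower scores, which is the direction of the trend reported above.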
The Effects of Speech Rate on Durational Variation
Turning to observations of speech rate (Figure 3), we find a positive, almost linear correlation between DOB and syllable rate across the corpora, which levels off for speakers born in the 1970s and 1980s (Spearman's rho = .2864, p < .0000). A good deal of the reduction in vocalic nPVI is linked to this increase in speech rate, despite normalization, which is intended to address this issue (Spearman's rho = −.5955, p < .0000). The qualitative observations that more unstressed vowels are fully articulated in NZE (Hay et al., 2008; Warren, 1998) are unexpected given this higher syllable rate. Ainsworth's (1993) auditory analysis of newsreaders actually found a similar rate of full vowels in speech from commercial NZ radio stations and speech from the BBC World News – about 20% – even though the BBC speech rate was around one syllable per second slower than the NZ commercial radio stations' speech rates (Warren, 1998; using the same data). The relationship between vowel realization and speech rate is not straightforward, as Warren observed, but there is growing evidence that faster speech reduces pairwise durational variation, particularly for languages traditionally labelled stress-timed (Krull & Engstrand, 2003; Szakay, 2008).
Evidence that an increased syllable rate – which may at least in part be driven by the decreasing average age of speakers across our corpora – is not solely responsible for the observed reduction in nPVI(V) comes from the different patterns in rate change by sex (Figure 4). The absolute starting value and the increase in rate over time are similar for both male and female cohorts. A Wilcoxon rank sum test revealed no significant difference in the speech rate of the sexes (p = .1095). Speech rates level out for males born from the 1950s on, but in Figure 5, we see that male speakers born in this period have progressively reducing nPVI(V) values. In contrast, the increase in speech rate for females is continuous over the period, while there is very little movement in nPVI(V) until the 1930s.
Because speech rate is an important predictor of nPVI(V), a linear regression model was constructed in R to test the factors affecting it. Data from MU and IA was combined to improve the sample size, as the elicitation parameters were similar (n MU+IA = 131, n CC = 375). The binary variable bCorpus, along with DOB and Gender, was included as a predictor for the full cohort. Only DOB reached significance (p = .0085). In a model for CC only, which included Class and a quadratic term for DOB, only DOB and DOB² were significant predictors. Neither model, however, had any useful explanatory value (R² = .0805 and .0368, respectively). If DOB was replaced by its highly correlated variable Age Interviewed, model explanatory value was reduced (R² = .0722 and .0317, respectively).
Social Factors and Durational Variation
When the data is split by sex (Figure 5), we find females have a lower starting nPVI(V) than males do. There is a smaller downward trend in the nPVI(V) of the females; males have undergone the greatest reduction over the period evaluated. If we assume a typical S-curve for a change in progress (Chambers, 2004), the fact that males are in a steeper part of the curve is consistent with the hypothesis that the change in rhythmic timing is more advanced amongst women. This only holds if we assume a similar starting value of nPVI(V) at some point prior to the time period covered by our dataset. An anonymous reviewer points out that the data also supports a hypothesis of male-led change, if women had a lower initial nPVI(V) and did not begin to shift until the 1920s. When the 20th century-born CC speakers are isolated (n = 375), the negative correlation between date of birth and nPVI(V) is no longer significant for females (n = 188), while the negative correlation for males holds (Spearman's rho = −.1825, p = .0124, n = 187).
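As an aside on the statistic reported throughout, Spearman's rho is the Pearson correlation of the ranks of two variables, so it captures monotone trends without assuming linearity. The following is a minimal sketch with tied values given averaged ranks; the function names and the data are hypothetical, and the sketch does not compute the accompanying p-values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical birth years and nPVI(V) scores (illustrative only).
dob = [1905, 1920, 1938, 1951, 1966, 1979]
npvi_v = [61.0, 63.5, 58.2, 59.0, 55.1, 54.3]
print(round(spearman_rho(dob, npvi_v), 3))
```

A negative value here corresponds to the pattern described for male speakers, in which later birth years tend to go with lower durational variability.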
CC speakers are tagged as professional or non-professional (NP) speakers, giving a gross binary categorization of social class. There is a decline in the vocalic nPVI of non-professional speakers across the 20th century. NPs start from a higher baseline than professionals, suggesting that the change is more advanced amongst the latter, who also have shorter mean vowel durations (Wilcoxon rank sum p = .0416, n = 184). The decrease amongst NPs is significant (Spearman's rho = −.2115, p = .0033, n = 191). Combining the factors of sex and class, NP males have had the greatest and most consistent decrease in nPVI(V), from the highest starting point. Professional males and NP females born in the 1950s and 1960s pattern together, showing a slight increase in durational variation versus earlier and later speakers. There has been no change in the speech of professional females.
To more clearly establish the effect of different social factors, a Classification and Regression Tree (CART) analysis, which recursively partitions the data in order to produce a best-fit predictive model for the outcome variable, was carried out on the CC data. The resulting regression tree is reproduced in Figure 6 and shows that for the fastest speakers (SyllablesSec > 5.654) speech rate is the overriding determinant of nPVI(V); in other words, in fast speech durational variation varies inversely with syllable rate.
Once again, we note this effect obtains despite normalization of the metric. Class partitions mid-rate speakers (5.125 ≤ SyllablesSec < 5.654), with professionals having a lower nPVI(V) than NPs. Amongst the slowest speakers (SyllablesSec < 5.125), the latest-born have moderate durational variability, while Gender divides the rest into less-variable females and more-variable males. This suggests that there are two sets of principles at work: physiological/articulatory factors, which reduce acoustic distinctions between stressed and unstressed vowels as speech rate increases and segment length decreases; and social factors, which are overridden in faster speech and only come into play when the need for strict efficiency in articulation is relaxed.
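The partitioning step behind a regression tree can be sketched as follows. This is a minimal illustration that assumes the usual least-squares splitting criterion; it is not the routine used to produce Figure 6, and the function names and data values are hypothetical. A full CART analysis repeats this search over every predictor and then recursively within each resulting subset.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return the threshold on x that minimizes the summed SSE of y in the two halves."""
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical syllable rates and nPVI(V) scores (illustrative only).
rate = [4.6, 4.9, 5.0, 5.2, 5.4, 5.7, 5.9, 6.1]
npvi_v = [62.0, 61.0, 63.0, 60.0, 61.5, 54.0, 53.0, 52.5]
threshold, cost = best_split(rate, npvi_v)
print(threshold)
```

In this toy data the chosen threshold falls between the slower speakers with high variability and the faster speakers with low variability, analogous to the first rate-based partition described above.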
Register and Durational Variation
Arvaniti (2009) discussed the effect that manner of elicitation can have on the value of timing metrics. Level of formality and degree of spontaneity can affect both speech rate and care in articulation (though not always in the same way across speakers). The three different corpora included in the study were recorded for different purposes. Both the MU and the IA data were elicited by professional interviewers: the former for radio broadcast, the latter primarily for oral history projects. The interviewer was not personally known to the speakers, and some speakers had prepared for their interview by writing down or giving considerable thought to what they wanted to say. In contrast, CC speakers were interviewed by students, usually a friend or family member, in informal circumstances. It is valid to ask whether a difference in register exists between the corpora. If so, this might be related to the change in speech rate and by implication nPVI(V). In the model for speech rate discussed earlier, however, no effect of corpus was obtained. There is no evidence that the increase in speech rate across the ONZE speakers can be attributed to the different circumstances of data collection in the corpora, though it is very likely that the difference between MU/IA and CC in average age of the interviewed speakers has at least some effect.
Modeling Durational Variation
All of the factors observed to affect variation in vocalic durations were combined in a linear regression model to predict nPVI(V). The first model is for the entire cohort of analyzed speakers and does not include social class. The coefficients and an analysis of variance (ANOVA) for the model are given in Tables 3 and 4.
As expected, an increase in speech rate (SyllablesSec) results in a significant reduction in variability of vowel durations, as all vowels are shortened. Higher values for birth year reduce variability, and this effect is increased for males. Males start from a higher baseline for nPVI(V) than females do, which is reflected in the larger coefficient for Gender = M.
The second model is based on the CC data only and includes social class. Coefficients and the results of an ANOVA for the model are given in Tables 5 and 6. Similar effects obtain, with the addition that a higher social class further reduces nPVI(V). An interaction between DOB and Gender is also present, such that later-born males have lower variability.
Articulation of unstressed vowels
Is the data congruent with the proposal that New Zealanders produce more fully articulated vowels in unstressed speech? As our data does not partition stressed and unstressed vowels, we can only approach this question obliquely. There is a reduction in mean vowel length over time in NZE, partly as a result of increased speech rate. However, over and above the effect of speech rate, DOB is a highly significant and sizeable predictor of shorter mean vowel length. A comparison of effect size can be obtained by multiplying the mean speech rate of 5.1 syllables/sec and the mean birth year of 1945 by their respective coefficients in Table 7 for a CC speaker. DOB clearly has the most substantial effect.
Coefficient significance is given in Table 8. This reduction in duration is not suggestive of a general move to a higher proportion of fully articulated vowels in NZE. A future comparison of durative difference and vowel quality between stressed and unstressed syllable nuclei may provide further insight into this question.
DOB and bCorpus are not independent (see Methodology), so variance inflation factors were calculated for each parameter in the model (Table 9). While there is clearly collinearity, the magnitude of inflation was not deemed to invalidate the model given the sample size, R², and the model's descriptive intent (O'Brien, 2007). Rather, exclusion of either bCorpus or DOB results in a model that greatly overstates the effect of the remaining predictor.
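For reference, the variance inflation factor for a predictor is 1/(1 − R²), where R² comes from regressing that predictor on the remaining predictors. The sketch below assumes the simplest case of only two predictors, in which R² reduces to the squared Pearson correlation between them; the model discussed here has more predictors, and the function names and values are hypothetical and do not reproduce Table 9.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with one other predictor, R^2 is r squared."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical birth years and a binary corpus flag (illustrative only):
# earlier-born speakers fall mostly in corpus 0, later-born in corpus 1.
dob = [1890, 1900, 1915, 1925, 1940, 1955, 1970, 1985]
b_corpus = [0, 0, 0, 1, 0, 1, 1, 1]
print(round(vif_two_predictors(dob, b_corpus), 2))
```

A value of 1 indicates no shared variance between the predictors, and larger values indicate that the standard error of a coefficient is inflated by the overlap.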
RESULTS 2: PITCH AND INTENSITY VARIATION
To address our final research question, regarding a shift in rhythm quality, we now turn to intensity and pitch PVIs, which are plotted against DOB in Figure 7.
The increase in intensity nPVI with DOB is significant overall (Spearman's rho = .1295, p = .0037, n = 501) and for males alone (Spearman's rho = .1588, p = .0116, n = 252), but not for females. The effect is centered in the earlier speaker cohort (MU + IA: Spearman's rho = .1588, p = .0116, n = 131), as no significant relationship between intensity nPVI and DOB is found in CC. There is no direct correlation between intensity nPVI and speech rate.
Historically, there is a near-significant pitch nPVI decline from earlier- to later-born male speakers (Spearman's rho = −.1464, p = .0584, n = 168). In contrast, if CC speakers are considered separately (Figure 8), there is a significant increase in pitch variability from earlier- to later-born female speakers (Spearman's rho = .1873, p = .0322, n = 131).
Males appear to be following the females after a two-decade lag (Figure 8); drilling down in the data reveals that the reversal is occurring amongst the youngest NP males, but no further significant correlations arise due to the recent change in direction of the historic downward trend. There is no correlation between speech rate and pitch nPVI. Pitch nPVI is weakly but significantly affected by both mean intensity (Spearman's rho = .1508, p < .0044) and intensity nPVI (Spearman's rho = .1252, p < .0184), extending findings that as speakers increase their vocal intensity, their fundamental frequency also rises (e.g., Buekers & Kingma, 1997; Gramming et al., 1988; Jessen, Köster, & Gfroerer, 2005).
The data suggests that the decline in durational variability in NZE may have been compensated for in different ways by the sexes, with earlier-born males using intensity and later-born females using pitch to bolster the contrast between stressed and unstressed syllables. We note, however, that there are interactions among duration, pitch, and intensity that are independent of DOB and consider this further in the next section.
RELATIONSHIPS AMONG VOWEL PITCH, INTENSITY, AND DURATION VARIABILITY
Can a shift in the composition of rhythm in NZE be identified? If there is a shift, we must further consider to what extent it could be considered compensatory reconfiguring of the acoustic-phonetic components of stress, and to what extent it simply follows from physiological-articulatory principles. Intensity variation has been shown to increase over the period that durational variation has declined. However, at an individual level, speakers' duration and intensity nPVIs are positively correlated. This paradoxical result is partially resolved by observing that there has been a gradual decoupling of the two measures across the history of NZE. The correlation between duration and intensity variability is strongest in the earliest corpus (MU: Spearman's rho = .4209, p = .0011; IA: Spearman's rho = .2517, p = .0305; CC: Spearman's rho = .1417, p = .0061). In CC, amongst twentieth century speakers, positive correlations are only significant for professionals, and for women. Thus while a decrease in durational nPVI has historically gone hand-in-hand with a decrease in intensity nPVI, for certain demographics this no longer holds. Tentatively, we speculate that this may result from a compensatory increase in the use of intensity variation for rhythmic purposes amongst male speakers.
Durational nPVI and pitch nPVI analyses are nonsignificant but suggestive: In CC, amongst NPs, the two trend together for males (who have a near-significant negative correlation between pitch nPVI and DOB) and oppose for females (who have a significant positive correlation between pitch nPVI and DOB). Over time this implies a related decrease in duration and pitch variability amongst men, contrasting with a compensatory increase in pitch variability amongst women as durational PVI has decreased. This difference between the sexes is discussed further below.
To help simplify and visualize the various interactions of duration, intensity, and pitch in NZE, a principal components analysis (PCA) was carried out.
PRINCIPAL COMPONENTS ANALYSIS
The full dataset was used, discarding speakers with more than 10% undefined pitch measurements (retained n = 354). The cohort was then split by sex, and a PCA carried out on eight continuous measures: nPVI.V, rPVI.C, PVI.Int, Mean.Int, PVI.Pitch, Mean.Pitch, DOB, and SyllablesSec. Only the first three components account for a greater than average proportion of variance in the data (F: [29.6%, 19.5%, 13.9%] > 12.5%; M: [32.3%, 20.1%, 13.2%] > 12.5%). The first two components, accounting for about half the variance in the data, are plotted in Figure 9, while the composition and relative importance of the components are shown in Table 10.
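The retention criterion can be checked directly from the reported proportions. The sketch assumes that the PCA was run on standardized measures, so that each of the eight variables contributes one-eighth of the total variance; only the percentages quoted above are used, and the variable names are illustrative.

```python
N_VARIABLES = 8  # measures entered into each PCA, as listed in the text

# Percent of variance for the first three components, as reported in the text.
reported = {
    "F": [29.6, 19.5, 13.9],
    "M": [32.3, 20.1, 13.2],
}

# With standardized variables each contributes 1/8 of the total variance,
# so a component is "above average" when it exceeds 100 / 8 percent.
threshold = 100 / N_VARIABLES

for sex, shares in reported.items():
    retained = [s for s in shares if s > threshold]
    first_two = round(shares[0] + shares[1], 1)
    print(sex, threshold, len(retained), first_two)
```

The sums for the first two components come to roughly half of the total variance for each sex, in line with the description of Figure 9.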
There are three main poles running through the data. The nPVI.V, rPVI.C ↔ SyllablesSec pole confirms findings that speech rate and durational variability work against one another, with a consistent effect amongst male and female speakers. The PVI.Int ↔ Mean.Int pole shows that variability in intensity moves in the opposite direction to average intensity, again consistently for all speakers. The third pole involves DOB and pitch measures, and differs by sex: for females, mean pitch and DOB move together, while for males they move in opposite directions. Pitch nPVI also opposes DOB for males, while for females it appears to align with durational measures in opposing speech rate. For both genders, the DOB ↔ pitch pole is obliquely in the same direction as both speech rate and intensity nPVI, and obliquely opposed to durational nPVIs.
The finding that women's mean pitch increases if they are born later, while later-born men have lower mean pitch and pitch variation, is almost certainly a result of the age composition in our corpora. The third component, shown in Table 10, highlights the sex difference in metric changes, with PVI.Int, PVI.Pitch, and Mean.Pitch moving in the same direction as DOB for females, but in the opposite direction to DOB for males.
In sum, we suggest that a shift in the balance of timing, intensity, and pitch may be in progress in the rhythm of NZE, but we acknowledge that questions remain about the contribution of the age profile of our corpora to the observed trends.
CONTACT-BASED INFLUENCES ON THE RHYTHM OF NZE
Returning to the research questions posed earlier, we find that there has been a change in the timing of NZE as it has evolved, and this timing change may be responsible for the perception of NZE as more syllable-timed than other varieties of English. While an increasing speech rate is a major driver of the change, it is not the sole cause, as shown by the significance of DOB in our models. Analysis of social factors suggests that the change has been led by women, and that it has reached maturity amongst the most recently born female speakers. Non-professionals, particularly males, have followed. This class effect is suggestive of change from above.
A qualitative investigation of the effects of stress on the NZE vowel space is necessary to flesh out the findings here, but certainly the quantitative evidence does not lend support to the idea that New Zealanders in our corpora are producing more fully articulated vowels in unstressed syllables. This runs contrary to theories that contact with te reo Māori (the Māori language spoken by indigenous New Zealanders) may have affected the rhythmic qualities of NZE. There are also other considerations that weaken such a theory. First, the long-short distinction in the vowel system of Māori has been diminishing over its recorded history (Harlow et al., 2009; King et al., 2009). If vowel length is becoming less critical for linguistic meaning in Māori, we suppose that restrictions on the use of duration as a stress variable might be relaxed. Indeed, NZE appears to be having a greater effect on the rhythm of Māori than the reverse. Vowel reduction, which was previously unattested in Māori, has been increasingly noted in recent years (King et al., 2009). These changes may be contributing to perceived changes in the rhythm of Māori over time (Ibid.).
Further, while revitalization efforts since the 1980s have increased Māori's social (Te Puni Kōkiri, 2010) and political standing, a lack of support during the early and mid-twentieth century saw a substantial shift toward English by ethnic Māori, so that the number of fluent te reo Māori speakers declined dramatically. It was during this period of decline, while the influence of te reo Māori and the visibility of those speaking it were at their lowest, that the change in durational PVI was at its height. According to the New Zealand Government, the first national Māori language survey in 1973 estimated the proportion of ethnic Māori able to hold a conversation in te reo at just 18%. While the proportion rose to 24% in the 2006 general census (Ministry of Social Development, New Zealand Government, 2010), it still represents only 3.5% of all New Zealanders. Furthermore, the speakers in the ONZE corpora were almost all from the South Island of New Zealand. Socially and politically more conservative than the North Island, the South Island also has, historically (McLintock, 1966) and presently (Statistics New Zealand, 2010), a far lower proportion of ethnic Māori than New Zealand's north has.
While direct contact with te reo Māori seems an unlikely source for rhythm changes in NZE, the idea that the timing of NZE is converging on features of Māori English provides a plausible vector for indirect contact effects. Māori English (ME) is spoken by New Zealanders who are integrated into ethnic Māori culture, in certain professions such as the armed forces (Harlow et al., 2009; Hay et al., 2008), and in stereotypically masculine domains such as certain sports (Maclagan, King, & Gillon, 2008). Commonly, speakers of ME do not speak Māori; the dialect substitutes as a linguistic marker of ethnicity (Holmes, 2005; King, 1995, 1999). ME is more syllable-timed than NZE is (Bauer, 1997) and uses fewer reduced vowels (Holmes & Ainsworth, 1996). Szakay (2008) found that ME speakers had a higher overall mean pitch than PE speakers did, but there was no difference in pitch range or standard deviation between the two groups.
ME has only recently been recognized as a bona fide dialect of English (Richards, 1970), and it is now increasingly visible in New Zealand: ME is used by national and regional radio announcers, by university students and lecturers, and on television (King, 1995, 1999). It is not clear, however, that ME has the social cachet to drive linguistic change in NZE. Speakers of ME were rated “warm” in Vaughan and Huygens's (1990) study, but on most personality and status measures, ME speakers have been ranked lower than speakers of other dialects of English in New Zealand (Holmes, Murachver, & Bayard, 2001; Robertson, 1994; Vaughan & Huygens, 1990). Nevertheless it is the primary dialect for many younger ethnic Māori (King, 1995, 1999) among whom it marks group membership, and its masculine and sporting associations may give it a degree of covert prestige amongst non-Māori youth. Meyerhoff (1994) noted a rise in use of the discourse tag “eh,” characteristic of ME speakers, amongst Pākehā youth. Covert prestige could provide a social motivation for the observed changes in pitch and duration variability, particularly for NP males.
The data presented in this paper, however, does not fit well with this proposal. Over time, mean pitch amongst male speakers has been decreasing, widening the gap between speakers of ME and PE. Pitch variation has likewise decreased amongst males, which would suggest divergence from ME. The late reversal in this trend is more plausibly a result of influence from ME, given that its timing coincides with Māori language revival; note however that this has been led by females. It has not happened in tandem with the decrease in durational PVI, which has taken place through the twentieth century and has in fact levelled out amongst the same female speakers. Is there any social basis for females taking the lead in assimilating qualities associated with the rhythm of ME? One possible explanation is the overwhelming majority of females in teaching and administrative roles in early childhood and primary education sectors. These sectors are leaders in Māori language use and education amongst non-Māori New Zealanders. Te Whāriki, the New Zealand government's early childhood curriculum policy statement, mandates the support and active encouragement of biculturalism, including support of te reo Māori (Ministry of Education, 2010). Women employed in, and involved voluntarily with, early childhood education may provide a point of origin for socially driven effects in the rhythmic qualities of NZE.
STRUCTURAL INFLUENCES ON THE RHYTHM OF NZE
Another possibility suggests itself as a driver for the observed change in timing in NZE, namely that changes in the realization of certain vowels in NZE have had a flow-on effect on its rhythm. The NZE short front vowel shift (Hay et al., 2008; Langstrof, 2006; Watson, Maclagan, & Harrington, 2000) has involved the raising of the TRAP and DRESS vowels. The KIT vowel has mid-centralized to maintain contrast with DRESS, to the extent that the KIT and schwa vowels are indistinguishable in the speech of many New Zealanders. As vowel reduction in English commonly results in centralized, schwa-like realizations, stressed vowels that are more central may not be strongly differentiated from their unstressed versions. This offers a plausible alternative for the source of reduction in durational variation: rather than more fully articulated vowels in unstressed syllables, there may be an increasing number of shorter, centralized vowels in stressed syllables. Langstrof (2009) found that, during the intermediate period of NZE, KIT reduced in duration and was no longer subject to allophonic length alternations before voiced and unvoiced plosives. This change was seen in the speech of later-born females, identified as the innovators in the front vowel shift (Langstrof, 2006). Furthermore, Maclagan and Hay (2007) found that the DRESS vowel was also shortening as it raised, before both voiced and unvoiced plosives. The FLEECE vowel was shown to be diphthongizing to maintain contrast with DRESS; even so, its length was holding before voiceless plosives and reducing before voiced plosives. Women have been found to lead in DRESS-raising, and Maclagan, Gordon, and Lewis (1999) showed that vowel innovation in NZE was more advanced amongst higher social classes.
The social patterning in the vowel shift fits with the observed change in durational PVI, and both changes occurred over the same time window.
Table 11 shows that KIT, DRESS, and FLEECE are three of the five most frequent vowels in NZE. In terms of raw frequency, nearly one-third of the vowels occurring in spoken NZE have been shown to have reduced in duration. Can this be reconciled with earlier findings that NZE speakers produce a higher proportion of fully articulated vowels than speakers of other Englishes? Yes, because measures of reduction can only be calibrated against the articulation of stressed vowels, and if these are centralized or shortened, then further reduction is limited. The natural limit of reduction is of course ellipsis, a phenomenon that our data does not address but which may well occur to a lesser degree in NZE, and thereby contribute to perceptions of more “full” vowels.
If such an explanation were correct, it would have interesting implications for the understanding of rhythmic timing. Languages' phonotactic structure and phonetic qualities may play a greater part in determining timing differences than previously thought. There have already been links made between allowable syllable complexity and rhythm differences between languages (Dauer, 1983; Fenk & Fenk-Oczlon, 2006; Mehler, Christophe, & Ramus, 2000), but this has generally focused on the effect of complex onsets and codas. The role of vowel quality, as well as the degree of vowel reduction, perhaps deserves further scrutiny cross-linguistically.
SEX DIFFERENCES IN RHYTHMIC CHANGES
Decreasing durational nPVI may have been driven by changes in the vowel system of NZE, with females leading the change and males following – a commonplace pattern for linguistic innovation. Changes in other nPVI measures diverge for the different sexes. Intensity differentials between neighboring vowels have increased over time for males, with nonprofessionals showing the largest change. On the other hand, female speakers seem to be increasing their pitch variation, perhaps to compensate for the loss of durational stress. A further question elicited by the data is then: Why would females compensate for reducing durational nPVI by increasing pitch variation, while males compensate by increasing intensity variation? Aside from social effects, we speculate that the different compensatory patterns may have a physiological basis. The lower pitch and more compact vowel space of males are in part due to their having physically larger and heavier vocal tract structures than females. This creates differences in the dynamics of speech production between the sexes (Simpson, 2001). Females appear to be able to make faster pitch changes than males (Xu & Sun, 2002), for whom a reduction in pitch nPVI may be an automatic tendency following from faster speech rates. As always, physiological tendencies may be reinforced or moderated by sociocultural norms and individual proclivities, and this may be behind the recent increases in the pitch variability of later-born males.
CONCLUSIONS
Methodologically, this study has broken new ground by utilizing automated techniques to amass rhythmic timing data on a far larger scale than has previously been possible. This has enabled us to examine predictors of timing variability within a dialect. Development of quality assurance techniques is essential to providing principled and reliable data that can be used to provide robust statistics on the factors involved in language change.
We have shown that vocalic nPVI measures approximate a Gaussian distribution in NZE. In this dialect at least, variation in timing is strongly associated with speech rate, even when metrics are normalized. Further, as NZE has evolved as a dialect, vocalic nPVI has reduced, with the change led by female speakers. Thus, vocalic nPVI is neither synchronically nor diachronically fixed. Based on social patterning and concurrent timing, the change may result from shorter vowel realizations in NZE, following vowel shift, rather than from contact effects. These observations support a perceived difference in the rhythm of NZE compared to other Englishes, but they need to be supplemented with qualitative analysis of vowel properties.
The nPVI measure has some significant limitations. It is extremely localized, and as such a reduction in nPVI does not take into account larger metrical domains over which intensity and pitch in particular may vary. While comparisons between intervallic and intersegmental nPVI were made for normalized vowel durations, they were not carried out for other measures. There is no reason to assume that pitch and intensity variation would be limited to the segmental level. Future directions for this research include improving principled removal of outliers from automatically segmented speech; analysis of the duration and quality of stressed versus unstressed vowels across the ONZE corpora; and methodological comparisons of segmental versus other partitioning for the measurement of pitch, intensity, and intervocalic nPVIs.
APPENDIX
COMPARISON OF SEGMENTAL AND INTERVALLIC PVI
The normalized vocalic distributions for nPVI appear Gaussian (Figure A1), but are not normally distributed (Shapiro-Wilk: intervallic W = 0.9896, p = 0.0011; segmental W = 0.9917, p = 0.0052). The raw PVI measures used for consonants are positively skewed across the sample.
Table A1 provides statistics on an index of intervallic to segmental PVI. It can be seen that the distributions are similar for both vowels and consonants, but the magnitude of the intervallic measure is considerably larger for consonants than it is for vowels.