1 Introduction
One task for participants in everyday interaction is to manage entry into and exit from talk on a moment-to-moment basis. They have to construct turns at talk and shape them so that they provide recognisable places for others to take turns. They also have to be able to recognise features of turn design so they can place their incoming talk appropriately. Typically talk-in-interaction is characterised by one participant talking at a time (Sacks, Schegloff & Jefferson 1974, Jefferson 1984, Schegloff 2000). Ordinarily, turn-transition – the transfer of speakership from one participant to another – becomes relevant at what is referred to as ‘possible turn completion’, or a ‘transition relevance place’ (TRP). Analysts of talk-in-interaction have suggested that syntactic constituency (the boundaries of ‘turn constructional units’ in Conversation Analytic terminology) is a key resource for the projection of possible turn completion (Sacks et al. 1974, Jefferson 1984, Ford & Thompson 1996, Selting 1996). Attention has also been drawn to the phonetic design of talk as an important resource which can make available the current and projected status of talk-in-progress. To date, such investigations have typically focussed on pitch and other prosodic phenomena, particularly pitch-accents (e.g. Schaffer 1983, Ford & Thompson 1996, Schegloff 1998, Fox 2001, Wennerstrom & Siegel 2003, Szczepek Reed 2004) and there has been little in the way of sustained analysis of other phonetic parameters.
This is despite a number of studies having demonstrated the relevance of features of articulation, phonation, and duration to turn-taking. Local, Kelly & Wells (1986), for example, argue that centralisation of vowels, combined with particular duration characteristics over the last metrical foot of turns, is implicated in unproblematic turn-transition. Ogden (2001, 2004) demonstrates that non-modal phonation (in particular, creak, breathiness, whisper) plays a role in managing turn-taking in Finnish; Local & Kelly (1986) and Local & Walker (2005) show that different kinds of articulatory and laryngeal characteristics (e.g. complete glottal closure and hold versus its absence), in combination with particular lexical items, mark that a speaker has finished talking or projects that there is more talk to come from that speaker.
Further motivation for studying a broader range of phonetic features comes from a recent large-scale investigation into the role of pitch in raters’ judgments of the status of experimental stimuli as projecting ends of turns. In an online experiment, de Ruiter, Mitterer & Enfield (2006) manipulate the lexicosyntactic content and pitch of interactive Dutch speech, and conclude that ‘lexicosyntactic structure is necessary (and possibly sufficient) for accurate end-of-turn projection’ (2006: 531) by their raters. One of their findings which does not receive analytic discussion is that when played samples with no dynamic pitch movement (through experimental manipulation) raters do not consistently select the first point of possible syntactic completion in an utterance as a point of possible turn completion. (See especially their Figure 3, and discussion on pages 525–526.) This gives rise to the following puzzle: if lexicosyntactic structure provides the basis for the location of points of turn completion, why do raters not consistently select the first point of possible syntactic completion in an utterance as a point of possible turn completion? Given that no visual information was available to the raters, the basis for selecting some points of lexicosyntactic completion and not others as points of possible turn completion must reside in what raters could still hear, i.e. had not been subjected to experimental manipulation. We hypothesise that the explanation for the raters’ decisions is the presence of other phonetic features which are part of the lexicosyntactic makeup of the talk, i.e. articulatory, phonatory and durational detail.
In what follows we discuss phonetic resources other than pitch which project more talk from the same speaker (= talk-projecting features), or which project possible turn-transition (= turn-projecting features). Talk-projecting features include the avoidance of durational lengthening, articulatory anticipation, continuation of voicing, and the reduction of consonants and vowels. Turn-projecting features include the converse of each of the talk-projecting features, and two other distinct features: release of word-final plosives, and the production of audible outbreaths. We show that features of articulatory and phonatory quality and duration are relevant factors in the design and treatment of talk as talk- or turn-projective.
2 Data
The analysis is based on a naturally occurring, 12-minute long telephone call between two adult speakers of British English: Hal (male) and Leslie (female); names are pseudonyms. The call was selected from the Field Corpus of telephone calls made to and from the home of an English family (Drew & Holt 1998). We chose to work with two-party telephone interaction for two reasons: the turn-taking system is more clearly exposed than in multi-party talk, and all resources available to the participants are audible. The main reason for selecting a single interaction (rather than working with a large corpus of interactions) is that it mitigates some of the confounding factors which arise in bringing together data from interactions conducted at different points in time, in different contexts and with different speaker identities (all of which have been argued to have a bearing on the precise design of talk).
3 Methods
It is widely acknowledged that the syntactic make-up of utterances plays a role in their treatment by participants as complete, and therefore is implicated in the management of turn-taking even if the relative importance of syntax in this respect is not uncontentious (Duncan 1972; Sacks et al. 1974; Goodwin 1979, 1981; Lerner 1991; Ono & Thompson 1996; Schegloff 1996; Selting 1996; Wennerstrom & Siegel 2003; de Ruiter et al. 2006). Currently, our knowledge of the general phonetic design of utterances is less refined and does not provide us with the entrée to the data that syntax does. For example, there is considerably less consensus as to what might constitute ‘possible completion’ in the phonetic domain than there is around what constitutes possible syntactic completion. Therefore, as the first step in the analysis, we took the decision to focus exclusively on the syntactic make-up of turns and identify all points of possible syntactic completion (SYNCOMP). These were determined directly from a transcription by Gail Jefferson (see Appendix A; see also Jefferson 2002 for a more complete statement of conventions, and Walker in press for a critique of this approach to transcription). The modified orthographic transcriptions presented here are taken directly from Jefferson's transcription, except where Jefferson's transcriptions are at odds with our impressions and would hinder a reader trying to follow our argument. These occasions are extremely rare.
Employing the same criteria as Ford & Thompson (1996: 143–145), we determined SYNCOMPs incrementally from the beginning to the end of the call, without reference to the audio recording and without considering the ways in which Jefferson's modified orthography is intended to reflect pronunciational nuances. As the interaction progressed we asked whether the turn up to that point could be syntactically complete in the interactional context of the preceding talk, without reference to phonetic design.
Fragments (1)–(3) provide illustration of some SYNCOMPs. The words that are in bold are the final words in a SYNCOMP piece. For clarity of presentation, SYNCOMPs are identified in the talk of one speaker in each case.
(1)
(2)
(3)
Fragment (1) shows that Hal's turn at lines 14–15 has a SYNCOMP at ‘but you see it was such a beautiful day’. The turn at line 9 in (2) exhibits three such SYNCOMPs: one at ‘yes’, another at ‘yes thank you’ and a third at ‘yes thank you Hal’. Multiple possible syntactic completion points are illustrated in Hal's turn at lines 24–25 in (3). It is important to note that SYNCOMPs are established on an ‘in-context’ basis. Not all grammatically complete utterances end with a SYNCOMP, and therefore not all grammatically complete utterances constitute SYNCOMP pieces. For instance, Hal's turn at line 23 in (3) contains ‘uhm- (0.3) we walked’ and ‘e-we only walked’. These could constitute SYNCOMPs in other interactional-sequential contexts (e.g. following a turn such as ‘did you swim and walk?’) but they do not here.
We identified and labelled all SYNCOMPs. Five hundred and eighty-seven SYNCOMPs (99.3% of the final total) were identified independently by each author; it was agreed that four cases identified by one author also constituted SYNCOMPs and would be included in the study. This yielded 591 SYNCOMPs in the whole call. Once the location of all SYNCOMPs had been determined, we annotated the audio file with word-labels and the SYNCOMP boundaries using the Praat speech analysis software (Boersma & Weenink 2012). At the same time we undertook a detailed interactional analysis of the whole call identifying in particular all cases of turn-transition in the clear, and all cases of talk in overlap along with their relationship to SYNCOMPs. We subjected each SYNCOMP piece to close inspection during repeated listening and visual inspection of the speech-pressure waveform and spectral, intensity and estimated f0 representations, to provide a systematic and detailed parametric phonetic analysis of each case. The results of these processes are reported in the following sections. We restrict our attention to the end of SYNCOMP pieces as this is in keeping with the tradition of research into transition relevance. By doing this we do not intend to suggest that all relevant features have as their locus or domain the end of SYNCOMP pieces.
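The agreement figure reported above follows from simple arithmetic on the two counts given. The Python sketch below is illustrative only: the function name and its parameters are assumptions introduced here and are not part of the original analysis.

```python
def agreement_rate(agreed: int, added_after_discussion: int) -> tuple[int, float]:
    """Return the final total and the percentage identified independently by both authors."""
    total = agreed + added_after_discussion
    return total, round(100 * agreed / total, 1)

# Counts taken from the text: 587 identified by both authors, 4 added by agreement.
total, pct = agreement_rate(agreed=587, added_after_discussion=4)
print(total, pct)  # 591 99.3
```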
The remainder of this article is divided into two main sections: Section 4 deals with features which project the production of more talk from the current speaker and Section 5 deals with turn-projecting features which engender transition relevance. A discussion of how these features may work in combination is given in Section 6.
4 SYNCOMP + no turn-transition: Talk-projecting features
This section describes the sequential patterns which arise in the data where there is no turn-transition and the current speaker goes on to produce more talk (Section 4.1), and their phonetic design (Section 4.2).
4.1 Sequential patterns
Of the 591 SYNCOMPs, 316 (53%) do not engender turn-transition. There are three types.
Type A. There is no turn-transition and the current speaker continues talking (n = 255, 81%; four cases could not be subjected to reliable phonetic analysis and are excluded from further consideration).
(4)
In (4) there is no turn-transition around any of the highlighted SYNCOMPs ‘(Towlett)’, ‘marvellous’, ‘gorillas’ and ‘there’.
Type B. There is no responsive talk to a SYNCOMP because A's SYNCOMP is reached before B's in-progress talk ends (n = 34, 11%).
(5)
In (5) there is no turn-transition in response to Hal's ‘marvellous’ (line 11) as its end is reached before Leslie has brought her ongoing talk to completion.
Type C. There is no responsive talk to a SYNCOMP because the SYNCOMP is reached while B is responding to earlier talk (n = 27, 9%).
(6)
In (6) at line 12 ‘another’ ends a SYNCOMP piece. There is no uptake. Leslie's talk, at line 13, in overlap with Hal's turn is responsive to the earlier SYNCOMP piece, ending with ‘carnival’.
From this point, where we discuss SYNCOMPs which do not engender turn-transition we only consider Type A cases. We do this for two reasons. First, this is the largest group without turn-transition. Second, in Type B and Type C cases the absence of turn-transition in response to the SYNCOMP may arise from the co-participant being already engaged in some other talk. Since there is no turn-transition in the vicinity of just over half of all SYNCOMPs in the data, it is appropriate to ask whether this is a contingent issue – a co-participant simply opts not to begin talking on reaching a SYNCOMP – or whether those SYNCOMP pieces exhibit particular design features which render them as projecting more talk by the current speaker. These design features are described in the remainder of this section.
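The proportions for the three types can be recomputed from the counts given in Section 4.1. This is a minimal sketch in Python; the variable and function names are illustrative assumptions, and only the counts come from the text.

```python
TOTAL_SYNCOMPS = 591
no_transition = {"A": 255, "B": 34, "C": 27}  # counts as reported in the text

n_no_transition = sum(no_transition.values())

def pct(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Share of each type among SYNCOMPs without turn-transition.
type_shares = {t: pct(n, n_no_transition) for t, n in no_transition.items()}
# Share of all SYNCOMPs that do not engender turn-transition.
overall_share = pct(n_no_transition, TOTAL_SYNCOMPS)
# Four Type A cases were excluded from phonetic analysis.
type_a_analysed = no_transition["A"] - 4

print(n_no_transition, type_shares, overall_share, type_a_analysed)
```

Rounded to whole numbers, the type shares correspond to the 81%, 11% and 9% reported for Types A, B and C.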
4.2 Phonetic analysis
Parametric analysis reveals that a range of phonetic features regularly occur – either alone or in combination – in those SYNCOMP pieces which receive no turn-transition and the current speaker continues (Type A in the classification above). Eighty-three per cent of cases of SYNCOMP with no turn-transition have one or more of the properties described in the next sections. We deal in turn with duration, juncture (including articulatory anticipation, continuation of voicing and close proximity) and vowel and consonant reduction. Details of the occurrence and co-occurrence of these features are given in Appendix B.
4.2.1 Duration
One way in which a speaker can project more talk is by manipulating duration and, in particular, by avoiding the sorts of ‘lengthening’ phenomena evident in other environments (Wightman et al. 1992, Salverda, Dahan & McQueen 2003, Local & Walker 2004, Turk & Shattuck-Hufnagel 2007). Providing robust, quantified, comparative measures of duration is problematic when working with naturally occurring materials: syllable and word structure, accentual patterning, position in utterance, speaker, overall speaking rate, information structure etc., are all things which cannot be controlled for and which, moreover, are known to impact on the durational characteristics of words and parts of words (Crystal & House 1988, 1990). Nonetheless, it is possible to compare some same-speaker productions of final words where turn-transition occurs (= final), with those where more talk from the same speaker follows without delay (= medial). Although this does not take into account all factors above (e.g. overall tempo, position in utterance), it does control for syllable structure and inter-speaker variation. There are 25 lexical items which occur in medial and final position which can be compared (in some cases with several tokens in either or both positions). In (7) and (8) ‘Canterbury’ occurs in medial (560 ms) and final (700 ms) position, respectively. In (7) Hal's talk at line 4 (‘well’) is a collaborative completion of Leslie's talk (Local 2005), thereby showing his orientation to Leslie's projected continuation beyond her production of ‘Canterbury’. In (8) Hal begins his talk on completion of a longer production of ‘Canterbury’.
(7)
(8)
From an interactional perspective, a speaker continuing to talk and a co-participant not beginning a turn after shorter versions provides interactional evidence that these shorter versions are deployed and treated as projecting more talk by the current speaker. Our analysis shows that when comparing medial and final tokens of the same word, final tokens are on average 65% longer than medial tokens (st.dev. = 77%, min = 1%, max = 325%). (Where there were multiple tokens of a particular lexical item in either/both positions, an average for that word in that position was calculated first.) We note that there is considerable variability in the relative duration of medial and final tokens. Medial tokens are regularly shorter than final tokens although they are not required to be significantly shorter in all cases. As we shall show, this is because duration is not the only phonetic resource which can be marshalled to project more talk.
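The two-step procedure described above (average each word within each position first, then compare final with medial) can be sketched as follows. This is an illustration under stated assumptions, not the authors' own script: the function name is hypothetical, the ‘example’ tokens are invented, and only the two ‘Canterbury’ durations come from the text. The sample standard deviation is used here; the text does not say which variant was reported.

```python
from collections import defaultdict
from statistics import mean, stdev

# (word, position, duration in ms). The 'Canterbury' durations come from the
# text; the 'example' tokens are invented purely for illustration.
tokens = [
    ("Canterbury", "medial", 560),
    ("Canterbury", "final", 700),
    ("example", "medial", 150),
    ("example", "medial", 170),
    ("example", "final", 240),
]

def percent_longer_by_word(tokens):
    """Average each word per position first, then compare final with medial."""
    durations = defaultdict(lambda: defaultdict(list))
    for word, position, ms in tokens:
        durations[word][position].append(ms)
    result = {}
    for word, by_position in durations.items():
        # Only words attested in both positions can be compared.
        if "medial" in by_position and "final" in by_position:
            medial = mean(by_position["medial"])
            final = mean(by_position["final"])
            result[word] = 100 * (final - medial) / medial
    return result

per_word = percent_longer_by_word(tokens)
values = list(per_word.values())
summary = (mean(values), stdev(values))  # mean and sample st.dev. across words
print(per_word, summary)
```

On these figures the final token of ‘Canterbury’ is 25% longer than the medial one, which sits well below the reported average of 65% and illustrates the variability noted above.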
4.2.2 Juncture
4.2.2.1 Articulatory anticipation
Articulatory anticipation at the end of a SYNCOMP piece is a resource that speakers can use to project more talk. In (9) Leslie's turn at line 20 contains three SYNCOMPs: ‘but that's lovely’, ‘we'll be there’ and ‘where is it’. The final (front) vowel of ‘lovely’ at the end of the first SYNCOMP piece gets audibly back and round towards its end in anticipation of the following labial-velar approximant at the beginning of ‘we'll’. This is reflected in the lowering of F2 from about 0.32 s (see Figure 1a, and compare a non-anticipating turn-final production in Figure 1c).
(9)
In (10) the final fricative of ‘gorillas’ at line 7 is audibly alveolo-palatal throughout anticipating the palatal beginning of ‘you’; see Figure 1b. Compare this with the non-anticipating turn-final production of ‘gorillas’ in Figure 1d; note particularly the differences observable in F2 during the final vowel and fricative of the anticipating form (rising and converging with a falling F3) when compared with the final non-anticipated form.
(10)
At line 9 of (11) ‘went’ at the end of the SYNCOMP piece does not have canonical voiceless apical closure at its end but voiced velar nasality. This velar articulation is held as voicing drops off for the voiceless velar plosive which begins the following word (‘cause’). F2/F3 velar ‘pinch’ is visible in the spectrogram towards the end of the vowel (around 0.14 s onwards).
(11)
4.2.2.2 Continued voicing across SYNCOMP joins
Of the SYNCOMP pieces which are followed by the production of more talk by the same speaker, there are 121 cases where the SYNCOMP piece ends in a canonically voiced segment and the post-SYNCOMP talk begins with a canonically voiced segment. One hundred and two (84%) of these cases have continued voicing from the first SYNCOMP piece into the talk which follows. This is significant as word-final voiced segments in other contexts are regularly devoiced (Laver 1994, Smith 1997, Gimson 2001). Smith (1997), for instance, reports that in an experimental study of devoicing of /z/ in American English, all tokens of sentence-final /z/ by all speakers were fully devoiced. Fragments (12)–(14) provide illustrations of voicing continuing from the first SYNCOMP piece into the talk which follows. At line 14 in (12) modal voiced phonation continues across the join between ‘area you’ and at line 16 modal voicing with labiality is continued across the join between ‘name which’.
(12)
(13)
‘Aspinall’ at line 15 in (13) ends with a velarized dental (rather than alveolar) lateral. Temporally extended dental laterality with voice continues across the join between ‘Aspinall the’ and constitutes the onset of ‘the’. See Figure 3b.
(14)
At line 10 of (14) there is continued voicing between ‘city and’. The final vowel of ‘city’ is monophthongal and noticeably open (as it moves to the next word, which begins with a more open vowel) – particularly when compared with similar vowels in other turn-final words (‘actually’, ‘lovely’, ‘mainly’), which are also diphthongal. See Figure 3c.
In (15) ‘go’ at line 2 ends a SYNCOMP piece. The vowel is short and ends with noticeably back rounded quality. Voicing continues across the join in ‘go we're’. See Figure 3d.
There are 16 cases that show continued voicing even though the segmental make-up would not predict voicing in canonical productions. For example, in a canonical production, ‘Kent’ would end with a voiceless plosive. However, Leslie's production of ‘Kent’ has no plosive at its end. It terminates with a long dento-alveolar nasal, accompanied by low frequency voicing, whose closure is released directly into the vowel of ‘this’.
(15)
At line 3 in (16), rather than producing the pronoun ‘he’ with initial voicelessness, Hal continues voicing across the join between ‘here’ and the pronoun which begins with low frequency creaky voicing.
(16)
4.2.2.3 Other kinds of close proximity
Close proximity of words and phrases in everyday speech can take a variety of forms (see e.g. Local & Walker 2004). For instance, the articulatory anticipation and continued voicing discussed in the sections above could reasonably be treated as instances of close proximity. Here we document some other regular occurrences towards the end of SYNCOMP pieces which are not treated as presenting the opportunity for turn-transition.
(17)
(18)
In (17) the final apical closure at the end of ‘right’, line 14, is held (for 80 ms) and is released at the beginning of ‘that's’. Hal's ‘really’ in (18), line 1, is very short overall (149 ms) with an extremely short final vowel (34 ms) followed shortly after (44 ms) by the inbreath ‘.hh’. In (19)/Figure 4b the apical closure for the voiceless plosive at the end of ‘flat’ is unaspirated and released directly into apical friction for the beginning of ‘still’ (compare below, where it is shown that turn-transition follows almost all SYNCOMP-final voiceless plosives released with aspiration).
(19)
4.2.3 Reduction of consonants and vowels
SYNCOMPs which are not followed by turn-transition regularly exhibit vowel and consonant ‘reduction’ around their ends. Thus, for instance, we find consonants at the end of such SYNCOMP pieces produced as fricatives rather than plosives or affricates, short and noticeably reduced (centralised) vowels, and vowels in final open syllables produced as monophthongs rather than diphthongs. The following fragments provide some exemplification:
(20)
There is no plosive at the end of ‘weekend’. It ends with a nasal [] with modal voicing which continues across the join between ‘weekend with’.
(21)
‘Damage’ at line 11 of (21) ends with weak lip-rounded friction (not affrication) which moves directly into the initial labial-velar constriction for ‘when’.
(22)
At line 13 of (22) Leslie's ‘go’ is produced with a short (147 ms), noticeably reduced vowel of central (slightly) rounded quality [] even though stressed. This is noticeably different from her other tokens of the same vowel in turn-final open syllables followed by turn-transition. These are diphthongal. There is also an audibly early transition to apical closure at the end of the vowel.
(23)
In (23) the vowel of ‘see’ is produced as a short (221 ms), open monophthong [ɪ] rather than the diphthong found in other turn-final tokens (compare this with Leslie's turn-final token in Figure 5b, which shows distinctive formant movement through the vowel). Voicing also continues across the join between ‘see’ and ‘and’.
In (24) the final consonant of ‘back’ is produced as a short, lax, voiceless velar fricative which is immediately followed by voicing for the first (somewhat close and front) vowel of ‘again’. See Figure 6.
(24)
4.3 Summary
Table 1 provides details of the relative frequency of occurrence of phonetic features and their association with turn-taking. (Note that due to co-occurrence of features single instances may appear in the counts in more than one row; see Appendix B for details of co-occurrences.) Those cases where there is no turn-transition and continuation of talk by the same speaker are presented in the centre of the table, and those where there is prompt and unproblematic turn-transition are in the rightmost column. There is an asymmetry in the distribution of these features with respect to turn-taking.
In 82% of the cases where speakers avoid lengthening at a SYNCOMP there is no turn-transition and the current speaker continues talking. In 83% of the cases where talk after a SYNCOMP is produced in close proximity there is no turn-transition and current speaker produces more talk. In 75% of the cases where anticipatory phonetics occurs there is no turn-transition and more talk is produced by the current speaker. In 82% of the cases where voicing is continued into further talk there is no turn-transition. In 79% of the cases where the SYNCOMP displays vowel or consonant reduction there is no turn-transition and we find more talk from the current speaker.
There is therefore a clear association between the occurrence of each of these features and the production of more talk by that speaker, rather than with turn-transition. We therefore refer to these as talk-projecting features: they adumbrate the proximal production of more talk by the same speaker. However, there are phonetic parameters which are regularly followed by turn-transition (turn-projecting features). Some of these are dealt with in the next section, which considers those SYNCOMPs followed by prompt and unproblematic turn-transition in more detail.
Before moving on to consider turn-projecting features it is worth noting that in 24% of cases where one or more talk-projecting features are present the co-participant talks. However, the kind of talk which occurs in this environment is highly constrained. In almost all cases the incoming talk does not represent a serious attempt to take an extended turn and consists of minimal continuers and agreement tokens (e.g. ‘mm’, ‘yes’, ‘no’, ‘that's right’), appreciations (‘oh good’) and laughter. In the remaining cases the incoming talk is designed with features of turn competition which display an orientation to the legitimacy of continuation by the current as opposed to the incoming speaker (French & Local 1983).
5 SYNCOMP + turn-transition: Turn-projecting features
Unproblematic turn-transition occurs in the vicinity of 275 (47%) of the 591 SYNCOMPs. Sixteen cases which could not be subjected to reliable auditory or acoustic analysis (because they occurred in overlap with louder talk, or were otherwise obscured) were excluded from further consideration.
We have already seen a case of in-the-clear (out of overlap) turn-transition following a short silence in (3) above. In (25) there is unproblematic turn-transition at lines 24–25 (Jefferson's ‘ = ’ symbolisation indicates that Hal's turn begins in especially close proximity to the end of Leslie's turn).
(25)
Fragment (26) illustrates a case of unproblematic turn-terminal overlap (on aspects of the orderliness of overlap, see Jefferson 1986, Schegloff 2000). At line 14 Hal begins his turn in overlap with the very end of Leslie's ‘know’.
(26)
As shown in Table 1 above, talk-projecting features do not typically appear at SYNCOMPs where turn-transition occurs. Thus, in only 12% of cases where a SYNCOMP engenders turn-transition is lengthening avoided (compare 56% of SYNCOMPs in the Type A cases, i.e. where there is no turn-transition and the same speaker continues), 17% are followed by turn-continuation in close proximity (compare 83% for Type A cases), and 10% exhibit reduction towards their ends (compare 38% for Type A cases). We also note in Table 1 two turn-projecting features: plosive release and aspiration, and audible outbreath. Turn-transition occurs in 95% of the target cases where the final word in a SYNCOMP piece ends in a voiceless plosive and the plosive is released and accompanied by aspiration. Turn-transition occurs in 87% of cases where the final word in a SYNCOMP piece is followed by an audible outbreath.
5.1 Variation in plosive release and aspiration
Voiceless plosives occurring word-finally at SYNCOMPs which are not turn-projecting display different characteristics from those which are. When plosives occur word-finally at the end of a SYNCOMP piece they may be released and, if voiceless, may also be aspirated (Local et al. 1986, Local 2004, Walker 2004). In the call, 111 SYNCOMP pieces end with word-final plosives; of these 87 are voiceless and 24 are voiced (see Table 2). Forty-one of the voiceless plosives and 11 of the voiced plosives occur at SYNCOMPs which engender turn-transition. Twenty-eight of the tokens which end in voiceless plosives are released with aspiration; all these tokens are treated as transition relevant and turn-transition occurs. Of the remaining 13 voiceless plosives that occur in the vicinity of turn-transition, the presence of aspiration cannot be determined in three cases because of overlapping talk, and one case is released into laughter which is joined by the co-participant. In the remaining nine cases there is no audible aspiration and the current speaker continues talking in overlap with the incoming talk (providing some evidence that the lack of aspiration is a design feature which projects more talk). Of the 11 voiced plosives which engender turn-transition eight are released. It is not possible to determine release for the remaining three cases due to early incoming, in-overlap talk.
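Because the breakdown above involves several nested counts, a quick consistency check may help the reader follow it. The sketch below, in Python, simply verifies that each set of sub-counts reported in the text sums to its stated total; the category labels and the helper function are illustrative assumptions.

```python
def parts_sum_to(total: int, parts: dict) -> bool:
    """True when the categories in parts account for every case in total."""
    return sum(parts.values()) == total

# All counts below are taken from the text; the labels are ours.
final_plosives = {"voiceless": 87, "voiced": 24}
voiceless_at_transition = {
    "released_with_aspiration": 28,
    "undetermined_overlap": 3,
    "released_into_laughter": 1,
    "no_audible_aspiration": 9,
}
voiced_at_transition = {"released": 8, "undetermined_overlap": 3}

checks = {
    "all_plosives": parts_sum_to(111, final_plosives),
    "voiceless_at_transition": parts_sum_to(41, voiceless_at_transition),
    "voiced_at_transition": parts_sum_to(11, voiced_at_transition),
}
print(checks)
```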
Fragments (27)–(29) provide examples of voiceless plosives produced by each speaker, with different places of articulation, but they are all released with aspiration, and occur at the end of SYNCOMPs which engender turn-transition.
(27)
(28)
(29)
The release of voiceless plosives at the end of SYNCOMPs which engender turn-transition is rather different from what can be observed where SYNCOMPs are followed immediately by more talk from the same speaker. For instance, among the cases already presented of SYNCOMP-final voiceless plosives where more talk from the same speaker follows immediately we have seen their deletion in (11)/Figure 2, their release without aspiration into following sounds in (15), (19)/Figure 4b, and their production as fricatives in (24)/Figure 6: all effects we do not observe where SYNCOMP-final voiceless plosives occur at the end of SYNCOMPs which engender turn-transition. Furthermore, of the 39 voiceless plosives occurring at the end of SYNCOMPs which are not followed by turn-transition, only two have release with audible aspiration.
5.2 Audible outbreath
In the data 50 SYNCOMPs are followed by audible outbreath. In five cases audible outbreath accompanies aspirated, voiceless plosives while 45 cases occur after other types of articulation. None of the SYNCOMPs figuring in Type A sequences are accompanied by final outbreaths. However, outbreaths are seen at SYNCOMPs which engender turn-transition: see, for example, (27)/Figure 7. Indeed audible outbreaths routinely follow words at the end of SYNCOMPs which are followed by turn-transition irrespective of whether they end in voiceless plosives (see also Walker 2004). In (30) Leslie produces an audible outbreath at the end of her ‘Saturday’, shown in Figure 8.
(30)
Audible outbreaths occur at the end of 19% of SYNCOMPs which engender turn-transition, but after less than 1% of those SYNCOMPs observed in Type A sequences where there is no turn-transition and more talk from the same speaker.
5.3 Summary
As well as showing talk-projecting features less frequently, SYNCOMPs which engender turn-transition often exhibit features very rarely found at the end of SYNCOMPs where there is more talk from the same speaker. Two of those turn-projecting features are (i) the occurrence of aspiration after SYNCOMP-final plosives, and (ii) the presence of final audible outbreaths.
We note that in our data turn-transition occurs in 27% (n = 24) of those cases where projecting features are present. In all cases where turn-transition precedes the production of the turn-projecting features we have described, the current speaker stops talking at the end of the SYNCOMP. While in these cases co-participants may be responding to features available earlier in the ongoing talk, the cessation of talk by the current speaker with these phonetic features provides some evidence that their talk has reached an appropriate completion point and signals that no further talk was intended.
6 On combining phonetic parameters
Some of the phonetic features which we have attended to separately may co-occur, as shown in Appendix B. Some combinations are not possible: continued voicing does not co-occur with aspiration of word-final voiceless plosives, and neither does anticipatory articulation. In one type of co-occurrence the features appear together due to the nature of the features analysed. For instance, continued voicing and anticipatory phonetics are both manifestations of close proximity. There are other cases, however, where one phonetic parameter does not entail the other but they regularly co-occur. One such ‘designed’ co-occurrence is reduction and continued voicing.
There is no simple correlation between features or the combination of features and the likelihood of transition. Turn-transition occurs after 32% of cases where there is continued voicing (a talk-projecting feature) on its own; where continued voicing occurs in combination with avoidance of lengthening (a further talk-projecting feature), turn-transition occurs in only 10% of cases. However, we also observe the opposite effect: an increase in turn-transition where the number of talk-projecting features increases. For instance, where more talk is produced by the same speaker after a SYNCOMP in close proximity to that SYNCOMP, turn-transition occurs only in 3% of cases (n = 32). However, where more talk is produced by the same speaker after a SYNCOMP in close proximity to that SYNCOMP in combination with the avoidance of lengthening (i.e. two talk-projecting features rather than one) turn-transition occurs in 15% of cases (n = 26). After a different combination of two talk-projecting features (close proximity and reduction) there are no instances of turn-transition (n = 10).
For reasons set out in Section 1 above, we have been mostly concerned with describing features other than pitch in this article. However, one issue requiring some discussion concerns the possible co-occurrence of the features we describe with certain pitch features. (Some might argue that the features and patterns we describe are ‘by-products’ of intonational phrasing. Since a package of features is constitutive of an intonation phrase boundary, of which pitch is just one, it is not obvious why the features we have described should be seen as ‘by-products’ of intonational phrasing any more than intonational phrasing should be seen as parasitic on the sorts of features we have described. Indeed, features we have described are drawn on by frameworks for the coding of intonation phrase structure, e.g. in the break indices of ToBI labelling; see Beckman & Ayers Elam 1997.) Given the general interest in intonational features of discourse, it is relevant at this point to begin to explore whether there is a systematic correspondence between the features we have identified as talk-projective, and the occurrence of prototypically ‘final’ pitch characteristics: a falling main pitch accent on the final word of the SYNCOMP piece measuring at least five semitones (ST) and ending within five ST of the lowest fall of this type for that speaker. (Note that we do not claim any particular interactional significance of these particular measures, but use them as a heuristic device for the identification of an illustrative set of utterances with prototypically ‘final’ pitch.)
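The ‘fall to low’ heuristic can be made concrete with a short sketch. It rests on several assumptions that are ours rather than the article's: the five-semitone figure is read as a minimum size for the fall, ‘within five ST of the lowest fall’ is read as referring to the end point of the fall relative to the lowest end point observed for that speaker, and the function names and all frequency values in Hz are hypothetical. Only the standard conversion between a frequency ratio and semitones is taken as given.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Interval from f_from_hz to f_to_hz in semitones (negative for a fall)."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def is_fall_to_low(f_start_hz: float, f_end_hz: float,
                   speaker_lowest_end_hz: float,
                   min_fall_st: float = 5.0,
                   max_above_lowest_st: float = 5.0) -> bool:
    """Hypothetical reading of the heuristic: the fall is large enough AND
    ends close enough to the speaker's lowest fall end point (assumption)."""
    fall_st = -semitones(f_start_hz, f_end_hz)            # size of the fall
    above_lowest_st = semitones(speaker_lowest_end_hz, f_end_hz)
    return fall_st >= min_fall_st and above_lowest_st <= max_above_lowest_st


# Illustrative values only; none of these frequencies come from the article.
print(semitones(200.0, 100.0))             # halving the frequency is a 12 ST fall
print(is_fall_to_low(200.0, 100.0, 90.0))  # large fall ending near the floor
print(is_fall_to_low(200.0, 180.0, 90.0))  # fall of under 2 ST: too small
```

On this reading a fall of one octave that ends a couple of semitones above the speaker's lowest end point counts as ‘fall to low’, whereas a shallow fall or one ending well above that floor does not.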
There are 13 cases of this ‘fall to low’ (8 produced by Leslie, and 5 by Hal) where one or more of the talk-projecting features identified in this article are present. If pitch is the principal means by which turn- or talk-projection is signalled, then it might be expected that turn-transition is especially likely after this pitch configuration. However, of the 13 cases of ‘fall to low’ accompanied by talk-projecting features, only three are followed by turn-transition.
Fragment (31) provides an example of a SYNCOMP piece produced with a SYNCOMP-final ‘fall to low’ in pitch, but where information elsewhere in the signal projects more talk and consequently inhibits turn-transition.
(31)
Hal's ‘marvellous’ in line 1 of (31) occurs at the end of a SYNCOMP piece, carries a main pitch accent (which is known to contribute to the signalling of transition relevance; see Schaffer 1983, Ford & Thompson 1996, Schegloff 1998, Szczepek Reed 2004) and is produced with a ‘fall to low’ in pitch (see Figure 9). The fall measures 12 ST (1 octave), and ends within 2 ST of the lowest of all Hal's ‘falls to low’ described above. There is nothing about the pitch features of this utterance which projects more talk. However, Hal continues: the pitch of Hal's utterance notwithstanding, he has projected more talk to come through the quality of the fricative at the end of ‘marvellous’, which becomes progressively more rounded through the course of its production. This increased rounding is evident from the spectrogram in Figure 9: notice the falling resonance between 0.64 s and 0.7 s, from approximately 3100 Hz to 2100 Hz. Where speakers are signalling that there is no more talk to come after the SYNCOMP, they routinely move towards an open configuration of the vocal tract (Walker 2004). However, in this case Hal is anticipating the production of the approximant [ɹ] which starts his next word, thereby projecting the production of more talk which he goes on to produce immediately. Note too that (as in other cases of articulatory anticipation described above) Leslie withholds talk until a later SYNCOMP without any of the described talk-projecting features.
There are 67 cases of ‘fall to low’ without any of the talk-projecting features described above (33 produced by Leslie, and 34 by Hal). Of these cases, 60 engender turn-transition. These results show that while turn-transition usually follows ‘fall to low’ where talk-projecting features are not present, when the talk-projecting features are present turn-transition is inhibited and does not usually occur even when that talk is produced with prototypically ‘final’ pitch characteristics.
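The contrast between the two sets of ‘fall to low’ cases is easier to see as proportions. The sketch below uses only the counts reported in the text (3 of 13 cases with talk-projecting features, and 60 of 67 cases without them, followed by turn-transition); the variable and function names are our own and purely illustrative.

```python
# Counts reported in the text: (cases followed by turn-transition, total cases).
fall_to_low_counts = {
    "with talk-projecting features": (3, 13),
    "without talk-projecting features": (60, 67),
}


def transition_rate(followed: int, total: int) -> float:
    """Percentage of cases followed by turn-transition."""
    return 100.0 * followed / total


for label, (followed, total) in fall_to_low_counts.items():
    rate = transition_rate(followed, total)
    print(f"{label}: {followed}/{total} = {rate:.0f}%")

# Prints:
# with talk-projecting features: 3/13 = 23%
# without talk-projecting features: 60/67 = 90%
```

Expressed this way, turn-transition follows roughly nine in ten ‘falls to low’ when no talk-projecting features are present, but fewer than one in four when they are.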
7 Summary and implications
Studies of talk- and turn-projection tend to focus on prosodic aspects, and especially pitch. Our main aim in this article has been to report on a systematic exploration into the role of other phonetic features in the management of turn-taking. In providing that report we have focussed on certain phonetic features evident towards the end of syntactic structures which are possibly complete in their interactional context (SYNCOMP pieces). We have focussed on two types of sequence in a single, naturally occurring telephone interaction involving two British native speakers of English: those where the point of possible syntactic completion did not engender turn-transition and more talk from the same speaker followed (Type A sequences), and those where unproblematic turn-transition did occur. We have shown that certain phonetic features regularly occur in the Type A sequences: avoidance of lengthening, close proximity of talk following the SYNCOMP (including anticipatory phonetics and continued voicing) and vowel and consonant reduction. Each occurs more often in the Type A cases than in those cases with unproblematic turn-transition by a factor of more than 4. These features can therefore be regarded as talk-projecting. On the other hand, aspiration of word-final voiceless plosives at the end of the SYNCOMP piece and final audible outbreaths can be regarded as turn-projecting: where these occur, turn-transition occurs rather than the production of more talk by the same speaker.
The study, of course, is not without its limitations. Although we have combined qualitative analysis of representative instances with quantitative analysis, we are well aware that there are details – both phonetic and interactional-sequential – we have not considered here and which, we believe, will be relevant in trying to reach a more complete understanding of how turn projection is managed. For instance, we have not said anything in detail about the nature of the talk which is produced in response to a SYNCOMP though its design can shed light on whether or not the talk which it follows was designed as, or is being treated as, transition relevant (French & Local 1983, Wells & Macfarlane 1998). Nor have we explored other kinds of sequence identified here which do not engender turn-transition (namely, Types B and C outlined in Section 4.1). We leave these interactional intricacies, and others, for future work. The reason for this principled neglect is that it has been our aim to go further in unpicking the functional import of phonetic detail – of various types – in managing interaction in broad terms.
Our findings have several sorts of implications. There are theoretical implications: any theoretical model intended to handle the phonetic details of naturally occurring talk will need to attend to these sorts of details too. A practical implication of this work is that to reach the goal of naturalistic speech and interaction by machines, these sorts of details (and others) will need to be taken into account. There are methodological implications of this work, too. In showing the interactional relevance of a range of phonetic resources to the management of turn-taking, we hope to have demonstrated again ‘the need to be as open-minded as we can, and consider as many parameters as possible as potentially relevant’ (Local & Walker 2005: 122). De Ruiter et al. (2006) claimed that while lexicosyntax was necessary for their raters for projecting when a turn would be complete, other sorts of information – including pitch – were not necessary. We hypothesised that features other than pitch were likely to be implicated in their raters’ decisions as to possible turn completion. By considering articulatory, phonatory and durational details we have shown that there are features and feature-sets which co-occur with talk-projection on the one hand, and turn-projection on the other. On the basis of these findings, it is no longer satisfactory to limit investigations into the phonetics of turn-transition to features of pitch or other prosodic phenomena.
Acknowledgements
The authors made an equal contribution to the analysis presented here, and its writing up. We are grateful to Marianna Kaimaki for undertaking preliminary analysis of pitch features and to her and Traci Walker for providing detailed comments on earlier drafts. We are also grateful to three anonymous JIPA reviewers for prompting us to elaborate on certain important points. We blame each other for any errors of fact, interpretation or expression.
Appendix A. Summary of modified orthography transcription conventions
Turns at talk run down the page with the speaker identified at the left-hand edge. Onset of overlapping talk is indicated by left-hand square brackets, ‘[’; the end of overlap may be indicated by right-hand square brackets, ‘]’. Silences are measured in seconds and enclosed in parentheses, e.g. (0.2); a period in parentheses indicates a silence of less than one-tenth of a second; ‘=’ indicates that one participant's talk is produced in especially close proximity to another's: the talk is ‘latched’. Audible breathing is indicated by ‘h’, with each ‘h’ indicating one tenth of a second; audible inbreathing is indicated by ‘h’, or sequences of ‘h’, preceded by ‘.’: .hhh. See Jefferson (2002) for a more thorough description of transcription conventions including the use of arrows and punctuation marks to indicate aspects of prosody.
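For readers processing transcripts computationally, the breath notation above can be read off mechanically. The following is a minimal sketch, not part of the authors' procedure: the function name is hypothetical, it handles only isolated breath tokens, and it assumes that the one-tenth-of-a-second value per ‘h’ applies to inbreaths as well as to unmarked audible breathing.

```python
import re

# A breath token is an optional leading '.' followed by one or more 'h'.
BREATH_TOKEN = re.compile(r"^(\.?)(h+)$")


def parse_breath(token: str):
    """Return (kind, duration in seconds) for a breath token, else None.

    Assumption: each 'h' stands for one tenth of a second for both kinds.
    """
    match = BREATH_TOKEN.match(token)
    if match is None:
        return None
    kind = "inbreath" if match.group(1) else "breath"
    return kind, len(match.group(2)) / 10


print(parse_breath(".hhh"))  # an inbreath of about three tenths of a second
print(parse_breath("hh"))    # audible breathing of about two tenths of a second
print(parse_breath("the"))   # not a breath token
```

A fuller treatment would need to handle breath tokens embedded in words and the other symbols described above, which this sketch deliberately leaves out.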
Appendix B. Co-occurrence of phonetic parameters at SYNCOMPs
Abbreviations of parameters as follows: antic = anticipatory articulation; av-length = avoidance of lengthening; contvoi = continued voicing; cp = close proximity; inb = audible inbreath; outb = audible outbreath; reduc = reduction of vowels and consonants; rel = plosive release/release with aspiration.