In the space of sixteen chapters Peter Ladefoged discusses topics ranging from constraints on sounds and on making them, the acoustics of speech sounds and how computers can in principle synthesise these and recognise them, how human beings perceive speech – through how we make vowels and consonants, how the larynx works, comparison between consonants and vowels in different languages – to how they all fit together to make running speech. This is a huge range of topics within phonetics and phonology and, as might be expected, the depth of treatment of individual themes varies enormously – much on articulating the various sounds and their acoustics, not so much on the theory of the relationship between mental representations and physical representations of speech, for example. In the end, perhaps, this is a summary of PL's main interests in the field of speech, and each topic bears the stamp of his personal approach and indeed his passion.
The passion pervades and is infectious: no serious reader – linguist, psychologist, therapist, technologist – will fail to get caught up in PL's enthusiasm for the subject, and none will fail to be impressed by the consistency of coverage and argument. PL's approach is coherent and lucid, with everything following from everything else. It is, of course, this passion and his ability to teach and explain that made PL our leading phonetician for half a century, and that made the plausible coherence of his approach so compelling.
So, let us look in detail at some of the main areas covered in the book:
• What speech sounds are;
• Consonants and vowels: how we make them;
• How we perceive speech.
What speech sounds are. Throughout the book, as in most speech research, the term ‘speech sound’ is ambiguous. On the one hand it is an actual acoustic signal, analysable and measurable by the phonetician, and to be heard and perceived by the listener; it is a phonetic unit. On the other hand it is an abstract object, existing cognitively in the speaker and listener, able to be rendered as a phonetic unit by the speaker, and standing as part of a pattern of such objects in the mind of the speaker and the collective mind of the users of the language; it is a phonological object.
In Chapter 1 PL presents the overall model: he will describe the sound waves in acoustic terms (i.e. as acoustic phonetic objects), in terms of vocal organ gestures used to make them (i.e. as articulatory phonetic objects), and will associate these sounds with symbols of the International Phonetic Alphabet (i.e. as abstract symbolic representations) – a process involving massive data reduction to remove inter- and intra-speaker variability so as to focus on the sounds' linguistic content. Chapter 1 immediately tackles how speech sounds can be described acoustically, using a simple static model. In reality, as we know, a speech sound in isolation (occurring very rarely in language) is different from a speech sound in running speech (occurring all the time). We can also ask whether a speaker rendering a phonological, abstract ‘sound’ out of context actually does the same thing, or, in PL's terms, performs the same gestures, as when they render it in the running context of a word or sentence.
In the summary to Chapter 1 PL refers to acoustics as being the most scientific way of describing speech. Clearly he means that calling on acoustics to describe the physical perspective on speech automatically means that scientific method fully applies, and indeed he appears in this book (and elsewhere) to think of the acoustic signal as an important reference point for talking about speech, perhaps because it is seen as less arguable. But cognitive science is certainly no less scientific, although it may employ a different methodology from models based on physics. Speech research needs both, of course.
Chapter 2 discusses pitch and loudness, invoking tones and intonation in dealing with their uses. The reader might be somewhat confused as to which terms belong to the physical domain and which to the abstract domain. So, we find – while PL discusses vocal cord function – that the pitch of the voice can be used to produce different tones; this must mean that a change in fundamental frequency causes a perceived change in pitch, which leads to a linguistic interpretation as to the intended tone. Elsewhere in the book and in other writings, PL wants to reserve ‘pitch’ to mean ‘perceived fundamental frequency’, and ‘intonation’ to mean ‘perceived change of fundamental frequency spanning linguistically relevant sequences of sounds’. The terms are not used consistently, and it is a pity that this happens just where PL points out the very important fact that changing some physical parameters of speech can change the cognitive meanings of words. For example, fundamental frequency can, via perceived tone, change a word's meaning; amplitude and duration can, via perceived stressing, also change meaning; fundamental frequency change across phrases can, via perceived intonation, change some syntactic aspects of a sentence – statement vs. question – or change some aspect of its expressive content – an angry tone vs. a happy tone.
The acoustics of vowels and consonants. Chapter 3 moves from phonetic variability to a phonological approach to the use of vowels contrastively in languages. We are concerned with abstract or normalised vowels and how their linguistic function depends on their separate identities at the cognitive level as far as speakers and listeners are concerned. Chapter 4 moves back to the details of the acoustics of the physical sounds themselves, though once again glossing over how sounds change in different contexts. PL takes very much an idealised approach in dealing with the acoustic characteristics of vowels: typical formant values are given and derived from simple spectrograms. Chapter 4 goes on to make sense of the measurements – pattern spotting in the acoustic signal (rather than the phonological space). Graphs of first and second formant values in the acoustic domain are related to parameters such as tongue height in the articulatory domain – a useful pedagogical exercise. Blending the boundaries between the physical and cognitive domains, however, obscures the important distinction between the linguistic and the physical, domains that differ seriously in their treatment of the data. Women's vowel formant scores are compared with men's, as well as the scores from a few different accents of English. An opportunity was missed (even in an introductory text) to address directly the problem of the relationship between the detail-less abstraction of ‘a vowel’ and the various acoustic manifestations of that vowel. The graphs look roughly the same, but the dots are in systematically different places. We can ask:
• How much does the choice of descriptive framework make it possible to draw out such observations?
• Would we be attracted to different observations if the descriptive framework were different?
The comment is perhaps unfair and reflects my own view that basic questions can be answered at an introductory level – students and readers can engage with major points at an early stage (see Tatham & Morton 2006, forthcoming), for all the while PL is lucid and to the point, with a careful progression of ideas and illustrations to reiterate the notions of patterning and system in speech. The problem is, of course, how to tell the reader that there are measured acoustic values, that there are measurable articulatory parameters, and that these can be linked; and, much more importantly, that they can be linked in the context of an abstract linguistic descriptive framework. There are other frameworks for this, but this is a textbook in linguistics.
How we perceive speech. Chapter 10 is about speech perception, starting with confusion matrices, how these can be used to give us some idea about the perceptually relevant acoustic parameters involved, and how they enable us to assess similarities between segments (including syllables). The notion ‘perceptual space’ is introduced, though this is treated as a static phenomenon, rather than something essentially dynamic involving distortions of the space due to running contextual effects. Expressive content in speech also distorts the static perceptual space. PL then goes on to discuss some experiments which bring out some of the constraints operating within the static perceptual space.
A major tenet of PL's ideas on perception is that the interpretation of syllables and words can proceed by direct reference to stored representations of these units. How they are stored, how they are represented, how they are accessed and how they are compared are details not particularly discussed, though it is assumed that ‘when listening you may take in chunks of speech that last about a fifth of a second’ (p. 107). This idea is based on analysis of the saccadic eye movements observed during reading. And people ‘recognise larger sequences such as syllables and relate them to the patterns of words stored in their brains without breaking them into smaller pieces’ (p. 108). PL concedes that some researchers favour a more elemental kind of perception which focuses on smaller units (individual sounds, minimal articulatory or acoustic gestures or even ‘smaller’ phonetic features). However, he usually settles on the syllable in both production and perception as being the pivotal unit at the physical phonetic level – thus matching its assumed status at the abstract phonological level. There is little or no discussion of the problems this model encounters when there are differences in rate of delivery (which always varies continuously in any sentence), enormous accentual differences between speakers or within the same speaker, and equally enormous acoustic differences reflecting expressive content (which always exists in any sentence).
I have a high regard for both PL and the second edition of his Vowels and consonants, although I may appear to be critical of its omissions. And it is worth noting that in this book PL does begin to push the boundaries of the static model of Classical Phonetics with which he is so closely associated. Theoretical phoneticians will spot the omissions, as will psychologists and speech technologists, but this is not, of course, a research monograph dealing with frontier theory: it is more a reference textbook for students with an above-average interest in phonetics mainly as a sub-discipline of linguistics.
This was not quite PL's last book (his 2003 Phonetic data analysis: An introduction to instrumental phonetic fieldwork was) but Vowels and consonants brings together much of his core thinking about phonetics, expressed in an eminently readable form, and forms a solid foundation for any serious students and researchers hoping to push the subject of Phonetic Theory further. Much of what PL has written could well be expressed as a series of bulleted questions (look at the chapters on perception and speech technology) which could form the basis of many an interesting and worthwhile research project. We shall see; but perhaps an appropriate way to leave the review is to say that, like his Course in phonetics (Ladefoged 1975), Peter Ladefoged's Vowels and consonants will stay in print for many years to come – surely the acid test of quality.