
MARK TATHAM & KATHERINE MORTON, Speech production and perception. New York: Palgrave Macmillan, 2006. Pp. xviii + 326. ISBN: 1403917337 (hbk).

Published online by Cambridge University Press:  23 March 2009

Stefan A. Frisch
Affiliation:
Department of Communication Sciences and Disorders, University of South Florida, frisch@cas.usf.edu
Sylvie M. Wodzinski
Affiliation:
Department of Communication Sciences and Disorders, University of South Florida, frisch@cas.usf.edu

Type: Review Article
Copyright © Journal of the International Phonetic Association 2009

In the conclusion to this book, the authors state: ‘Our objective has been to identify key areas of speech production and perception where we believe theoretical work needs to concentrate’ (p. 299). The book certainly accomplishes this objective. The foremost area identified by the authors, Mark Tatham & Katherine Morton (T&M), is the representation of expressive content in speech. T&M also highlight a lack of good integration between work on prosody and work on segmental phenomena, such as coarticulation. There are two other domains where the book makes a significant contribution. One is the definition of terms and the explanation of the historical development of phonetic theory. The other is the description of the TM model of speech production and perception, which makes some inroads into two areas of theoretical development: the representation of expressive content in descriptions of speech, and the representation of the sound structure of language in the mind.

The first chapter provides a very nice overview of the perspective of classical phonetics and introduces phonetic terminology. The reader is also immediately confronted with one of the strengths and one of the weaknesses of the book. T&M are very thorough and explicit in elucidating terminology, and perhaps the biggest strength of the book is the definition of terms throughout. For the reader who takes the time to carefully examine what T&M say and how they say it, there is much insight to be gained into theoretical issues in phonetic science. There are numerous books on phonetics, but in most cases their focus is on data and phonetic description. The discussion of issues and presentation of terminology in this book would be useful material for an advanced course on phonetic theory. However, the first chapter (and the second, for that matter) is almost completely devoid of examples. For students, especially, this will make the material harder to absorb. For an instructor to get the most out of this book, supplemental materials will need to be prepared with example recordings, waveforms, or spectrograms to illustrate the points.

The other overarching weakness of the book is the presentation and writing itself. There are numerous typographic errors, which might be easy to ignore if the text itself were easier to read. T&M have a distinctive writing style that takes some time to figure out. We would have preferred many more commas in the text to help mark individual clauses within complex sentences. At times, it is difficult to distinguish data from opinion, a problem compounded by the general lack of discussion of concrete data throughout. The discussion frequently shifts back and forth between a model or data point from the literature and the TM model, and we feel that T&M often did not do a very good job of keeping the two separate in their presentation. Similar problems drift into the higher-level organization of the book as well. New sections or chapters often resume the topic from the previous chapter more or less without interruption. While this does provide a nice flow of ideas for the reader, it makes the book more difficult to use in a class, for example, where sessions on different days would typically be organized on a chapter-by-chapter basis. It also undermines, to some extent, the organization of the book into sections, as there is generally not a strong topic break between sections.

Setting these weaknesses aside, T&M provide a very good overview of advanced issues in phonetics and psycholinguistics. The first section of the book focuses on speech production. In addition to their excellent explication of terminology, T&M provide a thorough overview of coarticulation and models of coarticulation from the phonetics literature. The discussion highlights some of the requirements of a phonetic theory, including (i) resolving the transition from the abstract, discrete, timeless phonological objects to the concrete, continuous, temporally expressed phonetic objects; (ii) integrating suprasegmental constituents with inter-segmental processes and the articulation of speech sounds; and (iii) explaining how users of language access and manipulate sound structure for communicative purposes. Their discussion of models of coarticulation includes Öhman's vowel carrier model, Lindblom's target undershoot model, Keating's window model, and Browman & Goldstein's articulatory phonology, as well as a variety of others, including some models from the speech technology literature. T&M's presentation of coarticulation provides a nice feel for the historical development of ideas in the field, through the connections between older and newer approaches, as well as for current issues and future trends. The coarticulation chapters of the book would be a good foundation for a seminar-level course or for a student preparing for comprehensive exams.

The first section of the book also includes a short discussion of speech motor control. The focus here is primarily on Fowler's Action Theory and the development of Task Dynamics, which T&M adopt in their model of speech production. Details are fewer here, in line with the focus in the book in general on the linguistic side of phonetics, rather than the physiological. Those looking for a detailed description of speech articulation and intra-segmental phonetics will not find sufficient detail here.

The first section ends with a relatively good introduction to prosody, again starting with an excellent introduction to terminology and proceeding to a discussion of theories. The discussion of prosody includes intonation, stress, and rhythm as well as syllables, syllabic constituents, and syllable juncture. Previewing later chapters, T&M discuss the implementation of prosody in the TM model as being through ‘expression wrappers’ which they indicate with XML-style labeled and nested bracket notation. The approach is reminiscent of Coleman's synthesis architecture in putting a focus and priority on suprasegmental rather than segmental implementation (Coleman 1992, Dirksen & Coleman 1997). Whether such an approach is ultimately successful is an empirical question, but the primacy of prosody seems to suggest that prosody could somehow be implemented without reference to segments. The two appear to be thoroughly linked, and in some cases, such as the interaction between intonation and glottalization as a characteristic of the allophonic realization of a segment, the control mechanisms for both prosodic and segmental units are dependent on the same articulator and cannot be easily separated (Pierrehumbert & Frisch 1997). Notably, the T&M discussion of speech rhythm is missing any mention of the work of Port, Cummins, and Tajima (e.g. Cummins & Port 1998, Tajima & Port 2003), who have made some progress in relating intuitions on the regular rhythmic nature of speech, mentioned by T&M, to the physical timing of speech segments that can be measured in an acoustic signal.

The second section of the book discusses speech perception. However, the first chapter in this section is where T&M present their model in full. Thus, the chapter is equally relevant to the previous section, on speech production. The TM model divides the grammar of sound structure into phonetic and phonological components, each of which contains static and dynamic components. The phonetics/phonology divide is the familiar one, between the discrete and atemporal phonological units and the continuous and real-time phonetics. The static/dynamic divide separates a competence-like knowledge base from processes that manipulate or implement that knowledge during production and perception. For example, the static system would contain phonological segments and phonotactic information on the phonological side, and articulatory targets and generalizations about acoustic properties on the phonetic side. The static system would include the mental lexicon, a common component in other theories of language processing (e.g. Levelt 1989). The dynamic system would contain both phonological rules of coarticulation (e.g. tone sandhi) and phonetic rules of coarticulation (e.g. coproduction due to articulator overlap). The dynamic system would also include what is traditionally called the performance component of the grammar that creates and interprets actual utterances. The static/dynamic division in the TM model is similar to the division between the roles of declarative and procedural memory systems in language processing in Ullman's declarative/procedural model (Ullman 2004). Since T&M propose no particular cognitive systems to explain their static/dynamic divide, their model could easily incorporate the declarative/procedural model.

The TM model also contains a Cognitive Phonetics Agent. The role of the Cognitive Phonetics Agent is to supervise the production process to ensure successful production and perception. The Cognitive Phonetics Agent has access to an internal cognitive model of the speech production and perception process that it uses in adapting production to context. The Cognitive Phonetics Agent supervises the dynamic execution of speech and the influences of all types of context (e.g. pragmatic, expressive, and coarticulatory contexts), monitors feedback, and makes feed-forward predictions that it uses to adjust the production plan. The amount of adaptation does have limits, however. Since T&M adopt a task dynamics approach to speech motor control, the Cognitive Phonetics Agent can set gestural tasks for articulation, but has limited access to the internal mechanisms of the gestural task.

The remainder of the second section focuses on speech perception. As with the other parts of the book, T&M provide a thorough introduction to basic terminology and issues in the field. The goal of the perceptual process is to assign a symbolic linguistic representation to the speech signal. T&M elucidate two general approaches to the speech perception process – active versus passive. In passive speech perception models, it is assumed that the necessary information to identify segments is contained relatively transparently in the speech signal. Active speech perception models include processes that must deduce segmental identity through active application of phonological knowledge. This section of the book also contains a comprehensive overview of models of speech perception, including direct perception, the motor theory, and quantal theory, among others. The section ends with a discussion of the role speech perception plays in speech production, in other words, the ways in which speech production is altered to benefit speech perception by the listener.

The final section of the book discusses applications of phonetic theory and computational models of speech production and perception to speech technology, second language acquisition, and speech disorders. These are areas that theoretical work in phonetics will eventually have to address, and the discussion in these chapters, though somewhat superficial, is a useful start. Given our own experience in the field of speech disorders, the proposals by T&M for the application of models and theories from phonetics by clinicians are somewhat naïve. A great deal of work that would be called translational research in the medical field will need to be done by researchers or teams with knowledge of the issues from both fields. The time is ripe for this kind of research, however, as interdisciplinary work is attractive to many institutions and funding agencies are looking for research that will lead to more immediately applicable results.

In summary, T&M provide a good introduction to fundamental issues in speech production and perception, provide a description of their own model, and point out several important directions for future research. The book has some features that make it a difficult read. However, it provides a stimulating discussion of important concepts that will be of interest to researchers in the field and might be useful in a high-level seminar either as an introductory text or as an organizational framework in conjunction with primary source readings.

References

Coleman, John S. 1992. The phonetic interpretation of headed phonological structures containing overlapping constituents. Phonology 9, 1–44.
Cummins, Fred & Port, Robert F. 1998. Rhythmic constraints on stress timing in English. Journal of Phonetics 26, 145–171.
Dirksen, Arthur & Coleman, John S. 1997. All-prosodic synthesis architecture. In van Santen et al. (eds.), 91–108.
Levelt, Willem J. M. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Pierrehumbert, Janet B. & Frisch, Stefan A. 1997. Synthesizing allophonic glottalization. In van Santen et al. (eds.), 9–26.
Tajima, Keiichi & Port, Robert F. 2003. Speech rhythm in English and Japanese. In Local, John, Ogden, Richard & Temple, Rosalind (eds.), Phonetic interpretation (Papers in Laboratory Phonology 6), 317–334. Cambridge: Cambridge University Press.
Ullman, Michael T. 2004. Contributions of memory circuits to language: The declarative/procedural model. Cognition 92, 231–270.
van Santen, Jan P. H., Sproat, Richard W., Olive, Joseph P. & Hirschberg, Julia (eds.). 1997. Progress in speech synthesis. New York: Springer-Verlag.