At the core of Pickering & Garrod's (P&G's) theory of language production and comprehension is the monitor, a cognitive mechanism that allows speakers to detect problems in their speech. The proposal is that the monitor compares perceptual representations from two channels, namely (a) a perceptual representation of the semantics, syntax, and phonology that the “production implementer” is producing; and (b) predicted perceptual representations, derived via forward models from predicted production representations at each linguistic level. If there is a mismatch, the monitor has detected a problem and can begin a correction. Monitoring via forward models is part of an elegant account that nicely integrates the action and language literatures. But is this account compatible with findings on speech monitoring?
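The comparison logic can be sketched in a few lines of toy code. This is a minimal illustration, not P&G's implementation: the representations, level names, and the string-equality shortcut for "mismatch" are all simplifying assumptions.

```python
# Toy sketch of mismatch-based monitoring: at each linguistic level, the
# percept predicted by the forward model is compared with the percept
# derived from the production implementer's actual output. A mismatch at
# any level flags an error and can trigger a correction at that level.
# All names and values below are illustrative assumptions.

def monitor(predicted_percepts, actual_percepts):
    """Return the linguistic levels at which prediction and percept mismatch."""
    mismatches = []
    for level in predicted_percepts:
        if predicted_percepts[level] != actual_percepts.get(level):
            mismatches.append(level)  # mismatch detected -> begin correction
    return mismatches

# Intended "heart" slips to "harp": error at the phonological level only.
predicted = {"semantics": "HEART", "syntax": "N", "phonology": "/hɑrt/"}
actual    = {"semantics": "HEART", "syntax": "N", "phonology": "/hɑrp/"}
print(monitor(predicted, actual))  # -> ['phonology']
```

Note that even this toy version presupposes a usable percept on both channels, which is exactly what the patient data discussed below call into question.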
The model shares with perceptual loop theories (Hartsuiker & Kolk 2001; Levelt 1989) the assumption that monitoring requires speech comprehension (i.e., to create a perceptual representation of produced speech). It therefore shares the problems of other models with a perceptual component. One problem is that neuropsychological studies have found dissociations between comprehension and monitoring in brain-damaged patients. A striking example is a 62-year-old woman with auditory agnosia and aphasia reported by Marshall et al. (1985). Although this patient's auditory system was intact, she was unable to comprehend familiar sounds, words, or sentences and could not report the number of syllables or stress contrasts. Her speech production was seriously impaired, often containing neologistic jargon. Yet despite her severe comprehension problems, she produced a great many (often unsuccessful) attempts at self-correction – in particular, of her phonological errors (but not her semantic errors). These findings suggest that error detection can take place without perception.
Another patient who challenges perception-based monitoring is G., a 71-year-old Dutch Broca's aphasic (Oomen et al. 2005). In a speech production task, G. produced many phonological errors, of which he repaired very few, whereas he produced very few semantic errors, which he usually repaired. G.'s production difficulties hence mirror his monitoring difficulties. Importantly, G.'s difficulty in repairing phonological errors cannot be attributed to a perception deficit: In a perception task, he detected as many phonological errors as a group of controls. It therefore seems that G.'s monitoring deficit is related to his production deficit and not to any perception deficit. These data argue against a forward model account, because the forward model is a separate and qualitatively different system from the production implementer; on such an account, there is no reason why the monitoring deficit should mirror the production deficit.
Our recent visual world eye-tracking data (Huettig & Hartsuiker 2010) also speak against both perceptual loop and forward modeling accounts. When our subjects named a picture of a heart, they gazed at the phonologically related written word harp more often than at unrelated words, and this “competitor effect” had the same time course as the analogous effect when listening to someone else (Huettig & McQueen 2007). These data are not consistent with perceptual loop accounts, which predict earlier competitor effects in production than in comprehension, because the phonological representation inspected by the inner loop precedes external speech by a considerable amount of time (namely, the time articulation takes). Similarly, the representations created by forward models also precede overt speech in time, so an account of monitoring with forward models also predicts an early competitor effect. After all, forward models are predictions of what one will say, and fixation patterns in the visual world are strongly affected by prediction. One might object that the predicted phonological percept is too impoverished to create a phonological competitor effect. But this seems inconsistent with Heinks-Maldonado et al.'s (2006) magnetoencephalography data showing reafference cancellation with frequency-shifted feedback, which imply a highly detailed phonological percept that even includes pitch.
There are also conceptual issues with monitoring via forward models. One is the reduplication of processing systems (Levelt 1989). Specifically, the production implementer creates semantic, syntactic, and phonological representations, whereas a forward model creates corresponding representations. If the forward model creates highly accurate representations at each level, we have two separate systems doing almost the same thing, which is not parsimonious. But if the forward model creates highly impoverished representations (e.g., only one phoneme), such representations are not a good standard for judging correctness. It then becomes difficult to see how speakers detect so many errors at so many linguistic levels.
Additionally, if we assume that the output of forward models, although impoverished, is still good enough to be useful for the monitor, then the monitor will have to “trust” the forward model, just as P&G's metaphorical sailor has to trust his charted route. But there is no a priori reason to assume that the forward model is less error prone than the production implementer; in fact, if forward models are “quick and dirty,” they will be more error prone. Trust in the forward model will then be misplaced. Such misplaced trust has the undesirable effect of creating “false alarms,” so that a correct item is replaced by an error. “Corrections” that make speech worse do not seem to occur frequently, although some repetitions may in fact be misplaced corrections (e.g., Hartsuiker & Notebaert 2010).
P&G briefly mention an alternative to both the perceptual loop account and the forward model account, namely, a conflict monitoring account (Botvinick et al. 2001; Mattson & Baars 1992; Nozari et al. 2011). On such accounts, monitoring does not use comprehension, but measures the amount of “conflict” in each layer of production representations, on the assumption that conflict is a sign of error. Conflict monitoring has the advantages that it allows error detection without perception and that a production deficit at a given level is straightforwardly related to an error detection deficit at that level. Such an account is consistent with Huettig and Hartsuiker's (2010) eye-tracking data and avoids the reduplication of processing components. Finally, on a conflict monitoring account, there is no issue of which representation to “trust.” It is therefore worth considering conflict monitoring as a viable alternative to the perceptual loop and forward modeling accounts.
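The core computation of a conflict monitor can be illustrated with a toy sketch. Following the energy-style measure of Botvinick et al. (2001), conflict here is the summed pairwise product of unit activations within a layer: a single clear winner yields low conflict, whereas close competitors yield high conflict. The activation values and the specific measure are illustrative assumptions, not a commitment of any of the cited models.

```python
# Toy sketch of conflict monitoring: rather than comparing produced speech
# against a predicted percept, the monitor reads the amount of competition
# within a production layer itself. High within-layer conflict is taken as
# a sign that an error is likely at that layer.

from itertools import combinations

def conflict(activations):
    """Energy-style conflict: sum of pairwise products of unit activations."""
    return sum(a * b for a, b in combinations(activations, 2))

# Phonological layer with one dominant candidate vs. two near-tied candidates.
clear_winner = [0.9, 0.1, 0.05]   # e.g., /t/ clearly ahead of /p/, /k/
close_race   = [0.6, 0.55, 0.05]  # e.g., /t/ and /p/ nearly tied

print(conflict(close_race) > conflict(clear_winner))  # -> True
```

Because conflict is read off the production layers directly, a deficit that degrades a given production level automatically degrades monitoring at that level, which is exactly the pattern shown by patient G.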