Pickering & Garrod (P&G) argue that coarse predictions from forward models can help detect errors of overt speech production before they occur. This error-detecting function is often assigned to inner speech (e.g., Levelt 1983; Levelt et al. 1999; Nooteboom 1969): the little voice in one's head, better known for its role in conscious thought. It is therefore tempting to identify inner speech as a product of these forward models, with $\hat{p} \rightarrow \hat{c}$ providing what we know as the internal loop. In fact, conceiving of inner speech as a forward model could elegantly address three key questions. First, why do we have inner speech at all? Inner speech is a by-product of speakers' need to control their overt verbal behavior. Second, why does inner speech develop so long after overt speech (e.g., Vygotsky 1962)? Inner speech develops as the speaker learns to simulate their verbal behavior, which may lag behind the ability to produce that behavior. And third, how are people able to produce inner speech without actually speaking aloud? If inner speech is simply the offline use of forward models ($\hat{p} \rightarrow \hat{c}$), then speakers never need to engage the production and comprehension implementers ($p \rightarrow c$) that are the traditional generators and perceivers of inner speech.
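To make the architecture concrete, here is a minimal Python sketch — our own gloss, with invented function names, not anything from the target article — of the two routes, and of what "offline" use of the forward models would amount to:

```python
# Toy sketch (invented names, not P&G's implementation) of the two routes:
# the production and comprehension implementers (p -> c) versus the paired
# forward models (p_hat -> c_hat). Inner speech as offline forward modeling
# would run only the second, coarse route.

def production_implementer(message: str) -> str:
    """Full production route: plan and articulate an overt signal."""
    return f"acoustic({message})"

def comprehension_implementer(signal: str) -> str:
    """Full comprehension route: perceive the overt signal."""
    return f"percept[{signal}]"

def forward_production_model(message: str) -> str:
    """Coarse, fast prediction of production's output (p_hat)."""
    return f"~acoustic({message})"

def forward_comprehension_model(predicted_signal: str) -> str:
    """Coarse prediction of the resulting percept (c_hat)."""
    return f"~percept[{predicted_signal}]"

message = "goat"

# Overt speech: both routes run, so the predicted percept can later be
# compared against the actual percept for monitoring.
actual_percept = comprehension_implementer(production_implementer(message))
predicted_percept = forward_comprehension_model(forward_production_model(message))

# Inner speech on the pure forward-model account: only the prediction
# exists; nothing is articulated and no actual percept arises.
inner_speech = forward_comprehension_model(forward_production_model(message))
```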
P&G's framework would specifically address two more recently demonstrated qualities of inner speech. First, inner speech involves attenuated access to subphonemic representations. When people say tongue-twisters in their heads, their reported errors are less influenced by subphonemic similarities than their reported errors when saying them aloud (Oppenheim & Dell 2008; 2010; also Corley et al. 2011, as noted by Oppenheim 2012). For instance, /g/ shares more features with /k/ than with /v/, so someone trying to say GOAT aloud would more likely slip to COAT than to VOTE, but this tendency is less pronounced for inner slips. As P&G note, this finding is predicted if the forward models underlying inner speech produce phonologically impoverished predictions (and thus might not reflect the production implementer). Second, inner speech is flexible enough to incorporate additional detail. Although inner slips show less pronounced similarity effects than overt speech, adding silent articulation is sufficient to boost their similarity effect, apparently coercing inner speech to include more subphonemic detail (Oppenheim & Dell 2010). Such flexibility could be problematic for models that assign inner speech to a specific level of the production process (e.g., Levelt et al. 1999), but P&G's account specifically suggests that forward models simulate multiple levels of representation, so it might accommodate the subphonemic flexibility of inner speech by adding motoric predictions ($\hat{p}$[sem, syn, phon, art]; forward models' more traditional jurisdiction) that are tied to motor planning.
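A toy calculation can illustrate the attenuation claim. The sketch below is our own construction, not Oppenheim and Dell's model; the feature sets and the `detail` parameter are illustrative assumptions. Slip bias toward a competitor grows with shared subphonemic features, so stripping that detail flattens the /k/-over-/v/ advantage for a /g/ target:

```python
# Toy illustration (our construction) of why impoverished phonological
# predictions would attenuate the similarity effect: onsets are coded as
# feature sets, and slip bias grows with feature overlap.

FEATURES = {
    "g": {"velar", "stop", "voiced"},
    "k": {"velar", "stop", "voiceless"},
    "v": {"labiodental", "fricative", "voiced"},
}

def slip_bias(target: str, competitor: str, detail: float) -> float:
    """Bias toward a competitor: a phoneme-level baseline of 1.0, plus the
    shared-feature count scaled by how much subphonemic detail (0.0-1.0)
    the representation preserves."""
    shared = len(FEATURES[target] & FEATURES[competitor])
    return 1.0 + detail * shared

for detail in (1.0, 0.25):  # rich overt code vs. attenuated inner code
    coat = slip_bias("g", "k", detail)
    vote = slip_bias("g", "v", detail)
    print(f"detail={detail}: GOAT->COAT vs. GOAT->VOTE bias ratio = {coat/vote:.2f}")
# detail=1.0 yields a ratio of 1.50; detail=0.25 shrinks it to 1.20.
```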
But forward model simulations cannot provide a complete account of inner speech. One would still need to use what P&G would call "the production implementer" (target article, sect. 3, para. 2). First, inner rehearsal facilitates overt speech production (MacKay 1981; Rauschecker et al. 2008; but cf. Dell & Repka 1992), suggesting that some aspects of the production implementer are also employed in inner speech. Second, there is abundant evidence that people easily detect their inner speech errors (Corley et al. 2011; Dell 1978; Dell & Repka 1992; Hockett 1967; Meringer & Meyer 1895, cited in MacKay 1992; Oppenheim & Dell 2008; 2010; Postma & Noordanus 1996). But since monitoring is described as the resolution of predicted and actual percepts (from forward models and implementers, respectively), it is unclear how one could detect and identify inner slips without having engaged the production implementer. (Conflict monitoring, e.g., Nozari et al. 2011, within forward models might at least allow error detection, but its use there seems to lack independent motivation, and it still leaves the problem of how a speaker could identify the content of an inner slip.) Third, analogues of overt speech effects are often reported in experiments that substitute inner-speech tasks for overt ones. For instance, inner slips tend to create words, just like their overt counterparts (Corley et al. 2011; Oppenheim & Dell 2008; 2010), and their distributions resemble those of overt slips in other ways (Dell 1978; Postma & Noordanus 1996). And though inner and overt speech can diverge, they tend to elicit similar behavioral and neurophysiological effects in other domains (e.g., Kan & Thompson-Schill 2004), and their impairments are highly correlated (e.g., Geva et al. 2011). Though more ink is spilled cautioning about differences between inner and overt speech, similarities between the two are the rule rather than the exception (at least for pre-articulatory aspects).
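The detection problem can be stated compactly. In the sketch below — our gloss on P&G's comparator, with a hypothetical `monitor` function — an error is a mismatch between predicted and actual percepts; if inner speech never engages the implementers, the second argument simply does not exist:

```python
# Sketch (our gloss, hypothetical function) of comparator-based monitoring:
# error detection resolves a predicted percept (from forward models)
# against an actual percept (from implementers). Without the implementers
# there is nothing to resolve, so inner slips should go undetected.

from typing import Optional

def monitor(predicted: str, actual: Optional[str]) -> Optional[bool]:
    """True = error detected, False = speech matched the prediction,
    None = no actual percept available to compare against."""
    if actual is None:
        return None  # pure forward-model inner speech: nothing to resolve
    return predicted != actual

print(monitor("goat", "coat"))  # True: overt slip detected
print(monitor("goat", None))    # None: inner-slip detection unexplained
```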
Given the impoverished character of P&G's forward models, it seems difficult to account for such parallels without assuming a role for production implementers in the creation of inner speech. We could therefore posit that inner speech works much like overt speech production, recalling P&G's acknowledgment that offline simulations could engage the implementers, actively truncating the process before articulation; forward models would then supply a necessary monitoring component. This more explicit account of inner speech allows us to question P&G's suggestion that the subphonemic attenuation of inner speech might reflect impoverishment of the forward model, rather than the generation of an abstract phonological code by the production implementer. With forward models' role clarified as error detection, their suggestion boils down to the idea that inner slips might merely be hard to "hear." Empirical work suggests that this is not the case. Experiments using noise-masked overt speech (Corley et al. 2011) and silently mouthed speech (Oppenheim & Dell 2010) showed that each behaves much like normal overt speech in terms of similarity effects (see also Oppenheim 2012). And, by explicitly modeling biased error detection, Oppenheim and Dell (2010) formally ruled out the suggestion that their evidence for abstraction merely reflected such biases. Thus, better specifying the role of forward models in inner speech supports the conclusion that the subphonemic attenuation of inner speech does have its basis in the production implementer. More generally, conceiving of forward models as components of inner speech can wed the strengths of the forward model account with the fidelity of implementer-based simulations.
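A minimal sketch of this hybrid account — with invented stage names and a toy slip mechanism standing in for real production errors — would run the implementer up to an abstract phonological code, truncate before articulation, and let the forward model's prediction serve as the monitoring standard:

```python
# Sketch (invented names; a toy account, not a validated model) of inner
# speech as a truncated run of the production implementer, monitored
# against a forward-model prediction.

import random

def implementer_phonological_code(word: str) -> str:
    """Production implementer truncated pre-articulation: returns an
    abstract phonological code, occasionally containing a genuine slip."""
    if random.random() < 0.1:  # rare slip arising inside the implementer
        return {"goat": "coat"}.get(word, word)
    return word

def forward_model_prediction(word: str) -> str:
    """Coarse forward-model prediction of the phonological code; it tracks
    the intended word, so it can expose slips in the implementer."""
    return word

intended = "goat"
inner_code = implementer_phonological_code(intended)  # never articulated
if forward_model_prediction(intended) != inner_code:
    # Because the implementer produced real content, the slip can be both
    # detected and identified.
    print(f"inner slip detected and identified: {inner_code!r}")
```

On this construal, the implementer explains why inner slips have identifiable content and word-like distributions, while the forward model explains how they are caught before articulation.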