1. Introduction
A growing body of contemporary research in child development is motivated by the insight that we must pay attention to the concrete motor mechanisms of the developing infant or risk misinterpreting infant behaviour. Esther Thelen's work on newborn stepping is perhaps the best-known example. Thelen and her colleagues examined a host of component systems that appeared relevant to infant stepping. This led to the striking discovery that the disappearance of stepping movements in the second or third month is due not to the cortical inhibition of a “stepping mechanism” but to the disproportionate growth of leg muscles and fat tissue. When infants' legs were submerged in water to alleviate the effects of gravity, infants who had stopped stepping resumed stepping (Thelen et al. 1984). Infants also showed alternating stepping patterns on a treadmill long before they began walking independently (Thelen & Ulrich 1991). Similar in-depth treatments of specific action systems such as looking, crawling, reaching, object manipulation, postural adjustment, and locomotion reveal the crucial role of the motor systems in the development of perception and cognition (e.g., Adolph 1997; Bushnell & Boudreau 1993; Campos et al. 2000; Freedland & Bertenthal 1994; Gibson & Schmuckler 1989; Thelen et al. 2001; von Hofsten 1989).
Here we contribute to this general line of research by looking at neonatal imitation through the lens of perinatal sensorimotor development. Despite nearly four decades of research on neonatal imitation and the considerable controversy it has generated, psychologists (as opposed to pediatric neurologists) have spent very little time investigating neonatal rhythmic motor behaviour, that is, the very “gestures” tested in neonatal imitation experiments. To fill this gap, we present a theory of aerodigestive development and argue that the standard orofacial “gestures” used in imitation experiments are in fact aerodigestive stereotypies, a set of rhythmic motor sequences that emerge as the first structured behaviours in human/mammalian gestation. We explain the crucial role that stereotypies play in perinatal aerodigestive development and why the positive results of neonatal imitation experiments should be re-examined in light of these developmental processes.
Note that this article is not intended as a review, meta-analysis, or formal critique of the experimental methods used in neonatal imitation research. Nor do we attempt to resolve the many tangled issues that have arisen over 40 years of debate. (There are a number of articles of this kind, e.g., Anisfeld 1991; 1996; 2005; Oostenbroek et al. 2013; Ray & Heyes 2011.) Instead, we present a case study of a paradigmatic “gesture,” tongue protrusion and retraction (hereafter TP/R), and argue that our results generalize, mutatis mutandis, to the other tested gestures. There are several reasons for this choice. First, insofar as skeptics and proponents agree on anything, they agree that TP/R has garnered the most robust data: If neonates imitate any gesture, TP/R is that gesture. Second, in the past decade there has been a surge of interest in neurophysiological studies of perinatal aerodigestive behaviours in mammals (e.g., in rats and pigs), and imaging studies of human infants have served to bridge the gap between these mammalian experiments and the human case. It is therefore possible to tell a developmental story – albeit sometimes a sketchy one – about the role of TP/R in motor development. Third, as we will argue, TP/R is merely one of many infant stereotypies present at birth. In our view, therefore, the story of TP/R development is representative of the other rhythmic movements commonly tested in neonatal imitation experiments, orofacial or otherwise. In some deep sense, then, this article is not about TP/R per se. It is about the role of rhythmic behaviours in neural development, and about why we need to look “under the hood” in addition to doing careful behavioural work.
2. The neonatal imitation controversy
Over a century ago, Edward Thorndike (1898) pointed out that imitation, which he famously defined as “learning to do an act from seeing it done,” is not a psychologically trivial feat. To imitate another person's behaviour, you must visually parse the actions to be imitated, translate them (as parsed) into the first-person point of view, and possess the motor expertise to reproduce them. Opaque imitation – when the imitator cannot observe and compare his or her own movements to the target – is especially challenging. It is notoriously difficult to gain a fine-grained, real-time understanding of one's own bodily movements with proprioception as the only source of feedback. This is why dance studios have mirrors and swim coaches use aquatic cameras. It was therefore long believed that infants could not imitate opaque gestures until the age of 8–12 months. Younger infants could, of course, engage in contagious crying or the mimicry of emotional expressions, but opaque imitation was thought to require considerable prior multimodal experience (Piaget 1962).
Meltzoff and Moore's (1977) paper thus reported a remarkable finding: Neonates can copy the orofacial gestures of tongue protrusion, mouth opening, and lip pursing – three types of opaque imitation – as well as match sequential finger movements. When infants were shown these gestures, they responded in kind, producing the modeled gesture more often than an unrelated one. For example, an infant who viewed a demonstration of tongue protrusion responded more frequently with tongue protrusion than with mouth opening. The authors argued that these results could not be explained in terms of reflexes, releasing mechanisms, or simple resonance mechanisms. Instead, given the number of gestures imitated (i.e., that passed this operational definition of imitation) plus the variation in the execution of each imitated gesture, Meltzoff and Moore argued that infants must have a common supramodal system of action representation, one that converts the neonate's visual representations of observed action into proprioceptive space, and thence from proprioceptive space into motor commands. This hypothesis became known as the theory of active intermodal matching (AIM), which Meltzoff and Moore (1983; 1985; 1989; 1992; 1994) then refined with further experiments. In its developed form, the theory held that neonatal imitation was (a) generative (displaying both variety and novelty); (b) self-correcting (aiming at an accurate performance); (c) specific to the occurrent movement, such as the duration of the gesture (not simply to the activated “organ”); and (d) temporally flexible (executed from memory after a delay and in the absence of any stimulus).
The current definition of imitation in experimental psychology no longer confines imitation to actions that we see. A comic can mimic a politician's speech in both voice and gesture; adults can learn American Sign Language with only haptic guidance. Nor do most psychologists believe that imitation must involve conscious intent or the perception of the target behaviour as an intentional action by the actor. A young child imitates his father when he unconsciously mirrors his father's gait; a toddler parrots her mother's telephone manner without knowing what her mother said (Brass & Heyes 2005; Hata et al. 2009). Thus, the modern definition of imitation highlights what cognitive neuroscientists have called “the correspondence problem” – the problem of determining, on the basis of observation, which sequence of motor commands will reproduce the observed behaviour. This broadening of the definition makes the existence of neonatal imitation more plausible: Neonates need not know that they are imitating, nor understand what they imitate, nor intend to imitate the actions of others.
Despite this revision, neonatal imitation remains controversial. (For a balanced recent review of the debate, see Oostenbroek et al. 2013.) Detractors have questioned – and continue to question – the reproducibility of the early results and the standard experimental methodology, including data collection and analysis (Abravanel & Sigafoos 1984; Anisfeld 1991; 1996; 2005; Anisfeld et al. 2001). They point to the short timeline of neonatal imitation and the odd phenomenon of imitation “drop-out.” At birth, human neonates produce multiple orofacial gestures both spontaneously and when adults model those behaviours. By 6 weeks after birth, however, these behaviours have markedly diminished; by 3 months they are almost entirely absent (Abravanel & Sigafoos 1984; Fontaine 1984; Heimann et al. 1989; Jacobson 1979; Kugiumutzakis 1999). These facts are mirrored in the nonhuman primate world. Chimpanzees no longer imitate 8 weeks postpartum (Myowa-Yamakoshi et al. 2004), and macaques appear to imitate human facial expressions on only one day, postpartum day 3 (Ferrari et al. 2006b). Whatever role (if any) these short-lived orofacial gestures play, they are unlikely to be the developmental precursors of later imitation skills in infants.
Detractors also point to a recent meta-analysis of the neonatal imitation literature (Ray & Heyes 2011), which reports that only one type of gesture, TP, has garnered more positive than negative results overall. Of course, detractors must still offer an alternative explanation of those results that resist being “explained away.” To date, these alternative explanations fall into roughly two classes (with apologies to outliers): The observed behaviour is explained by neonatal reflexes triggered by releasing mechanisms (Jacobson 1979) and/or by systemic factors in neonatal development, such as arousal (Anisfeld 1991; 1996; 2005; Jones 1996; 2006a; 2006b).
On the other side of the debate, proponents of neonatal imitation are satisfied that Meltzoff and Moore's original results have been largely replicated (Heimann et al. 1989; Kugiumutzakis 1999; Legerstee 1991; Vinter 1986) and even extended to some new gestures (e.g., hand opening and closing [Vinter 1986]; blinking [Kugiumutzakis 1999]; lateral head motion [Meltzoff & Moore 1989]; and emotional expressions [Field et al. 1982; 1983]). Like AIM's detractors, proponents must explain the experimental results: why and how neonates imitate adults (in the ways they do) at such an early stage of development and experience. Here, social explanations are common. Proponents argue that neonatal imitation is an evolved mechanism that promotes maternal/caregiver attachment to the newborn, a trait essential to infant survival given the physiological immaturity of our species at birth. This is why proponents view neonatal imitation (NI) experiments on nonhuman primates as corroboration for the theory: If NI promotes infant survival, we should see the same behaviours in other nonhuman primates with similar social structure, state of maturation at birth, and communicative gestures. Proponents must also address the phenomenon of imitation drop-out, that is, either deny its existence or explain its purpose and origins. Here, most proponents follow Meltzoff and Moore's (1992) explanation: Drop-out is a sign of the infant's changing social and cognitive inclinations. By three months of age, the infant has moved on to other forms of social interaction, such as gaze-sharing and vocalization, and thus no longer finds the imitation of basic facial gestures socially useful. In other words, drop-out reflects a change in performance, not in competence, as the later emergence of sophisticated imitation makes clear. Finally, proponents have been buoyed by a competing meta-analysis of the data (Simpson et al. 2014a), which showed that 85% of all tests for neonatal imitation have yielded positive results if one includes both human and “primate-other” data and excludes infants older than 28 days of age and experiments with small sample sizes.
Despite the continuing controversy, Meltzoff and Moore's early papers are among the most widely disseminated results in 20th-century psychology. Researchers in psychology, philosophy, linguistics, neurophysiology, and comparative ethology have integrated Meltzoff and Moore's findings into their theories, often as a theoretical cornerstone. Such theories span a wide range of subjects, from the mental capacities of Old and New World primates to the individual development of empathy, language, the sense of self, and theory of mind (Bard 2007; Bermúdez 2000; Champoux et al. 2009; Gallagher 2005; Gallagher & Meltzoff 1996; Gallese 2005; Go et al. 2008; Goldman 2006; Gopnik et al. 1999; Gopnik & Wellman 1992; Kuhl 2000; Metzinger 2004; Myowa 1996; Myowa-Yamakoshi et al. 2004; Preston & de Waal 2002; Trevarthen & Aitken 2001).
More recently, neonatal imitation has garnered renewed interest in the wake of the discovery of mirror neurons in the premotor cortex of macaques (Rizzolatti et al. 1996). Theories inspired by this discovery suggest that mirror neurons are the building blocks of a host of core human traits, including language (D'Ausilio et al. 2009), empathy (Gallese 2003; Leslie et al. 2004), theory of mind (Meltzoff & Decety 2003), and imitation (Iacoboni 2009a). Interestingly, the neonatal imitation experiments provide the only evidence that mirror neurons are present at birth and thus are part of an innate system of action perception (Gallese 2003; Iacoboni et al. 1999; 2005; Lepage & Théoret 2007; Meltzoff & Decety 2003; Nagy & Molnar 2004). The assumption that neonatal imitation exists is well entrenched in contemporary cognitive science, despite the lack of resolution to the controversy.
In what follows, we offer an explanation of neonatal imitation in terms of the development of mammalian/human aerodigestion. Section 3 presents an overview of human aerodigestive function and the problems inherent in a dual system for respiration and suckling/swallowing, facts necessary to understand why mammalian aerodigestion develops as it does. In section 4, we turn to aerodigestive development itself, focusing on the role of TP/R in both prenatal and postnatal development. Although aerodigestion is the first complex sensorimotor system to develop, only a rudimentary system exists at birth. With access to air and the onset of suckling, the infant's system gains expertise through practice. During this learning period, a series of failsafe mechanisms protect the novice system from accident. Over the first postnatal months, meanwhile, the anatomy of the system gradually transforms from one well suited to suckling and respiration to one that can masticate, manipulate, and swallow solid food while continuing to breathe. We argue that if one lines up the milestones of perinatal aerodigestion with the appearance and extinction of TP/R, TP/R shows lock-step timing with this first phase of development. This is unlikely to be a coincidence. In section 5, we argue that TP/R is an aerodigestive stereotypy, one of many such behaviours present in the perinatal infant. Section 6 begins with an introduction to some recent work on rhythmic behaviours and neural development. Against this background, we present a series of neurodevelopmental events to which TP/R is likely to contribute. Listed in developmental order, these are (1) the acquisition of tongue control; (2) the integration of the central pattern generator (CPG) for TP/R with other aerodigestive CPGs; and (3) the formation of connections within the cortical maps of the primary somatosensory (S1) and primary motor (M1) cortices. Finally, in section 7, we return to Meltzoff and Moore's original experiments and argue that, on the balance of evidence, the positive experimental results for any of the stereotypies tested in human and nonhuman primates (indeed, in any mammal) are unlikely to be best explained by imitation. We conclude with brief remarks about how a more integrative and interdisciplinary perspective could benefit developmental psychology.
3. Human aerodigestive function
3.1. Aerodigestion: A dual system
As the name suggests, the mammalian aerodigestive tract serves two central functions: respiration and digestion. In all mammals except adult chimpanzees and humans (Nishimura et al. 2008), the basic structure consists of two tubes that cross, forming an X. At this juncture, the four-way intersection is open to both systems. In chimpanzees and humans, however, postnatal growth inserts a short connecting tube, the laryngopharynx, between the upper and lower branches of both systems; this tube is shared by the respiratory and digestive tracts (Lieberman et al. 2001; Nishimura 2003; Nishimura et al. 2003).
The primary problem of the dual system is ensuring that the right stuff ends up in the right place – air in the lungs and fluids/saliva/masticated food in the stomach. Ideally, air is inhaled through the nostrils into the nasal cavities, then passes back down into the pharynx, through the lens-shaped opening of the larynx (the glottis), into the trachea, and down into the lungs (Fig. 1). In digestion, liquids or solid food should be drawn into the mouth/oral cavity by the lips, pushed into the oropharynx by the tongue, carried down the laryngopharynx by peristalsis, then into the esophagus, and finally into the stomach (Dodds 1989; Palmer et al. 1992; Thexton 1992; Thexton & Crompton 1998). As with any dual system, this shared real estate (the laryngopharynx) necessitates a protocol for usage: “When is it yours and when is it mine?” In aerodigestion, two additional complications arise. First, neither the digestive nor the respiratory tract is a physiologically dedicated pathway for the intake of nutrition and air, respectively: Adults can inhale through the mouth, and the digestive tract also serves to drain the nasal cavities. Second, both aerodigestive paths must be capable of two-way flow. In respiration, we breathe in and out. In digestion, the stomach is filled by ingestion and, on occasion, emptied by emesis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171213080006951-0361:S0140525X16000911:S0140525X16000911_fig1g.jpeg?pub-status=live)
Figure 1. A detailed anatomy of the aerodigestive system.
This “open” arrangement of the dual system, combined with the passage of fluids and gases through both tracts, creates ample opportunity for mishap. Saliva and fluid from the nasal cavities amount to more than two liters per day. If misdirected into the lungs, this is enough liquid to cause suffocation within 24 hours. So “non-nutritive swallowing” is one of the pharynx's most vital functions. Aspiration during feeding is also a serious problem: The shared laryngopharynx carries the risk of aspiration pneumonia (Kohda et al. 1994). This risk is so serious that it appears to have acted as a strong constraint on the evolution of the aerodigestive system: Clearing the pharynx of fluids or food takes precedence over all competing functions, including respiration (Broussard & Altschuler 2000). Exhalation and emesis have their own risks, however. Exhalation during a swallow can force fluid into the sinuses and out through the nasal cavities (as anyone who has started to laugh while drinking knows too well). For neonates, who have a prodigious capacity for emesis, repeated “mistakes” of this kind can lead to infection of the sinuses and, via the Eustachian tubes, of the inner ear.
The general solution to these problems is a set of functionally interconnected “valves”¹ that open and close the passages of ingress and egress. Two sphincters control ingress to and egress from the lower aerodigestive system: The entire larynx – epiglottis, aryepiglottic folds, ventricular folds, and vocal folds – protects the airway, and the upper esophageal sphincter admits food and liquid into the esophagus. Yet another valve, the lower esophageal sphincter, controls flow into and out of the stomach itself. At the top of the aerodigestive system, the nasal cavities are sealed by the soft palate, which moves backwards to contact the pharyngeal wall. In adults, the lips and posterior tongue also do double duty as aerodigestive “valves”: The lips prevent liquids from escaping from the mouth, and at the back of the oral cavity, the posterior tongue blocks entry into the oropharynx (Fig. 1). During swallowing, the anterior tongue prevents the bolus from accidentally re-entering the mouth. Between these points of closure, sets of muscles move solids, fluids, and gases either by peristaltic motion (a wave-like contraction of serial muscle groups) or by differences in air pressure.
In sum, the tongue plays a pivotal role in human aerodigestion. In the adult, it shifts food about for mastication and forms and holds a liquid or solid bolus within the mouth until swallowing. During swallowing, it blocks re-entry to the mouth and acts as an airlock to the nasal cavities, preventing the exhalation of liquids into those cavities. Even in the infant, tongue behaviour must be coordinated with respiration, jaw movement, epiglottal closure, and the peristaltic movements of the pharynx – all sensorimotor events of great complexity.
3.2. The goal: Aerodigestion at birth
At birth, aerodigestive control is the human infant's most complex sensorimotor capacity. Even the “simple” or pharyngeal swallow requires the coordination of 26 pairs of muscles and inputs from five cranial nerve systems, as well as control of chest wall movements during respiration by the cervical and thoracic spinal cord segments (Bosma 1986; 1992; Delaney & Arvedson 2008; Donner et al. 1985). Complex sensory feedback adjusts the swallow according to the size of the bolus and its homogeneity, viscosity, texture, moisture content, and taste (Barlow 2009).² By adulthood, control of the simple swallow will expand to involve 15–20 cortical areas, as well as the cerebellum – a rather astonishing fact, given that the simple swallow is an involuntary act (Hamdy et al. 1996; 1999; Mistry & Hamdy 2008; Mistry et al. 2006).
When we think of human development, we tend to regard birth as its single most important milestone. Yet as Prechtl (1974) emphasized, the very fact that birth is abrupt ensures that birth – a momentous event for all concerned – cannot be, primarily, a developmental milestone for the infant.³ Instead, birth is the human infant's least forgiving hard deadline. The price of failure is suffocation, starvation, and/or infection through aspiration. A recent study of breastfeeding in Ghana illustrates this point (Edmond et al. 2006). Under “natural” conditions (i.e., without modern medical intervention), healthy, full-term newborns who failed to breastfeed within 24 hours after birth were 2.5 times more likely to die in the neonatal period. The study estimated that 16% of neonatal deaths could be prevented if all newborns suckled within the first day, and 22% if feeding began within the first hour after birth. Given these costs, aerodigestion must be “good to go” well in advance of the blessed event.
The mechanics of suckling turn out to be surprisingly complex. At first guess, new parents might expect suckling to be like drinking through a straw: Suck inwards and the milk will soon follow. However, neonates do not inhale through their mouths; they are nose-breathers unless under duress. Instead, infants extract milk by a combination of positive mechanical pressure and negative air pressure, both generated by tongue and jaw movements (Bosma et al. 1990; Crompton & Owerkowicz 2004; Thexton et al. 2007). Suckling begins with the “acquisition” phase: The infant's tongue protrudes and curls under the breast, then retracts to pull the breast into the mouth. At the same time, the infant's lips close tightly over the areola, forming a seal; the sides of the tongue curve up and around the breast while pressing the breast and nipple tightly against the palate. The infant is now ready to express the milk. Once more, the tongue is the central player. Imagine attaching a wet suction cup to the underside of a glass shelf. As the cup is flattened, it adheres to the shelf and forms a tight seal; to break that seal, a sharp tug is required. In suckling, the tongue acts like a travelling suction cup. As the infant's jaw opens, the tongue's seal to the breast is broken. This unleashes a peristaltic wave that travels down the length of the tongue, expressing the milk by positive mechanical pressure. The milk then flows into a “bowl” formed by a concave area at the back of the tongue. When enough milk has accumulated, this pooling initiates a simple or pharyngeal swallow.
In sum, suckling – a capacity of critical importance to infant survival – is a highly complex motor sequence in which the tongue plays the starring role. Suckling requires fine-grained motor control of the tongue (e.g., for changes in its shape and rigidity), precise sequencing (e.g., for the peristaltic motion of the tongue), and coordination of a diverse group of muscles (e.g., of the lips, tongue, and jaw). Importantly, suckling is a sensorimotor task, not a motor task alone. No infant comes into the world “wired for” a breast of a certain shape, size, and rigidity; a specific brand of baby bottle; or milk of a certain viscosity and rate of flow. As we will see, virtually all of the task parameters are variables whose values change in real time as the infant suckles (German et al. 2004). This makes suckling the first, and arguably the most complex, task controlled by a sensorimotor system in the human body.
In the next section, we outline a theory of human aerodigestive development. At present, we know more about the aerodigestive development of human infants than of any other species. Much of this knowledge comes from medical research on premature infants, mostly through video, imaging, or post-mortem studies. But for obvious reasons, invasive physiological experiments are not performed on human newborns. Our theory therefore inevitably relies on mammalian research more generally, from which we can extrapolate to the human case on the basis of shared mammalian traits such as tongue musculature, subcortical/cortical motor control, and the basic sequence and rate of neurodevelopmental events.
4. The behavioural development of aerodigestion
4.1. Prenatal aerodigestive development
The physiological complexity of suckling and swallowing – and the necessity of their tight coupling with respiration – explains why aerodigestive development begins well before birth.
Movement in the human fetus begins at about 7 weeks of gestation with curious lateral bends of the head or the rump that occur at 1-second intervals (Lüchinger et al. 2008). These are notable in that they are the only fetal movements that are truly “stereotyped”: Repeated side bends do not vary in frequency, force, timing, or exact patterning. Between 7 and 8.5 weeks, the arms and legs start to make small, slow, single-direction movements that last a few seconds. A period of transition begins at 9 weeks, when “general movements” (full-body movements involving the head, neck, trunk, and limbs) appear. Over the next 4 weeks, general movements gradually replace the more primitive side bends. By the 32nd week of gestation, the fetus has acquired its complete postnatal motor repertoire (Kurjak et al. 2004; Miller 2003; Yigiter & Kavak 2006). In the last 8 weeks of pregnancy, the fetus increases dramatically in size and weight, yet the frequency of all movement decreases markedly.
Ultrasound observation of the human fetus suggests that the first feeding behaviour – a rudimentary swallow – begins at approximately 9–10 weeks gestational age (GA) (de Vries et al. Reference de Vries, Visser and Prechtl1982; Miller Reference Miller2003). This is the same week in which the human fetus starts to make isolated arm and leg movements and to hiccup. This first swallow usually occurs prior to basic head movement (turning side-to-side, anteflexion, and retroflexion), breathing movements of the chest, and hand-to-face movements, all of which emerge one week later. Suckling begins gradually as a set of rudimentary behaviors, the “proto-components” of the mature suckling sequence. The first tongue movement, at 15 weeks GA, is a forward, rigid thrust of the tongue to edge of the lips – “tongue thrust” – that corresponds to the movement that presses the breast against the hard palate. The second tongue movement to emerge is “cupping,” the formation of the tongue into a bowl-like shape, similar to the movement which catches and collects the bolus before swallow. Tongue cupping becomes a consistent motion at about 28 weeks GA. Finally, anterior-posterior motion – tongue protrusion and retraction of the kind tested by Meltzoff and Moore – is seen at 18 weeks GA. This back-and-forth movement, out of and back into the mouth, is a precursor to the one that draws the breast into the mouth. In utero, it can be elicited by orofacial contact, by the fetus's thumb in the mouth, her cheek brushing against the umbilical cord, and so on (Miller Reference Miller2003). Like cupping, TP/R is well defined by 28 weeks GA and occurs in combination with tongue-cupping and tongue-thrust (Fig. 2). Importantly, the same range of orofacial behaviors observed by ultrasound at 32 weeks of gestation will be present after birth. 
Indeed, within the first 15 minutes after birth, 95% of all full-term newborns make spontaneous TPs, almost all of which occur within the first 3 minutes (Hentschel et al. 2007). An early study by Heimann et al. (1989) recorded the baseline rates of TP at 2 or 3 days after birth, at age 3 weeks, and finally at age 3 months. At 2–3 days after birth, 59 TPs were produced (32 weak, 27 unequivocal). At age 3 weeks, this figure dropped to 18 “medium-to-strong” TPs, and by age 3 months only 4 spontaneous TPs were produced, a significant drop in incidence. These results were corroborated by Piek and Carman (1994). Small and large TP/Rs, both straight out along the midline and to the side (lateral TP/Rs), are all seen in utero and immediately after birth.
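As a rough illustration of this decline, the Heimann et al. (1989) counts quoted above can be expressed as percentages of the day 2–3 baseline. This is a minimal sketch; the dictionary and the function name `percent_of_first` are our own illustrative choices. Because the day 2–3 figure includes weak TPs while the 3-week figure counts only medium-to-strong ones, the comparison is approximate at best.

```python
# Baseline spontaneous tongue-protrusion (TP) counts from Heimann et al. (1989),
# as summarized in the text. Caveat: the day 2-3 count includes weak and
# unequivocal TPs, whereas the 3-week count includes only "medium-to-strong" TPs.
BASELINE_TP = {
    "2-3 days": 59,
    "3 weeks": 18,
    "3 months": 4,
}


def percent_of_first(counts: dict[str, int]) -> dict[str, float]:
    """Express each count as a percentage of the first (earliest) count."""
    if not counts:
        raise ValueError("no counts given")
    first = next(iter(counts.values()))  # dicts preserve insertion order
    return {age: round(100 * n / first, 1) for age, n in counts.items()}


if __name__ == "__main__":
    for age, pct in percent_of_first(BASELINE_TP).items():
        print(f"{age}: {pct}% of the day 2-3 baseline")
```

Run as a script, it reports the 3-week count as about 30.5% and the 3-month count as about 6.8% of the day 2–3 baseline.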
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20171213080049-62589-mediumThumb-S0140525X16000911_fig2g.jpg?pub-status=live)
Figure 2. (a) Orofacial gestures of the experimenter and the neonate (Meltzoff & Moore 1977). (b) Four orofacial gestures of a fetus at approximately 28 weeks gestational age. (Top left) grimacing; (top right) finger sucking; (bottom left) tongue protrusion to the side; (bottom right) tongue thrust (Kurjak et al. 2004).
Gradually, the repetitive, simple behaviours of early gestation are integrated into smooth motor sequences. At 15 weeks GA, amniotic fluid is drawn into the mouth by inhalation-like movements of the chest. Sometimes the lips of the fetus close after the bolus enters, sometimes not. At this stage of development, the bolus is drawn into the oral cavity without prior TP/R; occasionally, tongue “fluttering” occurs before inhalation. By 28 weeks GA, however, once the individual components of suckling are refined, the bolus is drawn into the mouth by TP/R and then held by the cupped and elevated rear portion of the tongue. Often the soft palate makes contact with the back of the tongue, securing the bolus in the mouth before a simple swallow. Even at this point, fetal swallowing differs from the adult version. In the fetus, the bolus is propelled down the pharynx by a single large muscle contraction rather than by the smooth peristaltic (wave-like) motion seen in the adult. Moreover, the fetal nasopharynx is left open during the swallow, and amniotic fluid flows freely into the nasal cavities. Similarly, the glottal folds that protect the lungs from aspiration in the adult are often open during the swallow at 28 weeks GA. In other words, the adult mechanisms that guard the nasal cavities and the lungs do not yet function in the fetus. Finally, during the fetal swallow, the epiglottis protrudes into the pharyngeal tube, but it does not stand upright or make contact with the soft palate, as it will in the neonate. Fetal swallowing thus differs substantially from both adult and neonatal swallowing.
In short, aerodigestive development proceeds through constant prenatal “practice.” The lips and jaws open and close, as do the aerodigestive valves; the tongue protrudes and retracts; the chest expands and contracts; and the moving waves of contraction that define peristalsis flow down the length of the tongue, the pharynx, and the esophagus. Through rhythmic repetition, the proto-components of aerodigestive behaviours emerge and transform into primitive motor sequences, which then evolve into smooth, tightly coupled motor runs. Rhythmic behaviour thus seems to be an essential part of aerodigestive development, both for the acquisition of repetitive movements and for their coordination by sensorimotor controllers. Tongue protrusion and retraction is just one element of this gestational process.
4.2. Postnatal development
At birth, the respiratory and digestive systems are unevenly matched in maturity. Respiration is immediately robust and reliable (Greer et al. 2006), whereas digestion can mature only with the complex stimuli of actual breastfeeding: the warmth, viscosity, and taste of milk; the smell, texture, variable shape, and “solidity” of the breast; and so on. At birth, the human infant has a simple suck-swallow pattern: one swallow follows each suck. Over the first month, the infant learns to contain and corral milk within the mouth, to produce greater pressure with the tongue, and to increase the rate of peristaltic tongue motion. By the end of the first month, the suckling sequence is organized into runs of several sucks followed by one swallow. Suckling efficiency, measured by the volume of milk per suck and per swallow, almost doubles. By 6 months, mature suckling is characterized by faster and more rhythmic suckling, longer suckling bursts, larger volumes per suck, and greater integration and stability in the suck-swallow rhythms (Gewolb & Vice 2006; Mizuno & Ueda 2001; Qureshi et al. 2002).
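The shift from a one-to-one suck-swallow pattern to runs of several sucks per swallow can be summarized as a suck-to-swallow ratio. The sketch below is illustrative only: it assumes a suckling bout has already been coded as a string of hypothetical event symbols (`S` for a suck, `W` for a swallow), and the example bouts are invented for illustration, not data from the studies cited.

```python
def sucks_per_swallow(events: str) -> float:
    """Mean number of sucks ('S') preceding each swallow ('W') in a coded bout.

    Hypothetical coding scheme: one character per event. Sucks that follow the
    final swallow form an incomplete run and are ignored.
    """
    runs: list[int] = []  # number of sucks in each completed run
    count = 0
    for event in events:
        if event == "S":
            count += 1
        elif event == "W":
            runs.append(count)  # a swallow closes the current run
            count = 0
        else:
            raise ValueError(f"unknown event code: {event!r}")
    if not runs:
        raise ValueError("bout contains no swallow")
    return sum(runs) / len(runs)


# Invented example bouts, for illustration only.
newborn_like = "SWSWSWSW"      # one swallow after every suck
one_month_like = "SSSWSSSSW"   # runs of several sucks per swallow
print(sucks_per_swallow(newborn_like))    # 1.0
print(sucks_per_swallow(one_month_like))  # 3.5
```

A ratio near 1 corresponds to the newborn pattern described above; larger values correspond to the longer suck runs of the first month and beyond.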
This maturation of suckling requires the parallel development of a system that switches control between respiration and digestion (Amaizu et al. 2008; Qureshi et al. 2002). In adults, approximately 75–95% of swallows begin during the expiratory phase of respiration, a pattern that gives the adult some measure of safety: If the glottis or the nasal passages are left open during the swallow, there is still enough air in the lungs to expel the fluid with a short, sharp exhalation (not unlike how a whale clears its blowhole on surfacing). For the neonate, who swallows up to 60 times per minute during suckling yet still lacks the precise motor skills of the adult, this adult pattern is too risky. At 48 hours after birth, when only colostrum is produced, the adult pattern is dominant. But by the end of the first week, newborns shift towards swallowing after inhalation but before exhalation begins (Kelly et al. 2007). This is safer because the lungs are fully inflated just before the swallow. This pattern remains predominant at 6 months of age and continues until after the infant's first birthday, that is, through the risky period during which infants learn to ingest solid foods (Gewolb & Vice 2006; Lau et al. 2003; Mizuno & Ueda 2001).
4.3. Defining the first period of aerodigestion: Safeguards during learning
In the months after birth, then, the sensorimotor control of aerodigestion matures through repetition. Of course, improvement through practice presupposes error, and during this first year a number of protective mechanisms are in place (Reix et al. 2007; Thach 2001; 2007). One safeguard, mentioned above, is the neonatal pattern of coordinating swallowing with respiration. Predominant nose-breathing also markedly reduces the risk of fluid aspiration. However, predominant nose-breathing ends between 6 and 12 weeks after birth, around the time when the mother's immune system no longer protects the infant from colds and other infections. (Note to new parents: Even a neonate can “override” nose-breathing during nasal congestion by crying [Rodenstein et al. 1985].)
The laryngeal chemoreflex (LCR), in fact a set of chemoreflexes, is another safety mechanism. In utero, the glottal folds open to regulate lung pressure by releasing acidic lung fluid into the larynx (a necessary part of developing lung capacity). In response, laryngeal chemoreceptors inhibit breathing and stimulate the swallowing of amniotic fluid to reduce acidity in the larynx. After birth, the LCR functions as a protective mechanism against acid reflux. Later in life, the LCR transforms again, this time into a protective mechanism that stimulates coughing. (Unfortunately, the same protective mechanisms that work so well in the full-term neonate work against the preterm infant, in whom reflux can trigger life-threatening periods of apnea and bradycardia [Miller 2002; Praud & Reix 2005; St-Hilaire et al. 2007; Thach 2010; 2007].)
A final protective mechanism, the position and function of the neonatal epiglottis, is relevant to our thesis. Infant aerodigestive anatomy and physiology differ from those of adults. In the adult, the openings of the upper and lower respiratory tracts are offset from one another, connected by a short length of pharynx. During the adult nutritive swallow, when the bolus nears the opening to the larynx, the epiglottis – the flap-like structure attached just above the glottis – folds down over this opening (see Footnote 4). Solid food or liquid passes over the tip of the flattened epiglottis on the way to the esophagus. For many years it was assumed that the epiglottis seals the glottis, thereby protecting the adult from fluid/solid aspiration. (Indeed, almost any text on aerodigestive physiology will contain this “fact.”) However, the epiglottis does not form a watertight seal over the glottis (Bosma et al. 1990), and so cannot prevent liquid from entering the lungs. The key to epiglottal function lies with the neonate. During the mammalian neonatal period, the openings to the upper and lower respiratory tracts sit directly across from each other. (Recall that the epiglottis is a purely mammalian organ.) In this configuration, the epiglottis sits high in the nasopharynx under the nasal cavities. During the swallow, the epiglottis stands upright with its tip touching the uvula. Milk flows down the pharynx, around the base of the upright epiglottis, in two deep rivulets on either side of the open glottis (Pracy 1983). The upright epiglottis thus maintains a patent airway between the upper and lower respiratory tracts such that, in principle, the neonate could breathe and swallow at the same time. In practice, however, the epiglottis acts only as a safeguard: German et al. (2009) have shown that in the newborn pig the vocal folds close during the nutritive swallow, shutting the airway.
Thus, as the neonate learns to integrate the copious new sensory cues of suckling after birth, the upright epiglottis serves as a safeguard against mistakes. This finding meshes nicely with Miller's (2003) observation that even at 28 weeks GA, the nasopharynx remains open during the swallow, but the glottal folds are sometimes open and sometimes closed.
Note that all of the aforementioned protective mechanisms bracket a period of aerodigestive learning that coincides with the period of TP/R “imitation” (Fig. 3). Predominant nose-breathing ends between 6 and 12 weeks after birth, just after respiration and suckling become coordinated. The combined reflexes of the LCR begin in utero, where they wash away acidic lung fluid during breath holding (closure of the glottis). They continue through the second month of postnatal life as a means to clear the esophagus of reflux and prevent reflux aspiration. Between 2 and 4 months, when the infant becomes susceptible to respiratory viruses, the LCR produces coughing to clear the respiratory tract. In other words, the LCR matures in lockstep with changes in the aerodigestive system, first by producing apnea and swallowing in the perinatal stage, and then by initiating coughing before the onset of respiratory infections and the ingestion of solid food. Lastly, the epiglottis maintains a patent airway until respiration and suckling are fully coordinated, that is, just before “training” for mastication begins.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171213080006951-0361:S0140525X16000911:S0140525X16000911_fig3g.jpeg?pub-status=live)
Figure 3. This developmental timeline shows the onset and duration of a number of aerodigestive events in human development. Note that the period of tongue-protrusion “imitation” ends together with the first phase of human aerodigestive development: the mastery of suckling, swallowing, and respiration.
4.4. Switching to solids: Why tongue protrusion ends
Preparation for the mastication and ingestion of solid food (and the production of speech sounds) begins around 3–4 months of age. This transformation, from suckling “machine” to self-feeding infant, requires both anatomical and physiological changes (Fig. 4).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171213080006951-0361:S0140525X16000911:S0140525X16000911_fig4g.gif?pub-status=live)
Figure 4. Anatomical differences between the adult and neonate aerodigestive systems. In the adult, note the position of the epiglottis, which sits well below the soft palate. In the infant, the soft palate and epiglottis touch. Note also the differences in tongue shape and position: The neonate has an elongated tongue with a flat surface; it sits forward, with the tip of the tongue just over the gums. In the adult, there is empty space within the oral cavity to allow tongue movement. Tongue movement in the neonate is more restricted (Matsuo & Palmer 2008).
The most critical anatomical event, the descent of the neonatal hyoid bone and larynx, consists of two components: a descent of the hyoid relative to the palate and a descent of the larynx relative to the hyoid (Lieberman 1968, 1975, 1987; Lieberman et al. 2001; Nishimura 2003; Nishimura et al. 2006; Sasaki et al. 1977). Descended larynges are now documented in several mammals, including deer, gazelles, lions, jaguars, tigers, cheetahs, and domestic cats (Fitch & Reby 2001; Frey & Riede 2003; Weissengruber et al. 2002), but among primates the developmental pattern has so far been documented only in chimpanzees (Nishimura 2003; Nishimura et al. 2006). In human infants, this descent begins slowly after birth; by 4 months, the infant pharynx contains the short connecting portion between the upper and lower aerodigestive tracts. As a consequence, the glottis is repositioned well below the openings to the nasal cavities. The epiglottis no longer makes contact with the soft palate during the swallow, nor does it stand upright to maintain a patent airway. The resting position of the tongue also shifts, from just behind the gums towards the back of the oral cavity. This new posterior position of the tongue makes it possible for infants to adopt the adult swallow. To swallow solid food, the tongue pushes the bolus into the pharynx and blocks the entrance to the oral cavity with its posterior end (to prevent the bolus from returning).
When a liquid bolus is swallowed, the tongue participates in blocking the nasal cavities (to prevent aspiration). This shift in tongue position is accompanied by a newly rounded hard palate and the dissolution of the neonatal cheek fat pads. Together, they create room for new kinds of tongue movement – side to side, up and down, and back and forth – all within the oral cavity. With these changes, the tongue is ready to collect, masticate, and maneuver food, as well as practice speech sounds.
Unfortunately, this freedom of movement carries a cost. First, the epiglottis, now positioned further down the pharynx, can no longer act as a safeguard against ill-timed glottal closure. Consequently, the coordination of glottal closure with the swallow must be mature by this stage. Second, the new posterior position of the tongue makes it possible for the tongue to block the airway inadvertently and stop respiration during sleep. This problem is solved by a new form of tongue control, a brainstem mechanism in the hypoglossal nucleus (HGN) that coordinates inhalation with rhythmic TP/R. With each inhalation, the HGN is disinhibited, which produces both a slight TP/R and an increase in the rigidity of the pharynx; together, these keep the airway patent (Bailey et al. 2006; Fregosi 2008; Fuller et al. 1999; John et al. 2005; Richardson & Bailey 2010; see Footnote 5).
The second change in tongue control is more obvious. The infant must acquire the ability to manoeuvre food during mastication and before swallowing. Infants begin mouthing behaviour (touching an object to the lips or putting it into the mouth so that it touches the tongue and gums) at about 2–3 months of age. Mouthing increases over the next few months and peaks at around 6–9 months (Rochat 1989). This period coincides with a critical period for learning to manipulate food of diverse textures; it also coincides with the most dangerous period for food-related asphyxiation in infants. Foods that break into hard pieces cause the most trouble: Nuts, carrots, apples, and candy are the main causes of asphyxiation (Altmann & Ozanne-Smith 1997). Mouthing wanes by 9 to 15 months, once infants are well versed in eating solid foods (Fagan & Iverson 2007). These data suggest that infants do not “explore the world by mouth” so much as explore their mouths with the world. The infant develops a sensorimotor oral topography by using whatever objects are close to hand, and hands are, literally, always within reach. Large objects that vary in shape, size, texture, taste, thermal conductivity, and rigidity make ideal sensory substitutes for the variety of foods that will soon be chewed and ingested, at least for any neurologically sound infant with healthy gag and cough reflexes.
The development of mastication begins around 4 months of age, when the infant can sit upright for several moments without assistance. In the following weeks, self-sitting becomes the cornerstone for a variety of goal-directed behaviours: target-directed head and eye movements (Goodkin 1980) and reaching-to-grasp (without being pulled over by the weight of the extended arm). Self-sitting also indicates sufficient cortical control to sustain the grasping, mastication, and deglutition of solid food, a result of the myelination of the corticobulbar and corticospinal tracts. This correlation is not a coincidence. The safest position for the ingestion of solid foods is upright, not supine (Sears et al. 1990). A bolus of solid food requires greater mechanical and air pressure to move smoothly along the aerodigestive tract. As a result, the effects of gravity are integrated, through learning, into adult deglutition as part of its normal function: Remove the effects of gravity, and swallowing becomes disorganized and unreliable, even when the “solid” food is only a masticated marshmallow. The advent of cortical control also explains another sign of readiness to feed: the extinction or inhibition of the primitive reflexes. An infant who reacts with tongue thrust to every foreign or novel substance is not ready to taste and swallow new foods. Infants can transition safely to solid food, then, only when cortical control of the subcortical pattern generators of respiration, suckling, and swallowing is in place.
To summarize, the first phase of human aerodigestion stretches from the 9th or 10th week of gestation to approximately 3 to 4 months after birth, from the onset of the first isolated aerodigestive movements to the mastery of suckling and the flawless coordination of swallowing with respiration. Throughout this learning period, numerous safeguards forestall potentially fatal accidents. Once mastery is reached, the second phase of aerodigestion begins, again before the onset of the new aerodigestive function, in this case the ability to eat solid foods. During this period of transition, the tongue is repositioned to the back of the oral cavity, the palate gradually assumes a bell shape, and the fat pads disappear. All of these changes allow the tongue to move freely within the oral cavity, to manipulate, masticate, and form a solid bolus. Importantly, these new aerodigestive tasks require flexible and novel tongue movements, including the ability to find, flip, and reposition solid foods onto the molars, as well as point-to-point ballistic movements that require topographic information (i.e., from point A to point B). Cortical control is a necessary part of learning how to eat and, later, how to speak. Because of this, subcortical aerodigestive mechanisms, including TP/R, must be suppressed. Thus, TP/R ends when cortical control begins.
5. Spontaneous tongue protrusion as rhythmic stereotypy
In 1979, Thelen published a landmark longitudinal study of the “rhythmic stereotypies” (or general movements) of infants. Twenty infants were filmed every 2 weeks, from 4 weeks after birth to age 52 weeks. Over one year, she recorded more than 16,000 instances of repetitive stereotypical body movements, classified into 47 different kinds, among them hitting, kicking, banging, thumping, and flapping. She found, first, that the timing of the peak postnatal frequency of each stereotypy was determined by anatomy; for example, all stereotypies involving the legs, such as kicking with alternate legs or synchronous heel-thumping, peak at 20 weeks postpartum. Second, 84% of the stereotypies recorded (of ~16,000 events) had identifiable releasers, such as the appearance of the caregiver, the presentation of a toy, or an interruption to feeding. Yet these stimuli were remarkably nonspecific and unrelated to the rhythmic behaviours elicited. “It is as if the eliciting context demands of the infant, ‘Do something!’ – Greet the caregiver, express delight in the mobile, manipulate the toy – but the immature central nervous system (CNS) responds in a manner that is not goal directed” (Thelen 1981b, p. 240).
Thelen did not record the facial expressions of the infants studied (for methodological reasons), nor did she have access to high-resolution 4-D ultrasound images of prenatal behaviours (including images of internal rhythmic motor events). Had she done so, it would have been evident that although all infant stereotypies develop before birth, after birth they divide into two rough groups based on the timing of peak frequency. Aerodigestive stereotypies peak in frequency at birth, whereas general stereotypies of the head, trunk, and limbs (which Thelen herself studied) peak months later. (The single exception to this division is finger movements, which are present at a low frequency from birth onwards.) One physiological explanation for this difference is simply that, in mammals, the myogenesis and synaptogenesis of the tongue and pharynx occur much earlier than the development of the limbs and trunk, and even the jaw (Widmer et al. 2007; Yamane 2005). Another explanation is that the corticobulbar tract, which mediates the cortical control of the trigeminal, facial, and hypoglossal cranial nerves, develops both earlier and faster than the corticospinal tract that controls limb movement (Martin 2005; Sarnat 2003). But as to why this should be so, the answer we proposed at the outset seems the most plausible: Aerodigestive sensorimotor development takes precedence over the acquisition of “non-essential” general motor tasks, at least until the second stage of aerodigestive development, when trunk control is acquired and solid feeding can begin.
The experimental results of Thelen (1979), combined with the early ultrasound studies of neonatal neurologists (de Vries et al. 1982; Prechtl 1985), show that infant stereotypies form a class defined by seven features. Stereotypies (1) are simple, rhythmic movements; (2) begin and end within a set window during the first year of the infant's life; (3) are invoked, or undergo a change in rate, as a result of nonspecific stimuli often related to arousal; and (4) re-emerge in later life as a result of cortical injury or generalized cortical degeneration. When an infant fails to exhibit a stereotypy or the stereotypy shows a markedly abnormal pattern, it is often the case that (5) there is a cortical abnormality or injury in the infant and (6) this abnormality will lead to a cascade of further developmental problems. Finally, (7) stereotypies are easily distinguished from primitive reflexes, which occur as a result of specific stimuli and promote infant survival.
TP/R, as our model gesture, clearly meets these criteria. First, TP/R is a rhythmic behaviour, one rarely seen in full-term infants after the fourth month of life. Abnormal or continued TP/R beyond the neonatal period is often the result of developmental abnormalities. For example, children with Down syndrome continue to exhibit spontaneous TP/R, often into adulthood. The problem here is hypotonicity, a lack of muscle tone in the tongue, lips, and jaw (Limbrock et al. 1991). Without proper internal control, the tongue flattens, assuming a broad, flaccid shape, and as a result it does not exert normal pressure on the hard palate during suckling. Without suckling pressure, the high, arched shape of the hard palate fails to change into the broad, rounded shape conducive to solid feeding (Mizuno & Ueda 2001). In turn, the jaw (masseter) muscles develop abnormally, and the misalignment of the jaw results in a cross- or overbite (Faulks et al. 2008; Shapiro et al. 1967; Thompson 1976). Eventually this hypotonicity affects speech and even the child's ability to make emotional facial expressions (Limbrock et al. 1991).
TP/R often reappears later in life as a result of degenerative cortical disease or cortical trauma. Dystonic TP/R occurs with advanced cortical degeneration, as in Alzheimer's disease, pantothenate kinase-associated neurodegeneration (PKAN), and a variety of other genetic degenerative cortical diseases (Schneider et al. 2006). In these cases, involuntary TP/R in the form of tongue thrust may be life-threatening, that is, severe enough to impair swallowing and breathing. People who have suffered severe neural trauma, even those with no cortical activity as measured by electroencephalogram (EEG), may also show spontaneous TP/R (Go et al. 2008).
TP/R is affected by arousal. In Jones (2006a; 2006b), infants who listened to the overture to The Barber of Seville, music chosen for its abrupt changes of pace and volume, showed a consistent increase in TP/R. Similarly, Jones (1996) found that infants responded with TP/R to flashing colored lights and dangling toys. Both stimuli were as effective at increasing the rate of (full) TP/Rs as the demonstration of TP/R itself. In response to this evidence, Nagy et al. (2013) have argued that increases in TP/R do not correlate with the standard measures of general arousal. But as Jones (2009) pointed out, at least within a certain range of arbitrary stimuli, infants respond with specific reactions: an increase in orofacial stereotypies overall, and in tongue protrusion in particular. Moreover, when heart rate is monitored, imitation of TP is preceded by significant heart rate acceleration, an independent and objective confirmation of at least one arousal response (Nagy & Molnar 2004). In short, the infant reacts with tongue protrusion to any interesting or arousing stimulus. (We return to this issue in sect. 7.)
Importantly, TP/R differs from what have been called the “primitive reflexes” of the neonate, with which it has often been confused. The primitive reflexes, such as the rooting, suckling, Babinski, and Moro reflexes, are complex, automatic behaviours evoked by specific triggering stimuli (e.g., stroking the cheek, drawing a pencil along the sole of the foot, briefly – and safely – dropping the infant). Although some primitive reflexes are rhythmic (stepping and sucking), others involve a single motor sequence (e.g., the Moro reflex). They develop around week 25 of gestation, and although they generally disappear within the first year of life, it is not uncommon to see certain primitive reflexes in healthy young adults (Brown et al. 1998). In contrast, TP/R develops earlier in gestation, does not have a single trigger, and is fully absent in healthy adults. However, both TP/R and the primitive reflexes can reappear after neural loss in the cortex, whether as a result of normal aging or of degenerative neural disease (Bakchine et al. 1989; Burns et al. 1991; Damasceno et al. 2005; van Boxtel et al. 2006; Vreeling et al. 1995). Both neonatal stereotypies and primitive reflexes, therefore, appear to be subcortical motor functions, but of two distinct kinds.
In sum, TP/R fits the profile of rhythmic neurodevelopmental behaviour. It emerges as a result of subcortical function in utero, is inhibited and/or integrated with the advent of cortical control, is sensitive to nonspecific external stimuli, and often reappears in cases of cortical trauma or degenerative disease. Abnormal neonatal tongue protrusion can also lead to a cascade of developmental disorders. Of course, if TP/R is just one of many rhythmic stereotypies, this would explain why stimuli such as the overture to The Barber of Seville produce an increase in neonatal TP/R. It would also explain the phenomenon of TP/R decline: We no longer see TP/R “imitation” after 3 months because rhythmic movements, as a developmental phase, come to an end as a whole.
6. Tongue protrusion and activity-dependent development
6.1. The general phenomenon: Activity-dependent development
In the previous section, we argued that TP/R is a stereotypy, one of the many rhythmic movements that appear before and after birth, which are neither goal-oriented nor triggered by specific stimuli. Yet despite their apparent “aimlessness,” the ubiquity of stereotypies in mammalian development suggests that they constitute a functional stage in sensorimotor development (Thelen 1979; 1981b). Thelen hypothesized that rhythmic stereotypies “bridge the gap” between disorganized and goal-directed behaviours, that they form a “substrate” for the directed behaviours to follow. Recent work on activity-dependent development suggests an answer that aligns with Thelen's view: Rhythmic movements, such as TP/R, drive a series of activity-dependent neurodevelopmental events.
Since the classic work of Hubel and Wiesel on the development of mammalian visual cortex (Hubel & Wiesel 1970; Hubel et al. 1977; Wiesel & Hubel 1963; 1965), abundant evidence has accumulated that neural activity modulates the development of the central nervous system (see Ben-Ari 2001; Blankenship & Feller 2009; O'Donovan 1999 for reviews). Once neurons are born, spontaneous, isolated activity begins in individual cells; it is characterized by a slow depolarization crested by a burst of activity. Soon this random activity coalesces into the synchronous activation of neighboring cells, with waves of activation flowing outwards from the locus. Notably, spontaneous activation is not confined to one area of the developing nervous system, say to motor or sensory areas alone. It has been recorded in the spinal cord (Borodinsky et al. 2004; Hanson & Landmesser 2003; 2004; Whelan et al. 2000), as well as in the cerebellum, retina (Meister et al. 1991; Sretavan & Shatz 1986; Sretavan et al. 1988; Torborg & Feller 2005; Wong et al. 1993), cochlea (Tritsch et al. 2007), hippocampus (Garaschuk et al. 1998), and visual cortex (Siegel et al. 2012). Immature neurons throughout the brain – even neural progenitor cells yet to migrate to their permanent locations – are capable of spontaneous activation and signal propagation.
Spontaneous activity of the kind just described drives early developmental processes both directly and through epigenetic mechanisms. In Ca²⁺ spontaneous activation, for example, a Ca²⁺ transient leads to an influx of Ca²⁺ ions, an event that triggers further release of Ca²⁺ from intracellular stores and amplifies calcium concentration within the cell (Gu et al. 1994; Rosenberg & Spitzer 2011; Spitzer et al. 1994). This sudden rise in intracellular Ca²⁺ can initiate changes in the cytoskeleton, such as the growth of dendritic trees (Konur & Ghosh 2005) or the emergence of synapses. It can also regulate the expression of genes for cell development. For example, calcium transients can inhibit or stimulate DNA synthesis and thus control the rate of cell birth, or neurogenesis (cf. Fiszman et al. 1999; LoTurco et al. 1995); they can determine whether largely inhibitory or excitatory transmitters are produced (Borodinsky et al. 2004; Spitzer & Borodinsky 2008; Spitzer et al. 2004); and they contribute to pathfinding during cell migration (Hanson et al. 2008; Kita et al. 2015), often in conjunction with chemical cues (Imai & Sakano 2011).
Importantly, what happens downstream (the effects of activity on cell maturation) depends on several factors. One factor is the distance over which activation spreads: whether it remains within the neuron, reaches near neighbors only, or travels to distal projections. A second factor is the activation “signature,” the particular variation on the burst-silence pattern that is produced (Kirkby et al. 2013; Spitzer et al. 2004). Shorten the inter-burst interval or alter the burst pattern, and normal development will not occur. Finally, the causal effects of spontaneous activation are state dependent, that is, dependent upon previous activity and its effects on gene expression.
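To make the notion of an activation “signature” concrete, the following Python sketch summarizes a spike train by the duration of each burst, the number of spikes per burst, and the silent intervals between bursts. It is a toy analysis under stated assumptions, not a method from the cited studies: spikes are given as sorted times in seconds, a burst is any run of spikes whose inter-spike gaps do not exceed a hypothetical threshold `max_isi`, and the example train is invented.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    burst_durations: list[float]        # first to last spike of each burst (s)
    spikes_per_burst: list[int]
    inter_burst_intervals: list[float]  # silence between consecutive bursts (s)


def detect_bursts(spike_times: list[float], max_isi: float) -> list[list[float]]:
    """Group sorted spike times into bursts; a gap longer than max_isi starts a new one."""
    bursts: list[list[float]] = []
    for t in spike_times:
        if bursts and t - bursts[-1][-1] <= max_isi:
            bursts[-1].append(t)  # still inside the current burst
        else:
            bursts.append([t])    # long silence: a new burst begins
    return bursts


def signature(spike_times: list[float], max_isi: float = 0.05) -> Signature:
    """Summarize the burst-silence pattern of a spike train (hypothetical metric)."""
    bursts = detect_bursts(spike_times, max_isi)
    return Signature(
        burst_durations=[b[-1] - b[0] for b in bursts],
        spikes_per_burst=[len(b) for b in bursts],
        inter_burst_intervals=[nxt[0] - prev[-1] for prev, nxt in zip(bursts, bursts[1:])],
    )


# Invented example: two short bursts separated by ~10 s of silence.
train = [0.00, 0.01, 0.02, 0.03, 10.00, 10.02, 10.04]
sig = signature(train)
print(sig.spikes_per_burst)                              # [4, 3]
print([round(d, 3) for d in sig.burst_durations])        # [0.03, 0.04]
print([round(i, 3) for i in sig.inter_burst_intervals])  # [9.97]
```

In this framing, “shortening the inter-burst interval” means changing `inter_burst_intervals` while leaving the bursts themselves intact; published burst-detection methods use more elaborate criteria than a single gap threshold.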
The upshot of this body of research is that activity dependence is a general developmental phenomenon. At one end of the continuum, sensory experience acts through the standard mechanisms of sensory transduction and transmission, and the properties of stimuli affect neural organization. At the other end, neural organization arises out of variations in the standard pattern of long silences punctuated by short bursts of activity. But there are also a number of “in between” variations. Spontaneous activation can spread to mature neurons, thus propagating the signal to distal locations. Indeed, Khazipov et al. (2004) reported that visual signals, produced through photoreceptor transduction and transmitted via retinal ganglion cells, can lead to waves of spontaneous activity at the axon terminus, in the lateral geniculate nucleus (LGN), prior to maturation. Finally, activity-dependent development can be driven by self-induced sensory feedback. Spontaneous activity in motoneurons within the spinal cord, midbrain, or cortical motor areas produces muscle twitches. In turn, muscle twitches activate stretch and load receptors in the muscles, and this sensory feedback initiates activity-dependent changes in sensory areas (Colonnese & Khazipov 2010; Khazipov et al. 2004). So the self-production of sensory signals, caused by motor events with the classic burst-silence pattern, is yet another variant of activity-dependent development.
On the picture now emerging, neural development relies on a rich form of scaffolding. Spontaneous activity can create temporary pathways between two regions and then eliminate or alter them once the scaffolding is no longer needed – for example, once a direct link between the two termini has formed (Khalilov et al. 2015; Luhmann et al. 2014; Shatz et al. 1988). Epigenetic processes can lead to neurotransmitter specification and then to re-specification at a later time (Spitzer 2012; Spitzer & Borodinsky 2008; Spitzer et al. 2004). Similarly, the action of an existing neurotransmitter may switch from excitatory to inhibitory (or vice versa) as a result of the activity-dependent expression of different membrane channels and receptors (Blankenship & Feller 2009; Ford & Feller 2012; Wolfram & Baines 2013). Thus, the “storyline” of neural development looks much less like a pure cascade of events, each stage building on the last, and more like an economical solution to the Tower of Hanoi puzzle: a back and forth of developmental events that eventually results in the standard organizational patterns of the normal adult brain (Shatz 2012).
Against this general framework, the suggestion that rhythmic stereotypies participate in activity-dependent processes becomes more plausible. First, if motor events can bring about neural development through self-induced, rhythmic activation, then TP/R, along with other rhythmic stereotypies, is a potential cause of activity-dependent development. Second, it becomes less mysterious why there is a mismatch between the time periods of human gestational events, typically measured in days or weeks (or occasionally months), and the lengthy lifespan of rhythmic stereotypies (~9 months). If mammalian neural development adheres to a “use, dispose, and replace” principle, and/or to the dictum “write rough and refine later,” then TP/R might well drive a sequence of distinct developmental events: for example, pathfinding from B to A, followed by pathfinding from B to C.
In what follows, we begin with a short section on the physiology of the tongue, a prerequisite to understanding the development of its control, and then outline three activity-dependent developmental events to which TP/R as a rhythmic neurodevelopmental behaviour plausibly contributes.
6.2. The neurophysiology of tongue control
The mammalian tongue has a remarkable structure: It is a tethered limb without an internal skeleton (Takemoto 2001). Without the constraints on motion imposed by a rigid skeleton and joints, tentacle-like limbs have an enormous range of deformation and (non-translational) motion, a bit like fiber optics compared to a flashlight. Tentacle-like limbs are also alarmingly strong (think of an elephant's trunk lifting a log), yet capable of fast and accurate movement and deformation (Kier 2012). For example, during rapid speech, an adult speaker produces ~1,400 phonemes a minute (roughly 23 per second), an extraordinary sensorimotor feat (Hiiemae & Palmer 2003).
The current, predominant theory of tongue physiology treats the human tongue as a solid muscular hydrostat, that is, a solid cylinder of muscle whose volume remains constant under pressure and throughout deformation (Smith & Kier 1985; 1989; Takemoto 2001). Decrease its height, and the cylinder must widen; decrease its girth, and the cylinder must lengthen. According to the hydrostatic theory, this inverse relation is the central principle of the human tongue's physiology. Because muscles contract on activation but are lengthened passively, all musculoskeletal systems involve muscle antagonists: When one contracts, the other lengthens, and vice versa. Within a solid muscular hydrostat, muscle antagonists are defined by their relative orientation. Muscles that run parallel to the tongue's long axis shorten the tongue via contraction. Muscles perpendicular to the long axis – the vertical and horizontal transverse layers – narrow the tongue and thus lengthen it.
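The inverse relation can be made explicit by idealizing the tongue as a cylinder of radius r and length L whose volume V = πr²L is conserved, so that L = V/(πr²) and the new length relative to the old is (r_old/r_new)². The Python sketch below is purely illustrative; the cylindrical shape and the starting dimensions are assumptions, not anatomical measurements.

```python
import math


def length_for_radius(volume: float, radius: float) -> float:
    """Length an idealized cylinder must take on to keep `volume` fixed at `radius`."""
    return volume / (math.pi * radius ** 2)


def elongation_ratio(radius_ratio: float) -> float:
    """New length / old length when the radius is scaled by `radius_ratio`.

    From V = pi * r**2 * L held constant: L_new / L_old = (r_old / r_new) ** 2.
    """
    return 1.0 / radius_ratio ** 2


# Hypothetical starting shape: radius 1.5 cm, length 7 cm.
r0, l0 = 1.5, 7.0
v = math.pi * r0 ** 2 * l0
l1 = length_for_radius(v, 0.9 * r0)     # transverse fibers narrow the radius by 10%
print(round(l1, 2))                     # 8.64
print(round(elongation_ratio(0.9), 4))  # 1.2346
```

On this idealization, narrowing by 10% forces an elongation of roughly 23%, which is why contraction of fibers perpendicular to the long axis can, by itself, protrude the tongue.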
In the human tongue, these principles are implemented by complex physiology: Eight pairs of muscles form concentric layers around the cylinder's axis, and each layer itself consists of finely interdigitated layers of muscle fiber (Takemoto 2001). The tongue's core, for example, consists of three interdigitated muscle groups, each running perpendicular to the long axis: the transverse, genioglossus, and verticalis muscles. Thus, when the core contracts, the tongue narrows and protrudes. Importantly, deformation of the tongue always occurs under active resistance, by isotonic contraction (Pittman & Bailey 2009). When the core muscles contract, the surrounding layer of parallel fibers provides active resistance to lengthening. Together, isotonic contraction and muscle interdigitation add strength and rigidity to the tongue's structure and make complex deformation possible.
Not surprisingly (to motor physiologists at least), human tongue control is organized in the same way as limb control. At the level of the brainstem, tongue control is organized by activity, that is, by the common repetitive behaviours in which the tongue plays a major role. At least five aerodigestive activities (respiration, suckling, swallowing, mastication, and licking) are controlled by central pattern generators (CPGs) located in the medulla and pons (Barlow & Estep 2006; Barlow et al. 2010; Dutschmann & Dick 2012; Smith et al. 2009). A CPG is any set of neurons that produces a pattern of activation and maintains a rhythm. So, by definition, even a pacemaker neuron, a solitary neuron that fires spontaneously at regular intervals, is a CPG. In practice, however, most CPGs are complex circuits of interneurons that produce rhythmic movement through reciprocal inhibitory and excitatory connections, some of which are regulated by pacemaker neurons and some not (Marder & Taylor 2011). On some definitions, CPGs are circuits that can produce “fictive behavior,” that is, motor patterns in the absence of sensory feedback or other afferent signals. This is true: CPGs are capable of self-sustained behaviour. But in situ, the genius of a CPG is its ability to modulate rhythmic motor behaviour on the fly in response to signals from the senses, from cortex, and from other CPGs (Harris-Warrick 2011; Marder 2012; Marder & Bucher 2001).
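The idea of a circuit that “maintains a rhythm” through reciprocal inhibition can be illustrated with the classic half-center arrangement. The Python sketch below uses the Matsuoka neural oscillator, a standard computational model chosen here for illustration rather than a model of any specific aerodigestive CPG: two units inhibit each other, each slowly fatigues (adapts), and both receive only a constant drive, so any rhythm that appears is generated by the circuit itself, as in fictive behaviour. The parameter values are illustrative assumptions chosen to satisfy the model's oscillation condition 1 + tau/tau_a < w < 1 + beta.

```python
def simulate(t_end=30.0, dt=0.005, tau=0.25, tau_a=0.5, beta=2.5, w=2.5, s=1.0):
    """Forward-Euler simulation of a two-unit Matsuoka oscillator.

    tau: membrane time constant; tau_a: adaptation time constant;
    beta: adaptation strength; w: mutual inhibition; s: constant tonic drive.
    Returns the rectified outputs (y1, y2) at every time step.
    """
    x = [0.1, 0.0]  # small asymmetry breaks the symmetric equilibrium
    v = [0.0, 0.0]  # adaptation ("fatigue") states
    outputs = []
    for _ in range(int(t_end / dt)):
        y = [max(0.0, xi) for xi in x]
        # Each unit is driven by s, fatigued by its own v, inhibited by the other's y.
        dx = [(-x[i] - beta * v[i] - w * y[1 - i] + s) / tau for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_a for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        outputs.append((y[0], y[1]))
    return outputs


def count_switches(outputs):
    """How many times the dominant (more active) unit changes."""
    dominant = [0 if y1 > y2 else 1 for y1, y2 in outputs]
    return sum(a != b for a, b in zip(dominant, dominant[1:]))


out = simulate()
late = out[len(out) // 2:]   # discard the first half as a start-up transient
print(count_switches(late))  # repeated alternations despite a constant input
```

In this toy model, replacing the constant drive `s` with time-varying inputs is where sensory, cortical, or other CPG signals would modulate the rhythm on the fly.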
Aerodigestive CPGs are large-scale circuits organized in rough hierarchies, what one might think of as “CPGs within CPGs.” CPGs for the simplest repetitive behaviours are recruited into larger networks that synchronize their activation into coherent motor runs. In turn, these circuits may themselves be recruited as the components of even larger CPGs. Aerodigestive CPGs are particularly complex given the functional overlap between aerodigestive behaviours; for example, suckling, respiration, emesis, and licking all involve TP/R. Barring the duplication of all low-level CPGs, there must be some means by which CPGs can be shared. In principle, sharing could take a variety of forms, probably all of which are found in aerodigestive motor control. In the simplest case, large-scale CPGs with common components are loosely connected into a single network, and “sharing” neural resources amounts simply to ceding control on the basis of competition or protocol (Gutierrez et al. 2013). A slightly more complex scenario involves a network of low-level components that can be activated in different orders, sometimes using all of the components, sometimes not. In the most complex case, large-scale CPGs are genuinely multifunctional: A single pool of neurons collectively instantiates more than one CPG (Ramirez & Pearson 1988). Because neurons can express multiple types of synapses, defined by the neurotransmitters they release (Briggman & Kristan 2008; Harris-Warrick & Marder 1991; Kvarta et al. 2012; Marder et al. 2014; Ramirez & Pearson 1988), functionally distinct neural circuits can exist within a single pool of interneurons. For example, the pre-Bötzinger complex within the respiratory network can produce normal inspiration, gasping, or sighing (Doi & Ramirez 2008; Lieske et al. 2000; Ruangkittisakul et al. 2008; Tryba et al. 2008).
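The two simplest sharing schemes just described, ceding control by competition or by protocol, can be sketched as follows, using suckling and swallowing as example CPGs that share a tongue circuit. The function names, drive values, and suck:swallow ratios are hypothetical illustrations; no claim is made that brainstem arbitration works this way in detail.

```python
def by_competition(drives: dict[str, float]) -> str:
    """Competition: the CPG with the strongest current drive takes the shared
    component. Ties go to the name that sorts first, so exactly one owner is
    always chosen (the two CPGs never drive the shared circuit at once)."""
    return max(sorted(drives), key=lambda name: drives[name])


def by_protocol(n_cycles: int, sucks_per_swallow: int) -> list[str]:
    """Protocol: a fixed rule cedes control to swallowing after every
    `sucks_per_swallow` suck cycles (1 at first, more as feeding matures)."""
    sequence = []
    for cycle in range(1, n_cycles + 1):
        sequence.append("suck")
        if cycle % sucks_per_swallow == 0:
            sequence.append("swallow")
    return sequence


print(by_competition({"suckling": 0.7, "swallowing": 0.4}))  # suckling
print(by_protocol(4, 2))  # ['suck', 'suck', 'swallow', 'suck', 'suck', 'swallow']
```

The protocol version makes the maturational change in the suck-swallow rhythm a single parameter; the competition version needs no rule at all, only drives that rise and fall.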
At present, very little is known about the sensorimotor representation of the tongue in cortex (but see Laine et al. 2012; Sakamoto et al. 2010). What we do know is that there are topographic maps of the tongue and other oropharyngeal structures in S1 and M1 (Cerkevich et al. 2013; 2014), and that the large areas of the homunculi devoted to the tongue and other oropharyngeal structures are explained by their fine-grained motor control and multiple sensory systems. As we will see, TP/R is likely to play a role in the functional development of S1 and M1, but it ends too soon to participate in the “wiring” of the many cortical areas involved in even the “simple” act of adult swallowing.
6.3. The emergence and refinement of tongue protrusion
Paradoxical as it may sound, we suggest that TP/R begins as an activity “for” tongue protrusion itself: Tongue protrusion begets tongue protrusion of a “more better” kind. By the time TP/R is clearly visible in the human fetus, at 14–16 weeks GA, the brain has undergone significant development. The sensory and motor cranial nuclei, including the hypoglossal nuclei, have been in place for more than 8 weeks (Müller & O'Rahilly 2011); all six layers of the cortex are almost completely formed (Clancy et al. 2000). Yet appearances aside, the visible structures and areas of the brain are not yet functional, because they lack both the internal circuitry and the distal connections to sensory transducers required for mature function. Significant development in the form of neural specification (and re-specification) must occur before birth and will continue thereafter.
Warp et al. (2012) presented the first fine-grained description of how spontaneous activation leads to permanent circuit formation, in the swimming CPG of zebrafish. The side-to-side swimming motion of the fish is the result of a simple circuit. In each spinal segment, two pools of motoneurons innervate the muscle around the spine, one for each side of the body. Within each pool the connections are mutually excitatory; across the midline, between the two pools, the connections are inhibitory. In swimming, a wave of activity flows down the spine, causing ipsilateral contraction and contralateral suppression (inhibition of contraction). The development of the swimming CPG follows the same head-to-tail pattern. At the top of the spine, release of a Ca²⁺ transient within one motor pool of the first spinal segment causes sporadic random activity that soon coalesces into synchronous activity; synchronous activity then spreads across the midline into the contralateral motor pool, where isolated, random activation begins. Again, isolated activity coalesces and now spreads to the next spinal segment. At the same time, neural coupling matures: Activation by transient release leads to the formation of gap junctions, and activity across gap junctions results in the expression of synapses. Without spontaneous activity, or activity across gap junctions, further specification does not occur. In this way, the swimming CPG emerges from incremental, activity-dependent developmental processes (Warp et al. 2012).
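The head-to-tail sequence just described can be caricatured as a simple rule-based simulation. In this hypothetical Python sketch (the states, rules, and step counts are assumptions for illustration, not parameters from Warp et al.), each motor pool steps through silent, sporadic, synchronous, and coupled states; synchronous activity seeds the contralateral pool; and once both pools of a segment are synchronous, the next segment is seeded. Blocking activity in one segment leaves every segment caudal to it immature.

```python
SILENT, SPORADIC, SYNCHRONOUS, COUPLED = range(4)


def develop(n_segments: int, steps: int,
            blocked: frozenset[int] = frozenset()) -> list[list[int]]:
    """Return [left_pool, right_pool] maturity states for each segment."""
    state = [[SILENT, SILENT] for _ in range(n_segments)]
    if 0 not in blocked:
        state[0][0] = SPORADIC  # transient in one pool of the first segment
    for _ in range(steps):
        new = [pools[:] for pools in state]  # update all segments in parallel
        for seg, pools in enumerate(state):
            if seg in blocked:
                continue  # no spontaneous activity: no further specification
            for side in (0, 1):
                if SPORADIC <= pools[side] < COUPLED:
                    new[seg][side] += 1  # sporadic -> synchronous -> coupled
                if pools[side] >= SYNCHRONOUS and pools[1 - side] == SILENT:
                    new[seg][1 - side] = SPORADIC  # spread across the midline
            nxt = seg + 1
            if (min(pools) >= SYNCHRONOUS and nxt < n_segments
                    and nxt not in blocked and state[nxt][0] == SILENT):
                new[nxt][0] = SPORADIC  # spread to the next caudal segment
        state = new
    return state


print(develop(5, 30))                  # [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3]]
print(develop(5, 30, frozenset({2})))  # [[3, 3], [3, 3], [0, 0], [0, 0], [0, 0]]
```

The point of the caricature is only the dependency structure: each stage is gated by activity at the previous one, so a single silenced segment halts everything downstream.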
As we have seen, prenatal ultrasounds provide behavioral evidence that aerodigestive brainstem CPGs also emerge in an incremental fashion: Mouth opening/closing, tongue protrusion/retraction, and glottal opening/closing all begin with minute, uncertain movements that slowly develop into robust rhythmic motor sequences. We suggest that the CPG for TP/R develops along the same lines. Motoneurons for tongue innervation that originate within the hypoglossal (cranial nerve XII) nucleus are grouped by muscle innervation (e.g., the genioglossus muscle) as well as by hydrostatic function. Two pools of motoneurons, in the medial and lateral branches of the hypoglossal nucleus, control tongue narrowing/elongation and tongue shortening/widening, respectively (Guo et al. 1996; McClung & Goldberg 2000; 2002; Smith et al. 2005). We also know that in the early postnatal period (in rats), hypoglossal neurons switch from spontaneous/gap junction transmission to synaptic signaling. Thus, local spontaneous activation within the medial branch of cranial nerve XII explains the first weak protrusive movements of the tongue (by activation of the medial motoneurons). A widening circle of synchronous interneuron activation, representing muscle recruitment, explains the increasing strength of tongue protrusions. All else being equal, spontaneous activity in the lateral branches will cause tongue retraction. And like the neural pools on opposite sides of the spinal segments in the zebrafish, inhibitory interconnections between the medial and lateral compartments ensure that, at the outset, tongue retraction does not hinder tongue protrusion and vice versa.
6.4. The interconnection and coordination of brainstem CPGs
Once lower-level motor components begin to emerge, they must be brought under the control of larger-scale aerodigestive CPGs. As we have seen, there are many ways that this can occur. Some neural circuits will be genuinely multifunctional, that is, capable of producing multiple distinct patterns, like the pre-Bötzinger complex in respiration. Other CPGs might share a low-level circuit simply by passing control of it back and forth, according to some ingrained “rule” or on the basis of competition. But whichever strategies are implemented, both inhibitory and excitatory connections between the component CPGs are necessary: Inhibition ensures that mutually exclusive motor sequences are not activated by their shared components; excitation coordinates activation, binding motor components into synchronized sequences.
By the time TP/R is just discernible at 12 weeks post-conception in the human fetus, the sensory and motor nuclei of the cranial nerves have been in place for many weeks (Clancy et al. 2001). By the end of the embryonic period, at about 8 weeks post-conception (Müller & O'Rahilly 2011), all of the cranial nerves and nuclei have formed and occupy their permanent locations, even before the motoneurons have innervated the tongue muscles. (The exceptions are the facial nerves (VII) and their nuclei, which form later, in the early fetal period.) What remains is the development of functional circuits.
Consider two aerodigestive CPGs that share control of the tongue: the CPG that controls the oral stage of swallowing and the CPG that controls suckling. The oral stage of swallowing involves innervation of the mouth, face, tongue, palate, and pharynx (cranial nerves V, VII, IX, X, and XII). The larger CPG for suckling, which comprises at least six separate areas of the brainstem, involves the (paired) cranial nerves V, VII, and XII (Broussard & Altschuler 2000). In feeding, suckling precedes swallowing – at first in a cycle of one suckle and one swallow, but quickly progressing to one swallow after multiple suckles (sect. 4.2). Their coordination thus requires connections that suppress simultaneous activity yet allow each CPG to cede or gain control serially, with enough flexibility to accommodate maturational changes in the suck-swallow rhythm. The control of TP/R involves coordinated activation within the hypoglossal (XII), trigeminal (V), facial (VII), and glossopharyngeal (IX) cranial nerve nuclei. But TP/R also produces a cascade of sensory signals from the oral cavity, tongue, jaw, lips, and face, which arrive simultaneously at the sensory portions of the trigeminal (V), facial (VII), and glossopharyngeal (IX) cranial nerves. Two of these cranial nuclei, V and VII, contain circuits common to both suckling and swallowing. So sensory feedback from TP/R will produce simultaneous activation in cranial sensory nuclei V and VII. (Cranial nerve XII, the hypoglossal nerve, is largely or entirely a motor nerve.) If neurons that fire together wire together, then TP/R will produce interconnections between components of suckling and swallowing that are not initially connected, that is, between all those that involve the cranial motor nuclei V and VII. These are exactly the kinds of inhibitory connections needed to ensure flexibility in the suckling and swallowing sequence: No matter how many sucks precede the swallow, sensory feedback will inhibit the swallowing CPG.
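The “fire together, wire together” step of this argument can be illustrated with a plain Hebbian rule, Δw_ij = η·a_i·a_j. In the Python sketch below, each node stands for the sensory portion of a cranial nucleus; the node labels, the binary activity per episode, and the learning rate η are illustrative assumptions. The rule only says which pairs become connected; whether the resulting connection is inhibitory or excitatory is left abstract.

```python
from itertools import combinations

NODES = ["V", "VII", "IX", "X"]  # illustrative labels, not a circuit diagram


def hebbian(episodes: list[set[str]], eta: float = 0.1) -> dict[tuple[str, str], float]:
    """Accumulate pairwise weights: dw = eta whenever both nodes are active."""
    weights = {pair: 0.0 for pair in combinations(NODES, 2)}
    for active in episodes:
        for a, b in weights:
            if a in active and b in active:  # both fired in the same episode
                weights[(a, b)] += eta
    return weights


# Hypothetical TP/R episodes: feedback arrives together at V, VII, and IX.
tpr_feedback = [{"V", "VII", "IX"}] * 5
w = hebbian(tpr_feedback)
print(round(w[("V", "VII")], 2))  # 0.5: repeatedly co-active, so wired together
print(w[("V", "X")])              # 0.0: never co-active, so no connection
```

The sketch captures only the logic of the argument: a behaviour that reliably co-activates two nuclei is, under a Hebbian rule, sufficient to link them, while nuclei it never co-activates stay unlinked.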
In sum, robust TP/R can aid in the maturation of other aerodigestive CPGs because TP/R produces a wide range of – and widely ranging – sensory feedback to the cranial nuclei, relative to other oropharyngeal repetitive behaviours such as tongue peristalsis and glottal opening and closing. This goes some way to explaining why TP/R might continue to occur as an isolated behaviour.
6.5. The development of topographic maps in somatosensory cortex
In placental mammals, the formation of topographic maps within cortex, such as the motor and sensory homunculi, begins with the formation of a temporary developmental structure, the cortical subplate. Spontaneous activation within the subplate guides both the axons of sensory neurons from the thalamus below and the axons of cortical motor neurons above (Kanold & Luhmann 2010; Tolner et al. 2012). In mammalian development, the crucial anatomical structures that connect brainstem nuclei with orofacial somatosensory cortex – the cranial nuclei, the thalamus, the cortical subplate, and all six layers of cortex – form largely prior to the onset of TP/R (Clancy et al. 2001). Yet although TP/R begins too late to be a major determinant of neurogenesis, migration, or axon pathfinding to S1, the functional circuitry of S1 has yet to develop at that point.
During this early period of cortical development (postnatal in the rat), S1 has a single form of organized neural activity, spindle bursts, which correlate with motor activity: For example, muscle twitches in the hind limb of the rat produce temporally correlated S1 signals, and extinction of muscle twitches largely silences S1 (Khazipov et al. 2004). This suggests that spontaneous motor activity organizes sensorimotor cortical connections through self-initiated sensory feedback (muscle twitches). In much the same way that postnatal visual experience is required for normal formation of the ocular dominance and orientation columns of mammalian V1 (for a review, see Cang & Feldheim 2013), sensory experience generated by self-motion organizes the cortical homunculi. Thus, TP/R coincides with a period of dramatic cortical development driven by sensorimotor signals of the very kind required.
At this point, there is no direct evidence for the involvement of TP/R in these processes. This is not surprising: It is only within the past couple of years that basic anatomical research on the cortical representation of orofacial regions (Cerkevich et al. 2013; 2014) has been completed. Still, TP/R and other orofacial behaviours continue into the postnatal period, and there is no lack of developmental events in which self-initiated signals might participate, namely (1) the generation of somatotopic S1 maps of the tongue, lips, jaw, and lower face; (2) the corticothalamic connections between the facial/tongue regions of S1 and the ventral posterior nucleus of the thalamus (Deck et al. 2013); and/or (3) the corticobulbar connections between M1 and the hypoglossal, trigeminal, and facial nuclei (Sarnat 1989; 2003; 2015). These are all circuits or networks that we know form in the neonatal infant, and for which tongue protrusion would provide the requisite “end point” of neural activity.
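For the first of these candidate events, the generation of somatotopic S1 maps, a standard computational analogy is the Kohonen self-organizing map, in which repeated, spatially local input makes neighbouring units come to prefer neighbouring locations. The Python sketch below is a swapped-in textbook technique, not a model proposed in this article: the one-dimensional tongue-to-lip axis, the uniformly distributed self-generated touches, the 20 units, and the learning schedules are all assumptions.

```python
import math
import random


def train_som(n_units=20, n_steps=5000, seed=0):
    """1-D Kohonen SOM; returns each unit's preferred position on [0, 1]."""
    rng = random.Random(seed)
    prefs = [rng.random() for _ in range(n_units)]  # unordered at the start
    sigma0, sigma1 = n_units / 2, 0.5  # neighbourhood width: wide -> narrow
    lr0, lr1 = 0.5, 0.01               # learning rate: large -> small
    for t in range(n_steps):
        frac = t / n_steps
        sigma = sigma0 * (sigma1 / sigma0) ** frac  # exponential decay
        lr = lr0 * (lr1 / lr0) ** frac
        x = rng.random()  # one hypothetical self-generated touch position
        winner = min(range(n_units), key=lambda i: abs(prefs[i] - x))
        for i in range(n_units):
            # Units near the winner (in map order) move most toward the input.
            h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
            prefs[i] += lr * h * (x - prefs[i])
    return prefs


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


prefs = train_som()
# |r| close to 1 would mean preferred position varies smoothly with unit index.
print(round(abs(pearson(list(range(len(prefs))), prefs)), 2))
```

In the analogy, the unit index plays the role of position in S1 and the input plays the role of sensory feedback from a self-generated movement; the map becomes topographic only because inputs arrive repeatedly and locally.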
7. Rethinking neonatal imitation
Thus far we have walked through the events of aerodigestive development and the essential role that sensorimotor control of the tongue plays within all aerodigestive functions of the human neonate. We hope to have established that TP/R (1) has the hallmark features of the rhythmic stereotypies common in early infant development; (2) emerges early in prenatal life and continues until suckling and respiration are fully coordinated and developed; (3) ends prior to the learning period during which the infant prepares for the ingestion of solid food; (4) is controlled exclusively by brainstem mechanisms, given the immaturity of sensory and motor cortex; and (5) likely contributes to at least three kinds of activity-dependent development during the lengthy window of its existence. Viewed in this context, the positive results of TP/R imitation experiments are more likely to be by-products of normal aerodigestive development (behaviours that increase in frequency when neonates interact with adults or are presented with other interesting stimuli) than the result of facial imitation. The coincidence between the window in which TP/R “imitation” appears and disappears and the first phase of aerodigestive development lends further support to the aerodigestive origin of TP/R (Fig. 3).
Starting at 12 weeks, the human fetus develops a repertoire of rhythmic behaviors, including TP/R, mouth opening and closing (MO/C), isolated eye opening (as opposed to repetitive blinking), index finger protrusion, mouthing (with hand in mouth), yawning, grimacing, smiling, and swallowing. As we have seen in section 4.1, all of these movements begin as small, isolated gestures and increase in duration and frequency over the following weeks. Eight weeks before birth, the behavioral repertoire of the neonate is in place, ready for postnatal life, and all of the gestures tested in imitation experiments come from this repertoire. The aerodigestive stereotypies (plus finger movements) peak in frequency at birth. Of these “early” stereotypies, TP/R, MO/C, and index finger protrusion are produced with the highest frequencies during the first week after birth (Oostenbroek et al. 2016). It is worrisome that all of the stereotypies that peak early in frequency are also the gestures tested in neonatal imitation experiments. Are these gestures genuinely imitated, and is their high frequency in neonatal life merely incidental? Or do imitation experiments yield positive results simply because these stereotypies are so frequent?
The aerodigestive theory situates the gestures at issue not only within a known class of fetal/infant behaviours (stereotypies) but also within the known processes of early neural development. These stereotypies form a developmental stage in motor learning. This suggests a very different explanation of why the gestures used in neonatal imitation experiments peak at birth, taper off, and then disappear. Proponents often suggest that the infant has lost interest in old social interactions and has moved on to more novel behaviours. On our account, by contrast, all early rhythmic movements end by this time. From the physiological point of view, then, orofacial stereotypies make sense as members of a well-defined category of fetal/neonatal behaviours. The same conclusion applies to the other stereotypies that appear to be imitated.
We realize most proponents of neonatal imitation will not be satisfied with this argument, especially those who do not support the strong representational claims of AIM. And even readers who accept our account of aerodigestive neurodevelopment may question the consequences of these facts for neonatal imitation. To conclude, then, we address three questions the proponent of neonatal imitation might reasonably ask.
7.1. Could there be a subcortical locus of NI?
Suppose we agree that neonatal imitation is unlikely to be controlled by cortical mechanisms and shift our focus to subcortical ones. Here the mammalian superior colliculus (SC) seems like the most plausible candidate. SC is a laminar, midbrain structure that uses visual and multimodal cells to perform sensorimotor transformations. Its structural and functional properties make it perfectly suited to neonatal imitation (cf. May Reference May2006). Briefly, the superior three layers of SC (I-III) receive only visual input, from the retinal ganglion cells, V1, and the frontal eye fields (FEF). Superior SC conserves the topographic organization of the retina and V1, and its neurons preserve the properties of V1 cells (on-off center-surround organization, sensitivity to orientation and wavelength, and binocularity) (Tailby et al. Reference Tailby, Cheong, Pietersen, Solomon and Martin2012). The deep layers of SC receive input from multiple senses – vision, audition, proprioception, plus the somatosensory and vestibular systems – and they converge upon single cells in all possible combination (Sparks & Hartwich-Young Reference Sparks and Hartwich-Young1989). These multimodal neurons are also topographically organized, forming three distinct maps, one each for visual, auditory, and somatosensory inputs, which align in location within and between layers (Meredith & Stein Reference Meredith and Stein1986b). The net result is a systematic multimodal mapping of neurons that “prefer” whatever stimuli are coincident in space and time (Meredith & Stein Reference Meredith and Stein1986b). The sight of a dog and the sound of its bark – in spatiotemporal synchrony – produce a maximal response in deep SC neurons. Finally, SC deep layers drive motor behaviours: Efferent SC signals are sent to pre-motor and motor nuclei of the brainstem and spine (Meredith & Stein Reference Meredith and Stein1986a). 
All in all, the SC seems “purpose built” to implement the hardware for neonatal imitation.
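The claim that deep SC neurons “prefer” stimuli coincident in space and time can be illustrated with a toy calculation. The sketch below is hypothetical and is not a model from the cited literature: the Gaussian receptive fields, the 100 ms coincidence window, the interaction gain, and all names are assumptions, chosen only to show how aligned unimodal maps yield a maximal response to a dog that is seen and heard in the same place at the same time.

```python
import math
from typing import Optional

# Hypothetical toy model of one deep-layer SC multisensory neuron.
# Assumptions (not from the cited studies): visual and auditory receptive
# fields are identical Gaussians over azimuth, i.e. already aligned, and
# spatiotemporal coincidence is captured by a multiplicative interaction term.
RF_CENTER_DEG = 0.0         # assumed receptive-field centre (degrees azimuth)
RF_WIDTH_DEG = 10.0         # assumed receptive-field width (standard deviation)
TEMPORAL_WINDOW_MS = 100.0  # assumed window for "coincident in time"
INTERACTION_GAIN = 2.0      # assumed strength of the coincidence term


def spatial_drive(azimuth_deg: Optional[float]) -> float:
    """Unimodal drive between 0 and 1; None means the stimulus is absent."""
    if azimuth_deg is None:
        return 0.0
    return math.exp(-((azimuth_deg - RF_CENTER_DEG) ** 2) / (2 * RF_WIDTH_DEG ** 2))


def temporal_overlap(t_visual_ms: float, t_auditory_ms: float) -> float:
    """1.0 for simultaneous onsets, falling linearly to 0 at the window edge."""
    return max(0.0, 1.0 - abs(t_visual_ms - t_auditory_ms) / TEMPORAL_WINDOW_MS)


def response(visual_az: Optional[float], auditory_az: Optional[float],
             t_visual_ms: float = 0.0, t_auditory_ms: float = 0.0) -> float:
    """Sum of unimodal drives plus a term that is large only when both
    stimuli fall inside the aligned receptive fields at about the same time."""
    v = spatial_drive(visual_az)
    a = spatial_drive(auditory_az)
    coincidence = INTERACTION_GAIN * v * a * temporal_overlap(t_visual_ms, t_auditory_ms)
    return v + a + coincidence


# A dog seen and heard together vs. partial or mistimed cues.
print(response(0.0, 0.0))                       # seen and heard together: 4.0
print(response(0.0, None))                      # seen only: 1.0
print(response(0.0, 0.0, t_auditory_ms=300.0))  # bark arrives too late: 2.0
```

Note that the sketch simply presupposes receptive fields that are already aligned across modalities; as Section 7.1.1 argues, that alignment is itself the product of postnatal maturation.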
7.1.1. Answer
Certainly, prima facie, SC looks like an excellent candidate. In fact, Pitti et al. (2013) have produced a model that shows how SC could transform visually encoded facial gestures into imitative actions using the receptive properties of SC neurons. This is not as surprising as it might seem. SC visual neurons and V1 neurons have very similar response properties, with the possible exception of S-cone input (but see Hall & Colby 2014). If we, as adults, recognize facial expressions/body gestures by means of V1 input, it would be very odd if one could not construct such a model from SC neural responses. Rather, the more significant question concerns the plausibility of the suggestion: Is SC likely to underwrite neonatal imitation?
Traditionally, we have understood the primary function of mammalian SC as one of orientation: In primates, the SC coordinates eye and head movements during saccades to maintain focus on visual targets (Marino et al. 2015; Schiller et al. 1987). It also controls smooth-pursuit eye movements when targets move slowly (Krauzlis et al. 2000), provides updates on current location (Dash et al. 2015), and activates express saccades. This orientation function is well preserved across mammalian species. It controls whole-body orientation away from threat in rats (Redgrave et al. 1996a; 1996b) and reaching behaviour (towards a target) in cats (Courjon et al. 2004; Iwamoto & Sasaki 1990; Werner et al. 1997b), monkeys (Philipp & Hoffmann 2014; Stuphorn et al. 2000; Werner et al. 1997a; 1997b), and humans (Himmelbach et al. 2013; Linzenbold & Himmelbach 2012). More recent research suggests that SC also participates in target selection – in picking out an item of interest – whether or not orienting behaviour follows (Müller et al. 2005).
It is this feature of SC that is most relevant here. Insofar as infants orient towards adult faces in the first moments after birth, the SC is the most likely candidate for this orienting mechanism. For example, Johnson et al. (1991; 2015) champion a two-process theory of facial processing in which an innate subcortical system, called CONSPEC, biases orientation towards faces. This bias ensures salient input for the “training up” of cortical areas involved in facial recognition. Still, few researchers have held that the superior layers of SC themselves process orofacial features and/or expressions. Rather, the question at issue is whether the SC visual layers are biased towards some feature that all and only faces have, or whether SC orients towards faces much of the time given the general biases of SC I–III visual neurons at birth. Either way, SC is understood as a mechanism for selection and orientation, not for facial/gesture recognition. And recognition of different facial/bodily gestures is necessary for imitation.
Further, there is a more conclusive reason why SC could not be the basis of neonatal imitation. Mammalian research suggests that the topographic maps of SC deep layers are formed and aligned by multi-stage developmental processes (see Cang & Feldheim 2013 for a review). In utero, chemical cues guide the axons of retinal cells into SC in a pattern that preserves the topographic maps of the retina and V1 (Triplett 2014; Triplett et al. 2012). Next, endogenous wave-like activity from the retina establishes connections that preserve topographic relations both within and between these layers (Furman et al. 2013). In the last stage, SC multimodal neurons undergo a critical period of plasticity, a learning period during which potentially multimodal cells adjust their responses to reflect those modalities that prove most valuable (Balmer & Pallas 2015; Xu et al. 2014a; 2014b; 2015). Importantly, this critical period of postnatal plasticity cannot occur without input from association cortex (Jiang et al. 2001). So SC maturation requires (a) a functional association cortex, (b) functional connections between association cortex and SC, and (c) significant postnatal experience. In cats, this process is not complete until 4 months after birth (Wallace & Stein 1997). Neil et al. (2006) estimated that human infants are 8 to 10 months old before this particular kind of multimodal integration is in place. SC is thus highly unlikely to instantiate neonatal imitation, because the crucial step of multimodal mapping does not occur in newborns.
7.2. Can there be imitation without representation?
Let's agree for the sake of argument that neonates do not solve the correspondence problem through multi- or supramodal representations – or indeed through any representational system at all. Robust neonatal imitation could still occur. As the authors agree, infant stereotypies are produced through the coordinated activation of subcortical CPGs. Thus, the correspondence problem is more likely “solved” through resonance and entrainment. Think here of the aerodigestive system in the engineering terms of control systems. In a closed-loop system, sensory feedback produced during the last cycle of behaviour is used to move a process variable towards its set point – that is, its target value – in the next oscillation. So in suckling, when the compression stroke of the jaw meets with resistance, the power stroke is adjusted to exert more force. Or in swallowing, feedback from the leading edge of esophageal peristalsis adjusts the speed/force of subsequent contractions. Yet because the “goal” of aerodigestive development is merely the smooth production of behavioural sequences often repeated thousands of times in an infant's day, this network is unlikely to represent its process variables. Resonance and entrainment produce faster, more reliable results than could any feed-forward model of the process state. Of course, by adulthood even the sight of food on a plate will reset the parameters of swallow in anticipation (Leopold & Daniels 2009), presumably by means of the 15–20 cortical sites involved in producing adult swallow (Ertekin 2011; Ertekin & Aydogdu 2003; Sörös et al. 2008; 2009). But for the neonate, continuous closed-loop control is the superior system.
Thus, as long as the relevant visual stimuli release or entrain matching behaviour, the correspondence problem will be solved without representational matching.
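The closed-loop idea behind this objection can be made concrete. Below is a minimal sketch, assuming a proportional controller and a toy “plant” in which each stroke's output equals force divided by resistance; the function name, parameter values, and plant are hypothetical and are not a model of oral-motor physiology. It shows the feature the objection relies on: the loop returns to its set point after a change in resistance without ever storing a representation of that resistance.

```python
# Hypothetical sketch of cycle-by-cycle closed-loop control, loosely modelled on
# the suckling example. Units are arbitrary; the plant (output = force /
# resistance) and the proportional update rule are assumptions for illustration.


def run_closed_loop(resistances, set_point=1.0, gain=0.5, initial_force=1.0):
    """Return the output achieved on each cycle.

    The controller stores no estimate of resistance: the force for the next
    stroke is adjusted only by the error sensed on the stroke just completed.
    """
    force = initial_force
    outputs = []
    for resistance in resistances:
        achieved = force / resistance  # what this stroke actually produced
        outputs.append(achieved)
        error = set_point - achieved   # sensory feedback from this cycle
        force += gain * error          # proportional correction for the next cycle
    return outputs


# Resistance doubles after five cycles; output dips, then climbs back to the set point.
trace = run_closed_loop([1.0] * 5 + [2.0] * 60)
print(trace[4:8])
print(trace[-1])
```

With this update rule the force converges whenever the gain lies between 0 and twice the resistance, so smooth, repeated output is achieved by feedback alone, with no feed-forward model of the process state.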
7.2.1. Answer
Recent work on motor systems, including work on the mirror system, suggests that central pattern generators lie at the core of motor function in vertebrates (Grillner 2006; Grillner et al. 2005a; 2005b; 2008; Kozlov et al. 2009; Mahan & Georgopoulos 2013). Predictably, many questions remain about how resonance might work for coupled oscillators in situ: How are sub-threshold activation patterns brought to threshold? What are the means of coupling? How are the values of the process variables modified? And how do cortical signals alter or entrain CPG motor outputs? These are all open questions, and we cannot insist that the resonance theorist answer them on demand. On the other hand, the biggest hurdle for anyone who champions a resonance theory of “matching” is to explain how adult gestures, as encoded by the neonatal visual system, are matched to the infant's own gestures, which are encoded by networks of oscillators. It is not enough to suggest here that seeing an instance of TP/R disinhibits the TP/R network or that recognition of an open mouth releases the CPG for MO/C. Mere association between a sensory input and a motor output is not imitation. Instead, there must be a systematic explanation of how the neonatal brain recognizes specific gestures and selects the relevant CPG by means of resonance. To solve the correspondence problem – to imitate – the infant must have a systematic means of matching this arbitrary visual input to that proprioceptive feedback, produced by that repetitive stereotypy, and the resonance theorist must spell out this means using the concepts of oscillators and control systems. This is a tall order.
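To see what entrainment can and cannot deliver, consider the simplest textbook case of a driven phase oscillator. The sketch below is a minimal illustration under assumed values, not a model of any particular aerodigestive CPG: the CPG is reduced to a single phase variable, the visual input to a pure periodic drive, and the coupling to a sine term (an Adler/Kuramoto-type phase equation). The function name and all parameters are hypothetical.

```python
import math

# Hypothetical sketch: a CPG reduced to one phase variable (theta), driven by an
# external periodic input through sinusoidal coupling. All values are assumptions.


def mean_cpg_frequency(omega_cpg, omega_drive, coupling, dt=1e-3, t_end=60.0):
    """Euler-integrate d(theta)/dt = omega_cpg + coupling * sin(drive - theta)
    and return the CPG's mean angular frequency (rad/s) over the second half
    of the run, after transients have died out."""
    n_steps = round(t_end / dt)
    half = n_steps // 2
    theta = 0.0
    theta_at_half = 0.0
    for i in range(n_steps):
        drive_phase = omega_drive * i * dt
        theta += dt * (omega_cpg + coupling * math.sin(drive_phase - theta))
        if i + 1 == half:
            theta_at_half = theta
    return (theta - theta_at_half) / ((n_steps - half) * dt)


omega_cpg = 2 * math.pi * 1.0   # intrinsic CPG rhythm: 1 Hz (assumed)
drive_near = 2 * math.pi * 1.1  # external rhythm close to the CPG's own
drive_far = 2 * math.pi * 1.5   # external rhythm far from the CPG's own

# Coupling (2.0) exceeds the mismatch (about 0.63 rad/s): the CPG locks to the drive.
print(mean_cpg_frequency(omega_cpg, drive_near, coupling=2.0) / (2 * math.pi))
# Coupling (1.0) is below the mismatch (about 3.14 rad/s): no lock; the CPG drifts.
print(mean_cpg_frequency(omega_cpg, drive_far, coupling=1.0) / (2 * math.pi))
```

The oscillator locks onto the drive whenever the coupling strength exceeds the frequency mismatch, whatever the drive happens to depict. In this toy model, entrainment responds only to rhythm, so any periodic input near the CPG's own frequency would do; it supplies none of the gesture-specific matching between visual input and proprioceptive feedback that the correspondence problem requires.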
We are not suggesting that the aerodigestive theory offers a better explanation of neonatal imitation, of course, because it is not a theory of imitation. But it meshes nicely with other areas of research that can explain what we observe in these experiments: that is, why infants orient towards the face of the model, watch intently as the model poses, and then produce general movements in response to the model's subsequent neutral face. Perhaps the infant orients towards the model's face as a result of motion, novelty, or the orientation biases of visual cells in SC (Johnson et al. 1991; 2015). While the gesture is being demonstrated, a rudimentary form of turn-taking in the neonate suppresses general movements as a class (Dominguez et al. 2016). And when the model switches from TP/R to a neutral expression – or what amounts to a still face for the infant – the inhibition of aerodigestive CPGs ceases, and the most frequent stereotypies, as a function of age, are released. This is the kind of explanation that dovetails with models of early learning for gaze-following, emotional expression, facial recognition (of the mother's face), and categorical perception (seeing faces as a special kind of object).
7.3. How to explain the neonatal imitation experimental data?
Let's put aside questions of mechanism and talk about why neonatal imitation is a good explanation of the experimental results. Proponents of neonatal imitation have long argued that it fosters parental attachment, which is of vital importance to infant survival. As we come to know more about the social and cognitive development of infants, it seems clear that social interaction between the infant and caregiver is an essential factor in early motor, sensory, and cognitive development (Althaus & Plunkett 2015; Arditi et al. 2006; Ham & Tronick 2006; Lavelli & Fogel 2002; Messinger & Fogel 2007; Serrano et al. 1992). By itself, the aerodigestive theory does not explain the neonatal imitation experimental data. It explains only why neonates would produce aerodigestive behaviours.
7.3.1. Answer
Arousal theorists have often argued that the appearance of neonatal imitation is a general artifact of arousal (Anisfeld 1991; 1996; 2005; Jones 1996; 2006a; 2006b). Neonates orient towards salient visual properties and, once oriented, are aroused by this stimulation; once aroused, they increase the rate of some spontaneous movements. Human faces at close range – be it a face with a protruding tongue, or even a “still face” – are among these salient properties. We believe the arousal theorist must be right: We see increased orofacial stereotypies directly after birth and in the presence of other arousing stimuli such as human faces, music, moving inanimate objects, and so on. What the arousal theory has lacked, however, is an explanation of why neonatal arousal expresses itself in just this way, at precisely this time in development. Here we have the beginnings of an answer. At birth, the neurochemistry of the event creates unprecedented levels of arousal, which ensures a safe transition from an aquatic existence to land-based respiration and suckling (recall the survival value of suckling within the first hour after birth). This explains why the rate of orofacial “gestures” is greatest in the few moments after birth even without human interaction. When newly born infants are shown human faces, the visual biases inherent in (most likely) the superior layers of SC produce greater levels of transient arousal, which in turn causes ever more orofacial stereotypies. This same pattern of arousal and of transient orofacial gestures continues until the infant has mastered the mechanics of suckling and respiration – and until these rhythmic movements have produced the requisite changes in S1 and M1 functionality. In the weeks and months following birth, the infant broadens her typical response to arousal (Prechtl 1993).
Orofacial behaviours fade as the other stereotypies (from among the 47 that Thelen observed) become dominant. Glee (or rage!) can now be expressed by more frequent “variations of kicking, rocking, waving, bouncing, scratching, banging, rubbing, thrusting, swaying, and twisting” (Thelen 1981b, p. 239). All of these stereotypies are likely to aid sensorimotor development of the spinal cord, brainstem, and cortex. But in the grand scheme of human sensorimotor development, it is subcortical aerodigestion first, all of the rest sometime later.
We have not explained, so far, the differential responses of neonates to specific gestures: For example, why do neonates show more TP/R than MO/C after watching an adult model TP/R? One thing we can say, here, is that we know very little about arousal, and the development of arousal, in the neonate. One naïve tendency – to which both authors unwittingly succumbed – is to imagine that sleep/arousal patterns in adults are a good model for the infant. Because the fetus is clearly more active at certain times than at others in utero, we imagine that the fetus is therefore either asleep or awake, no matter how early in gestation. But as with most other systems in the neonate, the mechanisms underlying sleep/arousal are not yet mature (Nijhuis et al. 1982). Nor is arousal controlled by a single mechanism, an on-off toggle switch between sleep and wakefulness. Arousal is affected differentially by both exogenous stimuli and endogenous mechanisms (Wass & Smith 2014) and by interaction with both circadian and ultradian cycles (Blum et al. 2014; Blumberg et al. 2014; Mohawk et al. 2012). In other words, we may now know, in the broadest strokes, why we should be dubious about the results of neonatal imitation experiments. But without understanding the mechanisms of infant arousal, how they develop, or the developmental relationships between attention, emotion, and arousal, we are missing the fine brushstrokes required. Without this knowledge, it is impossible to control for confounding factors in neonatal imitation experiments. Apparent neonatal alertness, fussiness, and crying – even vagal tone – are only gross measures of arousal, a central factor in NI experiments.
So we know arousal is relevant to what we see in these experiments, but we do not yet understand how stimuli (such as TP/R, still face, the voice of the model, or the absence of the mother) affect, or fail to affect, more subtle measures of infant arousal. Is tongue protrusion more interesting than mouth opening? Is a still face more unnerving to the neonate than an open mouth? Or is a still face unnerving only when it follows a period of normal interaction? Presumably animal models will help us determine how social stimuli of particular kinds interact with the internal states of neonates, both nonhuman mammalian and human.
We should also point out that despite the obvious plausibility of social explanations of neonatal imitation, the evidence for the social hypothesis in this particular case is quite weak. Many other mechanisms that promote human maternal/parental attachment are simple and effective: skin-to-skin contact (Bigelow & Power 2012; Feldman & Eidelman 2003), breast-feeding (Kim et al. 2011), increased oxytocin levels during pregnancy and after birth (Feldman et al. 2007; Levine et al. 2007), olfactory cues (Fleming et al. 1999; Marlier et al. 1998; Schaal 2009; Varendi & Porter 2001), maternal voice (Ockleford et al. 1988), and the co-ordination of maternal-infant heart rhythms (Feldman et al. 2011). Most of these mechanisms are triggered in the course of normal infant care and can be explained in terms of regulatory/physiological mechanisms present at birth. Given the importance of attachment, it seems likely that further mechanisms of attachment will be discovered. The more mechanisms of attachment we discover, however, the weaker the evolutionary argument that imitation is necessary for survival. In contrast, a competent neonatal aerodigestive system requires specific kinds of neonatal aerodigestive sequences, each comprising multiple stereotypies. Assuming that aerodigestive development occurs via activity-dependent processes, then, stereotypies such as TP/R and MO/C are a necessary part of human development.
8. Conclusion
In our view, a critical step in resolving questions about the development of complex psychological processes will be to examine them from different levels of explanation. The combination of advances in motor development and detailed neurophysiological studies of both humans and nonhuman animals could provide developmental psychology with a more biologically plausible view of infant development.
Understanding developmental processes requires going beyond the dichotomies of nature and nurture, innate and acquired, and focusing instead on the broader biological principles that govern and constrain development. For example, developmental psychologists' interest in intermodal perception has generated a number of findings about the discrimination and cross-modal transfer abilities of young infants (Bahrick 1987; 1992; Bushnell 1982; Gibson & Spelke 1983; Gibson & Walker 1984; Lewkowicz 1986; 1992; Meltzoff & Borton 1979; Streri 1993; Streri & Molina 1994; Streri & Pêcheux 1986). However, this interest has not sparked any corresponding interest in (a) the various contributions of prior prenatal and postnatal experience, (b) the various constraints arising from different developmental trajectories of sensory and motor systems, or (c) the specific processes and mechanisms whereby intermodal functioning is achieved and modified during early development (Bahrick & Lickliter 2000). Using different levels of analysis to fill the gaps between these kinds of developmental concerns could substantially inform our understanding of the complex relationship between genetic, sensory, motor, and environmental influences on infant development.
What we have tried to demonstrate, in the preceding long story, is the interconnectedness of the mechanisms of the developing system. Suckling, swallowing, or indeed any behaviour is not hardwired but rather is assembled in real time within a particular context as the product of multiple developing elements. Many factors routinely shape development, from the ordinary – such as the importance of suckling for survival – to the extraordinary – such as the size of the oral cavity and the forward position of the tongue. Developmental psychologists thus should take a broader perspective that acknowledges the complex and contingent nature of development and that seeks to integrate relevant data from developmental biology and neuroscience into a more coherent and comprehensive account of the ways infants develop. Such approaches have become increasingly prevalent in the study of motor development (Thelen et al. 2001; Thelen & Ulrich 1991), cognitive development (Bjorklund 1995; Richardson 1998), language development (Dent 1990; Zukow-Goldring 1997), personality and emotional development (Lerner 1988; Lewis & Granic 2002), and social development (Cairns et al. 1990; Fogel 1993), to cite but a few examples. This perspective has the potential to achieve a fuller and more useful understanding of development and could move developmental psychology away from extreme forms of nativism and towards a more integrated account of development.
ACKNOWLEDGMENTS
This research was funded by a James S. McDonnell Foundation Centennial Fellowship (Philosophy of Science) awarded to Kathleen A. Akins. The original version of the paper leading up to this article was co-authored with Lyle Crawford. His careful research into the methodology of neonatal experiments and reading of the original literature convinced us to dig deeper into the neurophysiological and developmental literature, to follow our noses. Thanks also to the SFU Neurophilosophy Supper Club, in particular Holly Andersen, Trey Boone, Tereza Hadravova, Rick Grush, and Simon Pollon – and at Washington University in St. Louis, to Carl Craver, John Doris, Daniel Povinelli, and fellow PNP graduate students. We are grateful for helpful comments and suggestions of our anonymous reviewers. Special thanks to Martin Hahn, who read and commented on the many versions of this paper.
Target article
Neonatal imitation in context: Sensorimotor development in the perinatal period