The authors of the target article stated, “the interaction between representation and previous experience may be key to building machines that learn as fast as people do” (sect. 4.2.3, last para.). To build such machines, we must make them function as humans do. But a human acts and learns based on his or her social-MOTOR experience. Three main lines of evidence support our claim:
First, all learning and social interaction are grounded in social motor embodiment. In the human movement sciences, abundant evidence indicates that we are all influenced by the motor behavior of the person with whom we interact (e.g., Schmidt & Richardson 2008). Motor behavior directly expresses the partner's state of mind (Marin et al. 2009). For example, if someone is shy, this state of mind is directly embodied in his or her entire posture, facial expressions, gaze, and gestures. It is in movement that we observe the state of mind of the other “interactant.” And when we respond to that shy person, we are influenced in return by that behavior. We can, of course, intentionally modify our own motor behavior (to ease the interaction), but in most cases we are unaware of the alterations of our movements. For example, when an adult walks next to a child, the two unintentionally synchronize their stride lengths (implying that both modify their locomotion to walk side by side). Another example, from mental health disorders, showed that an individual suffering from schizophrenia does not interact “motorly” in the same way as a social phobic (Varlet et al. 2014). Both pathologies present motor impairment and social withdrawal, yet their motor differences reflect the patients' states of mind: the first patient presents attentional impairment, whereas the second suffers from social inhibition. If, however, a healthy participant is engaged in a social-motor synchronization task with either patient, the two participants unintentionally adjust their movements to each other (Varlet et al. 2014).
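The dynamics of such unintentional coordination are classically described with coupled-oscillator models (the framework reviewed by Schmidt & Richardson 2008). The following is a minimal Python sketch, assuming a simple Kuramoto-style coupling with illustrative parameters (they are not values from the cited studies):

    import numpy as np

    # Two walkers modeled as phase oscillators with different preferred
    # stride frequencies (rad/s): e.g., an adult and a child.
    omega = np.array([1.0, 1.3])
    k = 0.5                      # weak, unintentional mutual coupling
    dt, steps = 0.01, 20000

    phase = np.array([0.0, np.pi / 2])
    for _ in range(steps):
        # Each agent's rhythm is pulled toward the other's (Kuramoto coupling).
        phase = phase + (omega + k * np.sin(phase[::-1] - phase)) * dt

    # After transients, the phase difference settles to a constant:
    # the two rhythms have locked without either agent "deciding" to.
    print("final phase difference:", (phase[0] - phase[1]) % (2 * np.pi))

Because the coupling acts on both agents, synchrony emerges mutually, as in the adult-child walking example above.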
This study demonstrates that unconscious communication is sustained even when the patients suffer from social interaction disorders. We can therefore state that mostly low-level processing of sensorimotor flows is involved in this process. Consequently, machines/robots should be endowed with computational models that tackle the very complex question of adapting to the human world through sensorimotor learning.
We claim that enactive approaches of this type will drastically reduce the complexity of future computational models. Such methods are indeed supported by recent advances in research on the human mirror-neuron system and by theories of motor resonance (Meltzoff 2007). In this line of thinking, computational models have been built and used to improve human-robot interaction and communication, in particular through the notion of learning by imitation (Breazeal & Scassellati 2002; Lopes & Santos-Victor 2007). Furthermore, some studies endowed machines with computational models built on an adequate action-perception loop and showed that complex social competencies, such as the immediate imitation present in early human development, can emerge from sensorimotor ambiguities, as proposed by Gaussier et al. (1998), Nagai et al. (2011), and Braud et al. (2014).
Models of this kind allow future machines to generalize their learning better and to acquire new social skills. In other recent examples, a very simple neural network providing the robot with minimal sensorimotor adaptation capabilities allowed unintentional motor coordination to emerge during an imitation game (of a simple gesture) with a human (Hasnain et al. 2012; 2013). An extension of this work demonstrated that, with the same sensorimotor approach, a robot could quickly learn more complex gestures online and synchronize its behavior with its human partner (Ansermin et al. 2016).
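To give a flavor of how such minimal sensorimotor adaptation can work, here is a hedged sketch using an adaptive Hopf oscillator, a standard entrainment mechanism; we do not claim it is the exact network of Hasnain et al., and the input signal and parameters are invented for illustration:

    import numpy as np

    # The human's gesture, reduced to a 1-D rhythmic "perception" signal
    # (e.g., optical-flow energy) oscillating at 1.2 Hz.
    f_human, dt = 1.2, 0.001
    t = np.arange(0.0, 60.0, dt)
    signal = np.sin(2 * np.pi * f_human * t)

    # Adaptive Hopf oscillator: the state (x, y) cycles at frequency omega,
    # and omega itself drifts toward the frequency of the perceived input.
    gamma, mu, eps = 10.0, 1.0, 0.9
    x, y = 1.0, 0.0
    omega = 2 * np.pi * 0.7      # the robot's own rhythm starts at 0.7 Hz

    for F in signal:
        r2 = x * x + y * y
        dx = gamma * (mu - r2) * x - omega * y + eps * F
        dy = gamma * (mu - r2) * y + omega * x
        domega = -eps * F * y / max(np.sqrt(r2), 1e-9)  # frequency adaptation
        x, y, omega = x + dx * dt, y + dy * dt, omega + domega * dt

    print("entrained frequency (Hz):", omega / (2 * np.pi))  # drifts toward 1.2

The robot never represents the human's gesture explicitly; its internal rhythm simply entrains to the perceived flow, which is the spirit of the sensorimotor approach described above.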
Second, even to learn (or understand) what a simple object is, people need to act on it (O'Regan 2011). For example, if we do not know what a “chair” is, we come to understand its representation by sitting on it and touching it. The definition is then easy: a chair is an object on which we can sit, regardless of its precise shape. If, instead, we try to define its representation before acting, describing it becomes very difficult: we must specify the general shape, the number of legs, the presence or absence of arms or wheels, the texture, and so on. When programming a machine, this latter kind of definition carries a high computational cost that drastically slows learning (and pushes the goal of learning as fast as humans further away). Machines/robots should instead be able to learn directly by acting and perceiving the consequences of their actions on the object/person.
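A toy sketch of the contrast (the objects and the sitting test are of course invented for illustration): an action-based learner needs one interaction and one observed outcome, whereas a shape-based definition must enumerate an open-ended list of structural features.

    from dataclasses import dataclass

    @dataclass
    class Obj:
        name: str
        seat_height_cm: float    # height of the first supportive surface
        bears_weight: bool       # observed outcome of the sitting action

    def affords_sitting(o: Obj) -> bool:
        # Affordance test: act on the object and observe the consequence.
        # One interaction replaces an open-ended taxonomy of shapes.
        return o.bears_weight and 20 <= o.seat_height_cm <= 70

    world = [
        Obj("kitchen chair", 45, True),
        Obj("office chair (wheels, arms)", 50, True),
        Obj("tree stump", 40, True),       # no legs, no back: still a seat
        Obj("bar table", 110, True),       # bears weight, but too high
        Obj("empty cardboard box", 40, False),
    ]

    for o in world:
        print(o.name, "->", "chair-like" if affords_sitting(o) else "not chair-like")

Note that the tree stump, which no shape-based definition built from prototypical chairs would capture, is classified correctly by the action-based test.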
Finally, at a lower level still, even shape recognition is strongly connected to our motor experience. Viviani and Stucchi (1992) showed participants a point of light tracing a perfect circle: as soon as the point slowed down at the upper and lower parts of the circle, participants no longer perceived the trajectory as a circle, but as an ellipse. This perceptual “mistake” is explained by the fact that we perceive the shape of an object according to the way we would draw it (in drawing a circle, we move at a roughly constant speed, whereas in drawing an ellipse, we slow down at the two opposite extremities). Likewise, learning handwriting (an example often cited by the authors) rests not only on visually learning the shapes of the letters, but mainly on global sensorimotor learning that couples perceiving (vision) and acting (writing, drawing). Once again, this indicates that machines/robots should understand an object, or the reaction of a person, through the way they have acted on that object/person.
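The kinematic regularity behind this illusion is usually stated as the “two-thirds power law” of human drawing movements: tangential velocity scales as curvature to the power -1/3 (equivalently, angular velocity as curvature to the power 2/3). The sketch below assumes this law (it is not the original experimental code) and builds the two stimuli: a point at constant speed on a circle, and the same circle traversed with the velocity profile of a vertically elongated ellipse, which slows at the top and bottom as in the experiment.

    import numpy as np

    n = 2000
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)

    # Stimulus A: constant speed on a unit circle -- matches how humans
    # draw circles, so it is perceived as a circle.
    circle_const = np.c_[np.cos(t), np.sin(t)]

    # Stimulus B: the same circular PATH, but with the speed profile humans
    # use when drawing a vertically elongated ellipse (a = horizontal,
    # b = vertical semi-axis). Curvature peaks at the top and bottom, and by
    # the two-thirds power law v ~ curvature**(-1/3), the point slows there.
    a, b = 0.5, 1.0
    kappa = a * b / (a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2) ** 1.5
    v = kappa ** (-1.0 / 3.0)
    s = 2 * np.pi * np.cumsum(v) / v.sum()   # phase advances with this speed
    circle_elliptic = np.c_[np.cos(s), np.sin(s)]

    # Both stimuli trace the same geometric circle; only the timing differs.
    speed = np.linalg.norm(np.diff(circle_elliptic, axis=0), axis=1)
    print("slowest near angle (deg):", np.degrees(s[np.argmin(speed)]))  # ~90 or ~270

Geometrically the two stimuli are identical; only the motor-like timing differs, and that timing alone is what flips the perceived shape.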
Therefore, to design machines that learn as fast as humans, we need to make them able to (1) learn through a perception-action paradigm, (2) perceive and react to the movements of other agents and to the objects on which they act, and (3) learn to understand what their own actions mean.
ACKNOWLEDGMENT
This work was supported by the Dynamics of Interactions, Rhythmicity, Action and Communication (DIRAC) project, funded by the Agence Nationale de la Recherche (Grant ANR 13-ASTR-0018-01).