Deep learning (DL) approaches have made great advances in artificial intelligence, but they are still far from human learning. As argued convincingly by Lake et al., differences include the human capability to learn causal models of the world from very little data, leveraging compositional representations and priors such as intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that Lake et al. discuss only superficially or not at all.
These fundamental mechanisms relate to autonomous development and learning, and they are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and they learn through offline processing of large training databases. By contrast, humans learn open-ended repertoires of skills autonomously, deciding for themselves which goals to pursue or value and which skills to explore, driven by intrinsic motivation/curiosity and by social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a learning curriculum in which skills are explored, acquired, and built on each other with a particular ordering and timing. Finally, human learning happens in the physical world, through bodily and physical experimentation, under severe constraints on energy, time, and computational resources.
In the last two decades, the field of Developmental and Cognitive Robotics (Asada et al. 2009; Cangelosi and Schlesinger 2015), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational modeling of the mechanisms of autonomous development and learning in human infants, and has applied them to difficult artificial intelligence (AI) problems. These mechanisms include the interaction between several systems that guide active exploration in large and open environments: curiosity and intrinsically motivated reinforcement learning (Barto 2013; Oudeyer et al. 2007; Schmidhuber 1991), goal exploration (Baranes and Oudeyer 2013), social learning and natural interaction (Chernova and Thomaz 2014; Vollmer et al. 2014), maturation (Oudeyer et al. 2013), and embodiment (Pfeifer et al. 2007). These mechanisms crucially complement processes of incremental online model building (Nguyen-Tuong and Peters 2011), as well as the inference and representation learning approaches discussed in the target article.
Intrinsic motivation, curiosity and free play
For example, models of how motivational systems allow children to choose which goals to pursue, or which objects or skills to practice in contexts of free play, and of how this can affect the formation of developmental structures in lifelong learning, have flourished in the last decade (Baldassarre and Mirolli 2013; Gottlieb et al. 2013). In-depth models of intrinsically motivated exploration, and of its links with curiosity, information seeking, and the "child-as-a-scientist" hypothesis (see Gottlieb et al. 2013 for a review), have generated new formal frameworks and hypotheses for understanding its structure and function. For example, intrinsically motivated exploration driven by the maximization of learning progress (i.e., maximal improvement of predictive or control models of the world; see Oudeyer et al. 2007 and Schmidhuber 1991) has been shown to self-organize long-term developmental structures in which skills are acquired in an order and with a timing that share fundamental properties with human development (Oudeyer and Smith 2016). For instance, the structure of early infant vocal development self-organizes spontaneously from such intrinsically motivated exploration, in interaction with the physical properties of the vocal system (Moulin-Frier et al. 2014). New experimental paradigms recently developed in psychology and neuroscience support these hypotheses (Baranes et al. 2014; Kidd et al. 2012).
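To make the learning-progress principle concrete, the following is a minimal Python sketch, not the specific algorithms of the cited works: the Activity class, window size, and proportional sampling rule are assumptions chosen for brevity. The agent monitors the recent decrease of prediction error within each activity (a region of sensorimotor space) and samples activities roughly in proportion to that progress.

```python
import random
from collections import deque

class Activity:
    """One region of sensorimotor space with its own predictive model (not shown)."""
    def __init__(self, name, window=20):
        self.name = name
        self.window = window
        self.errors = deque(maxlen=2 * window)  # recent prediction errors

    def record_error(self, error):
        self.errors.append(error)

    def learning_progress(self):
        # Progress = decrease of mean prediction error between the older and
        # the newer half of the recent window (0 until enough data is collected).
        if len(self.errors) < 2 * self.window:
            return 0.0
        old = list(self.errors)[: self.window]
        new = list(self.errors)[self.window:]
        return max(0.0, sum(old) / len(old) - sum(new) / len(new))

def choose_activity(activities, epsilon=0.2):
    """Sample mostly in proportion to learning progress, with some random exploration."""
    if random.random() < epsilon:
        return random.choice(activities)
    progress = [a.learning_progress() for a in activities]
    total = sum(progress)
    if total == 0.0:
        return random.choice(activities)
    r, acc = random.uniform(0, total), 0.0
    for activity, p in zip(activities, progress):
        acc += p
        if r <= acc:
            return activity
    return activities[-1]
```

Because progress, rather than raw error or novelty, drives selection, activities that are already mastered and activities that are currently unlearnable both receive little sampling, which is what produces the curriculum-like developmental ordering discussed above.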
These intrinsic motivation algorithms are also highly efficient for multitask learning in high-dimensional spaces. In robotics, they allow efficient stochastic selection of parameterized experiments and goals, enabling the incremental collection of data and learning of skill models through automatic, online curriculum learning. Such active control of the growth of complexity enables robots with high-dimensional continuous action spaces to learn omnidirectional locomotion on slippery surfaces and versatile manipulation of soft objects (Baranes and Oudeyer 2013), or hierarchical control of objects through tool use (Forestier and Oudeyer 2016). Recent work in deep reinforcement learning has incorporated some of these mechanisms to solve difficult reinforcement learning problems with rare or deceptive rewards (Bellemare et al. 2016; Kulkarni et al. 2016), as learning multiple (auxiliary) tasks in addition to the target task simplifies the problem (Jaderberg et al. 2016). However, there are many unstudied synergies between models of intrinsic motivation in developmental robotics and deep reinforcement learning systems: for example, curiosity-driven selection of parameterized problems/goals (Baranes and Oudeyer 2013) and of learning strategies (Lopes and Oudeyer 2012), as well as combinations of intrinsic motivation and social learning, for example imitation learning (Nguyen and Oudeyer 2013), have not yet been integrated with deep learning.
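As an illustration of the exploration bonuses used in the deep reinforcement learning work cited above, the sketch below adds an intrinsic reward that decays with the visitation count of a discretized state. This is a deliberately simplified stand-in: Bellemare et al. (2016) derive pseudo-counts from a learned density model rather than exact counts over a hand-chosen discretization, and the bonus coefficient and bin size used here are arbitrary assumptions.

```python
import math
from collections import defaultdict

class CountBasedBonus:
    """Augments extrinsic reward with a bonus that shrinks as a state is revisited."""
    def __init__(self, beta=0.05, bin_size=0.1):
        self.beta = beta          # scale of the intrinsic bonus (assumed value)
        self.bin_size = bin_size  # width of each discretization bin (assumed value)
        self.counts = defaultdict(int)

    def _key(self, state):
        # Discretize a continuous state vector into a hashable bin index.
        return tuple(int(x / self.bin_size) for x in state)

    def reward(self, state, extrinsic_reward):
        key = self._key(state)
        self.counts[key] += 1
        bonus = self.beta / math.sqrt(self.counts[key])
        return extrinsic_reward + bonus
```

Feeding the augmented reward to an otherwise standard reinforcement learning agent encourages visits to rarely seen states, which is what helps in the rare- or deceptive-reward settings mentioned above.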
Embodied self-organization
The key role of physical embodiment in human learning has also been extensively studied in robotics, yet it remains largely absent from current deep learning research. The physics of bodies and of their interaction with the environment can spontaneously generate structure that guides learning and exploration (Pfeifer et al. 2007). For example, mechanical legs reproducing essential properties of human leg morphology generate human-like gaits on mild slopes without any computation (Collins et al. 2005), showing the guiding role of morphology in infant learning of locomotion (Oudeyer 2016). Yamada et al. (2010) developed a series of models showing that hand-face touch behaviours in the foetus and hand looking in the infant self-organize through the interaction of a non-uniform physical distribution of proprioceptive sensors across the body with basic neural plasticity loops. Work on low-level muscle synergies has also shown how low-level sensorimotor constraints can simplify learning (Flash and Hochner 2005).
Human learning as a complex dynamical system
Deep learning architectures often focus on inference and optimization. Although these are essential, developmental science has repeatedly suggested that learning occurs through complex dynamical interactions among systems of inference, memory, attention, motivation, low-level sensorimotor loops, embodiment, and social interaction. Although some of these ingredients are already part of current DL research (e.g., attention and memory), integrating the other key ingredients of autonomous learning and development opens stimulating perspectives for scaling up to human learning.