
Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning

Published online by Cambridge University Press:  10 November 2017

Pierre-Yves Oudeyer*
Affiliation:
Inria and Ensta Paris-Tech, 33405 Talence, France. pierre-yves.oudeyer@inria.fr, http://www.pyoudeyer.com

Abstract

Autonomous lifelong development and learning are fundamental capabilities of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients of autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and the autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2017 

Deep learning (DL) approaches have made great advances in artificial intelligence, but are still far from human learning. As argued convincingly by Lake et al., the differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors such as intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that Lake et al. discuss either superficially or not at all.

These fundamental mechanisms relate to autonomous development and learning, and they are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and they learn through offline processing of large training databases. By contrast, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value and which skills to explore, driven by intrinsic motivation/curiosity and by social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a learning curriculum where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, through bodily and physical experimentation, under severe constraints on energy, time, and computational resources.

In the last two decades, the field of Developmental and Cognitive Robotics (Asada et al. 2009; Cangelosi and Schlesinger 2015), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational modeling of the mechanisms of autonomous development and learning in human infants, and has applied them to solve difficult artificial intelligence (AI) problems. These mechanisms include the interaction between several systems that guide active exploration in large and open environments: curiosity and intrinsically motivated reinforcement learning (Barto 2013; Oudeyer et al. 2007; Schmidhuber 1991), goal exploration (Baranes and Oudeyer 2013), social learning and natural interaction (Chernova and Thomaz 2014; Vollmer et al. 2014), maturation (Oudeyer et al. 2013), and embodiment (Pfeifer et al. 2007). These mechanisms crucially complement processes of incremental online model building (Nguyen-Tuong and Peters 2011), as well as the inference and representation learning approaches discussed in the target article.

Intrinsic motivation, curiosity and free play

For example, models of how motivational systems allow children to choose which goals to pursue, or which objects or skills to practice in contexts of free play, and of how this can affect the formation of developmental structures in lifelong learning, have flourished in the last decade (Baldassarre and Mirolli 2013; Gottlieb et al. 2013). In-depth models of intrinsically motivated exploration, and of its links with curiosity, information seeking, and the "child-as-a-scientist" hypothesis (see Gottlieb et al. [2013] for a review), have generated new formal frameworks and hypotheses for understanding its structure and function. For example, it was shown that intrinsically motivated exploration driven by the maximization of learning progress (i.e., maximal improvement of predictive or control models of the world; see Oudeyer et al. [2007] and Schmidhuber [1991]) can self-organize long-term developmental structures, in which skills are acquired in an order and with a timing that share fundamental properties with human development (Oudeyer and Smith 2016). For instance, the structure of early infant vocal development self-organizes spontaneously from such intrinsically motivated exploration, in interaction with the physical properties of the vocal system (Moulin-Frier et al. 2014). New experimental paradigms in psychology and neuroscience were recently developed that support these hypotheses (Baranes et al. 2014; Kidd et al. 2012).
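To make the learning-progress idea concrete, here is a minimal sketch, in the spirit of Oudeyer et al. (2007), of a learner that partitions its sensorimotor space into discrete regions and preferentially samples the region where its prediction errors have decreased the most recently. The class name, the fixed binning, and all hyperparameters are illustrative assumptions, not the published architecture.

```python
import random
from collections import deque

class LearningProgressExplorer:
    """Toy sketch of learning-progress-based intrinsic motivation:
    choose the region of sensorimotor space whose recent prediction
    errors have decreased the most (highest learning progress)."""

    def __init__(self, n_regions, window=20, epsilon=0.1):
        # Ring buffer of recent prediction errors per region.
        self.errors = [deque(maxlen=window) for _ in range(n_regions)]
        self.epsilon = epsilon  # residual random exploration

    def learning_progress(self, region):
        errs = list(self.errors[region])
        if len(errs) < 4:
            return float("inf")  # treat unexplored regions as promising
        half = len(errs) // 2
        older = sum(errs[:half]) / half
        recent = sum(errs[half:]) / (len(errs) - half)
        return older - recent    # positive when errors are shrinking

    def choose_region(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.errors))
        return max(range(len(self.errors)), key=self.learning_progress)

    def record(self, region, prediction_error):
        self.errors[region].append(prediction_error)

# Hypothetical usage: act in the chosen region, update a forward model,
# and feed its (here dummy) prediction error back to the explorer.
explorer = LearningProgressExplorer(n_regions=5)
for step_i in range(100):
    region = explorer.choose_region()
    explorer.record(region, prediction_error=1.0 / (1 + step_i))
```

Note that both already-mastered regions (flat, low error) and unlearnable regions (flat, high error) yield near-zero progress, so the learner concentrates on regions of intermediate, decreasing difficulty; this is the property that produces curriculum-like developmental orderings.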

These algorithms of intrinsic motivation are also highly efficient for multitask learning in high-dimensional spaces. In robotics, they allow efficient stochastic selection of parameterized experiments and goals, enabling the incremental collection of data and learning of skill models through automatic, online curriculum learning. Such active control of the growth of complexity has enabled robots with high-dimensional continuous action spaces to learn omnidirectional locomotion on slippery surfaces, versatile manipulation of soft objects (Baranes and Oudeyer 2013), and hierarchical control of objects through tool use (Forestier and Oudeyer 2016). Recent work in deep reinforcement learning has included some of these mechanisms to solve difficult reinforcement learning problems with rare or deceptive rewards (Bellemare et al. 2016; Kulkarni et al. 2016), as learning multiple (auxiliary) tasks in addition to the target task simplifies the problem (Jaderberg et al. 2016). However, there are many unstudied synergies between models of intrinsic motivation in developmental robotics and deep reinforcement learning systems: for example, curiosity-driven selection of parameterized problems/goals (Baranes and Oudeyer 2013) and of learning strategies (Lopes and Oudeyer 2012), and combinations of intrinsic motivation with social learning, for example imitation learning (Nguyen and Oudeyer 2013), have not yet been integrated with deep learning.
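As a minimal illustration of how such intrinsic rewards can plug into reinforcement learning, the sketch below adds a count-based novelty bonus, in the spirit of Bellemare et al. (2016), to tabular Q-learning on a toy chain environment with a single sparse extrinsic reward. The environment, the 1/sqrt(N(s)) bonus form, and all hyperparameters are assumptions for illustration only.

```python
import math
import random
from collections import defaultdict

# Chain MDP: states 0..N-1; the only extrinsic reward is at the far end,
# so a purely greedy learner almost never sees any reward signal.
N, BETA, GAMMA, ALPHA = 20, 0.5, 0.95, 0.1

def step(state, action):
    """Move left (0) or right (1); extrinsic reward only in the last state."""
    nxt = max(0, min(N - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N - 1 else 0.0)

q = defaultdict(float)      # Q-values keyed by (state, action)
visits = defaultdict(int)   # visitation counts N(s)

for episode in range(300):
    s = 0
    for t in range(60):
        a = random.choice((0, 1)) if random.random() < 0.1 \
            else max((0, 1), key=lambda act: q[(s, act)])
        s2, r_ext = step(s, a)
        visits[s2] += 1
        r_int = BETA / math.sqrt(visits[s2])  # novelty bonus decays with visits
        target = r_ext + r_int + GAMMA * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += ALPHA * (target - q[(s, a)])
        s = s2
```

The bonus is large in rarely visited states and decays as visit counts grow, so exploration dominates early and the extrinsic task takes over once the environment has been covered.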

Embodied self-organization

The key role of physical embodiment in human learning has also been extensively studied in robotics, yet it remains out of the picture in current deep learning research. The physics of bodies, and of their interaction with the environment, can spontaneously generate structure that guides learning and exploration (Pfeifer et al. 2007). For example, mechanical legs reproducing essential properties of human leg morphology generate human-like gaits on mild slopes without any computation (Collins et al. 2005), illustrating the guiding role of morphology in infant learning of locomotion (Oudeyer 2016). Yamada et al. (2010) developed a series of models showing that hand-face touch behaviours in the foetus, and hand looking in the infant, self-organize through the interaction of a non-uniform physical distribution of proprioceptive sensors across the body with basic neural plasticity loops. Work on low-level muscle synergies has also shown how low-level sensorimotor constraints can simplify learning (Flash and Hochner 2005).

Human learning as a complex dynamical system

Deep learning architectures often focus on inference and optimization. Although these are essential, the developmental sciences have repeatedly suggested that learning occurs through complex dynamical interactions among systems of inference, memory, attention, motivation, low-level sensorimotor loops, embodiment, and social interaction. Although some of these ingredients (e.g., attention and memory) are already part of current DL research, integrating the other key ingredients of autonomous learning and development opens stimulating perspectives for scaling up to human-like learning.

References

Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y. & Yoshida, C. (2009) Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development 1(1):12–34.
Baldassarre, G. & Mirolli, M., eds. (2013) Intrinsically motivated learning in natural and artificial systems. Springer.
Baranes, A. & Oudeyer, P.-Y. (2013) Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems 61(1):49–73.
Baranes, A. F., Oudeyer, P.-Y. & Gottlieb, J. (2014) The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration. Frontiers in Neuroscience 8:1–9.
Barto, A. (2013) Intrinsic motivation and reinforcement learning. In: Intrinsically motivated learning in natural and artificial systems, ed. Baldassarre, G. & Mirolli, M., pp. 17–47. Springer.
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D. & Munos, R. (2016) Unifying count-based exploration and intrinsic motivation. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in neural information processing systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 1471–79. Neural Information Processing Systems.
Cangelosi, A. & Schlesinger, M. (2015) Developmental robotics: From babies to robots. MIT Press.
Chernova, S. & Thomaz, A. L. (2014) Robot learning from human teachers. Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool.
Collins, S., Ruina, A., Tedrake, R. & Wisse, M. (2005) Efficient bipedal robots based on passive-dynamic walkers. Science 307(5712):1082–85.
Flash, T. & Hochner, B. (2005) Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology 15(6):660–66.
Forestier, S. & Oudeyer, P.-Y. (2016) Curiosity-driven development of tool use precursors: A computational model. In: Proceedings of the 38th Annual Conference of the Cognitive Science Society, Philadelphia, PA, ed. Papafragou, A., Grodner, D., Mirman, D. & Trueswell, J. C., pp. 1859–64. Cognitive Science Society.
Gottlieb, J., Oudeyer, P.-Y., Lopes, M. & Baranes, A. (2013) Information seeking, curiosity and attention: Computational and neural mechanisms. Trends in Cognitive Sciences 17(11):585–96.
Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D. & Kavukcuoglu, K. (2016) Reinforcement learning with unsupervised auxiliary tasks. Presented at the 5th International Conference on Learning Representations, Palais des Congrès Neptune, Toulon, France, April 24–26, 2017. arXiv preprint 1611.05397. Available at: https://arxiv.org/abs/1611.05397.
Kidd, C., Piantadosi, S. T. & Aslin, R. N. (2012) The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS One 7(5):e36399.
Kulkarni, T. D., Narasimhan, K. R., Saeedi, A. & Tenenbaum, J. B. (2016) Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. arXiv preprint 1604.06057. Available at: https://arxiv.org/abs/1604.06057.
Lopes, M. & Oudeyer, P.-Y. (2012) The strategic student approach for life-long exploration and learning. In: IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), San Diego, CA, November 7–9, 2012, pp. 1–8. IEEE.
Moulin-Frier, C., Nguyen, M. & Oudeyer, P.-Y. (2014) Self-organization of early vocal development in infants and machines: The role of intrinsic motivation. Frontiers in Psychology 4:1006. Available at: http://dx.doi.org/10.3389/fpsyg.2013.01006.
Nguyen, M. & Oudeyer, P.-Y. (2013) Active choice of teachers, learning strategies and goals for a socially guided intrinsic motivation learner. Paladyn Journal of Behavioural Robotics 3(3):136–46.
Nguyen-Tuong, D. & Peters, J. (2011) Model learning for robot control: A survey. Cognitive Processing 12(4):319–40.
Oudeyer, P.-Y. (2016) What do we learn about development from baby robots? WIREs Cognitive Science 8(1–2):e1395. Available at: http://www.pyoudeyer.com/oudeyerWiley16.pdf. doi: 10.1002/wcs.1395.
Oudeyer, P.-Y., Baranes, A. & Kaplan, F. (2013) Intrinsically motivated learning of real-world sensorimotor skills with developmental constraints. In: Intrinsically motivated learning in natural and artificial systems, ed. Baldassarre, G. & Mirolli, M., pp. 303–65. Springer.
Oudeyer, P.-Y., Kaplan, F. & Hafner, V. (2007) Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation 11(2):265–86.
Oudeyer, P.-Y. & Smith, L. (2016) How evolution may work through curiosity-driven developmental process. Topics in Cognitive Science 8(2):492–502.
Pfeifer, R., Lungarella, M. & Iida, F. (2007) Self-organization, embodiment, and biologically inspired robotics. Science 318(5853):1088–93.
Schmidhuber, J. (1991) Curious model-building control systems. Proceedings of the IEEE International Joint Conference on Neural Networks 2:1458–63.
Vollmer, A.-L., Mühlig, M., Steil, J. J., Pitsch, K., Fritsch, J., Rohlfing, K. & Wrede, B. (2014) Robots show us how to teach them: Feedback from robots shapes tutoring behavior during action learning. PLoS One 9(3):e91349.
Yamada, Y., Mori, H. & Kuniyoshi, Y. (2010) A fetus and infant developmental scenario: Self-organization of goal-directed behaviors based on sensory constraints. In: Proceedings of the 10th International Conference on Epigenetic Robotics, Örenäs Slott, Sweden, pp. 145–52. Lund University Cognitive Studies.