
Building machines that learn and think for themselves

Published online by Cambridge University Press: 10 November 2017

Matthew Botvinick, David G. T. Barrett, Peter Battaglia, Nando de Freitas, Darshan Kumaran, Joel Z. Leibo, Timothy Lillicrap, Joseph Modayil, Shakir Mohamed, Neil C. Rabinowitz, Danilo J. Rezende, Adam Santoro, Tom Schaul, Christopher Summerfield, Greg Wayne, Theophane Weber, Daan Wierstra, Shane Legg and Demis Hassabis

Affiliation: DeepMind, Kings Cross, London N1C 4AG, United Kingdom. http://www.deepmind.com

Abstract

We agree with Lake and colleagues on their list of “key ingredients” for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2017 

Lake et al. identify some extremely important desiderata for human-like intelligence. We agree with many of their central assertions: Human-like learning and decision making surely do depend upon rich internal models; the learning process must be informed and constrained by prior knowledge, whether this is part of the agent's initial endowment or acquired through learning; and naturally, prior knowledge will offer the greatest leverage when it reflects the most pervasive or ubiquitous structures in the environment, including physical laws, the mental states of others, and more abstract regularities such as compositionality and causality. Together, these points comprise a powerful set of target goals for AI research. However, while we concur on these goals, we choose a differently calibrated strategy for accomplishing them. In particular, we favor an approach that prioritizes autonomy, empowering artificial agents to learn their own internal models and how to use them, mitigating their reliance on detailed configuration by a human engineer.

Lake et al. characterize their position as “agnostic with regards to the origins of the key ingredients” (sect. 4, para. 2) of human-like intelligence. This agnosticism implicitly licenses a modeling approach in which detailed, domain-specific information can be imparted to an agent directly, an approach for which some of the authors' Bayesian Program Learning (BPL) work is emblematic. The two domains Lake and colleagues focus most upon – physics and theory of mind – are amenable to such an approach, in that these happen to be fields for which mature scientific disciplines exist. This provides unusually rich support for hand design of cognitive models. However, it is not clear that such hand design will be feasible in other, more idiosyncratic domains where comparable scaffolding is unavailable. Lake et al. (2015a) were able to extend the approach to Omniglot characters by intuiting a suitable (stroke-based) model, but are we in a position to build comparably detailed domain models for such things as human dialogue and architecture? What about Japanese cuisine or ice skating? Even video-game play appears daunting, when one takes into account the vast amount of semantic knowledge that is plausibly relevant (knowledge about igloos, ice floes, cold water, polar bears, video-game levels, avatars, lives, points, and so forth). In short, it is not clear that detailed knowledge engineering will be realistically attainable in all areas we will want our agents to tackle.

Given this observation, it would appear most promising to focus our efforts on developing learning systems that can be flexibly applied across a wide range of domains, without an unattainable overhead in terms of a priori knowledge. Encouraging this view, the recent machine learning literature offers many examples of learning systems conquering tasks that had long eluded more hand-crafted approaches, including object recognition, speech recognition, speech generation, language translation, and (significantly) game play (Silver et al. 2016). In many cases, such successes have depended on large amounts of training data, and have implemented an essentially model-free approach. However, a growing volume of work suggests that flexible, domain-general learning can also be successful on tasks where training data are scarcer and where model-based inference is important.

For example, Rezende and colleagues (2016) reported a deep generative model that produces plausible novel instances of Omniglot characters after one presentation of a model character, going a significant distance toward answering Lake's “Character Challenge.” Lake et al. call attention to this model's “need for extensive pre-training.” However, it is not clear why their pre-installed model is to be preferred over knowledge acquired through pre-training. In weighing this point, it is important to note that the human modeler, to furnish the BPL architecture with its “start-up software,” must draw on his or her own large volume of prior experience. In this sense, the resulting BPL model is dependent on the human designer's own “pre-training.”

A more significant aspect of the Rezende model is that it can be applied without change to very different domains, as Rezende and colleagues (2016) demonstrate through experiments on human facial images. This flexibility is one hallmark of an autonomous learning system, and contrasts with the more purpose-built flavor of the BPL approach, which relies on irreducible primitives with domain-specific content (e.g., the strokes in Lake's Omniglot model). Furthermore, a range of recent work with deep generative models (e.g., van den Oord et al. 2016; Ranzato et al. 2016) indicates that they can identify quite rich structure, increasingly avoiding silly mistakes like those highlighted in Lake et al.'s Figure 6.

Importantly, a learning-centered approach does not prevent us from endowing learning systems with some forms of a priori knowledge. Indeed, the current resurgence in neural network research was triggered largely by work that does just this, for example, by building an assumption of translational invariance into the weight matrix of image classification networks (Krizhevsky et al. 2012). The same strategy can be taken to endow learning systems with assumptions about compositional and causal structure, yielding architectures that learn efficiently about the dynamics of physical systems, and even generalize to previously unseen numbers of objects (Battaglia et al. 2016), another challenge problem highlighted by Lake et al. In such cases, however, the inbuilt knowledge takes a highly generic form, leaving wide scope for learning to absorb domain-specific structure (see also Eslami et al. 2016; Raposo et al. 2017; Reed and de Freitas 2016).
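
To make the generic character of such inbuilt knowledge concrete, here is a minimal numpy sketch of our own (not code from any of the cited papers) showing how weight sharing builds translational structure into a layer a priori: shifting the input shifts the feature map, while the kernel values themselves remain free parameters for learning.

```python
import numpy as np

def conv1d_valid(x, w):
    """1-D 'valid' cross-correlation with a single kernel w shared across positions."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=32)   # a 1-D "image"
w = rng.normal(size=5)    # one kernel, reused at every location

shift = 3
y = conv1d_valid(x, w)
y_shifted = conv1d_valid(np.roll(x, shift), w)

# Away from the boundary, shifting the input shifts the feature map: the
# architecture commits only to translational structure, not to any
# domain-specific content of the features themselves.
assert np.allclose(y[: len(y) - shift], y_shifted[shift:])
```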

Under the approach we advocate, high-level prior knowledge and learning biases can be installed not only at the level of representational structure, but also through larger-scale architectural and algorithmic factors, such as attentional filtering (Eslami et al. 2016), intrinsic motivation mechanisms (Bellemare et al. 2016), and episodic learning (Blundell et al. 2016). Recently developed architectures for memory storage (e.g., Graves et al. 2016) offer a critical example. Lake et al. describe neural networks as implementing “learning as a process of gradual adjustment of connection strengths.” However, recent work has introduced a number of architectures within which learning depends on rapid storage mechanisms, independent of connection-weight changes (Duan et al. 2016; Graves et al. 2016; Vinyals et al. 2016; Wang et al. 2017). Indeed, such mechanisms have even been applied to one-shot classification of Omniglot characters (Santoro et al. 2016) and Atari video game play (Blundell et al. 2016). Furthermore, the connection-weight changes that do occur in such models can serve in part to support learning-to-learn (Duan et al. 2016; Graves et al. 2016; Ravi and Larochelle 2017; Vinyals et al. 2016; Wang et al. 2017), another of Lake et al.'s key ingredients for human-like intelligence. As recent work has shown (Andrychowicz et al. 2016; Denil et al. 2016; Duan et al. 2016; Hochreiter et al. 2001; Santoro et al. 2016; Wang et al. 2017), this learning-to-learn mechanism can allow agents to adapt rapidly to new problems, providing a novel route to install prior knowledge through learning, rather than by hand. Learning to learn allows a neural network agent to be trained slowly, over many tasks; the product, however, is a network that is good at learning rapidly from few examples, whatever those examples might be. So, although the meta-learning process might be slow, the resulting agent can harness a few data points to carry out numerous tasks, including imitation, inference, task specialization, and prediction.
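
To illustrate learning by rapid storage rather than by gradual weight adjustment, the following is a minimal sketch of ours in the spirit of episodic and memory-augmented approaches (cf. Blundell et al. 2016; Santoro et al. 2016), not a reproduction of any of those architectures; the fixed random encoder and the class labels are placeholder assumptions standing in for a learned embedding and real data.

```python
import numpy as np

class EpisodicMemory:
    """Key-value store: 'learning' happens by writing entries, not by weight updates."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(key / np.linalg.norm(key))
        self.values.append(value)

    def read(self, query):
        q = query / np.linalg.norm(query)
        sims = np.array(self.keys) @ q            # cosine similarities
        return self.values[int(np.argmax(sims))]  # nearest stored entry

rng = np.random.default_rng(1)
embed = rng.normal(size=(16, 16))  # stand-in for a learned encoder

def encode(x):
    return np.tanh(embed @ x)

# One-shot "training": a single example per class is written to memory.
memory = EpisodicMemory()
prototypes = {label: rng.normal(size=16) for label in ("A", "B", "C")}
for label, proto in prototypes.items():
    memory.write(encode(proto), label)

# Classifying a noisy variant requires no gradient step at all.
query = prototypes["B"] + 0.1 * rng.normal(size=16)
print(memory.read(encode(query)))  # expected: 'B'
```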

Another reason why we believe it may be advantageous to autonomously learn internal models is that such models can be shaped directly by specific, concrete tasks. A model is valuable not because it veridically captures some ground truth, but because it can be efficiently leveraged to support adaptive behavior. Just as Newtonian mechanics is sufficient for explaining many everyday phenomena, yet too crude to be useful to particle physicists and cosmologists, an agent's models should be calibrated to its tasks. This is essential for models to scale to real-world complexity, because it is usually too expensive, or even impossible, for a system to acquire and work with extremely fine-grained models of the world (Botvinick & Weinstein 2015; Silver et al. 2017). Of course, a good model of the world should be applicable across a range of task conditions, even ones that have not been previously encountered. However, this simply implies that models should be calibrated not only to individual tasks, but also to the distribution of tasks – inferred through experience or evolution – that is likely to arise in practice.

Finally, in addition to the importance of model building, it is important to recognize that real autonomy also depends on control functions, the processes that leverage models to make actual decisions. An autonomous agent needs good models, but it also needs to know how to make use of them (Botvinick & Cohen 2014), especially in settings where task goals may vary over time. This point also favors a learning and agent-based approach, because it allows control structures to co-evolve with internal models, maximizing their compatibility. Though efforts to capitalize on these advantages in practice are only in their infancy, recent work from Hamrick and colleagues (2017), which simultaneously trained an internal model and a corresponding set of control functions, provides a case study of how this might work.
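
As a toy illustration of this co-dependence (ours, and much simpler than Hamrick and colleagues' metacontrol architecture), consider an agent that fits a dynamics model online from its own experience while, in the same loop, a planner uses that model to choose actions. The linear model, random-shooting planner, and quadratic cost are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def env_step(s, a):
    """True dynamics, unknown to the agent: a noisy 1-D point mass."""
    return 0.9 * s + 0.5 * a + 0.01 * rng.normal()

# Learned model: s' ~ theta[0]*s + theta[1]*a, refit online by least squares.
X, Y = [], []
theta = np.zeros(2)

def plan(s, horizon=5, n_candidates=64):
    """Random-shooting control: score action sequences under the learned model."""
    best_a, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        sim, cost = s, 0.0
        for a in actions:
            sim = theta[0] * sim + theta[1] * a  # imagine with the model
            cost += sim ** 2                     # goal: drive the state to zero
        if cost < best_cost:
            best_a, best_cost = actions[0], cost
    return best_a

s = 5.0
for t in range(50):
    a = plan(s)                         # the controller leverages the model...
    s_next = env_step(s, a)
    X.append([s, a]); Y.append(s_next)  # ...and the model learns from the resulting data
    theta, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
    s = s_next

print(f"final |state| = {abs(s):.3f}")  # shrinks as the model improves
```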

Our comments here, like the target article, have focused on model-based cognition. However, an aside on model-free methods is warranted. Lake et al. describe model-free methods as providing peripheral support for model-based approaches. However, there is abundant evidence that model-free mechanisms play a pervasive role in human learning and decision making (Kahneman 2011). Furthermore, the dramatic recent successes of model-free learning in areas such as game play, navigation, and robotics suggest that it may constitute a first-class, independently valuable approach for machine learning. Lake et al. call attention to the heavy data demands of model-free learning, as reflected in DQN learning curves. However, even since the initial report on DQN (Mnih et al. 2015), techniques have been developed that significantly reduce the data requirements of this and related model-free learning methods, including prioritized memory replay (Schaul et al. 2016), improved exploration methods (Bellemare et al. 2016), and techniques for episodic reinforcement learning (Blundell et al. 2016). Given the pace of such advances, it may be premature to relegate model-free methods to a merely supporting role.
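
As a concrete example of one such technique, here is a minimal sketch of proportional prioritized replay in the spirit of Schaul et al. (2016); the class name, the random stand-in for TD errors, and the omission of importance-sampling corrections are simplifications of ours.

```python
import numpy as np

class PrioritizedReplay:
    """Transitions are sampled with probability p_i**alpha / sum_j p_j**alpha,
    where p_i tracks the magnitude of the TD error, so surprising experiences
    are replayed more often and the available data are used more efficiently."""

    def __init__(self, alpha=0.6):
        self.alpha = alpha
        self.transitions, self.priorities = [], []

    def add(self, transition, td_error):
        self.transitions.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size, rng):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx], idx

    def update(self, idx, td_errors):
        # Refresh priorities with the TD errors from the latest learning step.
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + 1e-6) ** self.alpha

rng = np.random.default_rng(3)
buffer = PrioritizedReplay()
for _ in range(100):
    buffer.add(transition=("s", "a", "r", "s_next"), td_error=rng.normal())
batch, idx = buffer.sample(batch_size=8, rng=rng)  # biased toward high-error transitions
```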

To conclude, despite the differences we have focused on here, we agree strongly with Lake et al. that human-like intelligence depends at least in part on richly structured internal models. Our approach to building human-like intelligence can be summarized as a commitment to developing autonomous agents: agents that shoulder the burden of building their own models and arriving at their own procedures for leveraging them. Autonomy, in this sense, confers a capacity to build economical, task-sensitive internal models, and to adapt flexibly to diverse circumstances, while avoiding a dependence on detailed, domain-specific prior information. A key challenge in pursuing greater autonomy is the need to find more efficient means of extracting knowledge from potentially limited data. But recent work on memory, exploration, compositional representation, and processing architectures provides grounds for optimism. In fairness, the authors of the target article have also offered, in other work, some indication of how their approach might be elaborated to support greater agent autonomy (Lake et al. 2016). We may therefore be following slowly converging paths. On a final note, it is worth pointing out that as our agents gain in autonomy, the opportunity increasingly arises for us to obtain new insights from what they themselves discover. In this way, the pursuit of agent autonomy carries the potential to transform the current AI landscape, revealing new paths toward human-like intelligence.

References

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., Shillingford, B. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3981–89. Neural Information Processing Systems Foundation.
Battaglia, P., Pascanu, R., Lai, M. & Rezende, D. J. (2016) Interaction networks for learning about objects, relations and physics. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 4502–10. Neural Information Processing Systems Foundation.
Bellemare, M., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D. & Munos, R. (2016) Unifying count-based exploration and intrinsic motivation. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 1471–79. Neural Information Processing Systems Foundation.
Blundell, C., Uria, B., Pritzel, A., Li, Y., Ruderman, A., Leibo, J. Z., Rae, J., Wierstra, D. & Hassabis, D. (2016) Model-free episodic control. arXiv preprint 1606.04460. Available at: https://arxiv.org/abs/1606.04460.
Botvinick, M. M. & Cohen, J. D. (2014) The computational and neural basis of cognitive control: Charted territory and new frontiers. Cognitive Science 38:1249–85.
Botvinick, M., Weinstein, A., Solway, A. & Barto, A. (2015) Reinforcement learning, efficient coding, and the statistics of natural tasks. Current Opinion in Behavioral Sciences 5:71–77.
Denil, M., Agrawal, P., Kulkarni, T. D., Erez, T., Battaglia, P. & de Freitas, N. (2016) Learning to perform physics experiments via deep reinforcement learning. arXiv preprint 1611.01843. Available at: https://arxiv.org/abs/1611.01843.
Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I. & Abbeel, P. (2016) RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint 1611.02779. Available at: https://arxiv.org/abs/1611.02779.
Eslami, S. M., Heess, N., Weber, T., Tassa, Y., Kavukcuoglu, K. & Hinton, G. E. (2016) Attend, infer, repeat: Fast scene understanding with generative models. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3225–33. Neural Information Processing Systems Foundation.
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K. & Hassabis, D. (2016) Hybrid computing using a neural network with dynamic external memory. Nature 538(7626):471–76.
Hamrick, J. B., Ballard, A. J., Pascanu, R., Vinyals, O., Heess, N. & Battaglia, P. W. (2017) Metacontrol for adaptive imagination-based optimization. In: Proceedings of the 5th International Conference on Learning Representations (ICLR).
Hochreiter, S., Younger, A. S. & Conwell, P. R. (2001) Learning to learn using gradient descent. In: International Conference on Artificial Neural Networks (ICANN 2001), ed. Dorffner, G., Bischoff, H. & Hornik, K., pp. 87–94. Springer.
Kahneman, D. (2011) Thinking, fast and slow. Macmillan.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012) ImageNet classification with deep convolutional neural networks. Presented at the 25th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, December 3–6, 2012. In: Advances in Neural Information Processing Systems 25 (NIPS 2012), ed. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q., pp. 1097–105. Neural Information Processing Systems Foundation.
Lake, B. M., Lawrence, N. D. & Tenenbaum, J. B. (2016) The emergence of organizing structure in conceptual representation. arXiv preprint 1611.09384. Available at: http://arxiv.org/abs/1611.09384.
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. (2015a) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–38.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D. & Hassabis, D. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–33.
Ranzato, M., Szlam, A., Bruna, J., Mathieu, M., Collobert, R. & Chopra, S. (2016) Video (language) modeling: A baseline for generative models of natural videos. arXiv preprint 1412.6604. Available at: https://arxiv.org/abs/1412.6604.
Raposo, D., Santoro, A., Barrett, D. G. T., Pascanu, R., Lillicrap, T. & Battaglia, P. (2017) Discovering objects and their relations from entangled scene representations. Presented at the Workshop Track at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. arXiv preprint 1702.05068. Available at: https://openreview.net/pdf?id=Bk2TqVcxe.
Ravi, S. & Larochelle, H. (2017) Optimization as a model for few-shot learning. Presented at the International Conference on Learning Representations, Toulon, France, April 24–26, 2017. Available at: https://openreview.net/pdf?id=rJY0-Kcll.
Reed, S. & de Freitas, N. (2016) Neural programmer-interpreters. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. arXiv preprint 1511.06279. Available at: https://arxiv.org/abs/1511.06279.
Rezende, D. J., Mohamed, S., Danihelka, I., Gregor, K. & Wierstra, D. (2016) One-shot generalization in deep generative models. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1521–29.
Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D. & Lillicrap, T. (2016) Meta-learning with memory-augmented neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1842–50.
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. (2016) Prioritized experience replay. Presented at the 4th International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, May 2–4, 2016. arXiv preprint 1511.05952. Available at: https://arxiv.org/abs/1511.05952.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V. D., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7585):484–89.
Silver, D., van Hasselt, H., Hessel, M., Schaul, T., Guez, A., Harley, T., Dulac-Arnold, G., Reichert, D., Rabinowitz, N., Barreto, A. & Degris, T. (2017) The predictron: End-to-end learning and planning. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia.
van den Oord, A., Kalchbrenner, N. & Kavukcuoglu, K. (2016) Pixel recurrent neural networks. Presented at the 33rd International Conference on Machine Learning, New York, NY, June 19–24, 2016. Proceedings of Machine Learning Research 48:1747–56.
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. (2016) Matching networks for one shot learning. Presented at the 2016 Neural Information Processing Systems conference, Barcelona, Spain, December 5–10, 2016. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), ed. Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I. & Garnett, R., pp. 3630–38. Neural Information Processing Systems Foundation.
Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D. & Botvinick, M. (2017) Learning to reinforcement learn. Presented at the 39th Annual Meeting of the Cognitive Science Society, London, July 26–29, 2017. arXiv preprint 1611.05763. Available at: https://arxiv.org/abs/1611.05763.