The fact that "airplanes do not flap their wings" is often offered as a reason for not looking to biology for artificial intelligence (AI) insights. This is ironic because the idea that flapping is not required to fly could easily have originated from observing eagles soaring on thermals. The comic strip in Figure 1 offers a humorous take on the current debate in AI. A flight researcher who does not take inspiration from birds defines an objective function for flight and ends up creating a catapult. Clearly, a catapult is an extremely useful invention. It can propel objects through the air, and in some cases, it can even be a better alternative to flying. Just as researchers who are interested in building "real flight" would be well advised to pay close attention to the differences between catapult flight and bird flight, researchers who are interested in building "human-like intelligence" or artificial general intelligence (AGI) would be well advised to pay attention to the differences between the recent successes of deep learning and human intelligence. We believe the target article delivers on that front, and we agree with many of its conclusions.
Figure 1. A humorous take on the current debate in artificial intelligence.
Better universal algorithms or more inductive biases?
Learning and inference are instances of optimization algorithms. If we could derive a universal optimization algorithm that works well for all data, the learning and inference problems for building AGI would be solved as well. Researchers who work on assumption-free algorithms are pushing the frontier on this question.
Exploiting inductive biases and the structure of the AI problem makes learning and inference more efficient. Our brains show remarkable abilities to perform a wide variety of tasks on data that look very different. What if all of these different tasks and data have underlying similarities? Our view is that biological evolution, by trial and error, figured out a set of inductive biases that work well for learning in this world, and the human brain's efficiency and robustness derive from these biases. Lake et al. note that many researchers hope to overcome the need for inductive biases by bringing biological evolution into the fold of learning algorithms. We point out that biological evolution had the advantage of using building blocks (proteins, cells) that obeyed the physical laws of the world in which these organisms were evolving to excel. In this way, assumptions about the world were implicitly baked into the representations that evolution used. Trying to evolve intelligence without assumptions might therefore be a significantly harder problem than the one biological evolution solved. AGI has one existence proof – our brains. Biological evolution is not an existence proof for artificial universal intelligence.
At the same time, we think a research agenda for building AGI could be synergistic with the quest for better universal algorithms. Our strategy is to build systems that strongly exploit inductive biases, while keeping open the possibility that some of those assumptions can be relaxed by advances in optimization algorithms.
What kind of generative model is the brain? Neuroscience can help, not just cognitive science
Lake et al. offered several compelling arguments for using cognitive science insights. In addition to cognitive science, neuroscience data can be examined to obtain clues about what kind of generative model the brain implements and how this model differs from models being developed in the AI community.
For instance, spatial lateral connections between oriented features are prominent in the visual cortex and are known to play a role in enforcing contour continuity. However, lateral connections are largely ignored in current generative models (Lee 2015). Another example is the factorization of contours and surfaces. Evidence indicates that contours and surfaces are represented in a factored manner in the visual cortex (Zhou et al. 2000), potentially giving rise to the human ability to imagine and recognize objects with surface appearances that are not prototypical – like a blanket made of bananas or a banana made of blankets. Similarly, studies on top-down attention demonstrate the ability of the visual cortex to separate out objects even when they are highly overlapping and transparent (Cohen & Tong 2015). These are just a handful of examples from the vast repository of information on cortical representations and inference dynamics, all of which could be used to build AGI.
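To make the contour–surface factorization concrete, here is a minimal toy sketch in Python (all names, shapes, and textures are our own invention for illustration, not a model of cortex): shape and surface appearance are independent latent factors, so "imagining" a non-prototypical object is simply a novel pairing of factors.

```python
# Toy sketch of contour/surface factorization: a shape (contour) factor and a
# surface-appearance factor combine independently at render time, so any
# contour can wear any texture -- e.g., a "blanket" contour with a "banana"
# surface. Purely illustrative; not a cortical model.
import numpy as np

def blanket_mask(h=32, w=32):
    """Rectangular silhouette standing in for a 'blanket' contour."""
    m = np.zeros((h, w), dtype=bool)
    m[4:28, 2:30] = True
    return m

def banana_texture(h=32, w=32):
    """Diagonal striped pattern standing in for a 'banana' surface."""
    diag = np.arange(h)[:, None] + np.arange(w)[None, :]
    return 0.5 + 0.5 * np.sin(diag / 2.0)

def render(shape_mask, texture, background=0.0):
    """Compose the two factors: paint the texture inside the contour."""
    return np.where(shape_mask, texture, background)

# A non-prototypical object is just a new pairing of independent factors.
image = render(blanket_mask(), banana_texture())
print(image.shape)  # (32, 32)
```

A model that entangles shape and texture in a single latent code has no such cheap recombination; factorization is what makes the banana blanket imaginable without ever having seen one.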
The conundrum of “human-level performance”: Benchmarks for AGI
We emphasize the meaninglessness of "human-level performance" as it is reported in mainstream AI publications and then used as a yardstick to measure progress toward AGI. Take the case of the deep Q-network playing Breakout at a "human level" (Mnih et al. 2015). We found that even simple changes to the visual environment (as insignificant as changing the brightness) dramatically and adversely affect the performance of the algorithm, whereas humans are not affected by such perturbations at all. At this point, it should be well accepted that almost any narrowly defined task can be "solved" with brute-force data and computation, and that any use of "human-level" as a comparison should be reserved for benchmarks that adhere to the following principles: (1) learning from few examples, (2) generalizing to distributions that are different from the training set, and (3) generalizing to new queries (for generative models) and new tasks (in the case of agents interacting with an environment).
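As an illustration of principle (2), the following hedged sketch (hypothetical agent and environment interfaces, not the evaluation code behind our observation) scores an agent before and after a perceptually trivial brightness shift; a large gap between the two scores signals brittleness to a distribution shift humans would not even notice.

```python
# Sketch of a perturbation-robustness check. The policy/env interfaces here
# are assumptions made for illustration, not a real RL library's API.
import numpy as np

def brightness_shift(frame, delta=0.2):
    """A perturbation that leaves task semantics intact: shift all intensities."""
    return np.clip(frame.astype(np.float32) / 255.0 + delta, 0.0, 1.0)

def average_return(policy, env_reset, env_step, episodes=10, perturb=None):
    """Mean episode return, optionally perturbing every observation the agent sees."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env_reset(), False, 0.0
        while not done:
            if perturb is not None:
                obs = perturb(obs)
            obs, reward, done = env_step(policy(obs))  # assumed step signature
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# The gap between these two scores measures sensitivity to a distribution
# shift that leaves the task unchanged for a human player:
#   base    = average_return(policy, reset, step)
#   shifted = average_return(policy, reset, step, perturb=brightness_shift)
```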
Message passing-based algorithms for probabilistic models
Although the article makes good arguments in favor of structured probabilistic models, it is surprising that the only inference tool the authors mention is Markov chain Monte Carlo (MCMC). While MCMC has asymptotic guarantees, the speed of inference in many cortical areas is more consistent with message passing (MP)-like algorithms, which arrive at maximum a posteriori (MAP) solutions using only local computations. Despite lacking theoretical guarantees, MP is known to work well in many practical cases, and we recently showed that it can be used for learning compositional features (Lázaro-Gredilla et al. 2016). There is growing evidence for MP-like inference in cortical areas (Bastos et al. 2012; George & Hawkins 2009), and MP could offer a happy medium where inference is fast, as in neural networks, while retaining MCMC's capability for answering arbitrary queries on the model.
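For concreteness, the sketch below (a toy example of our own devising, not the algorithm of the cited papers) runs max-product message passing on a chain-structured Markov random field, where purely local computations recover the exact MAP assignment; on loopy graphs the same local scheme loses its guarantees but, as noted above, often remains effective.

```python
# Max-product message passing on a chain MRF: each message is computed from
# the neighboring node's belief alone, yet the passes jointly recover the
# global MAP assignment. Exact on chains/trees; a heuristic on loopy graphs.
import numpy as np

def chain_map(unary, pairwise):
    """Exact MAP states for a chain-structured MRF via max-product messages.

    unary:    (T, K) array of local log-potentials for T variables, K states.
    pairwise: (K, K) array of log-potentials shared by neighboring variables.
    """
    T, K = unary.shape
    msg = np.zeros((T, K))              # msg[t]: best score arriving at node t
    back = np.zeros((T, K), dtype=int)  # argmax bookkeeping for decoding
    for t in range(1, T):
        # Local computation: combine the previous node's evidence and incoming
        # message with the pairwise potential, then max out that variable.
        scores = (unary[t - 1] + msg[t - 1])[:, None] + pairwise
        msg[t] = scores.max(axis=0)
        back[t] = scores.argmax(axis=0)
    states = np.empty(T, dtype=int)
    states[-1] = np.argmax(unary[-1] + msg[-1])
    for t in range(T - 1, 0, -1):       # trace the MAP assignment backward
        states[t - 1] = back[t, states[t]]
    return states

# Smoothness prior (neighbors prefer equal states) plus noisy local evidence.
unary = np.log([[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9]])
pairwise = np.log([[0.8, 0.2], [0.2, 0.8]])
print(chain_map(unary, pairwise))  # -> [0 0 0 1]
```

Note how the smoothness potential overrides the weak local evidence at the second variable: the MAP assignment keeps it at state 0, which no purely local readout of the unary terms would produce.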