In The Society of Mind, Minsky (Reference Minsky1986) argued that the human brain resembles a complex society of diverse neural networks more than a single, large one. The current theoretical mainstream in “deep” (artificial neural network [ANN]-based) learning leans in the opposite direction: building large ANNs with many layers of hidden units, relying more on computational power than on reverse engineering of brain functioning (Bengio Reference Bengio2009). The distinctive structural feature of the human brain is its synthesis of uniformity and diversity. Although the structure and functioning of neurons are uniform across the brain and across humans, the structure and evolution of neural connections make every human subject unique. Moreover, the modes of functioning of the left and right hemispheres of the brain seem distinctively different (Gazzaniga Reference Gazzaniga2004). Unless we ask how this homogeneity of components yields such a diversity of functions, we can neither understand the computational design principles of the brain nor make sense of the variety of “constitutional arrangements” in the governance of neural interactions at various levels – “monarchic” in some cases, “democratic” or “federative” in others.
In an environment characterized by considerable stimulus variability, a biological machine that responds by combining two different principles (as embodied in its two hemispheres) has a better chance of devising solutions that can flexibly adapt to circumstances, and even anticipate singular events. The two hemispheres seem to follow two opposite criteria: an analogical-intuitive one, akin to gradient descent, and a digital-rational one, akin to vector quantization. The former aims at anticipating and understanding sudden environmental changes – the “black swans.” The latter extrapolates trends from contexts and situations currently classified as familiar. These two criteria are conceptually orthogonal and therefore, through their complex cooperation, span a very rich space of cognitive functioning. The Bayesian approach advocated by the authors to complement the current “deep” learning agenda, on the other hand, is useful only for simulating the functioning of the left hemisphere.
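The contrast between the two criteria can be made concrete with a toy sketch (entirely ours, not drawn from the commentary): gradient descent smoothly adapts a continuous parameter to minimize error, whereas vector quantization makes a hard, categorical assignment of each input to the nearest of a few discrete prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Analogical-intuitive style: gradient descent smoothly adapts a
# continuous model (here, a slope w) to minimize squared error.
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + rng.normal(0, 0.1, 100)
w = 0.0
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)   # d/dw of mean squared error
    w -= 0.1 * grad                        # small, continuous update

# Digital-rational style: vector quantization maps each input to the
# nearest of a few fixed prototypes (a hard, discrete decision).
prototypes = np.array([-1.0, 0.0, 1.0])
codes = np.argmin(np.abs(x[:, None] - prototypes[None, :]), axis=1)
```

The first learner tracks a continuous gradient; the second collapses the same data into a small discrete code book, mirroring the analogical versus digital distinction drawn above.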
The best way to capture these structural features is to imagine the brain as a society of agents (Minsky Reference Minsky1986), very heterogeneous yet communicating through their common neural base by means of shared protocols, much like the Internet. The brain, as a highly functionally bio-diverse computational ecology, may therefore extract, from a large volume of external data, limited meaningful subsets (small data sets), generate a variety of possible responses to these subsets, and learn from those very responses. This logic is antithetical to the mainstream notion of “deep learning” and to the consequent “big data” philosophy of processing large volumes of data to generate a few “static” (i.e., very domain-specific) responses – a philosophy that could, perhaps, more appropriately be called “fat” learning. This dichotomy clearly echoes the tension between model-based learning and pattern recognition highlighted by the authors of the target article. Teaching a single, large neural network to associate an output with a certain input through millions of examples of a single situation is an exercise in brute force. It would be much more effective, in our view – in terms of both prediction performance and biological plausibility – to train a whole population of “deep” ANNs, mathematically very different from one another, on the same problem and to filter their results by means of a Meta-Net (Buscema Reference Buscema1998; Buscema et al. Reference Buscema, Terzi and Tastle2010; Reference Buscema, Tastle, Terzi and Tastle2013) that ignores their specific architectures.
We can therefore sum up the main tenets of our approach as follows:
1. There is extreme diversity in the architectures, logical principles, and mathematical structures of the deployed ANNs.
2. A “parliament” is created whereby each ANN proposes its solution to each case, in view of its past track record on similar occurrences.
3. There is dynamic negotiation among the various hypotheses: The solution proposal of an ANN and its reputation re-enter as inputs for the other ANNs, until the ANN assembly reaches a consensus.
4. Another highly diverse pool of ANNs learns the whole dynamic process generated by the previous negotiation.
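The negotiation step (tenets 2 and 3) can be sketched as a toy consensus loop. This is a minimal illustration of our own, not the Meta-Net architecture of the cited references: each agent holds a scalar proposal and a reputation weight, and every round the reputation-weighted pooled view re-enters as input, pulling the proposals together until the assembly agrees.

```python
import numpy as np

def negotiate(proposals, reputations, rate=0.5, tol=1e-3, max_rounds=100):
    """Toy consensus loop: each agent's proposal drifts toward the
    reputation-weighted mean of all proposals until they agree."""
    p = np.asarray(proposals, dtype=float)
    w = np.asarray(reputations, dtype=float)
    w = w / w.sum()                        # normalize reputations
    consensus = np.dot(w, p)
    for _ in range(max_rounds):
        consensus = np.dot(w, p)           # reputation-weighted pooled view
        if np.max(np.abs(p - consensus)) < tol:
            break                          # assembly has reached consensus
        p += rate * (consensus - p)        # pooled view re-enters as input
    return consensus, p

# Three hypothetical agents with proposals 0.2, 0.9, 0.5 and
# reputations 1, 3, 2; the most reputable agent pulls hardest.
consensus, final = negotiate([0.2, 0.9, 0.5], [1.0, 3.0, 2.0])
```

Note that the weighted mean is invariant under this update rule, so the consensus value is fixed from the start; what the loop models is the agents converging on it, and that whole trajectory is exactly what the second pool of ANNs (tenet 4) would be trained on.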
Responding to a pattern with a dynamic process, rather than with a single output, is much closer to the actual functioning of the human brain than associating a single output in a very domain-specific way, however nonlinear. Associative memory is a fundamental component of human intelligence: It is a cognitive morphing that connects apparently diverse experiences, such as a lightning bolt and the fracture of a window pane. Human intelligence is a prediction engine working on hypotheses, generated from a relatively small database and constantly verified through sequential sampling: a cycle of perception, prediction, validation, and modification. Novelties, or changes in an already known environmental scene, command immediate attention. Pattern recognition, therefore, is but the first step in understanding human intelligence. The next step should be building machines that generate dynamic responses to stimuli, that is, behave as dynamic associative memories (Buscema Reference Buscema1995; Reference Buscema1998; Reference Buscema, Buscema and Tastle2013; Buscema et al. Reference Buscema, Grossi, Montanini and Street2015). The very same associative process generated by the machine, beyond interacting with itself and with external stimuli, must itself become the object of learning: This is learning-to-learn in its fullest meaning. In this way, the artificial intelligence frontier moves from pattern recognition to the recognition of pattern transformations – learning the topology used by the brain to connect environmental scenes. Analyzing the cause-effect links within these internal processes provides the basis for identifying meaningful rules of folk psychology or cognitive biases: A pound of feathers may be judged lighter than a pound of lead only in a thought process where feathers are associated with lightness. The meta-analysis of the connections generated by a mind may yield physically absurd, but psychologically consistent, associations.
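A minimal sketch of a dynamic associative memory is a textbook Hopfield-style network (our illustration, not the architectures cited above): given a corrupted cue, its response is the whole relaxation trajectory toward a stored pattern, and it is this trajectory, not just the endpoint, that a second learner could take as its object.

```python
import numpy as np

# Two stored 8-bit patterns; the Hebbian outer-product rule encodes them.
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
n = patterns.shape[1]
W = (patterns.T @ patterns) / n            # Hebbian weight matrix
np.fill_diagonal(W, 0)                     # no self-connections

def recall_trajectory(cue, steps=5):
    """Return the sequence of states visited while settling on a memory:
    the response is a dynamic process, not a single output."""
    s = cue.copy()
    trajectory = [s.copy()]
    for _ in range(steps):
        s = np.sign(W @ s)                 # synchronous update
        s[s == 0] = 1                      # break ties deterministically
        trajectory.append(s.copy())
    return trajectory

noisy = patterns[0].copy()
noisy[0] *= -1                             # corrupt one bit of pattern 0
traj = recall_trajectory(noisy)            # settles back onto pattern 0
```

The list `traj` is the machine's full associative response to the cue; feeding such trajectories to a second network is one concrete reading of "learning the process generated by the process."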
An approach based on ecologies of computational diversity and dynamic brain associations seems to us the most promising route to a model-based learning paradigm that capitalizes on our knowledge of the brain's computational potential. And this also means allowing for mental disturbances, hallucinations, or delirium. A “deep” machine that cannot reproduce a dissociated brain is just not intelligent enough, and if it merely maximizes IQ, it is, in a sense, “dumb.” A system that can also contemplate stupidity or craziness is the real challenge of the “new” artificial intelligence.