In The Society of Mind, Minsky (Reference Minsky1986) argued that the human brain resembles a complex society of diverse neural networks more than a single, large one. The current theoretical mainstream in “deep” (artificial neural network [ANN]-based) learning leans in the opposite direction: building large ANNs with many layers of hidden units, relying more on computational power than on reverse engineering of brain functioning (Bengio Reference Bengio2009). The distinctive structural feature of the human brain is its synthesis of uniformity and diversity. Although the structure and functioning of neurons are uniform across the brain and across humans, the structure and evolution of neural connections make every human subject unique. Moreover, the modes of functioning of the left and right hemispheres of the brain seem distinctively different (Gazzaniga Reference Gazzaniga2004). Unless we ask how this homogeneity of components yields such a diversity of functions, we can neither understand the computational design principles of the brain nor make sense of the variety of “constitutional arrangements” in the governance of neural interactions at various levels – “monarchic” in some cases, “democratic” or “federative” in others.
In an environment characterized by considerable stimulus variability, a biological machine that responds by combining two different principles (as embodied in its two hemispheres) has a better chance of devising solutions that can flexibly adapt to circumstances, and even anticipate singular events. The two hemispheres seem to follow two opposite criteria: an analogical-intuitive one, akin to gradient descent, and a digital-rational one, akin to vector quantization. The former aims at anticipating and understanding sudden environmental changes – the “black swans.” The latter extrapolates trends from contexts and situations currently classified as familiar. These two criteria are conceptually orthogonal and therefore, through their complex cooperation, span a very rich space of cognitive functioning. The Bayesian approach advocated by the authors to complement the current “deep” learning agenda, on the other hand, is useful only for simulating the functioning of the left hemisphere.
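The contrast between the two criteria can be made concrete with a toy sketch (entirely ours, not drawn from the commentary): gradient descent smoothly adapts a continuous parameter to minimize error, whereas vector quantization makes a hard, categorical assignment of each input to the nearest of a few discrete prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Analogical-intuitive style: gradient descent smoothly adapts a
# continuous model (here, a slope w) to minimize squared error.
x = rng.uniform(-1, 1, 100)
y = 2.0 * x + rng.normal(0, 0.1, 100)
w = 0.0
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)   # d/dw of mean squared error
    w -= 0.1 * grad                        # small, continuous update

# Digital-rational style: vector quantization maps each input to the
# nearest of a few fixed prototypes (a hard, discrete decision).
prototypes = np.array([-1.0, 0.0, 1.0])
codes = np.argmin(np.abs(x[:, None] - prototypes[None, :]), axis=1)
```

The first learner tracks a continuous gradient; the second collapses the same data into a small discrete code book, mirroring the analogical versus digital distinction drawn above.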
The best way to capture these structural features is to imagine the brain as a society of agents (Minsky Reference Minsky1986), very heterogeneous yet communicating through their common neural base by means of shared protocols, much like the Internet. The brain, as a highly functionally bio-diverse computational ecology, may therefore extract, from a large volume of external data, limited meaningful subsets (small data sets), generate a variety of possible responses to these subsets, and learn from those very responses. This logic is antithetical to the mainstream notion of “deep learning” and to the consequent “big data” philosophy of processing large volumes of data to generate a few “static” (i.e., very domain-specific) responses – a philosophy that could, perhaps, more appropriately be called “fat” learning. This dichotomy clearly echoes the tension between model-based learning and pattern recognition highlighted by the authors of the target article. Teaching a single, large neural network to associate an output with a certain input through millions of examples of a single situation is an exercise in brute force. It would be much more effective, in our view – in terms of both prediction performance and biological plausibility – to train a whole population of “deep” ANNs, mathematically very different from one another, on the same problem and to filter their results by means of a Meta-Net (Buscema Reference Buscema1998; Buscema et al. Reference Buscema, Terzi and Tastle2010; Reference Buscema, Tastle, Terzi and Tastle2013) that ignores their specific architectures.
We can therefore sum up the main tenets of our approach as follows:
1. There is extreme diversity in the architectures, logical principles, and mathematical structures of the deployed ANNs.
2. A “parliament” is created whereby each ANN proposes its solution to each case, in view of its past track record on similar occurrences.
3. There is dynamic negotiation among the various hypotheses: The solution proposal of an ANN and its reputation re-enter as inputs for the other ANNs, until the ANN assembly reaches a consensus.
4. Another highly diverse pool of ANNs learns the whole dynamic process generated by the previous negotiation.
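The negotiation step (tenets 2 and 3) can be sketched as a toy consensus loop. This is a minimal illustration of our own, not the Meta-Net architecture of the cited references: each agent holds a scalar proposal and a reputation weight, and every round the reputation-weighted pooled view re-enters as input, pulling the proposals together until the assembly agrees.

```python
import numpy as np

def negotiate(proposals, reputations, rate=0.5, tol=1e-3, max_rounds=100):
    """Toy consensus loop: each agent's proposal drifts toward the
    reputation-weighted mean of all proposals until they agree."""
    p = np.asarray(proposals, dtype=float)
    w = np.asarray(reputations, dtype=float)
    w = w / w.sum()                        # normalize reputations
    consensus = np.dot(w, p)
    for _ in range(max_rounds):
        consensus = np.dot(w, p)           # reputation-weighted pooled view
        if np.max(np.abs(p - consensus)) < tol:
            break                          # assembly has reached consensus
        p += rate * (consensus - p)        # pooled view re-enters as input
    return consensus, p

# Three hypothetical agents with proposals 0.2, 0.9, 0.5 and
# reputations 1, 3, 2; the most reputable agent pulls hardest.
consensus, final = negotiate([0.2, 0.9, 0.5], [1.0, 3.0, 2.0])
```

Note that the weighted mean is invariant under this update rule, so the consensus value is fixed from the start; what the loop models is the agents converging on it, and that whole trajectory is exactly what the second pool of ANNs (tenet 4) would be trained on.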
Responding to a pattern with a dynamic process, rather than with a single output, is much closer to the actual functioning of the human brain than associating a single output in a very domain-specific way, however nonlinear. Associative memory is a fundamental component of human intelligence: It is a cognitive morphing that connects apparently diverse experiences, such as a lightning bolt and the fracture of a window pane. Human intelligence is a prediction engine working on hypotheses, generated from a relatively small database and constantly verified through sequential sampling: a cycle of perception, prediction, validation, and modification. Novelties, or changes in an already known environmental scene, command immediate attention. Pattern recognition, therefore, is but the first step in understanding human intelligence. The next step should be building machines that generate dynamic responses to stimuli, that is, behave as dynamic associative memories (Buscema Reference Buscema1995; Reference Buscema1998; Reference Buscema, Buscema and Tastle2013; Buscema et al. Reference Buscema, Grossi, Montanini and Street2015). The very same associative process generated by the machine, beyond interacting with itself and with external stimuli, must itself become the object of learning: This is learning-to-learn in its fullest meaning. In this way, the artificial intelligence frontier moves from pattern recognition to the recognition of pattern transformations – learning the topology used by the brain to connect environmental scenes. Analyzing the cause-effect links within these internal processes provides the basis for identifying meaningful rules of folk psychology or cognitive biases: A pound of feathers may be judged lighter than a pound of lead only in a thought process where feathers are associated with lightness. The meta-analysis of the connections generated by a mind may yield physically absurd, but psychologically consistent, associations.
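A minimal sketch of a dynamic associative memory is a textbook Hopfield-style network (our illustration, not the architectures cited above): given a corrupted cue, its response is the whole relaxation trajectory toward a stored pattern, and it is this trajectory, not just the endpoint, that a second learner could take as its object.

```python
import numpy as np

# Two stored 8-bit patterns; the Hebbian outer-product rule encodes them.
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
n = patterns.shape[1]
W = (patterns.T @ patterns) / n            # Hebbian weight matrix
np.fill_diagonal(W, 0)                     # no self-connections

def recall_trajectory(cue, steps=5):
    """Return the sequence of states visited while settling on a memory:
    the response is a dynamic process, not a single output."""
    s = cue.copy()
    trajectory = [s.copy()]
    for _ in range(steps):
        s = np.sign(W @ s)                 # synchronous update
        s[s == 0] = 1                      # break ties deterministically
        trajectory.append(s.copy())
    return trajectory

noisy = patterns[0].copy()
noisy[0] *= -1                             # corrupt one bit of pattern 0
traj = recall_trajectory(noisy)            # settles back onto pattern 0
```

The list `traj` is the machine's full associative response to the cue; feeding such trajectories to a second network is one concrete reading of "learning the process generated by the process."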
An approach based on ecologies of computational diversity and dynamic brain associations seems to us the most promising route to a model-based learning paradigm that capitalizes on our knowledge of the brain's computational potential. And this also means allowing for mental disturbances, hallucinations, or delirium. A “deep” machine that cannot reproduce a dissociated brain is just not intelligent enough, and if it merely maximizes IQ, it is, in a sense, “dumb.” A system that can also contemplate stupidity or craziness is the real challenge of the “new” artificial intelligence.