The neural network revolution occurred more than 30 years ago, stirring intense debate over what neural networks (NNs) can and cannot learn and represent. Much of the target article resurrects these earlier concerns, but in the context of the latest NN revolution, spearheaded by an algorithm that was long known but had previously failed for lack of scale and computational power: deep learning (DL).
Claims that learning systems must build causal models and provide explanations of their inferences are not new (DeJong & Mooney 1986; Lenat et al. 1995; Mitchell 1986), nor have they been proven successful. Advocating the idea that artificial intelligence (AI) systems need commonsense knowledge, ambitious projects such as “Cyc” (Lenat & Guha 1990) created hand-crafted and labor-intensive knowledge bases, combined with an inference engine, to derive answers in the form of explicit knowledge. Despite being fed a large but finite number of factual assertions and explicit rules, such systems never accomplished the desired human-like performance. Other explanation-based and expert systems (e.g., WordNet [Miller et al. 1990]) proved useful in some applied domains, but were equally unable to solve the problem of AI. At the heart of such projects lies the idea of “cognitive functionalism”: the proposal that mental states are functional states, determined and individuated by their causal relations to other mental states and behaviors, and hence programmable with explicitly determined representational structures (Fodor 1981; Hayes 1974; McCarthy & Hayes 1969; Putnam 1967). Such a view stresses the importance of “formalizing concepts of causality, ability, and knowledge” to create “a computer program that decides what to do by inferring in a formal language that a certain strategy will achieve its assigned goal” (McCarthy & Hayes 1969, p. 1). Lake et al.'s appeal to causal mechanisms and their need for explicit model representations is closely related to this cognitive functionalism, which was put forth as a set of principles by many founders of the AI field (Hayes 1974; McCarthy 1959; McCarthy & Hayes 1969; Newell & Simon 1956).
One important shortcoming of cognitive functionalism is its failure to acknowledge that the same behavior/function may be produced by different representations and mechanisms (Block 1978; Hanson 1995). Consequently, the proposition that knowledge within a learning system must be explicit conflates implicit and explicit knowledge and their respective representations. The ability to throw a low hanging fast ball would be difficult, if not impossible, to encode as a series of rules. This type of implicit knowledge can, however, be captured in a neural network simply by having it learn from an analog perception–action system and a series of ball throws – all while retaining the ability to represent rule-based knowledge (Horgan & Tienson 1996). This associative-versus-rule-learning debate, referred to in the target article as “pattern recognition” versus “model building,” has been shown a number of times to be a meaningless dichotomy (Hanson & Burr 1990; Hanson et al. 2002; Prasada & Pinker 1993).
Although we agree with Lake et al. that “model building” is indeed an important component of any AI system, we do not agree that NNs merely recognize patterns and lack the ability to build models. Our disagreement arises from the presumption that “a model must include explicit representations of objects, identity and relations” (Lake et al. 2016, pp. 38–39). Rather than being explicit or absent altogether, model representation in NNs is implicit. Investigating implicitly learned models is somewhat more challenging, but work on learning dynamics and learning functions, and on their relationship to representations, provides insight into these implicit models (Caglar & Hanson 2016; Cleeremans 1993; Hanson & Burr 1990; Metcalfe et al. 1992; Saxe et al. 2014).
Recent work has shown that in DL the internal structure, or “model,” accumulates at later layers, effectively constructing “scaffolds” over the course of learning that are then used to train subsequent layers (Caglar & Hanson 2016; Saxe et al. 2013). These learning dynamics can be investigated by analyzing the learning curves and the resulting internal representations in the hidden units. Analysis of the learning curves of NNs with different architectures reveals that merely adding depth to a NN results in different learning dynamics and representational structures, which require no explicit preprogramming or pre-training (Caglar & Hanson 2016). In fact, the shapes of the learning curves for single-layer NNs and for multilayered DLs are qualitatively different: the former fit a negative exponential function (“associative”), whereas the latter fit a hyperbolic function (“accumulative”). This type of structured learning, consistent with the shape of the learning curves, can be shown to be equivalent to the “learning-to-learn” component suggested by the authors. Appearing across different layers of the NNs, it also satisfies the need for “learning-to-learn to occur at multiple levels of the hierarchical generative process” (Lake et al., sect. 4.2.3, para. 5).
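For concreteness, the two curve families can be written out explicitly. The parameterization below is our own illustrative notation for a training-error curve E(t) over epochs t; the cited studies may fit somewhat different variants:

\[
E_{\mathrm{assoc}}(t) = a\,e^{-bt} + c \quad \text{(negative exponential, single-layer NN)}, \qquad
E_{\mathrm{accum}}(t) = \frac{a}{1 + bt} + c \quad \text{(hyperbolic, multilayered DL)},
\]

with free parameters a, b, c > 0. Both curves start at a + c and decline toward the asymptote c, but the hyperbolic form approaches that asymptote far more gradually, with a power-law-like tail; distinguishing the two regimes is then a matter of comparing the fits of these functions to empirical learning curves.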
Furthermore, in category learning tasks with DLs, the internal representations of the hidden units show that the network creates prototype-like representations at each layer (Caglar & Hanson 2016). These higher-level representations are the result of concept learning from exemplars and go far beyond simple pattern recognition. Additionally, the plateau characteristic of the hyperbolic learning curves provides evidence for rapid learning, including one-shot learning, once this kind of implicit conceptual representation has been formed over some subset of exemplars (similar to a “prior”) (Saxe et al. 2014). Long-standing work in the learning theory literature indicates that a hyperbolic learning curve also best describes human learning (Mazur & Hastie 1978; Thurstone 1919), suggesting that the learning mechanisms of DLs and humans might be more similar than previously thought (Hanson et al., in preparation).
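To make the link between prototype-like hidden representations and one-shot learning concrete, the following sketch uses synthetic data and hypothetical names throughout (it is not the cited authors' code): a category “prototype” is taken to be the mean hidden activation over a few exemplars, classification reduces to distance from stored prototypes, and a novel category can then be added from a single exemplar with no further weight changes.

```python
# A minimal sketch (synthetic data, illustrative only): prototype-like hidden
# representations support one-shot generalization because classification
# reduces to distance from stored class prototypes.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # dimensionality of the (simulated) hidden-layer space

# Simulated hidden activations: each learned category forms a cluster.
centers = rng.standard_normal((3, DIM))
exemplars = {c: centers[c] + 0.1 * rng.standard_normal((20, DIM)) for c in range(3)}

# Prototype = mean hidden activation per category (the implicit "model").
prototypes = {c: x.mean(axis=0) for c, x in exemplars.items()}

# One-shot learning: a single exemplar of a novel category becomes its prototype.
novel_center = rng.standard_normal(DIM)
prototypes["novel"] = novel_center + 0.1 * rng.standard_normal(DIM)

def classify(h):
    """Assign a hidden activation vector to the nearest stored prototype."""
    return min(prototypes, key=lambda c: np.linalg.norm(h - prototypes[c]))

# A fresh draw from the novel category is recognized from that single example.
test_item = novel_center + 0.1 * rng.standard_normal(DIM)
print(classify(test_item))  # expected: 'novel'
```

The point of the sketch is not the particular distance metric, but that once prototype-like structure exists in the hidden layers, rapid or one-shot generalization follows without any explicit, preprogrammed model.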
Taken together, the analysis of learning curves and of the internal representations of hidden units indicates that NNs do in fact build models and create representational structures. However, these models are implicitly built into the learning process and cannot be explicitly dissociated from it. Exploiting the rich information in the stimulus and its context, the learning process creates models and shapes representational structures without the need for explicit preprogramming.