Clark's paper deserves far more than 1,000 words, but I have to be brief and dogmatic. Characterizing brains as predicting machines ignores many abilities produced by evolution and development, including mathematical discovery and reasoning, using evolved mechanisms (perhaps) shared by several species capable of the “representational redescription” postulated in Karmiloff-Smith (1992) and the meta-configured competences suggested in Chappell & Sloman (2007), including (largely unstudied) discoveries of “toddler theorems” (Sloman 2010). The “action-oriented predictive processing” approach treats everything as on-line control (Powers 1973), like “enactivist” theorists, who usually ignore the competences required to make predictions true and the processes of generating and choosing (sometimes unconsciously) between goals, plans, designs (for houses, machines, etc.), preferences, explanations, theories, arguments, story plots, forms of representation, ontologies, grammars, and proofs. Predictive processing doesn't explain termite cathedral building. (Compare Chittka & Skorupski 2011.)
Simultaneous localisation and mapping (SLAM) robotic techniques, partly inspired by things animals do, create useful (topological, metrical, and possibly logical) representations of enduring extended environments. That's not learning about mappings between inputs and outputs: it's a special case of using actions, percepts, and implicit theories to derive useful information about the environment. Another such case is producing a theory of chemical valency.
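To make the contrast concrete, here is a minimal Python sketch of the kind of enduring representation at issue (the class, place names, and landmarks are all invented for illustration; no actual SLAM library is assumed): a topological map that accumulates places, the landmarks seen there, and the actions connecting them, and can then find routes through parts of the environment not currently being sensed.

```python
# Toy sketch only: "TopologicalMap", the place names, and the landmarks
# are invented for illustration; no real SLAM library is assumed.

class TopologicalMap:
    """An enduring map of places and the actions connecting them."""

    def __init__(self):
        self.places = {}   # place -> set of landmarks perceived there
        self.edges = {}    # (place, action) -> place reached

    def record(self, place, landmarks, action=None, next_place=None):
        self.places.setdefault(place, set()).update(landmarks)
        if action is not None:
            self.edges[(place, action)] = next_place

    def route(self, start, goal):
        """Breadth-first search over remembered places: the map supports
        planning about parts of the environment not currently sensed."""
        frontier, seen = [(start, [])], {start}
        while frontier:
            place, path = frontier.pop(0)
            if place == goal:
                return path
            for (p, action), nxt in self.edges.items():
                if p == place and nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action]))
        return None

m = TopologicalMap()
m.record("hall", {"door", "plant"}, action="forward", next_place="kitchen")
m.record("kitchen", {"sink"}, action="left", next_place="pantry")
m.record("pantry", {"shelves"})
print(m.route("hall", "pantry"))   # -> ['forward', 'left']
```

What is stored is a reusable structure describing the environment, not a predictor of the next sensory signal.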
Systematically varying how things are squeezed, stroked, sucked, lifted, rotated, and so forth supports learning about kinds of matter and about the spatial configurations and processes in which matter can be involved (Gibson 1966). Predicting sensory signals is only one application. Others include creating future structures and processes in the environment, and understanding processes. Choosing future actions often ignores sensory and motor details, since a different ontology is used (e.g., choosing between a holiday spent practising French and a music-making holiday, or choosing insulation for a new house). For more on “off-line” aspects of intelligence ignored by many “enactivist” and “embodied cognition” enthusiasts, see Sloman (1996; 2006; 2009). Even for on-line control, the use of servo-control with qualitative modifications of behavior responding to changing percepts reduces the need for probabilistic prediction: head for the center of the gap, then as you get close use vision or touch to control your heading, as in the sketch below. Choosing a heading may, but need not, involve prediction: it could be a reflex action.
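A minimal sketch of that control regime, with made-up thresholds and sensor names (nothing here is drawn from a real robot interface): proportional visual servoing far from the gap, switching qualitatively to touch-based correction close in, with no probabilistic prediction anywhere.

```python
# Toy sketch only: thresholds, units, and sensor names are invented;
# nothing here is a real robot API.

def steer(gap_bearing, distance, left_touch, right_touch):
    """Return a heading correction (radians): servo-control with a
    qualitative mode switch, no probabilistic prediction required."""
    if distance > 1.0:
        # Far mode: visual servoing -- head for the center of the gap.
        return 0.5 * gap_bearing     # simple proportional control
    # Near mode: qualitative switch to touch-based correction.
    if left_touch and not right_touch:
        return -0.2                  # brushing on the left: veer right
    if right_touch and not left_touch:
        return +0.2                  # brushing on the right: veer left
    return 0.0                       # centred: hold heading

print(steer(0.3, 2.0, False, False))   # far away: 0.15, turn toward gap
print(steer(0.0, 0.4, True, False))    # close in, left contact: -0.2
```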
Predicting environmental changes need not use Bayesian inference: for example, when you predict that two more chairs will ensure seats for everyone, or that a gear wheel rotating clockwise will make the one meshed with it rotate counter-clockwise. And some predictions refer to what cannot be sensed, for example, most deep scientific predictions, or a prediction that a particular way of trying to prove Fermat's last theorem will fail.
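The gear case can be made explicit. In this toy rendering (the function names are mine), the prediction follows with certainty from a structural constraint, and the chairs case from simple arithmetic; neither involves inference over sensory likelihoods.

```python
# Toy sketch: the function names are mine. The conclusions are reached by
# applying a structural constraint, not by weighing sensory likelihoods.

def meshed_directions(first_direction, chain_length):
    """Directions of a chain of meshed gears, given the first gear's."""
    flip = {"clockwise": "counter-clockwise",
            "counter-clockwise": "clockwise"}
    dirs = [first_direction]
    for _ in range(chain_length - 1):
        dirs.append(flip[dirs[-1]])   # meshed gears alternate direction
    return dirs

print(meshed_directions("clockwise", 3))
# -> ['clockwise', 'counter-clockwise', 'clockwise']: certain, not probable

def enough_seats(chairs_added, chairs_now, people):
    """The chairs prediction is arithmetic, not Bayesian inference."""
    return chairs_now + chairs_added >= people

print(enough_seats(2, 8, 10))   # -> True
```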
Many things humans use brains for do not involve on-line intelligence: for example, mulling over a conversation you had a week ago, lying supine with eyes shut composing a piano piece, trying to understand the flaw in a philosophical argument, or just daydreaming about an inter-planetary journey.
I don't deny that many cognitive processes involve mixtures of top-down, bottom-up, middle-out (etc.) influence: I helped produce a simple model of such visual processing decades ago, Popeye (Sloman 1978, Ch. 9), and criticized over-simple theories of vision that ignored requirements for process perception and on-line control (Sloman 1982; 1989). David Hogg, then my student, used 3-D prediction to reduce visual search in tracking a human walker (Hogg 1983). Sloman (2008) suggests that rapid perception of complex visual scenes requires rapid activation and instantiation of many normally dormant, previously learnt model-fragment types and relationships, using constraint propagation to assemble multi-layered percepts of structures and processes: a process of interpretation, not prediction (compare parsing). Building working models to test the ideas will be difficult, but not impossible. Constraint propagation need not use Bayesian inference.
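As a bare indication of the non-Bayesian alternative, here is a stripped-down constraint-propagation loop (an AC-3-style simplification in Python; the scene parts, labels, and constraints are invented for the example, not taken from Sloman 2008): candidate labels for scene parts are pruned by relational constraints until only mutually consistent interpretations survive, with no probabilistic update anywhere.

```python
from collections import deque

# Toy sketch: an AC-3-style simplification. The scene parts, labels, and
# constraints are invented for the example, not taken from Sloman (2008).

def propagate(domains, constraints):
    """domains: part -> set of candidate labels.
    constraints: (x, y) -> set of (label_x, label_y) pairs allowed.
    Prune labels with no consistent partner until quiescence."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        unsupported = {lx for lx in domains[x]
                       if not any((lx, ly) in allowed for ly in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            # x's domain shrank, so recheck constraints pointing at x.
            queue.extend(k for k in constraints if k[1] == x)
    return domains

domains = {"edge": {"convex", "concave", "occluding"},
           "junction": {"L", "arrow", "fork"}}
constraints = {("edge", "junction"): {("occluding", "L"),
                                      ("convex", "arrow"),
                                      ("convex", "fork")},
               ("junction", "edge"): {("L", "occluding"),
                                      ("arrow", "convex"),
                                      ("fork", "convex")}}
print(propagate(domains, constraints))
# "concave" is eliminated: no junction label supports it.
```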
“Thus consider a black box taking inputs from a complex external world. The box has input and output channels along which signals flow. But all it ‘knows’ about, in any direct sense, are the ways its own states (e.g., spike trains) flow and alter…. The brain is one such black box” (sect. 1.2). This sounds like a variant of concept empiricism, defeated long ago by Kant (1781) and buried by philosophers of science.
Many things brains and minds do, including constructing interpretations and extending their own meta-cognitive mechanisms, are not concerned merely with predicting and controlling sensory and motor signals.
Evolutionary “trails”, from very simple to much more complex systems, may provide clues for a deep theory of animal cognition, explaining the many layers of mechanism in more complex organisms. We need to distinguish the diverse requirements for information processing of various sorts, and also the different behaviors and mechanisms. A notable contribution is Karmiloff-Smith (1992). Other relevant work includes McCarthy (2008), Trehub (1991), and research by biologists on the diversity of cognition, even in very simple organisms. I have been attempting this sort of exploration of “design space” and “niche space” for many years (Sloman 1971; 1978; 1979; 1987; 1993; 1996; 2002; 2011a; 2011b).
Where no intermediate evolutionary steps have been found, it may be possible to learn from alternative designs on branches derived from those missing cases. We can adopt the designer stance (McCarthy 2008) to speculate about testable mechanisms. (It is a mistake to disparage “just so” stories based on deep experience of struggling to build working systems, when used to guide research rather than replace it.) This project requires studying many types of environment, including not only environments with increasingly complex and varied physical challenges and opportunities, but also increasingly rich and varied interactions with other information-processing systems: predators, prey, and conspecifics (young and old). Generalizing Turing (1952), I call this the “Meta-morphogenesis project” (Sloman 2013).
Clark compares the prediction “story” with “mainstream computational accounts that posit a cascade of increasingly complex feature detection (perhaps with some top-down biasing)” (sect. 5.1). This fits some AI research, but labelling it “mainstream” and treating it as the only alternative ignores the diversity of approaches and techniques, including constraint processing, SLAM, theorem proving, planning, case-based reasoning, natural language processing, and many more. Much human motivation, especially in young children, seems to be concerned with extending competences rather than with predicting and acting, and similar learning by exploration and experiment is being investigated in robotics.
A minor point: binocular rivalry doesn't always lead to alternating percepts. For example, look at an object with one eye while something moving slowly up and down blocks the view from the other eye. The remote object can appear as if behind a textured window moving up and down.
Clark claims (in his abstract) that the “hierarchical prediction machine” approach “offers the best clue yet to the shape of a unified science of mind and action”. But it unifies only the phenomena its proponents attend to.