
Thinking through other minds: A variational approach to cognition and culture

Published online by Cambridge University Press:  30 May 2019

Samuel P. L. Veissière
Affiliation:
Division of Social and Transcultural Psychiatry, Department of Psychiatry, McGill University, Montreal, Quebec, Canada H3A 1A1 (samuel.veissiere@mcgill.ca; Maxwell.ramstead@mcgill.ca; Laurence.kirmayer@mcgill.ca); Culture, Mind, and Brain Program, McGill University, Montreal, Quebec, Canada H3A 1A1; Department of Anthropology, McGill University, Montreal, Quebec, Canada H3A 2T7
Axel Constant
Affiliation:
Culture, Mind, and Brain Program, McGill University, Montreal, Quebec, Canada H3A 1A1; Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia 2006 (axel.constant.pruvost@gmail.com); Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK (k.friston@ucl.ac.uk)
Maxwell J. D. Ramstead
Affiliation:
Division of Social and Transcultural Psychiatry, Department of Psychiatry, McGill University, Montreal, Quebec, Canada H3A 1A1 (samuel.veissiere@mcgill.ca; Maxwell.ramstead@mcgill.ca; Laurence.kirmayer@mcgill.ca); Culture, Mind, and Brain Program, McGill University, Montreal, Quebec, Canada H3A 1A1; Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK (k.friston@ucl.ac.uk)
Karl J. Friston
Affiliation:
Wellcome Centre for Human Neuroimaging, University College London, London WC1N 3AR, UK (k.friston@ucl.ac.uk)
Laurence J. Kirmayer
Affiliation:
Division of Social and Transcultural Psychiatry, Department of Psychiatry, McGill University, Montreal, Quebec, Canada H3A 1A1 (samuel.veissiere@mcgill.ca; Maxwell.ramstead@mcgill.ca; Laurence.kirmayer@mcgill.ca); Culture, Mind, and Brain Program, McGill University, Montreal, Quebec, Canada H3A 1A1; Department of Anthropology, McGill University, Montreal, Quebec, Canada H3A 2T7

Abstract

The processes underwriting the acquisition of culture remain unclear. How are shared habits, norms, and expectations learned and maintained with precision and reliability across large-scale sociocultural ensembles? Is there a unifying account of the mechanisms involved in the acquisition of culture? Notions such as “shared expectations,” the “selective patterning of attention and behaviour,” “cultural evolution,” “cultural inheritance,” and “implicit learning” are the main candidates to underpin a unifying account of cognition and the acquisition of culture; however, their interactions require greater specification and clarification. In this article, we integrate these candidates using the variational (free-energy) approach to human cognition and culture in theoretical neuroscience. We describe the construction by humans of social niches that afford epistemic resources called cultural affordances. We argue that human agents learn the shared habits, norms, and expectations of their culture through immersive participation in patterned cultural practices that selectively pattern attention and behaviour. We call this process “thinking through other minds” (TTOM) – in effect, the process of inferring other agents’ expectations about the world and how to behave in social context. We argue that for humans, information from and about other people's expectations constitutes the primary domain of statistical regularities that humans leverage to predict and organize behaviour. The integrative model we offer has implications that can advance theories of cognition, enculturation, adaptation, and psychopathology. Crucially, this formal (variational) treatment seeks to resolve key debates in current cognitive science, such as the distinction between internalist and externalist accounts of theory of mind abilities and the more fundamental distinction between dynamical and representational accounts of enactivism.

Type
Target Article
Copyright
Copyright © The Author(s), 2019. Published by Cambridge University Press

[Humans] form with others joint goals to which both parties are normatively committed, they establish with others domains of joint attention and common conceptual ground, and they create with others symbolic, institutional realities that assign deontic powers to otherwise inert entities.

– Michael Tomasello (2009, p. 105)

Choosing a swimsuit—

when did his eyes replace mine?

(mizugi erabu itsu shika kare no me to natte)

– Mayuzumi Madoka (2003, p. 232) [Footnote 1]

1. Introduction: Learning in cultural context

1.1. The puzzle of implicit cultural learning

Since the advent of the social sciences in the late nineteenth century, a recurring trope casts “society” or, in its Durkheimian formulation, “regulatory social forces” (Durkheim 1985/2014) as superordinate to individual human agency. As the story goes, humans acquire norms, tastes, preferences, and ways of doing things that are consistent with those of others in their local world and communities – that is, the relevant social and cultural groups (in-groups and out-groups) to which they belong and with whom they interact (Kurzban & Neuberg 2005).

Group variations in learned and structured dispositions extend to such domains as culturally shaped body practices like walking, sitting, eating, and sleeping (Mauss 1973); differentiated patterns of prejudice or bias against certain kinds of persons (e.g., racism, sexism, and classism; Machery 2016); proneness to optical illusions (McCauley & Henrich 2006); colour perception (Goldstein et al. 2009); food preferences (Wright et al. 2001); desirable body types (Swami et al. 2010); and thresholds for pain (Zatzick & Dimsdale 1990) and other forms of suffering and affliction that are shaped by culture (Kirmayer 1989; Kirmayer & Young 1998; Kirmayer et al. 2017) and historical context (Gold & Gold 2015; Hacking 1998). As developmental psychologists have argued, it is precisely because of the existence of intergroup behavioural and cognitive variations that arise through social learning within members of the same species that we can speak of culture (Tomasello 2009). We know there is such a “thing” as culture, in other words, because there are cultural differences (Brown 2004). Although it is clear that specific developmental experiences – governed by explicit social norms and contexts – shape these perceptual, cognitive, and attitudinal processes, most of cultural learning appears to be implicit, in the sense that it occurs without explicit instruction.

Implicit cultural learning poses a classical “poverty of stimulus” problem, in that acquired knowledge, attitudes, and dispositions appear to go far beyond what can be learned by direct experience (Berwick et al. 2013; Chomsky 1996) – they evince a special, ampliative form of abductive inference. For example, alongside the many rules and facts about the world that are explicitly taught, human children learn a large and stable set of implicit beliefs that govern action without needing to be stated explicitly, described, or explained (Sperber 1996; 1997). By age 7, children are already proficient in complex, though mostly tacit, intergroup relational rules and dynamics of power, and already form implicit judgments about the “value” of members of other groups, and that of their group in relation to others (e.g., children of minority groups often internalize preferences for prestige-laden groups different from their own ethnic group; for a review, see Clark 1988; Clark & Clark 1939; Huneman & Machery 2015; Kelly et al. 2010; Kinzler & Spelke 2011; Machery & Faucher 2017; Navarrete & Fessler 2005; Pauker et al. 2016).

Clearly, we are continuously immersed in culturally shaped environments and interactions from before birth. Despite advances in developmental psychology (Csibra & Gergely 2009; Tomasello 2014) and cognitive anthropology (Boyd & Richerson 2005), we still lack a formal account of the mechanisms of enculturation. The processes that enable implicit cultural habits and norms to arise from inference and imitation, and to be learned and maintained with a high degree of precision and reliability across large-scale sociocultural phenomena, involving multiple interlocking minds and institutional structures, are only partly understood. This is our puzzle.

1.2. The theory of mind debates

In this article, we will propose a solution to the puzzle of implicit cultural learning. We present a model, derived from first principles, of the ability to perform inferences about the shared beliefs that underwrite social norms and patterned cultural practices. In helping to solve the puzzle of the implicit acquisition of culture, our model provides an integrative view of what has variously been called mind reading, perspective taking, joint intentionality, folk psychology, mentalizing, or theory of mind (TOM) – in short, the human ability to ascribe mental states, intentions, and feelings to other human agents and to oneself. To simplify, we will use the term TOM to refer to this ability. Of pertinence to our argument here, TOM (in its various theoretical formulations) is generally described as a key mechanism underwriting the human capacity to form joint goals leading to cultural forms of life (Tomasello 2009).

As a generative framework, TOM has been the subject of sometimes fierce and still ongoing debate in cognitive science (Michael et al. 2014; for a comprehensive review, see Heyes & Frith 2014). Historically, much of the debate has occurred between three camps that have advanced alternative explanations for the human ability to infer the mental states of others – namely, the theory theory (TT), simulation theory (ST), and embodied cognition (EC) accounts.

Whether one considers the debate settled depends on one's disciplinary and theoretical position. Outside of the field of developmental psychology, which seems to have adopted some arguments from embodied cognition in favour of an enriched TT account, philosophers in the enactivist camp – and, to different extents, anthropologists – still disagree with the mainstream “cognitivist” psychological account of TOM.

Revisiting the TOM debate from the perspective of cognitive and evolutionary anthropology is helpful to contextualise current critiques (e.g., Christensen & Michael 2016; Michael et al. 2014). These critiques stress the importance of considering culture-specific, embodied, and shared interactions with the environment, over the manipulation of internal representations about other minds (reviewed in sections 1.2.1–1.2.3). Beyond extending debates in the philosophy of mind, the arguments here will be helpful to anthropologists – who are today largely committed to anti-cognitivist accounts, owing in part to the popularity of the so-called ontological turn (e.g., De Castro 2009) – and to psychologists, who largely fail to consider the extent to which cognition is “collective.”

The basic idea behind TT is that human agents acquire knowledge about the ways in which mental states should be ascribed, which takes the form of a (literal) theory of how minds operate (Carruthers & Smith 1996; Gopnik & Wellman 2012). Proponents of TT hold that social coordination and social cognition require the capacity to make inferences about other people's mental states and propositional attitudes as such (i.e., an ability to explicitly formulate to oneself that others also think “silently,” that they may hold beliefs that are true or false, and that there may be a difference between their stated and true intentions, beliefs, or needs – the ability, in other words, to hold a folk theory about other people's minds).

According to a large body of related critiques in the social sciences and phenomenological philosophy, the TT account fails to describe a species-wide mechanism on several counts:

  1. TT is a construct derived from Western contexts and fails to describe universal human mechanisms – we call this the cross-cultural critique.

  2. TT is a dualistic cognitivist construct and thus fails to account for the embodied nature of cognition – we call this the embodiment critique.

  3. TT is committed to a Machiavellian view of the evolution of cognition that fails to account for the cooperative nature of cognition and behaviour – we call this the cooperativity critique.

1.2.1. The cross-cultural critique

For many anthropologists, the TT account reflects a culture-bound, historically specific notion of “mind” and the person that is biased towards individualistic Western folk models popularized by Enlightenment philosophers (e.g., Locke's notion of personhood as psychological interiority, Cartesian mind-body dualism, and Kant's notion of phenomenal reality and selfhood). Critics in this camp point out that in many non-Western cultures, folk reasoning about human action does not emphasize individuals’ intentions or mental states (Astuti & Bloch 2015; Duranti 2015; Geertz 1973; Keane 2015; Luhrmann 2011; Rosaldo 1982).

Instead, actions may be explained in terms of their perlocutionary effects – that is, in terms of their purported consequences according to locally relevant norms, such as “what would upset the ancestors” (Astuti & Bloch 2015). Extreme versions of this claim have pointed to ethnographic examples from a group of primarily Melanesian cultures described as having a folk psychology characterized by an “opacity of mind” in which the notion of mental states and psychological interiority is reportedly absent (Ramsey 2007; Robbins & Rumsey 2008).

Recent reviews of this controversy, however, noted that there is no experimental evidence to verify whether and how Melanesians make inferences about others’ mental states based on others’ behaviour (Robbins et al. 2011), while a close reading of the ethnographic record suggests that folk notions of opacity are normative rather than descriptive. This is suggested by ethnographic reports of children being reprimanded for overt curiosity about others’ actions or intentions. On this view, Melanesians are simply taught that they ought not to wonder about what people are thinking (Robbins 2008; Robbins & Rumsey 2008; Rumsey 2013). Moreover, reports from other Melanesian contexts indicate that it is widely recognized that people “think silently” (e.g., in the context of courtship among the Korowai of New Guinea; Luhrmann 2011; Stasch 2009).

Although the current balance of evidence does not support critiques that TT describes a process that is exclusively found in Western cultural contexts, ethnographic studies document wide variation in the ways that people inquire into and talk about others’ states of mind that must be accommodated by any account of TOM.

1.2.2. The embodiment critique

Philosophers and psychologists in the embodied cognition camp have also objected to the TT account on the grounds that understanding others or responding to social cues is characterized by “quick,” “intuitive,” “embodied” responses that need not entail interpretations about other minds or any notion of mental states (Michael et al. 2014). Some of these critics of TT have proposed an alternative approach based on the idea that, rather than mobilizing an explicit theory to ascribe mental states to others, human agents use their own experiences and intuitions to understand other human agents through a process of “simulation” – other people's propositional attitudes are on this view “simulated” from one's own mental experience, but are not “theorized” as such (Goldman 2006). On the view of such simulation theories (ST), TOM abilities involve processes of modelling others’ actions, which may be embodied and automatic (Gallese & Goldman 1998). Embodied cognition need not involve anything that looks like a theory because it uses bodily sensorimotor systems to provide analogical models of human motivation, intention, and action (Shapiro 2010).

Radical enactivist cognitive science takes this emphasis on embodied cognition further to argue that basic cognition does not entail any kind of mental content – particularly not about others’ mental states and propositional attitudes (Hutto & Myin 2013). In more recent accounts (Hutto & Myin 2017; Hutto & Satne 2015), enactivists grant the existence of explicit inferences about others, but only in situations that are developmentally contingent on language. Learning to make explicit ascriptions is then a separate, later, developmentally achieved result of narrative practices (Hutto 2012).

As Heyes and Frith (2014) point out, some current accounts have adopted a compromise position, which gives credence to both sides of the debate, through recognizing multiple processes and progressive elaboration over development. In Apperly and Butterfill's (2009) two-systems model, for example, most social cognition may be largely automatic, while a process akin to TT may underpin specific types of language-dependent inferences. Apperly and Butterfill's account stemmed from a growing consensus in cognitive science – famously exemplified in Daniel Kahneman's Thinking, Fast and Slow (2011) – that cognition can be divided into two “systems”: one evolutionarily old, innate, implicit, “cheap” automatic system of informational foraging supported by a series of largely social biases, and a developmentally later, evolutionarily young, effortful, relatively inefficient modality of volitional, voluntary reflection. Apperly and Butterfill proposed that the distinction between TT and ST could be cast along this spectrum, with explicit mentalizing about others entailing a situationally specific, relatively rare sort of reflexivity acquired later in development.

Others still have proposed a “multi-system,” progressive scaffolding of socio-cognitive inferences ranging from the fully automatic to the effortfully explicit (Michael et al. 2014). These later “interactionist” models offer a more nuanced and dynamic account of the gradients of inferences, which, rather than being “located” in discrete cognitive systems, likely occur on a continuum of attunement to different statistical regularities. This is a point elaborated on in detail in Hugo Mercier and Dan Sperber's Enigma of Reason (2017), in which they also recast so-called system 2 reflexivity as varieties of automatic inference about others’ inferences triggered by communicative cues – actual or imaginary (e.g., in engaging in, or mentally rehearsing, conversation and interaction with others). Crucially, these recent models (two systems, multi-systems, and interactionist) all study the manner in which agents optimize the metabolic cost of cognition by tuning attentional preference to different domains of statistical regularities, emphasizing the function of social and cultural modulations of automaticity. These models, as we argue in section 1.3, lend themselves to a culturally informed free-energy principle (FEP) model.

1.2.3. The cooperativity critique

TOM has played a key role in evolutionary psychology. Early accounts of evolutionary psychology described the evolution of human intelligence and TOM abilities by appealing to the so-called Machiavellian intelligence hypothesis (Dunbar 2003; Gavrilets & Vose 2006; Pinker 1999; Trivers 2000). On this view, the ability to rightly infer others’ mental states – human mind reading – and propositional attitudes about others’ mental states evolved through a cognitive arms race between cheaters (who need to understand others so as to deceive them) and cheater detectors (who need to understand others to detect deception).

In contrast, scholars in the mutualist camp (Henrich 2015; Tomasello 2014) contend that individual human fitness is best maximized by cooperation with others, leading to an evolved preference for promoting group fitness through the cooperative division of labour. Such cooperation requires knowledge of others’ states of mind or intentions. In support of these views, natural pedagogy (Csibra & Gergely 2009; 2011), interactionist (Mercier & Sperber 2017), and other cultural intelligence paradigms have emphasized the evolved propensity for a non-Machiavellian, cooperative division of cognitive labour, in which mind reading evolved for the purpose of outsourcing contextually relevant information to specific others from our in-groups and to leverage knowledge, skills, and attitudes from a cumulative cultural repertoire. In more radical versions of mutualist models, such as Hrdy's cooperative breeding hypothesis (Burkart et al. 2009; Hrdy 2011), mind reading is thought to have evolved in the pre–Homo sapiens lineage as a result of a “cuteness and care” arms race, because selection favoured individuals who were, at once, good caregivers and good at eliciting care from others.

Heyes and Frith (2014) have proposed an account of the cultural co-evolutionary elaboration of TOM abilities, suggesting that the internalist, brain-centred accounts provided by proponents of TT and ST need to be augmented by an account of how cultural evolution and cultural inheritance sculpt an innate mind reading “start-up kit,” in ways that are analogous to how cultural practices of reading harnessed an evolutionarily older linguistic “start-up kit” (Dehaene & Cohen 2007).

The extent to which the evolution of perspective-taking abilities requires mental content about other minds is still hotly debated. In the mind-shaping hypothesis (Mameli 2001; Zawidzki 2008; 2013), for example, mind reading likely emerges from an evolutionarily older and developmentally earlier capacity to imitate, learn, teach, and directly influence others. Nevertheless, current work suggests that the ability to engage with others as agents with interior states and intentions is central to the cooperative forms of social life we call “culture.”

1.3. Piecing together the puzzle of implicit learning: A new portrait of TOM

1.3.1. Conceptualization

The cross-cultural, embodiment, and cooperativity critiques of TOM emphasize either internal cognitive processes (theory building or simulation) or external, social-cultural processes (interaction and cooperation). Clearly, these are differences in emphasis, and a more complete picture must show how they fit together.

In this article, we complete this picture by proposing a model of implicit cultural learning that we call “thinking through other minds” (TTOM). In recognizing the virtues (and limitations) of both internalist and externalist accounts, the TTOM model proposes a resolution of the dialectic – and false dichotomy – between so-called internalist (TT and ST) and externalist (mutualist, interactionist, and cultural evolutionist) positions.

TTOM integrates a number of recent approaches to the study of cognition – in particular, the cultural intelligence hypothesis in evolutionary anthropology (Boyer 2018; Henrich 2015; Tomasello 2014), the niche construction perspective in evolutionary biology (Laland et al. 2015; Odling-Smee et al. 2003), the interactionist approach to the evolution of reasoning in cognitive science (Mercier & Sperber 2017), and the sociocultural enactivist approach to mind reading (Fabry 2018; Gallagher 2017; Gallagher & Allen 2018; Hutto 2012; Hutto et al. 2014).

1.3.2. What the variational model affords

At a formal level, we integrate these approaches within the framework of the variational free-energy principle (FEP; Friston 2005; 2010) in theoretical neuroscience and biology. Framing this integration in terms of the FEP allows us to derive, from first principles, an interactional model that can explain the acquisition, production, and stabilization of cultural expectations (Friston 2013; Friston & Stephan 2007; Ramstead et al. 2018). See Box 1.

Box 1. The formal structure of the FEP model adds significantly to the general approach we outline in this article in two ways.

  1. Conceptually, the FEP provides us with an explanation from first principles of the processes involved in, and the adaptive value of, implicit cultural learning and mind-reading abilities. It gives us a formal grip on the underlying dynamics of these two phenomena (for a schematic overview, see Figs. 1–4 and the mathematical appendix). The main challenge confronting TTOM is that of making sense of the dynamics involved when agents learn domains of socially relevant expectations – that are involved in the acquisition of culture – and how these domains are scaffolded from joint intentionality, basic perspective-taking abilities, and evolved attentional dispositions for learning from and through others. These domains are internal (e.g., neural scale) and external (environmental scale) to individual agents. Without a formal apparatus, it is difficult to make sense of these multiscale learning dynamics or to examine how they interact. We employ the FEP to formulate TTOM for the simple reason that it is, to our knowledge, the only theory that has produced formal models (supported by computer simulations) of many of the cognitive mechanisms involved in the learning dynamics of TTOM, including, for example, action, perception, learning and attention (Friston et al. 2016), visual foraging (Mirza et al. 2016), communication (Friston & Frith 2015b), decision making (Friston et al. 2014a), planning and navigation (Kaplan & Friston 2018), emotions (Joffily & Coricelli 2013), curiosity and insights (Friston et al. 2017b), and niche construction (Bruineberg et al. 2018b; Constant et al. 2018b).

  2. Empirically, the FEP offers a set of equations that can be used to develop computational models of data acquired in studies of social interaction, in which implicit cultural learning and mind reading are at play. These models can then be used to identify new dynamics and make predictions that can, in turn, be tested in real-world situations. The scope of the current argument is limited to discussing the theoretical relevance of the FEP. That said, we can indicate candidate tasks to produce data amenable to FEP modelling. Notably, the different variants of two-person psychophysiology in social interaction studies (e.g., Bolis & Schilbach 2018a; Bolis et al. 2017; Schilbach 2016; Timmermans et al. 2012; von der Lühe et al. 2016) are target modelling candidates, as they already rely on core principles of active inference and involve the manipulation of what we call “epistemic resources.”
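
As a hedged illustration of the kind of equation the FEP supplies, variational free energy is commonly written in the following textbook form. The symbols are assumptions chosen for this sketch (o for observations, s for hidden states, q for the agent's approximate posterior belief, p for its generative model) and may differ from the notation used in the article's mathematical appendix.

```latex
\[
\begin{aligned}
F[q, o] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
        &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o)
\end{aligned}
\]
```

Because the KL divergence is non-negative, F is an upper bound on surprise, -ln p(o). Minimizing F with respect to q therefore brings the agent's beliefs closer to the exact posterior p(s | o).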

We will argue from the formal perspective of embodied (i.e., active) inference, which rests upon our species’ remarkable capacity to infer or assign conspecifics to some pragmatic (i.e., prosocial) categories. A successful inference about the “sort of person you are” enables a host of conditional inferences, many of which have a direct bearing on “how I should behave.” This is particularly true if I infer that “you are like me.” We will unpack this view with a special focus on epistemic action, via the selective patterning of salience and attention – and how this is mediated via cultural affordances. We hope to show that these epistemic resources arise naturally from cultural niche construction when, and only when, I share an environment with other “creatures like me.”

The formalism of the FEP allows us to take further steps towards operationalizing the process of implicit cultural learning and mind reading that we describe as thinking through other minds (see Box 2). In brief, the set of equations that model the process of TTOM could be implemented in computational models, to study simulations of, for example, psychophysical, neuronal, and behavioural measurements of the processes involved in a mind-reading or cultural learning task.
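
To make the idea of implementing such equations concrete, here is a minimal Python sketch, not the authors' model. It assumes a hypothetical two-state world in which an agent infers whether a conspecific is "like me" or "not like me" from a single observed cue; the prior, the likelihood values, and all function names are made up for illustration. It computes variational free energy for two candidate beliefs and checks that the exact Bayesian posterior attains the minimum, which equals surprise.

```python
import math


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def free_energy(q, prior, likelihood):
    """Variational free energy F = sum_s q(s) * [ln q(s) - ln p(o, s)].

    q          -- candidate belief over hidden states
    prior      -- p(s), prior over hidden states
    likelihood -- p(o | s) for the single cue actually observed
    """
    return sum(
        qs * (math.log(qs) - math.log(ps * ls))
        for qs, ps, ls in zip(q, prior, likelihood)
        if qs > 0.0  # terms with q(s) = 0 contribute nothing
    )


# Hypothetical hidden states: index 0 = "like me", index 1 = "not like me".
prior = [0.5, 0.5]
# Assumed probability of the observed cue under each state (illustrative values).
likelihood = [0.8, 0.3]

# Exact Bayesian inference for comparison.
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = normalize([p * l for p, l in zip(prior, likelihood)])
surprise = -math.log(evidence)

# Free energy of an unrevised belief (the prior) and of the exact posterior.
f_at_prior = free_energy(prior, prior, likelihood)
f_at_posterior = free_energy(posterior, prior, likelihood)

print(f"posterior belief:        {posterior}")
print(f"surprise -ln p(o):       {surprise:.4f}")
print(f"free energy (prior):     {f_at_prior:.4f}")
print(f"free energy (posterior): {f_at_posterior:.4f}")
```

In this toy setting, revising the belief from the prior to the posterior lowers free energy until it matches surprise. A full active inference model would additionally score actions by expected free energy, which this sketch leaves out.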

Box 2. Glossary of key terms.

  • Active inference: Active inference is the process whereby organisms learn the statistical structure of their environment through the selective sampling of predicted or expected sensory information (also known as action), based on perceptual inferences about the cause of the sensory input (also known as perception). The process of active inference realises the free-energy principle. In active inference, everything that can change does change to minimize variational free energy, which is a statistical measure of the mismatch between organism and environment. This mandates actions that minimize expected free energy following an action – namely, actions that resolve uncertainty.

  • Affordance: Generally speaking, possibilities for engagement with an ecological niche that are defined in interactional terms, as a relation between features of organisms’ environment and their own abilities.

  • Attentional salience: The degree to which uncertainty is reduced under a particular course of action. Mathematically, salience is known as expected Bayesian surprise, information gain, intrinsic motivation, and epistemic value. Salience underwrites epistemic affordance.

  • Attentional selection: Calibration or weighting of the precision (inverse variance) of sensory evidence, or prior beliefs.

  • Conventional affordance: Affordances that agents can engage by skilfully leveraging explicit or implicit expectations, norms, conventions, and cooperative social practices.

  • Cultural affordance: The kind of affordance that characterizes the human niche. Cultural affordances depend on shared expectations that are acquired over development (i.e., through enculturation and social learning). Cultural affordances come in two flavours, which form a spectrum from the more innately specified to the more learning dependent: natural and conventional affordances.

  • Epistemic affordance: One of the two components of expected free energy that determine action selection. Epistemic affordance quantifies the extent to which a particular way of actively sampling the world reduces uncertainty about the state of the world or its statistical regularities.

  • Epistemic authority: A symbol, person, cue, or feature of the environment (usually associated with prestige, status, and group affiliation) that signals salient, high-quality, uncertainty-reducing information in a given cultural context, and as such possesses the “power” to guide attention, enhance credibility, and prescribe action (e.g., biomedicine and neuroscience possess high epistemic authority in current culture; the Guardian newspaper possesses high epistemic authority for liberals, as does Fox News for conservatives).

  • Epistemic foraging: The agent's uncertainty-resolving behaviour. Epistemic foraging disambiguates Bayesian beliefs about a situation in order to be better poised to exploit the pragmatic value of action (i.e., value that relates to the sensory preference of the agent).

  • Epistemic resources (also known as cultural affordances): Cues that are encoded in external states of the ecological niche (e.g., material cues and other agents), which guide epistemic foraging and implicit learning of patterned cultural practices.

  • Expectations: Bayesian beliefs and preferences about external states of the world, which are operationalized as probability distributions.

  • Free-energy principle (FEP): A principle of least action derived from information theory. The free-energy principle states the minimal conditions that systems must meet if they are able to endure in a bounded set of states (i.e., if they are endowed with a phenotype).

  • Generative model: A probability distribution or mapping from beliefs about hidden causes to observed consequences (i.e., sensations). Technically, this is the joint probability of a sensory state and a (hidden) state of the world. Under the FEP, the generative model defines free-energy gradients (a function of sensations and predictions under the generative model) and subsequent perception and action.

  • Natural affordance: Affordances that agents can engage by leveraging their innate phenotypical endowments.

  • Niche construction: The process whereby organisms (implicitly and explicitly) modify their ecological niches, such that the states of the environment come to encode relevant aspects of their prior beliefs, which they can leverage “downstream” to optimize their adaptive behaviour and act in contextually appropriate ways. The “Janus face” of active inference.

  • Pragmatic affordance: One of the two components of expected free energy in policy selection. Pragmatic affordance is essentially equivalent to expected utility in economics and quantifies the extent to which an action policy conforms to the prior preferences of the agent (also known as pragmatic or instrumental value).

  • Regimes of attention: Patterned cultural practices whereby members of a group of people acquire and maintain shared expectations that modulate attention, structure salience, and thereby guide action (Fig. 2), as well as the internalized patterns of attention that result from the repeated engagement with such practices (e.g., as a group-specific affordance, it takes a regime of attention for the colour white to signify mourning for Hindus; it also takes a species-wide regime of attention for humans to feel invited by a path in the woods that signals the trace of other humans’ intentions).

  • Salience: Expected information gain under a given action.

  • Surprise: Also known as surprisal or self-information in information theory. This is simply the negative log probability of some state or event.

  • Thinking through other minds (TTOM): The domain of beliefs about statistical regularities (i.e., Bayesian prior beliefs) that are exploited in learning cultural affordances. This domain is primarily situated in the realm of expectations that humans learn to form about other people in the niche – that is, in the realm of folk psychology. TTOM is also the process of engaging others’ expectations and inferences by leveraging this domain.
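Several glossary entries (surprise, variational free energy, and the epistemic and pragmatic components of expected free energy) have compact mathematical forms. The sketch below writes them out in one common notation from the active inference literature; it is not taken from the present article, and the symbols are assumptions introduced here for illustration: $o$ for sensory outcomes, $s$ for hidden states, $\pi$ for an action policy, $q$ for the agent's approximate posterior beliefs, and $p$ for its generative model, with $p(o)$ in the last line standing for prior preferences over outcomes.

```latex
% Surprise (surprisal, self-information) of an outcome o
\mathcal{S}(o) = -\ln p(o)

% Variational free energy: an upper bound on surprise
F[q, o] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
        = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
        \;\ge\; -\ln p(o)

% Expected free energy of a policy \pi, split into its two components
G(\pi) = -\underbrace{\mathbb{E}_{q(o, s \mid \pi)}\big[\ln q(s \mid o, \pi) - \ln q(s \mid \pi)\big]}_{\text{epistemic value (expected information gain, salience)}}
         \;-\; \underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o)\big]}_{\text{pragmatic value (expected log preference)}}
```

Read this way, a policy has low expected free energy when it is expected either to resolve uncertainty about hidden states (epistemic affordance) or to yield preferred outcomes (pragmatic affordance), or both.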

On the one hand, such simulations would allow researchers to generate hypotheses about mind reading and cultural learning that may be tested with other empirical methods. On the other hand, FEP simulations can be employed to replicate in vivo experiments (e.g., Kiebel & Friston 2011; Schwartenbeck & Friston 2016). One can then use the model to explore the dynamic consequences of changes in parameters associated with the causal factors that led to the generation of the experimental outcomes that were studied empirically. With this method, one might also identify potential contributors to pathological and healthy responses to the task by manipulating the parameters and generating new simulated psychophysical, neural, and behavioural measurements based on the model that has been fitted with in vivo data (e.g., Cullen et al. 2018).
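As a purely illustrative sketch of what "manipulating the parameters" of such a model could look like, the toy simulation below varies a single parameter, the precision (inverse variance) assigned to sensory evidence, and shows how the simulated belief trajectory changes. It assumes a one-dimensional Gaussian belief and Gaussian observations; the function names, parameter values, and the "true" hidden value of 2.0 are hypothetical choices made for this example and are not drawn from the cited studies.

```python
import numpy as np


def precision_weighted_update(prior_mean, prior_precision, observation, sensory_precision):
    """Combine a Gaussian prior belief with one Gaussian observation.

    Each source is weighted by its precision (inverse variance), which is
    how the glossary characterises attentional selection.
    """
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * observation
    ) / posterior_precision
    return posterior_mean, posterior_precision


def simulate(observations, sensory_precision, prior_mean=0.0, prior_precision=1.0):
    """Return the sequence of posterior means after each observation."""
    mean, precision = prior_mean, prior_precision
    trajectory = []
    for observation in observations:
        mean, precision = precision_weighted_update(
            mean, precision, observation, sensory_precision
        )
        trajectory.append(mean)
    return np.array(trajectory)


# Hypothetical data: noisy samples around a hidden value of 2.0.
rng = np.random.default_rng(0)
observations = 2.0 + rng.normal(0.0, 1.0, size=20)

# Sweep the parameter of interest and compare the simulated "behaviour".
for sensory_precision in (0.1, 1.0, 10.0):
    trajectory = simulate(observations, sensory_precision)
    print(f"sensory precision {sensory_precision:>4}: "
          f"belief after 1 sample = {trajectory[0]:.2f}, "
          f"after 20 samples = {trajectory[-1]:.2f}")
```

With low sensory precision the simulated agent stays close to its prior for longer; with high precision it moves towards the evidence almost immediately. In a fitted model, a sweep of this kind is one way to ask which parameter settings reproduce typical or atypical response patterns.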

1.3.3. Outline of the argument

Section 2 of this article introduces the notions of expectations and cultural affordances. We describe shared attention and evolved attentional biases as crucial mechanisms for engaging with and stabilizing sociocultural niches. We describe the selective patterning of salience and attention as the main process behind enculturation, which in turn enables the engagement of human agents with the sets of possible actions (or cultural affordances) that make up their local world (Ramstead et al. 2016).

Section 3 presents our solution to the puzzle of implicit cultural learning. Human beings acquire the shared habits, norms, and expectations that constitute their culture through their immersive engagement within specific cultural practices, which we call regimes of attention (Veissière 2016). Regimes of attention mark off certain contextually adequate actions as especially salient and help agents learn to respond to the norms and resources of their local cultural niche. The most important of these resources are the epistemic resources that indicate salient information deemed relevant and reliable (Bertolotti & Magnani 2017; Clark 2006; Pinker 2003; Whiten & Erdal 2012).

Through the notion of epistemic authority, we show that humans are typically biased towards the source rather than the content of information (Mercier & Sperber 2017). As amply documented in the literature on so-called cognitive errors (Kahneman 2011), this tendency can also direct humans towards low-quality, but otherwise high-fidelity, information, particularly when it can be intuitively associated with social proof and other mechanisms of social influence (Cialdini & Goldstein 2004). We identify the prestige bias in particular (Henrich & Gil-White 2001) as a central attentional mechanism in the mediation of salience for humans.

The notion of salience understood as expected information gain is a central theme of the FEP (Friston et al. 2016; Kaplan & Friston 2018; Parr & Friston 2017a; 2017b). Recent FEP-based models of cognition in context cast niche construction behaviour as the process whereby organisms “outsource” the computation of salience to statistical structures of the physical environment. The environmental niche then registers information about salience (what an organism trusts or preferentially attends to because it will lead to information gain).

This information corresponds to epistemic resources of the niche (Bruineberg et al. 2018b; Constant et al. 2018a; 2018b). Niche construction allows the scaffolding of complex networks of shared expectations encoded across brains, bodies, constructed environments, and other agents, which modulate attention, guide action, and entail the learning of patterned behaviours. Human niches are fundamentally social and cultural – built and constituted by interactions with other people. In the general human niche or any local sub-niche, behaviour is to a large extent culturally patterned. Hence, in addition to (and, as we will argue, often prior to) observable statistical regularities in external states of the world, human behaviour is patterned through expectations about what other people also expect of the world. It is this domain of expectations about salience and the process of leveraging these expectations that we call “thinking through other minds” (TTOM).
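To make the notion of salience as expected information gain concrete, the following minimal sketch computes it for a discrete toy case as the mutual information between a hidden state and the outcome of attending to a cue. The two hidden states, the two cues, and their reliabilities are hypothetical values chosen for illustration only; nothing here is taken from the FEP models cited above.

```python
import numpy as np


def expected_information_gain(prior, likelihood):
    """Expected information gain (in nats) from observing an outcome.

    prior:      array of shape (n_states,), the belief p(s)
    likelihood: array of shape (n_outcomes, n_states), p(o | s),
                with each column summing to 1

    The value returned is the mutual information between hidden state and
    outcome, i.e. the expected KL divergence from prior to posterior.
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = likelihood * prior                 # p(o, s)
    p_outcome = joint.sum(axis=1, keepdims=True)  # p(o)
    independent = p_outcome * prior            # p(o) p(s)
    mask = joint > 0                           # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log(joint[mask] / independent[mask])))


# Hypothetical scenario: two equally likely hidden states of the world.
prior = np.array([0.5, 0.5])

# A cue that reports the true state 90% of the time (assumed reliability).
reliable_cue = np.array([[0.9, 0.1],
                         [0.1, 0.9]])

# A cue whose outcomes do not depend on the hidden state at all.
uninformative_cue = np.array([[0.5, 0.5],
                              [0.5, 0.5]])

for name, cue in [("reliable", reliable_cue), ("uninformative", uninformative_cue)]:
    print(f"{name} cue: expected information gain = "
          f"{expected_information_gain(prior, cue):.3f} nats")
```

On this reading, the reliable cue is the more salient of the two because attending to it is expected to reduce more uncertainty; an agent that has learned which cues (or which people) are reliable can let that learned reliability guide where it looks.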

The processes that make up TTOM extend from the conventionalized, normative behaviour of encultured individual agents (e.g., stopping at a red traffic light), which only in some cases requires making inferences about agents, to cases that require bona fide inferences about others’ mental states for proper (i.e., situationally appropriate) modes of engagement.

Section 4 of this article shows how TTOM integrates standard TOM approaches to tackle the cultural, embodiment, and cooperative critiques. TTOM argues for a compromise position between internalist, brain-based approaches (e.g., simulation and theory-theory theories), which emphasize the neural machinery in individual human brains that is necessary to read other minds, and externalist approaches (e.g., radical enactive and cultural evolutionary theory). Indeed, one of the main motivations for the FEP is to capture the two-way traffic between the organism and the world, to emphasize both the enactment of shared cultural expectations and norms, and the brain-based cognitive abilities that make such an enactment possible, adaptive, and situationally appropriate. Under the FEP, there is no justification for any strict distinction between dynamics (as emphasized by externalists) and inference (the focus of internalist models).

The conclusion discusses the implications of this model for future research on enculturation and the cultural shaping of cognition in health and illness.

2. Expectations and cultural affordances

In this section, we show that human agents learn most of their expectations through the selective patterning of attention, based on immersive participation in cultural practices. At the outset, we should define what we mean by “expectations” (see Footnote 2). We use the term to describe a rich repertoire or spectrum of priors or beliefs that reflect action readiness, which ranges from the fully automatic to the effortfully deliberate. Our concept of expectation describes the patterns of action readiness that modulate and direct the adaptive action of agents; it is therefore very broad in its applicability and ranges from the implicit, embodied expectations that we enact continuously, often without noticing, to the more consciously held, effortful, psychologically contentful expectations that characterize encultured human consciousness.

2.1. The concept of expectation

On the more automatic end of the spectrum, we can speak of expectations when one's stomach prepares a digestive response upon expecting that food is coming from mastication, or when one's hand and arm prepare an adequate muscle response to lift a half-full glass of wine. Each of these processes reflects different kinds or levels of prior engagement of the world, across different timescales, which include evolutionarily old dispositions common to all vertebrates that have been exapted for new uses, as well as distinctive developmental experiences, and learning histories. Together, these elicit physiological, bodily, and emotional orientations towards the possibilities for action available in a specific context. Immersion in cultural contexts, moreover, will structure such low-level expectations through participation in patterned cultural practices (e.g., contextually patterned modes of affect associated with specific kinds of food and drink and ritual contexts of consumption).

Human expectations, thus, are always scaffolded through “levels” (or scales) of evolutionary and developmentally inscribed prior dispositions that come to be modulated by higher-level symbolic conventions (Kirmayer & Ramstead 2017). The intuitive distrust of other people symbolically marked as belonging to an out-group, for example, has been shown to recruit evolutionarily old disgust responses (Phillips et al. 1997; Rozin et al. 2009; Tybur et al. 2013). This involves another level of implicit “expectations” in which evolutionarily old threat and poison-detection dispositions are activated by (differently implicit) symbolic conventions or affordances.

At the other end of the spectrum, many of the expectations that guide behaviour are explicitly taught, effortfully learned, and can be reflected upon (e.g., “sit up straight,” “do not fidget in class”). Such expectations, however, are also more difficult to learn, and least likely to become fully patterned. Indeed, one may sit badly most of the time, fidget in class despite one's embarrassment, and face disappointment when one's daughter chooses to become an engineer. Later developing forms of explicit inference require abstract thought, formal instruction, and perhaps deliberation to learn; but once the agent is properly enculturated, new practices usually can be figured out without the direct presence or instruction of other agents. The learner learns the meta-cognitive strategy of how to access, offload, and work with conventional forms of presented cultural knowledge (Heyes 2018b). This process, however, will generally entail different modes of indirect social learning, for example, from instructional codes devised by others (such as learning a cooking skill from a written guide or YouTube video).

Examining these processes of acquiring conventional or normative behaviours, social scientists have pointed to the important difference between dogma (official doctrine) and doxa (common belief; Bourdieu 1977). The explicit rules and conventions established in dogma (what people know they must do) and reported in everyday speech are poor indicators of the regularities of a culture – and how humans learn cultural behaviour in general. Doxa, in Pierre Bourdieu's famous formulation, refers to all that is taken for granted in any given context or society. For example, in his “dramaturgical” account of social life, sociologist Erving Goffman (2009) describes the gradients of effort and explicit performance required in the obedience to and enactment of social conventions in everyday life. Goffman notes that in some spaces (such as the home), which are symbolically marked as the “backstage,” people tend to relax their effortful behaviour and ignore or disobey many social rules; they trade off the dogma for the doxa. Nevertheless, their behaviours necessarily draw from the culturally shaped repertoire of normative and conventional forms.

What interests us here is how the doxa of backstage behaviour (indeed most of solitary cognition) is itself already culturally patterned, despite the immediate absence of others’ enforcing gaze (and the foregrounding of inferences we make about what others know and expect in context). A first hint is the fact that human agents are constantly (deliberately or automatically) adjusting what they are doing to what relevant others (e.g., role models or anti-role models, specific or generalized) expect, and expect them to expect, and so on. Much of this is accomplished implicitly (Tomasello et al. 2005), usually through nonverbal communication with gesture, facial expression, posture, and pantomime, but also through language when necessary. Evidence that this kind of expectation does not depend on language comes from the observation that infants as young as 15 months are able to make implicit inferences about others’ mental states (Onishi & Baillargeon 2005) and actions well before they can formulate explicit statements to this effect (Michael et al. 2014).

2.2. The concept of affordance

In Gibson's ecological approach to perception (Gibson 1979), things and features of the world are said to afford possibilities for engagement (Chemero 2009; van Dijk & Rietveld 2017). An affordance is a relation between an agent's abilities and the physical states of its environment. For example, water affords drinking, cups afford drinking out of, bridges afford crossing, axes afford cutting, handles afford holding, and so on. Affordances are defined in terms of physical properties of the thing in the world (e.g., being graspable, being able to support the weight of a person) and in terms of the abilities or expectations of the agent (e.g., knowing how to sit straight). Abilities can be described in terms of the spectrum of expectations with which the agent is endowed (Gibson 1979; Pezzulo & Cisek 2016; Rietveld & Kiverstein 2014; Tschacher & Haken 2007). It takes an agent with a mouth, throat, stomach, and so on (to drink); and hands and opposable thumbs (to grasp a cup); and a certain set of skills (hand-eye coordination, for example) to be able to “discover” the relationship of water and cups to the action of drinking.

The relation of affordances to the notion of expectations is a recent extension of the ecological approach that explains perception as conditioned on the beliefs of the agent (Bruineberg & Rietveld 2014; Chemero 2009). Hence, affordances are not simply static features of the environment, independent of the presence and engagement of an agent, nor are they states of the cognitive agent alone. Affordances are “invariant variables” or structures of relatedness (Gibson 1979, p. 134). In the case of sensorimotor affordances, for example, they are invariant, in that they are grounded in the physics and geometry of the agent's interaction with the environment, which results in relationships that are highly reliable and stable across time and are ready to be perceived or (re)discovered by the agent; and they are variable, in that they are specified dynamically by the sensorimotor and other cognitive abilities of the agent. In the case of affective affordances and expectations, the stability may reside in the neurobiology of organisms’ learning and memory systems coupled with the persistence of the environmental cues to which particular patterns of recollection and enactment have become linked. The relational space of possibilities between agents and their environments constitutes an ecological niche. Agents and their environments are modified, and become attuned to each other, as the result of their history of co-adaptive interactions (Bruineberg & Rietveld 2014; Gibson 1979).

These examples are congruent with work on the evolution and cultural learning of tool use (Stout & Chaminade 2007; Stout et al. 2008), which illustrates the need for humans to learn to hierarchically structure actions with long-term consequences. “Hierarchical” here means that actions are nested within one another, and that complex behaviours require planning a whole chain of nested actions, not just the immediate optimization of current actions or a simple sequence. This kind of executive control of behaviours is characteristic of enculturation, in which complex sequences of action are built out of iterative structures of simpler components strung together in ways that reflect the results of collective experiences of trial and error. An individual is therefore able to borrow from and integrate the experimentation and learning of others in the cultural group.

Direct or “natural” affordances in the humanly constructed (“anthropogenic”) environment can be supplemented, modified, or supplanted by “conventional” affordances (Ramstead et al. 2016), which depend on shared cultural conventions, based on skills learned through immersive social practices. Hence, bodies of water (“naturally”) afford drowning for all humans, and swimming for those with the acquired skills that allow them access to that specific cultural affordance. Mastering swimming, like all cultural affordances and most of what humans do and think, requires immersive participation (Hutto 2012; Roepstorff et al. 2010), which includes imitation, practice, repetition, and a grasp of norms and conventions. Hence, affordances are contextually sensitive. For example, for the right kind of agent, a formal suit and tie might function as a cue that indicates authority and affords deference; but when additional cues are added (e.g., a napkin draped over the forearm and a silver tray with glasses), the affordances will change for agents whose enculturation enables them to respond appropriately to the cues.

2.3. Learning cultural affordances

How are the affordances of the niche learned? What does it mean to learn to recognize and engage a specific field of affordances? This is a puzzle, because affordance theory tends to collapse basic categories of learning like “knowing how” and “knowing that.” For example, there is no necessary precedence of the knowing “that” a cup is for drinking over the knowing of “how” to drink from a cup, and vice versa. Even in domains where knowing “that” seems to precede knowing “how,” such a distinction does not hold, because knowing “that” is leveraged as a skill interiorized and integrated into normal implicit motor practice – for example, architectural design (Rietveld & Brouwers 2017) and mathematical thinking (Menary 2010). Put simply, knowing “that” is only knowing “that” when it becomes know “how,” and acquiring know “how” requires interiorizing and embodying know “that.” This circularity can be understood through a process of scaffolding that occurs on multiple temporal scales associated with the cultural co-evolution of particular niches, communities, or traditions; the developmental trajectory of individuals; and the process of learning to engage with new social contexts.

What, then, are the underpinnings of scaffolding? Some anthropologists, like Tim Ingold, have argued that human niches comprise affordances that can be figured out, rediscovered, or rebuilt by human individuals in each generation without the “transmission” of a purportedly separate realm of “cultural representations” (Ingold 2001). Critics of Ingold (e.g., Howes 2011) have pointed out that most of what humans learn over their life spans in order to become proficient at functioning in their local worlds is learned socially – that is to say, learned primarily from other humans, and not just from what things or situations themselves afford. However, Ingold maintains that many aspects of human life are simply emulated (Hamilton 2008), “shown,” or “pointed to,” and left to be explored, “figured out,” and experimented with by individual learners (for example, in play).

The main role of others in this kind of social learning is to direct attention rather than to convey specific semantic content (Tomasello 2014). In effect, social learning involves immersion in local contexts through what we call regimes of attention and imitation that direct human agents to engage differentially in forms of shared intentionality. We have argued that such regimes of attention play a central role in the enculturation of human agents (Ramstead et al. 2016). Indeed, human beings seem particularly specialized for such forms of social learning (Sterelny 2012).

Humans mostly learn deictically (in context) and pragmatically by participating in cultural practices and by being immersed in the ways of doing things that characterize a given local culture. Some of this involves following the “tracks” laid down in local environments by others, or following the norms and rules presented through institutions, without engaging with others’ interiority. But many convention-dependent forms of learning require inferences based on prior knowledge about how we expect others to think and behave in specific settings (e.g., adjusting to culturally specific turn-taking rituals in public space; Ramstead et al. 2016).

The process of learning how to engage cultural affordances to think through other minds likely begins in infancy when we seek or accept guidance from our caregivers, and it further develops through exposure to social hierarchies of prestige, themselves embodied in kinds of high-status agents that can be leveraged as models (Feinman 1982), such as knowledgeable or skilful in-group members, educators, community and religious leaders, celebrities, and imaginative reconstructions of folk or historical personages with high epistemic prestige (e.g., “What would Wittgenstein think of this theory?”). Individual action, in turn, is guided by what agents expect relevant agents to expect of them (“What would mother expect me to do?”).

Others in our social world present us with cultural affordances, as well as solicitations, for action. Engagement with these realizes a specific social niche, context, group, or community. The reliance on social and cultural affordances co-constructed with and maintained by other people makes it important for us to distinguish between those who think like us and those whose thinking is either systematically different from our own or else unfamiliar and, hence, unpredictable – and inherently surprising. This distinction marks off domains of in-group and out-group, with corresponding epistemic authority. Regimes of attention then make the right kinds of social solicitations stand out in context, thereby allowing the learning of socially relevant affordances in a given cultural niche, community, or local world.

2.4. The phylogeny and ontogeny of cultural affordances

In human ontogeny, it is likely that affordances are first learned implicitly, automatically, and with little conscious effort, through imitation, repetition, and rewards. Phylogenetically, the human mind evolved to support a series of adaptive “content biases” (Henrich 2015) for features of the world that possess high intrinsic learnability, and feed-forward potential through teachability and memorability. Fire, edible foods, and simple tools, for example, all have been amply documented as possessing these heuristic properties (Henrich 2015). In the realm of more conventional affordances, compared with other primates, humans are also unusually adept at tracking other agents’ social status and shifts in symbolically assigned prestige through gossip (Dunbar 2004; Henrich & Gil-White 2001).

Status among social animals generally provides a guide for whom to follow and obey, and from whom or what to learn. As cultural evolutionists have pointed out (Henrich & Gil-White 2001; Mercier & Sperber 2017), social status among humans serves a primarily epistemic function. One seeks guides for thought, behaviour, and affect in agents who embody sources of relevant cultural information that are deemed to be of high quality in relevant social contexts (e.g., we learn from professors in the classroom and seek help from good students, or we seek to publish in high-impact journals). Among humans, symbolically conferred prestige has largely replaced sheer physical dominance as a way to find, acquire, and signal status (Henrich 2015). In social context, marks of distinction (Bourdieu 1984) such as styles of dress, forms of speech, and other techniques of the body provide a shortcut that signals an agent's status on the various prestige scales deemed relevant. Gossip, in turn, serves the more fine-grained communicative function of keeping track of an agent's conferred prestige and epistemic status.

The aforementioned mechanisms rely on evolved cognitive biases for cultural transmission that have been hypothesised to serve an information-tracking function (Henrich 2015) – that is, as enabling humans to outsource their decision making to other agents, through patterned interactions with them and the shared places in which they dwell. The physical structure of the environment – including artefacts, practices, and other socially constructed aspects of the ecological niche – embodies or encodes adaptive, context-relevant cultural information endowed with salience (i.e., as high-quality or “useful” sources of information in context). A dramatic illustration of this is provided by the infamous Milgram experiments (Milgram 1963), which demonstrated the extent to which human agents are ready to outsource their actions to those who symbolically display the right credentials and wield epistemic authority.

Social status serves the epistemic function of locating the person in a locally relevant hierarchy – a process that can also be described in terms of affordances as prestigious agents solicit imitation through such perceived qualities as trustworthiness (Mercier & Sperber 2017) and credibility (Henrich 2015). How well or badly agents respond to such affordances – as indexed through gossip (e.g., circulating stories about cheating spouses, embezzling chiefs, or free-riding subordinates) – thus will largely determine the levels of trust that they inspire in others. Furthermore, the hierarchy that locates the person is not only material, but also symbolic, as expressed through historically acquired and socially displayed marks of distinction. This poses a challenge to an account of affordances in terms of immediately present features.

Humans are accustomed to attending to certain people, in certain places for tones of voice, facial expressions, shifts in body posture, and so on, which signal approbation, disapproval, or moral concern and hence convey (in context) normative information (Ignatow Reference Ignatow2009; Williams Reference Williams2011). As we have seen, beyond what they naturally afford, human material environments have additional, symbolically inscribed normative and deontic powers that deeply permeate the way that individuals affectively approach and engage with their niches (Kaufmann & Clément Reference Kaufmann and Clément2014). For example, in the European Middle Ages, children may have been socialized to fear forests as dark and dangerous spaces full of beasts, witches, and evil spirits through folktales and bedtime stories. In contrast, in many hunter-gatherer cultures, like the Aka of Central Africa, children are equipped with cultural knowledge to expect the forest to offer a safe, nurturing space (Hewlett Reference Hewlett1994; Reference Hewlett2017).

The physical environments occupied by various human groups and sub-groups also characterize group-specific affordances (e.g., a neighbourhood or a city; Einarsson & Ziemke Reference Einarsson and Ziemke2017). Consider how a space (e.g., a university or museum) that is symbolically marked with group-general standards of prestige – a space, thus, that has been historically inaccessible to low-status individuals – will afford radically different experiences to high- and low-status individuals depending on how their respective sub-group is valorized in their macro-cultural niche. Pierre Bourdieu's concept of habitus (as the internalization of social norms in techniques of the body) is one way of approaching the varying effects of a sociocultural niche on individuals with different status or position. To expand on Bourdieu's (Reference Bourdieu1977) reflections on the effects of cultural capital on habitus, we note that a similar space can be marked as “welcoming” for some, but as “intimidating” or outright “hostile” to others (e.g., for minority groups). This reflects a related, orthogonal distinction between the familiar (predictable) versus the unfamiliar (unpredictable). From a cultural affordances perspective, being socially marked and positioned at a particular place in a cultural niche enables automatic responses in one's patterns of movement, posture, breathing, and gaze, as well as in neurobiological responses, such as fluctuations in cortisol (Bijleveld et al. Reference Bijleveld, Scheepers and Ellemers2012), oxytocin (Hrdy Reference Hrdy2011; Luo et al. Reference Luo, Li, Ma, Zhang, Rao and Han2015), or testosterone (Cheng et al. Reference Cheng, Tracy, Foulsham, Kingstone and Henrich2013).

The co-existence of habitus or internal physiological dispositions with external features of an adaptive niche points to a crucial feature of affordance theory – namely, that the affordances of the environment and the capacities of an individual are inextricably interwoven, and co-determining. However, developmentally, and in shared social contexts, culture precedes individual action and experience. In a sense, culture confers on the environment latent affordances such that, if one learns the right repertoire of skills (including attentional strategies) from one's forebears (by acquiring specific cultural knowledge and practices), one can “read” the environment in new ways, thereby discovering “new” affordances (that were, in a sense, there all along, insofar as they engaged other or prior skilled actors). Moreover, because one of the functions of cultural affordances is to allow improvisation (and hence the creation of new cultural forms), the affordances of a niche that are being actively engaged are always in the process of discovery, elaboration, and extension. Clarifying the temporal move from group or cooperative affordances to individual ones (and back) is part of explaining developmental enculturation, skill acquisition, and culture production.

So far, we have described regimes of attention and symbolic layering as cultural affordances of the conventional and normative variety. Over the course of human ontogeny, this “conventional” domain of culture eventually becomes superordinate to the natural domain. Past a certain developmental stage, language can be used to install superordinate frames through which subsequent affordances are perceived and engaged (cf. Bengio Reference Bengio, Kowaliw, Bredeche and Doursat2014). This linguistic capacity to leverage affordances can include cooperative behaviours that reflect social norms and cultural forms of life. The statistical regularities exploited in learning cultural affordances, thus, are primarily situated in the realm of expectations that humans learn to form about other people in the niche – that is, in the realm of folk psychology. We call this intersubjective process of engaging others’ expectations and inferences “thinking through other minds.” In the next section, drawing on the FEP, we turn to the question of how cultural affordances can be acquired and maintained to coordinate large cultural groups, through selective patterns of attention and learning.

3. TTOM: Learning cultural affordances under the free-energy principle

3.1. The free-energy principle as applied to individual cognition

To explain cultural affordances and implicit cultural learning, we draw on the variational free-energy principle. The FEP is a mathematical statement of the fact that living systems act to limit the repertoire of physiological (interoceptive) and perceptual (exteroceptive) states in which they can find themselves (Friston 2013; Friston et al. 2006; see Box 1). Although even simple organisms have autoregulatory mechanisms to restrict themselves to a limited number of sensory states (compatible with their survival), humans additionally accomplish this feat by leveraging cognitive functions and socioculturally installed behaviour. For example, if core body temperature drops from its usual 37 degrees Celsius, internal processes of shivering are automatically evoked, and externally oriented actions are initiated to move the agent towards a heat source or to put on a jacket or parka.

This requires the agent to learn about the structure of its environment, which, from the point of view of the brain, is no small matter, because the (skull-bound) brain is secluded from the causal regularities in the environment it seeks to learn (Hohwy 2013).

The brain only has direct access to the way its sensory states fluctuate (i.e., sensory input), and not the causes of those inputs, which it must learn to guide adaptive action (Clark 2013a) – where “adaptive” action solicits familiar, unsurprising (interoceptive and exteroceptive) sensations from the world. The brain overcomes this problematic seclusion by matching the statistical organization of its states to the statistical structure of causal regularities in the world. To do so, the brain needs to re-shape itself, self-organizing so as to expect, and be ready to respond with effective action to, patterned changes in its sensory states that correspond to adaptively relevant changes “out there” in the world (Bruineberg & Rietveld 2014). Because action selection and response conform to such expectations, behaviour can effectively maintain the agent within expected states.

The FEP describes this complex adaptive learning process in terms of variational inference (also called approximate Bayesian inference). Briefly, the idea is that the agent learns a statistical model of sensory causes in the world, called a generative model. This model represents the agent's relation to the environment and enables it to predict how sensory inputs are generated, by modelling their causes (including, crucially, the actions of the agent itself).

The generative model underwrites the agent's perception and action as they unfold over time. The parameters of the generative model encode the beliefs of the agent about its relation to the environment (e.g., when I move my finger to flip the switch, the light goes off). This is realized by neural network dynamics that change over short timescales (reflecting external states of the world) and slower changes in network connectivity that encode parameters that change over longer timescales to reflect the contingencies that underlie the agent's representations of the transitions among the states of the world (e.g., the probability of my finger moving the switch to change its state from “down/off” to “up/on”; Kiebel et al. Reference Kiebel, Daunizeau and Friston2008).

The generative model functions as a point of reference in a cyclical (action-perception) process that allows the organism to engage in active inference. Internal states of the agent (e.g., the states of its brain) encode a recognition density – that is, a probability distribution or Bayesian belief about the current state of affairs and contingencies causing sensory input. This (posterior) belief is encoded by neuronal activity, synaptic efficacy, and connection strength (Friston 2010). The mathematical formulation behind the FEP claims that all of these internal brain states change in a way to minimize variational free energy. By construction, the variational free energy is always greater than or equal to a quantity known in information theory as surprisal, self-information, or, more simply, surprise. This means that minimizing free energy minimizes an upper bound on surprise, where surprise can be quantified as the negative logarithm of the probability that “a creature like me” would sample “these sensations.”

Crucially, in minimizing free energy, the posterior beliefs encoded by neuronal quantities approximate the true posterior density over the causes of sensations (see Fig. 1 for details). Intuitively, the variational principle of least free energy is just a description of systems (like you and me) that seek out expected sensations. An equivalent and complementary interpretation follows from the fact that surprise is the negative logarithm of Bayesian model evidence in statistics. This means that we can understand active inference as gathering sensory evidence for an agent's model of its world – sometimes referred to as self-evidencing.
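The relationship between free energy, surprise, and the true posterior can be checked numerically on a toy problem. The following is a minimal sketch, assuming a discrete two-state generative model; the states ("fox" versus "wind"), the outcomes, and all probabilities are hypothetical values chosen for illustration and do not come from the article or its Appendix.

```python
import math

def free_energy(q, prior, likelihood, obs):
    """Variational free energy F = E_q[ln q(s) - ln p(o, s)] for a discrete model."""
    return sum(
        q_s * (math.log(q_s) - math.log(prior[s] * likelihood[s][obs]))
        for s, q_s in enumerate(q)
        if q_s > 0.0
    )

def surprise(prior, likelihood, obs):
    """Surprisal -ln p(o), where p(o) is the model evidence."""
    evidence = sum(p * likelihood[s][obs] for s, p in enumerate(prior))
    return -math.log(evidence)

def true_posterior(prior, likelihood, obs):
    """Exact Bayesian posterior p(s | o) over hidden causes."""
    joint = [p * likelihood[s][obs] for s, p in enumerate(prior)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Hypothetical model: hidden cause is "fox" (0) or "wind" (1);
# the outcome is "rustle" (0) or "silence" (1).
prior = [0.2, 0.8]
likelihood = [[0.9, 0.1],   # p(outcome | fox)
              [0.3, 0.7]]   # p(outcome | wind)
obs = 0                     # a rustle is heard

s_val = surprise(prior, likelihood, obs)
f_guess = free_energy([0.5, 0.5], prior, likelihood, obs)
f_best = free_energy(true_posterior(prior, likelihood, obs), prior, likelihood, obs)
```

In this sketch, an arbitrary belief yields a free energy above the surprise, and the gap closes only when the belief equals the exact posterior, which is the sense in which minimizing free energy makes the recognition density approximate the true posterior.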

Figure 1. Self-evidencing and the Bayesian brain. Upper panel: Schematic of the quantities that define an agent and its coupling to the world. These quantities include the internal states of the agent (e.g., a brain) and quantities describing exchange with the world – namely, sensory input and action that changes the way the environment is sampled. The environment is described by equations of motion that specify the dynamics of (hidden) states of the world. Internal states and action both change to minimize free energy or self-information, which is a function of sensory input and a probabilistic belief encoded by the internal states. Lower panel: Alternative expressions for free energy illustrating what its minimization entails. For action, free energy (i.e., self-information) can only be suppressed by increasing the accuracy of sensory data (i.e., selectively sampling data that are predicted). Conversely, optimizing internal states makes the representation an approximate conditional density on the causes of sensory input (by minimizing a Kullback-Leibler divergence between the approximate and true posterior density). This optimization makes the free-energy bound on self-information tighter and enables action to avoid surprising sensations (because the divergence can never be less than zero). When selecting actions that minimize the expected free energy, the expected divergence becomes (negative) epistemic value or salience, whereas the expected surprise becomes (negative) extrinsic value – namely, the expected likelihood that prior preferences will be realized following an action. See the Appendix for a technical explanation – and description of the variables in this figure.

Put another way, this can take the form of seeking expected sensations associated with novelty or danger (e.g., thrill seeking) or, in more maladaptive cases (e.g., depression), of “confirming” the negative valence of one's world through rumination (Badcock et al. Reference Badcock, Davey, Whittle, Allen and Friston2017). As we discuss in section 3.3, accounting for novelty seeking in free-energy minimization is an important contribution of the model. On the face of it, humans seem to find a certain kind of surprise desirable. To understand this mathematically, it is useful to appreciate that expected surprise (i.e., expected free energy) is uncertainty (i.e., entropy). This means that certain acts such as “attending to this” or “looking over there” become attractive if they afford the opportunity to reduce uncertainty. Think of the game of “peek-a-boo” played with infants as a case in point, in which the infant (as learned through repeated practice) attends earnestly in pleasurable anticipation of resolving uncertainty about where her mother will reveal herself. Generally speaking, epistemic affordance of this sort has a positive valence because it entails a reduction of uncertainty, both about states of affairs in the world and “what will happen if I do that.”
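The identification of expected surprise with entropy can be made concrete with a small calculation. This is a minimal sketch, assuming a discrete belief over three hypothetical locations at which a hidden caregiver might reappear; the numbers are illustrative only.

```python
import math

def surprisal(p):
    """Surprise (self-information) of an outcome with probability p, in nats."""
    return -math.log(p)

def entropy(dist):
    """Expected surprisal under dist, i.e., Shannon entropy in nats."""
    return sum(p * surprisal(p) for p in dist if p > 0.0)

# Hypothetical belief before the reveal: left, middle, or right are equally likely.
belief_before = [1 / 3, 1 / 3, 1 / 3]
# After the reveal, the location is known with certainty.
belief_after = [1.0, 0.0, 0.0]

uncertainty_resolved = entropy(belief_before) - entropy(belief_after)
```

Here the act of looking resolves all of the prior uncertainty (ln 3 nats), which is one simple way to read the claim that such acts carry epistemic affordance.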

In summary, the FEP – as applied to individual cognition – describes the process by which an agent updates its (Bayesian) beliefs, encoded by brain states, to optimize a generative (in the sense that it makes predictions) model of the world. When these beliefs are realized by action upon the world, this process is known as active inference (Friston Reference Friston, Tschacher and Bergomi2011; Friston et al. Reference Friston, FitzGerald, Rigoli, Schwartenbeck and Pezzulo2017a). Active inference involves the coordination of sensorimotor patterns (1) by selectively sampling sensations that minimize expected surprise (i.e., by actions that include orientation, attention, and exploration) and (2) by updating expectations about the most probable causes of sensory inputs (i.e., perception). Perception entails optimizing beliefs about states of the world and learning the parameters of generative models, via Hebbian processes of associative learning (Friston Reference Friston2010).

3.2. Attention and learning

Not all kinds of sensory inputs are equal in their significance or reliability, and therefore, they need to be differentially weighted when updating beliefs via free-energy minimization. For example, interoceptive signals might merely be tracking physiological noise (Feldman Reference Feldman2013; Seth & Friston Reference Seth and Friston2016), or, again, exteroceptive sensory streams can stem from anomalous events that are unlikely to recur. Nevertheless, a priori, any signal can indicate relevant information that is worth accumulating, insofar as it enables an agent to track statistical regularities of the niche. An important aspect of self-evidencing involves updating beliefs about the reliability or precision of sources of information, particularly sensory input. Sensory precision corresponds to the precision of sensory information (e.g., how much confidence or reliability can be afforded auditory input when a rabbit listens out for a fox sneaking in the grass).

Because the agent has to navigate a capricious and context-sensitive environment, it also needs to assess the precision of its own expectations – namely, how far expectations depart from typical beliefs. This corresponds to prior precision (e.g., how much confidence or precision a rabbit should afford its prior beliefs, given its expectations about the presence of foxes in the area at that time of the day). Note the subtle but fundamental difference between expectations or beliefs about the (first-order) causes of sensations and expectations about precision, which constitute (second-order) estimates of statistical context (Hohwy Reference Hohwy2013). In short, precision reflects the reliability of expectations about states of affairs – that is, whether or not sensory evidence or prior beliefs can be trusted (and not what they concern per se).
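The interplay of sensory and prior precision can be illustrated with the textbook case of combining a Gaussian prior with a Gaussian observation. This is a minimal sketch under that Gaussian assumption, which the article does not itself specify; the "threat level" scale and the precision values are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, sensory_precision):
    """Combine a Gaussian prior and a Gaussian observation.

    Each source is weighted by its precision (inverse variance).
    Returns the posterior mean and posterior precision.
    """
    posterior_precision = prior_precision + sensory_precision
    posterior_mean = (
        prior_precision * prior_mean + sensory_precision * observation
    ) / posterior_precision
    return posterior_mean, posterior_precision

# Hypothetical rabbit: prior expectation of no threat (0.0), a sound suggesting threat (1.0).
quiet_meadow = precision_weighted_update(0.0, 1.0, 1.0, sensory_precision=4.0)
windy_meadow = precision_weighted_update(0.0, 1.0, 1.0, sensory_precision=0.25)
```

When hearing is reliable, the belief moves most of the way towards the sensory evidence; when the same sound arrives through noise, the prior dominates. The first-order content of the evidence is identical in both cases; only the second-order estimate of its reliability differs.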

Using the FEP, we can distinguish two complementary, but computationally distinct, aspects of the folk-psychological concept of “attention” (Parr & Friston 2017a; 2017b; 2019): (1) as the process of directing the organism to selective sampling of the world (through shifting attention, sensory modulation, movement, or exploratory behaviour) such as to resolve uncertainty (i.e., expected surprise; see Footnote 3); and (2) as the calibration or weighting of this information as it is gathered to minimize surprise. Both play a crucial role in what follows. Under the FEP, salience is considered the main candidate for the implementation of attentional processes in the first sense – namely, the information gain or resolution of uncertainty afforded by the active sampling of the sensorium. The second sort of attentional selection corresponds to precision weighting (the modulation of belief updating as a function of estimated precision). This attentional process selects certain (neuronal) messages for belief updating through differential selection or modulation (Stephan et al. 2008). In short, salience is an attribute of action, in the sense that a particular way of sampling the world has epistemic affordances, whereas attentional selection via precision weighting is an attribute of perception, in the sense of accumulating the right sort of information after it has been sampled.

Figure 2 illustrates the attentional selection of messages using a predictive coding formulation of free-energy minimization. In this formulation, prediction errors are passed upwards through hierarchical connectivity architectures in the brain to update higher-order expectations. In turn, the expectations provide descending predictions to create prediction errors. In this scheme, sensory precision is assigned to prediction errors at the sensory level of the hierarchy, whereas prior precision is assigned to prediction errors at higher levels. This precision weighting is thought to underwrite attentional selection of sensory input and is a crucial aspect of perceptual inference (Feldman & Friston Reference Feldman and Friston2010; Hohwy Reference Hohwy2013). In what follows, we will subsume both sorts of attentional mechanisms under salience, given that overt sampling and covert attentional selection both conform to the same variational principles, under the FEP.
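The message passing described here can be caricatured with a single expectation that is nudged by two precision-weighted prediction errors. This is a deliberately reduced sketch (one level, scalar quantities, a fixed step size), not the hierarchical scheme referred to in the text; the function name, learning rate, and all values are assumptions made for illustration.

```python
def predictive_coding_estimate(observation, prior_mean, sensory_precision,
                               prior_precision, learning_rate=0.1, steps=200):
    """Iteratively revise an expectation using precision-weighted prediction errors."""
    expectation = prior_mean
    for _ in range(steps):
        # Ascending error: mismatch between sensory input and the current prediction.
        sensory_error = observation - expectation
        # Error with respect to the descending (higher-level) prediction.
        prior_error = expectation - prior_mean
        # Each error drives the update in proportion to its assigned precision.
        expectation += learning_rate * (
            sensory_precision * sensory_error - prior_precision * prior_error
        )
    return expectation

# Same input and prior expectation, different precision assigned to the sensory error.
attended = predictive_coding_estimate(2.0, 1.0, sensory_precision=3.0, prior_precision=1.0)
unattended = predictive_coding_estimate(2.0, 1.0, sensory_precision=1.0, prior_precision=1.0)
```

Raising the precision of the sensory prediction error pulls the settled expectation closer to the input, which is one way to picture attentional selection as gain control on ascending messages.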

Figure 2. Cultural affordances. A schematic illustration of the looping effects that modulate social learning by human agents through expectations that, in turn, enable their interaction with cultural affordances. The attentional processes of individual agents are modulated by regimes of attention and by the shared expectations, norms, and conventions that characterize their local culture. In this example, the key point is that the yellow arrows effectively bias self-evidencing towards or away from (certain kinds of) sensory evidence – and that the optimal selection (i.e., salience) has to be both learned and learnable in the right sort of cultural context. Adapted from Ramstead et al. (Reference Ramstead, Veissière and Kirmayer2016).

Attentional salience plays a central role in learning to engage with culturally constructed niches, both to select sensory evidence relative to the individual's goals and to identify sources with high reliability. The cultural affordances model proposes that human agents acquire culture by being immersed in specific, culturally patterned practices that modulate salience, which we call “regimes of attention” (Ramstead et al. Reference Ramstead, Veissière and Kirmayer2016; Veissière Reference Veissière, Raz and Lifshitz2016). Most regimes of attention do not involve isolated independent features of the environment, but correlated cues and opportunities for epistemic action that are organized in terms of local, cultural forms of cooperative activity, norms, and practices.

As we will describe in section 3.4, and as shown in Figure 3, these epistemic actions are supported by epistemic resources offered by the local cultural niche. In turn, regimes of attention correspond to the salience or epistemic affordance of sources of cultural information embodied in the epistemic cues of the niche. As shown in Figure 2, through active inference over the local cultural niche, humans can learn the norms and other contingencies that govern their local cultures.

Figure 3. Summary of the variational approach to niche construction. As in Figure 1, internal states and action change to minimize free energy based on sensations and beliefs. Heuristically, one can think of niche construction as the process whereby the agent's action creates a symmetry between internal and external states. The agent changes the statistical structure of the world as it acts on the world. The statistical structure of the world here simply refers to the actual probability of finding some causes of outcomes at a given location in the environment (e.g., the bread being the cause of pleasant smell in the bakery). From the point of view of niche construction, such probability changes as a function of the agent's action and in a way that is consistent with the agent's beliefs. Indeed, a simple consequence of agents acting to optimize action based on beliefs is that the traces produced by agents’ action will tend to be consistent with their beliefs. Another intriguing consequence of this is that, over time, traces in the world will effectively “learn” agents’ beliefs, in the sense that those traces will encode statistical regularities that relate to those beliefs. For example, consider a well-worn path cut through the grass in the park. Such a “desire path” encodes a robust probability that the location of the path in the environment will map onto the probability outcome “being walked on.” The value of that probability mapping increases over time as people wear down the path. This means that changes in the niche mirror changes in agents’ beliefs enacted via action. With the mathematical apparatus of the free-energy principle, one can model “environmental learning” about the agents’ action in the same way that one models “agents’ learning” of the environment's sensory causes. The only twist is that the quantities are inverted (compare blue and green vs. yellow and red boxes). From the point of view of the environment's generative process, actions play the same role as sensations in the agent's generative model (for a detailed mathematical description, see Bruineberg et al. 2018; Constant et al. 2018b).
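The “desire path” example can be sketched as simple count-based learning on the side of the environment. This is an illustrative toy, assuming that wear on each route accumulates as a count and that the niche's statistics are read off as normalized counts; it is not the formal treatment given in the works cited in the caption, and the routes and numbers are hypothetical.

```python
def update_niche(wear_counts, chosen_route):
    """Return new wear counts after one walker takes chosen_route."""
    updated = list(wear_counts)
    updated[chosen_route] += 1
    return updated

def route_probabilities(wear_counts):
    """Probability of 'being walked on' encoded by the accumulated traces."""
    total = sum(wear_counts)
    return [count / total for count in wear_counts]

# Two hypothetical routes: a paved detour (0) and a shortcut across the grass (1).
wear = [1, 1]  # the niche starts with no preference encoded

# Walkers who believe the shortcut is better take it nine times out of ten.
walker_choices = [1] * 9 + [0]
for choice in walker_choices:
    wear = update_niche(wear, choice)

encoded = route_probabilities(wear)
```

After ten walkers, the traces encode a probability of 10/12 for the shortcut: the environment has, in this limited sense, "learned" the walkers' beliefs by treating their actions as its data.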

Crucially, the configuration of regimes of attention by cultural practices and the ensuing attribution of salience to cultural information is only one of two aspects of cultural learning under active inference. The other aspect is the modulation of salience via the modification of the environmental aspects of the patterned cultural practices (e.g., people and material artefacts). As we will see in section 3.4, this “external” modulation of salience is enabled by mechanisms that we associate with developmental niche construction broadly construed (by analogy to internal mechanisms, such as perception and learning in the brain; Bruineberg et al. Reference Bruineberg, Rietveld, Parr, van Maanen and Friston2018b; Constant et al. Reference Constant, Bervoets, Hens and Van de Cruys2018a; Reference Constant, Ramstead, Veissière, Campbell and Friston2018b). Indeed, most predictions made by human agents result from – and pertain to – interactions with other human agents that co-construct a shared local culture and its niches. Through these niches, this culture furnishes feedback for the neurocognitive processes that serve the cultural patterning of attention (Seligman et al. Reference Seligman, Choudhury, Kirmayer, Chiao, Li, Seligman and Turner2016). As such, it follows that what we call “culture” is an extensive process that recruits elements both within the brain and in the shared cultural world (e.g., constructed places and designed artefacts).

3.3. Novelty, salience, and surprise

One might argue that there is an important design specification issue here; that is, to what patterns is salience or epistemic affordance attached (e.g., specific sensory information, families of similar events, and sources of information)? Any such assignment implies a pre-existing conceptual structure that allows for parsing the flow of information and that imparts some kind of hierarchical organization to available information. Precision and salience estimates are judged against some notion of what is salient (and this cannot just be what is stable over time, because that could result in a small, self-satisficing circle).

Under the FEP, these design specification issues are addressed by assuming that the agent embodies expectations that are established through histories of learning and, ultimately, through natural selection (Badcock 2012; Badcock et al. 2019; Friston 2010). Prior expectations are heritable through genetic, epigenetic, and exogenetic mechanisms (Constant et al. 2018b). These specify the epistemic value of sensations and, by the same token, the extent to which they should be considered. Priors that are inherited by the agent thus mandate the occupation of a limited repertoire of sensory states with high epistemic value that are revisited again and again (Friston 2010; Friston et al. 2015; Pezzulo & Cisek 2016), thus giving the impression that the agent maintains its organization (i.e., limits or minimizes the free energy of its phenotypic states with regard to the states in its niche). Our account thus focuses on the conservative nature of human culture – its ability to ensure that certain well-bounded and highly valuable states are frequented (see Footnote 4).

Conservation is essential to cultural continuity and enculturation, but cultural niches also constantly change through creative innovation and adaptation. This raises the question of how free-energy minimization and dynamical coupling can account for creativity and innovation in social coordination, behaviour patterning, and the organization of sociocultural ensembles. Proponents of the FEP face a similar issue at the level of individual cognition, known as the “dark room problem” (Friston et al. Reference Friston, Thornton and Clark2012a; Kiverstein et al. Reference Kiverstein, Miller and Rietveld2019). The problem is simple: If agents aim to avoid unexpected encounters with their environment, we should expect minimally changing sensory environments like dark rooms and correspondingly monotonous sensations to be the most frequently (re)visited states of an organism. Yet, there are countless examples in every aspect of life (from art and politics to eroticism, contemplation, and drug taking, to name but a few) in which humans seem motivated (or driven) to maximize novelty and evanescent states of being (Veissière Reference Veissière2018). What, then, prompts novelty seeking behaviour at the level of individuals and social ensembles?

The FEP deals with the issue of novelty-seeking behaviour by formalizing action as being in the game of maximizing the epistemic value of action (or epistemic affordance). In essence, free-energy minimizing agents seek to sample the world in the most efficient way possible. Because the information gain (i.e., salience) is the amount of uncertainty resolved, it makes good sense for the agent to selectively sample regions of the environment with high uncertainty, which will yield the most informative observations. This relates to the development of artificial curiosity in neurorobotics as a form of intrinsic motivation – so called because the resolution of uncertainty is itself intrinsically valuable and drives exploration (Friston et al. 2017a; 2017b; Oudeyer & Kaplan 2007; Schmidhuber 2006).

In effect, agents will act to optimize the epistemic value or affordance of an action before acting on its pragmatic value, which is essentially its expected utility (Friston et al. Reference Friston, Rigoli, Ognibene, Mathys, Fitzgerald and Pezzulo2015; Pezzulo et al. Reference Pezzulo, Cartoni, Rigoli, Pio-Lopez and Friston2016). For example, if one enters a dimly lit kitchen to grab a midnight snack from the pantry, one is more likely to turn the light switch on before heading to the pantry. Turning the light on allows one to get an optimal grip and disambiguate the situation, before one acts on the pragmatic value (i.e., the utility) offered by snack foods. In short, the dark room objection fails because it simply does not take into account the formal description of action under the free-energy principle. In selecting action, an active inference agent (also known as a free-energy minimizing agent) attributes an intrinsic value to the reduction of uncertainty, which entails exploration. Hence, under active inference, policy selection fundamentally is guided by intrinsic, epistemic (belief-based) imperatives. This formally differentiates approaches based on the FEP from non-epistemic (belief-free) formulations, such as reinforcement learning (Cullen et al. Reference Cullen, Davey, Friston and Moran2018).
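The kitchen example can be given a rough quantitative reading by comparing the expected information gain of two actions. This is a minimal sketch that computes only an epistemic term (the mutual information between a hidden state and the outcome an action would produce), not the full expected free energy discussed in the article; the two-shelf scenario and the likelihood values are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def expected_information_gain(prior, likelihood):
    """Prior entropy minus expected posterior entropy (mutual information)."""
    n_states = len(prior)
    n_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_outcomes):
        p_o = sum(prior[s] * likelihood[s][o] for s in range(n_states))
        if p_o == 0.0:
            continue
        posterior = [prior[s] * likelihood[s][o] / p_o for s in range(n_states)]
        gain -= p_o * entropy(posterior)
    return gain

# Hidden state: the snack is on the top shelf (0) or the bottom shelf (1).
prior = [0.5, 0.5]
# Hypothetical outcome likelihoods p(seen location | true location) under each action.
light_on = [[0.95, 0.05], [0.05, 0.95]]   # vision is informative
light_off = [[0.5, 0.5], [0.5, 0.5]]      # vision is uninformative in the dark

gain_on = expected_information_gain(prior, light_on)
gain_off = expected_information_gain(prior, light_off)
```

Under these assumptions, switching the light on has positive epistemic value, whereas heading to the pantry in the dark resolves nothing, so an agent that values uncertainty reduction would be expected to flip the switch first.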

Intrinsic motivation (see Footnote 5) and artificial curiosity enable the agent to explore novel, transient, and unexpected regions of the space of policies open to it. This can be an “adaptive” exploration or epistemic foraging, because it allows for the exploration of this space; over longer timescales, the local increase in free energy serves the more general process of reducing free energy (for either the individual or the group, because it prepares the organism for potential changes in adaptive contexts and enlarges the repertoire of responses for the individual or the group). Similarly, cultural diversity allows individuals and groups to explore alternative niches that may provide adaptive advantage in the larger fitness landscape (Bengio 2014).

This can be seen on the temporal scale of human cultural co-evolution. The 7R variant of the DRD4 gene (which encodes the D4 subtype of the dopamine receptor) appears to have become more widespread 50,000 years ago at a time of great migrations and a revolution in hunting technology among early Homo sapiens (Andrews et al. Reference Andrews, Gangestad and Matthews2002; Shelley-Tremblay & Rosén Reference Shelley-Tremblay and Rosén1996; Swanson et al. Reference Swanson, Moyzis, Fossella, Fan and Posner2002). Traits like novelty seeking, creativity, high energy, and willingness to take risks associated with that gene likely conferred adaptive advantages in the environment of our ancestors. These may have become less valuable or even maladaptive later as human niches became safer, more standardized, and more predictable. Indeed, this shift in adaptive value with cultural context is invoked in evolutionary explanations of some forms of behavioural dysfunction, such as attention deficit hyperactivity disorder (Shelley-Tremblay & Rosén Reference Shelley-Tremblay and Rosén1996; Tovo-Rodrigues et al. Reference Tovo-Rodrigues, Rohde, Menezes, Polanczyk, Kieling, Genro, Anselmi and Hutz2013). Of course, even maladaptive (non-optimal) traits may come to be culturally valued or exploited by individuals and communities, perhaps to their own detriment. Only the first of these pathways relates to the normal, adaptive acquisition of culture, which is the main focus of this article. However, both forms of epistemic foraging might contribute to cultural evolution.

3.4. Niche construction and learning

Culturally competent agents must learn regimes of attention across similar kinds of situations. For example, drivers must learn how pedestrians waiting at a red traffic light or crosswalk behave. The norms of pedestrian-vehicle behaviour vary in different cultural contexts. In some local contexts, pedestrians have the right of way and cars must stop, or pedestrians may observe red lights more laxly and attempt to cross against a red light, if the traffic is sparse. Within a given context, individuals’ behaviour may vary. Drivers must learn how to respond quickly in such varying situations. To do this, drivers may internalize different estimates of precision (i.e., rates of variability) for different classes of agents (e.g., children might be more likely to cross the street without warning), and in turn, when travelling, drivers will re-adjust their expectations in light of local cultural variations in official rule obeying (e.g., in a country where people are more likely to jaywalk). In addition to the internal updating of precision estimates, one can think of epistemic affordances as encoded in the social-ecological niche (Constant et al. 2018b), in the patterned cultural practices that direct the epistemic foraging of agents (Ramstead et al. 2016), and in the specifically constructed aspects of the material environment (Constant et al. 2018a). For example, drivers and pedestrians learn not only how to assess the information afforded by traffic lights, but also how to leverage the traffic light's probable influence on others to improve the quality of their assessment (Constant et al. 2018a) – for example, checking that the bus driver can see his red light, before stepping out onto a pedestrian crossing.
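As a purely illustrative sketch of the driver example, precision can be read as inverse variance (the usual reading in predictive-processing accounts) and estimated separately for each class of agent. The data, the variable names, and the simple pooling rule used when travelling are all hypothetical assumptions introduced here; they are not part of the authors' model.

```python
from statistics import pvariance

def precision(samples):
    """Return precision, taken here as the inverse of the population variance."""
    variance = pvariance(samples)
    return float("inf") if variance == 0 else 1.0 / variance

# Hypothetical data: seconds between the walk signal and the moment a
# pedestrian steps out (negative values mean crossing early).
home_observations = {
    "adults": [0.0, 0.5, -0.5, 0.2, -0.2],
    "children": [3.0, -4.0, 0.5, -2.5, 3.0],
}

# One precision estimate per class of agent.
home_precision = {cls: precision(obs) for cls, obs in home_observations.items()}

# When travelling, new local observations are pooled with the old ones,
# so the estimate shifts towards local habits (here, more jaywalking).
abroad_adults = home_observations["adults"] + [-6.0, -5.0, -7.5]
abroad_precision = precision(abroad_adults)
```

In this toy setting the estimated precision for children is lower than for adults, and the pooled estimate for adults drops once the more variable local observations are included, which is the sense in which a driver would "re-adjust" expectations abroad.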

Responding to a culturally constructed niche depends on a developmental history of learning to negotiate similar niches (a developmental history that is shared with all conspecifics within the same econiche). In the process of development, however, humans not only respond to niches, but also take part actively in their (re)construction. For example, based on the frequency of traffic accidents at an intersection, the location or timing of traffic lights may be modified by collective action. This (re)construction of the niche occurs in more rudimentary ways constantly throughout the development of individuals and groups in local niches.

From the point of view of the FEP, developmental niche construction can be viewed as the process whereby agents make their niche conform to their expectations (Constant et al. 2018a). Developmental niches are the set of exogenetic, physically and behaviourally grounded resources necessary to guide the reproduction of the adaptive life cycle (Stotz 2017; Stotz & Griffiths 2017). Because actions are guided by salience, and change the physical architecture (and epistemic affordance) of the environment, they tend to make the niche a good statistical “mirror” of the agent's epistemic foraging, functional anatomy, and, ultimately, brain-based expectations (Constant et al. 2018b; Fig. 3). In short, if we all act successfully to minimize uncertainty, our econiche will become inherently more predictable – if, and only if, epistemic affordances become encultured.

The exploitation of regimes of attention – encoded in the niche – is especially useful to track regularities unfolding over longer timescales of the history of a community, whose variability would be harder to assess over the timescale of an individual's perceptual and procedural learning. In humans, the epistemic affordance offered by niches constitutes epistemic resources that shape learning, and shared cultural practices (Hutto 2012; Roepstorff et al. 2010), as well as social relationships necessary for cooperative activities like breeding of animals (Burkart et al. 2009). Many of these epistemic resources involve specific kinds of patterned cultural practice that we associate with regimes of attention (Burkart et al. 2009; Hutto 2012; Roepstorff et al. 2010; Veissière 2016). These epistemic resources are states of the environment that, when repeatedly engaged by agents, shape their neurally encoded precision and salience expectations and, thereby, direct their future patterns of attention, epistemic foraging and learning, and subsequent patterns of engagement through perception and action. Epistemic resources help agents learn (from others) how to attend to or forage the niche for relevant affordances and how to weigh the cues associated with different affordances. Epistemic resources allow the agent to track and evaluate the relevance of more abstract, temporally extended, stable, and general statistical regularities structuring agent-niche relationships, like conventionalized patterns of interaction shared among multiple agents.

3.5. Learning cultural affordances under the free-energy principle

Epistemic affordances are encoded by – or installed in – the environment, as repeated physical actions leave traces that change the structure of the developmental niche in ways that influence agents’ expectations (e.g., “I can trust that by taking this trail, which other people have also taken, I will end up at the other side of the park”). Over time, these traces of the actions of other people (e.g., traffic signals, dirt paths across a park, and shelters for hikers along a mountain trail) make certain affordances stand out as especially relevant. These are the affordances that yield highly reliable actions (i.e., uncertainty minimizing action, or actions that are expected to guide the agent towards goals or expected states) (see Fig. 4).

Figure 4. Thinking through other minds (see Figs. 1 and 3 for the equations). This figure depicts the loop between action, sensations, and niche construction that leads to the acquisition and production of cultural habits, and to the inference and learning about other minds. The shared epistemic resources in the constructed niche (i.e., external states modified by actions from agents 1 to n) and the regimes of attention (i.e., internal state) constitute the domains of statistical regularities that tune to one another via the physical engagement of the niche. Those domains are finessed (i.e., mutual learning of internal and external states) by a community of practices (agents from 1 to n) over ontogenetic (e.g., over development) and phylogenetic timescales (e.g., via the inheritance of material resources). The learning and deployment of internal and external domains of statistical regularities is what we call “thinking through other minds” (TTOM). TTOM entails, and depends on, the production of culturally patterned practices. Cultural practices and associated artefacts are epistemic resources that guide the attention (and learning) of members in the community by shaping sensory perception.

In many situations, affordances based on the history of human action will be more salient than those that reflect simple optimization (e.g., cutting across a lawn might afford getting to the other side faster, but many people will walk along a winding path, even in the absence of other humans). The well-worn path reflects an implicit consensus among many previous walkers. Individualized expectations guiding behaviour in context may thus be inferred from a continuum of expectations about other agents, ranging from reflective to fully intuitive, and, in turn, from actually present to probable and generalized others. Under the FEP, the dynamics and acquisition of all these expectations by a group of agents are mediated by the very same inference mechanisms.

Developmental niche construction can be cast as an interactional process between agents and a shared environment, producing affordances that support the reproduction of a normative life trajectory, through the norm-guided development of each new member of the community (cf. Constant et al. 2018b; Fulda 2017). These norms are implicit in the structure of cultural affordances in the specific local niches occupied by individuals at a particular developmental age or stage. Individuals become attuned to the niches they discover or are directed to by others according to their age, gender, and other dimensions of social status.

These niches afford individuals epistemic resources for acquiring specific types of knowledge, skills, or dispositions to respond. In effect, the function of external mechanisms for evaluating epistemic affordances is to enable the emergence and stabilization of epistemic resources. The notion of epistemic resources relates directly to work on how cultural knowledge held by others in the community can reach into the hierarchy of processing at higher levels through linguistic or symbolic communication to install priors directly (Bengio 2014).

Epistemic resources, which underwrite epistemic affordance (either overtly through action selection or covertly through attentional selection; i.e., mental action), are stabilized through niche construction, in the sense that the niche comes to encode the expectations that enable the interaction with those affordances. Epistemic resources act as developmental anchors. In human social contexts, epistemic resources can be viewed as shared expectations and cultural affordances that become available to a group of agents, as expectations that “sediment” in public places, practices, and affordances that are repetitively and reiteratively engaged by groups of agents. This process involves feedback or looping effects and hence is self-reinforcing over time. For example, the grass patch on a street corner solicits cutting across and, over time and in turn, as it is worn down by many walkers, comes to afford a “desire path” (Ingold 2016).
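The self-reinforcing loop behind a desire path can be illustrated with a toy positive-feedback model. This is a hedged sketch under assumptions made here for illustration only: two candidate routes across the grass, a hypothetical `pull` parameter for how strongly visible wear attracts walkers, and a deterministic (mean-field) update in which each step adds one unit of foot traffic. It is not the authors' formal model.

```python
import math

def simulate_desire_path(steps=50, pull=0.5, initial_wear=(0.1, 0.0)):
    """Track the share of walkers choosing route A over route B across steps."""
    wear_a, wear_b = initial_wear
    shares = []
    for _ in range(steps):
        # The share choosing A grows with how much more worn A is than B.
        share_a = 1.0 / (1.0 + math.exp(-pull * (wear_a - wear_b)))
        shares.append(share_a)
        # Each step adds one unit of foot traffic, split by the shares.
        wear_a += share_a
        wear_b += 1.0 - share_a
    return shares

shares = simulate_desire_path()
```

Starting from a very small difference in wear, the share of walkers on route A rises from just above one half towards one: traces left by earlier walkers make the route more soliciting for later ones, which in turn deepens the traces.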

One might ask whether the story should not be told the other way around. It might be that dirt trails allow for cutting across the park but only later solicit a “desire path,” as it is only once the agent has acquired the cultural knowledge that the path can be traversed that it can become “desired” as something that the agent wants to engage. Precisely what is at stake here is the virtuous circularity and bootstrapping operative in social learning – which must go from simple to more complicated. On a phenomenological level, what is being challenged is that the world calls to us in specific ways prior to the desires installed by culture – in cutting across the path, the unstated background of desire might have to do with getting somewhere we want to be more quickly, with enjoying transgressing the rule of walking (only) on sidewalks, or simply the aesthetics of walking along a dirt path. Hence, it is not self-evident that one can consider a desire path, or for that matter, any cultural object, as a cultural affordance until some way of engaging the world has been acquired.

Affordances have been proposed to explain how skilled agents manage to engage their environment without having to know how their environment “works” (i.e., to employ learnt representations or to acquire representational contents). The variational approach furthers this line of thinking by distinguishing mathematically between the action that is selected by the agent and the affordance of action for the agent. In effect, the FEP allows us to formulate a principle of most affordance – that is, a version of the principle of least action from physics, applied to the adaptive behaviour of groups of organisms living together in a niche (Ramstead et al. 2019a). The action with the most affordance, the one that solicits the organism most (i.e., the one associated with the least expected free energy), is the one that ends up selected by the organism.
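The principle of most affordance can be sketched computationally. In active inference formulations, the probability of a policy is commonly taken to be a softmax of its negative expected free energy, scaled by a precision parameter. The sketch below follows that common formulation; the policy names, the expected free energy values, and the parameter `gamma` are hypothetical and chosen only for illustration.

```python
import math

def policy_probabilities(expected_free_energy, gamma=1.0):
    """Softmax over negative expected free energy, scaled by precision gamma."""
    scores = [-gamma * g for g in expected_free_energy.values()]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return {name: w / total for name, w in zip(expected_free_energy, weights)}

def most_afforded(expected_free_energy):
    """Return the policy with the least expected free energy."""
    return min(expected_free_energy, key=expected_free_energy.get)

# Hypothetical expected free energies for three policies at the park's edge.
G = {"follow_worn_path": 1.2, "cut_across_lawn": 2.0, "wait": 3.5}

probs = policy_probabilities(G, gamma=2.0)
choice = most_afforded(G)
```

Here the policy with the least expected free energy is also the most probable one. Raising `gamma` sharpens the distribution towards that policy, whereas lowering it makes selection more exploratory.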

The cultural affordances framework suggests that acquiring the ability to leverage conventionalized affordances means acquiring a regime of attention. The regime of attention is not some specific content that one learns, but a mode of attending to and actively sampling the world, through a generative process that involves (overt) motor behaviour and the (covert) tuning of neural gating via expectations about precision, as well as culturally patterned search strategies for salient information, which are “shared” to some extent by all individuals of a local culture.

The idea behind the desire path as a cultural affordance relies on and extends the notion of regime of attention by highlighting that epistemic affordances depend not only on the brain, but also on features of the environment (see Fig. 2). [Footnote 6] The desire path, as a cultural affordance, enables skilful pre-reflective engagement. This can often happen without the agent having to know the content of the specific artefact from the start. For example, I might be late for my train, and following that trajectory through the park might be a good solution to catch my train on time. In that scenario, there is probably very little content involved about where exactly the path will lead. Rather, there is (1) an expectation on the part of the agent, (2) a solicitation on the part of the environment, and between those, (3) an embodied history of agent-niche interactions (i.e., the traces left by repeated actions), which increases the likelihood of the path leading to a commonly experienced goal (e.g., the other side of the park). This history of cycles of expectation, solicitation, and action, encoded in cultural affordances, supports individuals’ intuitive, culturally meaningful response to environmental cues. Under the TTOM model, when individual agents do not know quite what is situationally appropriate, their behaviour switches to epistemic foraging, in which agents will preferentially sample whatever other, relevant agents sample as well.

A large part of the social learning enabled by the developmental niche is mediated by shared attention (Tomasello 2014). For example, once a path is worn in the grass, implicit shared attention and expectations that others also intended to do so will prompt followers to walk along the path. This will hold even for paths that are not otherwise efficient, even if a less costly path is available – and, in some instances, this holds even for paths with uncertain trajectories or end points. Of course, most of the traces of human activity are not paths on grass, but the affordances provided by institutions, archives, and repositories of knowledge, plans, and protocols. Regimes of attention provide ways to locate, attend to, and engage these affordances in a wide variety of structured cooperative activities (Malafouris 2015).

3.6. Why human thinking is always already thinking through other minds

Homo sapiens evolved to rely on bodies of accumulated cultural knowledge and skills for survival (Henrich 2015; Sterelny 2012; Tomasello 2014). We shape each other's learning through specifically adapted cultural practices (regimes of attention) that allow individuals to enact recursively nested forms of intentionality. This includes the capacity to view ourselves through the eyes of another in a kind of reciprocal aboutness (e.g., “What would Mother expect me to do?”). After childhood, typically, these ways of thinking about oneself are internalized, encoded, and expressed as “What should I do?” or “What am I expected to do?” Recent research on mind wandering suggests that most of our spontaneous mental life is dedicated to rehearsing social scenarios (Poerio & Smallwood 2016). In their recent “interactionist” account of the evolution of human reasoning, Mercier and Sperber (2017) review a wealth of experimental evidence to support the claim that humans best solve problems and optimize individual intelligence collectively in dialogical and argumentative contexts, which may extend to hypothetical, “silent” scenarios. Although no large-scale evidence is available on what so-called “silent reasoning” entails in individual human heads, Mercier and Sperber conjecture that most silent reflective ideas are generated through the rehearsal of arguments with, and justifications to, others. Even solitary thinking, on this view, is a rehearsal for bona fide social interactions with peers.

Recent work in the philosophy of psychiatry also supports the hypothesis that solitary human cognition is social through and through. In their cultural and evolutionary account of the origins of psychosis, for example, Gold and Gold (2015) propose that the many kinds of delusions described in the literature on psychopathology (i.e., persecutory, grandiose, erotomanic, control, thought, somatic, nihilistic, reference, guilt, and misidentification) share one broad, overarching theme: a concern with one's relationship to other people. Hence, all known delusions can be recast as statistically improbable interpretations of, and expectations about, one's experiences in relation to others.

For a species such as Homo sapiens that evolved to rely upon cooperative and highly elaborate coordinated action, expectations about folk psychology (or probabilistic inferences about the way other people think and reason and what they expect of the world) are at least as important as, if not more important than, expectations about statistical regularities that characterize the physical world. In other words, in a world populated by creatures “like me,” most of my expectations call on the prior belief that “I am like you and you are like me – and you believe that I am like you and you are like me” and so on. In effect, the world of human experience is always already mediated by, and filtered through, the “lens” of expectations about another's expectations.

The expectations that Homo sapiens have leveraged most over their phylogenetic history involve the capacity to outsource cognition to relevant others (people, artefacts, practices, and institutions). In other words, human beings outsource to other humans many of the evaluations of salience that they employ in their engagement with their worlds, which allows others to perform culturally relevant tasks (Tomasello 2014). Indeed, it is precisely these evaluations by others that make worlds “meaningful” for humans. To exploit this cooperative cognitive task sharing, human agents explicitly and implicitly bestow trust and assign authority to others – both individuals and institutions – acquiescing to and leveraging cues (physical, culturally meaningful signs) associated with reliability, authority, and prestige (Henrich 2015).

What distinguishes between different human phenotypes is the priors under which they are operating, and which guide adaptive behaviour. If we consider the dynamics of human TOM abilities in this light, the process of TTOM consists in inferring the priors or expectations that guide the beliefs of another agent or group of agents. Provided that agents can solve the inference problem about the sort of person that their interlocutors are, and provided that they have a model of their conspecifics’ prior beliefs, then any one agent can leverage their own action (policy) selection mechanisms under the prior beliefs of their fellows to infer the mental states of their fellows (and, indeed, their own mental states).

Epistemics get into the game when this inference is made more difficult by a lack of shared priors. Hence, the cues that emerge from niche construction can be nonspecific cues that tell agents about what is situationally appropriate to do (but which could be done in a solitary way, such as stopping at a red traffic light) or very particular cues that provide information about the priors of other agents, which coincides with mind reading and properly thinking through other minds (e.g., I have a prior about you having a prior about me stopping at the red light and crossing at the green light – and, hence, that you will not run me over). The process of inference is made easier by the availability of cues (that shape regimes of attention) that tell agents “where to look” (i.e., that allow one to leverage where others are looking to determine where oneself should look). For example, if I do not know when to cross at the intersection because I am not familiar with the colours used by the traffic light system, I can guide my action by relying on epistemic cues that have been shaped by (presumably adaptive) cultural practices such as the ways people around me act in context (e.g., other agents’ behaviour or gaze patterns).

The TTOM model accounts for the ways in which human agents outsource their policy selection to relevant others and to aspects of their material niche. In this sense, our model covers cases of cultural cognition that range from the lone encultured agent acting in conformity with the cultural norms that they have internalized – which involves inferences only indirectly about and through other minds – to full-blown cultural engagement with other human agents, which requires (implicit and explicit) inferences about the minds of other humans. Given the nature of human inferential systems and the way generative models are learned according to TTOM, inferences about my own generative model can be leveraged, and, in effect, are always being leveraged, to make inferences about others like me. Inference about one's own mind is always mediated and made possible by inferences about the minds of others.

4. Addressing TOM critiques with TTOM

According to TTOM, human agents organize most of their behaviour as a function of what they can infer from other human minds. Humans find guides for action by picking up on statistical regularities in the realm of folk psychology, which identifies the most relevant states of the external world, as well as the most relevant sources of inferences about the shared social world. Our framework recognizes the contribution of the varied approaches to human TOM abilities outlined in the first section and offers a compromise position.

4.1. Response to the cross-cultural critique: TTOM is universal for Homo sapiens, but realized through cultural niches

We agree that folk notions of personhood vary across culture and likely exercise specific constraints on automatic perception and social coordination through normative social learning (e.g., McGeer 2007). Although folk notions of the locus of personhood and agency vary broadly between groups and historical periods (e.g., to include a soul, brain-mind, heart-mind, or external agencies like gods, ancestors, or spirits), we question the extent to which communication and coordination would be possible without a species-wide intuitive notion of propositional psychological interiority (which may be postulated and enriched in different ways culturally).

The example of “silent thinking” during courtship, reported by ethnographers of the Korowai (Stasch 2009), is telling. In everyday human experience, affectively charged situations such as “I wonder if she really likes me” abound and likely emerge in infancy without recourse to language or explicit mentalizing, as humans form mental models of other agents in their life. Indeed, developmental psychologists have shown that 15-month-old infants are able to take into account the false beliefs of other agents (Onishi & Baillargeon 2005) and that the ability to attribute goals to any entity (living or not) that appears to be animate emerges as early as 5 months (Luo & Baillargeon 2005; see Mahajan & Woodward 2009 for different results).

Additional cross-cultural and developmental findings support the view that intuitive dualism (Jack 2014), or the folk tendency to situate personhood in an intangible psychological interior, is likely a cross-cultural universal that does not require specific cultural immersion in Cartesian cultures (Chudek et al. 2013). As Paul Bloom (2005) has argued, children across cultures can readily understand a story about a prince becoming a frog without explicit enculturation into folk Cartesianism.

As we argue in Section 4.2, TTOM makes no ontological claims about mind-body dualism; we simply point out from experimental and ethnographic evidence that coordinated action in human sociality does rest on the universal human cognitive capacity to understand others as having goals, beliefs, desires, and intentions that may be different from their stated ones (what we call “propositional psychological interiority”). At the core of this cognitive capacity is the process of active inference mediated by processes of developmental and selective niche construction, which, in humans, scaffold complex sets of prior beliefs encoded in sites across the brain-body-environment-others system. Hence, “mind reading” sometimes requires explicit deliberation (something resembling “theory theory”) and at other times can be automatically intuited through simulation (in forms of embodied and extended cognition).

4.2. Response to the embodiment critique: TTOM is grounded in the bodies of self and others

Anxieties around dualism in current cognitive science reflect a common confusion between normative and descriptive commitments on the part of philosophers and cognitive scientists. Although dualism as a scientific description of the relation between the mind and body is mistaken, it does not follow that our theorizing about other minds should not consider folk dualist thinking as a normative and very real phenomenon that shapes everyday and scientific thinking. As an illustration, even psychiatrists who espouse an integrative, monistic view of mind and body employ a naive dualism in assessing vignettes of problematic behaviour as indicating either deliberate action (rooted in individual psychology and, hence, blameworthy) or as accidental, because of malfunctioning biology of the brain (Miresco & Kirmayer 2006) – as though these two causes were grounded in distinct mental and bodily processes. Our best theories about folk social cognition ought to reflect that dualism, on pain of descriptive inadequacy.

TTOM, to be sure, does not make ontological claims about the nature of mind as separate from the body. We simply offer that, as a matter of universal human epistemology, patterned cultural practice involves an ability to make inferences from, through, and about other minds, as propositional processes – indeed as inferential processes. In some cases, folk theorizing about dualism may simply be a useful tool to both generate and inquire into such practices (e.g., through dialogues in clinical settings). TTOM formalizes the inferential structure of such folk theorizing.

The ability to infer each other's expectations, which makes human cognition, sociality, and culture possible at all, ranges from the fully explicit to the fully automatic depending on the situation. In our model, this ability depends on the learning of a spectrum of expectations encoded across the brain-body-environment-others system that underwrites regimes of attention. The FEP is unique here in its ability to account for inference and dynamics as two sides of the same coin, and this is what allows TTOM to overcome the sharp dichotomy between internalist and externalist approaches to TOM abilities. Under the FEP, all system dynamics are inferential, and inference is itself dynamics; namely, the dynamics of sentient systems are a gradient flow over free energy (Friston 2010; Ramstead et al. 2018). Because free energy is a measure of the complementarity between the organism and the niche, in terms of a generative model of the relation between them, any dynamics formulated in terms of the FEP are ipso facto inferential dynamics that pertain to the self-organization of information flows in sentient systems.
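The gradient-flow claim can be written schematically as follows. This is a sketch in notation commonly used in the FEP literature, which may differ from the symbols in Figs. 1 and 3: $\mu$ denotes internal states, $a$ action, $s$ sensory states (which depend on action), $\psi$ hidden external states, $q$ the variational density, and $p$ the generative model. All of these symbol choices are assumptions made here for illustration.

```latex
\dot{\mu} = -\nabla_{\mu} F(s, \mu), \qquad
\dot{a} = -\nabla_{a} F(s, \mu), \qquad
F(s, \mu) = \mathbb{E}_{q(\psi \mid \mu)}\big[ \ln q(\psi \mid \mu) - \ln p(s, \psi) \big]
```

Read this way, the same equations describe a dynamical trajectory (a flow down the free-energy gradient) and an inferential process (an improving fit between the variational density and the generative model), which is the sense in which inference and dynamics are two sides of the same coin.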

Rather than describing cultural differences in the folk models (including Western philosophical models!) of social cognition in “either/or” terms (either dualistic or not; focusing on explicit intentions or focusing on resonance in action), we propose to situate these differences on a continuum of hypo-cognition to hyper-cognition of intentions (see Duranti 2015). The notion of hyper- and hypo-cognition has been explored in the context of cultural variations in emotions (Levy 1975; 1984). The degree or depth of cognitive elaboration of emotion serves individual and social regulatory functions. As a matter of normative concern, cultures vary in the kinds of emotions people are encouraged to cultivate or suppress, thereby allocating attention, attributing meaning, and patterning behaviour in ways that constitute specific codes of conduct or expression, modes of experience, and folk explanations that account for behaviour.

4.3. Response to the cooperativity critique: TTOM is built on the developmental scaffolding of cooperativity

Shedding light on a cross-cultural continuum of normative commitments to the hyper- and hypo-cognition of intentions may also help resolve the Machiavellian-mutualist debate on the evolution of human cognition. It seems self-evident from the human record that our species is capable of both selfishness and altruism as a matter of individual, situational, and cultural variation – but also that the scaffolding of “altruism” proper clearly follows an evolutionary and developmental trajectory. Tomasello (2009), for example, proposed the early Spelke, later Dweck hypothesis [Footnote 7] to describe children's gradual immersion into social norms that harness and enhance their natural capacity for adjusting their behaviour to what others expect of them.

Rather than start from a specific commitment to one normative position (e.g., “humans ought to be altruistic”; “humans ought to act in rational self-interest”), our account recognizes these varied possibilities inherent in human behaviour and stresses the importance of specific cultural practices in patterning behaviour to elaborate either side of the selfish-altruistic continuum.

Hrdy herself, as a proponent of the mutualist argument, has stressed the importance of developmental environments, such as collective parenting, in providing rich (or impoverished) opportunities to form bonds and learn to relate with multiple attachment figures – a process she describes as crucial in the development of social cognition, emotional regulation, and empathy (Hrdy 2011). In Hrdy's account, our “proximity” to the kind of selfish intelligence found among chimpanzees is a matter of ontogenetic contingencies at least as much as evolutionary “distance.” Indeed, the capacity to engage in nuanced, compassionate, other-regarding action is increasingly understood to be dependent on language, explicit teaching, effortful deliberation, and practices and to be distinct from (though perhaps developmentally scaffolded on) the innate capacity to imitate and follow others and favour one's narrow in-group (Bloom 2017).

Contemplative practices of loving-kindness meditation, for example, entail the explicit enrichment and effortful rehearsal of one's mental models of others, which eventually become automatic through practice (Lebois et al., in press; Lutz et al. 2008). The linguistic (narrative) elaboration of these models may be essential to their extension to include members of out-groups, the whole of humanity, or even to all sentient beings. These varied examples point to the importance of both implicit and explicit mentalizing mechanisms in the mediation of human cognition and cultural practice.

TTOM supports current mutualist, cultural intelligence, or “dual-inheritance” accounts that emphasize the co-evolution of human cognition and culture (Henrich 2015; Tomasello 2014). Rather than discounting Machiavellian and other “selfish” accounts of these processes altogether, we suggest that what one might call extended mutualism (i.e., large-scale cooperation), and the ability to leverage a large repertoire of shared expectations to guide group action, arises because of the match between naturally and culturally selected dispositions to acquire cultural abilities (e.g., mind-reading abilities) and inherited developmental conditions enabling the (re)acquisition of these abilities. Selected, or evolutionarily old, dispositions constitute a cultural learning “start-up kit” of sorts (Heyes 2018b; Heyes & Frith 2014), which includes the kind of neural machinery that underwrites attention and the estimation of salience, leading to the acquisition of shared expectations (see Fig. 2).

At the developmental timescale, inherited cultural practices enable the learning of shared expectations via the patterning of those evolutionarily old dispositions. This emerges via agents’ engagement with epistemic cues that undergo processes of cultural evolution through developmental niche construction activities, which filter what persists across generations as a function of the success of the behaviours they afford (Laland 2018; see Fig. 3).

This sets up a cycle of mutual fitting between individual and niche. For example, in a circular fashion, I can trust the learning biases provided by my caregiver – and more specifically, the cues they provide through their gaze direction, pointing, gesturing, and so on, towards salient situations. I am licensed to do this because patterns of offspring-caregiver interaction have been filtered and fine-tuned through gene-culture co-evolutionary processes and developed in specific cultural norms, signs, places, and practices over historical time – all in the service of guiding the learning of salience; that is, to guide the learning of what is adaptive in the local cultural context (e.g., “listen to and copy this high prestige individual because prestigious individuals are typically the ones that have succeeded in the past”). Put another way, one can trust learning biases because biases indicate action policies selected by other agents “like me,” so these must have been the most adaptive for creatures “like me.”

On our account, cognition and culture are largely synonymous for humans, as both are predicated on the capacity for shared expectations. Priors leveraged and finessed through active inference, and the folk psychology they specify (i.e., what we expect others also to expect), constitute the central domain of statistical regularities that ground humans’ models of their world. This domain of statistical regularities that we call TTOM specifies the mechanistic processes that drive the implicit acquisition of culture over development.

5. Concluding remarks: The future of TTOM

5.1. Future research

We have argued that the pervasive influence of culture, through widespread shared expectations, institutions, and practices, can be cast as a process of co-constructing and responding to a shared set of affordances. Human engagement with cultural affordances is enabled by (often implicit, recursively nested) expectations about other relevant agents’ expectations. These expectations are acquired by agents through immersive participation in the practices that define their shared way of life, in a process that gradually takes hold in ontogeny through regimes of attention and niche construction (see Box 2).

The human mind is optimized for outsourcing information to other human minds in order to function in a niche that requires the shared, coordinated pursuit of joint goals. Error and surprise minimization in large-scale social systems hold because individual human minds are coupled to one another in an environment of other minds. This kind of “extended mind” is distinctive to human beings because of the capacities for culture (i.e., regimes of attention, linguistic communication and installation of higher-order priors, multiscale cooperation, declarative memory/historicity, and collective norms and goal setting) that are made possible by the human nervous system (Clark 2008; Clark & Chalmers 1998; Menary 2010; Sutton 2010).

If we have been successful in presenting our account, however, from an FEP point of view, it should also be clear that humans think, feel, imagine, and act in ways that are only possible because they are afforded by the niches they inhabit and co-construct, and the cultural practices that make up their shared form of life, and that all serve to enculture human agents (Constant et al. 2018a; 2018b; Ramstead et al. 2016). Even the collaborative construction of new niches, which allows the exploration of new modes of experience and the improvisation of new forms of cooperative action, depends on the cultural scaffolding of a relatively stable set of shared expectations and regimes of attention through the cognitive tools or gadgets of narrative and metaphor (Heyes 2018b; Lakoff & Johnson 1980) and the social organization that constitutes particular niches or communities.

TTOM is a generic active inference (also known as FEP or variational) account of the acquisition of culture and mind-reading abilities. We have designed TTOM as a guide for the production of testable models in related domains. Although TTOM per se would be difficult to test (because of its generality), one can derive specific integrative models from TTOM to study specific forms of sociocultural dynamics. A good example of a testable model derived from TTOM is the theory of regimes of expectations as applied to the study of social conformity (Constant et al. 2019b).

Social conformity refers to the deference to social norms such as those embodied by other agents. From the point of view of social psychology, social conformity is one possible response to the social influence of epistemic, trusted others (Asch 1956). From the point of view of cultural evolution, in turn, social conformity is viewed as an adaptive social learning strategy in an uncertain environment (Morgan & Laland 2012).

The theory of regimes of expectations integrates the perspectives of social psychology and cultural evolutionary theory by modelling social conformity as a process that obtains through the intergenerational finessing of environmental cues that guide social learning over development. Social learning that is aided by these cues, in turn, allows the active inference agent to perform action selection in a fast and efficient way in uncertain contexts by leveraging trusted others (either through material cues that stand as culturally signalled proxies for other, relevant or prestigious minds or directly by copying such individuals). These trusted others are defined as “deontic cues” (Constant et al. 2019b).

“Deontic cues” in this model are context-specific epistemic resources (as defined by TTOM) that enforce an obligatory response to the context that embeds them (e.g., a red traffic light enforcing stopping behaviour). The theory of regimes of expectations models social conformity as an active inference process of action selection that operates via the estimation of the epistemic, pragmatic, and also “deontic” value of action, which is the type of value learned through the engagement of deontic cues. The deontic value is essentially the value of an action policy specified by the shared beliefs and preferences of a sociocultural group.
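As a purely illustrative sketch (not the parameterization used in the published model), one way to picture how a deontic value could enter action selection is to add it to the score of each policy before normalization. Everything below is an assumption made for illustration: the additive scoring rule, the policy names, the numerical values, and the decision to let a single expected-free-energy number stand in for the combined epistemic and pragmatic value of a policy.

```python
import math


def softmax(scores):
    """Normalized exponential, shifted by the maximum for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def policy_distribution(expected_free_energy, deontic_value):
    """Hypothetical scoring rule: low expected free energy and high
    deontic value both make a policy more probable."""
    scores = [d - g for g, d in zip(expected_free_energy, deontic_value)]
    return softmax(scores)


# Hypothetical policies available to a driver approaching a junction.
POLICIES = ["stop", "slow down", "drive through"]
# Assumed expected free energy per policy (epistemic plus pragmatic terms).
G = [1.0, 1.1, 0.9]
# Assumed deontic value with and without a red traffic light in view.
NO_CUE = [0.0, 0.0, 0.0]
RED_LIGHT = [3.0, 0.0, 0.0]

without_cue = policy_distribution(G, NO_CUE)
with_cue = policy_distribution(G, RED_LIGHT)

for name, p0, p1 in zip(POLICIES, without_cue, with_cue):
    print(f"{name:>13}: no cue {p0:.3f} | red light {p1:.3f}")
print("entropy without cue:", round(entropy(without_cue), 3))
print("entropy with cue:   ", round(entropy(with_cue), 3))
```

In this toy setting the cue concentrates probability on a single policy, so the distribution over policies becomes less uncertain. That is one informal way of reading the claim that deontic cues make action selection fast and efficient in uncertain contexts.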

In line with the sort of specific models that can be derived from TTOM, the theory of regimes of expectations as applied to the study of social conformity integrates externalist approaches (e.g., cultural evolutionary approach) and internalist ones (e.g., the social psychology approach) by describing the cultural domain of statistical regularities optimized through active inference and governing action selection.

The theory of regimes of expectations as applied to the study of social conformity makes specific predictions that stem from the TTOM model – namely, that (1) social conformity leads to more efficient cognitive processing and policy selection (e.g., as conveyed by psychophysics measurements like reaction time) in the presence of deontic cues (epistemic resources in TTOM terms); (2) conforming actions minimize variational free energy over time more efficiently in social context, because regimes of attention will be optimized for zeroing in on social information conveyed through deontic cues; (3) deontic cues reproduce conformist biases in cross-cultural between-subjects designs but fail in within-subjects designs (i.e., not all deontic cues will elicit social conformity for participants with culturally diverse backgrounds because of the influence of culture-specific regimes of attention).

5.2. Limitations

Because it is based on the FEP, TTOM provides a mathematical formalism that can be used to model the effects of cultural affordances on adaptation to specific kinds of social niches. The model needs to be further elaborated to deal explicitly with the many varieties of cultural learning and regimes of attention. These include the distinctively human functions of narrativity that entail the linguistic and symbolic hierarchical installation of higher-order priors (Bengio 2014). For example, this will include culturally shared expectations about the cause of sensory observations (e.g., the prior belief that “the slap I received on my wrist was caused by my belief that it is permissible to reach for the cookie jar, which motivated my action, which then led to the slap, which indicated it was not”). In this sequence, the slap not only conveys a social norm, but in itself reflects the broader social norm that it is permissible to intervene in childrearing in this fashion – these overarching norms are learned over time within a particular niche and may change, for example, with migration to a new sociocultural context, with serious consequences for how one (mis)reads (culturally conventional or permissible) affordances. In modelling an active inference agent, such structures of high-order priors could capture the potential for reflexivity and self-reference that gives human cultural-linguistic cognition its unique reach (Taylor 2016).

The free-energy minimizing dynamics described previously involve feedback processes that tune organismic expectancies to fit local environmental contexts and therein minimize surprise and uncertainty. Accounts of enculturation tend to suppose stable social contexts, and the FEP assumes a kind of optimization that depends on stability in adaptive contexts, but the reality (especially in the context of cultural interactions and contexts) is often one of constant change. Hence, realistic models of human cognition in context will require taking into account cultural mobility, hybridity, and the cognitive effects of the constantly changing social niches that reflect cultural co-evolution. Ultimately, models based on conservative processes like the FEP model need to address the significance of historicity and contingency in the emergence and evolution of cultural systems.

Among other potential domains of application, our model has implications for psychiatry. One interesting path towards experimental verification builds on recent proposals for a computational psychiatry (Adams et al. 2016; Friston et al. 2014b; Huys et al. 2016; Montague et al. 2012). In brief, computational psychiatry aims to leverage computational techniques in order to better phenotype various psychiatric conditions, such as psychosis (Adams et al. 2016) and autism (Constant et al. 2018a). Characterizing individual and group variations in the capacity to leverage TTOM, and the ways in which human agents adapt to their ecological niche, could reveal an important set of dimensions for such diagnostic frameworks. One could, for example, consider individuals who experience inference about the sort of person they and others are in a way markedly different from the neurotypical population (e.g., people with autistic traits). One could recruit participants who score high and low on the autistic spectrum, to test their relative ability to make inferences and predictions about others based on the ability to leverage information about gaze direction, or vary the context in which they deploy such inferences, to study the coupled dynamics between context and cognition that is typical of such individuals (Constant et al. 2018a).

Other conditions could be studied in this manner as well, shedding light both on TTOM as a general cognitive architecture and on these specific conditions. Higher rates of schizophrenia and psychosis among migrant populations might also be an excellent lens to approach such phenomena. Indeed, the careful study of such populations highlights the need for an interactional view of how sense of self and functioning may be destabilized by migration – to a new niche that has specific affordances for people of colour (Kirmayer & Gold 2011; Kirmayer et al. 2015). Depression might also be a useful phenomenon to consider, as it is an interactional phenomenon that involves complex inferences about self and other that is aggravated by retreat from the social niche, now perceived as lacking positively valenced affordances and occupied by other minds with intentions that are hard to understand, and which may in turn aggravate the condition itself (Baldwin 1992; Wang et al. 2008). This kind of work could inform a formal phenotyping of psychopathology based on the TTOM model.

Finally, although arguing for the applicability of the FEP to the puzzle of the acquisition of cultural practices, knowledge, and grammars, we caution against describing cultural ensembles as autonomous systems that maintain their organization and structural integrity through allostasis and homeostasis (Veissière 2018). Adaptation rests on an ongoing process of predicting events, engaging with the environment, and adjusting expectations in response to feedback from the world (including the body and other creatures). This occurs through constant transactions with the environment, and, in the case of human beings, that environment is fundamentally cultural and social – constructed with, and inhabited by, other people with whom individual agents must cooperate if they are to survive. This cooperation is itself patterned by cultural knowledge, skills, norms, institutions, places, and practices that have their own history and contingency.

Acknowledgments

We thank Paul Badcock, Shaun Gallagher, Casper Hesp, Dan Hutto, Safae Essafi, Michael Kirchhoff, Sander Van de Cruys, Alan Jürgens, Thomas Parr, Ian Robertson, Ryan Smith, Anna Strasser, Auguste Nahas, Erik Rietveld, Jonathan St-Onge, Simon Tremblay, Jared Vasil, Eric White, Julian Xue, and all those present at the Naturally Evolving Minds conference at University of Wollongong (20–23 February 2018) for helpful discussions and comments. We also sincerely thank the editor, Barbara Finlay, and the anonymous reviewers who provided us with valuable feedback.

This research was produced thanks in part to funding from the Canada First Research Excellence Fund, awarded to McGill University for the Healthy Brains for Healthy Lives initiative (S. P. L. Veissière and M. J. D. Ramstead), from a grant from the Foundation for Psychocultural Research (S. P. L. Veissière), from an Australian Laureate Fellowship (Ref: FL170100160) (A. Constant) and by a Social Sciences and Humanities Research Council doctoral fellowship (Ref: 752-2019-0065) (A. Constant), a Joseph-Armand Bombardier Canada Doctoral Scholarship and a Michael Smith Foreign Study Supplements award from the Social Sciences and Humanities Research Council of Canada (M. J. D. Ramstead), and a Wellcome Principal Research Fellowship (K. J. Friston; Ref: 088130/Z/09/Z).

Appendix

This appendix describes the free-energy principle in terms of a Bayesian mechanics that emerges from the existence of a Markov blanket in a random dynamical system at nonequilibrium steady state. A Markov blanket is a four-way partition of states that define a self-organizing system and its environment (i.e., a system that has self-organized to nonequilibrium steady state). This partition comprises internal and external states {μ, η} that are separated by blanket states b = {s, a}. In turn, blanket states are divided into sensory and active states. In brief, the Markov blanket allows us to talk about internal states representing external states in a probabilistic sense. Heuristically, this means that one can ascribe probabilistic beliefs to internal states, in the sense that they are about something – namely, external states. This interpretation rests upon a variational density over external states that is parameterized by internal states:

$$\begin{aligned}
\boldsymbol{\mu}(b) &\triangleq \arg\max_{\mu} p(\mu \mid b) \\
q_{\boldsymbol{\mu}}(\eta) &= p(\eta \mid b)
\end{aligned} \tag{1.1}$$

This variational density arises in virtue of the blanket as follows: If we condition internal and external states on the blanket, then there must exist a most likely internal state for every blanket state. This means that there must be a conditional density over external states conditioned on that blanket state. At nonequilibrium steady state, the flow of internal and active states can be expressed as a gradient flow on the same quantity – namely, the surprisal (i.e., negative log likelihood) of states that comprise the system (Friston 2013). We will refer to internal and active states α = {a, μ} as autonomous because they are not influenced by external states:

$$\begin{aligned}
f_{\alpha}(s, \alpha) &= (Q_{\alpha\alpha} - \Gamma_{\alpha\alpha}) \nabla_{\alpha} \Im(s, \alpha) \\
\Im(s, \alpha) &= -\ln p(s, \alpha)
\end{aligned} \tag{1.2}$$

These two aspects of a Markov blanket underwrite a Bayesian mechanics, in which we can talk about internal states holding Bayesian beliefs about external states – and autonomous states acting on external states, under those beliefs. We will first look at the underlying formalism in terms of a free-energy lemma and its path integral form that speak to (1) the most likely flow of internal states (i.e., perception) and (2) the trajectory of active states (i.e., action).

Lemma (variational free energy): Given a variational density, $q_{\boldsymbol{\mu}}(\eta) = p(\eta \mid b)$, the most likely path of autonomous states, given sensory states, can be expressed as a gradient flow on a free-energy functional of systemic states, π = {b, μ} = {s, α}:

$$\begin{aligned}
\boldsymbol{\alpha}[\tau] &= \arg\min_{\alpha[\tau]} \mathcal{A}(\alpha[\tau] \mid s[\tau]) \\
&\Rightarrow \delta_{\boldsymbol{\alpha}[\tau]} \mathcal{A}(\boldsymbol{\alpha}[\tau] \mid s[\tau]) = 0 \\
&\Rightarrow \dot{\boldsymbol{\alpha}} = (Q_{\alpha\alpha} - \Gamma_{\alpha\alpha}) \nabla_{\alpha} F(s, \boldsymbol{\alpha})
\end{aligned} \tag{1.3}$$

This means the most likely path conforms to a variational principle of least action, where variational free energy is an upper bound on surprisal:

$$\begin{aligned}
F(\pi) &\triangleq \underbrace{E_q[\Im(\eta, s, \alpha)]}_{\text{Energy}} - \underbrace{H[q_{\mu}(\eta)]}_{\text{Entropy}} \\
&= \underbrace{\Im(s, \alpha)}_{\text{Surprisal}} + \underbrace{D[q_{\mu}(\eta) \,\|\, p(\eta \mid s, \alpha)]}_{\text{Divergence}} \\
&= \underbrace{E_q[\Im(s, \alpha \mid \eta)]}_{\text{Inaccuracy}} + \underbrace{D[q_{\mu}(\eta) \,\|\, p(\eta)]}_{\text{Complexity}} \ge \Im(s, \alpha)
\end{aligned} \tag{1.4}$$

This functional can be expressed in several forms; namely, an energy minus the entropy of the variational density, which is equivalent to the surprise associated with systemic states (i.e., surprisal) plus the KL (Kullback-Leibler) divergence between the variational and posterior density (i.e., divergence). In turn, this can be decomposed into the negative log likelihood of systemic states (i.e., inaccuracy) and the KL divergence between posterior and prior densities (i.e., complexity).
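The equivalence of the three forms in equation (1.4) can be checked numerically on a small discrete example. The sketch below is a toy illustration under stated assumptions rather than part of the formal argument: external states take three values, a single systemic state (s, α) is treated as observed, and the prior, likelihood, and variational density are arbitrary numbers chosen for the example. The function and variable names are hypothetical.

```python
import math


def kl_divergence(q, p):
    """KL divergence D[q || p] between two discrete distributions (nats)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def entropy(q):
    """Shannon entropy H[q] of a discrete distribution (nats)."""
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


def free_energy_forms(q, prior, likelihood):
    """Evaluate the three expressions for F in equation (1.4).

    q          -- variational density over external states, q(eta)
    prior      -- p(eta)
    likelihood -- p(s, alpha | eta) for the one observed systemic state
    """
    joint = [lik * pri for lik, pri in zip(likelihood, prior)]  # p(eta, s, alpha)
    evidence = sum(joint)                                        # p(s, alpha)
    posterior = [j / evidence for j in joint]                    # p(eta | s, alpha)
    surprisal = -math.log(evidence)

    energy = sum(qi * -math.log(j) for qi, j in zip(q, joint))
    inaccuracy = sum(qi * -math.log(lik) for qi, lik in zip(q, likelihood))

    return {
        "energy_minus_entropy": energy - entropy(q),
        "surprisal_plus_divergence": surprisal + kl_divergence(q, posterior),
        "inaccuracy_plus_complexity": inaccuracy + kl_divergence(q, prior),
        "surprisal": surprisal,
        "posterior": posterior,
    }


# Arbitrary illustrative numbers (assumptions, not values from the article).
PRIOR = [0.5, 0.3, 0.2]
LIKELIHOOD = [0.9, 0.2, 0.1]
Q = [0.6, 0.3, 0.1]

forms = free_energy_forms(Q, PRIOR, LIKELIHOOD)
for name in ("energy_minus_entropy", "surprisal_plus_divergence",
             "inaccuracy_plus_complexity", "surprisal"):
    print(f"{name}: {forms[name]:.6f}")
```

The three expressions agree to numerical precision and are never smaller than the surprisal. Passing `forms["posterior"]` back in as the variational density makes the divergence vanish, so that free energy equals surprisal; this is the special case the lemma relies on.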

Proof: The most likely trajectory – that minimizes action – obtains when the random fluctuations about the flow take their most likely value of zero. By equation (1.2), the flow of the most likely autonomous states $\boldsymbol{\alpha} = \{\boldsymbol{a}, \boldsymbol{\mu}\}$ can be expressed as a gradient flow on surprisal or, by definition, variational free energy:

$$\begin{aligned}
\boldsymbol{\alpha}[\tau] &= \arg\min_{\alpha[\tau]} \mathcal{A}(\alpha[\tau] \mid s[\tau]) \Rightarrow \\
\dot{\boldsymbol{\alpha}} &= (Q_{\alpha\alpha} - \Gamma_{\alpha\alpha}) \nabla_{\alpha} \Im(s, \boldsymbol{\alpha}) \\
&= (Q_{\alpha\alpha} - \Gamma_{\alpha\alpha}) \nabla_{\alpha} F(s, \boldsymbol{\alpha})
\end{aligned} \tag{1.5}$$

Where, for the most likely internal state, $\boldsymbol{\mu} \in \boldsymbol{\alpha}$:

$$F(s, \boldsymbol{\alpha}) = \Im(s, \boldsymbol{\alpha}) + \underbrace{D[q_{\boldsymbol{\mu}}(\eta) \,\|\, p(\eta \mid s, \boldsymbol{\alpha})]}_{\text{Divergence}} = \Im(s, \boldsymbol{\alpha}) \tag{1.6}$$

The equivalence between variational free energy and the surprisal of systemic states follows from the definition of the variational density that renders the divergence zero.
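To make the gradient flow in equations (1.2) and (1.5) more concrete, the following sketch integrates the most likely path for a two-dimensional toy system. It rests on several assumptions that are not in the text: the steady-state density is taken to be a unit-variance Gaussian (so the surprisal is quadratic up to a constant), Γ is a multiple of the identity, Q is a constant antisymmetric matrix, and simple Euler steps are used. All names and numbers are hypothetical.

```python
def surprisal(x):
    """Surprisal of a unit-variance Gaussian steady state, up to a constant."""
    return 0.5 * sum(xi * xi for xi in x)


def grad_surprisal(x):
    """Gradient of the quadratic surprisal above."""
    return list(x)


def flow(x, gamma=1.0, q=2.0):
    """Apply (Q - Gamma) to the surprisal gradient for a 2-D state.

    Q = [[0, q], [-q, 0]] is antisymmetric (solenoidal part);
    Gamma = gamma * I is the dissipative part.
    """
    g = grad_surprisal(x)
    return [-gamma * g[0] + q * g[1], -q * g[0] - gamma * g[1]]


def integrate(x0, dt=0.01, steps=500):
    """Euler integration of the most likely path (fluctuations set to zero)."""
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        dx = flow(x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        path.append(list(x))
    return path


path = integrate([2.0, -1.0])
values = [surprisal(x) for x in path]
print("initial surprisal:", values[0])
print("final surprisal:  ", values[-1])
```

With these settings the state spirals towards the mode of the assumed density: the antisymmetric part circulates along contours of equal surprisal without changing it, while the dissipative part is what decreases surprisal along the path.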

Given this stipulative formulation of gradient flows under a Markov blanket, one can now use the path integral formalism to characterize the most likely path of autonomous states from any initial state.

Corollary (path integral formulation): Under some simplifying assumptions, the action of autonomous paths from any initial systemic state is upper bounded by expected free energy:

$$\mathcal{A}(\alpha[\tau] \mid \pi_0) \le G(\alpha[\tau]) \tag{1.7}$$

Expected free energy is defined as follows:

$$\begin{aligned}
G(\alpha[\tau]) &\triangleq \underbrace{E_q[\Im(\eta, s, \alpha_\tau)]}_{\text{Energy}} - \underbrace{H[q_\tau(\eta)]}_{\text{Entropy}} \\
&= \underbrace{E_q[\Im(s, \alpha_\tau)}_{\text{Expected surprisal}} + \underbrace{D[q_\tau(\eta \mid s) \,\|\, p(\eta \mid s, \alpha_\tau)]]}_{\text{Expected divergence}} - \underbrace{D[q_\tau(\eta \mid s) \,\|\, q_\tau(\eta)]}_{\text{Information gain}} \\
&= \underbrace{E_q[\Im(s, \alpha_\tau \mid \eta)]}_{\text{Ambiguity}} + \underbrace{D[q_\tau(\eta) \,\|\, p(\eta)]}_{\text{Risk}} \\
&\ge \mathcal{A}(\alpha[\tau] \mid \pi_0)
\end{aligned} \tag{1.8}$$

The expectation in equation (1.8) is under the predictive density over hidden and sensory states, conditioned upon the initial systemic state and subsequent trajectory of autonomous states:

$$q_\tau(s, \eta) \triangleq p(s, \eta, \tau \mid \alpha[\tau], \pi_0) \tag{1.9}$$

The expected free energy in equation (1.8) has been formulated to emphasize the formal correspondence with variational free energy in equation (1.4), where the complexity and inaccuracy terms become risk (i.e., expected complexity) and ambiguity (i.e., expected inaccuracy).
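The risk and ambiguity terms in the last line of equation (1.8) can be illustrated on a small discrete example. The sketch below is a simplified reading under stated assumptions: external and sensory states are binary, the dependence on autonomous states is suppressed, the predictive density over sensory states given external states is taken to equal the likelihood p(s | η), and each candidate path is summarized by the predictive density over external states it induces. Path names and numbers are hypothetical.

```python
import math


def kl_divergence(q, p):
    """KL divergence D[q || p] between two discrete distributions (nats)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected_free_energy(q_eta, prior_eta, likelihood):
    """Return (risk, ambiguity, G) for one candidate path.

    q_eta      -- predictive density over external states under the path
    prior_eta  -- p(eta), the steady-state (prior) density
    likelihood -- likelihood[i][j] = p(s = j | eta = i)
    """
    risk = kl_divergence(q_eta, prior_eta)
    # Expected surprisal of sensory states given external states.
    ambiguity = sum(qi * entropy(row) for qi, row in zip(q_eta, likelihood))
    return risk, ambiguity, risk + ambiguity


# Assumed model: external state 0 is both a priori likely and unambiguous.
PRIOR_ETA = [0.9, 0.1]
LIKELIHOOD = [[0.95, 0.05],
              [0.50, 0.50]]
# Assumed predictive densities under two hypothetical paths.
PATHS = {"path_a": [0.8, 0.2], "path_b": [0.3, 0.7]}

results = {name: expected_free_energy(q, PRIOR_ETA, LIKELIHOOD)
           for name, q in PATHS.items()}
for name, (risk, ambiguity, g) in results.items():
    print(f"{name}: risk={risk:.3f} ambiguity={ambiguity:.3f} G={g:.3f}")
best = min(results, key=lambda name: results[name][2])
print("lowest expected free energy:", best)
```

In this toy case the path that keeps the system close to its a priori likely, unambiguous external state has the lower expected free energy, because it scores better on both risk and ambiguity.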

In summary, variational free energy is an upper bound on the surprisal of systemic states, and expected free energy is an upper bound on the action of autonomous states. On a conceptual note, the role of nonequilibrium steady state takes on a different aspect, depending upon whether the aforementioned variational dynamics are thought of in terms of gradient flows (i.e., the variational free-energy lemma) or as picking out the most likely paths (i.e., the path integral corollary).

From the point of view of a statistician, the gradient flow formulation regards the probability density at nonequilibrium steady state as a generative model – in other words, a probabilistic specification of the sensory impressions of external states hidden behind the Markov blanket. It is this dynamic that licenses an interpretation of self-organization in terms of statistical (i.e., approximate Bayesian) inference.

The picture changes when we consider the path integral formulation. Here, we are picking out trajectories of autonomous states (i.e., active and internal states) that are most likely under the generative model. On this view, the generative model can be regarded as some prior beliefs about the sensory states (and their external causes) that will be encountered in the future. In other words, the generative model prescribes the attracting set that the system will autonomously work towards – by apparently selecting the paths of activity that lead to these attracting states. This enactive perspective makes it look as if the generative model is no longer simply an explanation for sensory samples but a specification of the states to which a system aspires.

Footnotes

1 There are many ways of interpreting this haiku by the modern poet Mayuzumi Madoka. The shift in gaze might be seen as an experience of erotic presence or represent an awakening to sexism and self-estrangement. It also recalls a culture-specific experience of the self as a performance (echoing the Japanese sense of always being on a stage; Heine et al. 2008). At its core, though, the poem powerfully illustrates the fundamentally human affective process of seeing and feeling oneself through the perspectives (and desires) of another.

2 Technically, an expectation corresponds to the average of a probabilistic belief or probability distribution. When the distribution is over (discrete) states of affairs, the expectation corresponds to the likelihood that any given state of affairs is true. Throughout, we will use beliefs in the sense of Bayesian belief updating or belief propagation, which could be either propositional or subpersonal in nature.

3 That is, the act of deploying precision weighting to select sources of sensory evidence, often discussed in terms of mental action.

4 The FEP is a variational principle of least action, like those that describe other systems with conserved quantities – for example, in the Lagrangian formulation of Newtonian mechanics, in which energy and momentum are conserved (Coopersmith 2017).

5 Intrinsic motivation is commonly used in developmental robotics to describe the epistemic value that reduces uncertainty (i.e., promotes information gain). In active inference, salience scores the reduction in uncertainty about transient states of the world, whereas novelty scores the reduction in uncertainty about the more stable parameters of a generative model. In short, salience is to inference as novelty is to learning.

6 The epistemic, uncertainty-reducing aspect of this formulation comes to the fore when human agents need to figure out what to do, more so than when agents are simply acting in accordance with the regimes of attention that they have internalized through enculturation.

7 With reference to the works of psychologists Elizabeth Spelke, who documents infant “core knowledge” in the domains of intuitive physics, intuitive biology, and intuitive psychology, and Carol Dweck (Dweck 2013; Johnson et al. 2007), who emphasizes the role of learning, experience, and rewards from adherence to social norms (Kinzler et al. 2007; Olson & Spelke 2008; Spelke & Kinzler 2007).

References

Adams, R. A., Huys., Q. J. M. & Roiser, J. P. (2016) Computational psychiatry: Towards a mathematically informed understanding of mental illness. Journal of Neurology, Neurosurgery, and Psychiatry 87(1):5363.Google ScholarPubMed
Andrews, P. W., Gangestad, S. W. & Matthews, D. (2002) Adaptationism–how to carry out an exaptationist program. Behavioral and Brain Sciences 25(4):489–504, discussion 504–53.
Apperly, I. A. & Butterfill, S. A. (2009) Do humans have two systems to track beliefs and belief-like states? Psychological Review 116(4):953–70.
Asch, S. E. (1956) Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological Monographs: General and Applied 70(9):1–70.
Astuti, R. & Bloch, M. (2015) The causal cognition of wrong doing: Incest, intentionality, and morality. Frontiers in Psychology 6:136.
Badcock, P. B. (2012) Evolutionary systems theory: A unifying meta-theory of psychological science. Review of General Psychology 16(1):10–23.
Badcock, P. B., Davey, C. G., Whittle, S., Allen, N. B. & Friston, K. J. (2017) The depressed brain: An evolutionary systems theory. Trends in Cognitive Sciences 21(3):182–94.
Badcock, P. B., Friston, K. J. & Ramstead, M. J. (2019) The hierarchically mechanistic mind: A free-energy formulation of the human psyche. Physics of Life Reviews 31:104–21.
Baldwin, M. W. (1992) Relational schemas and the processing of social information. Psychological Bulletin 112(3):461–84.
Bengio, Y. (2014) Evolving culture versus local minima. In: Growing adaptive machines: Combining development and learning in artificial neural networks, eds. Kowaliw, T., Bredeche, N. & Doursat, R., pp. 109–138. Studies in Computational Intelligence. Springer.
Bertolotti, T. & Magnani, L. (2017) Theoretical considerations on cognitive niche construction. Synthese 194(12):4757–79.
Berwick, R. C., Chomsky, N. & Piattelli-Palmarini, M. (2013) Poverty of the stimulus stands: Why recent challenges fail. In: Rich Languages from Poor Inputs, eds. Piattelli-Palmarini, M. & Berwick, R. C., pp. 19–42. Oxford University Press.
Bijleveld, E., Scheepers, D. & Ellemers, N. (2012) The cortisol response to anticipated intergroup interactions predicts self-reported prejudice. PLoS ONE 7(3):e33681.
Bloom, P. (2005) Descartes’ baby: How the science of child development explains what makes us human. Random House.
Bloom, P. (2017) Against empathy: The case for rational compassion. Random House.
Bolis, D., Balsters, J., Wenderoth, N., Becchio, C. & Schilbach, L. (2017) Beyond autism: Introducing the dialectical misattunement hypothesis and a Bayesian account of intersubjectivity. Psychopathology 50(6):355–72. https://doi.org/10.1159/000484353.
Bolis, D. & Schilbach, L. (2018a) Observing and participating in social interactions: Action perception and action control across the autistic spectrum. Developmental Cognitive Neuroscience 29:168–75. https://doi.org/10.1016/j.dcn.2017.01.009.
Bourdieu, P. (1977) Esquisse d'une théorie de la pratique. Cambridge University Press.
Bourdieu, P. (1984) Distinction: A social critique of the judgement of taste. Harvard University Press.
Boyd, R. & Richerson, P. J. (2005) The origin and evolution of cultures. Oxford University Press.
Boyer, P. (2018) Minds make societies: How cognition explains the world humans create. Yale University Press.
Brown, D. E. (2004) Human universals, human nature & human culture. Daedalus 133(4):47–54. https://doi.org/10.1162/0011526042365645.
Bruineberg, J. & Rietveld, E. (2014) Self-organization, free energy minimization, and optimal grip on a field of affordances. Frontiers in Human Neuroscience 8:599.
Bruineberg, J., Rietveld, E., Parr, T., van Maanen, L. & Friston, K. J. (2018b) Free-energy minimization in joint agent-environment systems: A niche construction perspective. Journal of Theoretical Biology 455:161–78. https://doi.org/10.1016/j.jtbi.2018.07.002.
Burkart, J. M., Hrdy, S. B. & Van Schaik, C. P. (2009) Cooperative breeding and human cognitive evolution. Evolutionary Anthropology: Issues, News, and Reviews 18(5):175–86.
Carruthers, P. & Smith, P. K. (1996) Theories of theories of mind. Cambridge University Press.
Chemero, A. (2009) Radical embodied cognitive science. MIT Press.
Cheng, J. T., Tracy, J. L., Foulsham, T., Kingstone, A. & Henrich, J. (2013) Two ways to the top: Evidence that dominance and prestige are distinct yet viable avenues to social rank and influence. Journal of Personality and Social Psychology 104(1):103–25.
Chomsky, N. (1996) Studies on semantics in generative grammar. Walter de Gruyter.
Christensen, W. & Michael, J. (2016) From two systems to a multi-systems architecture for mindreading. New Ideas in Psychology 40:48–64.
Chudek, M., McNamara, R., Burch, S., Bloom, P. & Henrich, J. (2013) Developmental and cross-cultural evidence for intuitive dualism. Unpublished manuscript, University of British Columbia.
Cialdini, R. B. & Goldstein, N. J. (2004) Social influence: Compliance and conformity. Annual Review of Psychology 55:591–621.
Clark, A. (2006) Language, embodiment, and the cognitive niche. Trends in Cognitive Sciences 10(8):370–74.
Clark, A. (2008) Supersizing the mind: Embodiment, action, and cognitive extension. Oxford University Press.
Clark, A. (2013a) Whatever next? Predictive brains, situated agents, and the future of cognitive science. The Behavioral and Brain Sciences 36(3):181–204. https://doi.org/10.1017/S0140525X12000477.
Clark, A. & Chalmers, D. (1998) The extended mind. Analysis 58(1):7–19.
Clark, K. B. (1988) Prejudice and your child. Wesleyan University Press.
Clark, K. B. & Clark, M. K. (1939) The development of consciousness of self and the emergence of racial identification in Negro preschool children. Journal of Social Psychology 10(4):591–99.
Constant, A., Bervoets, J., Hens, K. & Van de Cruys, S. (2018a) Precise worlds for certain minds: An ecological perspective on the relational self in autism. Topoi 1–12. https://doi.org/10.1007/s11245-018-9546-4.
Constant, A., Ramstead, M. J. D., Veissière, S. P. L., Campbell, J. O. & Friston, K. (2018b) A variational approach to niche construction. Journal of the Royal Society Interface 15(141):20170685.
Constant, A., Ramstead, M. J. D., Veissière, S. P. L. & Friston, K. J. (2019b) Regimes of expectations: An active inference model of social conformity and decision making. Frontiers in Psychology 10:679. https://doi.org/10.3389/fpsyg.2019.00679.
Coopersmith, J. (2017) The lazy universe: An introduction to the principle of least action. Oxford University Press.
Csibra, G. & Gergely, G. (2009) Natural pedagogy. Trends in Cognitive Sciences 13(4):148–53.
Csibra, G. & Gergely, G. (2011) Natural pedagogy as evolutionary adaptation. Philosophical Transactions of the Royal Society, Series B: Biological Sciences 366(1567):1149–57. https://doi.org/10.1098/rstb.2010.0319.
Cullen, M., Davey, B., Friston, K. J. & Moran, R. J. (2018) Active inference in OpenAI Gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 3(9):809–18. https://doi.org/10.1016/j.bpsc.2018.06.010.
De Castro, E. V. (2009) Métaphysiques Cannibales: Lignes D'anthropologie Post-Structurale. Presses universitaires de France.
Dehaene, S. & Cohen, L. (2007) Cultural recycling of cortical maps. Neuron 56(2):384–98.
Dunbar, R. I. M. (2003) The social brain: Mind, language, and society in evolutionary perspective. Annual Review of Anthropology 32(1):163–81.
Dunbar, R. I. M. (2004) Gossip in evolutionary perspective. Review of General Psychology 8(2):100–10.
Duranti, A. (2015) The anthropology of intentions. Cambridge University Press.
Durkheim, E. (1895/2014) The rules of sociological method: And selected texts on sociology and its method. Simon and Schuster. (Original work published in 1895).
Dweck, C. S. (2013) Self-theories: Their role in motivation, personality, and development. Taylor & Francis. https://content.taylorfrancis.com/books/download?dac=C2009-0-07336-6&isbn=9781317710332&format=googlePreviewPdf.
Einarsson, A. & Ziemke, T. (2017) Exploring the multi-layered affordances of composing and performing interactive music with responsive technologies. Frontiers in Psychology 8:1701.
Fabry, R. E. (2018) Betwixt and between: The enculturated predictive processing approach to cognition. Synthese 195:2483–518.
Feinman, S. (1982) Social referencing in infancy. Merrill-Palmer Quarterly 28:445–70. https://www.jstor.org/stable/pdf/23086154.pdf.
Feldman, H. & Friston, K. J. (2010) Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience 4:215.
Feldman, J. (2013) Tuning your priors to the world. Topics in Cognitive Science 5(1):13–34.
Friston, K. (2005) A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences 360(1456):815–36.
Friston, K. J. (2010) The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11(2):127–38. https://doi.org/10.1038/nrn2787.
Friston, K. (2011) Embodied inference: Or “I think therefore I am, if I am what I think.” In: Implications of Embodiment: Cognition and Communication, eds. Tschacher, W. & Bergomi, C., pp. 89–125.
Friston, K. (2013) Life as we know it. Journal of the Royal Society Interface 10(86):20130475.
Friston, K. J., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O'Doherty, J. & Pezzulo, G. (2016) Active inference and learning. Neuroscience and Biobehavioral Reviews 68:862–79.
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P. & Pezzulo, G. (2017a) Active inference: A process theory. Neural Computation 29(1):1–49. https://doi.org/10.1162/NECO_a_00912.
Friston, K. J. & Frith, C. D. (2015b) Active inference, communication and hermeneutics. Cortex 68:129–43. https://doi.org/10.1016/j.cortex.2015.03.025.
Friston, K. J., Kilner, J. & Harrison, L. (2006) A free energy principle for the brain. Journal of Physiology, Paris 100(1–3):70–87.
Friston, K. J., Lin, M., Frith, C. D., Pezzulo, G., Hobson, J. A. & Ondobaka, S. (2017b) Active inference, curiosity and insight. Neural Computation 29(10):2633–83.
Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T. & Pezzulo, G. (2015) Active inference and epistemic value. Cognitive Neuroscience 6(4):187–214. https://doi.org/10.1080/17588928.2015.1020053.
Friston, K. J., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T. & Dolan, R. J. (2014a) The anatomy of choice: Dopamine and decision-making. Philosophical Transactions of the Royal Society B: Biological Sciences 369(1655):20130481. https://doi.org/10.1098/rstb.2013.0481.
Friston, K. J. & Stephan, K. E. (2007) Free-energy and the brain. Synthese 159(3):417–58.
Friston, K. J., Stephan, K. E., Montague, R. & Dolan, R. J. (2014b) Computational psychiatry: The brain as a phantastic organ. Lancet Psychiatry 1(2):148–58.
Friston, K., Thornton, C. & Clark, A. (2012a) Free-energy minimization and the dark-room problem. Frontiers in Psychology 3:130. https://doi.org/10.3389/fpsyg.2012.00130.
Fulda, F. C. (2017) Natural agency: The case of bacterial cognition. Journal of the American Philosophical Association 3(1):69–90.
Gallagher, S. (2017) Enactivist interventions: Rethinking the mind. Oxford University Press.
Gallagher, S. & Allen, M. (2018) Active inference, enactivism and the hermeneutics of social cognition. Synthese 195(6):2627–48. https://doi.org/10.1007/s11229-016-1269-8.
Gallese, V. & Goldman, A. (1998) Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences 2(12):493–501.
Gavrilets, S. & Vose, A. (2006) The dynamics of Machiavellian intelligence. Proceedings of the National Academy of Sciences of the United States of America 103(45):16823–28.
Geertz, C. (1973) The interpretation of cultures. Basic Books.
Gibson, J. J. (1979) The ecological approach to visual perception. Houghton Mifflin.
Goffman, E. (2009) Relations in public. Transaction.
Gold, J. & Gold, I. (2015) Suspicious minds: How culture shapes madness. Free Press.
Goldman, A. I. (2006) Simulating minds: The philosophy, psychology, and neuroscience of mindreading. Oxford University Press.
Goldstein, J., Davidoff, J. & Roberson, D. (2009) Knowing color terms enhances recognition: Further evidence from English and Himba. Journal of Experimental Child Psychology 102(2):219–38.
Gopnik, A. & Wellman, H. M. (2012) Reconstructing constructivism: Causal models, Bayesian learning mechanisms, and the theory theory. Psychological Bulletin 138(6):1085–1108. https://doi.org/10.1037/a0028044.
Hacking, I. (1998) Mad travelers: Reflections on the reality of transient mental illnesses. University of Virginia Press.
Hamilton, A. F. de C. (2008) Emulation and mimicry for social interaction: A theoretical approach to imitation in autism. Quarterly Journal of Experimental Psychology 61(1):101–15. https://doi.org/10.1080/17470210701508798.
Heine, S. J., Takemoto, T., Moskalenko, S., Lasaleta, J. & Henrich, J. (2008) Mirrors in the head: Cultural variation in objective self-awareness. Personality & Social Psychology Bulletin 34(7):879–87.
Henrich, J. (2015) The secret of our success: How culture is driving human evolution, domesticating our species, and making us smarter. Princeton University Press.
Henrich, J. & Gil-White, F. J. (2001) The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior 22(3):165–96.
Hewlett, B. S. (1994) Intimate fathers: The nature and context of Aka pygmy paternal infant care. University of Michigan Press.
Hewlett, B. S. (2017) Hunter-gatherer childhoods: Evolutionary, developmental, and cultural perspectives. Routledge.
Heyes, C. (2018b) Cognitive gadgets: The cultural evolution of thinking. Harvard University Press.
Heyes, C. M. & Frith, C. D. (2014) The cultural evolution of mind reading. Science 344(6190):1243091.
Hohwy, J. (2013) The predictive mind. Oxford University Press.
Howes, D. (2011) Reply to Tim Ingold. Social Anthropology 19(3):318–22.
Hrdy, S. B. (2011) Mothers and others. Harvard University Press.
Huneman, P. & Machery, E. (2015) Evolutionary psychology: Issues, results, debates. In: Handbook of evolutionary thinking in the sciences, eds. Heams, T., Huneman, P., Lecointre, G. & Silberstein, M., pp. 647–57. Springer.
Hutto, D. & Myin, E. (2013) Radical enactivism: Basic minds without content. MIT Press.
Hutto, D. & Myin, E. (2017) Evolving enactivism: Basic minds meet content. MIT Press.
Hutto, D. & Satne, G. (2015) The natural origins of content. Philosophia 43(3):521–36.
Hutto, D. D. (2012) Folk psychological narratives: The sociocultural basis of understanding reasons. MIT Press.
Hutto, D. D., Kirchhoff, M. D. & Myin, E. (2014) Extensive enactivism: Why keep it all in? Frontiers in Human Neuroscience 8:706.
Huys, Q. J. M., Maia, T. V. & Frank, M. J. (2016) Computational psychiatry as a bridge from neuroscience to clinical applications. Nature Neuroscience 19(3):404–13.
Ignatow, G. (2009) Why the sociology of morality needs Bourdieu's habitus. Sociological Inquiry 79:98–114. http://onlinelibrary.wiley.com/doi/10.1111/j.1475-682X.2008.00273.x/full.
Ingold, T. (2001) From the transmission of representations to the education of attention. In: Debated Mind: Evolutionary Psychology versus Ethnography, ed. Whitehouse, H., pp. 113–53. Berg.
Ingold, T. (2016) Lines: A brief history. Routledge.
Jack, A. I. (2014) A scientific case for conceptual dualism: The problem of consciousness and the opposing domains hypothesis. Oxford Studies in Experimental Philosophy 1:1–32.
Joffily, M. & Coricelli, G. (2013) Emotional valence and the free-energy principle. PLoS Computational Biology 9(6):e1003094. https://doi.org/10.1371/journal.pcbi.1003094.
Johnson, S. C., Dweck, C. S. & Chen, F. S. (2007) Evidence for infants’ internal working models of attachment. Psychological Science 18(6):501–502.
Kahneman, D. (2011) Thinking, fast and slow. Macmillan.
Kaplan, R. & Friston, K. J. (2018) Planning and navigation as active inference. Biological Cybernetics 112:323–43. https://doi.org/10.1007/s00422-018-0753-2.
Kaufmann, L. & Clément, F. (2014) Wired for society: Cognizing pathways to society and culture. Topoi. An International Review of Philosophy 33(2):459–75.
Keane, W. (2015) Varieties of ethical stance. In: Four lectures on ethics: Anthropological perspectives, eds. Lambek, M., Das, V., Fassin, D. & Keane, W. Hau Books.
Kelly, D., Faucher, L. & Machery, E. (2010) Getting rid of racism: Assessing three proposals in light of psychological evidence. Journal of Social Philosophy 41(3):293–322.
Kiebel, S. J., Daunizeau, J. & Friston, K. J. (2008) A hierarchy of time-scales and the brain. PLoS Computational Biology 4(11):e1000209. https://doi.org/10.1371/journal.pcbi.1000209.
Kiebel, S. J. & Friston, K. J. (2011) Free energy and dendritic self-organization. Frontiers in Systems Neuroscience 5:80.
Kinzler, K. D., Dupoux, E. & Spelke, E. S. (2007) The native language of social cognition. Proceedings of the National Academy of Sciences of the United States of America 104(30):12577–80.
Kinzler, K. D. & Spelke, E. S. (2011) Do infants show social preferences for people differing in race? Cognition 119(1):1–9.
Kirmayer, L. J. (1989) Cultural variations in the response to psychiatric disorders and emotional distress. Social Science & Medicine 29(3):327–39.
Kirmayer, L. J. (2015) Re-visioning psychiatry: Toward an ecology of mind in health and illness. In: Re-visioning psychiatry: Cultural phenomenology, critical neuroscience and global mental health, eds. Kirmayer, L. J., Lemelson, R. & Cummings, C. A., pp. 622–60. Cambridge University Press.
Kirmayer, L. J. & Gold, I. (2011) Re-socializing psychiatry: Critical neuroscience and the limits of reductionism. In: Critical Neuroscience, eds. Choudhury, S. & Slaby, J., pp. 305–30. Wiley-Blackwell.
Kirmayer, L. J., Gomez-Carrillo, A. & Veissière, S. P. L. (2017) Culture and depression in global mental health: An ecosocial approach to the phenomenology of psychiatric disorders. Social Science and Medicine 183:163–68.
Kirmayer, L. J., Lemelson, R. & Cummings, C. A. (2015) Re-visioning psychiatry: Cultural phenomenology, critical neuroscience, and global mental health. Cambridge University Press.
Kirmayer, L. J. & Ramstead, M. J. D. (2017) Embodiment and enactment in cultural psychiatry. In: Embodiment, enaction, and culture: Investigating the constitution of the shared world, eds. Durt, C., Fuchs, T. & Tewes, C., pp. 397–422. MIT Press.
Kirmayer, L. J. & Young, A. (1998) Culture and somatization: Clinical, epidemiological, and ethnographic perspectives. Psychosomatic Medicine 60(4):420–30.
Kiverstein, J., Miller, M. & Rietveld, E. (2019) The feeling of grip: Novelty, error dynamics and the predictive brain. Synthese 196(7):2847–69. https://doi.org/10.1007/s11229-017-1583-9.
Kurzban, R. & Neuberg, S. (2005) Managing ingroup and outgroup relationships. In: The handbook of evolutionary psychology, ed. Buss, D. M., pp. 653–75. Wiley.
Lakoff, G. & Johnson, M. (1980) The metaphorical structure of the human conceptual system. Cognitive Science 4(2):195–208.
Laland, K. N. (2018) Darwin's unfinished symphony: How culture made the human mind. Princeton University Press.
Laland, K. N., Uller, T., Feldman, M. W., Sterelny, K., Müller, G. B., Moczek, A., Jablonka, E. & Odling-Smee, J. (2015) The extended evolutionary synthesis: Its structure, assumptions and predictions. Proceedings of the Royal Society B: Biological Sciences 282(1813):20151019.
Lebois, L. A. M., Wilson-Mendenhall, C. D., Simmons, W. K., Feldman Barrett, L. & Barsalou, L. W. (in press) Learning situated emotions. Neuropsychologia.
Levy, R. I. (1975) Tahitians: Mind and experience in the Society Islands. University of Chicago Press.
Levy, R. I. (1984) Emotion, knowing and culture. In: Culture theory: Essays on mind, self, and emotion, eds. Shweder, R. & LeVine, R., pp. 214–37. Cambridge University Press.
Luhrmann, T. (2011) Toward an anthropological theory of mind. Suomen Antropologi: Journal of the Finnish Anthropological Society 36(4):5–69.
Luo, S., Li, B., Ma, Y., Zhang, W., Rao, Y. & Han, S. (2015) Oxytocin receptor gene and racial ingroup bias in empathy-related brain activity. NeuroImage 110:22–31.
Luo, Y. & Baillargeon, R. (2005) Can a self-propelled box have a goal? Psychological reasoning in 5-month-old infants. Psychological Science 16(8):601–608.
Lutz, A., Brefczynski-Lewis, J., Johnstone, T. & Davidson, R. J. (2008) Regulation of the neural circuitry of emotion by compassion meditation: Effects of meditative expertise. PLoS ONE 3(3):e1897.
Machery, E. (2016) De-Freuding implicit attitudes. In: Implicit bias and philosophy. Vol. 1. Metaphysics and epistemology, eds. Brownstein, M. & Saul, J., pp. 104–29. Oxford University Press.
Machery, E. & Faucher, L. (2017) Why do we think racially? Culture, evolution, and cognition. In: Handbook of categorization in cognitive science, 2nd edition, eds. Cohen, H. & Lefebvre, C., pp. 1135–75. Elsevier.
Madoka, M. (2003) Haiku. In: Far beyond the field: Haiku by Japanese women, ed. Ueda, M., p. 232. Columbia University Press.
Mahajan, N. & Woodward, A. (2009) Seven-month-old infants selectively reproduce the goals of animate but not inanimate agents. Infancy 14(6):667–79.
Malafouris, L. (2015) Metaplasticity and the primacy of material engagement. Time and Mind 8(4):351–71.
Mameli, M. (2001) Mindreading, mindshaping, and evolution. Biology and Philosophy 16(5):595–626.
Mauss, M. (1973) Techniques of the body. Economy and Society 2(1):70–88.
McCauley, R. N. & Henrich, J. (2006) Susceptibility to the Müller-Lyer illusion, theory-neutral observation, and the diachronic penetrability of the visual input system. Philosophical Psychology 19(1):79–101.
McGeer, V. (2007) The regulative dimension of folk psychology. In: Folk psychology re-assessed, eds. Hutto, D. D. & Ratcliffe, M., pp. 137–56. Springer. https://doi.org/10.1007/978-1-4020-5558-4_8.
Menary, R. (2010) The extended mind and cognitive integration. In: The extended mind, ed. Menary, R., pp. 267–88. MIT Press.
Mercier, H. & Sperber, D. (2017) The enigma of reason. Harvard University Press.
Michael, J., Christensen, W. & Overgaard, S. (2014) Mindreading as social expertise. Synthese 191(5):817–40.
Milgram, S. (1963) Behavioral study of obedience. Journal of Abnormal Psychology 67:371–78.
Miresco, M. J. & Kirmayer, L. J. (2006) The persistence of mind-brain dualism in psychiatric reasoning about clinical scenarios. American Journal of Psychiatry 163(5):913–18.
Mirza, M. B., Adams, R. A., Mathys, C. D. & Friston, K. J. (2016) Scene construction, visual foraging, and active inference. Frontiers in Computational Neuroscience 10(56):1–16. https://doi.org/10.3389/fncom.2016.00056.
Montague, P. R., Dolan, R. J., Friston, K. J. & Dayan, P. (2012) Computational psychiatry. Trends in Cognitive Sciences 16(1):72–80.
Morgan, T. J. H. & Laland, K. N. (2012) The biological bases of conformity. Frontiers in Neuroscience 6:87. https://doi.org/10.3389/fnins.2012.00087.
Navarrete, C. D. & Fessler, D. M. T. (2005) Normative bias and adaptive challenges: A relational approach to coalitional psychology and a critique of terror management theory. Evolutionary Psychology 3(1):297–325.
Odling-Smee, J., Laland, K. N. & Feldman, M. W. (2003) Niche construction: The neglected process in evolution. Princeton University Press.
Olson, K. R. & Spelke, E. S. (2008) Foundations of cooperation in young children. Cognition 108(1):222–31.
Onishi, K. H. & Baillargeon, R. (2005) Do 15-month-old infants understand false beliefs? Science 308(5719):255–58.
Oudeyer, P.-Y. & Kaplan, F. (2007) What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics 1:6.
Parr, T. & Friston, K. J. (2017a) Uncertainty, epistemics and active inference. Journal of the Royal Society Interface 14(136):20170376. https://doi.org/10.1098/rsif.2017.0376.
Parr, T. & Friston, K. J. (2017b) Working memory, attention, and salience in active inference. Scientific Reports 7(1):14678.
Parr, T. & Friston, K. J. (2019) Attention or salience? Current Opinion in Psychology 29:1–5. https://doi.org/10.1016/j.copsyc.2018.10.006.
Pauker, K., Williams, A. & Steele, J. R. (2016) Children's racial categorization in context. Child Development Perspectives 10(1):33–38.
Pezzulo, G., Cartoni, E., Rigoli, F., Pio-Lopez, L. & Friston, K. J. (2016) Active inference, epistemic value, and vicarious trial and error. Learning & Memory 23(7):322–38.
Pezzulo, G. & Cisek, P. (2016) Navigating the affordance landscape: Feedback control as a process model of behavior and cognition. Trends in Cognitive Sciences 20(6):414–24.
Phillips, M. L., Young, A. W., Senior, C., Brammer, M., Andrew, C., Calder, A. J., Bullmore, E. T., Perrett, D. I., Rowland, D., Williams, S. C. R., Gray, J. A. & David, A. S. (1997) A specific neural substrate for perceiving facial expressions of disgust. Nature 389(6650):495–98.
Pinker, S. (1999) How the mind works. Annals of the New York Academy of Sciences 882:119–27, discussion 128–34.
Pinker, S. (2003) Language as an adaptation to the cognitive niche. In: Language evolution: States of the art, eds. Kirby, S. & Christiansen, M. H., pp. 16–37. Oxford University Press.
Poerio, G. L. & Smallwood, J. (2016) Daydreaming to navigate the social world: What we know, what we don't know, and why it matters. Social and Personality Psychology Compass 10(11):605–18.
Ramsey, W. M. (2007) Representation reconsidered. Cambridge University Press.
Ramstead, M. J. D., Badcock, P. B. & Friston, K. J. (2018) Answering Schrödinger's question: A free-energy formulation. Physics of Life Reviews 24:1–16. https://doi.org/10.1016/j.plrev.2017.09.001.
Ramstead, M. J. D., Constant, A., Badcock, P. B. & Friston, K. (2019a) Variational ecology and the physics of sentient systems. Physics of Life Reviews 31:188–205.
Ramstead, M. J. D., Veissière, S. P. L. & Kirmayer, L. J. (2016) Cultural affordances: Scaffolding local worlds through shared intentionality and regimes of attention. Frontiers in Psychology 7:1090.
Rietveld, E. & Brouwers, A. A. (2017) Optimal grip on affordances in architectural design practices: An ethnography. Phenomenology and the Cognitive Sciences 16(3):545–64.
Rietveld, E. & Kiverstein, J. (2014) A rich landscape of affordances. Ecological Psychology 26(4):325–52. https://doi.org/10.1080/10407413.2014.958035.
Robbins, J. (2008) On not knowing other minds: Confession, intention, and linguistic exchange in a Papua New Guinea community. Anthropological Quarterly 81(2):421–29.
Robbins, J., Cassaniti, J. & Luhrmann, T. M. (2011) The constitution of mind: What's in a mind? Interiority and boundedness. Suomen Antropologi 36(4):15–20.
Robbins, J. & Rumsey, A. (2008) Introduction: Cultural and linguistic anthropology and the opacity of other minds. Anthropological Quarterly 81(2):407–20.
Roepstorff, A., Niewöhner, J. & Beck, S. (2010) Enculturing brains through patterned practices. Neural Networks 23(8–9):1051–59.
Rosaldo, M. Z. (1982) The things we do with words: Ilongot speech acts and speech act theory in philosophy. Language in Society 11(2):203–37.
Rozin, P., Haidt, J. & Fincher, K. (2009) Psychology: From oral to moral. Science 323(5918):1179–80.
Rumsey, A. (2013) Intersubjectivity, deception and the “opacity of other minds”: Perspectives from Highland New Guinea and beyond. Language & Communication 33(3):326–43.
Schilbach, L. (2016) Towards a second-person neuropsychiatry. Philosophical Transactions of the Royal Society B: Biological Sciences 371(1686):20150081. https://doi.org/10.1098/rstb.2015.0081.
Schmidhuber, J. (2006) Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connection Science 18(2):173–87.
Schwartenbeck, P. & Friston, K. (2016) Computational phenotyping in psychiatry: A worked example. eNeuro 3(4):ENEURO.0049-16.2016.
Seligman, R., Choudhury, S. & Kirmayer, L. J. (2016) Locating culture in the brain and in the world: From social categories to the ecology of mind. In: Handbook of cultural neuroscience, eds. Chiao, J. Y., Li, S. C., Seligman, R. & Turner, R., pp. 3–20. Oxford University Press.
Seth, A. K. & Friston, K. J. (2016) Active interoceptive inference and the emotional brain. Philosophical Transactions of the Royal Society B: Biological Sciences 371(1708):20160007.
Shapiro, L. (2010) Embodied cognition. Routledge.
Shelley-Tremblay, J. F. & Rosén, L. A. (1996) Attention deficit hyperactivity disorder: An evolutionary perspective. Journal of Genetic Psychology 157(4):443–53.
Spelke, E. S. & Kinzler, K. D. (2007) Core knowledge. Developmental Science 10(1):89–96.
Sperber, D. (1996) Explaining culture: A naturalistic approach. Wiley.
Sperber, D. (1997) Intuitive and reflective beliefs. Mind & Language 12(1):67–83.
Stasch, R. (2009) Society of others: Kinship and mourning in a West Papuan place. University of California Press.
Stephan, K. E., Kasper, L., Harrison, L. M., Daunizeau, J., den Ouden, H. E. M., Breakspear, M. & Friston, K. J. (2008) Nonlinear dynamic causal models for fMRI. NeuroImage 42(2):649–62.CrossRefGoogle ScholarPubMed
Sterelny, K. (2012) The evolved apprentice. MIT Press.CrossRefGoogle Scholar
Stotz, K. (2017) Why developmental niche construction is not selective niche construction: And why it matters. Interface Focus 7(5):20160157.CrossRefGoogle Scholar
Stotz, K. & Griffiths, P. E. (2017) A developmental systems account of human nature. In: Why we disagree about human nature, eds. Lewens, T. & Hannon, E., pp. 5875. Oxford University Press.Google Scholar
Stout, D. & Chaminade, T. (2007) The evolutionary neuroscience of tool making. Neuropsychologia 45(5):1091–100.CrossRefGoogle ScholarPubMed
Stout, D., Toth, N., Schick, K. & Chaminade, T. (2008) Neural correlates of early Stone Age toolmaking: Technology, language and cognition in human evolution. Philosophical Transactions of the Royal Society B: Biological Sciences 363(1499):1939–49.CrossRefGoogle ScholarPubMed
Sutton, J. (2010) Exograms and interdisciplinarity: History, the extended mind, and the civilizing process. In: The extended mind, ed. Menary, R., pp. 189–225. MIT Press.
Swami, V., Frederick, D. A., Aavik, T., Alcalay, L., Allik, J., Anderson, D., Andrianto, S., Arora, A., Brännström, A., Cunningham, J., Danel, D., Doroszewicz, K., Forbes, G. B., Furnham, A., Greven, C. U., Halberstadt, J., Hao, S., Haubner, T., Hwang, C. S., Inman, M., Jaafar, J. L., Johansson, J., Jung, J., Keser, A., Kretzschmar, U., Lachenicht, L., Li, N. P., Locke, K., Lönnqvist, J. E., Lopez, C., Loutzenhiser, L., Maisel, N. C., McCabe, M. P., McCreary, D. R., McKibbin, W. F., Mussap, A., Neto, F., Nowell, C., Alampay, L. P., Pillai, S. K., Pokrajac-Bulian, A., Proyer, R. T., Quintelier, K., Ricciardelli, L. A., Rozmus-Wrzesinska, M., Ruch, W., Russo, T., Schütz, A., Shackelford, T. K., Shashidharan, S., Simonetti, F., Sinniah, D., Swami, M., Vandermassen, G., van Duynslaeger, M., Verkasalo, M., Voracek, M., Yee, C. K., Zhang, E. X., Zhang, X. & Zivcic-Becirevic, I. (2010) The attractive female body weight and female body dissatisfaction in 26 countries across 10 world regions: Results of the International Body Project I. Personality & Social Psychology Bulletin 36(3):309–25.
Swanson, J., Moyzis, R., Fossella, J., Fan, J. & Posner, M. I. (2002) Adaptationism and molecular biology: An example based on ADHD. Behavioral and Brain Sciences 25(4):530–31.
Taylor, C. (2016) The language animal. Harvard University Press.
Timmermans, B., Schilbach, L., Pasquali, A. & Cleeremans, A. (2012) Higher order thoughts in action: Consciousness as an unconscious re-description process. Philosophical Transactions of the Royal Society B: Biological Sciences 367(1594):1412–23. https://doi.org/10.1098/rstb.2011.0421.
Tomasello, M. (2009) Why we cooperate. MIT Press.
Tomasello, M. (2014) A natural history of human thinking. Harvard University Press.
Tomasello, M., Carpenter, M., Call, J., Behne, T. & Moll, H. (2005) Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences 28(5):675–91, discussion 691–735.
Tovo-Rodrigues, L., Rohde, L. A., Menezes, A. M. B., Polanczyk, G. V., Kieling, C., Genro, J. P., Anselmi, L. & Hutz, M. H. (2013) DRD4 rare variants in attention-deficit/hyperactivity disorder (ADHD): Further evidence from a birth cohort study. PLoS ONE 8(12):e85164.
Trivers, R. (2000) The elements of a scientific theory of self-deception. Annals of the New York Academy of Sciences 907:114–31.
Tschacher, W. & Haken, H. (2007) Intentionality in non-equilibrium systems? The functional aspects of self-organized pattern formation. New Ideas in Psychology 25(1):1–15.
Tybur, J. M., Lieberman, D., Kurzban, R. & DeScioli, P. (2013) Disgust: Evolved function and structure. Psychological Review 120(1):65–84.
Van Dijk, L. & Rietveld, E. (2017) Foregrounding sociomaterial practice in our understanding of affordances: The skilled intentionality framework. Frontiers in Psychology 7:1969. doi:10.3389/fpsyg.2016.01969.
Veissière, S. (2016) Varieties of tulpa experiences: The hypnotic nature of human sociality, personhood, and interphenomenality. In: Hypnosis and Meditation: Towards an Integrative Science of Conscious Planes, eds. Raz, A. & Lifshitz, M., pp. 55–76. Oxford University Press.
Veissière, S. (2018) Cultural Markov blankets? Mind the other minds gap! Comment on “Answering Schrödinger's question: A free-energy formulation” by Maxwell James Désormeau Ramstead et al. Physics of Life Reviews 24:47–49.
von der Lühe, T., Manera, V., Barisic, I., Becchio, C., Vogeley, K. & Schilbach, L. (2016) Interpersonal predictive coding, not action perception, is impaired in autism. Philosophical Transactions of the Royal Society B: Biological Sciences 371(1693):20150373. https://doi.org/10.1098/rstb.2015.0373.
Wang, Y.-G., Wang, Y.-Q., Chen, S.-L., Zhu, C.-Y. & Wang, K. (2008) Theory of mind disability in major depression with or without psychotic symptoms: A componential view. Psychiatry Research 161(2):153–61.
Whiten, A. & Erdal, D. (2012) The human socio-cognitive niche and its evolutionary origins. Philosophical Transactions of the Royal Society B: Biological Sciences 367(1599):2119–29.
Williams, B. (2011) Ethics and the limits of philosophy. Taylor & Francis.
Wright, L. T., Nancarrow, C. & Kwok, P. M. H. (2001) Food taste preferences and cultural influences on consumption. British Food Journal 103(5):348–57.
Zatzick, D. F. & Dimsdale, J. E. (1990) Cultural variations in response to painful stimuli. Psychosomatic Medicine 52(5):544–57.
Zawidzki, T. W. (2008) The function of folk psychology: Mind reading or mind shaping? Philosophical Explorations: An International Journal for the Philosophy of Mind and Action 11(3):193–210.
Zawidzki, T. W. (2013) Mindshaping: A new framework for understanding human social cognition. MIT Press.

Figure 1. Self-evidencing and the Bayesian brain. Upper panel: Schematic of the quantities that define an agent and its coupling to the world. These quantities include the internal states of the agent (e.g., a brain) and quantities describing exchange with the world – namely, sensory input and action that changes the way the environment is sampled. The environment is described by equations of motion that specify the dynamics of (hidden) states of the world. Internal states and action both change to minimize free energy or self-information, which is a function of sensory input and a probabilistic belief encoded by the internal states. Lower panel: Alternative expressions for free energy illustrating what its minimization entails. For action, free energy (i.e., self-information) can only be suppressed by increasing the accuracy of sensory data (i.e., selectively sampling data that are predicted). Conversely, optimizing internal states makes the representation an approximate conditional density on the causes of sensory input (by minimizing a Kullback-Leibler divergence between the approximate and true posterior density). This optimization makes the free-energy bound on self-information tighter and enables action to avoid surprising sensations (because the divergence can never be less than zero). When selecting actions that minimize the expected free energy, the expected divergence becomes (negative) epistemic value or salience, whereas the expected surprise becomes (negative) extrinsic value – namely, the expected likelihood that prior preferences will be realized following an action. See the Appendix for a technical explanation – and description of the variables in this figure.
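
The decompositions described in the lower panel can be written out as follows. This is a sketch in standard variational notation, and the symbols are assumptions rather than those used in the figure itself: $s$ denotes sensory input, $\psi$ the hidden states of the world, $q(\psi)$ the belief encoded by internal states, $p(s,\psi)$ the generative model, and $\pi$ a policy (sequence of actions) with future outcomes $o$.

```latex
% Variational free energy and two equivalent rearrangements
\begin{align}
F &= \mathbb{E}_{q(\psi)}\big[\ln q(\psi) - \ln p(s,\psi)\big] \\
  &= \underbrace{D_{KL}\big[q(\psi)\,\|\,p(\psi \mid s)\big]}_{\text{divergence}\;\geq\;0}
     \;\underbrace{-\,\ln p(s)}_{\text{surprise (self-information)}} \\
  &= \underbrace{D_{KL}\big[q(\psi)\,\|\,p(\psi)\big]}_{\text{complexity}}
     \;-\;\underbrace{\mathbb{E}_{q(\psi)}\big[\ln p(s \mid \psi)\big]}_{\text{accuracy}}
\end{align}

% One common form of the expected free energy of a policy
\begin{align}
G(\pi) &= -\underbrace{\mathbb{E}_{q(o,\psi \mid \pi)}\big[\ln q(\psi \mid o,\pi) - \ln q(\psi \mid \pi)\big]}_{\text{epistemic value}}
          \;-\;\underbrace{\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o)\big]}_{\text{extrinsic value}}
\end{align}
```

Read this way, the second line shows why free energy is an upper bound on surprise (the divergence cannot be negative), and the third line shows why action, which can only change $s$, reduces free energy through the accuracy term alone.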

Figure 2. Cultural affordances. A schematic illustration of the looping effects that modulate social learning by human agents through expectations that, in turn, enable their interaction with cultural affordances. The attentional processes of individual agents are modulated by regimes of attention and by the shared expectations, norms, and conventions that characterize their local culture. In this example, the key point is that the yellow arrows effectively bias self-evidencing towards or away from (certain kinds of) sensory evidence – and that the optimal selection (i.e., salience) has to be both learned and learnable in the right sort of cultural context. Adapted from Ramstead et al. (2016).

Figure 3. Summary of the variational approach to niche construction. As in Figure 1, internal states and action change to minimize free energy based on sensations and beliefs. Heuristically, one can think of niche construction as the process whereby the agent's action creates a symmetry between internal and external states. The agent changes the statistical structure of the world as it acts on the world. The statistical structure of the world here simply refers to the actual probability of finding some causes of outcomes at a given location in the environment (e.g., the bread being the cause of pleasant smell in the bakery). From the point of view of niche construction, such probability changes as a function of the agent's action and in a way that is consistent with the agent's beliefs. Indeed, a simple consequence of agents acting to optimize action based on beliefs is that the traces produced by agents’ action will tend to be consistent with their beliefs. Another intriguing consequence of this is that, over time, traces in the world will effectively “learn” agents’ beliefs, in the sense that those traces will encode statistical regularities that relate to those beliefs. For example, consider a well-worn path cut through the grass in the park. Such a “desire path” encodes a robust probability that the location of the path in the environment will map onto the outcome “being walked on.” The value of that probability mapping increases over time as people wear down the path. This means that changes in the niche mirror changes in agents’ beliefs enacted via action. With the mathematical apparatus of the free-energy principle, one can model “environmental learning” about the agents’ action in the same way that one models “agents’ learning” of the environment's sensory causes. The only twist is that the quantities are inverted (compare blue and green vs. yellow and red boxes). From the point of view of the environment's generative process, actions play the same role as sensations in the agent's generative model (for a detailed mathematical description, see Bruineberg et al. 2018; Constant et al. 2018b).
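
The “desire path” example can be illustrated with a toy simulation. The sketch below is not the model of Bruineberg et al. or Constant et al.; it is a minimal illustration under stated assumptions: locations are discrete, the environment “learns” by accumulating pseudo-counts of footfall, and agents' beliefs about where to walk are their fixed preferences weighted by the visible wear. The names `simulate_desire_path`, `preferences`, and `wear` are hypothetical, and expected footfall is used instead of random sampling so that the result is deterministic.

```python
def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def simulate_desire_path(preferences, steps=50, prior_count=1.0):
    """Toy loop between agents' beliefs and traces left in the niche.

    preferences: assumed fixed prior preference of agents for each location.
    Returns, for each step, the environment's probability of "being walked on"
    at every location (normalised wear).
    """
    # Environment's pseudo-counts of footfall, initially uniform (assumption).
    wear = [prior_count] * len(preferences)
    history = []
    for _ in range(steps):
        # Agents: belief about where to walk combines preference with visible wear.
        belief = normalise([p * w for p, w in zip(preferences, wear)])
        # Action leaves traces: expected footfall is added to each location.
        wear = [w + b for w, b in zip(wear, belief)]
        # Environment: its "learned" probability of being walked on.
        history.append(normalise(wear))
    return history


if __name__ == "__main__":
    history = simulate_desire_path([0.2, 0.6, 0.2], steps=30)
    print("first step:", [round(p, 3) for p in history[0]])
    print("last step: ", [round(p, 3) for p in history[-1]])
```

In this sketch the probability attached to the preferred location rises at every step, which mirrors the caption's point that traces in the niche come to encode regularities consistent with agents' beliefs, while those traces in turn shape what agents expect.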

Figure 4. Thinking through other minds (see Figs. 1 and 3 for the equations). This figure depicts the loop between action, sensations, and niche construction that leads to the acquisition and production of cultural habits, and to inference and learning about other minds. The shared epistemic resources in the constructed niche (i.e., external states modified by actions from agents 1 to n) and the regimes of attention (i.e., internal states) constitute the domains of statistical regularities that tune to one another via physical engagement with the niche. Those domains are finessed (i.e., through mutual learning of internal and external states) by a community of practices (agents from 1 to n) over ontogenetic (e.g., over development) and phylogenetic timescales (e.g., via the inheritance of material resources). The learning and deployment of internal and external domains of statistical regularities is what we call “thinking through other minds” (TTOM). TTOM entails, and depends on, the production of culturally patterned practices. Cultural practices and associated artefacts are epistemic resources that guide the attention (and learning) of members in the community by shaping sensory perception.