1. Introduction
“Analogy lies at the core of human cognition,” as Holyoak et al. (2001, p. 2) point out. Analogies underlie creative thought and problem solving, and as such are implicated in virtually all aspects of human life. Analogies are found in science (e.g., comparing an atom with the solar system), in politics (e.g., the first President Bush comparing Saddam Hussein with Adolf Hitler), and in everyday living. It is not altogether surprising, then, that analogy is an equally powerful force in cognitive development. Children use analogy to extend their knowledge about the biological, physical, and psychological world and to solve problems (e.g., Brown & Kane 1988; Holyoak et al. 1984; Inagaki & Hatano 1987; Pauen & Wilkening 1997). Spontaneous analogies have been observed in very young children (Pauen & Wilkening 1997; Tunteler & Resing 2002), and there is even some evidence that infants are able to reason by analogy from around their first birthday (Chen et al. 1997).
Given its importance to cognition, it is equally unsurprising that analogy has been the focus of a number of detailed theoretical accounts, many of which have been implemented as working computational models (for a recent review, see French 2002). The majority of this work has focused on adult reasoning, and existing developmental accounts are largely adaptations of adult models (e.g., Gentner et al. 1995). From a developmental perspective, these accounts are wanting in that they posit specific mechanisms (e.g., structure-mapping) with no plausible explanation as to how such mechanisms arise during cognitive development.
The work presented here is an attempt to fill this gap by providing a theory of the emergence of analogical reasoning abilities. The theory suggests that basic analogical abilities may arise from the normal functioning of a memory system as its domain knowledge increases. Importantly, no special additional mechanisms are required to deal with simple analogies. Rather, the ability to generate and use such analogies is argued to arise from the priming of relations that hold between terms in the analogy. The account thus illustrates how a putative complex skill could emerge out of relatively simple mechanisms. Our approach is comparable to emergentist theories of other cognitive skills such as language, which is sometimes envisaged as a “new machine constructed entirely out of old parts” (Bates & MacWhinney 1989). A further consequence of our approach is that analogy may best be understood not as a uniform cognitive skill but, instead, as an umbrella term that describes different task-specific constellations of basic memory and control processes.
The rest of this target article unfolds as follows. First, we present a brief overview of the current state of research into analogical reasoning and its development. In the second section we consider key aspects of current accounts of analogical reasoning and consider why these are difficult to reconcile with some aspects of cognitive development. We then present our suggestions for a more developmentally constrained theory based on priming within a semantic memory system. Subsequently, this verbal theory is implemented in a model that, although simple, illustrates how our account of analogical completion functions. We then demonstrate how this account is able to tie together a wide range of developmental findings into a single explanatory framework. Finally, we consider the theoretical implications of the model for the development of analogical reasoning.
1.1. Key features in the development of analogical reasoning
Because we are primarily concerned with development, we start by considering general accounts (and evidence) of how analogical reasoning develops before going on to consider more detailed and specific (largely adult) accounts of analogy. One approach (e.g., Halford et al. 1998; Hummel & Holyoak 1997; Piaget et al. 1977; Sternberg & Rifkin 1979) sets analogy alongside other cognitive skills and attempts to provide a domain-general explanation for all such skills. The second approach focuses more on whether analogical reasoning arises from increasing knowledge (e.g., Brown 1989; Gentner 1989; Gentner et al. 1995; Goswami 1992; 1996). While reviewing the experimental results from both approaches in sections 1.1.1–1.1.4, we enumerate the key phenomena that a developmental account of analogy must capture.
1.1.1. Analogy as a domain-general cognitive skill
Early research into the development of analogical reasoning (e.g., Piaget et al. 1977; Sternberg & Rifkin 1979) found scant direct evidence of analogy use by young children. This was generally taken as evidence of domain-general structural changes in children's reasoning abilities. For example, Piaget tested 5- to 12-year-old children on picture-based “a is to b as c is to what?” analogies and found only occasional and uncertain evidence of analogical reasoning (Piaget et al. 1977). Piaget interpreted this as suggesting that analogical development should be understood in terms of his more general account of the development of logical reasoning. In a similar vein, Sternberg and colleagues (Sternberg & Nigro 1980; Sternberg & Rifkin 1979) used children's reaction time data to argue that there was an age-modulated shift from solving analogies using largely associative strategies to using more genuine analogical reasoning strategies. However, both Piaget's and Sternberg's experimental results, and consequent theoretical positions, have been criticized on the grounds that they failed to take into account the children's knowledge of the relations underlying the analogies. Consequently, they greatly underestimated children's analogical reasoning abilities (Goswami 1991).
Some more recent theorists have also argued that domain-general changes have a particularly important role in young children's emerging analogical reasoning abilities. These accounts focus on the development of capacity limits in active memory instead of structural changes in underlying reasoning mechanisms. Halford argues (see Andrews & Halford 2002; Andrews et al. 2003; Halford 1993; Halford et al. 1998) that one of the most fundamental constraints acting on cognitive development is the maximum relational complexity that can be processed in parallel in working memory (see also Hummel & Holyoak [1997] for a similar account concerning the LISA model, discussed in section 1.2.1 of the target article). Halford and colleagues define complexity as “the number of related dimensions or sources of variation” (Halford et al. 1998, p. 803). Tasks involving one source of variation start to be processed around the first birthday. Binary relations (i.e., with two sources of variation), including rudimentary analogical reasoning, can be understood by about 2 years of age. By age 5 children are able to process ternary relations and so are able to demonstrate skills such as transitivity. For example, Richland et al. (2006) tested 3- to 14-year-old children on analogical mappings between pictures that varied in the number of relations that had to be integrated and the presence or absence of perceptual distractors. Although the youngest children performed well on the simpler analogies, relational complexity severely disrupted their mapping success, an effect that diminished with age. The authors interpret these results as evidence that the general maturation of working memory and inhibitory mechanisms is at the heart of increased performance on analogical reasoning tasks.
Finally, it is worth pointing out that these conclusions are not uncontested. Indeed, Goswami (1998) and Gentner and Rattermann (1998) provide some evidence that children younger than 5 years of age are able to form analogies involving ternary relations, while Halford et al. (1998) argue that these latter findings can be explained in terms of decomposing ternary relations into binary relations.
1.1.2. The development of analogical completion as knowledge accretion
In contrast to the early null findings, recent researchers have demonstrated analogical reasoning very early in development. Most strikingly, Chen et al. (1997) demonstrated 10- and 13-month-old infants' ability to use analogy to solve a simple task. Here, the infant's parent modeled a task where the infant had to combine two sub-goals to reach a toy – removing a barrier, pulling a cloth, and then pulling a string to reach the toy. The infants subsequently had to disregard superficial similarities to transfer the parent-modeled solution to a novel task involving the same underlying structure. This result suggests that at least the precursors of analogical reasoning are present from before the first birthday. Similar studies have demonstrated that 17- to 36-month-olds (see Brown 1989) and 2- to 4-year-olds (Crisafi & Brown 1986) benefit from analogical transfer in simple problem solving paradigms. Furthermore, children from 3 to 4 years of age can, given sufficiently familiar domains, solve more complex “a is to b as c is to what?” analogies (Goswami & Brown 1989; 1990; Rattermann & Gentner 1998a).
Results such as these have been taken as evidence that the crucial constraint on analogical development is the knowledge that the child has, not some kind of general structural change (e.g., Goswami 1992). As children's knowledge about the world becomes richer, they can better use this knowledge to form and understand analogies. It is worth noting that there is no inherent contradiction between domain-general changes in processing relational complexity and knowledge accretion. Indeed, domain-general accounts also acknowledge a strong role for knowledge accretion as a driving force in analogical development. However, a substantial difference between the positions of Halford and colleagues and that of Goswami is that the latter places a far greater importance on the development of relational representations and downplays the importance of maturational change in working memory capacity.
Observation 1: There is a strong relationship between accretion of relational knowledge and successful analogical reasoning (Goswami & Brown 1989; Rattermann & Gentner 1998a).
To investigate this knowledge accretion account, Goswami and Brown (1989; 1990) developed a picture-based test similar to Piaget's a:b::c:d analogies, for use with 3- to 9-year-olds. However, unlike Piaget's experiments, Goswami and Brown used relations that were familiar to the young children, such as cutting (e.g., Playdoh is to cut Playdoh as apple is to cut apple). They found evidence of analogical completion in even the youngest children and found a correlation between analogical completion and an independent test of relational knowledge. They concluded that analogical reasoning is domain specific. Children start using analogies at different points in development as a result of the development of the appropriate knowledge representations. Further evidence for knowledge constraining analogical reasoning comes from studies investigating 4-year-olds' analogical transfer in solving biological problems (Brown & Kane 1988) and physical problems (Pauen & Wilkening 1997), and young children's analogical inferences (Inagaki & Hatano 1987; Vosniadou 1989).
Observation 2: Given that children's analogical reasoning depends on their underlying domain-specific knowledge, there is domain-specific as opposed to a domain-general change over development in children's ability to form analogies (Goswami & Brown 1989).
An additional developmental phenomenon is the extent to which children use analogy spontaneously. Inagaki and Hatano (1987), Goswami and Brown (1989), and Pauen and Wilkening (1997) all reported some degree of analogical transfer in the absence of any explicit guidance. Spontaneous transfer was demonstrated more systematically in 4-year-olds by Tunteler and Resing (2002), whose results not only suggest that spontaneous analogical transfer occurs even in young children, but also that it becomes more likely with increasing experience of the problem domain – again consistent with the knowledge accretion account. Evidence for spontaneous analogical reasoning is particularly interesting in that it further suggests that analogy is an emergent phenomenon (with important developmental implications considered later).
Observation 3: Analogical ability occurs spontaneously within a domain (Goswami & Brown 1989; Inagaki & Hatano 1987; Pauen & Wilkening 1997; Tunteler & Resing 2002).
1.1.3. The relational shift and knowledge accretion
Gentner and colleagues (Gentner 1988; 1989; Gentner & Toupin 1986; Gentner et al. 1995) have suggested an account of the development of analogy based on Structure-Mapping Theory (SMT), which is discussed in more detail in section 1.2.1. A key component of this account is that children undergo a relational shift whereby their analogical reasoning changes over time from being initially based on the similarity of object attributes to gradually including relational information between objects and subsequently incorporating systems of relations. Gentner et al. (1995) proposed that this change results from children progressively re-representing relations. Children move from using first-order predicate-argument representations, for example, darker(a, b) (i.e., a is darker than b), where the dimension (darkness) and the comparison (greater than) are conflated in the same representation, to using increasingly abstract relational representations that support more complex mappings (e.g., greater[darkness(a), darkness(b)], with the comparison clearly separated from the dimension).
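The contrast between the two representational formats can be made concrete with a small sketch. The nested-tuple encoding, the function names, and the second predicate (bigger) are illustrative assumptions introduced here; they are not the notation or implementation used by Gentner et al. (1995).

```python
# Illustrative encoding only: a predicate is a tuple (name, arg1, arg2).

def conflated(name, a, b):
    """First-order form: dimension and comparison fused in one predicate."""
    return (name, a, b)

def separated(dimension, a, b):
    """Re-represented form: the comparison 'greater' applied to a dimension."""
    return ("greater", (dimension, a), (dimension, b))

def top_predicate(expr):
    """Return the outermost predicate name of an expression."""
    return expr[0]

# Conflated predicates from different dimensions share no predicate name...
dark = conflated("darker", "a", "b")
big = conflated("bigger", "c", "d")      # hypothetical second dimension
same_conflated = top_predicate(dark) == top_predicate(big)

# ...whereas the separated forms share the comparison "greater",
# which is what a cross-dimension match could latch onto.
dark_sep = separated("darkness", "a", "b")
big_sep = separated("size", "c", "d")
same_separated = top_predicate(dark_sep) == top_predicate(big_sep)

print(same_conflated, same_separated)
```

Under this encoding, only the separated form exposes a predicate common to both dimensions, which is one way to read the claim that more abstract representations support more complex mappings.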
Observation 4: Over development, children show a “relational shift,” changing their preference for judging similarity from surface similarity to relational similarity (Gentner 1988; Gentner & Toupin 1986; Rattermann & Gentner 1998a).
Evidence for the relational shift comes from a number of areas, including a replication of the Goswami and Brown (1989) study taking into account object similarity (Rattermann & Gentner 1998a), studies of children's metaphor comprehension (Gentner 1988), object similarity as a constraint on analogical problem solving (Holyoak et al. 1984), and children's performance on cross-mapping tasks. We focus specifically on the last of these. In a cross-mapping task (see Fig. 1), children must make an appropriate analogical transfer in the presence of a conflict between object similarity and relational similarity. A number of experiments (see Gentner et al. 1995) suggest that younger children find cross-mapping problems harder than older children, although the delay in cross-mapping performance may be driven by specific aspects of the task design and very rich, highly detailed stimuli. Goswami (1995) demonstrated that even very young children (3 years old) can solve the kind of simple cross-mapping tasks involving relative size represented in Figure 1. Although changes in cross-mapping ability could arise from development in domain-general abilities such as working memory capacity, they are also consistent with the hypothesis that children become better at processing relational information with age.
Observation 5: Children can solve analogies even when there is a conflict between object and relational similarity (cross-mapping), although these analogies may be harder (Gentner et al. 1995; Kotovsky & Gentner 1996; Rattermann & Gentner 1998b; although see Goswami 1995).
One interesting experimental finding is that the age at which children can solve cross-mapped analogies can be manipulated by teaching children relevant relational labels (Kotovsky & Gentner 1996; Rattermann & Gentner 1998b). With appropriate teaching of relational labels, a 3-year-old child can solve cross-mapped analogies normally only solved at 5 years or older. Similarly, Loewenstein and Gentner (2005, Experiment 3) found that preschool children who heard words for spatial relations performed better on a spatial mapping task than those who did not, and Goswami (1995) also demonstrated that 3- and 4-year-olds can use the familiar relational structure from “Goldilocks and the Three Bears” (i.e., Daddy Bear > Mummy Bear > Baby Bear) to solve transitive mapping problems. Such evidence is strongly consistent with the knowledge accretion account, although the importance of relational labels is also consistent with Halford's (e.g., Halford et al. 1998) relational complexity account of analogical development.
Observation 6: Provision of labels affects the formation of representations and subsequent performance in analogical reasoning tasks (Kotovsky & Gentner 1996; Loewenstein & Gentner 2005; Rattermann et al. 1990).
1.1.4. Indicators of discontinuous change
A very different approach to the development of analogical reasoning comes from dynamical systems theory (van der Maas & Molenaar 1992). Hosenfeld et al. (1997) investigated whether the development of analogical reasoning could be understood as a phase transition (specifically a bifurcation) in a dynamical system. Their longitudinal study with eighty 6- to 8-year-olds looked for indicators of such a discontinuous change in children's performance on twenty geometric analogies. The authors found evidence for three such indicators, all occurring at approximately the same point in development. First, a rapid improvement in children's test scores was observed, consistent with a sudden jump in children's performance. Second, Hosenfeld et al. noted an increase in the inconsistency of children's responses, which they interpreted as an increase in anomalous variation typical of transitional dynamical systems. Third, they reported evidence of a critical slowing down in solution times around the time of the sudden jump.
Observation 7: During the course of analogical development, children display several indicators of a discontinuous change (Hosenfeld et al. 1997): (i) sudden jump in correct responses, (ii) anomalous variation, and (iii) critical slowing down.
We have summarized the developmental literature into seven key phenomena that any successful theory of the development of early analogical reasoning must take into account. The primary focus of the following work is on simple analogies of the kind first solved by very young children (i.e., involving comparisons between pairs of relations) demonstrating these developmental markers. As such, we initially do not consider some aspects of more complex analogical reasoning, such as the importance of relational complexity (Andrews & Halford 2002; Andrews et al. 2003). However, in sections 4 and 5 of the target article we consider more complex analogies and analogical mapping, and we will briefly return to the issue of the interrelationship between relational complexity and analogical reasoning at that point.
1.2. Models of analogical reasoning
Having reviewed behavioral data relevant to the development of analogical reasoning, we now turn to existing accounts and computer models of analogical reasoning in general and the extent to which such accounts can capture the developmental phenomena listed in the previous section. The following review is necessarily brief, but more extensive surveys of computational models of analogy can be found in Holyoak and Barnden (1994), Barnden and Holyoak (1994), and French (2002). We start with Structure-Mapping Theory (SMT) (Gentner 1983; 1989), which has probably been the most influential theoretical position in analogical reasoning for the past two decades (for other models strongly influenced by SMT, see Forbus et al. 1994; Keane & Brayshaw 1988; Keane et al. 1994; Larkey & Love 2003). SMT is important because it illustrates several key aspects of many current theories of analogy. It has also led to a substantial body of empirical research (e.g., Bowdle & Gentner 1997; Clement & Gentner 1991; Markman & Gentner 1997).
1.2.1. Structure-Mapping Theory and related models
Structure-Mapping Theory is an account of how people use analogies to draw inferences (e.g., inferring that an electron rotates around a nucleus based on the more familiar knowledge that the planets rotate around the sun). For SMT, the actual attributes of objects such as color, size, and so on, are normally irrelevant; what is important are the relations between objects (such as Object A revolves around Object B). In SMT, mental representations are highly structured, being composed of predicates with arguments. Given this assumption, SMT distinguishes between object attributes, and relations between objects, at a purely syntactic level, with no regard for semantic content.
Analogical reasoning, under this account, involves first selecting a base domain from memory based on surface similarity, followed by creating a structural mapping between base and target. This process involves trying to put objects in the base in a one-to-one correspondence with objects in the target. Predicates between target objects are then matched with identical predicates in the base domain. Which relations are mapped from base to target is governed by a preference for systematicity among the relations, that is, a preference for higher order relations between relations. This preference determines what inferences will result from an analogy.
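The mapping step described above can be illustrated with a deliberately simplified sketch. It assumes relations are encoded as (predicate, argument, argument) tuples, searches all one-to-one object correspondences by brute force, and scores each by the number of identical predicates it preserves. The function name, the encoding, and the attracts relation are hypothetical; the sketch does not implement the systematicity preference and is not the Structure-Mapping Engine.

```python
from itertools import permutations

def best_mapping(base, target):
    """Find the one-to-one object mapping preserving the most relations.

    base, target: sets of (predicate, arg1, arg2) tuples (assumed encoding).
    Returns (mapping, number_of_matched_relations).
    """
    base_objs = sorted({o for _, x, y in base for o in (x, y)})
    target_objs = sorted({o for _, x, y in target for o in (x, y)})
    best, best_score = {}, -1
    for perm in permutations(target_objs, len(base_objs)):
        mapping = dict(zip(base_objs, perm))
        # Count base relations whose image under the mapping exists in target.
        score = sum((p, mapping[x], mapping[y]) in target for p, x, y in base)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score

# Base: solar system. Target: atom, with one relation still unknown.
solar = {("attracts", "sun", "planet"), ("revolves_around", "planet", "sun")}
atom = {("attracts", "nucleus", "electron")}

mapping, score = best_mapping(solar, atom)
# Base relations carried across but absent from the target are
# candidate inferences about the target domain.
inferences = {(p, mapping[x], mapping[y]) for p, x, y in solar} - atom
print(mapping, inferences)
```

In this toy case the only relation shared by base and target fixes the object correspondence, and the unmatched base relation is carried over as the inference that the electron revolves around the nucleus.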
Structure-Mapping Theory has been implemented in the two-stage MAC/FAC (“many are called but few are chosen”) computational model (Forbus et al. 1994). This model is a compromise between the desire to have structural analogies and the computational requirements of searching through long-term memory. The MAC component economically selects a few candidates from a vast number of options in long-term memory using a nonstructural matcher based on the number of occurrences of a predicate in a candidate. The FAC stage then uses the much more computationally expensive Structure-Mapping Engine (SME) (Falkenhainer et al. 1989) to choose the best structural match from the candidates. This two-stage process was designed to be consistent with the body of psychological research which suggests that nonstructural similarity constrains the retrieval of a base domain (e.g., Gick & Holyoak 1983; Novick 1988; but see Blanchette & Dunbar [2001; 2002] for a discussion of the role of other factors such as audience characteristics and goals on analogical reasoning, as well as the important distinction between analogical retrieval and generation).
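A cheap nonstructural filter of the kind attributed to the MAC stage can be sketched as follows. The sketch assumes that each description is summarized by a count of its predicate names and that candidates are ranked by the dot product of these counts with the probe's counts; the function names, the memory contents, and the scoring rule are assumptions made for illustration rather than details taken from Forbus et al. (1994).

```python
from collections import Counter

def content_vector(description):
    """Count predicate occurrences, ignoring arguments and structure."""
    return Counter(pred for pred, *_ in description)

def mac_select(probe, memory, k=2):
    """Return the names of the k memory items with the highest overlap score."""
    pv = content_vector(probe)

    def score(name):
        cv = content_vector(memory[name])
        # Dot product of predicate counts (missing predicates count as 0).
        return sum(pv[p] * cv[p] for p in pv)

    return sorted(memory, key=score, reverse=True)[:k]

# Hypothetical long-term memory of candidate base domains.
memory = {
    "solar_system": [("attracts", "sun", "planet"),
                     ("revolves_around", "planet", "sun")],
    "water_flow": [("flows", "water", "pipe"),
                   ("greater", "pressure_in", "pressure_out")],
    "heat_flow": [("flows", "heat", "bar"),
                  ("greater", "temp_hot", "temp_cold")],
}
probe = [("attracts", "nucleus", "electron"),
         ("revolves_around", "electron", "nucleus")]

candidates = mac_select(probe, memory, k=1)
print(candidates)
```

Only the few candidates that survive this filter would then be passed to an expensive structural matcher, which is the division of labor the two-stage design is meant to capture.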
Importantly, SMT has been used to simulate the development of analogical reasoning (Gentner et al. 1995). The same process model was tested on an analogical task under different representational assumptions, mirroring Gentner et al.'s assumptions about the knowledge base of 3- or 5-year-old children. By changing how much higher-order relational information was represented, Gentner et al. simulated key differences in performance seen in experiments with 3- and 5-year-olds. SMT demonstrated successful cross-mapping for the older children – and demonstrated a relational shift. Furthermore, by varying the abstractness of representations, Gentner et al. were able to use SME to simulate the age-related change observed in children from making only within-dimension mappings (e.g., within the dimension size) to cross-dimension mapping (e.g., mapping greater size onto greater brightness).
A consideration of SMT is important because two central features of the model – explicit, structured representations and some form of structure mapping – appear in some form or another in many other models. This can be seen even in accounts that use seemingly fundamentally different architectures. For example, the Analogical Constraint Mapping Engine (ACME; Holyoak & Thagard 1989) and Learning and Inference with Schemas and Analogies (LISA; Hummel & Holyoak 1997) are both hybrid connectionist/symbolic systems. Yet, they both focus largely on the process of mapping between base and target domains, incorporating multiple constraint satisfaction of semantic information as well as structural information. ACME (like SMT) starts with predicate-argument representations. Three constraints then govern the mapping process: structural, semantic, and pragmatic. These constraints are intended to act together as pressures to make local decisions about correspondences between elements in order to produce a psychologically plausible global mapping. LISA is an attempt to build upon ACME to incorporate more aspects of traditional connectionist models. Key to the working of LISA is that relevant objects and roles are bound via temporal synchrony (Shastri & Ajjanagadde 1993). For example, the proposition loves(Jim, Mary) is represented in LISA's semantic units by the patterns for Jim and Lover being active simultaneously while the patterns for Mary and Beloved are independently synchronously active. Hummel and Holyoak (1997) hypothesize that maturation in the number of concepts that can be simultaneously processed could explain the development of analogy. Halford et al. (1994; 1998) make similar developmental claims from a different Structured Tensor Analogical Reasoning (STAR) symbolic connectionist approach (see also Eliasmith & Thagard 2001).
The Copycat models (Hofstadter 1984; Mitchell 1993) are another example of hybrid accounts of analogical reasoning. These models involve a long-term memory, interconnected nodes forming a semantic network, and a working memory where structures representing the analogical problem are built. These models also consist of “codelets” or “agents” that cooperate and compete to construct descriptions of relationships between objects. Copycat differs from other models by actively constructing new representations. The representations are influenced by both top-down and bottom-up processes, motivating Hofstadter to describe Copycat as “high level perception” (for similar accounts, see also Barnden 1994; French 1995; Kokinov & Petrov 2001).
1.2.2. Distinguishing features of models
A useful characterization of models of analogical reasoning might split them according to how information is represented. Thus, models such as SME, ACME, LISA, and STAR, which all use structured predicate-argument type representations as input to the model, may be contrasted with models such as Copycat, which start with less structured representations. In the former type, structured representations are supplied to the model ready-made, whereas in the latter, representation formation is an integral component of the modeling process and analogy is viewed as more closely related to perceptual processes.
2. Analogy as relational priming
Although explicit mapping appears to be a near-ubiquitous feature of the theories and models surveyed above (e.g., Gentner 1983; Holyoak & Hummel 2001; Kokinov & French 2003; Novick 1988), there is evidence to suggest that it may not be necessary for successful analogical reasoning. For instance, Ripoll et al. (2003) demonstrated a dissociation between analogical mapping and analogical transfer. Explicit mapping accounts (e.g., SMT) predict that analogical transfer occurs only after a mapping is established between the base and the target. Consequently, if analogical transfer requires prior mapping, then the reaction time of analogical transfer should increase with reaction time for mapping. Ripoll et al. (2003) found that while reaction time increased in a cross-mapping condition when participants were asked to perform a mapping task, reaction time did not increase when participants were tested for analogical transfer. Therefore, there is reason to consider alternatives to explicit mapping as the fundamental mechanism behind the development of analogy-making abilities.
In the following sections we describe the two mechanisms that form the backbone of our developmental account of analogy-making. First we suggest that priming is centrally implicated in analogical reasoning. Second, we propose that relations are best represented as transformations between states, rather than as explicit symbols.
2.1. Analogical reasoning as priming
Many cases of analogy involve seeing the similarity between the prominent relation in one domain and the prominent relation in a second domain. Within traditional approaches to analogical reasoning, the cognitive mechanism for an a is to b as c is to …? analogy involves mapping the a term onto the c term and the b term onto the unknown d term and then transferring across the relation between the a and b terms to the c and d terms. We suggest that this is accomplished via a simpler mechanism based on relational priming. Put most simply, we propose that exposure to the a (e.g., puppy) and b (e.g., dog) terms of an analogy primes a semantic relation (e.g., offspring), which then biases the c term (kitten) to produce the appropriate d term (cat).
What evidence is there for this proposal? First, medium- to long-term semantic priming is a ubiquitous phenomenon that is well established in the adult literature (e.g., Becker et al. Reference Becker, Moscovitch, Behrmann and Joordens1997; see also Chapman et al. [1994] for a review of priming in children). Second, several studies now suggest a general role for priming in analogical reasoning. For example, Schunn and Dunbar (Reference Schunn and Dunbar1996) presented participants with a biochemistry problem on one day and with a genetics problem on the following day. Although the two problems were different, their solutions both involved inhibition. Participants were significantly more likely than controls to propose an inhibition solution to the genetics problem following the biochemistry problem. However, the participants did not mention the prior biochemistry problem either during the experiment or in a post-task questionnaire. Consequently, the authors explained the results as implying a form of priming. Kokinov (Reference Kokinov1990) also demonstrated priming in analogical problem solving. Prior to being given a difficult target problem (e.g., heating water in a forest), participants were primed with a different analogical problem whose commonsense solution was well known to them (e.g., heating a cup of tea in a mug). The performance of the participants rose substantially immediately after priming, returning to control levels after 24 hours.
Third, recent studies have repeatedly demonstrated the presence of relational priming under a variety of different conditions (Estes 2003; Estes & Jones 2006; Gagné 2001; 2002; Gagné & Shoben 1997; Gagné et al. 2005; Gerrig & Murphy 1992; McKoon & Ratcliff 1995; Spellman et al. 2001; Wisniewski & Love 1998), including relational priming resulting directly from analogical reasoning (Green et al. 2007). In these studies, prior exposure to a relation (e.g., by presentation of two nouns joined by a relation, such as apple and cake) facilitates subsequent judgments involving that relation (e.g., made of).
In summary, semantic priming effects are commonly reported across many areas of cognitive psychology (see Tulving & Schacter Reference Tulving and Schacter1990). More importantly, relational priming is a robust psychological phenomenon that does not require explicit strategic control. Consequently, relational priming is a choice candidate mechanism for a developmental account of analogy, emerging from simple memory processes. The ubiquity of priming effects in development suggests that it is a plausible building block for a theory that posits the emergence of analogical completion from simple cognitive mechanisms (for a similar view of the relation between inhibition – negative priming – and cognitive development, see Houdé Reference Houdé2000). Indeed, a strength of a relational priming account of analogical reasoning is that it does not posit analogy-specific mechanisms.
2.2. Representation of relations
Some models of analogical reasoning (e.g., Copycat) attempt to integrate analogical reasoning with lower-level perceptual representations. Other models (SME, LISA) use explicitly structured predicate and argument type representations. The issue of how object attributes and relations are represented is important for modeling analogy because it constrains what else can take place; for example, mapping is one obvious way to compare systems of explicit structured representations (e.g., predicates). However, there are serious concerns about how these predicate representations can be acquired (see Elman et al. Reference Elman, Bates, Johnson, Karmiloff-Smith, Parisi and Plunkett1996; Shultz Reference Shultz2003). Hence, predicate and argument representations may not be appropriate for a developmental account of analogical reasoning.
In fact, within the context of analogy, relations need not be represented in predicate/argument terms at all. For the purposes of analogy it may be sufficient to conceptualize relations as transformations between items. This has the important advantage that relations may be learned via well-understood (connectionist) procedures. Moreover, there are notable precursors in the literature for viewing relations as transformations. For instance, Thomas and Mareschal (Reference Thomas, Mareschal, Shafto and Langley1997) and Hahn et al. (Reference Hahn, Chater and Richardson2003) argue that transformations underlie similarity judgments and, by extension, analogical reasoning. One such account, the metaphor as pattern completion model (MPC) (Thomas & Mareschal Reference Thomas and Mareschal2001; Thomas et al. Reference Thomas, Mareschal, Hinds, Moore and Stenning2001), is particularly relevant because of its strong focus on development and because it simulates the emergence of metaphor, closely related to analogy. Similarly, Rogers and McClelland (Reference Rogers and McClelland2004) present an account of the development of semantic cognition that proposes that relations are transformations. Finally, viewing relations as transformations in a semantic space suggests that “relational similarity” might be a performance factor in analogical completion; this is indeed the case, at least in adult participants (Leech et al. Reference Leech, Mareschal and Cooper2007).
3. A model of the Goswami and Brown paradigm
The seven key developmental phenomena we detailed in earlier sections can all be seen in variants of the “a is to b as c is to what?” analogies, such as the Goswami and Brown (Reference Goswami and Brown1989; Reference Goswami and Brown1990) paradigm. According to Sternberg (Reference Sternberg1977a), “a is to b as c is to what?” analogies incorporate the core information processing components required for analogical reasoning. Therefore, to facilitate comparison with the developmental evidence, we have embodied our two central theoretical tenets (relational priming and relations as transformations) in a connectionist model of the Goswami and Brown paradigm. We will first describe the task and model and then present a number of specific simulations exploring this paradigm, before finally presenting a simulation which takes a step back and tentatively suggests how the underlying principles of the specific model could form part of a much more general explanatory relational priming framework.
3.1. The Goswami and Brown paradigm
The Goswami and Brown forced choice task involves children selecting a picture to complete an analogical sequence involving simple causal relations (see Fig. 2). After seeing three pictures (e.g., bread, cut bread, and lemon) the child is given four response options: (a) the analogically appropriate response (e.g., cut lemon); (b) the correct transformation applied to the wrong object (e.g., cut cake); (c) the wrong transformation applied to the correct object (e.g., squeezed lemon); and (d) an object-similarity match (e.g., yellow balloon).
Consistent with the Goswami and Brown paradigm, the model focuses on simple causal domains (e.g., cutting, melting, turning on, burning) such as those used to test young children (Goswami & Brown Reference Goswami and Brown1989; Reference Goswami and Brown1990; Rattermann & Gentner Reference Rattermann and Gentner1998a). In these tasks, common objects (e.g., apples, bread) are transformed by a causal agent (e.g., knife), as when an apple is cut by a knife. The event sequence experienced by the network in the example would be: first, to be presented with representations of an apple and a knife, and then, with representations of a cut apple and a knife. The task of the network is to learn the transformation from apple to cut apple in the context of knife, which, consistent with our theoretical assumption about transformations and relations, is equivalent to learning the relation cutting. Once the network has learned such relations, analogical completion may be modeled by first exposing the network to the a and b terms of the analogy (e.g., apple and cut apple), thus priming a relation (in this case cutting); and then presenting the network with the c term of the analogy (e.g., bread). The network should then settle into a state consistent with the product of the c term and the primed relation (in this case, cut bread).
3.2. The model
3.2.1. Network architecture
Figure 3 shows the architecture of the connectionist network used to model both the acquisition of relational information and the completion of analogies within the Goswami and Brown paradigm. All network weights are bidirectional and symmetrical, thereby enabling the flow of activation in all directions throughout the network. The bottom layer (roughly corresponding to the input layer) is split into two banks of units, representing the presentation of two different objects in a “before,” or pre-transformation, state. The two different banks of units correspond to an object [i.e., Object(t1)] that can be transformed (e.g., apple) and a causal agent object [i.e., CA(t1)] that causes the transformation (e.g., knife). Similarly, the upper layer (corresponding to the output layer) is split into two banks, representing the same two objects [i.e., Object(t2) and CA(t2)] in their “after,” or post-transformation, state. In the current simulations, we assume that objects are encoded in terms of perceptual features only (e.g., shape, size, color) at both input and output.
One can think of the “before” and “after” representations as two temporally contiguous states of the world. Because both the “before” and “after” states can be obtained by direct observation of the world, learning of relational information does not require an external teacher, and constitutes a form of self-supervised learning (Japkowicz Reference Japkowicz2001). Both the “before” and “after” states of the object representations [i.e., Object(t1) and Object(t2)] and the hidden layer have 40 units each. The “before” and “after” states of the causal agent [CA(t1) and CA(t2)] have 4 units each. The activation of any unit varies according to a sigmoidal activation function from 0 to 1 (this is sometimes referred to as an “asigmoid activation” unit; Shultz Reference Shultz2003). The initial weights are randomized uniformly between ±0.5.
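As a rough illustration of the layer sizes and weight initialization just described, the following Python sketch builds the four external banks and one weight matrix per bank. The names (`LAYER_SIZES`, `init_weights`, `sigmoid`) are illustrative, and the assumption that each bank connects only to the hidden layer is a simplification made for the sketch rather than a statement about Figure 3.

```python
import math
import random

# Bank sizes as stated in the text; the dictionary keys are illustrative names.
LAYER_SIZES = {"object_t1": 40, "ca_t1": 4, "object_t2": 40, "ca_t2": 4}
N_HIDDEN = 40


def sigmoid(net_input):
    """Logistic activation bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net_input))


def init_weights(rng, limit=0.5):
    """One matrix per bank, drawn uniformly from [-limit, +limit].

    Assumption: each bank is connected to the hidden layer only. Because the
    same matrix serves both bank-to-hidden and hidden-to-bank flow, the
    connections are symmetric by construction.
    """
    return {
        bank: [[rng.uniform(-limit, limit) for _ in range(N_HIDDEN)]
               for _ in range(size)]
        for bank, size in LAYER_SIZES.items()
    }


weights = init_weights(random.Random(0))
```

Storing a single matrix per bank, rather than separate forward and backward matrices, is one simple way to guarantee the symmetry that the settling process relies on.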
Because of the bidirectional connections, input activation can cycle throughout the network before settling into a stable attractor state. During training, contrastive Hebbian learning is used to change the connection weights such that attractors on the output units coincide with target output states of the network (for an explanation of contrastive Hebbian learning and a presentation of the benefits – in terms of biological plausibility – of this algorithm with regard to other supervised training algorithms, see O'Reilly Reference O'Reilly1996; O'Reilly & Munakata Reference O'Reilly and Munakata2000). As with the better-known backpropagation training algorithm, contrastive Hebbian learning creates internal representations across the hidden units to solve complex problems. As the name suggests, within the learning algorithm weight changes are calculated locally as the difference between a Hebbian and an anti-Hebbian term. These terms correspond to different states of activation of a unit. Contrastive Hebbian learning requires two phases of activation during training. The first phase, the minus phase, involves clamping some of the units [e.g., the Object(t1) units] to a desired pattern and letting the remaining units' activation spread through the network (we used five activation cycles between each weight update). For example, in Figure 3, the Object(t1) and CA(t1) units are clamped, and the hidden units and Object(t2) and CA(t2) are free to change and settle into a stable state. The resulting activation state of Object(t2) is taken as the response the network arrives at for a given input. In the second phase, the plus phase, all the external units (inputs and outputs) are clamped on. Only the hidden units' activation settles into a stable state, constrained by all the external units [i.e., the Object(t1) and Object(t2) and CA(t1) and CA(t2) units]. The state of activation of the plus phase corresponds to the desired activation of the network given a certain input.
Contrastive Hebbian learning uses the difference between the plus and minus phases to update the connection weights as follows:
$$\Delta w = \alpha \left( x^{+} y^{+} - x^{-} y^{-} \right)$$
where α is a learning-rate parameter (set to 0.1 in all simulations reported here), x and y are the activations of two interconnected units, and the superscripts distinguish between the values of the plus and minus activation phases. As learning proceeds, the difference between the plus-phase and minus-phase terms shrinks as the activation in the minus phase comes to replicate that in the plus phase, and the weight changes become correspondingly smaller.
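The weight update described above can be sketched for a single connection as follows. The function name `chl_update` and the example activations are hypothetical; only the learning rate of 0.1 and the Hebbian minus anti-Hebbian form come from the text.

```python
def chl_update(weight, x_minus, y_minus, x_plus, y_plus, alpha=0.1):
    """Return the weight between two units after one minus/plus phase pair."""
    hebbian = x_plus * y_plus          # co-activation with the outputs clamped
    anti_hebbian = x_minus * y_minus   # co-activation when the outputs run free
    return weight + alpha * (hebbian - anti_hebbian)


# When the minus phase already reproduces the plus phase, nothing changes.
w_same = chl_update(0.2, 0.6, 0.3, 0.6, 0.3)

# When the free-running output unit is too weakly active, the weight grows.
w_changed = chl_update(0.2, 0.5, 0.2, 0.5, 0.8)
```

The first call shows why learning slows as training proceeds: once the two phases agree, the Hebbian and anti-Hebbian terms cancel.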
3.2.2. Training: The learning of causal relations
In the current model, networks were trained on input patterns produced on-the-fly by adding Gaussian noise (μ=0.0, σ²=0.1) to prototypes selected at random from a predefined pool of 20 different possible Object(t1) prototype patterns, and 4 different CA(t1) prototype patterns. The prototypes consisted of randomly generated input vectors with slot values within the range [0, 1], and where each vector slot value was set to 0 with a probability of p=0.5. Slot values were set to 0 in the prototypes to increase the sparsity of external representations, whereas the addition of noise to the prototypes was intended to capture the fact that although two instances of, for example, cutting an apple with a knife are similar, they are not identical.
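A minimal sketch of this pattern-generation scheme is given below. The helper names are hypothetical, and two details are assumptions: the noise parameter of 0.1 is treated as a variance, and noisy values are not clipped back into [0, 1], since the text does not say whether they were.

```python
import math
import random


def make_prototype(rng, n_units, p_zero=0.5):
    """Random vector in [0, 1], each slot zeroed with probability p_zero."""
    return [0.0 if rng.random() < p_zero else rng.random()
            for _ in range(n_units)]


def noisy_instance(rng, prototype, variance=0.1):
    """Add zero-mean Gaussian noise to every slot of a prototype.

    Assumption: 0.1 is a variance, so gauss() receives its square root.
    """
    sd = math.sqrt(variance)
    return [value + rng.gauss(0.0, sd) for value in prototype]


rng = random.Random(1)
object_prototypes = [make_prototype(rng, 40) for _ in range(20)]
agent_prototypes = [make_prototype(rng, 4) for _ in range(4)]

# One training input: a fresh noisy copy of a randomly chosen object prototype.
training_input = noisy_instance(rng, rng.choice(object_prototypes))
```

Because a new noisy copy is drawn on every trial, the network never sees exactly the same input twice, which is the point of generating patterns on-the-fly.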
Four transformation vectors were also randomly generated but were set to have a Euclidean distance from every other transformation of less than 10. The transformation vectors encode the relation between the pre- and post-transformation states of the object. In fact, the transformed state of the object, Object(t2), is obtained by adding a transformation vector to Object(t1). For example, Object(t1) (e.g., apple): [0.5 0.0 0.2 0.0 0.8 0.2 0.0 0.4] might be transformed by the vector (e.g., cut): [−0.4 0.0 0.0 0.0 0.0 0.7 0.0 0.0], resulting in Object(t2) (cut apple): [0.1 0.0 0.2 0.0 0.8 0.9 0.0 0.4]. Note that although the transformation vector is used to generate the target pattern corresponding to any particular input, the transformation vector itself is never presented to the network. Different objects (e.g., bread or apple) transformed by the same relation (e.g., cut) are transformed by the same vector. Thus, the network can learn about a particular transformation by generalizing across sets of Object(t1)/Object(t2) pairings that are affected by that transformation.
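The worked example above can be checked directly. In the sketch below, the function name `apply_transformation` is hypothetical, and the rounding is there only to suppress floating-point noise.

```python
def apply_transformation(obj, transformation):
    """Object(t2) is Object(t1) plus the transformation vector, slot by slot."""
    return [round(o + t, 10) for o, t in zip(obj, transformation)]


apple = [0.5, 0.0, 0.2, 0.0, 0.8, 0.2, 0.0, 0.4]
cut = [-0.4, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0]

cut_apple = apply_transformation(apple, cut)
# cut_apple is [0.1, 0.0, 0.2, 0.0, 0.8, 0.9, 0.0, 0.4]

# The relation can be recovered as the difference between the two states,
# although the network itself is never shown this vector.
recovered = [round(after - before, 10) for after, before in zip(cut_apple, apple)]
```

Only two slots change in this example, which illustrates how a relation can be carried by a small, consistent displacement shared across objects.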
In the model, CA(t1) represents a causal agent (e.g., knife) which when presented concurrently with certain (but not all) objects at Object(t1) (e.g., apple), leads to a transformed Object(t2) representation (e.g., cut apple). Consequently, the target pattern for the Object(t2) depends on CA patterns. In the simulations presented here CA(t1) always remains the same at CA(t2) (i.e., CA is never transformed). Training consists in randomly selecting an object and a causal agent, computing the transformed state, Object(t2), and updating the weights such that the actual Object(t2) state produced by the network approaches the target Object(t2). The partitioning of banks of units into object and causal agent layers is actually a property of the training regime, not of the network architecture. More complex training environments could also lead to a change in the state of CA at t2 (e.g., knife at t1 to wet knife at t2).
Each of the 20 Object(t1) representations can be affected by 2 of the 4 causal agents (and thus 2 of the 4 transformations). When an object is presented in conjunction with one of the remaining 2 causal agents, the target Object(t2) pattern is the same as the untransformed Object(t1) pattern. Thus, whereas the causal agent knife transforms apple to cut apple, the causal agent water (for example) has no effect on apple. Equally, whereas the causal agent knife transforms apple to cut apple, the causal agent knife has no effect on rock. Hence, the presence of the causal agent alone is not a predictor of whether a transformation will occur. Given this organization, there are 360 potential analogies (20 objects×2 causal agents×9 other objects that can be affected by the same causal agent) on which the network may be tested.
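The count of 360 analogies can be reproduced by enumeration. The sketch below assumes a balanced assignment in which every causal agent affects exactly 10 of the 20 objects. The count of 9 other objects per agent implies this balance, but the particular assignment rule used here is hypothetical.

```python
N_OBJECTS, N_AGENTS = 20, 4

# Assumption: object i is affected by agents i % 4 and (i + 1) % 4, which
# gives each object 2 agents and each agent exactly 10 objects.
affected_by = {obj: {obj % N_AGENTS, (obj + 1) % N_AGENTS}
               for obj in range(N_OBJECTS)}

objects_per_agent = {
    agent: sum(agent in agents for agents in affected_by.values())
    for agent in range(N_AGENTS)
}

# An analogy pairs a priming object with a different probe object that is
# affected by the same causal agent.
analogies = [
    (a_obj, agent, c_obj)
    for a_obj in range(N_OBJECTS)
    for agent in sorted(affected_by[a_obj])
    for c_obj in range(N_OBJECTS)
    if c_obj != a_obj and agent in affected_by[c_obj]
]
n_analogies = len(analogies)
```

With an unbalanced assignment the total would differ, so the figure of 360 depends on the agents being spread evenly over the objects.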
3.2.3. Testing: Analogical completion
The testing of analogy completion proceeds in a different way from the learning of relation information. As we have stressed, priming is fundamental to our account of analogical completion. It occurs in the network because the bidirectional connections allow the hidden and “after” layers to maintain activity resulting from an initial event. The activity that is maintained in the network biases how new external input is then subsequently processed.
To illustrate how activation-based priming and pattern completion combine to complete analogies, we consider the archetypal case of a:b::c:d analogies. First, the units are clamped with the representation of apple at Object(t1) and cut apple at Object(t2), while CA(t1) and CA(t2) are initially set to 0.5, the resting value. This corresponds to being presented with the information apple:cut apple (i.e., the first half of an a:b::c:d analogy). The causal agent is not presented to the network at any point during testing. The network settles into the attractor by filling in CA(t1) and CA(t2) and arriving at hidden unit activations consistent with the transformation cutting. Following this, the Object(t1) and Object(t2) units are unclamped, and a second pattern, corresponding to bread, is presented to Object(t1) and nothing presented to Object(t2). CA(t1) and CA(t2) are initially presented with resting activation patterns and then unclamped. This corresponds to being presented with the information bread:? (i.e., the second probe-half of the a:b::c:d analogy). By unclamping the original object and causal agent units and by presenting a different Object(t1) pattern, the network is no longer in equilibrium and settles into a new attractor state. During training, the network has encoded, in the connections to and from the hidden layer, the similarities in the transformations corresponding to relations such as cutting. Consequently, the prior priming of the apple and cut apple transformation biases the network to settle into the attractor state consistent with the transformation cutting, which gives the cut bread pattern at Object(t2). The network has now produced the appropriate response at Object(t2) to complete the analogy (i.e., apple:cut apple::bread:cut bread).
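The two-step testing procedure can be sketched as a clamping schedule. The code below is only an illustration of the order of operations: the weights are random and untrained, so the output carries no meaning, and the update scheme (free banks first, then hidden units, for five cycles) is an assumption, since the exact settling dynamics are not specified here. All names are hypothetical.

```python
import math
import random

BANKS = {"object_t1": 40, "ca_t1": 4, "object_t2": 40, "ca_t2": 4}
N_HIDDEN = 40
REST = 0.5  # resting activation value


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def settle(weights, state, clamped, cycles=5):
    """Update every unclamped bank from the hidden layer, then the hidden layer."""
    for _ in range(cycles):
        for bank, size in BANKS.items():
            if bank not in clamped:
                state[bank] = [
                    sigmoid(sum(h * weights[bank][i][j]
                                for j, h in enumerate(state["hidden"])))
                    for i in range(size)
                ]
        state["hidden"] = [
            sigmoid(sum(act * weights[bank][i][j]
                        for bank in BANKS
                        for i, act in enumerate(state[bank])))
            for j in range(N_HIDDEN)
        ]
    return state


def complete_analogy(weights, a, b, c):
    """Prime with a:b, then probe with c and read out Object(t2)."""
    state = {"object_t1": list(a), "object_t2": list(b),
             "ca_t1": [REST] * 4, "ca_t2": [REST] * 4,
             "hidden": [REST] * N_HIDDEN}
    # Priming step: a and b are clamped; the causal-agent banks fill in.
    settle(weights, state, clamped={"object_t1", "object_t2"})
    # Probe step: only c is clamped. The hidden activity left over from the
    # priming step drives the first update of the free banks.
    state["object_t1"] = list(c)
    state["ca_t1"], state["ca_t2"] = [REST] * 4, [REST] * 4
    settle(weights, state, clamped={"object_t1"})
    return state["object_t2"]


rng = random.Random(0)
weights = {bank: [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
                  for _ in range(size)] for bank, size in BANKS.items()}
a, b, c = ([rng.random() for _ in range(40)] for _ in range(3))
response = complete_analogy(weights, a, b, c)
```

The key design point is that the hidden state is not reset between the two steps, so whatever relation the a:b pair evoked is still active when the c term arrives.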
3.2.4. An example of developing analogical ability
In the Goswami and Brown paradigm, children are presented with the a:b::c terms of the analogy and four response options. Figure 4 shows, at three different stages of learning, the sum of squared distance (SSD) between the actual output of the network when tested on the bread:cut bread::apple:? analogy and four possible trained Object(t2) patterns as activation propagates throughout the network over five cycles. The lower the y-axis value, the closer the actual activation is to that possible output pattern. Consequently, Figure 4 shows which of the four objects that the network has been trained on is closest to the network's actual response after different amounts of training. The four Object(t2) target patterns (taken from the Goswami and Brown paradigm) that the network's response is compared with are: (1) the analogically appropriate transformed object (i.e., cut apple); (2) a possible Object(t2) which is perceptually identical to the Object(t1) representation (e.g., apple); (3) the Object(t1) changed by an inappropriate transformation (e.g., bruised apple); and (4) a different Object(t1) pattern transformed by the correct transformation (e.g., cut banana).
After 100 epochs of training (Fig. 4a), the network is unable to complete the analogy appropriately. Instead its output is closest to apple (i.e., the object similarity response). After 2,000 epochs of training (Fig. 4b), the output is ambiguous, equally close to both apple and cut apple. After 5,000 epochs of training (Fig. 4c), the network settles into the appropriate state (i.e., cut apple).
3.2.5. An example of non-analogy
To infer correct analogical completion, it is not enough to demonstrate that it is occurring in the appropriate context. It is also important that the network does not produce an analogical response when it is not appropriate. Consistent with the results in Figures 4a–c, it could be the case that the network has developed the attractor corresponding to cut apple with a basin so wide that the activation settles into it whenever the network is presented with apple. However, consideration of the performance of the network after 5,000 epochs of training (Fig. 4d) demonstrates that this is not the case. Here, the network was presented with bread at Object(t1) and the untransformed bread pattern at Object(t2). Subsequently, Object(t1) was clamped to apple and the network allowed to settle. The resulting activation state was consistent with the Object(t2) for the untransformed apple pattern. Thus, when primed with a non-transformation example, the network appropriately produces the non-analogical response.
3.3. Simulating the developmental markers of analogical completion
Having established that the model captures the broad-brushed developmental profile of children's analogical completion, we now consider how the model captures the seven key more detailed developmental phenomena highlighted above.Footnote 1
3.3.1. The relationship between knowledge accretion and successful analogical reasoning
The thick line in Figure 5 shows the network's performance when tested on all 360 possible analogies across training. An analogy is assumed to have been successfully completed if the sum of squared difference between the actual activation and the analogically appropriate target is lower than the sum of squared difference between the actual activation and each other possible response. After 100 epochs of training, fewer than 20% of analogies are completed successfully. However, by 5,000 epochs of training, the network produces the analogically appropriate response for almost 100% of possible analogies. The thin line in the same figure shows the mean sum of squared error at Object(t2). This is a measure of how well the network has mastered the causal domain on which it is trained. The proportion of analogies correct and sum of squared error are strongly negatively correlated (Spearman's ρ=−0.99; p < 0.001).2 Thus, consistent with developmental evidence, the network's performance on analogical completion is highly correlated with its domain knowledge of causal transformations.
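The success criterion stated above amounts to a nearest-pattern decision. The sketch below uses made-up three-slot vectors purely for illustration (the model's patterns have 40 slots), and the function names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two activation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_correct(output, options, target_label):
    """True if the output is strictly closer to the target than to every other option."""
    target_distance = ssd(output, options[target_label])
    return all(
        target_distance < ssd(output, pattern)
        for label, pattern in options.items()
        if label != target_label
    )


# Toy response options and network output (illustrative values only).
options = {
    "cut apple": [0.1, 0.0, 0.9],
    "apple": [0.5, 0.0, 0.2],
    "bruised apple": [0.5, 0.6, 0.2],
    "cut banana": [0.0, 0.8, 0.9],
}
output = [0.2, 0.1, 0.8]

correct = is_correct(output, options, "cut apple")
closest = min(options, key=lambda label: ssd(output, options[label]))
```

Averaging `is_correct` over all tested analogies would give the proportion correct of the kind plotted against training epochs.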
The relation between domain knowledge and analogical completion may also be shown by extracting the causal agent responsible for a given transformation. Goswami and Brown (1989) tested relational knowledge by asking children to choose the causal agent responsible for transforming different objects. This may be tested in the network by prompting it with apple and cut apple at Object(t1) and Object(t2) and resting patterns at CA(t1) and CA(t2). The network should then produce the appropriate causal agent (i.e., knife) at CA(t1) and CA(t2). This ability is important because it demonstrates that the analogical completion observed in the network is not simply a matter of forming a simple input-output (or stimulus-response) link.
Figure 6 presents the performance of the network at extracting the causal agent over training. The proportion of correctly produced causal agents closely correlates with performance on analogical completion (Spearman's ρ=0.508; p < 0.001). This strong correlation mirrors the results obtained by Goswami and Brown (Reference Goswami and Brown1989) with young children.
3.3.2. Domain-specific change in children's ability to reason analogically
Domain specificity is a natural corollary of the strong and drawn out relationship between relational knowledge and analogical ability – as the network gradually becomes able to master relational knowledge in different domains it will acquire the ability to solve analogies in that domain. Figure 7 illustrates this phenomenon by showing the network's performance over training when a single object is tested on two analogies involving distinct causal transformations. For this example, the network solves one analogy over 2,500 epochs earlier than the other. Again, this is consistent with the developmental literature: the profile parallels the analogical performance observed with children, suggesting that the ability to solve analogies in different domains arises at different points in development.
3.3.3. The spontaneous production of analogical completion
As noted above, several developmental studies have shown that children use analogies without explicit teaching (e.g., Goswami & Brown 1989; Inagaki & Hatano 1987; Pauen & Wilkening 1997; Tunteler & Resing 2002). The network mirrors this performance. At no point is the network trained on an example of an analogy; instead the network is only trained on transformations. Nor does the network have any dedicated architecture for performing analogy. Analogical completion is an emergent phenomenon resulting from the way relational information is represented and the way analogies are tested.
3.3.4. A shift in analogical judgment from surface similarity to relational similarity
Children demonstrate a relational shift over development. They appear to move from judging similarity in terms of object features to judging similarity on the basis of relational similarity in analogy tasks (Rattermann & Gentner Reference Rattermann and Gentner1998a). To investigate whether the network undergoes a relational shift over training we compared the types of errors produced by the network with those made by children. Two of these types of errors are: object-similarity errors (where the network responds at Object(t2) with the same pattern as at Object(t1)) and wrong transformation errors (where the network produces the appropriate object at Object(t2) but transformed by a non-primed causal agent).
Object-similarity errors and wrong transformation errors are the kinds of errors predominantly made by 4- to 5-year-old children (Rattermann & Gentner Reference Rattermann and Gentner1998a). Table 1 presents a comparison of children's analogical completion over development and the network's analogical completion over training. The network provides a reasonable approximation of the children's response profiles. Importantly, it shows the same shift over training with a considerable decrease in the proportion of appearance responses (from 22.4% to 2.5%). This is matched with an increase in correct responses (i.e., correct transformation responses) from 39.3% to 63.5%. Such behavior is consistent with the relational shift phenomenon in which children produce more transformation-based analogies as they get older.
Note. The children's data are taken from Rattermann and Gentner (Reference Rattermann and Gentner1998a)
Bootstrap re-sampling tests (Efron & Tibshirani 1998) were used to compare the mean performance for children in each cell to the distribution of means found by repeatedly sampling subsets of the individual networks' responses at 2,300 and 2,800 epochs. Children and the model do not differ on either correct response (p > 0.1) or object similarity (p > 0.1). The networks produce significantly fewer wrong transformation errors than the children (p < 0.01); however, the difference in means between 4- and 5-year-old children on wrong transformation responses does not differ significantly from the differences in network performance at 2,300 and 2,800 epochs (p > 0.1).
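One way such a bootstrap comparison could be set up is sketched below. This is not the procedure actually used; the scores are invented for illustration, the function name is hypothetical, and the two-sided p-value convention and subset size are assumptions.

```python
import random


def bootstrap_p_value(network_scores, child_mean, subset_size,
                      n_resamples=2000, seed=0):
    """Two-sided p-value of child_mean against resampled network means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Sample a subset of networks with replacement and record its mean.
        sample = [rng.choice(network_scores) for _ in range(subset_size)]
        means.append(sum(sample) / subset_size)
    centre = sum(means) / len(means)
    # Count resampled means at least as far from the centre as the child mean.
    extreme = sum(abs(m - centre) >= abs(child_mean - centre) for m in means)
    return extreme / n_resamples


# Invented proportions of correct responses for eight networks.
network_scores = [0.55, 0.60, 0.65, 0.70, 0.60, 0.62, 0.58, 0.66]

p_far = bootstrap_p_value(network_scores, child_mean=0.10, subset_size=8)
p_near = bootstrap_p_value(network_scores, child_mean=0.62, subset_size=8)
```

A child mean far outside the resampled distribution yields a small p-value, whereas one near its centre does not, which is the logic behind the "do not differ" conclusions above.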
3.3.5. The effect of relational labels on analogical performance
Gentner and her colleagues (Kotovsky & Gentner Reference Kotovsky and Gentner1996; Loewenstein & Gentner Reference Loewenstein and Gentner2005; Rattermann et al. Reference Rattermann, Gentner and DeLoache1990) have repeatedly found that prior training or exposure with appropriate relational labels facilitates children's analogy-making. A variant of the network shows a parallel effect on analogical completion when trained with an analogue of relational labels.
To model the effects of labeling on analogical completion the network architecture was modified to include two additional layers of four relational units: RL(t1) and RL(t2) (see Fig. 8). These were bidirectionally connected to the hidden layer (as are the CA(t1) and CA(t2) layers). During training, each of the relational label units uniquely coded for a transformation, thereby labeling that relation. For example, when the network was presented with apple and knife it was also presented with the relational label cutting (e.g., RL(t1)=[1 0 0 0], whereas the relational label bruising might be RL(t1)=[0 1 0 0]). Therefore, the RL(t1) and RL(t2) served to uniquely identify a transformation irrespective of the Object(t1) and Object(t2), and as opposed to the CA(t1) and CA(t2), which as in the original simulations were ambiguous. Although the addition of the RL units simplifies the learning task confronting the network, it provides no additional information that is not already conjointly present in the object and causal agent layers.
There was a marked difference in analogical completion over training with the addition of relational labels (Fig. 9). On average, relational labels resulted in the network completing analogies earlier. To assess whether this difference is statistically meaningful, we found the maximum value of the first derivative of the fitted curves for each individual network – this value corresponds to the sudden jump in performance for each network (see Sect. 3.3.6). The median number of epochs before the maximum slope was 1,830 when labels were supplied and 2,665 when labels were not supplied. A Wilcoxon signed-rank test indicated that this difference was highly significant, p<0.0001. Despite the substantial difference in performance across relational label conditions, both early (prior to 1,500 epochs) and later (after 4,000 epochs) analogical completion were similar to completion without relational labels. Thus, the presence of relational labels does not change the overall developmental picture. What it does is move forward developmentally the point at which there is a sudden shift in the ability to complete analogies. Again, this behavior mimics the performance observed experimentally with children. The role that the label plays here is to provide clearer, more consistent, task-relevant constraints. The network's performance is accelerated because it can make use of this more consistent cue.
3.3.6. Indicators of a discontinuous change
The network also exhibits all three indicators of discontinuous change in children's analogical reasoning abilities identified by Hosenfeld et al. (Reference Hosenfeld, van der Maas and van den Boom1997). First, Hosenfeld et al. reported a sudden jump or rapid acceleration in the proportion of children's correct responses between test sessions. The network shows a similar rapid acceleration in performance at an average of 2,230 epochs of training (Fig. 10a). This figure also shows the first derivative (rate of change) of analogy performance. The first derivative reveals not only the sharp discontinuity of the sudden jump, but also the complex developmental trajectory. In particular we see reductions in the rate of change of analogical performance surrounding the sudden jump, and also that the model exhibits secondary, less extreme, sudden jumps in analogical performance.
Hosenfeld et al. (Reference Hosenfeld, van der Maas and van den Boom1997) also demonstrated an increase in the inconsistency of children's responses occurring alongside the sudden jump in correct performance. One way to assess the inconsistency of the network's response is to present the same network twice with noisy versions of the same analogy. The percentage of analogies that are not completed in the same way is a measure of the network's inconsistency at a given training epoch. Figure 10b shows the percentage of inconsistent responses calculated on presentation of each analogy twice to 50 networks. Figure 10b reveals that the inconsistency of the network's performance undergoes a rapid increase and peaks at around 2,500 epochs, shortly after the onset of the sudden jump. A nonparametric correlation comparing the epoch of the maximum value for the first derivative of accuracy (the sudden jump) with the maximum first derivative for the inconsistency scores across individual networks reveals a strongly significant relationship (Spearman's ρ=0.54; N=50; p<0.001).
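The inconsistency measure described above can be sketched as follows, assuming each response has already been classified into a discrete category. The function name and the response labels are our own illustrative choices.

```python
def percent_inconsistent(first_responses, second_responses):
    """Percentage of analogies completed differently on two noisy presentations."""
    if len(first_responses) != len(second_responses):
        raise ValueError("both presentations must cover the same analogies")
    differing = sum(
        1 for a, b in zip(first_responses, second_responses) if a != b
    )
    return 100.0 * differing / len(first_responses)


# Hypothetical responses of one network to four analogies, presented twice.
first = ["cut bread", "bread", "cut bread", "knife"]
second = ["cut bread", "cut bread", "cut bread", "knife"]
inconsistency = percent_inconsistent(first, second)  # one of four differs
```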
Finally, Hosenfeld et al. (Reference Hosenfeld, van der Maas and van den Boom1997) also found a critical slowing down in children's solution times accompanying the onset of the sudden jump in correct responses. If we take the number of cycles necessary before the network settles into its final response as a measure of solution time, then the same pattern can be observed with the behavior of the network. Figure 10c shows the mean number of activation cycles required to settle over training. The number of cycles peaks around 2,300 epochs, in the neighborhood of the “sudden jump,” and across networks this correlates significantly with the occurrence of the sudden jump (Spearman's ρ=0.285; N=50; p<0.05).
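The rank correlations reported in this section can be illustrated with a small, dependency-free sketch of Spearman's ρ. It assigns average ranks to ties and then computes a Pearson correlation on the ranks. The function names are our own, and the sketch assumes that neither input is constant.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In the analysis above, `x` would hold the epoch of the sudden jump for each network and `y` the epoch at which that network's settling time or inconsistency changes most steeply.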
3.3.7. The relative difficulty of cross-mapped analogies
Children (and adults) can complete analogies appropriately even when there is a strong conflict between object similarity and relational similarity (e.g., Gentner et al. Reference Gentner, Rattermann, Markman, Kotovsky, Simon and Halford1995). This is most clearly shown in cross-mapping situations, where the same object appears in both the base and the target but with a different role. In repeated studies (see Gentner et al. Reference Gentner, Rattermann, Markman, Kotovsky, Simon and Halford1995), children have solved cross-mapped analogies. However, these analogies are hard, and older children perform considerably better than younger children.
Because the current model is designed to capture performance in the Goswami and Brown (Reference Goswami and Brown1989; Reference Goswami and Brown1990) and Rattermann and Gentner (Reference Rattermann and Gentner1998a) studies, with analogies consisting of two objects, the exact cross-mapping experiments presented in Rattermann et al. (Reference Rattermann, Gentner and DeLoache1990) and Gentner and Toupin (Reference Gentner and Toupin1986) cannot be directly simulated. However, Figure 11 illustrates how the network can be tested on an analogue of the three-object cross-mapping experiments. There are two important aspects of the analogy presented in Figure 11: first, identical circles (B and R1) have different roles (B is larger than A, whereas R1 has the same role as C); and second, there is response competition between the literal similarity response (R1) and the relational response (R2). Thus, analogies of the type presented in the figure constitute a genuine test of cross-mapping.
We trained a modified network on a different set of stimuli in order to simulate a cross-mapping situation. The training environment consisted of three objects composed of nine units and three causal agents and associated transformations. Each object could be transformed when presented in conjunction with two out of the three causal agents. Training and testing were conducted in the same way as in the other simulations with the same network parameters and architecture. Crucially, one object vector when transformed by one causal agent was identical to a different untransformed object, leading to the potential for response competition. Given its environment, the network could be tested on six possible analogies, with one cross-mapped analogy.
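The structure of this training environment can be sketched under two stated assumptions: that a transformation acts by adding a vector to the object pattern, and that binary nine-unit vectors suffice for illustration. The specific vectors, the agent names, and the choice of which agent is excluded for each object are hypothetical.

```python
# Three hypothetical nine-unit object patterns.
OBJECTS = {
    "A": (1, 0, 0, 1, 0, 0, 1, 0, 0),
    "B": (1, 1, 0, 1, 0, 0, 1, 0, 0),
    "C": (0, 0, 1, 0, 1, 0, 0, 0, 1),
}
# One hypothetical additive change per causal agent.
AGENT_EFFECTS = {
    "agent_1": (0, 1, 0, 0, 0, 0, 0, 0, 0),
    "agent_2": (0, 0, 0, 0, 0, 1, 0, 0, 0),
    "agent_3": (0, 0, 0, 0, 0, 0, 0, 1, 0),
}
# Each object is transformed by two of the three agents.
ALLOWED = {
    "A": ["agent_1", "agent_2"],
    "B": ["agent_2", "agent_3"],
    "C": ["agent_1", "agent_3"],
}


def transform(obj, agent):
    """Apply an agent's additive effect to an object pattern."""
    return tuple(o + e for o, e in zip(OBJECTS[obj], AGENT_EFFECTS[agent]))


pairs = [(obj, agent) for obj in ALLOWED for agent in ALLOWED[obj]]
# A pair is cross-mapped when its outcome equals another untransformed object.
cross_mapped = [
    (obj, agent)
    for obj, agent in pairs
    if any(transform(obj, agent) == OBJECTS[o] for o in OBJECTS if o != obj)
]
```

With these illustrative patterns there are six object and agent pairings, and exactly one of them produces an outcome identical to a different untransformed object, which is the source of response competition.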
Figure 12 shows the results for cross-mapped and non-cross-mapped analogies, averaged across 20 replications. After 15,000 epochs of training the cross-mapped analogy was completed with almost complete accuracy. However, the networks failed to complete any cross-mapped analogy in any replication before approximately 9,900 epochs. This is in contrast to the non-cross-mapped analogies, which were solved substantially earlier in training, reaching close to 100% performance after approximately 6,400 epochs of training. The maximum first derivative occurred 950 epochs earlier for the non-cross-mapped analogies than for the cross-mapped analogies (Wilcoxon signed-rank test: sum of positive ranks=105; N=20; p<0.001).
In summary, the network is able to disregard all object similarity to solve analogies even when the same object representation occurs in different roles in the base and the target. The results also suggest that cross-mapped analogies are harder for the network to learn (although how difficult a cross-mapped analogy is to solve still depends on how easy it is for the network to learn the appropriate transformations). Both of these aspects of the model's behavior are consistent with the developmental evidence: that children can solve cross-mapped analogies (consistent with Goswami Reference Goswami1995) but that they are more difficult than other similar analogies.
3.4. The development of representations within the model
Many of the parallels with developmental phenomena observed in the network's performance arise from the interaction between the contrastive Hebbian learning algorithm and the regularities in the training set. It is worth considering how this interaction affects the network's internal representations over training.
The learning task facing the network can be seen in terms of two distinct processes: auto-association [replicating the pattern of activation at Object(t1) at Object(t2)] and transformation [producing a transformed version of the Object(t1) activation at Object(t2)]. In fact, auto-association can be viewed as a transformation encoded by a vector filled with 0s. Because this auto-association (or null transformation) occurs more frequently than the other transformations, the network initially learns auto-associations, replicating the activation at Object(t1) at Object(t2) and ignoring the transformations. It is this aspect of the learning algorithm that results in object similarity errors early in training instead of the less frequent transformation responses. Later in training, the network has learned to perform both auto-association and transformations. Consequently, the learning algorithm encodes the transformations given the appropriate conjunctively active CA(t1) activation pattern. This allows the network to build up more complex internal representations which support the production of the desired activation pattern at Object(t2) and which are a prerequisite for the appropriate completion of analogies.
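The equivalence between auto-association and a null transformation can be made concrete with a small sketch. It assumes, for illustration only, that a transformation is an additive vector applied to the Object(t1) pattern. The four-unit patterns and their values are hypothetical.

```python
def apply_transformation(obj, transformation):
    """Return the Object(t2) pattern produced by an additive transformation."""
    return [o + t for o, t in zip(obj, transformation)]


apple = [0.2, 0.9, 0.4, 0.7]       # hypothetical Object(t1) pattern
cutting = [0.5, -0.3, 0.0, 0.1]    # hypothetical non-null transformation
null_transformation = [0.0] * len(apple)

# A vector of 0s leaves the object unchanged, i.e., auto-association.
unchanged = apply_transformation(apple, null_transformation)
cut_apple = apply_transformation(apple, cutting)
```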
The shift in strategy is also observed in the changes that take place in the network's internal representations at different points in training. Although the network is made up of bidirectional connections, information about a state at a given time [i.e., Object(t1) and CA(t1)] must pass through the hidden layer before it has any impact on the Object(t2) and CA(t2) response units. Hence, the hidden units provide an area in which information about objects and causal agents can be combined. The network's output [i.e., Object(t2) and CA(t2)] is driven by how the network organizes (represents) the object and causal agent information at the hidden layer.
Figures 13 and 14 show the location, in the first two principal components space, of the hidden unit activations for each possible Object(t1) and CA(t1). The grey ellipses illustrate the clustering of hidden unit activation following presentation of every pattern at Object(t1) alongside each pattern at CA(t1). After 100 epochs of training (Fig. 13), the hidden unit representations group the inputs according to the pattern at Object(t1). For example, the 1s (i.e., 1a, 1b, 1c, 1d) are grouped together, as are the 2s, the 3s, and so on.
Figure 14 shows the same analysis after 5,000 epochs of training (i.e., after the behavioral spurt observed at around 2,400 epochs). This suggests that the hidden units no longer group the inputs on the basis of Object(t1) features. Instead, the network has developed more complex internal representations. The different shaded ellipses illustrate the clusterings of hidden unit activations for the different causal agents (a–d). There is considerable overlap for the clusterings corresponding to causal agents a–d. The overlap reflects the fact that the 2-dimensional display is no longer sufficient to illustrate the complexity of the separation that is embedded in a 40-dimensional space.
We used a k-means cluster analysis technique to explore the representations that exist in this higher dimensional space. This technique is appropriate because we have strong prior theoretical reasons for investigating whether the network's hidden units cluster into four different groups according to causal agent. Table 2a shows the idealized outcome of a k-means cluster analysis showing perfect grouping by causal agent (when four groups are specified). Table 2b shows the actual results of the k-means analysis after 100 epochs of training. The fit of the model at 100 epochs was assessed against the null hypothesis that each cell has a count of 5. This analysis revealed that the model clustering did not differ from that expected by chance (χ²(9)=11.2; p>0.25). Thus, at 100 epochs, the hidden units do not cluster by causal agents. This is consistent with the principal components analysis showing that at 100 epochs the network groups events by object similarity and not by causal agent. However, Table 2c indicates that after 5,000 epochs of training the hidden unit activations group largely by causal agents. For each causal agent there is a separate dominant cluster similar to the idealized results in Table 2a. Frequency counts for the model cluster data correlated highly with the idealized data (Spearman's ρ=0.712; N=16). Thus, later on in training the hidden units are grouping the input according to causal agent, and consequently, are representing the transformations and not just the object attributes.
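The test against chance can be sketched as a chi-square statistic computed over a table of causal agents by clusters, with an expected count of 5 in every cell. The idealized table below assumes 20 patterns per causal agent (so that 16 cells of 5 are expected); this figure is inferred from the expected count rather than taken from Table 2, and the function name is our own.

```python
def chi_square_uniform(table, expected):
    """Chi-square statistic for a table against a constant expected count."""
    return sum(
        (observed - expected) ** 2 / expected
        for row in table
        for observed in row
    )


# Assumed idealized outcome: every pattern for a causal agent in one cluster.
IDEALIZED = [
    [20, 0, 0, 0],
    [0, 20, 0, 0],
    [0, 0, 20, 0],
    [0, 0, 0, 20],
]
# Chance outcome: patterns spread evenly over the four clusters.
CHANCE = [[5, 5, 5, 5] for _ in range(4)]

ideal_statistic = chi_square_uniform(IDEALIZED, expected=5)
chance_statistic = chi_square_uniform(CHANCE, expected=5)
```

For a table with four rows and four columns the statistic has (4-1)(4-1)=9 degrees of freedom, matching the value reported above. A table close to the chance outcome yields a small statistic, whereas perfect grouping by causal agent yields a very large one.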
One way of describing how the internal representations change over training is to say that the learning algorithm “pays attention” to different aspects of the environment and develops different representations over time. Initially, the learning algorithm considers only the coarse object patterns irrespective of transformations (i.e., relations), whereas later in training the learning algorithm pays increasing attention to the transformations. This characterization of the network's behavior parallels Gentner's (1989) proposed explanation for the developmental changes observed with the relational shift. She suggested that children's changing performance in analogical reasoning and metaphor comprehension tasks results from changes in what children pay attention to (from object attributes to relations) and how the objects and relations are represented. Note, however, that one of the strongest implications of the analogy as relational priming account is that the relational shift does not arise simply from a maturing system shifting from generally representing the world in terms of objects to representing the world in terms of relational systems. Instead, the apparent relational shift is a consequence of acquiring greater and richer relational knowledge – as suggested by Goswami's notion of “relational primacy” (Goswami Reference Goswami1991).
In summary, the work presented so far accounts for the development of early analogical reasoning in young children in terms of simple priming mechanisms and increasing world knowledge. The current framework stresses the importance of the interplay between the learning mechanism and the environment in determining not only the final representations, but also the developmental trajectory of how the network represents objects and transformations.
Of course, explaining the simple analogies used to test young children in terms of relational priming raises the question of how such a mechanism might explain the more difficult and complex analogies used to test adults – analogies on which young children typically fail. Thus, in the next section, we ask whether a priming-based account can be extended to account for complex analogies with multiple objects and multiple relations typically used to assess adult analogical reasoning. Our aim here is not to develop a full theory of all aspects of analogical reasoning in adults, but rather, to demonstrate the potential of our relational priming framework for modeling adult performance.
4. Analogies with multiple objects and multiple relations
In our view, analogical reasoning is actually something of an umbrella term referring to several different cognitive processes working in concert and heavily dependent on specific task demands. Our position echoes earlier approaches. For instance, Goswami (Reference Goswami1991) suggests that analogical problem solving and a:b::c:d analogical completion may, to some degree, tap distinct cognitive processes. We maintain that complex analogies involving systems of relations and simple analogies involving relational priming may use similar underlying memory processes (e.g., pattern completion and relational priming) in considerably different ways as elaborated in section 4.1.3.
The following simulation exemplifies our approach with an account of how relational priming could build up an analogy between the Gulf War in 1991 and World War II involving multiple objects and networks of relations. This simulation is not intended to be a definitive account but instead is intended to illustrate that there is no essential conflict between the relational priming account (with additional cognitive control) and complex analogical reasoning. However, we acknowledge that there is considerable future work ahead of us before the model presented here could capture all the subtleties of fully fledged adult analogy.
In this simulation we address how the relational priming account could explain: (1) networks of relations (through iterative unfolding); and (2) one-to-many mappings (through additional controlled inhibitory mechanisms). The core theoretical mechanisms of the model (i.e., relational priming and relations implemented as transformations, pattern completion, gradual adaptation of connection weights) remain from earlier simulations. The main difference between the simulations is in additional controlled inhibitory processes present only during testing of analogical reasoning.
4.1. The model
4.1.1. The model architecture
There are two architectural differences from that of the previous simulations (see Fig. 15):
(i) A single context layer instead of two causal agent layers. In earlier simulations it was useful to talk about a causal agent in order to make the relationship between the model and the Goswami and Brown paradigm as transparent as possible, the idea being that children learn about one object which is transformed (the apple) and another object which is instrumental for the transformation (the knife) but which does not undergo a transformation itself. It was therefore logical to represent the causal agent at both time step 1 (i.e., the “before” state) and time step 2 (i.e., the “after” state). However, the “before” and “after” causal agent layers can be collapsed into a single layer (in terms of the network architecture, the single [context] and duplicate [causal agent] representations are functionally identical). This single layer can, more generally, be thought of as representing the context in which the auto-association or transformation of an object (e.g., an apple) occurs. The principal benefit of using a single context layer is that it allows greater flexibility of the types of situations that can be represented and therefore simulated. For instance, instead of the Object(t1) and Object(t2) layers representing different temporal states, these layers could equally be interpreted as representing different objects (e.g., robin and bird) at a single time point. In this case the context layer would represent a more abstract relational label such as ISA (i.e., robin ISA bird); see Rogers and McClelland (Reference Rogers and McClelland2004) for further discussion.
(ii) Additional inhibitory connections. In addition to receiving the normal input via connections from other layers, in the adapted network all external units (i.e., Object 1 and Object 2 and the context layers) can also be selectively inhibited (i.e., switched off). This inhibition only occurs during analogical completion (testing), not during knowledge acquisition (training), and can be understood in terms of inhibitory connections from an additional control system (see Davelaar et al. Reference Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann and Usher2005). Relational priming combined with selective inhibition of either context or object layers underlies how the network can demonstrate complex analogical completion representative of a wider variety of behaviors.
4.1.2. Training
In the current simulations, the training environment implements a version of an analogy comparing the Persian Gulf War of 1991 with World War II. As such, the 11 object prototypes in the training set group into two categories: (i) five from the Persian Gulf: these are Saddam Hussein, George Bush Senior, Iraq, Kuwait, and the United States; and (ii) six from World War II: Adolf Hitler, Winston Churchill, Germany, Poland, Austria, and the Allies. We chose these two domains (the Gulf War and World War II) because they have previously been the focus of analogy research (e.g., Spellman & Holyoak Reference Spellman and Holyoak1992), and can support analogies involving networks of relations.
In contrast with our earlier simulations, object prototype representations are binary vectors (i.e., each unit is either on or off). Each object representation consists of two components, an orthogonal component uniquely identifying the object (this can be thought of as the object's label or name) and an orthogonal component representing which category the object belongs to (i.e., either Gulf War or World War II). Although these object representations are highly simplified, they do permit a straightforward illustration of how relational priming is compatible with complex analogical reasoning.
In addition, there are six distinct orthogonal context representations, each consistent with a different relation. The network is trained (for 2,000 epochs) on only a subset of contexts and objects designed to reflect some of the relational structure underlying the analogical comparison of the Gulf War and World War II (see Table 3). As in previous simulations, each prototype pattern presented to the network has Gaussian noise added (μ=0.0; σ²=0.01).
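A minimal sketch of such object patterns is given below. It assumes that the identity component is a one-hot code over the 11 objects and that the category component is a one-hot code over the two domains, concatenated into a 13-unit vector. This layout and the function names are illustrative assumptions. A variance of 0.01 corresponds to a standard deviation of 0.1.

```python
import math
import random

GULF = ["Saddam Hussein", "George Bush Senior", "Iraq", "Kuwait", "United States"]
WW2 = ["Adolf Hitler", "Winston Churchill", "Germany", "Poland", "Austria", "the Allies"]
# Each entry pairs an object name with its domain index (0 = Gulf, 1 = WWII).
OBJECTS = [(name, 0) for name in GULF] + [(name, 1) for name in WW2]


def prototype(name):
    """Binary prototype: one-hot identity units followed by one-hot domain units."""
    names = [n for n, _ in OBJECTS]
    index = names.index(name)
    identity = [1.0 if i == index else 0.0 for i in range(len(OBJECTS))]
    domain = [0.0, 0.0]
    domain[OBJECTS[index][1]] = 1.0
    return identity + domain


def add_noise(pattern, rng, variance=0.01):
    """Add zero-mean Gaussian noise with the given variance to every unit."""
    sd = math.sqrt(variance)
    return [value + rng.gauss(0.0, sd) for value in pattern]


rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy_iraq = add_noise(prototype("Iraq"), rng)
```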
4.1.3. Testing
In testing, there are three processes involving the proposed inhibition mechanism that, working in concert, illustrate the iterative unfolding of a complex analogy. At each time step some combination of the object or context layers is inhibited (this could be thought of as shifting conscious attention between different contexts and objects), and the network falls into a new attractor state with either the new context or a new object. We consider each of the three proposed processes in turn.
(1) Selecting a new object with the same role (i.e., relation) in the parallel (or target) domain. This process involves finding a different pair of Object 1 and Object 2 representations that correspond to the same context representation. This process can be thought of as giving the network two objects and then asking what other two different objects have the same relation, that is, a:b…? with the response c:d, in contrast to previous a:b:c…? d analogies.
After presentation of Object 1 (e.g., Iraq) and Object 2 (e.g., Kuwait) and settling into an attractor state (e.g., with the context representation occupies), the Object 1 and Object 2 representations are inhibited. This inhibition reflects the active, volitional search that is involved in completing some complex analogies. The pattern completion properties of this type of recurrent network ensure that the network subsequently settles into an attractor state consistent with the prior context representation (occupies) but with different Object 1 and Object 2 representations (e.g., Germany and Austria, respectively). This allows the network to form analogies from limited information.
In order to traverse a network of relations, building up a large multipart analogy, the network has to find new relations for consideration in a controlled way. The following two processes address this issue:
(2) Selecting a new role in the same domain for the same object. One possibility for finding a new relation is to simply inhibit the context layer and allow the network to settle into a new attractor consistent with the Object 1 representation and a new context representation. This can be understood as: Given an Object 1 activation (e.g., Iraq), what other relation (other than Occupies) goes with this object? In response, the network would produce the context representation Defies and the Object 2 representation United States.
(3) Selecting a new role in the same domain for a different object. It is also possible to find a pair of different objects connected by a different relation. This involves inhibiting the previous Object 1 and Object 2. This process results in a pair of new objects and a new relation (e.g., Bush Motivates United States), which are from the same domain as the prior objects. Both processes (2) and (3) can subsequently be used to form a new mapping with the other domain, using process (1) above.
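The logic of the three processes can be illustrated symbolically. The sketch below replaces attractor settling with explicit lookup over a list of facts, so it shows only what each process selects, not how the network computes it. The facts are limited to the examples mentioned in the text, and the function names are our own.

```python
# (Object 1, context, Object 2, domain), restricted to examples from the text.
FACTS = [
    ("Saddam", "threatens", "Kuwait", "Gulf War"),
    ("Hitler", "threatens", "Poland", "World War II"),
    ("Saddam", "dictator of", "Iraq", "Gulf War"),
    ("Hitler", "dictator of", "Germany", "World War II"),
    ("Iraq", "occupies", "Kuwait", "Gulf War"),
    ("Germany", "occupies", "Austria", "World War II"),
    ("Iraq", "defies", "United States", "Gulf War"),
    ("Bush", "motivates", "United States", "Gulf War"),
]


def same_relation_other_domain(fact):
    """Process (1): inhibit both objects, keep the context, switch domain."""
    return [f for f in FACTS if f[1] == fact[1] and f[3] != fact[3]]


def new_relation_same_object(fact):
    """Process (2): inhibit the context, keep Object 1."""
    return [f for f in FACTS if f[0] == fact[0] and f[1] != fact[1]]


def new_relation_new_objects(fact):
    """Process (3): inhibit both objects and the context, stay in the domain."""
    inhibited = {fact[0], fact[2]}
    return [
        f for f in FACTS
        if f[3] == fact[3] and f[1] != fact[1]
        and f[0] not in inhibited and f[2] not in inhibited
    ]


start = ("Iraq", "occupies", "Kuwait", "Gulf War")
mapped = same_relation_other_domain(start)   # Germany occupies Austria
next_role = new_relation_same_object(start)  # Iraq defies United States
next_pair = new_relation_new_objects(start)  # Bush motivates United States
```

In the network itself a process may have several candidate attractors and settles into one of them, whereas this lookup simply returns every candidate.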
4.2. Results
The three processes detailed above can be used sequentially to explore similarities between two domains involving multiple relations and objects to build up a large, complex analogy. Figures 16 and 17 demonstrate the network's performance on one such analogy using the Gulf War and World War II domains (the network's exploration of the analogy presented in these figures is one example out of many possible trajectories). Figure 16 shows the network's journey through the space of multiple concepts as it discovers the multirelation analogy, whereas Figure 17 shows the successive activation states of the network as analogical completion unfolds through time. First, the network is given a domain as the base by presenting the network with Object 1 and Object 2 layers consisting of the component representing the domain (i.e., Gulf War) and resting values for all other Object 1 and Object 2 units. The network then completes the Object 1, Object 2, and context layers consistent with this domain (i.e., Saddam threatens Kuwait). This activation state constitutes the starting point of the analogy. Subsequently, process (1) is used to find a mapping in the alternative domain (i.e., Hitler threatens Poland). Following this, either process (2) or process (3) is used to find a different relation involving the same Object 1 representation (i.e., Hitler dictator of Germany). Process (1) is then repeated to find a mapping in the Gulf domain (i.e., Saddam dictator of Iraq). Subsequently, process (1) and either process (2) or (3) are interleaved iteratively to traverse the network of relations, thereby building up a complex analogy (see Figs. 16 and 17). Importantly, this model reveals how relational priming may serve as a fundamental subprocess when building analogies involving networks of relations and many-to-many comparisons across domains.
4.3. Discussion
In the current implementation, iterative unfolding is guided by an essentially blind process of relation selection (i.e., processes (2) and (3) above involve finding a different random relation or object). This, however, is not to say that iterative unfolding is necessarily an unguided bottom-up process. One way in which top-down information can play a strong role is in terms of systems of relations and the way that these are gradually encoded in the network's internal representations. In the network's training environment, systems of relations will co-occur with greater frequency and across different examples than individual relations that do not co-vary consistently. Connectionist networks are good at picking up these statistical regularities and representing them in the hidden units, such that systems of relations are represented as closer in hidden unit space. Any consequent analogical mappings (i.e., given a certain relation) made with these hidden units will thus contain a bias for selecting a new relation from within the same coherent relational system. Rogers and McClelland (Reference Rogers and McClelland2004) discuss the importance of this type of consistent covariance for the related but distinct development of semantic cognition. Here, we see how this consequence of the statistics of the input could explain biases for systems of relations which have been repeatedly reported in the analogy literature, and, more importantly, could explain how and why these biases develop.
Importantly, a process of iterative unfolding also resonates with evidence as to how children learn about relational structures. Children's analogy abilities become more systematic and sophisticated as they gradually absorb and internalize more of the richness of the structural relations (particularly causal systems of relations) in the biological, physical, and psychological domains in which they find themselves (for a review, see Goswami Reference Goswami, Holyoak, Gentner and Kokinov2001).
We do not consider in detail how the postulated additional control processes are implemented in a full connectionist account, or how these processes develop – this is beyond the scope of the current paper (although certainly not beyond the scope of connectionist modeling, e.g., Davelaar et al. Reference Davelaar, Goshen-Gottstein, Ashkenazi, Haarmann and Usher2005; O'Reilly Reference O'Reilly2006; Rougier et al. Reference Rougier, Noelle, Braver, Cohen and O'Reilly2005). Tentatively we suggest that the work of O'Reilly and colleagues provides a useful template for understanding how flexible cognitive control systems could develop in response to task demands (see Rougier et al. Reference Rougier, Noelle, Braver, Cohen and O'Reilly2005). This work considers the interaction of a network that can rapidly update or maintain activation according to a dynamic gating mechanism trained with reinforcement learning. Task-relevant representations self-organize over training. These representations are capable of capturing data from Stroop and Wisconsin-card sorting tasks – both tasks that require the kind of active maintenance and inhibition necessary for the proposed iterative unfolding of analogies. Furthermore, one of the interesting aspects of the Rougier et al. work is that the network is able to generalize to novel tasks. Therefore, such an approach is a useful starting point for investigating the origins of the controlled iterative unfolding that we propose for complex analogical mapping and how such skills generalize to analogies in novel domains, although there are likely to be considerable challenges ahead in satisfactorily marrying the simple analogical processes detailed earlier with the controlled processes in a single developmental model.
5. General discussion
The account of analogical completion presented here is an attempt at a novel developmental theory of analogical reasoning. Contrasting our model with models inspired by adult reasoning (e.g., SME, LISA) indicates that taking a developmental perspective constrains the modeling process in a substantially different way. Unlike most accounts of adult analogical reasoning, which are largely concerned with analogies involving systems of relations, our developmental account has been forced to emphasize different qualities such as knowledge acquisition and the importance of reconciling analogy competence with other lower-level cognitive processes. This difference in emphasis results in an explanatory framework that brings together a wide range of seemingly disparate developmental experimental findings in a single computational model. In contrast to other models of analogy that have been applied to developmental phenomena, the model presented here is able to account for all of the key developmental phenomena listed in the introduction. As such, it is worth reflecting on some of the theoretical implications of the model's two underlying core assumptions: the central role of priming and the treatment of relations as transformations.
5.1. Relations as transformations
5.1.1. The relational shift revisited
The key benefit of viewing relations as transformations between states is that relations do not have to be represented explicitly, avoiding the difficulties of learning explicit structured representations. By attempting to simulate the acquisition of relational knowledge, the current account is consistent with and provides a possible developmental mechanism for knowledge accretion theories of analogical development. This is especially clear for simple causal relations (e.g., cutting) such as those considered with the model, where there is a direct physical change in an object. However, as emphasized in the final simulation, potentially all relations could be viewed as transformations.
Our account of the relational shift depends crucially on how well the network learns object transformations versus auto-associations (i.e., null transformations). At the heart of the relational shift is a change from ignoring transformation information (auto-association) to incorporating transformation information into the network's internal representation of the object. The rate at which this occurs reflects the relative mix of transformation versus non-transformation experiences that the network encounters, thereby grounding the relational shift squarely within the experiences that the child encounters. This position contrasts markedly with Gentner's Structure-Mapping account, which posits that the relational shift derives from a process of knowledge becoming increasingly abstract. Instead, our position resembles far more the relational primacy hypothesis of Goswami.
One of the more unexpected consequences of the current account as a developmental theory is that it implies that there is no necessary relational shift for any given relation in a child's similarity judgments. The relational shift is modulated by the frequency with which children encounter transformations versus auto-associations (i.e., null transformations). This even implies that for high-frequency relations children might show an initial bias for transformational similarity over object-based similarity. For the network exposed to high-frequency transformations, this early preference for transformation over auto-association would correspond to it being able to solve analogies but not non-analogies (i.e., the network would make errors like apple:apple::bread:cut bread). For such high-frequency relations the network would, accordingly, undergo something akin to a relational shift in reverse, gradually becoming able to produce the object-similarity response and producing the transformation response only when appropriate rather than all of the time. Thus, the theory and model strongly predict that with a subset of highly familiar relations children will, in contrast to the expectations of the relational-shift hypothesis of Gentner (Reference Gentner1988), choose relational responses over object-similarity responses even when an object-similarity judgment would be more appropriate. A useful test of the validity of the model would investigate the relational judgments of children to similar stimuli that differ in the appropriateness of a relational or object-based interpretation. Following the analogy as relational priming account, we would expect that over development there should be a shift away from relational responses for the object-based stimuli – an inverse relational shift.
5.1.2. Transformation size in analogical reasoning
One factor that affects how well the network learns a transformation is the degree of confusability or overlap between different input-output training patterns (e.g., how similar the representations of apple and cut apple are). The smaller the overlap between the auto-associative (e.g., apple) and the transformed (e.g., cut apple) responses (i.e., the greater the size of the transformation), the easier it is for the network to learn the pattern sets that define the transformation. As a consequence, the network will also be better at completing analogies involving states that are very different than analogies involving states that differ only slightly. The model therefore predicts that analogies involving larger transformations will be solved more readily than analogies involving smaller transformations. This prediction is a consequence of our two key theoretical assumptions (analogy as relational priming and relations as transformations), and their implementation within the network model. No other model of analogical reasoning makes a similar prediction.
In order to explore this prediction, Leech et al. (2007) tested adolescents and adults on analogies where the transformations involved varied along some dimension. Consistent with our model's predictions, it was found that for both adults and adolescents, analogies involving larger transformations were solved more accurately than analogies involving smaller transformations.
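The notion of transformation size used in this section can be made concrete with a small sketch. The following is purely illustrative and is not part of the reported simulations: it assumes binary feature vectors (the feature values, the eight-feature length, and the names transformation_size, small_change, and large_change are all invented here) and measures the size of a transformation as the proportion of features that differ between the initial and the transformed pattern. It quantifies overlap only; it does not simulate learning.

```python
def transformation_size(before, after):
    """Fraction of features that differ between two equal-length binary patterns."""
    if len(before) != len(after):
        raise ValueError("patterns must have the same length")
    return sum(b != a for b, a in zip(before, after)) / len(before)


# Hypothetical 8-feature encodings, invented for illustration only.
apple = [1, 1, 0, 1, 0, 0, 1, 0]
small_change = [1, 1, 0, 1, 0, 1, 0, 0]  # 2 of 8 features differ from apple
large_change = [0, 0, 1, 1, 1, 1, 0, 0]  # 6 of 8 features differ from apple

# Auto-association is the null transformation: nothing changes.
null_size = transformation_size(apple, apple)
small_size = transformation_size(apple, small_change)
large_size = transformation_size(apple, large_change)
```

Under this assumed measure, auto-association has size zero, and the prediction discussed above is that analogies built on the pattern with the larger value would be completed more readily than those built on the pattern with the smaller value.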
5.1.3. Transformation and predicate structure
Some readers may argue that relations are not really implemented as relations, but that, instead, we have built predicate structure into the network architecture, with the Object(t1) layer being the patient and the CA(t1) (or context) layer being the agent or instrument. However, this is not the case. Any layer (or slot) of the network can be trained as a patient or an agent in a transformation. Indeed, architecturally the CA(t1) layer and Object(t1) layer are not functionally different. For example, we could train the model on knife at CA(t1) and broken knife at CA(t2) (in addition to training the network that knife cues a physical transformation in apple). Then a single network could solve analogies involving knife as agent and knife as patient.
5.2. Analogy as priming
Central to the question of whether priming can play an important part in analogical reasoning is whether analogy (and related underlying mechanisms) need be explicit. Priming is normally considered as an automatic, implicit mechanism (e.g., Tulving & Schacter 1990), whereas analogy is generally characterized as an essentially explicit ability. Therefore, analogy as relational priming seems at first glance to be a somewhat paradoxical theoretical position. However, we believe that this argument has only superficial validity, for two reasons. First, it is important to distinguish between cognitive processes and the behavioral and cognitive results of those processes. For instance, although priming (an implicit process) may be a core mechanism of analogy, the result of that process can still be explicit and accessible to the system (i.e., the resulting analogy can be verbalized or used as input to some other cognitive process). Language comprehension provides an illustrative example of a domain that involves some implicit mechanisms (including semantic priming) and a verbalizable explicit outcome (e.g., see Kutas & Federmeier 2000). Second, our account of analogy as relational priming is compatible with both implicit analogical reasoning mechanisms (as suggested by the earlier simulations of analogical completion with uncontrolled relational priming) and with much more deliberative mechanisms (i.e., the controlled use of inhibition and relational priming to iteratively unfold a large and complex analogical mapping).
Priming effects are ubiquitous in perception and cognition and have been observed across development. Viewing analogy as related to a form of priming demonstrates how a high-level cognitive process can be rooted in lower-level processes. This provides a possible explanation of the early and natural use of analogical reasoning by young children. The proposed relationship between analogical reasoning and more basic priming mechanisms can be understood as analogous to how some researchers have related cognitive skills (including reasoning and categorization) to more general, lower level processes of inhibition (e.g., Dempster & Brainerd 1995; Houdé 2000).
Consistent with this perspective, we envisage analogy as something of a heterogeneous phenomenon: Different types of analogy will utilize a variety of memory and control processes in considerably different organizations and to considerably different effect. While our aim has not been to provide an exhaustive account of analogical reasoning, we have suggested two possible configurations of cognitive processes that may underlie distinct types of analogical reasoning. These configurations illustrate the explanatory power of analogy as relational priming, and demonstrate how the framework is compatible with complex adult levels of performance.
Although there is strong evidence that relational priming may be involved in analogy, further work needs to be done to establish how closely the two processes are interrelated. In fact, the analogy as relational priming account stands or falls on the predicted intimate relationship between analogy and priming, especially through development. In this vein, a strong test of our account concerns whether relational priming can be demonstrated in children and, if so, whether this correlates with performance on more standard analogical reasoning measures.
5.3. The role of world knowledge
Both children and adults can readily form analogies involving novel stimuli. They do this rapidly and following very little exposure. This may appear to be inconsistent with the lengthy training regime and slow connection weight adjustments that form the basis of analogical reasoning in our simulations. This is not the case. Even stimuli that appear novel often share a great deal of underlying information with prior experience. For example, novel square drawings moving on a computer monitor will tap into considerable existing world knowledge about the possible relations between moving items. More generally, when the network is presented with a “novel” problem, the task will be made easier if the network can co-opt existing representations into learning the new problem (e.g., Altmann 2002; Shultz & Rivest 2001; Shultz et al. 2006). A similar explanation of why children can rapidly draw causal inferences about novel objects that they have never encountered before is given by McClelland and Thompson (2007), who demonstrate that providing a network with substantial previous experience in a domain can lead to rapid “one-shot” causal learning.
5.4. Explicit mapping
In our final simulation we demonstrated how relational priming could be used deliberatively to build up a complex explicit analogy. The final simulation illustrates how a situation that is normally assumed to be an example of explicit structure mapping is consistent with a simpler conception of analogy combined with meta-cognitive processes used to elaborate an initial mapping. We acknowledge that the current adult model is underdeveloped and in particular fails to explain the etiology of the control and memory processes necessary for handling complex adult analogies; and that a fuller account of adult analogy would, therefore, be necessary to convince many researchers in the structure-mapping community of the explanatory power of the relational priming account of analogical reasoning. However, we believe that the iterative mechanism for building up mappings based on relational priming is useful for illustrating some important distinctions between our account and existing models.
In our view the elaboration of an initial mapping is both deliberative (i.e., non-automatic) and task-directed. To demonstrate, reconsider the World War II/Gulf War analogy. Holyoak and colleagues (e.g., Holyoak & Hummel 2001; Spellman & Holyoak 1992) have shown that people can come up with complex mappings between 1990 and 1939 (e.g., Saudi Arabia is comparable to France; Kuwait is comparable to Poland; George Bush could be Roosevelt or Churchill). However, most of these explicit mappings are irrelevant for the analogy to fulfill its intended purpose of stressing the similarity between invasions from Hitler's Germany and Saddam Hussein's Iraq, and consequently emphasizing that if Saddam Hussein were not stopped something terrible, like World War II, would be visited on the world. The analogy still holds if only Saddam Hussein and Hitler (both conceived as dangerous, aggressive dictators) are compared and nothing else is mapped. Thus, in this situation at least, there does not need to be an explicit mapping of relational structure to form an analogy (though making such structure explicit may well serve to increase the intensity of any argument based on the analogy).
A mechanism of iterative unfolding, such as the illustrative one presented in the final simulation, also enables a fuller comparison of the current model's scope with respect to other phenomena such as the systematicity effects shown in conjunction with Structure-Mapping Theory. As we noted in the previous section, relations that constitute a system will vary coherently in a naturalistic training environment. This means that a connectionist network learning about that domain will develop internal representations that reflect this relational structure (something that developing children also do; see Goswami 2001). Consequent inferences or analogical reasoning based on these internal representations would, therefore, contain a bias towards systems of relations (see Rogers & McClelland 2004; 2005; Thomas & Mareschal 2001).
How world knowledge is acquired is also of particular importance in determining the kinds of relations that prime others. In the adult relational priming literature, priming effects have typically been quite small (although semantic priming effects are typically larger in children than adults; Chapman et al. 1994). These small effects could in part result from the fact that experimenters have focused on very general relations defined in linguistic taxonomies (e.g., the relation have; for a discussion, see Estes & Jones 2006), rather than on the more specific types of relations that are salient and useful for explaining the world. This parallels the evidence from Goswami and colleagues that causal explanations are particularly relevant to children in making sense of their environments, because many real-world events feature cause-effect patterns (see Goswami 2001). It follows that in order to further develop our account of iterative unfolding with systems of relations it will be necessary to provide training regimes that better reflect a child's environment, and, in particular, training regimes that emphasize the relations and relational structures that are most salient within the child's environment, and as such are most likely to serve as primes.
We have so far not discussed the important issue of relational complexity (Halford et al. 1998) and how the iterative unfolding account could marry simple proportional analogies such as a:b::c:d with the developmental effects of relational complexity on children's reasoning. However, to address this issue, one open question that would first need to be broached is how n-ary relations would be implemented (e.g., relations with three arguments such as gives in John gives Mary the book). One simple way, following event semantics (Davidson 1967), to generalize our account to analogies involving n-ary relations would be to decompose an n-ary relation into multiple binary relations around an event: for example, GIVER(event, John), GIVEE(event, Mary), and GIVEN(event, book). Higher arity relations, according to the iterative unfolding account, would then require more temporary storage of partial results (e.g., dealing with the ternary relation give would require temporary storage of the giver, the givee, and the given). Hence, this approach would also predict that relational complexity would constrain analogical reasoning and that, across development, ability with higher arity relations will correlate with some measure of working memory efficacy.
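The decomposition of an n-ary relation into binary relations around an event can be sketched as follows. This is an illustration only, not an implementation from the model: the function name decompose, the tuple layout, and the assumption that each role filler occupies exactly one temporary storage slot are ours and are introduced purely to make the idea concrete.

```python
def decompose(event, roles):
    """Turn one n-ary relation into binary (role, event, filler) relations."""
    return [(role, event, filler) for role, filler in roles.items()]


# The ternary relation "John gives Mary the book" from the text.
give = decompose("event", {"GIVER": "John", "GIVEE": "Mary", "GIVEN": "book"})

# Assumption (hypothetical): one temporary storage slot per role filler,
# so the storage demand grows with the arity of the relation.
slots_needed = len(give)
```

Here give holds the three binary relations GIVER(event, John), GIVEE(event, Mary), and GIVEN(event, book), and slots_needed equals the arity of the original relation, which is the sense in which higher arity relations would place greater demands on working memory.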
The important distinction underlying our approach to more complex analogies is between the explicit mapping framework where analogies are worked out completely by some cognitive mechanism (e.g., LISA), and an alternative view where simple analogies are first made before subsequently being checked and expanded if necessary using explicit iterative unfolding. To reiterate, according to our account, explicit structure mapping is a meta-cognitive skill: a relational priming mechanism reveals a relational similarity between two domains, but the reasoner can iteratively unfold this by repeatedly applying the simpler mechanism to components of a domain in order to extend the analogy or to discover where the analogy breaks down. This second account has an important role for explicit mapping; the key difference is that explicit mapping is no longer necessary for analogy to occur, but instead describes a subset of analogies.
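The control structure of iterative unfolding can be sketched schematically. The sketch below is a deliberately simplified stand-in and should not be read as the connectionist model: relational priming is replaced by a lookup in a small hand-written table, and the table contents (including the relation bruise) and the names KNOWLEDGE, prime_relation, complete, and unfold are all hypothetical. It shows only how a simple a:b::c:d completion step might be applied repeatedly until the analogy either extends or breaks down.

```python
# Hypothetical toy knowledge: (object, relation) -> transformed object.
KNOWLEDGE = {
    ("apple", "cut"): "cut apple",
    ("bread", "cut"): "cut bread",
    ("apple", "bruise"): "bruised apple",
}


def prime_relation(a, b):
    """Return the relation linking a to b in KNOWLEDGE, or None if unknown."""
    for (obj, relation), result in KNOWLEDGE.items():
        if obj == a and result == b:
            return relation
    return None


def complete(a, b, c):
    """Complete a:b::c:? by applying the relation primed by (a, b) to c."""
    relation = prime_relation(a, b)
    if relation is None:
        return None
    return KNOWLEDGE.get((c, relation))


def unfold(steps):
    """Apply complete() to each (a, b, c) step until one step fails."""
    mapped = []
    for a, b, c in steps:
        d = complete(a, b, c)
        if d is None:
            return mapped, (a, b, c)  # the analogy breaks down at this step
        mapped.append((c, d))
    return mapped, None  # every step could be completed


mapped, failed_at = unfold([
    ("apple", "cut apple", "bread"),
    ("apple", "bruised apple", "bread"),
])
```

In this toy run the first step yields cut bread, whereas the second cannot be completed because the table contains no bruise transformation for bread; unfold therefore reports that step as the point where the analogy breaks down.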
Future work involving, for example, patients following neurological insult, or possibly transcranial magnetic stimulation with healthy participants, could provide strong evidence for the disentangling of mapping from analogy. In particular, given that explicit mapping is likely to employ more frontal regions, whereas relational priming is likely to be more temporal, we predict that frontal damage should have relatively little impact on analogical reasoning when explicit mapping is not central to performance, and that more temporal damage should severely reduce relational priming and consequent analogical reasoning.
One interesting prediction from the model that also suggests a dissociation of analogy and mapping concerns a developmental asymmetry in analogy completion when base and target are reversed. For each analogy, it is predicted that there will be a period when the model can complete an analogy one way but fails to complete its reverse. For example, apple:cut apple::bread:cut bread at a given point in development may be easier than the reverse analogy: bread:cut bread::apple:cut apple. This phenomenon arises in the model because pattern completion is differently constrained in the base domain and the target domain. The base domain involves greater external constraint (i.e., both the a and b terms) than the target domain (just the c term). Consequently, the model is more likely to appropriately complete an analogy if the less well learnt relation is in the more highly constrained base domain than if it is in the less constrained target domain. This prediction is hard to reconcile with structure-mapping accounts and so constitutes a further strong test of the validity of the analogy as relational priming model.
5.5. Development revisited
One of the principal lessons from this work is that it is vital to place development squarely at the heart of any account of cognition. This is not a new proposal (e.g., see Karmiloff-Smith 1998; Mareschal et al. 2007; Piaget 1970; Thelen & Smith 1994) but one that is often overlooked by investigators of adult cognition. Many models of adult cognition have become very complex, often positing a myriad of specialist mechanisms, but are also very powerful at explaining many different aspects of adult performance on a range of complex tasks. However, in many cases, these models make no attempt to explain how the complex structures assumed to be part of adult cognition emerged. In contrast to this, we have emphasized the need to explain how cognitive mechanisms emerge over time with experience of the world. The result is that a very different kind of model is arrived at. As discussed in sections 4 and 5.4, our current account still has a substantial way to go to capture the complexity and richness of adult analogical reasoning. Indeed, in section 4 we sketch one possible way forward. That objection notwithstanding, it still remains for adult-level models to make contact with the developmental constraint: namely, that all proposed mechanisms must have a developmental origin in order to be plausible. Thus, while the developmental model does not reach adult levels of competence, the adult model does not make sufficient contact with its developmental origins. A complete explanation of analogical reasoning must bridge this gap.
In summary, relational priming has been presented as a developmentally viable account of early analogical completion. We have shown that the account, implemented in a connectionist model, captures a broad range of developmental phenomena, including seven detailed developmental markers of analogical ability. Our final simulation demonstrates how the simple relational priming mechanism can be applied iteratively to traverse complex analogies. This approach promises to provide a fuller developmental picture of the mechanisms underlying the gradual transition from simple to more complex reasoning.
ACKNOWLEDGMENTS
This work was supported by the Economic and Social Research Council (UK), MRC NIA grant G0400341 and European Commission NEST grant 020988 (ANALOGY).