A glaring concern with Leech et al.'s “relations as transformations” account of analogy is the amount of training needed to attain a capacity for analogical inference. Adults reach a stage in development where analogical inference extends to ad hoc relationships outside the sphere of prior experience. Modeling this capacity is a problem for common feed-forward and simple recurrent networks, which rely on stimulus-driven response-error correction (Phillips 1999; 2000); and for similar reasons, this level of development is unreachable with the sort of connectionist model proposed in the target article. The analogizer cannot prepare in advance all possible transformations that could be primed. Moreover, any degree of generalization afforded to the model via similarity-based transformation is thwarted by analogies demanding transformations inconsistent with previous tasks.
Learning set transfer (Kendler 1995) or relational schema induction (Halford et al. 1998) involves testing participants on a series of stimulus-response tasks having a common structure (e.g., transverse patterning), where each task instance consists of a set of stimuli in novel relationships. Suppose, for example, that in one task instance (T1) square predicts circle, circle predicts triangle, and triangle predicts square; in the next task instance (T2) cross predicts star, star predicts bar, and bar predicts cross; and so on. The transverse patterning structure and the fact that cross predicts star (the information trial) are sufficient to predict correctly the responses to star and bar in the other two trials. Even on more complex structures involving more objects and more information trials, adults reach the point of correctly predicting the responses on the remaining trials (Halford et al. 1998).
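To make the inferential demand concrete, here is a minimal sketch (not a model of any participant or network) of how the shared cyclic structure plus a single information trial suffices to determine the remaining responses; the function name and representation are illustrative:

```python
# A minimal sketch: completing a transverse patterning task instance from
# the shared cyclic structure (x -> y -> z -> x) plus one information trial.
# Task labels T1/T2 and the stimuli are from the text above.

def complete_transverse_patterning(information_trial, third_stimulus):
    """Given one known pair (x predicts y) and the remaining stimulus z,
    return the full mapping implied by the cyclic structure x->y->z->x."""
    x, y = information_trial
    z = third_stimulus
    return {x: y, y: z, z: x}

# T1 from the text: square -> circle -> triangle -> square.
t1 = complete_transverse_patterning(("square", "circle"), "triangle")
assert t1 == {"square": "circle", "circle": "triangle", "triangle": "square"}

# T2: the information trial "cross predicts star" plus the structure
# suffices to predict the responses to star and bar.
t2 = complete_transverse_patterning(("cross", "star"), "bar")
assert t2 == {"cross": "star", "star": "bar", "bar": "cross"}
```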
The target authors' model fails to account for this sort of abstract analogy because the system can only utilize relations between objects that have already been learned as transformation functions on the basis of prior experience. Analysis of internal representations by the authors revealed that the developed network groups objects in hidden unit activation space by the relations that transform them. The input/hidden-to-output connections effectively implement a mapping whose domain is partitioned into subdomains, one for each causal relation (e.g., cut, bruised, etc.). The input-to-hidden connections implement a mapping from object pairs to points located within the subdomain corresponding to the relationship between the two objects, effectively providing an index to the objects' relation. For example, apple and cut apple are mapped to a point in hidden unit space contained in the subdomain for the cut transformation function. This point provides the context for mapping the next object, say, banana to cut banana (assuming that this transformation was also learned) to complete the analogy. The same sequence of steps may also be applied to transverse patterning, assuming that the network has learned all the required mappings: For example, cross and star would map to a point in the subdomain corresponding to the task relation T2; and star in the context of T2 would map to bar. Unlike adults, however, the network must be trained on all possible transformations to make this inference.
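The two-step mapping just described can be caricatured as follows. This is an illustrative stand-in, not the authors' trained network: the learned transformation functions are hard-coded as lookup tables precisely to make the coverage problem explicit.

```python
# Schematic sketch of the two-step mapping: the a:b pair indexes a learned
# relation (the "context"), and that relation applied to c yields d.

# Transformation functions the network is assumed to have already learned.
LEARNED = {
    "cut": {"apple": "cut apple", "banana": "cut banana"},
    "T2":  {"cross": "star", "star": "bar", "bar": "cross"},
}

def infer_relation(a, b):
    """Stand-in for the input-to-hidden mapping: locate the pair a:b in the
    subdomain of some learned relation and return that relation's index."""
    for relation, mapping in LEARNED.items():
        if mapping.get(a) == b:
            return relation
    raise ValueError(f"pair ({a}, {b}) lies outside every learned subdomain")

def complete_analogy(a, b, c):
    """a : b :: c : ? -- apply the relation indexed by (a, b) to c."""
    relation = infer_relation(a, b)
    return LEARNED[relation][c]

print(complete_analogy("apple", "cut apple", "banana"))  # -> 'cut banana'
print(complete_analogy("cross", "star", "star"))         # -> 'bar'

# The crux of the critique: an unlearned transformation raises an error,
# whereas adults can infer a novel task instance from structure alone.
# complete_analogy("moon", "full moon", "sun")  # ValueError
```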
Notice that the problem with Leech et al.'s model is not a complete failure to generalize. Suitably configured, some degree of generalization may be achievable using a learned internal similarity space of object representations. All fruit, for example, could be represented along a common dimension, and the various causal relations could be realized as translations along orthogonal directions that systematically map the representations of fruit to cut fruit, bruised fruit, and so on. Learning to complete analogies for some instances of fruit and cut fruit may then generalize to the other instances, assuming the number of parameters (weights) implementing the mappings is sufficiently small compared to the number of fruit examples. But the elements of a transverse patterning task may not be systematically related in any way other than via the transverse patterning structure itself; they need not belong to the same category of objects, and they may even contradict mappings learned from a previous task instance (e.g., cross may predict bar in a new instance of the task). Thus, there is no basis on which the needed similarity could have developed. The problem is that the capacity for abstract analogical inference transcends specific object relationships.
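A small numerical sketch makes both halves of this point vivid. The vector representations below are invented for illustration; the only assumption is that a causal relation acts as a fixed offset in the representation space.

```python
# Similarity-based generalization and its failure mode, in miniature.
import numpy as np

# Invented 3-d object representations: dimensions 0-1 carry object identity
# along a shared "fruit" structure; dimension 2 carries the effect of "cut".
reps = {
    "apple":      np.array([1.0, 0.2, 0.0]),
    "cut apple":  np.array([1.0, 0.2, 1.0]),
    "banana":     np.array([1.0, 0.8, 0.0]),
    "cut banana": np.array([1.0, 0.8, 1.0]),
}

# Learn "cut" as a single offset from one example; it transfers to banana
# because the fruit representations are systematically related.
cut = reps["cut apple"] - reps["apple"]
assert np.allclose(reps["banana"] + cut, reps["cut banana"])

# Transverse patterning breaks this: if cross -> star in one task instance
# but cross -> bar in a later one, no single offset can satisfy both.
arbitrary = {
    "cross": np.array([0.3, 0.5, 0.0]),
    "star":  np.array([0.9, 0.1, 0.0]),
    "bar":   np.array([0.2, 0.7, 0.0]),
}
offset_t2  = arbitrary["star"] - arbitrary["cross"]   # from cross -> star
offset_new = arbitrary["bar"]  - arbitrary["cross"]   # from cross -> bar
assert not np.allclose(offset_t2, offset_new)         # mutually inconsistent
```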
Despite this pessimistic assessment, perhaps an explanation for analogy could be based on transformations augmented with processes that represent and manipulate symbols. Assuming a capacity to bind/unbind representations of objects to representations of symbols, abstract analogies such as transverse patterning may be realized as transformations of symbols (e.g., symbol a maps to b, b maps to c, and c maps to a), rather than of specific object representations. However, hybrid theories are held to a higher explanatory standard (Aizawa 2002): not only must they explain each component (e.g., an object-transformation account for concrete analogies and a symbolic account for abstract analogies), but they must also explain why the explanatory labor is divided that way.
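A toy sketch of the suggested augmentation, with illustrative names: the cyclic transformation is defined once over abstract symbols, and each new task instance is handled by binding and unbinding rather than by further training.

```python
# Symbolic augmentation in miniature: the rule lives at the symbol level,
# so novel (even contradictory) task instances need only a new binding.

SYMBOL_RULE = {"a": "b", "b": "c", "c": "a"}   # a -> b, b -> c, c -> a

def predict(binding, stimulus):
    """Bind the stimulus to its symbol, apply the symbolic rule, unbind."""
    to_symbol = {obj: sym for sym, obj in binding.items()}
    unbind = binding  # symbol -> object
    return unbind[SYMBOL_RULE[to_symbol[stimulus]]]

# A brand-new task instance requires only a binding, not new training:
binding = {"a": "cross", "b": "star", "c": "bar"}
assert predict(binding, "cross") == "star"
assert predict(binding, "star") == "bar"
assert predict(binding, "bar") == "cross"

# Rebinding the same symbols handles a contradictory later instance
# (e.g., cross now predicts bar) without unlearning anything.
binding2 = {"a": "cross", "b": "bar", "c": "star"}
assert predict(binding2, "cross") == "bar"
```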
Indeed, Aizawa's detailed analysis of the systematicity problem (Fodor & Pylyshyn 1988) and of its proposed “solutions” (for a review, see Phillips 2007) signposts a general developmental theory of analogy. To paraphrase, the problem is not to show how analogy is possible under particular assumptions, but to show how analogy is a necessary consequence of those assumptions. The capacity for analogy, like the property of systematicity, is a ubiquitous product of normal cognitive development. If a developmental connectionist explanation depends on a particular network configuration, then why does the network get configured that way? And if the answer is an appeal to error minimization, then what preserves this configuration in the face of optimization over numerous stimulus relations that may have nothing to do with analogy? Answers to these sorts of questions that do not rely on what Aizawa distinguishes as ad hoc assumptions would help to shift Leech et al.'s account from one that is merely compatible with the data to one that actually explains them.
Leech et al.'s developmental approach may yield valuable insights into the early acquisition of a capacity for concrete analogical inference. But the expectation that it will lead directly to higher cognition seems more like wishful thinking than warranted extrapolation.