R1. Introduction
Scientists typically approach the problem of how space is represented in the brain using the vehicle of their own preferred model species, whether pigeon, rat, human, or fish. It is therefore impressive to see in the commentaries how many issues cut across species, environments, descriptive levels, and methodologies. Several recurring themes have emerged in the commentators' responses to our target article, which we address in turn. These are: (1) frames of reference and how these may be specified and related, (2) comparative biology and how representations may be constrained by the ecological characteristics and constraints of a given species, and (3) the role of developmental and life experiences in shaping how representations are structured and used. In the following sections we explore each of these themes in light of the commentators' views, and outline research possibilities that speak to each of these.
R2. Frames of reference
A fundamental issue in spatial representation is the frame of reference, against which positions and movements are compared. The concept of frame is a mathematical construct rather than a real cognitive entity, but it provides a useful way of organizing important concepts in spatial representation. However, the crisp formalism that we have developed in order to treat space mathematically may bear scant resemblance to the brain's actual representation, which has been cobbled together using patchy, multisensory, variable, probabilistic, and distorted information. Nevertheless, it is useful to review the ingredients of a perfect map, before trying to understand how the imperfect and real-world brain may (or may not, as Zappettini & Allen argue) have approximated this functionality.
Maps can be either metric or non-metric (Fig. R1), the former incorporating distance and direction and the latter (topological maps – see commentary by Peremans & Vanderelst) using only neighbourhood relationships without reference to precise distances or directions. That said, these parameters are necessarily loosely incorporated into the map, or else the relationship with the real world becomes too disconnected for the map to be useful. We focused on metric maps in our analysis because of the clear evidence of metric processing provided by the recent discovery of grid cells. However, Peremans & Vanderelst describe how an initially topological map could possibly be slowly metricized through experience.
A metric spatial reference frame, to be useful, needs two features: an origin, or notional fixed point within the frame, from which positions are measured, and some way of specifying directions in each of the cardinal planes. Human spatial maps generally fall into two classes, Cartesian and polar (Fig. R1), which treat these two features slightly differently. A classic Cartesian map has a single plane on which an origin is defined, while directionality is imposed on the plane via the two orthogonal x and y axes, which intersect at this origin. Positions are specified by projecting them onto the axes, yielding two distance parameters per position. Directions are not explicitly represented but can be inferred by reference to the axes. In a polar map, by contrast, direction is explicitly represented. Positions are specified by the distance from the origin and the direction with respect to a reference direction, again yielding two parameters, one specifying distance and one direction.
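The equivalence of the two schemes can be made concrete with a short sketch (hypothetical Python, for illustration only; the function names are ours, not drawn from the commentaries). Each scheme specifies a position with two parameters, and either can be recovered from the other:

```python
import math

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta): distance from the origin
    and angle from the reference direction (here, the x-axis)."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y): project the position
    back onto the two orthogonal axes."""
    return r * math.cos(theta), r * math.sin(theta)

# A point 3 units east and 4 units north of the origin:
r, theta = to_polar(3.0, 4.0)   # r = 5.0, theta = atan2(4, 3)
x, y = to_cartesian(r, theta)   # recovers (3.0, 4.0)
```

The round trip loses nothing, which is why the choice between Cartesian and polar descriptions is one of convenience (which quantities are explicit) rather than of information content.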
If a map is a map of the real world, then the frame of reference of the map needs to correspond somehow (be isomorphic) with the world. For example, in a map of the Arctic, the origin on the map corresponds to the North Pole and the reference direction on the map corresponds to the Greenwich Meridian. Spatial inferences made using the map can enable valid conclusions to be drawn about real-world spatial relationships. However, there are many different levels of description of the real world. If a lookout on a ship navigating the polar waters sees an iceberg, then she won't shout “Iceberg, 3 km South, 20 degrees East!” but rather “Iceberg, 100 m, starboard bow!” – that is, she will reference the object not to the Earth's global reference frame, but rather to the more relevant local reference frame of the ship. Thus, maps can align with the world in many different ways.
In the cognitive science of spatial representation, the global world-referenced frame has come to be called “allocentric,” whereas a local self-referenced frame is called “egocentric.” As the icebreaker example shows, however, this dichotomy is an oversimplification: If the lookout should then bang her elbow on the gunwale, she will complain not of her starboard elbow but of her right elbow, which of course would become her port elbow if she turned around – “starboard” and “port” are ship-referenced directions and can therefore be egocentric or allocentric depending on the observer's current perception of either being the ship or being on the ship. We will return to the egocentric/allocentric distinction further on.
The final important feature of a reference frame is its dimensionality, which was the primary topic of the target article. Cartesian and polar maps are usually two-dimensional, but can be extended to three dimensions. For both frameworks this requires the addition of another reference axis, passing through the origin and aligned vertically. Now, the position of a point in this space requires three parameters: distance along each of x, y, and z for the Cartesian map, and beeline distance plus two directions for the polar map. In the polar map, the first direction (azimuth) is the angle within the horizontal reference plane, measured from the reference direction, and the second (elevation) is the angle out of that plane, measured toward the vertical axis.
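The two polar angles just described — conventionally called azimuth and elevation — can be sketched explicitly (hypothetical Python, illustrative only). Three parameters suffice in either scheme, and the conversions are exact:

```python
import math

def to_spherical(x, y, z):
    """Cartesian (x, y, z) -> (beeline distance, azimuth, elevation).
    Azimuth: angle within the horizontal plane, from the reference
    direction (x-axis). Elevation: angle out of that plane, toward
    the vertical (z) axis."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / r) if r > 0 else 0.0
    return r, azimuth, elevation

def from_spherical(r, azimuth, elevation):
    """Inverse: recover the Cartesian triple (x, y, z)."""
    h = r * math.cos(elevation)  # projection onto the horizontal plane
    return (h * math.cos(azimuth),
            h * math.sin(azimuth),
            r * math.sin(elevation))
```

Note that near the vertical axis a small positional change produces a large swing in azimuth — one illustration of why a polar-style 3D code might treat the vertical direction differently from the horizontal plane.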
With these concepts articulated, the question now is how the brain treats the core features of reference frames. We address these in the following order:
1. How are egocentric and allocentric reference frames distinguished?
2. What is the dimensionality of these reference frames?
3. Where is the “origin” in the brain's reference frames?
We turn now to each of these issues as they were dealt with by the commentators, beginning with the egocentric/allocentric distinction.
R2.1. Egocentric and allocentric reference frames
When you reach out to pick up a coffee cup, your brain needs to encode the location of the cup relative to your body; the location of the cup relative to the outside world is irrelevant. The frame of reference of the action is therefore centred on your body, and is egocentric. Conversely, if you are on the other side of the room, then planning a path to the cup requires a room-anchored, allocentric reference frame. The fundamental difference between these frames lies in the type of odometry (distance-measuring) required. If the brain's “odometer” is measuring distance between body parts, or movement of one part with respect to another, then the frame is egocentric, and if the measurement is relative to the earth, then it is allocentric. The reason that grid cells have proved so theoretically important is that they perform allocentric odometry – their firing fields are equidistant in earth-centred coordinates – thus proving once and for all that the brain indeed has an allocentric spatial system.
Actions in space typically require more than one reference frame, and these frames therefore need to be related to each other. For example, in reaching for the cup, the movements the hand needs to make are constrained by the position of the arm. If reaching for the cup while also walking around the table, then the hand and arm frames need to be updated to account for the relative motion of the cup, and egocentric and allocentric frames must interact. It seems that large parts of the posterior cortex are devoted to these complicated computations, with the parietal cortex being more involved with egocentric space (Andersen & Buneo 2002) and the hippocampus and its afferent cortices with allocentric space. Where and how these systems interact is not yet known, but as Orban notes, it may be in the parietal cortex.
In our target article we restricted our analysis to allocentric reference frames on the grounds that only allocentric encoding is fully relevant to navigation. The importance to the bicoded model of a possible egocentric/allocentric unity is that there is better evidence for volumetric spatial encoding in the realm of egocentric action. If egocentric and allocentric space turn out to be encoded using the same representational substrate, the implication is that allocentric space, too, should be volumetric rather than bicoded. Indeed, several commentators question the egocentric/allocentric distinction and contest the exclusion of egocentric spatial processing from a theory of navigation. Orban argues that navigation uses egocentric processes because allocentric and egocentric processing converge on a common neural substrate in the parietal cortex – which is true, but it does not mean they are therefore indistinguishable. Klatzky & Giudice, referring to studies of humans, ask, “what, then, differentiates the behaviors ostensibly governed by the planar mosaic from human spatially directed actions such as pointing, reaching, over-stepping, and making contact?” – to which we would answer that egocentric actions, such as reaching, do not require allocentrically referenced odometry. They then say, “This exclusion of metrically constrained behaviors from consideration is undermined, however, by the intrinsic ambiguity of frames of reference,” by which they presumably mean that humans, at least, can flexibly move between frames of reference. However, movement between reference frames does not imply that the frames are therefore indistinguishable. Klatzky & Giudice conclude that “lacking reliable behavioral or neural signatures that would allow us to designate actions as egocentric on the basis of their metric demands, it seems inappropriate simply to exclude them from a theory of volumetric spatial representation.”
We of course would not want to fully exclude egocentric factors from a complete description of navigation, but it needs to be said that a distinction can be made. As noted earlier, the existence of allocentric odometry, in the form of grid cell activity, shows without a doubt that the brain possesses at least one allocentric spatial system. Given that there are actions that engage grid cells and others that do not, there must be (at least) two forms of spatial representation occurring in the brain: a primarily egocentric one in the parietal cortex that does not require grid cell odometry and a primarily allocentric one in the entorhinal/hippocampal system that does. The question is whether these two systems use the same encoding scheme and/or the same neural substrate. In our view, both are unlikely, although in terms of neural substrate there are clearly close interactions. Therefore, arguing for allocentric isotropy (equivalent encoding in all directions) on the grounds of (possible) egocentric isotropy is unwarranted – they are separate domains.
Of course, there may be more than just these two reference frame systems in the brain – there may even be many. And we accept that the distinction may not be as clear-cut and binary as we made out. Kaplan argues for the possibility of hybrid representations that combine egocentric and allocentric components, and points out that the locomotor-plane-referenced form of the bicoded model is in fact such a hybrid. We have some sympathy with this view, as it pertains to the bicoded model, but are not fully persuaded by it. Although it is true that this plane is egocentrically referenced, because it is always under the animal's feet, it is also the case that the locomotor plane is constrained by the environment. So, one could argue that the egocentric frame of reference is forced to be where it is because of allocentric constraints. Thus, the mosaic form of the bicoded model, in which each fragment is oriented according to the local surface, is arguably an allocentrically based, rather than hybrid, model. The distinction may be mostly a semantic one, however.
The egocentric/allocentric distinction can be blurred even in discussions of explicitly allocentric neural representation. In fact, the place cells themselves, long regarded as the prime index of allocentric encoding, are hybrid inasmuch as a place cell fires only when the rat's egocentric reference frame occupies a particular place in the allocentric world. Indeed, it is difficult to think of any example in the existing literature of fully allocentric encoding, in which the position of the organism is truly irrelevant and neurons only encode the relative positions of objects with respect to each other. There may be such neurons, at least in humans, but if they exist, they have not been found yet. Most encoding probably mixes allocentric and egocentric components to a certain degree. Such mixing can cause problems for the spatial machinery. For example, Barnett-Cowan & Bülthoff speculate that the combination of righting reflexes to maintain an upright head posture during self-motion and object recognition, combined with prior assumptions of the head being upright, may interfere with the brain's ability to represent three-dimensional navigation through volumetric space.
Egocentric and allocentric distinctions aside, within allocentric space one can also distinguish between locally defined and globally defined space, and several commentators have addressed this issue. It is relevant to whether a large-scale representation is a mosaic, a proposal for which Yamahachi, Moser, & Moser (Yamahachi et al.) advance experimental support, noting that place and grid cell studies show fragmentation of large complex environments based on internal structure (in their case, walls). Howard & Fragaszy argue that for a sufficiently dense and complex space, such as in a forest, the fragments of the mosaic could in principle become so small and numerous that the map as a whole starts to approximate a fully metric one. We would argue, however, that in order to be fully metric in all dimensions, such a map would need a global 3D directional signal, for which evidence is lacking at present. However, future studies may reveal one, at least in species with the appropriate ecology.
If complex space is represented in a mosaic fashion, then what defines the local reference frames for each fragment? Here it may be that distal and proximal environmental features serve different roles, with distal features serving to orient the larger space and proximal ones perhaps defining local fragments. Savelli & Knierim observe that local versus global reference frames sometimes conflict. They suggest that apparent impoverishment of three-dimensional encoding may result from capture of the activity by the local cues – the vertical odometer is, in essence, overridden by the more salient surfaces of the local environment, which continually act to reset it. For example, consider a helical maze in which an animal circles around on a slowly ascending track, such as the one in the Hayman et al. (2011) study. It may be that the surface – the track – is highly salient, whereas the height cues are much less so, leading to under-representation of the ascending component of the animal's journey. Studies in animals that are not constrained to moving on a substrate, such as those that fly, will be needed to answer the question more generally. The theme of capture by local cues is also taken up by Dudchenko, Wood, & Grieves (Dudchenko et al.), who note that grid and place cell representations seem to be local, while head direction cells frequently maintain consistency across connected spaces. The implication is that in a truly volumetric, open space, odometry has the capacity to operate in all three dimensions, leading to the creation of an isotropic, volumetric map. Clearly, experiments with other species with other modes of travel and in other environments will be needed to answer the question of whether three-dimensional maps can be isotropic.
R2.2. Dimensionality of reference frames
Having distinguished between egocentric and allocentric reference frames, we turn now to the issue of dimensionality of these frames, which was the theme of our target article. Although the earlier analysis of reference frames drew a clear distinction between dimensions, the brain does not necessarily respect such distinctions, as pointed out by some of the commentators. Part of the reason for this blurring of the boundaries is that the brain works with information provided by the body's limited sense organs. Thus, it has to cope with the dimension reduction that occurs in the transformation among the real three-dimensional world, the two-dimensional sensory surfaces (such as the retina) that collect this information, and the fully elaborated multidimensional cognitive representation that the brain constructs.
Since information is lost at the point of reduction to two dimensions, it needs to be reconstructed again. An example of such reconstruction is the detection of slope using vision alone, in which slope is inferred, by the brain, from visual cues such as depth. Orban explores how the primate visual system constructs a three-dimensional representation of objects in space, and of slope based on depth cues, in the parietal cortex. However, the reconstruction process introduces distortions such as slope overestimation. Orban and Ross both observe that slope illusions vary depending on viewing distance, meaning that the brain has a complicated problem to solve when trying to tailor actions related to the slope. Durgin & Li suggest that the action system compensates for such perceptual distortion by constructing actions within the same reference frame, such that the distortions cancel.
Perception of three-dimensional spaces from a fixed viewpoint is one problem, but another quite different one concerns how to orchestrate actions within 3D space. For this, it is necessary to be able to represent the space and one's position within it – problems that are different for egocentric versus allocentric space.
Taking egocentric space first: How completely three-dimensional is the 3D reconstruction that the brain computes for near space (i.e., space within immediate reach)? Several commentators argue in favour of a representation having full three-dimensionality. Orban outlines how the parietal cortex, generally believed to be the site of egocentric spatial encoding (Galati et al. 2010), is well specialized for representing space in all three dimensions, while Badets suggests that the spatial dimensions may be mapped to a common area in the parietal cortex that integrates according to a magnitude-based coding scheme (along with other magnitudes such as number). Lehky, Sereno, & Sereno (Lehky et al.) agree that primate studies of the posterior cortex in egocentric spatial tasks show clear evidence of three-dimensional encoding. They say, “While the dimensionality of space representation for navigation [our emphasis] in primates is an important topic that has not been well studied, there are physiological reasons to believe that it may be three-dimensional,” although they do not outline what those reasons are. By contrast, Phillips & Ogeil argue that even egocentric space is bicoded. First they appeal to evidence from perceptual illusions and neglect syndromes to show that vertical and horizontal spaces are affected differently. Then they turn to a theoretical analysis of the constraints on integration of vertical and horizontal space, and problems such as chaotic dynamics that can result from attempts at such integration. It therefore seems that more research is needed to understand the representation of near space, and whether or not it is different in different dimensions (anisotropic).
Turning to allocentric space, the issue of the dimensionality was explored extensively in the target article and again in the commentaries. The core questions concerning encoding of the third allocentric dimension have to do with whether it is encoded, and if so, how, and how it might (or might not) be integrated with the other two.
The question of whether it is encoded is addressed by several commentators. Weisberg & Newcombe make the important point that “vertical” comes in (at least) two forms, orthogonal to horizontal or orthogonal to the animal. They suggest that the different types of vertical may have different contributions to make to the overall encoding of the situation. For example, terrain slope may be useful in, for example, helping orient the local environment, but this information would be lost if vertical encoding were earth-horizontal–related. They suggest that both types of vertical may be encoded, and may or may not be integrated.
Other commentators focus on the encoding of volumetric (rather than slanting planar) three-dimensional space, and speculate about how such encoding may be achieved. Powers argues, on the basis of studies in robotics and engineering, for dimension reduction as a means of achieving coding efficiency by reducing redundancy (i.e., what Carbon & Hesslinger call “empty cells” that contain no information but take up representational space). In this light, the bicoded model would seem to offer such efficiency. Carbon & Hesslinger agree, arguing that a two-dimensional map with alternative information for the vertical dimension should suffice for most things, and they draw on comparative studies to support this. This seems a very reasonable proposition to us. For a surface-dwelling animal traversing, say, hilly terrain, there is only one z-location for each x-y position, and so the animal could in principle navigate perfectly well by computing only the x-y components of its journey. The only drawback would be in, say, comparing alternative routes that differed in how much ascent and descent they required. However, this could be encoded using some other metric than distance – for example, effort – much like encoding that one route goes through a boggy swamp and the other over smooth grassland. The navigation system does not necessarily have to perform trigonometry in all three dimensions to compute an efficient route across undulating terrain. For volumetric space, the problem becomes slightly less constrained because there are unlimited z-locations for every x-y point. However, again, it might be possible to navigate reasonably effectively by simply computing the trigonometric parameters in the x-y (horizontal) plane and then factoring in an approximate amount of ascent or descent.
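One way to picture this scheme: perform true metric odometry only in the horizontal plane, and fold vertical travel in as a coarse, non-metric cost such as effort. A minimal sketch (hypothetical Python; the function and the effort weight are our illustrative assumptions, not a model from the commentaries):

```python
import math

def bicoded_cost(path, effort_per_metre_climbed=5.0):
    """Route cost under a 'bicoded' scheme: metric (trigonometric)
    odometry in the horizontal plane, plus a non-metric effort
    penalty for vertical travel. `path` is a list of (x, y, z)
    waypoints; the effort weight is a hypothetical parameter."""
    cost = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(path, path[1:]):
        cost += math.hypot(x2 - x1, y2 - y1)              # horizontal distance only
        cost += effort_per_metre_climbed * abs(z2 - z1)   # coarse vertical cost
    return cost

# Two routes with identical horizontal length (100 m), one over a hill:
flat  = [(0, 0, 0), (60, 80, 0)]
hilly = [(0, 0, 0), (30, 40, 20), (60, 80, 0)]
print(bicoded_cost(flat))    # 100.0
print(bicoded_cost(hilly))   # 300.0 — the climb is penalised without any 3D trigonometry
```

The point of the sketch is that route comparison remains possible: the hill is disfavoured, yet at no stage does the navigator compute distances in three dimensions.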
An interesting related proposal is put forward by Schultheis & Barkowsky, that representational complexity could be expanded or contracted according to current need. According to their concept of scalable spatial representations, the detail and complexity of the activated representation varies with task demands. They suggest that the vertical dimension might be metrically encoded in situations that demand it (e.g., air traffic control) but not in situations that do not (e.g., taxi driving). Dimensionality of encoding might also be affected not just by task demands but by cognitive load. By analogy with the effects of load on attentional processing (Lavie 2005), one might suppose that in conditions of high cognitive load (such as a pilot flying on instruments), the degree to which all dimensions are processed might be restricted. Such processing restrictions could have important implications for training and instrument design for air- and spacecraft.
Wang & Street move in the opposite direction from dimension reduction. They argue not only that the cognitive representation of allocentric space is a fully fledged three-dimensional one, but also that with the same neural machinery it is possible to encode four spatial dimensions. They support their assertion with data showing that subjects can perform at above-chance levels on path-completion tasks that cross three or four dimensions, something they say should be possible only with a fully integrated volumetric (or perhaps “hypervolumetric”) map. Our view on this suggestion is that it is a priori unlikely that an integrated 4D map could be implemented by the brain because this would require a 4D compass, the selection pressure for which does not exist in a 3D world. Our interpretation of the 4D experiment is that if subjects could perform at above-chance levels, then they would do so using heuristics rather than full volumetric processing, and by extension the above-chance performance on 3D tasks may not require an integrated 3D map either. The experimental solution to this issue would be to identify neural activity that would correspond to a 4D compass, but computational modelling will be needed to make predictions about what such encoding would look like.
Given general agreement that there is some vertical information contained in the cognitive map (i.e., the map has at least 2.5 dimensions), the next question concerns the nature of this information – is it metric, and if not, then what is it? A fully volumetric map would have, of course, complete metric encoding of this dimension as well as of the other two, and Wang & Street argue for this. Others such as Savelli & Knierim and Yamahachi et al. argue that we will not know for sure until the relevant experiments are done with a variety of species and in a variety of environmental conditions. A number of commentators agree with our suggestion, however, that the vertical dimension (or, in the mosaic version of the bicoded model, the dimension orthogonal to the locomotor plane) is probably not encoded metrically. There are a variety of opinions, however, as to what the encoded information is. Phillips & Ogeil suggest that the other dimension is “time” rather than altitude, although it is not clear to us exactly how this would work. Nardi & Bingman support the proposal that “effort” could be a height cue. Sparks, O'Reilly, & Kubie (Sparks et al.) develop the notion of alternative factors further – they discuss the issue of optimization during navigation, pointing out that the shortest path is not necessarily optimal and that factors such as energy expenditure are also important. They suggest a “multi-coding” (as opposed to merely a “bicoding”) scheme to take into account these other factors. And as mentioned earlier, Weisberg & Newcombe suggest that there are two types of vertical information, a true aligned-with-gravity vertical and another that is related to terrain slope, raising the issue of how these two forms may be related, if at all. Thus, it appears that a variety of stimulus types might serve as inputs to the encoding of vertical space.
The final question regarding the vertical dimension is whether – assuming it is represented at all – it is combined with the horizontal ones to make a fully integrated, volumetric map. Burt de Perera, Holbrook, Davis, Kacelnik, & Guilford (Burt de Perera et al.) refer to studies of fish to argue that vertical space, although in their view coded metrically, is processed separately from horizontal space. They suggest some experiments to explore whether this is the case, by manipulating animals' responses in the vertical dimension and showing that these responses exhibit evidence of quantitative encoding (e.g., by obeying Weber's law). However, similar results could also be found if the animal were using some kind of loose approximation to a metric computation (e.g., climbing to a certain level of exhaustion), and so the experiments would need to be carefully designed so as to show operation of true odometry.
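Obeying Weber's law means that the discrimination threshold grows in proportion to the base magnitude, so the ratio ΔI/I stays roughly constant across magnitudes. A toy version of the check (hypothetical Python; the numbers are illustrative, not data from the fish studies):

```python
def weber_fraction(magnitudes, thresholds):
    """Return the Weber fraction delta_I / I at each tested magnitude.
    Roughly constant values across magnitudes are the signature
    predicted by Weber's law, and hence consistent with quantitative
    (odometric) encoding of the dimension being probed."""
    return [t / m for m, t in zip(magnitudes, thresholds)]

# Hypothetical vertical-distance discrimination thresholds (cm):
depths     = [10.0, 20.0, 40.0]
thresholds = [1.1, 2.0, 4.2]
print(weber_fraction(depths, thresholds))  # ≈ [0.11, 0.10, 0.105] — near-constant
```

As noted above, a near-constant fraction alone would not clinch the case: a loose proxy such as effort could mimic the pattern, which is why the experimental design matters.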
Before leaving the issue of reference frames, it is worth briefly examining the little-discussed issue of how the brain may define an “origin” within these frames.
R2.3. The origin of reference frames
As noted in the earlier analysis of reference frames, a metric map needs an origin, or fixed point, against which the parameters of the space (distance and direction) can be measured. The question of whether the brain explicitly represents origins in any of its reference frames is completely unanswered at present, although a notional origin can sometimes be inferred. To take a couple of examples among many, neurons in the visual cortex that respond proportionally to distance from the eye (Pigarev & Levichkina 2011) could be said to be using the retina as the “origin” in the antero-posterior dimension, while neurons that are gain modulated by head angle (Snyder et al. 1998) are using the neck as the “origin” (and straight ahead as the “reference direction”). Whether origins exist for other egocentric reference frames, however, is less clear.
For allocentric space, the origin becomes a more nebulous concept, although in central place forager species, such as rats, the central place (the nest) may constitute an “origin” of sorts. However, it may be that the brain uses not a fixed point as its reference so much as a fixed boundary. Place and grid cells, for example, anchor their firing to the boundaries of the local environment, moving their firing fields accordingly when the boundaries are moved en bloc. Furthermore, at least for grid cells, deformation (as opposed to rigid translation) of these boundaries in some cases leads to a uniform deformation of the whole grid with no apparent focus (Barry et al. 2007), suggesting that the whole boundary contributes to anchoring the grid. Hence, it is not obvious that there is a single point-like origin in the entorhinal/hippocampal representation of allocentric space. However, it could be that the place cells themselves in essence provide multiple origins – their function may be to anchor spatially localized objects and events to the reference frame defined by the boundaries and grids. It may be that the simple, singular notion of origin that has been so useful in mathematics will need to be adapted to accommodate the brain's far more nebulous and multifarious ways of working.
R3. Comparative studies
We turn now to the ways in which studies from different species, with different evolutionary histories and ecological constraints, have contributed to our understanding of spatial encoding. Principles derived for one species may not apply to a different species in a different environment using a different form of locomotion. Orban points out, for example, that rodent and primate brains are very different, and that there are cortical regions in primates that do not even have homologues in rodents. Hence, extrapolation from rodents to primates must be done with care, a point with which we agree.
The importance of comparative studies in cognitive science lies in understanding how variations in body type, ecological niche, and life experience correlate with variations in representational capacity. This information tells us something about how these representations are formed – what aspects of them are critical and what aspects are just “add-ons” that evolved to support a particular species in a particular ecological niche. While comparative studies of three-dimensional spatial cognition are rare, they promise to be particularly revealing in this regard.
R3.1. Studies of humans
Humans are, naturally, the species that interests us the most, but there are important constraints on how much we can discover about the fine-grained architecture of our spatial representation until non-invasive single-neuron recording becomes possible. There are, however, other means of probing encoding schemes, such as behavioural studies. Longstaffe, Hood, & Gilchrist (Longstaffe et al.) and Berthoz & Thibault describe the methods that they use to understand spatial encoding in humans over larger (non-reachable) spaces. However, Carbon & Hesslinger point out that experimental studies need to take into account the ecological validity of the experimental situation, something which is sometimes difficult to achieve in laboratory studies, in which natural large-scale three-dimensional spaces are rare. Virtual reality (VR) will likely help here, and future studies will be able to expand on previous research in two-dimensional spaces to exploit the capacity of VR subjects to “float” or “fly” through 3D space. This could be used not just during behavioural studies but also in neuroimaging tasks, in which the participating brain structures can be identified.
Some commentators describe the human studies in real (non-virtual) 3D spaces that have recently begun. Hölscher, Büchner, & Strube (Hölscher et al.) extend the anisotropy analysis of our original article to architecture and environmental psychology, and note that there are individual differences in the propensity to form map-based versus route-based representations. Pasqualotto & Proulx describe prospects for the study of blind people, whose condition comprises a natural experiment that has the potential to help in understanding the role of visual experience in shaping spatial representation in the human brain. Klatzky & Giudice argue that humans are ecologically three-dimensional inasmuch as their eyes are raised above the ground and their long arms are able to reach into 3D space, meaning that it is adaptive to represent local space in a fully three-dimensional manner. Indeed, evidence supports the view that local (egocentric) space is represented isotropically in primates, a point made also by Orban, though contested by Phillips & Ogeil.
A final important facet of human study is that humans have language, which provides insights unavailable from animal models. In this vein, Holmes & Wolff note that spatial language differentiates horizontal axes from vertical axes more than it differentiates the horizontal ones from each other, thereby supporting the notion of anisotropy in how space is encoded. However, as discussed in the next section, the degree to which such anisotropy could be due to experience rather than innate cognitive architecture is still open for debate.
R3.2. Studies of nonhumans
Moving on to nonhuman animals, Carbon & Hesslinger review the various ways in which the spatial problems faced by animals of a variety of different ecologies reduce to a common set of mainly surface-referenced problems. However, not all commentators agree that a planar or quasi-planar map is supported by comparative studies. Orban observes that the nonhuman primate brain is much closer to the human than to the rodent brain on which much neurobiological work has been done, and notes that several parietal areas in primates are specialized for spatial representation. Howard & Fragaszy discuss the issue of nonhuman primates moving in a dense arboreal lattice. They suggest the use of laser scanning technology (LiDAR) to map out primates' movements and determine how these are constrained (or at least informed) by the structure of the complex spaces through which the animals move.
Nonhuman primates are relatively close to humans in evolutionary terms, and commonalities in spatial behaviour might reflect this. Useful insights can therefore be gleaned by moving phylogenetically further away, to try and discern which features are ancient and foundational, and which are more species-specific. Moss advocates bats for the study of three-dimensional spatial encoding. Not only can bats fly, and thus move through space in an unconstrained way, but they also have a sense – echolocation – unavailable to most mammals. This provides a tool for understanding how a supramodal cognitive representation can be formed from sensory inputs that show local (species-specific) variation. Electrolocation and magnetic sense are additional sensory systems in other species that may also be informative here.
Moving even further away from humans, non-mammalian vertebrates, such as birds, and invertebrates have been valuable in providing comparative data that reveal both commonalities and differences in how different organisms represent space. Zappettini & Allen take issue with the notion that there is a map-like representation operating in nonhuman animals, suggesting that this is putting “the hypothetical cart before the evidential horse.” The debate about the operation of cognitive maps in animals is a longstanding one that is too complex to revisit fully in this forum. However, we would argue, in defence of our proposition, that the existence of grid cells is evidence of at least some kind of map-like process. That there are neurons in the brain that respond to distances and directions of travel is incontrovertible proof that rodent brains – and therefore likely the brains of other mammals, at least – compute these parameters. Regardless of whether one considers a representation incorporating distance and direction to necessarily be a map, it is nevertheless valid to ask whether encoding of these parameters extends into all three dimensions. To this end, we hope that common ground can be found with the map-skeptics in devising a program of study to explore real-world spatial processing. Zappettini & Allen also object to our use of evolutionary selection arguments to defend the bicoded hypothesis over the volumetric one. We agree that such arguments are weak and serve only to suggest hypotheses, and not to test them. However, suggesting hypotheses was the main aim of our target article, given the paucity of hard data available at present.
Other comparative biologists have been more relaxed about the notion of a map-like representation underlying navigational capabilities in nonhumans. Citing studies of avian spatial encoding, Nardi & Bingman suggest that taxonomically close but ecologically different species would be useful comparators – for example, chickadees (inhabiting an arboreal volumetric space) versus nutcrackers (storing food on the ground). Burt de Perera et al. use studies of fish to argue that vertical space, although coded separately from horizontal space, is nevertheless encoded metrically. This finding is interesting because fish can – unlike humans – move in an unconstrained way through 3D space, and would in theory benefit from an integrated 3D map if such had evolved. That the fish map is evidently not integrated in this way suggests that such integrated maps may never have evolved. Dyer & Rosa suggest that study of simple organisms such as bees can help reveal the ways in which complex behaviours can arise from simple building blocks, particularly when plasticity is added to the mix. They agree, however, that evidence suggests that in bees, too, space is represented in a bicoded fashion.
In summary, then, studies in comparative cognition may be tremendously informative in the unravelling of the neurocognitive representation of three-dimensional space, both by revealing which features are central and which are peripheral “add-ons,” and by showing how different ecological constraints and life experiences contribute to shaping the adult form of the representation.
R4. Experience
The final major theme in the commentaries concerned the possible role of experience in shaping spatial representations. Note that just because experience correlates with encoding format (bicoded, volumetric, etc.), it does not necessarily follow that it caused that format. For example, an animal that can fly may have both an ability to encode 3D space and a lifetime's experience of moving through it, but the encoding may have been hard-wired by evolution rather than arising from the individual's own life experience. Yamahachi et al. stress that more research is needed to determine the role of experience in shaping the structure of the map. This could be done by, for example, raising normally 3D-exploring animals in a restricted, 2D space to see whether the encoding type changes.
Experience, if it affects encoding at all, can act in two broad ways: It can operate during development to organize the wiring of the underlying neural circuits, or it can operate on the fully developed adult brain, via learning, to enable the subject to acquire and store new information. These two areas will be examined in turn.
R4.1. Developmental experience
It is now clear that infants are not blank slates on which all their adult capacities will be written by experience, but are born with many of their adult capacities having already been hard-wired, under genetic control. That said, it is also clear that developmental experience can shape the adult brain. How experience and hard-wiring interact is a matter of considerable interest.
Experience during ontogeny can shape development of a neural structure by affecting neural migration and axonal projections, or else, subsequently, by affecting dendritic connections and synapse formation. Migration and axon development are generally complete by the time a developing animal has the ability to experience three-dimensional space and so are unlikely to affect the final structure of the spatial representation. On the other hand, dendritic and synaptic proliferation and pruning processes are plentiful during infant development – however, they also take place to an extent during adulthood, when they come under the rubric of “learning.” It is evident, therefore, that development and learning show a considerable overlap.
Some sensory systems show critical periods in infancy during which experience is necessary for normal brain development, and interruption of which can cause lifelong impairment. Vision in mammals is a prime example of such experience-dependent plasticity: Monocular deprivation or strabismus (squint) occurring during the critical period, but not during adulthood, can permanently affect connection formation in the thalamus and visual cortex, resulting in adult visual impairment (amblyopia; Morishita & Hensch Reference Morishita and Hensch2008). A natural question arising from studies of three-dimensional encoding in animals, then, is whether infant experience can affect the formation of the adult cognitive representation.
Recent studies have suggested that head direction cells, place cells, and grid cells start to operate in adult-looking ways very early in infancy in rat pups. In two studies published together in a 2010 issue of Science (Langston et al. Reference Langston, Ainge, Couey, Canto, Bjerknes, Witter, Moser and Moser2010; Wills et al. Reference Wills, Cacucci, Burgess and O'Keefe2010), cells were recorded from the first day the pups started exploring, 16 days after birth. Head direction cells already showed adult-like firing and stability even though the pups had had very little experience of movement and only a few days of visual experience. Place cells were also apparent from the earliest days of exploration, although their location-specific firing improved in coherence and stability with age. Grid cells were the last cell type to mature, appearing at about 3–4 weeks of age, but when they did appear they showed adult-like patterns of firing with no apparent requirement for experience-dependent tuning. Thus, the spatial system of the rat seems to come on-stream with most of its adult capabilities already present, suggesting a considerable degree of hard-wiring of the spatial representation.
Is this true for vertical space as well? The experiment of Hayman et al. (Reference Hayman, Verriotis, Jovalekic, Fenton and Jeffery2011) showed a defect in vertical grid cell odometry, and perhaps this defect is due to the limited 3D experience that rats in standard laboratory cages have had. However, if extensive experience of 2D environments is not needed for the spatial cells to operate in the way that they do, as is manifestly the case, then experience of 3D environments may not be needed either. Thus, the question of whether infant experience is necessary for adult cognitive function in the spatial domain is one that awaits further data for its resolution.
Although the experiments have yet to be done, some theoretical analysis of this issue is beginning. Stella, Si, Kropff, & Treves (Stella et al.) describe a model of grid formation in two dimensions that arises from exploration combined with simple learning rules, and suggest that this would extend to three dimensions, producing a face-centered cubic array of grid fields. We would note here that such a structure would nevertheless be anisotropic, because the fields do not form the same pattern when transected in the horizontal plane as in the vertical. While such a grid map would be metric, Peremans & Vanderelst discuss a theoretical proposal whereby a non-metric, topological map could be built up from experience, by recording approximate distances travelled between nodes and using these to build a net that approximates (topologically) the real space. They note that this would look anisotropic because of differential experience of travel in the different dimensions, even though the same rules operate in all three dimensions. This is an interesting proposal that predicts that animals with equal experience in all three dimensions should produce isotropic spatial representations, something that may be tested during recordings on flying animals such as bats.
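The graph-building process that Peremans & Vanderelst describe can be illustrated with a minimal sketch. This is our own illustration, not the commentators' model: the class, node labels, and the rule of keeping the shortest recorded estimate per link are all illustrative assumptions. The map stores only approximate travelled distances between visited nodes; metric estimates between arbitrary nodes then emerge from shortest paths over the recorded links, so the map "metricizes" gradually as journeys accumulate.

```python
import heapq
from collections import defaultdict


class TopoMap:
    """Illustrative sketch of a topological map built from travel
    experience: nodes are visited places, edges store approximate
    distances actually travelled between them."""

    def __init__(self):
        # adjacency: node -> {neighbour: approximate distance}
        self.edges = defaultdict(dict)

    def record_travel(self, a, b, approx_dist):
        """Record one journey between two places, keeping the
        shortest distance estimate observed so far for that link."""
        d = min(self.edges[a].get(b, float("inf")), approx_dist)
        self.edges[a][b] = d
        self.edges[b][a] = d

    def approx_distance(self, start, goal):
        """Estimate the distance between two places from the recorded
        links (Dijkstra's shortest path). Estimates improve as more
        journeys, including shortcuts, are recorded."""
        best = {start: 0.0}
        queue = [(0.0, start)]
        while queue:
            d, node = heapq.heappop(queue)
            if node == goal:
                return d
            if d > best.get(node, float("inf")):
                continue  # stale queue entry
            for neighbour, w in self.edges[node].items():
                nd = d + w
                if nd < best.get(neighbour, float("inf")):
                    best[neighbour] = nd
                    heapq.heappush(queue, (nd, neighbour))
        return float("inf")  # goal never visited from start
```

Because only traversed links are stored, a dimension that is travelled rarely is sampled sparsely, so the resulting net would look anisotropic even though the same recording rule operates in all three dimensions, which is the commentators' point.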
R4.2. Adult experience
As noted above, there are considerable overlaps in the biological processes supporting infant development and adult learning, because the processes of dendritic arborization and synaptic pruning that support experience-dependent plasticity operate in both domains (Tavosanis Reference Tavosanis2012). The question arises, then, as to whether adult experience can shape the structure of a complex cognitive representation such as that of space.
Dyer & Rosa suggest that in bees, although movement in the vertical dimension (determined using optic flow) is usually represented with a lower degree of precision, brain plasticity could allow the experience-dependent tuning of responses such that precision in this dimension could be increased. In this way, the organisms can learn to use navigationally relevant information particular to a given environment. This does not mean, however, that they learn a different mechanism for combining information. Moreover, Dyer & Rosa agree that the evidence supports the notion of a bicoded structure to spatial processing in bees, even with experience.
Experiments involving human subjects have provided insights into how experience might constrain spatial representation. Berthoz & Thibault discuss how the learning experience during exploration of a multilayer environment can shape the way in which subjects subsequently tackle navigation problems in that environment (Thibault et al. Reference Thibault, Pasqualotto, Vidal, Droulez and Berthoz2013). Pasqualotto & Proulx explore the effects that visual deprivation (through blindness) can have on the adult spatial representation that forms. They suggest that congenitally blind individuals may not exhibit quasi-planar spatial representation of three-dimensional environments, positing that the absence of visual experience of the 3D world may shape how spatial encoding occurs. This is a matter that awaits further study, as 3D spatial encoding has not yet been explored in subjects who are blind. These authors suggest several ways in which such studies might be conducted, using sensory substitution devices (which convert visual information into tactile or auditory form) in order to create 3D virtual reality environments. The difference between subjects who are congenitally blind and those who have had visual experience in early life will be particularly interesting here.
Experience may play a role in how normal-sighted subjects perceive and interpret the spatial world, too. Bianchi & Bertamini discuss the interesting errors that human subjects make when predicting what will be visible in a mirror as they approach it, and show that the errors are different for horizontal versus vertical relationships. Tellingly, adults make more errors than children. These findings suggest that experience may play a part in the generation of errors – adults have more experience of crossing mirrors from one side to the other than from top to bottom, and more experience of passing in front of mirrors generally than children do. This may be a case in which experience diminishes the capacity to accurately represent a space, because adults have an enhanced ability to make inferences, which they do (in this case) erroneously.
While it is certainly the case that experience is likely to inform the adult cognitive representation of space, our hypothesis is that experience is not necessary to construct it. This is because the rodent studies described earlier suggest a high degree of representational complexity even in animals that have had impoverished spatial life experience. Future research in animals reared in complex 3D environments will be needed to determine the extent to which experience is needed for formation and refinement of the brain's map of 3D space.
R5. Conclusion
It is apparent from the commentaries on our target article that there are many questions to be answered concerning the relatively nascent field of three-dimensional spatial encoding. The most salient questions fall into the categories outlined in this response, but no doubt more will emerge as spatial cognition researchers start to tackle the problem of representing large-scale, complex spaces. The work will require an interplay between behavioural and neurobiological studies in animals and humans, and the efforts of computational cognitive scientists will be needed to place the results into a theoretical framework. The end result will, we hope, be a body of work that is informative to many disciplines involved both in understanding natural cognitive systems and also in building artificial spaces and artificial devices.
ACKNOWLEDGMENTS
The work was supported by grants to Kate Jeffery from the Wellcome Trust (GR083540AIA), the European Commission's Seventh Framework Programme (SPACEBRAIN), the Medical Research Council (G1100669), and the Biotechnology and Biological Sciences Research Council (BB/J009792/1), and by a grant to Axona Ltd from the European Commission's Seventh Framework Programme (SPACEBRAIN).