We think we live in a three-dimensional world, so it is natural to assume that our representation of our environment would reflect a three-dimensional structure, and be encoded as such in our three-dimensional brain. In fact, we operate in at least a four-dimensional world, with time being a dimension that also needs representation in order to process events, deal with causality, and so forth. The traditional computing approach, whether the one-dimensional tape of a Turing machine or the sequential address space of a central processing unit (CPU), is to embed multidimensional structures in linear arrays by arithmetic or referential mappings. Computer circuits and living brains also have a strong, locally two-dimensional structure. In the cortex, the layers of the brain represent both layers of complexity and a mapping of time to depth, with opportunity for increasingly diverse associations to be formed. The cochlea is essentially a one-dimensional representation, physically coiled and temporally coded for time and frequency. The retina is essentially a two-dimensional representation with quaternary coding for frequency, and again an analog encoding of time and amplitude. The vestibular system is a collection of essentially one-dimensional sensors, as described by Jeffery et al., who note that there would be considerable complexity in producing a three-dimensional analog representation from them.
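As a concrete illustration of this arithmetic embedding (our own sketch, not drawn from Jeffery et al. or any particular system), a four-dimensional coordinate can be flattened to a single linear address in the usual row-major fashion; the function name and grid sizes below are purely hypothetical.

```python
# Hypothetical illustration: embedding a 4-D (x, y, z, t) structure
# into a one-dimensional address space by a row-major arithmetic mapping.

def linear_address(x, y, z, t, dims):
    """Map a 4-D coordinate to a single linear index (row-major order)."""
    nx, ny, nz, nt = dims
    assert 0 <= x < nx and 0 <= y < ny and 0 <= z < nz and 0 <= t < nt
    return ((x * ny + y) * nz + z) * nt + t

# Example: a coarse 100 x 100 x 10 x 1000 space-time grid (made-up sizes).
dims = (100, 100, 10, 1000)
print(linear_address(3, 7, 2, 15, dims))  # -> 3072015
```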
Although the three-dimensional world nominally has volume “larger than a plane by a power of 3/2” (sect. 2, para. 4), the vertical dimension is in general many orders of magnitude smaller than the distances navigated in the horizontal plane, so there is no need to encode it this way, and even “volume-travelling” creatures tend to maintain a preferred orientation in relation to gravity, as Jeffery et al. point out (sect. 5.1, para. 11). Animals also tend to operate in strata, and are far from “unconstrained.” In fact, there is arguably less constraint for a rat jumping around a three-dimensional maze or a squirrel jumping around an even more complex forest – for example, they can move vertically in a way a bird or a fish cannot. Similarly, the “three planes” of directional information when viewed egocentrically, combined with the vertical dimension “characterized by gravity” (sect. 2, para. 4), provide a natural two-dimensional manifold which can be topologically expanded locally where more information is available, as is well known from self-organizing maps (von der Malsburg Reference von der Malsburg1973). The constraints of gravity do not “add” complexity, but rather give rise to both constraints and heuristics that reduce complexity.
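The local topological expansion appealed to here can be illustrated with a minimal Kohonen-style self-organizing map, in the spirit of von der Malsburg (1973). This is only a sketch with a made-up input distribution and illustrative learning parameters, not a model of any particular neural system: map nodes simply migrate toward the region of the input space where most of the information is.

```python
# Minimal 1-D Kohonen-style self-organizing map (illustrative parameters only).
# More map "real estate" ends up allocated where more input information arrives.
import numpy as np

rng = np.random.default_rng(0)
nodes = np.linspace(0.0, 1.0, 20)            # a 1-D map of 20 nodes

for step in range(5000):
    # 70% of inputs fall near 0.2: a locally information-rich region.
    x = rng.normal(0.2, 0.05) if rng.random() < 0.7 else rng.random()
    winner = np.argmin(np.abs(nodes - x))     # best-matching unit
    lr, sigma = 0.1, 2.0                      # learning rate, neighbourhood width
    dist = np.abs(np.arange(len(nodes)) - winner)
    h = np.exp(-dist**2 / (2 * sigma**2))     # neighbourhood function
    nodes += lr * h * (x - nodes)             # pull winner and neighbours toward x

print(np.round(np.sort(nodes), 2))            # more nodes end up near 0.2
```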
The visual system is also constrained by the locus of eye gaze and by characteristic poses with respect to gravity. Here we have two-dimensional sensors from which three-dimensional information could be reconstructed, but again the evidence points to a 2.5-dimensional model (Marr Reference Marr1982). The two-dimensional plane of the retina, however, tends to be near vertical, whereas the two-dimensional plane of locomotion tends to be near horizontal, and it would seem natural that both should track egocentrically when these canonical assumptions do not hold, with discrepancies between the gravitational frame and the egocentric navigational model leading to errors that increase with the deviation.
Jeffery et al. have little to say about navigation, focusing almost exclusively on representation without much comment on the aspects of the reviewed experiments that relate to their actual title. When we consider vision and navigation in robotics, whether for underwater, aerial, or surface vehicles, we tend to use a two-dimensional model coded with far more than just one additional piece of dimensional information. We tend to fuse and correct individual sensor readings to keep track of our location, based on simple models that predict from location, velocity, and acceleration, and we also attach information about the terrain, the temperature, the sights and sounds, and potentially about how hard or dangerous the going is, or how much energy is being expended. These factors become inputs for both algorithmic and heuristic components of the actual path planning that is the key task in navigation. One additional factor that is important here is the ability to deal with the world at multiple scales, with approximate paths being worked out at a coarse scale before zooming in recursively to plan the coarse stages in detail. Even fully detailed two-dimensional maps lead to impossibly complex calculations of an optimal path, so the solution is to select a reasonable one at a coarse level of detail, and then burrow down into the finer detail when needed. Humans also have difficulty navigating more than two dimensions, and even 2.5 dimensions is not in general a good computer interface feature (Cockburn & McKenzie Reference Cockburn, McKenzie, Beaudouin-Lafon and Jacob2001; Reference Cockburn, McKenzie and Terveen2002).
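To make the coarse-to-fine strategy concrete, the sketch below (a hypothetical toy example, not any particular robotic planner) plans a rough route on a downsampled occupancy grid and then re-plans each coarse leg at full resolution between anchor points; a real planner would restrict each refinement to a local window and use much richer cost models.

```python
# Hypothetical coarse-to-fine planning on a 2-D occupancy grid (True = blocked).
from collections import deque

def bfs(grid, start, goal):
    """Shortest 4-connected path on a boolean occupancy grid."""
    rows, cols = len(grid), len(grid[0])
    prev, queue = {start: None}, deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not grid[nr][nc] \
                    and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

def coarsen(grid, k):
    """Optimistic downsampling: a coarse cell is blocked only if all its fine cells are."""
    rows, cols = len(grid) // k, len(grid[0]) // k
    return [[all(grid[r * k + i][c * k + j] for i in range(k) for j in range(k))
             for c in range(cols)] for r in range(rows)]

def center(cell, k):
    """Fine-grid cell at the centre of a coarse cell."""
    return (cell[0] * k + k // 2, cell[1] * k + k // 2)

# Fine 8x8 map with a partial wall; coarse 4x4 map built from 2x2 blocks.
fine = [[False] * 8 for _ in range(8)]
for r in range(1, 7):
    fine[r][4] = True

coarse = coarsen(fine, 2)
rough = bfs(coarse, (0, 0), (3, 3))           # cheap, approximate route
print("coarse route:", rough)

# Refine: re-plan each coarse leg at full resolution between anchor points.
fine_path = []
for a, b in zip(rough, rough[1:]):
    leg = bfs(fine, center(a, 2), center(b, 2))
    fine_path += leg if not fine_path else leg[1:]
print("fine route:", fine_path)
```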
In terms of Shannon Information, the most frequent events or landmarks or percepts are the least interesting and useful. Kohonen Self-Organizing Maps (SOMs) will tend to allocate area to their inputs in a monotonically increasing but sublinear way (proportional to p^(2/3) rather than to −log p), gaining efficiency of representation, whilst the Zipfian Principle of Least Effort will ensure an encoding in which the frequent data can be dealt with faster, and sparse representation principles will lead us to store events that occur rather than placeholders for events that never occur or voxels that are seldom occupied, as would be the case for a four-dimensional model with a cell for every time and place. In fact, the anti-Shannon allocation of memory in a SOM means there is more real estate available for the subtle details that correspond to the next level of detail in our robotic model. For the squirrel this might be brain space for the vertical dimensions and horizontal layers of a forest, and, more importantly still, the nutritional and territorial dimensions. The latter is an important fifth dimension for the squirrel, marked by its own pheromones and those of its potential mates and potential predators.
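A small numerical illustration of the contrast just drawn, using made-up probabilities: under a p^(2/3) magnification rule the most frequent inputs still receive the largest share of map area, but far less than proportionally, whereas Shannon coding instead assigns them the shortest codes.

```python
# Illustrative comparison (invented probabilities): map area allocated under a
# p^(2/3) magnification rule versus Shannon code length -log2(p).
import math

probs = [0.5, 0.25, 0.125, 0.0625]
weights = [p ** (2 / 3) for p in probs]
total = sum(weights)

for p, w in zip(probs, weights):
    area = w / total                      # share of map "real estate"
    bits = -math.log2(p)                  # Shannon code length in bits
    print(f"p={p:<7} area={area:.2f}  -log2(p)={bits:.1f} bits")
```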
Jeffery et al. argue for a 2.5-dimensional model, but allow that this could be a manifold with an egocentric normal rather than a gravimetric plane. They seem to prefer a mosaic model, which is reminiscent of many robotic and vision models where extra details are linked in. The manifold model would in many ways seem to fit better with current ideas of brain processing and self-organization, although a distributed mosaic of contextual associations with multiple sensory-motor areas would be a highly plausible model for survival navigation.