Numerosity perception is a key aspect of the number sense, and it is thought to be supported by a specialized mechanism, the approximate number system (ANS), which in primates has a specific neural substrate in the intraparietal sulcus (Nieder & Dehaene 2009). The finding that continuous visual properties influence numerosity judgments is used by Leibovich et al. as a main argument to claim that numerosity is processed holistically with continuous magnitudes. Their hypothesis that people do not extract numerosity independently from continuous magnitudes, as well as the related claim that perceived numerosity is simply the result of weighting a variety of continuous visual properties (Gebuis & Reynvoet 2012b), challenges a central tenet of the ANS theory and the notion of number sense more generally. However, this hypothesis is not grounded in any formal (mathematical or computational) model: In particular, it lacks any details about which continuous properties are necessary and sufficient to estimate numerosity, as well as how these continuous properties are extracted from the visual display in the first place. Together with the apparent circularity in the statement that “number sense develops from understanding the correlation between numerosity and continuous magnitudes” (Leibovich et al., sect. 8, para. 6), this leads to a “non-numerical” account of numerosity perception that does not seem to have the explanatory value that one should expect from a cognitive theory.
The nature of the mechanisms underlying numerosity perception has been debated for decades (e.g., Allik & Tuulmets 1991; Burr & Ross 2008; Dehaene & Changeux 1993; Durgin 1995), and the fact that numerosity perception can be non-veridical has been known even longer (e.g., Frith & Frith 1972). However, recent computational modeling work based on unsupervised learning in “deep” neural networks (see Zorzi et al. [2013] and Testolin & Zorzi [2016] for a review of the approach) has provided a state-of-the-art and neurobiologically plausible account of how visual numerosity is extracted from real images of object sets. Stoianov and Zorzi (2012) showed that numerosity emerges as a high-order statistical property of images in deep networks that learn a hierarchical generative model of the sensory input. Learning in the network only involved “observing” images, and it aimed at efficient coding of those images, without providing any information about numerosity (i.e., there was no teaching signal). As a result of this unsupervised learning, number-sensitive neurons emerged in the deepest layer of the network, with tuning functions that mirrored those of biological neurons in the monkey parietal cortex (Roitman et al. 2007). In agreement with the ANS hypothesis, the numerosity signal encoded by the population of number-sensitive neurons in the model was found to be largely invariant to continuous visual properties, and it supported numerosity estimation with the same behavioral signature (i.e., Weber's law for numbers) and accuracy level (i.e., number acuity) as human adults. Preliminary analyses of the learning trajectory in the model also revealed a good match to developmental changes in number acuity in infancy and childhood (Stoianov & Zorzi 2013).
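To make the behavioral signature concrete, the following is a minimal sketch of Weber's law for numbers under a common textbook assumption, not the Stoianov and Zorzi network itself: each numerosity n is represented internally as a Gaussian with mean n and standard deviation w·n, where w plays the role of number acuity (the Weber fraction). The function name `p_correct` and the value w = 0.2 are illustrative choices, not values taken from the studies cited above.

```python
import math


def p_correct(n1: int, n2: int, w: float) -> float:
    """Probability of choosing the larger set in a two-set comparison.

    Assumption (not from the source): internal estimates are Gaussian with
    mean n and standard deviation w * n (scalar variability).
    """
    # Distance between the two internal estimates in units of their joint noise.
    d = abs(n1 - n2) / (w * math.sqrt(n1 ** 2 + n2 ** 2))
    # Standard normal cumulative distribution evaluated at d.
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))


if __name__ == "__main__":
    # Pairs with the same ratio give the same accuracy; harder ratios give lower accuracy.
    for n1, n2 in [(8, 16), (16, 32), (8, 10), (16, 20)]:
        print(n1, n2, round(p_correct(n1, n2, w=0.2), 3))
```

Under this assumption, accuracy depends only on the ratio between the two numerosities and on w, which is why a single parameter can summarize number acuity.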
Detailed analysis of the emergent computations in the Stoianov and Zorzi (2012) model showed that numerosity is abstracted from lower-level visual primitives through a simple two-level hierarchical process that exploits cumulative surface area as a normalization signal (Fig. 1). Contrary to Leibovich et al.'s “holistic” hypothesis that the number sense develops on the basis of a “sense of magnitude,” the essential primitive in the emergent computations is not a continuous property but a set of high-frequency spatial filters (implemented by center-surround neurons) that discretize the visual input. Note that the key role of high-frequency spatial filtering has been independently highlighted by Dakin et al. (2011) in their psychophysical model. In summary, visual numerosity is a high-order summary statistic in the Stoianov and Zorzi (2012) model, but this is the result of hierarchical non-linear computations rather than a simple weighted combination of continuous visual properties. Accordingly, numerosity comparison turns out to be impossible when the raw image is the only input to the decision (even when the decision is trained using machine learning algorithms; see Stoianov & Zorzi [2012]). Nevertheless, the emergent hierarchical mechanism is relatively simple, and this fits well with the long phylogenetic history of the visual number sense (from fish [Agrillo et al. 2012] to primates [Brannon & Terrace 1998]).
Figure 1. Computational model of numerosity perception based on a hierarchical (i.e., deep) neural network architecture (Stoianov & Zorzi 2012). A layer of center-surround detectors with small receptive fields discretizes the visual input and provides its signal to another layer of neurons with larger receptive fields. These number-sensitive neurons compute a local numerosity signal that is invariant to perceptual properties by means of inhibitory normalization. The population activity of number-sensitive neurons provides the final numerosity signal.
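The logic of combining a high-frequency signal with an area-based normalization can be illustrated with a deliberately simplified toy example. It is a sketch under strong assumptions and not the learned computation of the model: items are equal-sized, non-touching squares in a binary image, the "center-surround detector" is a rectified four-neighbor Laplacian, and the read-out formula is specific to this geometry. All function names are hypothetical.

```python
import numpy as np


def make_display(n_items, side, size=128, cell=16, seed=0):
    """Binary image with n_items non-touching squares of the given side (toy stimulus)."""
    assert 1 <= side < cell and n_items <= (size // cell) ** 2
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size))
    per_row = size // cell
    # Each square sits in its own grid cell, so squares never touch or overlap.
    for c in rng.choice(per_row ** 2, size=n_items, replace=False):
        r0, c0 = (c // per_row) * cell, (c % per_row) * cell
        img[r0:r0 + side, c0:c0 + side] = 1.0
    return img


def center_surround(img):
    """Rectified ON-center response: 4 * center minus its four nearest neighbors."""
    p = np.pad(img, 1)  # zero padding around the image border
    resp = (4 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1]
            - p[1:-1, :-2] - p[1:-1, 2:])
    return np.maximum(resp, 0.0)


def estimate_numerosity(img):
    """Toy read-out: high-frequency response normalized by cumulative area."""
    edge = center_surround(img).sum()  # equals total perimeter, 4 * side * N
    area = img.sum()                   # cumulative surface area, side**2 * N
    return edge ** 2 / (16.0 * area)


if __name__ == "__main__":
    for side in (2, 6, 12):
        img = make_display(n_items=12, side=side)
        print(side, img.sum(), estimate_numerosity(img))
```

In this toy setting, cumulative area changes with item size while the normalized estimate does not, and the estimate is a non-linear function of the two signals rather than a weighted sum of them.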
The Stoianov and Zorzi (2012) model also suggests that the normalization process embedded into the mechanism extracting abstract numerosity may be inefficient in particular circumstances, such as when a strong manipulation of continuous visual cues generates high uncertainty (low signal-to-noise ratio) for the numerosity judgment (e.g., Fig. 3B in Leibovich et al.), and this effect is exacerbated by pathological conditions that affect inhibitory processing. This crucial insight can be illustrated with the combined behavioral-computational investigation of Cappelletti et al. (2014), which showed that the decline of number acuity in old age (Halberda et al. 2012) was limited to stimuli in which numerosity is incongruent with cumulative surface area. In turn, this effect was linked to the (in)efficiency of inhibitory processing, as indexed by performance in classic cognitive control tasks (e.g., Stroop paradigm). Simulations with the Stoianov and Zorzi (2012) model revealed that degraded synaptic inhibition, which specifically affected inhibitory normalization in the network (Fig. 1), impaired comparison performance for incongruent stimuli while preserving performance on congruent stimuli, thereby accounting for the data in elderly humans. Note that the notion of inhibitory normalization in the Stoianov and Zorzi (2012) model is by no means equivalent to the hypothesis that inhibition is required to suppress irrelevant continuous properties at the decision level (as in Leibovich et al.), although we do not a priori exclude this additional effect. Inefficient normalization might also be involved in the atypical performance of children with developmental dyscalculia (Bugden & Ansari 2016; Piazza et al. 2010).
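Why weakened normalization should selectively hurt incongruent comparisons can be seen in a small arithmetic sketch. It is purely illustrative and does not reproduce the simulations of Cappelletti et al.: it assumes equal-sized square items (so that the summed high-frequency response is 4·side·N and cumulative area is side²·N) and introduces a hypothetical parameter `k` for the strength of the area normalization, with k = 1 standing for intact and k < 1 for degraded normalization.

```python
def estimate(n_items: int, side: int, k: float) -> float:
    """Toy numerosity read-out with normalization strength k (hypothetical parameter)."""
    edge = 4.0 * side * n_items         # summed high-frequency response (total perimeter)
    area = float(side ** 2 * n_items)   # cumulative surface area
    # With k = 1 this equals n_items; with k < 1 a residual area dependence remains.
    return edge ** 2 / (16.0 * area ** k)


def picks_more_numerous(a, b, k):
    """a and b are (n_items, side) pairs; True if the more numerous set gets the larger estimate."""
    est_a, est_b = estimate(*a, k), estimate(*b, k)
    return (est_a > est_b) == (a[0] > b[0])


congruent = ((12, 3), (16, 6))     # the more numerous set also has more area
incongruent = ((12, 6), (16, 3))   # the more numerous set has less area

if __name__ == "__main__":
    for k in (1.0, 0.5):
        print(k, picks_more_numerous(*congruent, k), picks_more_numerous(*incongruent, k))
```

In this sketch the congruent comparison is answered correctly at both settings of `k`, whereas the incongruent comparison fails once normalization is weakened, because the residual area signal then points in the wrong direction.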
In conclusion, state-of-the-art computational modeling reveals that numerosity perception is supported by an emergent neurocomputational mechanism (implementing the ANS) that cannot be reduced to a simple combination of continuous visual properties. In contrast to the proposal of Leibovich et al., our modeling work shows that (1) the emergence of number sense is the result of a learning process, but it does not hinge upon a pre-existing “sense of magnitude” or the availability of numerical labels; (2) numerosity can be abstracted from continuous magnitudes; and (3) the influence of continuous visual properties on numerosity judgments simply taps the limits of the normalization process that is embedded into the ANS. Any alternative theoretical account, including that of Leibovich et al., should be implemented as a formal (computational) model and compared to that of Stoianov and Zorzi (2012) in terms of descriptive adequacy, as is current (and best) practice in other cognitive domains.
ACKNOWLEDGMENTS
The study was supported by grants from the European Research Council (No. 210922) and the University of Padova (Strategic Grant NEURAT) to M.Z. I.P.S. was supported by a Marie Curie Intra European Fellowship (PIEF-GA-2013-622882) within the 7th Framework Program.