Bowers et al. describe the importance of targeted behavioral experiments when evaluating deep neural networks as models of biological vision. We agree with the sentiment and draw parallels to the notion that “neuroscience needs behavior” (Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). A major point raised by Bowers et al. is that one system – a neural network – can provide an excellent prediction of another system – the visual system – while relying on entirely different mechanisms. Carefully designed behavioral experiments are needed to assess how good the match really is. This point echoes the historic multiple realizability argument highlighted by Krakauer et al., which states that different (neural) mechanisms can solve the same computational problem. Krakauer and colleagues proposed the same solution: carefully designed behavioral experiments to generate and test hypotheses about the neural mechanisms that give rise to behavior. In essence, neuroscience and modeling both need behavior to guide hypothesis testing and theory development in their endeavor to understand how the brain works.
What types of behavioral experiments are best suited to evaluate deep neural networks as models of biological vision? As suggestions for the modeling community, we take inspiration from solutions pioneered by neuroscience in recent years (e.g., Snow & Culham, 2021). There is a growing realization that real-world object recognition engages neural responses distinct from those elicited by standard image recognition tasks. In the traditional experiment, observers respond with button presses to images displayed on a computer monitor while brain activity is recorded. This approach has provided important insights into biological vision and has served as a great starting point for model evaluation (e.g., Jozwik, Kietzmann, Cichy, Kriegeskorte, & Mur, 2023). However, traditional experiments do not fully capture how humans interact with objects in real-world environments.
We suggest that our experiments should increasingly mimic real-world behavior by: (1) including tasks beyond image recognition when evaluating deep neural networks, and (2) developing platforms that enable simulation of realistic task environments. Using these environments, both humans and models can be subjected to a wide range of real-world behavioral tasks such as object tracking (e.g., following a moving animal) or visual search (e.g., finding objects in cluttered scenes); see also Peters and Kriegeskorte (2021) for discussion. These environments offer researchers a level of control that supports carefully designed experiments while maintaining ecological validity. The proposed platforms are now within reach thanks to advances in virtual reality and three-dimensional (3D) computer graphics, which are yielding powerful game engines accessible to psychologists and modelers alike. Promising recent approaches have extended the Unity game engine to the design of psychology experiments (e.g., Alsbury-Nealy et al., 2022; Brookes et al., 2020; Peters, Retchin, & Kriegeskorte, 2022; Starrett et al., 2021) and the simulation of interactive physics (e.g., ThreeDWorld; Gan et al., 2021).
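To make the idea of a shared task environment concrete, the sketch below shows one way such an environment might be structured as code: a toy object-tracking loop in which a target drifts across frames and an agent, whether a human cursor or a model policy, reports its position. This is a minimal illustration only; the class and function names are hypothetical and are not drawn from Unity, ThreeDWorld, or any of the toolkits cited above.

```python
# Minimal sketch of a shared task-environment interface; all names here are
# hypothetical and are not taken from any cited toolkit.
import numpy as np

class TrackingTask:
    """Toy 2D object-tracking environment: a target drifts each frame and the
    agent (a human cursor or a model policy) reports its estimated position."""

    def __init__(self, n_frames=600, noise_sd=0.02, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_frames = n_frames
        self.noise_sd = noise_sd

    def run(self, policy):
        """policy: a callable mapping a noisy observed frame to a 2D response."""
        target = np.array([0.5, 0.5])
        target_path, response_path = [], []
        for _ in range(self.n_frames):
            # Target drifts with a small random step, bounded to the unit square.
            target = np.clip(target + self.rng.normal(0, 0.01, size=2), 0, 1)
            # The agent only receives a noisy observation of the target.
            observation = target + self.rng.normal(0, self.noise_sd, size=2)
            response_path.append(policy(observation))
            target_path.append(target.copy())
        return np.array(target_path), np.array(response_path)

# Example: a lagging "follower" standing in for either a human or a model.
if __name__ == "__main__":
    state = {"pos": np.array([0.5, 0.5])}

    def follower(obs, gain=0.3):
        state["pos"] = state["pos"] + gain * (obs - state["pos"])
        return state["pos"].copy()

    task = TrackingTask()
    target, response = task.run(follower)
    print("mean tracking error:", np.mean(np.linalg.norm(target - response, axis=1)))
```

Because the same interface can drive a game-engine display for human participants and a policy function for a model, responses from both can be collected under identical task conditions.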
Importantly, we suggest that behavior in these task environments should be measured as continuous dependent variables that unfold over time. Traditional cognitive psychology and neuroscience experiments use discrete metrics such as “yes/no” judgments or “multiple-choice” questions with one correct option among competitors (e.g., image classification). By contrast, humans in the real world have evolved to complete unstructured tasks in service of survival-related goals. We use cognitive abilities honed through millions of years of primate evolution and over a decade of childhood development to navigate environments, build tools, find food, solve problems, and interact with other humans in cooperative and competitive settings. These dynamic behaviors involve head, body, and limb movements (Adolph & Franchak, 2017) and are based on internal decisions made from the input received from our sensory organs at millisecond timescales (Stanford, Shankar, Massoglia, Costello, & Salinas, 2010). Measuring continuous behavioral dynamics may allow a richer understanding than discrete variables averaged over many experimental trials (Spivey, 2007; for object memory dynamics, see Li, Yuan, Pun, & Barense, 2023; for navigation dynamics, see de Cothi et al., 2022; for “continuous psychophysics,” see Straub & Rothkopf, 2022).
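As an illustration of the difference between trial-averaged discrete scores and continuous dynamics, the sketch below contrasts two ways of summarizing the same tracking data: a single “fraction correct” number versus the full error time course and the lag at which the response best follows the target. The trajectories are assumed to come from a task like the toy environment above; the function names and threshold values are hypothetical.

```python
# Illustrative sketch only: two summaries of the same target/response
# trajectories (arrays of shape (n_frames, 2)); values are placeholders.
import numpy as np

def discrete_summary(target, response, threshold=0.1):
    """A button-press-style metric: the fraction of frames on which the response
    was within a fixed distance of the target, collapsed to a single number."""
    errors = np.linalg.norm(target - response, axis=1)
    return float(np.mean(errors < threshold))

def continuous_summary(target, response, max_lag=30):
    """A dynamics-oriented metric: the full frame-by-frame error time course,
    plus the lag (in frames) at which the response best follows the target."""
    errors = np.linalg.norm(target - response, axis=1)
    t = target[:, 0] - target[:, 0].mean()
    r = response[:, 0] - response[:, 0].mean()
    corrs = []
    for lag in range(max_lag):
        n = len(t) - lag
        corrs.append(np.corrcoef(t[:n], r[lag:lag + n])[0, 1])
    best_lag = int(np.argmax(corrs))
    return errors, best_lag

# Two observers (or a human and a model) can have similar discrete scores yet
# very different error dynamics and tracking lags; only the continuous summary
# would distinguish them.
```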
The models we build should also explain neural activity measured as humans complete different experimental tasks. Not only will this approach create a wealth of interdisciplinary opportunities, but modelers could also take advantage of psychology and neuroscience theory, which continues to make important predictions about behavior (e.g., Behrens et al., 2018; Cowell, Barense, & Sadil, 2019). As one example, the anterior temporal lobes are theorized to be a centralized “hub” region of the human brain involved in combining multiple sensory features to form object concepts (Lambon Ralph, Jefferies, Patterson, & Rogers, 2017). This structure supports the formation of new concepts in tasks involving the combination of 3D shape and sound (Li et al., 2022). Furthermore, damage to the anterior temporal lobes results in predictable impairments on memory, perception, and learning tasks, as in semantic dementia (Barense, Rogers, Bussey, Saksida, & Graham, 2010; Hodges & Patterson, 2007). A complete model should be able to make novel predictions about behavioral and brain responses while also accounting for existing data across many tasks.
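One widely used way to relate model activity to brain activity across tasks is representational similarity analysis. The minimal sketch below shows the basic computation on placeholder (randomly generated) data; it is not a reproduction of any cited analysis pipeline, and the array shapes and names are assumptions for illustration only.

```python
# Minimal representational-similarity sketch; the input arrays are random
# placeholders, not data from any cited study.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activity):
    """Condensed representational dissimilarity matrix: activity has shape
    (conditions, features), e.g., voxels for the brain or units for a model."""
    return pdist(activity, metric="correlation")

rng = np.random.default_rng(0)
n_conditions = 40  # e.g., objects encountered across several tasks

brain_responses = rng.normal(size=(n_conditions, 200))  # placeholder voxel patterns
model_features = rng.normal(size=(n_conditions, 512))   # placeholder model activations

rho, _ = spearmanr(rdm(brain_responses), rdm(model_features))
print(f"Spearman correlation between brain and model RDMs: {rho:.2f}")
```

The same comparison can be repeated for each task in a battery, asking whether a single model accounts for brain responses across all of them rather than in one task alone.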
We have outlined concrete suggestions toward a collaborative path that we envision to be productive. We suggest that modelers should design realistic tasks in virtual reality, measure the continuous behavioral dynamics that unfold over time, and assess correspondences to brain activity across many tasks. However, many challenges lie ahead before these suggestions can be fully realized: the expertise required to span cognitive psychology, neuroscience, and computational modeling is daunting. Developing naturalistic real-world experiments requires programming skills often not taught in psychology and neuroscience curricula, whereas theoretical models important for understanding human cognition are often not taught in computer science. Fully characterizing the dynamics of behavior and brain activity will likely require theory and measurement techniques that have not yet been developed (Druckmann & Rust, 2023). For these reasons, we suggest an incremental, highly interdisciplinary, and collaborative approach toward real-world experiments, which we hope will lead to a more complete understanding of how the human brain may support object-centered representations.
Our suggestions reemphasize the centrality of behavior – described as “psychological findings” by Bowers et al. – both in the development of more human-like neural networks and in the continued understanding of the human brain.
Financial support
A. Y. L. is supported by a BrainsCAN Postdoctoral Fellowship. M. M. is supported by an NSERC Discovery Grant.
Competing interest
None.