Yarkoni highlights the disconnect between psychology's descriptive theories and its inferential tests – a problem we argue is exacerbated by inadequate measurement. The primacy of measurement in psychology's history has ebbed and flowed, from the absolute focus on what was observable and quantifiable that defined behaviorist approaches (Hayes & Brownstein, 1986; Skinner, 1963, 1976) to the overreliance on button presses and mouse clicks that characterizes some modern research (Baumeister, Vohs, & Funder, 2007). Today, digital trace data provide new opportunities for rich measurement that captures behavioral, situational, and environmental/contextual factors simultaneously (Lazer et al., 2020; Mischel, 2004). For instance, smartphones are a powerful data source – a collection of sensors and logging routines that we carry with us for large swathes of the day – that psychologists are utilizing to predict a variety of outcomes, from social interaction and personality to mood and general health (Davidson, 2020; Ellis, 2020; Harari et al., 2020; Miller, 2012; Piwek, Ellis, Andrews, & Joinson, 2016; Stachl et al., 2020).
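To make this concrete, the sketch below illustrates how raw smartphone logs might be aggregated into per-person behavioral features for a predictive model. It is a minimal illustration rather than any cited study's pipeline: the event names, durations, and outcome variable are all hypothetical.

```python
# Minimal sketch: turning raw smartphone event logs into per-person
# features for prediction. All data and variable names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical event log: one row per logged smartphone event.
logs = pd.DataFrame({
    "user_id":    [1, 1, 2, 2, 2],
    "event":      ["call", "app_social", "call", "app_social", "screen_on"],
    "duration_s": [120, 300, 45, 900, 60],
})

# Aggregate raw traces into simple behavioral features per person.
features = logs.pivot_table(index="user_id", columns="event",
                            values="duration_s", aggfunc="sum", fill_value=0)

# Hypothetical self-report outcome (e.g., above-median loneliness).
outcome = pd.Series([1, 0], index=[1, 2], name="lonely")

# With real data one would report cross-validated performance rather
# than in-sample fit; two rows suffice only to show the structure.
model = LogisticRegression().fit(features, outcome)
print(model.predict(features))
```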
Improved methodology alone will not result in rapid progress for the behavioral sciences (see Kaplan, 1964; Uttal, 2001). For example, digital trace data have re-ignited problems with traditional operationalizations of latent variables. Research demonstrating associations between new and old measures often fails to articulate, in advance of an analysis, why a connection between a latent measure (e.g., mood disturbance) and a behavioral (digital) predictor (e.g., keystroke speed) should exist (Davidson, 2020; Zulueta et al., 2018). Without such specification or theory, the focus on prediction over explanation restricts generalizability further. A related challenge is the disconnect between subjective and objective measures (e.g., Taylor et al., 2021), where predictive studies find that their survey data predict an outcome but objective measures do not (Eisenberg et al., 2019). Here, the problem is an overreliance on subjective methodologies to measure both latent and observable constructs. For example, the gold standard for personality measurement relies on surveys (e.g., HEXACO, OCEAN, Big 5) and remains contested (Cattell, 1958; Kagan, 2001). Similarly, other measures, including estimates of everyday behavior, rarely align with reality (Parry et al., 2020). While latent measurement remains core to psychological science, many constructs are developed rapidly, with little standardization, and rely on face validity alone (e.g., “internet addiction,” despite being sardonic in origin, has spawned hundreds of technology addiction scales; Howard & Jayne, 2015). New digital sources need to avoid these issues if they are to prosper.
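The subjective–objective disconnect can be made concrete with a small simulation: even when self-reports correlate with logged behavior, they can carry substantial systematic bias. The sketch below uses simulated numbers chosen only to illustrate the pattern reported by Parry et al. (2020); none of the values come from real data.

```python
# Minimal sketch: comparing self-reported against logged behavior.
# Data are simulated; the bias and noise terms are assumptions.
import numpy as np

rng = np.random.default_rng(0)
logged = rng.normal(180, 60, size=200)            # minutes/day from device logs
reported = logged + rng.normal(45, 50, size=200)  # self-report: bias + noise

r = np.corrcoef(logged, reported)[0, 1]   # association between the measures
bias = np.mean(reported - logged)         # mean over-reporting in minutes

print(f"correlation r = {r:.2f}, mean bias = {bias:.0f} min/day")
# A sizeable correlation can coexist with substantial bias, so
# predictive agreement alone does not establish that the two
# measures capture the same construct.
```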
To illuminate the complex relationship between generalizability and measurement further: observations of behavior via digital traces will often explain (or predict) only part of a broad latent construct. At face value, predicting part of extraversion may appear straightforward from digital recordings of speech or time spent using social apps. However, there are other sub-components of extraversion that these data will struggle to explain (e.g., feeling indifferent to social activities). Other personality factors, such as openness and agreeableness, remain conceptually more challenging to map onto (a single) digital behavior (Hinds & Joinson, 2019; Stachl et al., 2020). Hence, it is critically important that psychology shift away from predictive validity alone as evidence for successful operationalization and parameterization, especially for new data sources (Boyd, Pasca, & Lanning, 2020). Any new digital measure has to be developed incrementally, with researchers first describing how it conceptually aligns with an existing latent construct (Glewwe & van der Gaag, 1990). Assuming that digital traces are behavioral expressions of latent variables, researchers should be able to qualitatively express links at a more general level first, across contexts, before moving to specifics, which would enhance generalizability.
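One way to operationalize this incremental approach is to document, before any analysis, which facets of a construct a given digital trace could plausibly express. The sketch below shows such a mapping for extraversion; the facet labels and candidate traces are illustrative assumptions, not validated measures.

```python
# Minimal sketch: an explicit construct-to-trace mapping declared
# before analysis. Facet names and candidate traces are illustrative
# assumptions, not validated operationalizations.
extraversion_facets = {
    "gregariousness":      ["time_in_social_apps", "colocated_bluetooth_devices"],
    "talkativeness":       ["ambient_speech_minutes", "outgoing_call_count"],
    "activity_level":      ["step_count", "location_entropy"],
    "social_indifference": [],  # no plausible trace identified in advance
}

# Documenting coverage up front makes explicit which parts of the
# construct a digital measure can, and cannot, speak to.
covered = [f for f, traces in extraversion_facets.items() if traces]
print(f"facets with candidate traces: {len(covered)}/{len(extraversion_facets)}")
```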
Of course, refocusing on actual behavior via digital traces will not be a panacea. Some digital traces may be “objective,” but they are rarely error-free (Sen, Floeck, Weller, Weiss, & Wagner, 2019). For example, a microphone-based audio classifier can detect whether ambient conversations are taking place around an individual, but it may not distinguish real conversations from someone watching television. Similarly, little consideration is given to how measurement variance might be reduced or maximized for a new digital source. For example, while some assessments in psychology (e.g., cognitive tasks) do not produce reliable individual differences, others (e.g., mood) purposefully reflect variation in individual responses (Hedge, Powell, & Sumner, 2018). Hence, it is critical to find ways to share raw data, processing pipelines, and analysis scripts for digital trace research: the researcher degrees of freedom are vast, which produces large variance in the conclusions drawn from the same data (Silberzahn et al., 2018; Towse, Ellis, & Towse, 2020). Validation procedures are likely to reflect the diversity of digital data sources, but combining small- and large-scale approaches (e.g., N = 1 case studies alongside larger samples) can successfully quantify the errors associated with smartphone sensing-based methods (Geyer, Ellis, Shaw, & Davidson, 2020; Sen et al., 2019; Szot, Specht, Specht, & Dabrowski, 2019). Only then can related work explore how signals from multiple systems may be combined to improve data efficiency. If this basic research is not completed, little progress will follow: research agendas risk shifting in the wrong direction when the grounding principles are weak, particularly in applied settings, such as security and health, that are increasingly interested in digital traces (Davidson, 2020; Guttman & Greenbaum, 1998).
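As a minimal illustration of such error quantification, the sketch below scores a hypothetical ambient-conversation classifier against N = 1 ground-truth labels (e.g., a participant diary). The labels are invented for illustration and the metrics are standard precision and recall, not any cited validation protocol.

```python
# Minimal sketch: quantifying a digital measure's error against
# N = 1 ground truth. Labels are invented for illustration.
from collections import Counter

# Hypothetical paired labels for hourly windows from one participant.
ground_truth = ["conversation", "tv", "silence", "conversation", "tv", "silence"]
classifier   = ["conversation", "conversation", "silence", "conversation", "tv", "silence"]

pairs = Counter(zip(ground_truth, classifier))
tp = pairs[("conversation", "conversation")]
fp = sum(v for (truth, pred), v in pairs.items()
         if pred == "conversation" and truth != "conversation")
fn = sum(v for (truth, pred), v in pairs.items()
         if truth == "conversation" and pred != "conversation")

precision = tp / (tp + fp)  # how often a detected conversation was real
recall = tp / (tp + fn)     # how many real conversations were detected
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# Here the classifier mistakes television audio for conversation (one
# false positive), exactly the error mode described above.
```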
Moreover, we acknowledge that research in this space remains challenging to conduct because data derived from digital sources can be difficult to access, handle, and interpret (DeMasi, Kording, & Recht, 2017). This challenges the way psychologists are trained, and the way they are incentivized (not) to publish descriptive findings in an interdisciplinary landscape. However, we are hopeful that new methods and emerging forms of data will complement psychology's diverse measurement practices. With the growing web of connected devices, collectively termed the Internet of Things, the potential for data linkage to further leverage real-world research remains an exciting prospect. In the long term, taking the time to understand how behavioral, situational, and environmental/contextual factors can be extracted from objective digital data will allow psychology to develop robust, contextualized, and comprehensive theory (Lazer et al., 2020).
Our muse is people, and psychology should critically consider how it moves forward and merges old and new. Generalizability requires sound measures first, but there is still little agreement among psychologists about what is worth measuring.
Financial support
This work was part-funded by the Centre for Research and Evidence on Security Threats (ESRC Award: ES/N009614/1 to PJT; ANJ; DAE), www.crestresearch.ac.uk and by the National Science Foundation (SES-1758835 to CS).
Conflict of interest
None.