The target article by Bentley et al., “Mapping collective behavior in the big-data era,” is a fascinating paper, and the authors deserve congratulations for a pioneering piece of research.
The article’s central contribution is a synthesis of behavioral models of the type that is standard in economics with statistical models of the type normally used to analyze large data sets. From the economic perspective, the goal of an empirical exercise is to develop an interpretive framework for observed behaviors, one in which choices and outcomes derive from well-posed decision problems. Standard “big data” analysis exploits the availability of massive data to develop statistical models that characterize the data well. The size of these data sets allows for a constructive form of data mining, in which the analyst lets the data select a best-fitting model. From an economist’s perspective, such data-mining exercises often appear to be black boxes: although the resulting statistical model may have high predictive power, it does not reveal the mechanisms that determine individual choices and so is not amenable to counterfactual analysis. From the statistician’s perspective, by contrast, economic models may rest on functional-form and other assumptions that are required to operationalize a given theory but have no justification beyond tractability.
Bentley et al. transcend the limitations of these approaches by showing how behavioral models may be used to understand patterns found in a range of large data sets. They do so by using behavioral models as an interpretive device rather than as a literal representation of reality. In this respect, they take a more modest stance than is found in so-called structural approaches to econometrics, and they compellingly demonstrate that this modest stance can still yield substantive social science insights. The authors consider two dimensions along which decisions are determined. The first is the relative role of individual-specific versus social factors in affecting choices; the second is the quality of the information available to agents about the payoffs from their actions. Partitioning environments along both dimensions (individual versus social factors; information-rich versus information-poor settings) yields four categories of choice types. The target article shows that this “quadrant” approach makes it possible to interpret differences in the properties of large data sets collected in disparate contexts. Bentley et al. demonstrate that these differences can be understood in terms of underlying differences in the preferences and information sets of the individuals whose choices generate the data.
Unlike the standard economics paper, Bentley et al.’s study contains no formal statistical calculations, hypothesis tests, or the like. This absence is not a reason to question the empirical contributions of the target article; social science evidence comes in many forms. The authors’ approach, which uses economic theory to interpret data patterns rather than to fully explain them, is an underappreciated way of integrating empirics and theory. The modesty of the link between theory and empirics respects the limits of any social science theory, or set of theories, as an interpretive framework for data sets of the type under study. Thus, the authors have articulated a constructive vision of “big” social science for “big” data. I look forward to their subsequent work.