
Using big data to predict collective behavior in the real world [1]

Published online by Cambridge University Press: 26 February 2014

Helen Susannah Moat
Affiliation:
Department of Civil, Environmental and Geomatic Engineering, University College London (UCL), London, WC1E 6BT, United Kingdom; Behavioural Science Group, Warwick Business School, The University of Warwick, Coventry, CV4 7AL, United Kingdom. Suzy.Moat@wbs.ac.uk http://www.wbs.ac.uk/about/person/suzy-moat/
Tobias Preis
Affiliation:
Behavioural Science Group, Warwick Business School, The University of Warwick, Coventry, CV4 7AL, United Kingdom. Tobias.Preis@wbs.ac.uk http://www.wbs.ac.uk/about/person/tobias-preis/
Christopher Y. Olivola
Affiliation:
Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 15213. olivola@cmu.edu https://sites.google.com/site/chrisolivola/
Chengwei Liu
Affiliation:
Behavioural Science Group, Warwick Business School, The University of Warwick, Coventry, CV4 7AL, United Kingdom. Chengwei.Liu@wbs.ac.uk http://www.wbs.ac.uk/about/person/chengwei-liu/
Nick Chater
Affiliation:
Behavioural Science Group, Warwick Business School, The University of Warwick, Coventry, CV4 7AL, United Kingdom. Nick.Chater@wbs.ac.uk http://www.wbs.ac.uk/about/person/nick-chater/

Abstract

Recent studies provide convincing evidence that data on online information gathering, alongside massive real-world datasets, can give new insights into real-world collective decision making and can even anticipate future actions. We argue that Bentley et al.’s timely account should consider the full breadth, and, above all, the predictive power of big data.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2014 

Modern everyday life is threaded with countless interactions with massive technological systems that support our communication, our transport, our retail activities, and much more. Through these interactions, we are generating increasing volumes of “big data,” documenting our collective behavior at an unprecedented scale.

Bentley et al. provide a timely account of the role of big data in the study of collective behavior. They offer a comprehensive analysis of what our interactions on the Internet, in particular using social network sites such as Facebook and Twitter, can tell us about how information flows throughout the large and complex network of human society. While we agree that this insight into the structure of social connections is important, we emphasize that big data do not only come from online social networks. We note a number of recent studies providing evidence that big data can tell us much more about real-world collective decision making than has been acknowledged in Bentley et al.’s account, and can even allow us to better anticipate collective actions taken in the real world.

For example, human decision making often involves gathering information to determine the consequences of possible actions (Simon 1955). Increasingly, we turn to the Internet, and search engines such as Google in particular, to provide information to support our everyday decisions. Can massive records of our search engine usage therefore offer insight into the previously hidden information-gathering processes which precede real-world decisions taken around the globe? Recent results suggest that they can. A series of studies have shown that search engine query data “predict the present,” providing a measurement of real-world behavior often before official data are released (Choi & Varian 2012). Correlations between search engine query data and real-world actions have been demonstrated across a range of areas such as motor vehicle sales, incoming tourist numbers, unemployment rates, reports of flu and other diseases, and trading volumes in the U.S. stock markets (Askitas & Zimmermann 2009; Brownstein et al. 2009; Choi & Varian 2012; Ettredge et al. 2005; Ginsberg et al. 2009; Preis et al. 2010).

Further studies have illustrated that data on online information gathering can also anticipate future collective behavior. Goel et al. (2010) demonstrated that search query volume predicts the opening weekend box-office revenue for films, first-month sales of video games, and chart rankings of songs. Our own investigations have suggested that changes in the number of searches for financially related terms on Google (Preis et al. 2013) and views of financially related pages on Wikipedia (Moat et al. 2013) may have contained early warning signs of stock market moves.

In a recent study, we exploited the global breadth of Google data to compare information-gathering behavior around the world. Our analysis uncovered evidence that Internet users from countries with a higher per capita gross domestic product (GDP) tend to search for more information about the future rather than the past (Preis et al. 2012). For 45 countries in 2010, we calculated the ratio of the volume of Google searches for the upcoming year (“2011”) to the volume of searches for the previous year (“2009”), a quantity we called the “future orientation index.” We found that this index was strongly correlated with per capita GDP. In ongoing work, we seek to better understand whether these results reflect international differences in decision-making processes. Perhaps, for example, a focus on the future supports economic success.
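
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the country labels, and every number below are hypothetical and are not data from the study, and the Pearson coefficient is used simply as one common way to measure correlation, since the text does not say which statistic was applied.

```python
from math import sqrt


def future_orientation_index(future_volume: float, past_volume: float) -> float:
    """Ratio of searches for the upcoming year to searches for the previous year."""
    if past_volume <= 0:
        raise ValueError("past_volume must be positive")
    return future_volume / past_volume


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical, made-up values for three unnamed countries (NOT study data).
countries = {
    "A": {"searches_2011": 120.0, "searches_2009": 100.0, "gdp_per_capita": 40000.0},
    "B": {"searches_2011": 90.0, "searches_2009": 100.0, "gdp_per_capita": 20000.0},
    "C": {"searches_2011": 60.0, "searches_2009": 100.0, "gdp_per_capita": 5000.0},
}

indices = [
    future_orientation_index(c["searches_2011"], c["searches_2009"])
    for c in countries.values()
]
gdps = [c["gdp_per_capita"] for c in countries.values()]

if __name__ == "__main__":
    print("future orientation indices:", indices)
    print("correlation with per capita GDP:", pearson_r(indices, gdps))
```

An index above 1 means that searches for the upcoming year outnumber searches for the previous year. A real analysis would also need to decide how search volumes are normalized across countries, which this sketch leaves out.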

Aside from search data, other research has provided evidence that the massive datasets generated by our everyday actions in the real world can also support better forecasting of future behavior (King 2011; Lazer et al. 2009; Mitchell 2009; Vespignani 2009). Large-scale datasets allow us to look for patterns in collective behavior which might recur in the future, similar to the way in which we as individuals rely on the statistical structure we have observed in the world when trying to forecast consequences of decisions (Giguère & Love 2013; Olivola & Sagara 2009; Stewart 2009; Stewart et al. 2006). For example, analysis of data collected through daily police activities has shown that the occurrence of a burglary results in a short-term increase in the probability that another burglary will occur on the same street, with implications for behavioral models of how these crimes are committed (Bowers et al. 2004; Johnson & Bowers 2004; Mohler et al. 2011). Such insights have been captured in predictive policing systems which aim to deploy police to areas before an offence occurs, with initial evaluations demonstrating a reduction in levels of crime (Johnson et al. 2007). Similarly, large-scale data on both long-distance travel by air and local commuting can improve predictions of human travel behavior and therefore the spread of epidemics, with clear consequences for the distribution of health resources such as vaccines (Balcan et al. 2009; Tizzoni et al. 2012).
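
The short-term rise in burglary risk described above can be illustrated with a simple self-exciting intensity function, in which each past event temporarily raises the expected rate of further events. The sketch below is an assumption-laden toy and not the model fitted in any of the cited studies: the exponential decay kernel, the function name, and the parameter names `baseline`, `boost`, and `decay` are all chosen here for illustration.

```python
from math import exp


def self_exciting_intensity(
    t: float,
    past_event_times: list[float],
    baseline: float,
    boost: float,
    decay: float,
) -> float:
    """Expected event rate at time t for one location (e.g. one street).

    rate(t) = baseline + sum over past events t_i < t of
              boost * exp(-decay * (t - t_i))

    The exponential kernel is an illustrative assumption.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")
    excitation = sum(
        boost * exp(-decay * (t - t_i)) for t_i in past_event_times if t_i < t
    )
    return baseline + excitation


if __name__ == "__main__":
    # Hypothetical parameters; time is measured in arbitrary units.
    events = [0.0]  # one burglary recorded at time 0
    for t in (0.5, 1.0, 5.0, 20.0):
        rate = self_exciting_intensity(t, events, baseline=0.1, boost=0.5, decay=1.0)
        print(f"t={t}: rate={rate:.4f}")
```

With no past events the rate equals the baseline; just after an event it is elevated and then decays back toward the baseline, which is the qualitative pattern the text describes.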

We argue that, when considered at greater breadth and in contrast to Bentley et al.’s conjecture, big-data studies do far more than “allow us to see better how known behavioral patterns apply in novel contexts” (target article, sect. 4, para. 13). Online search data, for example, offer us insight into early information-gathering stages of real-world decision-making processes that could not previously be observed, while large-scale records of real-world activity enable us to better forecast future actions by allowing us to identify new patterns in our collective behavior. Such predictive power is not only of theoretical importance for behavioral science, but also of great practical consequence, as it opens up possibilities to reallocate resources to better support the well-being of society. Our ability to extract maximum value from these datasets is, however, highly dependent on our ability to ask the right questions: a task for which experts in more “traditional behavioral science” (sect. 4, para. 13) are ideally placed.

ACKNOWLEDGMENTS

Helen Susannah Moat, Tobias Preis, and Nick Chater acknowledge the support of the Research Councils U.K. Grant EP/K039830/1, and Moat further acknowledges support from EPSRC grant EP/J004197/1. In addition, Moat and Preis were supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI/NBC) contract number D12PC00285.

Footnotes

1. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

References

Askitas, N. & Zimmermann, K. F. (2009) Google econometrics and unemployment forecasting. Applied Economics Quarterly 55:107–20.
Balcan, D., Colizza, V., Gonçalves, B., Hu, H., Ramasco, J. J. & Vespignani, A. (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proceedings of the National Academy of Sciences USA 106:21484–89.
Bowers, K. J., Johnson, S. & Pease, K. (2004) Prospective hotspotting: The future of crime mapping? British Journal of Criminology 44:641–58.
Brownstein, J. S., Freifeld, C. C. & Madoff, L. C. (2009) Digital disease detection – harnessing the web for public health surveillance. New England Journal of Medicine 360:2153–57.
Choi, H. & Varian, H. (2012) Predicting the present with Google Trends. Economic Record 88 (Suppl. s1):2–9.
Ettredge, M., Gerdes, J. & Karuga, G. (2005) Using web-based search data to predict macroeconomic statistics. Communications of the ACM 48:87–92.
Giguère, G. & Love, B. C. (2013) Limits in decision making arise from limits in memory retrieval. Proceedings of the National Academy of Sciences USA 110:7613–18.
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. & Brilliant, L. (2009) Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–14.
Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M. & Watts, D. J. (2010) Predicting consumer behavior with web search. Proceedings of the National Academy of Sciences USA 107:17486–90.
Johnson, S. D., Birks, D. J., McLaughlin, L., Bowers, K. J. & Pease, K. (2007) Prospective crime mapping in operational context: Final report. Home Office, London.
Johnson, S. D. & Bowers, K. J. (2004) The burglary as clue to the future: The beginnings of prospective hot-spotting. European Journal of Criminology 1:237–55.
King, G. (2011) Ensuring the data-rich future of the social sciences. Science 331:719–21.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A.-L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009) Computational social science. Science 323:721–23.
Mitchell, T. M. (2009) Mining our reality. Science 326:1644–45.
Moat, H. S., Curme, C., Avakian, A., Kenett, D. Y., Stanley, H. E. & Preis, T. (2013) Quantifying Wikipedia usage patterns before stock market moves. Scientific Reports 3:1801.
Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P. & Tita, G. E. (2011) Self-exciting point process modeling of crime. Journal of the American Statistical Association 106:100–08.
Olivola, C. Y. & Sagara, N. (2009) Distributions of observed death tolls govern sensitivity to human fatalities. Proceedings of the National Academy of Sciences USA 106:22151–156.
Preis, T., Moat, H. S. & Stanley, H. E. (2013) Quantifying trading behavior in financial markets using Google Trends. Nature Scientific Reports 3, No. 1684, pp. 1–6.
Preis, T., Moat, H. S., Stanley, H. E. & Bishop, S. R. (2012) Quantifying the advantage of looking forward. Nature Scientific Reports 2, No. 350.
Preis, T., Reith, D. & Stanley, H. E. (2010) Complex dynamics of our economic life on different scales: Insights from search engine query data. Philosophical Transactions of the Royal Society A 368:5707–19.
Simon, H. A. (1955) A behavioral model of rational choice. Quarterly Journal of Economics 69:99–118.
Stewart, N. (2009) Decision by sampling: The role of the decision environment in risky choice. Quarterly Journal of Experimental Psychology 62:1041–62.
Stewart, N., Chater, N. & Brown, G. D. A. (2006) Decision by sampling. Cognitive Psychology 53:1–26.
Tizzoni, M., Bajardi, P., Poletto, C., Ramasco, J. J., Balcan, D., Gonçalves, B., Perra, N., Colizza, V. & Vespignani, A. (2012) Real-time numerical forecast of global epidemic spreading: Case study of 2009 A/H1N1pdm. BMC Medicine 10:165.
Vespignani, A. (2009) Predicting the behavior of techno-social systems. Science 325:425–28.