
Is Mechanical Turk the Answer to Our Sampling Woes?

Published online by Cambridge University Press:  23 March 2016

Melissa G. Keith*
Department of Psychological Sciences, Purdue University

Peter D. Harms
Department of Management, University of Alabama

*Correspondence concerning this article should be addressed to Melissa G. Keith, Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47906. E-mail: keith7@purdue.edu


Type: Commentaries
Copyright © Society for Industrial and Organizational Psychology 2016

Although we share Bergman and Jean's (2016) concerns about the representativeness of samples in the organizational sciences, we are mindful of the ever-changing nature of the job market. New jobs are created through technological innovation while others become obsolete and disappear or are functionally transformed. These shifts in employment patterns produce both opportunities and challenges for organizational researchers addressing the problem of representativeness in our working-population samples. On one hand, whatever we do, we will always be playing catch-up with the market. On the other hand, we may be able to leverage new technologies to react to such changes more quickly. For example, Bergman and Jean suggested in their commentary making use of crowdsourcing websites or Internet panels to gain access to undersampled populations. Although we agree that these settings offer an opportunity to conduct much research of interest to organizational scholars, we would also point out that such samples come with their own sampling challenges. To illustrate these challenges, we examine sampling issues for Amazon's Mechanical Turk (MTurk), currently the most widely used portal for psychologists and organizational scholars collecting human subjects data online. Specifically, we examine whether MTurk workers are “workers” as defined by Bergman and Jean, whether MTurk samples are WEIRD (Western, educated, industrialized, rich, and democratic; Henrich, Heine, & Norenzayan, 2010), and how researchers may creatively utilize the sample characteristics.

Are MTurk Workers “Workers”?

Bergman and Jean's focal article suggests that one possible solution for obtaining samples from nonprofessional/nonmanagerial working populations is to make use of online crowdsourcing websites such as MTurk. Unlike the more traditional samples mentioned in the focal article, MTurk samples do tend to be more diverse in terms of occupations. For example, Behrend and colleagues (Behrend, Sharek, Meade, & Wiebe, 2011) reported a range of professions, including business and management (14.23%); computer, math, and engineering (12.73%); office and administrative support (10.86%); sales, service, and food (10.49%); education (6.37%); arts, design, entertainment, sports, and media (7.12%); and healthcare (3.37%). In another study, Downs and colleagues (2010) reported a similar pattern, with science, engineering, and information technology (24.71%); business, management, and financial services (14.36%); administrative support (10.27%); education (9%); art, writing, and journalism (6.34%); service (5.75%); medical (3%); skilled labor (1.75%); and legal services (1.25%). A more recent study by Harms and DeSimone (2015) reported a correlation of only .12 for industry representativeness when comparing MTurk samples with Department of Labor statistics. Taken together, we can conclude that although MTurk samples are more professionally diverse, they tend to overrepresent technology-related industries and are still not all that representative of the working population as a whole. Nonetheless, MTurk samples may still provide some needed sampling diversity.

In addition, it should be noted that MTurk samples are somewhat unique in that they often contain a relatively large percentage of subjects who are currently unemployed (Behrend et al., 2011; Ross, Zaldivar, Irani, & Tomlinson, 2010). Although inappropriate for many types of organizational research, a readily available sample of unemployed individuals may be useful for answering research questions pertaining to job search behaviors, outcomes of unemployment, and the like. In this sense, MTurk does offer a valuable resource for understanding individuals often ignored in organizational research (Woo, Keith, & Thornton, 2015).

That said, perhaps the greatest opportunity for research using MTurk and other online samples is that they are often heavily populated by individuals who would best be described as underemployed (Ross et al., 2010). MTurk samples tend to be more educated than the U.S. average (Ipeirotis, 2010), yet their reported income levels tend to be lower than those of the general population (Casler, Bickel, & Hackett, 2013; Ross et al., 2010). On the basis of this assumption of underemployment, MTurk samples may be an interesting population for examining research questions related to disengagement, job satisfaction, job insecurity, and fulfillment of needs. Here again, MTurk is potentially useful for surveying groups not traditionally represented in the literature.

Are MTurk Samples WEIRD?

In an earlier critique of sampling in the social sciences, Henrich and colleagues (Henrich, Heine, & Norenzayan, 2010) argued that research in published journals tends to overrepresent WEIRD (Western, educated, industrialized, rich, and democratic) populations and that results based on these samples may not generalize elsewhere. Although researchers sampling from MTurk technically have the ability to sample from many different countries, most MTurk workers reside either in the United States or India (Ipeirotis, 2010), and most research is conducted using English-language surveys. Although the potential for diversity exists, many researchers set up qualifications and survey only U.S. workers. This practice likely reflects prior research showing that non-U.S. participants tend to provide poor-quality data (e.g., Feitosa, Joseph, & Newman, 2015; Litman, Robinson, & Rosenzweig, 2015). One consequence of limiting MTurk samples to U.S. populations, however, is that the sample is automatically going to be Western, industrialized, and democratic, leaving education and income as the only dimensions on which the sample can be less WEIRD. As noted above, past research has shown that MTurk samples tend to be both more educated than average and, at the same time, lower than average in household income (within the United States). For example, Ross and colleagues (2010) found a median household income of between $20,000 and $30,000 in their MTurk sample. In another sample, Casler et al. (2013) found a household income range of $25,000–$50,000, and Barger, Behrend, Sharek, and Sinar (2011) reported a modal household income between $40,000 and $60,000. This does suggest that MTurk provides an opportunity to collect data from lower socioeconomic status samples that are often missed in traditional published samples. At the same time, MTurk samples are also disproportionately young (see Buhrmester, Kwang, & Gosling, 2011; Chandler, Mueller, & Paolacci, 2014; Paolacci, Chandler, & Ipeirotis, 2010), and these low incomes may simply reflect participants holding entry-level or part-time jobs owing to their age rather than populations who spend their careers laboring in low-skill, low-pay jobs.

One additional quirk of MTurk samples is that workers tend to report spending a great deal of time online and, in particular, research samples tend to be dominated by a small number of individuals for whom participating in crowdsourced tasks or surveys has become a job (Harms & DeSimone, 2015). This may explain why, as Buhrmester et al. (2011) concluded, “MTurk participants are not representative of the American population, or any population for that matter” (p. 4). That being said, as the economy creates the need for more individuals to take on “gig-based” jobs, perhaps MTurk samples will increasingly come to reflect an emerging contingent of the working population that did not exist 10 years ago. Given this emerging trend and the fact that MTurk samples do seem to provide some much-needed diversity, we would argue that although MTurk is not THE solution to our WEIRD sampling problem in the organizational sciences, it does represent a small step in the right direction.

Using MTurk to the Researcher's Advantage

With all that being said, researchers in the psychological sciences can use the unique characteristics of MTurk to their advantage. First, the characteristics of MTurk samples have shifted dramatically since the site was created, and they continue to fluctuate on a day-to-day basis (see www.mturk-tracker.com). Consequently, we cannot really know what samples from MTurk will look like in the future; then again, we cannot predict what typical samples in the organizational sciences will look like in the future either. For the foreseeable future, however, MTurk samples are likely to provide much-needed diversity when compared with traditional samples such as students and managers. Moreover, because fluctuations in sample characteristics can be monitored and, to some degree, predicted, researchers can time their studies to target specific populations.

Second, it is important to remember that researchers using online surveys and experiments have the ability to sample individuals with a particular set of characteristics. For example, using a simple branching function in Qualtrics, researchers have the ability to direct samples of employed and unemployed participants to different surveys. Of course, one potential drawback is that researchers interested in an employed/unemployed sample must be willing to create extra projects with equal pay for the nontargeted population in order to prevent dishonest reporting of employment status. Ideally though, the researcher can collect data from both (or many different) groups of participants in order to address their research questions. The ability to create experiments or surveys targeted at particular individuals or groups while still collecting a large sample is unique to the online sampling approach and should provide for many potentially interesting research opportunities.
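The screening design described above (route participants by a screening item, but pay every branch the same so no status is financially advantaged) can be sketched in a few lines. This is an illustrative sketch only; it is not tied to Qualtrics or the MTurk API, and the branch names and pay amount are hypothetical.

```python
# Hypothetical sketch of the branching-with-equal-pay screening design:
# each participant answers an employment-status screening item and is
# routed to a matching survey, and every branch pays the same amount so
# there is no incentive to misreport employment status.

def route_participant(employment_status: str) -> dict:
    """Assign a participant to a survey branch based on a screening item."""
    branches = {
        "employed": {"survey": "employed_survey", "pay_usd": 1.00},
        "unemployed": {"survey": "unemployed_survey", "pay_usd": 1.00},
    }
    # Nontargeted statuses (e.g., student, retired) get a filler survey
    # that pays the same amount, per the equal-pay rationale above.
    return branches.get(
        employment_status, {"survey": "general_survey", "pay_usd": 1.00}
    )

assignment = route_participant("unemployed")
print(assignment["survey"])  # unemployed_survey
```

The key design point is that the pay amount is constant across branches; only the survey content differs, so the screening item carries no financial signal.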

General Conclusions

In sum, we do not view MTurk or other online sampling portals as the ultimate fix for the field's sampling woes. That said, samples obtained online can offer many advantages over traditional methods, but effective utilization of this resource requires creative and clever designs as well as vigilance on the part of the researcher. Crowdsourced studies present unique challenges to researchers, but then again, no one ever guaranteed that convenience samples would be entirely convenient.

References

Barger, P., Behrend, T. S., Sharek, D. J., & Sinar, E. F. (2011). I-O and the crowd: Frequently asked questions about using Mechanical Turk for research. The Industrial–Organizational Psychologist, 49(2), 11–17.
Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43(3), 1–14.
Bergman, M. E., & Jean, V. A. (2016). Where have all the “workers” gone? A critical analysis of the unrepresentativeness of our samples relative to the labor market in the industrial–organizational psychology literature. Industrial and Organizational Psychology: Perspectives on Science and Practice, 9, 84–113.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
Casler, K., Bickel, L., & Hackett, E. (2013). Separate but equal? A comparison of participants and data gathered via Amazon's MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior, 29, 2156–2160.
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46, 112–130. doi:10.3758/s13428-013-0365-7
Downs, J. S., Holbrook, M. B., Sheng, S., & Cranor, L. F. (2010). Are your participants gaming the system? Screening Mechanical Turk workers. In Proceedings of SIGCHI ’10: The 28th International Conference on Human Factors in Computing Systems (pp. 2399–2402). New York, NY: ACM Press.
Feitosa, J., Joseph, D. L., & Newman, D. A. (2015). Crowdsourcing and personality measurement equivalence: A warning about countries whose primary language is not English. Personality and Individual Differences, 75, 47–52.
Harms, P. D., & DeSimone, J. A. (2015). Caution! MTurk workers ahead—Fines doubled. Industrial and Organizational Psychology: Perspectives on Science and Practice, 8(2), 183–190.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2/3), 61–83.
Ipeirotis, P. G. (2010). Demographics of Mechanical Turk (Technical Report CeDER-10-01). New York, NY: New York University.
Litman, L., Robinson, J., & Rosenzweig, C. (2015). The relationship between motivation, monetary compensation, and data quality among US- and India-based workers on Mechanical Turk. Behavior Research Methods, 47, 519–528.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411–419.
Ross, J., Zaldivar, A., Irani, L., & Tomlinson, B. (2010). Who are the crowdworkers? Shifting demographics in Mechanical Turk. In Proceedings of CHI ’10: Extended Abstracts on Human Factors in Computing Systems (pp. 2863–2872). New York, NY: ACM Press.
Woo, S. E., Keith, M., & Thornton, M. A. (2015). Amazon Mechanical Turk for industrial and organizational psychology: Advantages, challenges, and practical recommendations. Industrial and Organizational Psychology: Perspectives on Science and Practice, 8(2), 171–179.