A long line of research demonstrates the public’s political ignorance (Achen and Bartels 2016; Berelson, Lazarsfeld, and McPhee 1954; Clifford and Jerit 2016; Converse 1964; Delli Carpini and Keeter 1996; Luskin and Bullock 2011), yet several studies suggest that the public may be more knowledgeable than we think (Krosnick et al. 2002; Mondak 2000; 2001; Mondak and Davis 2001; Nie, Verba, and Petrocik 1979; Popkin 1991; Prior and Lupia 2008). In part, this debate hinges on the presence or absence in surveys of a “don’t know” (DK) option, with some scholars contending that respondents hide some of their knowledge behind DK responses.
Luskin and Bullock (2011) presented an experiment showing that including or excluding DK makes little practical difference to estimates of knowledge. They conducted their experiment on two platforms; we replicated it on three more. On one of these platforms—Amazon’s MTurk—respondents appear to hide significant political knowledge behind DK. We attribute this difference to MTurkers’ experience with attention checks and other quality-control mechanisms, which condition them to scrupulously avoid errors. This conditioning, unique to MTurk, suggests a need for more caution among researchers who pilot questions or administer experiments on this platform, at least for certain types of questions.
THEORY
Mondak and colleagues argued that respondents conceal knowledge behind DK responses and once advocated, in some cases, rescoring DK responses as correct or partially correct to produce higher, more accurate estimates of public knowledge (Mondak 2000; 2001; Mondak and Anderson 2004; Mondak and Davis 2001). They argued that neutral options (for opinion questions) and the DK option (for knowledge questions) lead respondents to “satisfice,” avoiding the cognitive work of answering a question even if they are capable of a better response. Concurring, Krosnick et al. (2002, 373) asked “whether offering a no-opinion option attracts only respondents who would otherwise have offered meaningless responses, or whether offering a no-opinion option also attracts respondents who truly have opinions and would otherwise have reported them.”
Other researchers counter that “discouraging DKs reveals precious little hidden knowledge” (Luskin and Bullock 2011, 554; see also Tourangeau, Maitland, and Yan 2016). “When people who initially select a DK alternative are subsequently asked to provide a ‘best guess,’ they fare statistically no better than chance” (Sturgis, Allum, and Smith 2008, 90). Luskin and Bullock (2011) drew their conclusions from a randomized experiment, largely replicated here. In what we label the encourage guessing condition, Luskin and Bullock prefaced a battery of political knowledge questions with this statement: “If you aren’t sure of the answer to any of these questions, we’d be grateful if you could just give your best guess.” In the encourage DK condition, they included this preface: “Many people have trouble answering questions like these. So, if you can’t think of the answer, don’t worry about it. Just mark that you don’t know and move on to the next one.”[1] All respondents then answered the same battery, with DK always an option. This subtle treatment indeed influenced respondents, who chose DK less often and answered correctly more often in the encourage guessing condition. However, this decrease in DK responses did not yield a greater increase in correct answers than could be attributed to guessing. They therefore concluded that including a DK option was harmless—that is, respondents were not hiding significant knowledge behind the neutral option.
Luskin and Bullock (2011) administered their experiment on two live-interviewer platforms: the American National Election Studies (ANES) and Time-Sharing Experiments for the Social Sciences (TESS). Much public opinion research now takes place online, where the absence of a live interlocutor fundamentally changes social incentives. We therefore replicated their experiment on three online platforms: the Cooperative Congressional Election Study (CCES), Google Surveys (GS), and MTurk. All platforms have their unique features, of course. The CCES presents a lengthy political survey, with some respondents compensated through points and rewards in the YouGov system. In contrast, GS presents brief pop-up surveys to Internet users attempting to load unrelated websites; these respondents answer the questions simply to make the survey go away so they can view their desired content. Replicating Luskin and Bullock’s (2011) experiment on these diverse platforms provides an important check on their results.
Most important for our purposes is MTurk, the most distinctive of the three platforms. MTurk makes no attempt to recruit a representative user base, resulting in well-known demographic peculiarities. Nevertheless, scholars have replicated several published experiments on MTurk, reassuring researchers of the platform’s reliability (Ansolabehere and Schaffner 2014; Berinsky, Huber, and Lenz 2012). Huff and Tingley (2015, 8) therefore concluded that MTurkers are “not all that different from respondents on other survey platforms,” particularly when analyzed by demographic subgroup.
Nevertheless, MTurkers face unique incentives that may affect their behavior in subtle ways, especially concerning neutral response options like DK. Most MTurk jobs are not survey research but rather so-called “human intelligence tasks” such as transcribing text from images and completing other simple work. Job providers can accept or reject a user’s work, and users’ resulting ratings affect their ability to receive future assignments. Similarly, market research and social science surveys posted to MTurk regularly include attention checks or similar quality-control devices (Peer, Vosgerau, and Acquisti 2014). Poor performance on one task directly affects an MTurk user’s ability to earn money in the future.
Over time, these mechanisms condition MTurkers to pay closer attention to detail than users on other platforms (Hauser and Schwarz 2016). MTurkers thus are not only politically and demographically distinct from users on other platforms; they also are conditioned to avoid errors, making neutral response options more attractive. Survey weights and subgroup analysis might correct MTurk’s demographic skew, but they cannot account for these conditioned behaviors. Unique among respondent pools, MTurk “is a population that learns” (Hauser and Schwarz 2016). Because MTurkers are conditioned to avoid errors, they may calculate that it is better to hide behind the DK option when they have any uncertainty about their response, making them distinct from respondents recruited through other platforms.
DESIGN
To test this hypothesis, we replicated Luskin and Bullock’s (2011) experiment across three platforms by administering a battery of political knowledge questions to CCES, MTurk, and GS respondents, always with a DK option (Jones 2021).[2] (Methodological details specific to each platform are footnoted.[3]) We randomly assigned respondents to Luskin and Bullock’s two conditions: before viewing the knowledge battery, half of the respondents were encouraged to guess if they did not know an answer; the other half were encouraged to mark “DK.” Our battery included the following five items (a minimal implementation sketch follows the list):
• To the best of your knowledge, does your state have its own constitution?
• Is the US federal budget deficit—the amount by which the government’s spending exceeds the amount of money it collects—now bigger, about the same, or smaller than it was during most of the 1990s?
• For how many years is a US Senator elected—that is, how many years are there in one full term of office for a US Senator?
• On which of the following does the US federal government currently spend the least? (Options: foreign aid, Medicare, national defense, Social Security.)
• Who nominates judges to the Supreme Court? (Options: the President, the House of Representatives, the Senate, the Supreme Court.)
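As referenced above, the mechanics of the design reduce to randomly assigning a preface and then scoring the fixed battery, with DK always available. The following Python sketch is our illustration, not the study’s replication code; the question labels, answer keys, and response codings are ours.

```python
# Illustrative reconstruction of the assignment and scoring logic.
# "DK" is always an available response alongside the substantive options.
import random

ANSWER_KEY = {
    "state_constitution": "yes",
    "deficit_vs_1990s": "bigger",
    "senate_term": "six years",
    "smallest_expenditure": "foreign aid",
    "nominates_judges": "the President",
}

def assign_condition(rng: random.Random) -> str:
    """Randomly assign a respondent to one of the two preface conditions."""
    return rng.choice(["encourage_guessing", "encourage_dk"])

def score_respondent(responses: dict) -> tuple:
    """Return (number correct, number of DK responses) for one respondent."""
    n_correct = sum(responses.get(q) == a for q, a in ANSWER_KEY.items())
    n_dk = sum(r == "DK" for r in responses.values())
    return n_correct, n_dk

# Example: a respondent who knows three answers and marks DK twice.
rng = random.Random(42)
example = {
    "state_constitution": "yes",
    "deficit_vs_1990s": "DK",
    "senate_term": "six years",
    "smallest_expenditure": "DK",
    "nominates_judges": "the President",
}
print(assign_condition(rng), score_respondent(example))  # -> (3 correct, 2 DK)
```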
RESULTS
Among CCES respondents, 69% correctly answered that their state had a constitution, 63% answered that the deficit had grown, 49% answered that US Senators serve for six years, 28% identified foreign aid as the federal government’s smallest expenditure, and 73% said that the President nominates Supreme Court judges. The mean CCES respondent answered 2.8 of five items correctly—the same as GS respondents but lower than the 3.1 mean among MTurkers. As shown in table 1, these baseline platform differences persist even after controlling for demographic differences in ordinary least squares (OLS) regression.[4] CCES and GS respondents gave fewer correct answers (first column) and more DK responses (second column) than MTurk respondents. Perhaps MTurkers’ frequent participation in social science research makes them a more knowledgeable group overall.
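As an arithmetic check, the mean score follows from these item-level proportions by linearity of expectation: the expected number of correct answers is simply the sum of the five proportions correct.

```latex
\mathbb{E}[\text{correct}] = \sum_{i=1}^{5} p_i = 0.69 + 0.63 + 0.49 + 0.28 + 0.73 = 2.82 \approx 2.8
```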
[Table 1: OLS regressions of correct and DK responses on platform and demographic indicators. Notes: *p ≤ 0.05 (two-tailed). OLS coefficients shown with standard errors in parentheses. MTurk is the omitted platform; females, independents, and respondents ages 18–24 are the omitted categories. Rounding is to two significant digits.]
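For concreteness, the first column of table 1 could be estimated along the following lines. This is a sketch under assumed names: the input file and every variable label (n_correct, platform, male, party, age_group) are our inventions, not the layout of the actual replication files.

```python
# Sketch of the Table 1 specification: correct answers regressed on platform
# dummies plus demographic controls, with the same omitted categories as the
# table notes. All names below are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pooled_respondents.csv")  # assumed respondent-level file

model_correct = smf.ols(
    "n_correct ~ C(platform, Treatment(reference='MTurk'))"  # MTurk omitted
    " + male"                                                # females omitted
    " + C(party, Treatment(reference='Independent'))"        # independents omitted
    " + C(age_group, Treatment(reference='18-24'))",         # ages 18-24 omitted
    data=df,
).fit()
print(model_correct.summary())  # repeat with n_dk as the outcome for column 2
```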
Because participants were assigned randomly into conditions, we did not include demographic controls when estimating treatment effects.[5] Figure 1 (left panel) summarizes the average treatment effect of encourage guessing (as opposed to encourage DK) on correct and DK responses.[6] On all platforms, encourage guessing reduced DK responses relative to encourage DK, although the effect was significant only for MTurk respondents (−0.33, p<0.01). The reduction was −0.14 (p=0.11) for CCES and −0.12 (p=0.21) for GS respondents. Random guessing alone would yield an approximate 32% accuracy rate.[7] If respondents on all platforms converted their reduced DK responses to truly random guesses, we would expect chance-driven increases in correct responses of 0.11 (MTurk), 0.045 (CCES), and 0.038 (GS).[8] For GS, that is almost exactly what we found: an insignificant 0.030 (p=0.75) increase in correct responses under the encourage guessing condition. Curiously, CCES respondents appear to have provided (insignificantly) fewer correct responses under encourage guessing (−0.062, p=0.56).[9] To summarize, the treatment produced even smaller effects among CCES and GS respondents than Luskin and Bullock (2011) originally reported, but the overall pattern clearly supports their general contention that including or omitting the DK option does not change estimates of knowledge in the sample. Where we did observe a marginal increase in correct responses (on GS), it was no greater than can be attributed to random guessing, again affirming Luskin and Bullock’s (2011) general argument.
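The expected increases above are each platform’s reduction in DK responses multiplied by the chance rate. One way to arrive at the 32% chance rate itself is to average the per-item guessing probabilities; this assumes four response options on the Senate-term item (an assumption for illustration; footnote 7 gives the precise basis), alongside two options on the constitution item, three on the deficit item, and four each on the spending and Supreme Court items.

```latex
\frac{1}{5}\left(\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4}\right) \approx 0.317 \approx 32\%
```

Multiplying the observed DK reductions by this chance rate gives the expected chance-driven gains:

```latex
0.33 \times 0.32 \approx 0.11 \;(\text{MTurk}), \qquad
0.14 \times 0.32 \approx 0.045 \;(\text{CCES}), \qquad
0.12 \times 0.32 \approx 0.038 \;(\text{GS})
```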
Among MTurk respondents, however, a substantially different picture emerges. On this platform only, encourage guessing raised average scores by 0.30 relative to encourage DK (p<0.01)—far greater than the 0.11 increase we expected from random guessing and almost the exact amount by which encourage guessing reduced DK responses (see figure 1, right panel). On its face, this result implies that nearly every MTurk respondent induced to guess rather than mark DK ultimately marked a correct answer instead—although we note that the 95% confidence interval around our +0.30 estimate extends as low as 0.14. At least some MTurk respondents clearly responded to the encourage guessing treatment, giving more correct responses than could be obtained by chance alone.
We found no evidence that MTurk respondents were more likely to search for answers online in one condition than in another, behavior that could produce these results spuriously. In both experimental conditions, the 25th, 50th, and 75th percentiles of elapsed time were identical—55, 72, and 101 seconds, respectively—meaning that respondents did not spend more time answering our knowledge battery in one condition than in the other.[10] We conclude that at least some MTurkers hide knowledge behind the DK option, in clear contrast to the other platforms.[11]
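This timing check is straightforward to reproduce from respondent-level data. A sketch follows; the file and column names are hypothetical, not those of the replication materials.

```python
# Compare elapsed-time quartiles across the two experimental conditions.
import pandas as pd

df = pd.read_csv("mturk_responses.csv")  # assumed respondent-level file
for condition, group in df.groupby("condition"):  # 'encourage_guessing' / 'encourage_dk'
    q25, q50, q75 = group["elapsed_seconds"].quantile([0.25, 0.50, 0.75])
    print(f"{condition}: 25th = {q25:.0f}s, median = {q50:.0f}s, 75th = {q75:.0f}s")
```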
CONCLUSION
In an experiment administered using live interviewers, Luskin and Bullock (2011) found no evidence that ANES or TESS respondents hide knowledge behind the DK option. Their treatment successfully induced people to guess rather than choose DK, but this guessing did not reveal concealed knowledge. We arrived at a similar conclusion using two online platforms, CCES and GS. However, MTurk respondents behave differently. Perhaps the attention checks, accuracy bonuses, and other quality-control devices frequently employed on MTurk condition its users to select neutral options unless they are certain of their response. In any event, MTurkers appear to hide some knowledge behind the DK option—even though our experiment did not use any of the attention checks or accuracy bonuses common to MTurk surveys. It therefore might make sense to omit the DK option when using MTurk.
On MTurk only, inducing respondents to guess not only reduces DK responses but also increases correct responses in a nearly one-to-one relationship. Our results do not call into question MTurk’s general utility as a research platform, but they do suggest caution concerning studies of political knowledge specifically and the use of neutral response options generally. If MTurk users react differently to neutral options than users on other platforms do, then researchers should account for these unique properties when designing surveys and survey experiments. We do not dispute Luskin and Bullock’s (2011) general conclusion as it applies to most platforms, but MTurk complicates decisions about when to offer the DK option and points to the need to consider how experimental manipulations may perform differently across platforms.
For instance, our CCES and GS respondents were less responsive to our manipulation overall than Luskin and Bullock’s (2011) ANES and TESS respondents. Their respondents significantly decreased their DK responses under encourage guessing and significantly increased their correct responses—albeit not sufficiently to rule out random guessing. By contrast, we observed smaller decreases in DK responses among CCES and GS respondents than Luskin and Bullock (2011) reported and no measurable increase in correct responses. Unlike our MTurk results, this pattern supports Luskin and Bullock’s (2011) broader conclusion that the presence or absence of a DK option makes little difference.[12] Nevertheless, this muted response pattern underscores the importance of platform context and the need for caution in generalizing such claims.
SUPPLEMENTARY MATERIALS
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1049096520001651.
DATA AVAILABILITY STATEMENT
Replication materials can be found on Dataverse at https://doi.org/10.7910/DVN/KAPQWT.