International election observers are now present at more than four out of every five elections in the developing world. International monitoring of domestic elections was once regarded as an unacceptable violation of state sovereignty.Footnote 1 But since the late 1990s, the refusal by a government to invite reputable foreign observers has become a conspicuous signal that democratic elections are unlikely to take place.Footnote 2 Yet despite the widespread presence of election observers throughout the world, their effects on elections are not fully understood. Do international observers directly influence the quality of elections? Or does the presence of foreign observers have little or no effect on the behavior of voters, parties, and election officials? In this article I argue that field experimental methods and the random assignment of election observers are one way to answer this question. More generally, I indicate that field experiments are a promising method for evaluating the effects of democracy assistance programs, and argue that a mutually beneficial long-term relationship between democracy-promoting organizations and academics can ideally grow out of such field experimentation, generating knowledge that is both scientifically and practically relevant.
Applied to the 2004 presidential elections in Indonesia, field experimental methods reveal that although the election was widely regarded as democratic, the presence of observers had a measurable effect on votes cast for the incumbent candidate, indicating that such democracy assistance can influence election quality in unanticipated ways, even in the absence of blatant election-day fraud. The incumbent presidential candidate actually performed better in internationally-monitored villages, whereas the challenger performed about the same. This unanticipated result where there is little evidence of election-day fraud underscores several of the likely challenges and advantages of field experimentation.
Because this application of field experimental methods was the first of its kind in election monitoring, I present this study with an eye toward facilitating future research and debate about the implementation of field experiments in what are often challenging conditions. These conditions frequently involve short time horizons, limited access to relevant outcome measures, and potentially large but uncertain payoffs. Such conditions are unsurprising in highly volatile political settings in which questions exist about the procedural fairness of electoral processes. In this sense, the challenges are precisely what make the research interesting and potentially useful. All the same, they require special attentiveness on the part of scholars.
Field Experiments and the Effects of Democracy Promotion
The history of democracy promotion is long and diverse, significantly pre-dating the introduction of election monitoring in sovereign states, and going back at least to the early twentieth century.Footnote 3 Following both the end of World War II and the end of the Cold War, increasing numbers of democratic states and international organizations incorporated democracy promotion explicitly into their policies toward other countries.Footnote 4 Various forms of foreign assistance from states, NGOs, intergovernmental organizations, and international financial institutions became conditioned on democracy and democratization. These international actors became more likely to reduce support for countries that experienced military coups, failed to hold democratic elections, or engaged in massive violations of political rights. Direct democracy assistance became a growth industry, and developed democracies, international NGOs, or pro-democracy intergovernmental organizations like the European Union, the Organization of American States, or the Organization for Security and Cooperation in Europe devoted significant resources to direct democracy assistance.Footnote 5
Despite the billions of dollars spent on democracy assistance since the early 1990s, and the prominent role that democracy promotion plays in the foreign policies of many influential governments, ongoing uncertainty about the conditions under which democracy assistance “works” continues to fuel debate among scholars and policymakers. On one hand, critics argue that democracy assistance is imperialist and interventionist, imposed on unwilling populations that would be better off without such foreign meddling, or that it serves as a disingenuous cover for other more controversial foreign policy objectives.Footnote 6 On the other hand, proponents argue that democracy assistance and democracy promotion more generally have provided crucial support for democratizing forces during critical junctures in the political development of many countries.Footnote 7 Additionally, individual governments tout their foreign policy successes in democracy promotion in terms of dollars spent on democracy assistance or highlight the correlation between their work and successful democratic transitions.Footnote 8 Because they have a vested interest in continuing their own programs, practitioners of democracy promotion are frequently less credible as evaluators of democracy assistance programs.Footnote 9
Missing from much of the debate over democracy assistance is a more objective method by which scholars and practitioners can evaluate the conditions under which various forms of democracy assistance are effective. The question of whether and how international actors influence the domestic politics of sovereign states is of interest for academic- and policy-related reasons. Correlations between total dollars spent on democracy assistance and changes in aggregate measures of democracy or political rights are suggestive but somewhat controversial in that they are open to numerous interpretations. Because democracy promoters and aid donors may target their assistance toward regimes where improvements are most likely (or least likely), it is difficult to separate the consequences of democracy promotion from what would have happened in the country in the absence of democracy promotion. Therefore, although it is possible to show that average levels of democracy improve when more money is spent on democracy assistance, it is extremely difficult to demonstrate that the change was caused by democracy assistance rather than some other omitted variable. It is even more difficult to use such methods to pinpoint which democracy assistance programs work as they were intended, which programs do not have their intended effects, and whether any programs have unanticipated effects.
Although many scholars are circumspect about the conclusions that can be drawn from such research, individuals on both sides of the debate on democracy promotion have been guilty of claiming a causal relationship between democracy assistance and either positive or negative outcomes. For example, as Steven E. Finkel et al. argue in their large-N cross-national study of democracy aid, “there are consistent positive impacts of direct USAID democracy assistance on overall levels of democracy in recipient countries, as measured by the Freedom House and Polity IV indices over time.”Footnote 10 Other scholars have found positive, negative, and insignificant relationships between aggregate levels of democracy and money spent on democracy assistance, and between democracy and foreign aid more generally.Footnote 11 Some detailed case studies make convincing arguments that democracy promoters contributed to negative outcomes, such as international involvement in the 1992 elections in Angola or the 1993 elections in Cambodia.Footnote 12 Other scholars argue that democracy promoters were instrumental in bringing about democratic transitions or in preventing democratic backsliding, as in Peru in 2000 or more generally in post-Soviet Latin America and Central Europe.Footnote 13 Although it is clear that international efforts to promote democracy were followed either by disastrous outcomes (as in Angola and Cambodia) or desirable progress toward democracy (as in Peru after 2000 or Georgia in 2004), it is difficult to compare the actual outcome to the counterfactual outcome, that is, what would have happened if international actors had not attempted to promote democracy or pressure governments to hold democratic elections.
Field experiments offer a partial solution to several related problems, as a number of powerful players in the democracy promotion industry have recognized in recent years. The Millennium Challenge Corporation, whose mission is partly to promote democracy in poorer countries, now calls for the “use of random assignment to treatment and control groups” in its programs as the method of impact evaluation likely to produce the most “rigorous results.”Footnote 14 More recently, the USAID-commissioned National Academy of Sciences report, Improving Democracy Assistance, covers a wide range of methods for evaluating the effectiveness of democracy assistance programs, but recommends “randomized impact evaluation” as the “ideal research design.”Footnote 15 Partly as a result, many recipients of USAID funding have begun to adopt such methods to evaluate programs. Note that the NAS report does not recommend the exclusive use of field experiments, but highlights them as an underutilized and potentially powerful tool to help democracy promoters complete rigorous evaluations of pilot programs, learn about the effects of existing programs, and ultimately refine democracy assistance over time.
This article is therefore potentially important in part because of the new attention to “rigorous impact evaluation” spreading throughout the democracy promotion industry, and because it represents an example of the potential for mutually-beneficial cooperation between academics and practitioners in studying the effects of democracy promotion.
International Election Observation
International election monitoring is one of the best-known and potentially most consequential forms of democracy promotion.Footnote 16 Aid spent directly on democracy assistance activities (not including indirect democracy promotion like aid conditionality) represents hundreds of millions of dollars annually in the US alone, and over the course of the 1990s, election monitoring became a central part of the growing democracy promotion industry.Footnote 17 Since the end of the Cold War, democratization, usually considered a purely domestic political process, has become permeated with international actors.Footnote 18 This development represents an overt attempt by international actors to influence the course of domestic politics, yet, like many such relationships, it remains understudied.Footnote 19
States, IGOs, NGOs, and scholars who support election observation argue that it increases voter and political party confidence in the electoral process, deters fraud when it exists, and generates a third-party evaluation of election quality for international and domestic audiences, thus making negative consequences for a leader who holds fraudulent elections more likely.Footnote 20 As Kofi Annan stated while Secretary General of the United Nations:
The presence of international election observers, fielded always at the invitation of sovereign states, can make a big difference in ensuring that elections genuinely move the democratic process forward. Their mere presence can dissuade misconduct, ensure transparency, and inspire confidence in the process.Footnote 21
Skepticism of this view of observers is prevalent. The most critical argue that international election observers are simply glorified tourists,Footnote 22 or that they are biased representatives of governments, out only to promote their country's narrow economic interests.Footnote 23 Others argue that observers fail on other grounds, for example, that observers do not succeed in their mission when they actually observe electoral fraud because documenting fraud proves that they have failed to prevent it.Footnote 24 Even supporters of election observation suggest that observers should be more professionalized and receive better training, increase consistency in their evaluations across countries, improve coordination with local actors, and generally increase the accuracy of their evaluations of election quality.Footnote 25
An assortment of regional or case studies examine the role of international observers in the democratization process, and show mixed support for the claims made by proponents of election observation. For example, Eric Bjornlund, Michael Bratton, and Clark Gibson suggest that the presence of observers (both domestic and international) contributed to the successful transfer of power during the 1991 Zambian elections, but that observers struggled with their own political legitimacy vis-à-vis domestic audiences.Footnote 26 Writing on El Salvador, Tommie Sue Montgomery argues that the Organization of American States' hasty judgment of the 1991 elections and its failure to criticize blatant attempts to manipulate the election contributed to the government's decision not to reform obvious problems prior to the 1994 elections.Footnote 27 These and other similar studies offer a wealth of information on elections in dozens of countries throughout the developing world, and set the foundation for this study.Footnote 28
A shared weakness of existing cross-national and case-study research on election monitoring, including some of my own research, is that such studies have difficulty attributing causal effects to international observers. Like research on the effects of democracy promotion, studies show that the presence of observers is correlated with a variety of positive and negative outcomes following elections, but they have difficulty comparing the outcomes of observed elections to a counterfactual world in which observers were not present. As Thomas Carothers has argued, the most significant potential effect of international observers is difficult to measure:
Out of fear of being caught by foreign observers, political authorities may abandon plans to rig elections. Of course, few foreign officials would readily acknowledge having had such plans, making it hard to measure precisely the deterrent effect of electoral observation.Footnote 29
Knowledge that international observers will be present at an election may prevent fraud from being attempted by political parties and candidates, or may deter other illegal or improper behaviors, although the nature of the decision to invite international observers and engage in illicit activity makes a cross-national causal test of this hypothesis exceedingly difficult.
The pre-election prevention of electoral irregularities, as highlighted by Carothers, is the ideal outcome for organizations interested in encouraging democratic elections. However, another potentially important effect of observers is also possible. On election day, because observers are physically present in polling stations, they may have direct and micro-level effects on the election day behavior of voters, parties, candidates, or election officials. To use the most obvious example, individuals engaged in ballot box stuffing may not wish to carry out their plans in the physical presence of international observers. Similarly, polling station officials may be more likely to follow official electoral regulations if they are internationally observed. The mechanism behind this “observer effect” is that individuals frequently behave differently when they know they are being watched, particularly if they are aware that they are engaging in illegal or socially undesirable behavior. Thus, the behavior of voters, poll workers, or party operatives could change as a result of a visit by international election observers, and many forms of meaningful change in election day behavior should be reflected in votes cast on election day.
Although there are many potential effects of observers and international actors on the quality of the electoral process before, during, and after election day, I focus here on a subset of potential effects: whether international election observers have direct effects on election day behavior. When international observers are randomly assigned for their election day observation, all other variables are held constant in expectation, and any systematic difference between the observed and unobserved areas can be causally attributed to international observers.
For democracy promoters and for academics, this study is ideal for replication. Randomization of short-term observers could be adopted as standard practice by election monitoring missions, and over time, the result would be a clearer understanding of the potential effects of observers on elections, the conditions under which they have desired effects on the quality of elections, and the most efficient design of election monitoring missions.
Random Assignment and the Effects of International Election Observers
I illustrate how experimental methods can be applied to democracy assistance with a field experimental test of whether international election observers influence the election day behavior of voters, political party representatives, or election officials, as reflected in the pattern of votes cast on election day. If international observers do not influence behavior on election day, randomly selected areas of the country should be equivalent across all observable indicators. If observers do cause a significant change in behavior on election day, the areas of the country that were randomly assigned to be observed should be significantly different from those that were not.
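The comparison described above can be sketched as a simple difference in means between observed and unobserved areas. The following Python snippet is a minimal illustration under stated assumptions, not the estimation procedure used in this article: the function name, the choice of incumbent vote share as the outcome, and the unequal-variance standard error are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def difference_in_means(treated, control):
    """Return (estimate, standard_error) for mean(treated) - mean(control).

    `treated` and `control` are lists of an outcome measured per village,
    for example the incumbent's vote share (a hypothetical choice here).
    The standard error uses the unequal-variance formula.
    """
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each group needs at least two observations")
    estimate = mean(treated) - mean(control)
    # Sample variances (n - 1 denominator), each scaled by its group size.
    standard_error = sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return estimate, standard_error
```

If observers have no effect, the estimate should be small relative to its standard error; a ratio well above two in absolute value would suggest that observed and unobserved areas differ by more than chance alone.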
For the 2004 presidential elections in Indonesia, I had the opportunity to attempt random assignment of international observers for the Carter Center's election day deployment. To my knowledge, this was the first attempt of this type within international election observation, and one of the first attempts in the more general field of democracy promotion.Footnote 30 The case of Indonesia was selected because the opportunity to attempt random assignment of international observers was made available. The introduction of randomly-assigned international observers had been met with some skepticism by other practitioners. Although election observation missions regularly use randomization to assign international observers to vote-counting centers at the end of election day as part of a parallel vote tabulation,Footnote 31 random assignment of short-term election observers during voting was thought unnecessary, logistically too difficult, or contrary to some of the other goals of election observation.Footnote 32 Random assignment of election day monitors is not standard practice, even though it carries other advantages for election monitoring missions, such as providing a defined sample of polling stations from which to draw their summary conclusions. Current practices for observer deployment vary by election and by organization, but the most common deployment method is to allow individual observer teams to choose the polling stations that they visit within a given region after they have been deployed throughout the country. This system may create biases in the observations collected by observers, as individual teams may use different methods to select where they observe.
The 2004 presidential elections in Indonesia were the first direct presidential elections in the country's history. Legislative elections held in 1999 and in April of 2004 were widely considered successful, given the country's size and its newly democratizing status.Footnote 33 Prior to these elections, the president was selected indirectly. The incumbent in the 2004 elections, Megawati Sukarnoputri, had been in office since her 2001 appointment by the People's Consultative Assembly. There were two rounds of the 2004 presidential election; this article focuses on the second-round runoff between the incumbent candidate Megawati Sukarnoputri (commonly referred to as Megawati or Mega) and the leading challenger, Susilo Bambang Yudhoyono (commonly referred to as SBY).
Expectations were high leading up to the 2004 elections, which were viewed as a crucial step in Indonesia's democratization.Footnote 34 Many believed that the elections were likely to go well, and the most common concerns in advance of election day pertained to logistical factors and the administration of an election in such a large and diverse country.Footnote 35 However, because of the scope of the election reforms leading to the 2004 elections and the recent transition to democratic institutions, some observers worried that the election could deteriorate into violence or fraud.Footnote 36
Prior to the election, there were reports of “money politics” and other forms of intimidation, complaints related to restrictions on domestic election observers, as well as violations of laws restricting campaign activity.Footnote 37 Despite these complaints, the overall environment leading up to the presidential elections was guardedly optimistic, and observers hoped that the election would be carried out peacefully. Thus, in the case of Indonesia in 2004, the anticipated effect that international observers could have on election day behavior was moderated by the expectation that the election would be relatively clean.
An election with clear-cut cases of blatant election-day fraud would have made a more straightforward baseline study of whether election observers improve election quality. Theoretically speaking, Indonesia was a more complicated case. Although many experts in Indonesian politics were anxious in advance of the election, blatant election-day fraud, such as ballot box stuffing, was not expected.Footnote 38 In countries that experience widespread election-day manipulation, the party of the incumbent government is frequently the primary sponsor, and other research has shown that observers can deter blatant election-day manipulation.Footnote 39 However, in Indonesia's 2004 election, the incumbent had never stood for a direct presidential election, and did not have a reputation for carrying out widespread election-day manipulation. Additionally, going into the second-round runoff, Megawati had already lost the first round of the election to SBY, and was not expected to win. Thus, in designing this study, it was not clear in advance of the election which candidate, if any, would be more likely to benefit from the presence of observers.
Logistics of Implementation
In part due to the size of the country and the application of randomized field experimental methods to untested circumstances, there were five logistical challenges in the implementation of this experiment, all of which it was ultimately possible to mitigate. I detail them here to make evaluation of the experiment more transparent and to facilitate future applications of field experimental methods to democracy promotion programs. Additionally, the discussion of logistical challenges illustrates that the conditions for this experiment were less than ideal, yet the study was still conducted successfully. This is a point that is often underemphasized when students and practitioners are trained in field experimental methods. As in many other areas of teaching methods of research or evaluation, the emphasis tends to be on the ideal conditions for applying the methodology rather than how to address potential deviations from that ideal. This experiment illustrates that it is possible to carry out successful randomized evaluations of democracy-promotion programs under somewhat adverse conditions while highlighting the risks and limitations inherent in making such tradeoffs.
Indonesia is one of the largest and most geographically diverse election-holding countries in the world, and in terms of votes cast, the presidential election was the largest single-day election ever held in the world.Footnote 40 Muhammad Qodari describes the immensity of the challenge for the Indonesian election commission, or KPU, leading into the three elections in 2004:
The KPU had to print and distribute not only a unique identify card for each of [the] nearly 150 million voters, but also 660 million ballot papers … For those ballots to be of any use the KPU had to provide for the acquisition, transport, positioning, handling, and supervision in 32 provinces and 440 districts of 580,000 voting stations, 2.9 million voting booths, 2.3 million ballot boxes, and 1.2 million bottles of ink with which officials could mark voters' fingers in order to prevent multiple voting. Consider that all this had to be accomplished in a developing, still somewhat infrastructure-poor country that consists of more than 12,000 islands spread across … about 2% of the Earth's entire surface area—and one gets some sense of the awesome challenges involved.Footnote 41
In a much smaller country with somewhat better infrastructure and fewer islands, the ideal experimental design might have randomized the assignment of observers across the entire country. The first logistical challenge in the Indonesian case was that many areas of the country were not accessible to international observers on election day, and therefore random assignment could not be attempted across the entire population of polling stations. This issue was further exacerbated by the limited number of Carter Center observers participating in the randomization. Rather than being carried out across the entire population, randomization was conducted within a significantly smaller group of pre-selected districts, or “blocks,” as described below.
The second logistical challenge was that there was no complete list of polling stations available from the central government. This problem was addressed by randomizing at the village level. Within Indonesia, there are five levels of administrative divisions pertaining to elections. In addition to the provinces (propinsi) and districts (kabupaten or kota) listed in the above quote, the districts for the September 2004 runoff election were divided into 4,987 sub-districts (kecamatan), the sub-districts were divided into approximately 60,000 villages or neighborhoods (kelurahan or desa), and the villages and neighborhoods were divided into approximately 580,000 polling stations (TPS). Under some conditions it would be preferable to randomly assign observers to polling stations within each district where international observers were sent. However, in the case of Indonesia, this polling-station-level assignment was not ideal for several theoretical and logistical reasons. Even if there had been a complete list of polling stations, observers would have had a difficult time locating them. Many were set up outdoors at locations without physical addresses, such as community badminton courts, the middle of streets, sidewalks, and empty lots or fields. Additionally, many polling stations were set up adjacent to each other, particularly in urban areas, so it would have been difficult for observers to visit one polling station without making their appearance known to adjacent polling stations. The issue is further complicated because data are not systematically available on the adjacency of polling stations.
The best option was to randomly assign observers across the next largest administrative divisions: kelurahan and desa. These administrative divisions can be understood as villages in non-urban areas or neighborhoods within cities. Within the areas included in this study, they can be as small as a few dozen voters with just one polling station, or as large as 60,000 voters.Footnote 42 Most villages/neighborhoods are identifiable on a local map, making it possible for the observer teams to find them. This made random assignment across units logistically possible, both because a complete list of villages and neighborhoods existed, and because international observer teams had a reasonable chance of being able to identify and locate the treatment units on election day.
The third logistical challenge was that it was unclear prior to the election whether any disaggregated election results would be made available. Had the government failed to release disaggregated election results, the lack of data would have prevented most tests of whether observers influenced election-day behavior. This issue could not be resolved in advance, but the government indicated prior to the election that it would release these results, and ultimately did so. Although the public release of disaggregated election results has become more common, many governments post the results only for a limited period of time and in a format that is difficult to capture, which could be a relevant issue for future studies.
Fourth, because election law mandated that each polling station have no more than 300 voters, a full-length election day was deemed unnecessary, and polling stations were open only from 7:00 a.m. to 1:00 p.m. for the presidential election, significantly limiting the number of polling stations that an observer team could visit on election day. This challenge did not prevent the experiment from taking place, but it reduced the number of villages visited by observers and the overall size of the experiment.
Finally, because it was the first time that random assignment of international election observers had been attempted (to my knowledge), many of the challenges in applying this methodology to election observation had yet to be worked out and agreed upon by the interested parties, and were negotiated in a very short time frame. Although this explains some of the decision-making timeline, the cooperation and flexibility of the Carter Center staff and delegation made the project feasible.
Background and Experimental Design
The Carter Center's mission for the second round of the election consisted of 57 observers and 28 observer teams, 23 of which were asked to participate in the study. Ultimately, missing election results in some regions and one team's decision not to follow the experimental protocol reduced the number of teams in the study to 19.Footnote 43 The long-term election observers and the Jakarta-based staff of the Carter Center selected areas of Indonesia (primarily kabupaten and kota, or districts and cities) where the Carter Center would send election observers. This selection of districts was intended to place Carter Center observers throughout the country, but was constrained by logistical and safety issues.Footnote 44 In order for an area to be selected, it had to be accessible by car or aircraft within one day's travel time, and had to have basic accommodations for the observer team that were judged as sufficiently safe.Footnote 45 There was also some effort made to avoid extensive overlap with the European Union election observation mission (the largest observer mission in the country), as well as consideration for whether access was granted to areas of Indonesia where foreigners are frequently prohibited from traveling such as Banda Aceh, Ambon, and parts of Papua. For the participating teams, random assignment was applied within each district or pair of districts (kabupaten and kota) where Carter Center observers were deployed.
Each team's list of villages and neighborhoods was generated from a complete list of villages and neighborhoods within each pre-selected geographic area using systematic random sampling (also known as patterned sampling).Footnote 46 Randomization requires that every unit within a given block has an equal probability of being selected. Once the random assignment was conducted for each pair of observers, the lists were not released to anyone outside of the Carter Center staff and the observers assigned to each area.
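As an illustration of how systematic random sampling can give every unit in a block the same selection probability, the following Python sketch draws a random starting point and then steps through an ordered list at a fixed interval. It is a hedged example only: the function name, the fractional-interval variant, and the unit labels are assumptions, and the exact procedure applied to the Carter Center lists is not reproduced here.

```python
import random


def systematic_sample(units, n_select, rng):
    """Pick n_select units from an ordered list by systematic sampling.

    A random start is drawn within the first sampling interval, and every
    subsequent pick lies one interval further along the list. With a
    fractional interval of len(units) / n_select, each unit has the same
    inclusion probability, n_select / len(units).
    """
    if not 0 < n_select <= len(units):
        raise ValueError("n_select must be between 1 and len(units)")
    interval = len(units) / n_select
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for i in range(n_select):
        # Clamp guards against floating-point rounding at the list's end.
        index = min(int(start + i * interval), len(units) - 1)
        picks.append(units[index])
    return picks


# Hypothetical block of 100 villages from which 10 are assigned to be observed.
block = [f"village_{i:03d}" for i in range(100)]
treatment_villages = systematic_sample(block, 10, random.Random(2004))
```

Passing an explicitly seeded `random.Random` instance makes the draw reproducible, which may help when the assignment lists need to be audited after the election.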
The unit of analysis in this study is the village/neighborhood. However, note that each village or neighborhood contained one or more polling stations. Upon arriving at randomly selected locations, observers visited between one and four polling stations within each village or neighborhood. Within a given neighborhood or village, they were necessarily limited to those polling stations that they could locate. Most teams were able to spend time the day before the election scouting the area and looking for signs that polling stations were being built. They were instructed not to choose polling stations based on any substantive characteristics such as the number of voters, complaints about the polling station, known popularity of one candidate, or recommendations from local officials or police. Rather, if a team went to more than one polling station in a given village, they were instructed to go to every third or fifth polling station that they could locate.
The random assignment of observers to villages or neighborhoods within each block generates two groups: the treatment group (which was assigned to be observed) and the control group (which was assigned not to be observed). In theory, the randomization should produce two groups that are equivalent except that one group was assigned to be “treated” with international election observation. Although it is unlikely, it is possible that randomization produces groups of villages/neighborhoods that are different in important ways, and could potentially generate misleading results. Therefore, I also check the degree to which the two groups are similar based on a variable that is available for all villages/neighborhoods, but that could not have been affected by the presence of election day observers. It should therefore not be significantly different between groups. When abundant data about the experimental population is available, such a randomization check is straightforward. Because this was the first direct presidential election in Indonesian history, and the first election for which I have been able to collect disaggregated election results, little historical precedent or data exist. Nevertheless, because voter registration took place before election day, the average number of registered voters between the treatment and control villages can be used as a variable for the randomization check, as there is no reason that this variable should be significantly different between the treatment and control groups. Table 1 presents the results of the randomization check. Across all 19 blocks, assignment to the treatment group is not consistently or significantly related to the number of registered voters, as expected. 
When all blocks are pooled together, as shown in the last column of table 1, assignment to the treatment group is unrelated to the number of registered voters, indicating that there is no significant difference in the average number of registered voters in the treatment and control groups.
Table 1 Logistic regression of assigned-to-treatment group on registered voters

Notes: Model 21 includes dummy variables for each block (not reported). Standard errors in parentheses.
* significant at 5%,
** significant at 1%.
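The logic of a randomization check can be sketched in a few lines. Note that this sketch swaps in a simpler technique than the logistic regression reported in table 1: it uses a permutation test on the difference in mean registered voters between assigned and non-assigned villages. The data, the function name `permutation_balance_check`, and the number of permutations are hypothetical.

```python
import random
from statistics import mean

def permutation_balance_check(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # re-assign labels at random
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical registered-voter counts for one block (not data from the study)
registered_treatment = [2100, 1850, 2400, 1990, 2250, 1700]
registered_control = [2050, 1900, 2350, 2010, 2200, 1780]

gap, p_value = permutation_balance_check(registered_treatment, registered_control)
```

A large p-value is what successful randomization would be expected to produce: the pre-treatment covariate does not differ between groups by more than chance allows.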
Table 2 summarizes the areas observed by Carter Center observers at the village level. Out of all villages in the visited regions, Carter Center observers were assigned to visit 482 villages, 95 of which were actually visited.Footnote 47 The so-called “failure to treat,” or the fact that not all assigned units were actually monitored, is common in field experiments, and is discussed in greater detail later.Footnote 48 Within these 95 assigned and visited villages, 147 individual polling stations were visited. Note that a small proportion of villages in the control group were also visited.Footnote 49
Table 2 Carter Center observation coverage

Data and Results
In the second round of the 2004 presidential elections, Susilo Bambang Yudhoyono (SBY) and his running mate Jusuf Kalla were the leading candidates, having won 34 percent of the votes cast in the first round in a five-candidate field. The incumbent president, Megawati Sukarnoputri, won 27 percent in the first round. The runoff was held on September 20, 2004. According to official results, SBY won the presidency with 60.6 percent of the vote.
Government-reported unofficial election results were recorded for the total number of votes cast for each candidate for all villages in the second round of the 2004 presidential election and the total number of registered voters.Footnote 50 The unofficial results were made public by the Indonesian KPU (the general elections commission) for most of the country. These aggregate results were uploaded by regional election officials to a central government-run website, and should be viewed as “unofficial” or uncertified government-provided election results.
Table 3 presents aggregate summary statistics for the 1,822 village-level observations included in the study. I downloaded, compiled, and merged these unofficial election results with international observer data. All comparisons include only districts where Carter Center observers were deployed, where they participated in the randomization, and where village-level election results were reported for the entire district.
Table 3 Summary statistics for all available village-level variables

As mentioned above, Carter Center observation teams did not visit all villages that were randomly assigned to the treatment group, leading to some “failure to treat.” This issue was anticipated to some degree, and is common in other similar field experiments when it is difficult to ensure high levels of compliance in the field, or where many relevant variables such as travel time or the precise location of treatment units are difficult to collect in advance of the treatment. Despite the use of the word failure, failure to treat does not threaten the validity of most field experiments, although it requires careful attention. It would be a mistake to simply compare the subset of villages actually visited by international observers with those that were not. This comparison may yield biased estimates. Moving untreated villages from the treatment group into the control group makes the study observational rather than experimental, and takes away the central advantages of the randomization. To further clarify this point, in Indonesia, it is plausible that some villages were more difficult for observers to locate than others and that this “findability” determined which villages in the treatment group were actually visited. It is possible that “findability” is also related to voting behavior or support for particular candidates. Therefore, it cannot be assumed that the determination of which villages were actually monitored was also random.
The most straightforward method of analyzing the results from the experiment is to compare the villages assigned to the treatment group to the villages assigned to the control group within each geographic area across which observers were assigned. In experimental jargon, this estimate is the “intent-to-treat” effect (or ITT effect) on the dependent variable within each block. There are several other methods that could be used to estimate the effect of observers, given randomization within blocks and variation in the treatment rates across blocks. As the central dependent variable of interest, I use the natural log of the total number of votes cast for Megawati in each of the 1,822 villages and neighborhoods included in the study. It is not possible to observe what would have happened in the villages visited by observers if they were not, in fact, monitored. Instead, randomization allows a comparison of two groups of villages that should be alike in all ways except that one group was assigned to be visited by international observers.Footnote 51
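The within-block comparison described above can be sketched as a difference in group means. The records below are hypothetical, as is the function name `itt_by_block`; the sketch assumes the outcome is the natural log of votes for the incumbent and that villages are compared by assignment, not by whether they were actually visited.

```python
from collections import defaultdict
from math import log
from statistics import mean

# Hypothetical records: (block, assigned_to_treatment, votes_for_incumbent)
villages = [
    ("A", 1, 1500), ("A", 1, 1320), ("A", 0, 1400), ("A", 0, 1250),
    ("B", 1, 900), ("B", 1, 1100), ("B", 0, 950), ("B", 0, 870),
]

def itt_by_block(rows):
    """Difference in mean ln(votes), assigned minus non-assigned, per block."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for block, assigned, votes in rows:
        groups[block][assigned].append(log(votes))
    return {b: mean(g[1]) - mean(g[0]) for b, g in groups.items()}

itt = itt_by_block(villages)
```

Grouping by assignment keeps unvisited treatment-group villages in the treatment group, which is what preserves the benefits of randomization.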
The random assignment of units means that it is possible to estimate the average ITT effect without accounting for any other observed differences between villages. Regression allows the inclusion of covariates and serves to reduce the unexplained variance in votes cast for Megawati. I calculate the ITT effect using ordinary least squares (OLS) regression. To restate, the central dependent variable is the performance of the incumbent candidate, measured as the natural log of the total number of votes cast for Megawati in each village. An additional independent variable measuring the total number of registered voters in the village (logged) is included in the model.Footnote 52
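A regression of this general form can be sketched with the standard library alone. Everything below is an assumption made for illustration: the data are constructed without noise from a known treatment coefficient of 0.065 so that the estimate can be checked, the block effects are made up, and the helper names `solve` and `ols` are hypothetical. The specification assumed here is ln(votes) regressed on treatment assignment, ln(registered voters), and one dummy per block.

```python
from math import log

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical villages: (block, assigned_to_treatment, registered_voters)
villages = [
    ("A", 1, 2000), ("A", 0, 2500), ("A", 1, 1800), ("A", 0, 2200),
    ("B", 1, 1200), ("B", 0, 1000), ("B", 0, 1500), ("B", 1, 900),
]
blocks = sorted({b for b, _, _ in villages})
block_intercept = {"A": 0.5, "B": 0.3}                  # made-up block effects

# Noise-free outcome with a known treatment coefficient; real data would include an error term.
ln_votes = [block_intercept[b] + 0.065 * z + 0.9 * log(reg) for b, z, reg in villages]

# Columns: treatment assignment, ln(registered voters), one dummy per block (no separate intercept)
X = [[z, log(reg)] + [1.0 if b == name else 0.0 for name in blocks] for b, z, reg in villages]
beta = ols(X, ln_votes)
itt_coefficient = beta[0]
```

In practice a statistics package would also report standard errors; this sketch recovers only the point estimates.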
Table 4 presents the estimated effect of being assigned to the treatment group within each regional block, and a pooled estimate across all areas included in the study. Even given the relatively low rate of assigned villages that were actually visited (as shown in table 2), assignment to the treatment group is associated with improved performance for Megawati in 15 out of the 19 blocks. The consistent direction of the effect across more than three-fourths of the blocks is unlikely to be due to chance.Footnote 53
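One simple way to gauge how unusual 15 positive estimates out of 19 would be is a sign test, sketched below. This is an illustration and not necessarily the calculation behind the claim above: it assumes the blocks are independent, that under the null hypothesis each block is equally likely to show a positive or negative estimate, and it ignores the size of each estimate. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """Probability of at least `successes` positive outcomes in `trials` draws."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

p_one_sided = binomial_upper_tail(15, 19)
```

Under these assumptions the one-sided probability is 5036/524288, or roughly 0.0096.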
Table 4 OLS: Estimated effects of intent to treat on total votes for Megawati (ln)

Note: Pooled estimate includes block fixed effects. Standard errors in parentheses.
* significant at 5%;
** significant at 1%
The last column of table 4 provides a pooled estimate with fixed effects for each experimental block. Note again that Treatment Group is a measure of the assignment of observers and their “intent” to treat the village, not the actual presence of observers on election day. Because Treatment Group is dichotomous and the dependent variable is logged, the coefficients represent the percent change in total votes cast for Megawati given that Treatment Group changes from zero to one and all else is held constant. In the estimate in Table 4, including all districts in the study, assignment to the treatment group caused a 6.5% positive change in the number of votes cast for Megawati. To put this number in context, the average number of votes cast for Megawati per village is 1,394, and assignment to the treatment group is associated with an average increase of about 91 votes for Megawati across all villages in the treatment group.Footnote 54 The same estimates were conducted on votes cast for SBY, and are included in the online appendix.Footnote 55 There is no significant relationship between observer presence and the performance of the winning candidate, SBY, indicating that the increase in votes for Megawati did not come directly at the expense of SBY. Estimates using vote share for each candidate as the dependent variable rather than votes cast produce similar results.Footnote 56
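The conversion from a coefficient on a logged outcome to a number of votes can be written out as follows. The symbols are notation introduced here for illustration: $y_{1}$ and $y_{0}$ denote votes for Megawati with and without assignment to treatment, and $\beta$ is the coefficient on Treatment Group. Reading the coefficient directly as a proportional change relies on the small-$\beta$ approximation.

```latex
\ln y_{1} - \ln y_{0} = \beta
\quad\Longrightarrow\quad
\frac{y_{1} - y_{0}}{y_{0}} = e^{\beta} - 1 \approx \beta \quad \text{for small } \beta,
\qquad
0.065 \times 1394 \approx 91 \text{ votes}.
```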
The ITT estimates of the effect of observers on Megawati's vote share are diluted by the low treatment rates, and an observer effect was detected despite the fact that many of the villages assigned to the treatment group were never observed. Yet because assignment to the treatment group made it more likely that a given village or neighborhood would be visited, it is possible to estimate the average size of the effect of observers on only those villages and neighborhoods that were actually visited.Footnote 57 This method still utilizes the random assignment of observers, and treats the actual visits by observers to a village as a function of their assignment to the treatment group. The full table is reported in the online appendix. As in the estimates presented in table 4, total registered voters (logged) is included as an independent variable. Across all geographic blocks, after accounting for the low treatment rates, the estimated effect of observers on the internationally observed villages is a 32 percent increase in votes cast for Megawati, which translates into an average increase of 446 votes per treated village.Footnote 58
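The idea of scaling the ITT effect by the treatment rate can be sketched with a simple ratio estimator. This is a simplified stand-in for the estimates described above, since it omits covariates and block effects. The records and the function name `wald_estimate` are hypothetical. The sketch divides the difference in mean outcomes by the difference in the share of villages actually visited, with both differences taken between the assigned and non-assigned groups.

```python
from statistics import mean

def wald_estimate(rows):
    """rows: (assigned, visited, outcome). Returns (itt, visit_rate_gap, effect_on_visited)."""
    y_assigned = mean(y for z, d, y in rows if z == 1)
    y_control = mean(y for z, d, y in rows if z == 0)
    d_assigned = mean(d for z, d, y in rows if z == 1)   # share of assigned villages visited
    d_control = mean(d for z, d, y in rows if z == 0)    # share of control villages visited
    itt = y_assigned - y_control
    gap = d_assigned - d_control
    if gap == 0:
        raise ValueError("assignment did not change the visit rate")
    return itt, gap, itt / gap

# Hypothetical villages: (assigned, visited, ln_votes); one in five assigned villages is visited
rows = [
    (1, 1, 7.5), (1, 0, 7.2), (1, 0, 7.2), (1, 0, 7.2), (1, 0, 7.2),
    (0, 0, 7.2), (0, 0, 7.2), (0, 0, 7.2), (0, 0, 7.2), (0, 0, 7.2),
]
itt, visit_rate_gap, effect_on_visited = wald_estimate(rows)
```

With a low visit rate, a modest ITT effect implies a much larger effect on the villages that were actually visited, which mirrors the gap between the two estimates reported in the text.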
Discussion
Overall, the results of this field experiment show that the incumbent candidate performed better and the challenger performed about the same in villages and neighborhoods assigned to be monitored by Carter Center observers. This result was not anticipated, and highlights a central advantage of using field experimental methods: the possibility that they can reveal effects that are not anticipated by scholars or practitioners.Footnote 59 Such a surprising result nevertheless requires some speculative explanation and analysis of the unique circumstances surrounding this election. Why might the presence of observers increase votes cast for Megawati, but not decrease votes cast for SBY? Why did observers influence what was widely viewed as a democratic election?
The reports of international observers, journalists, and analysts suggest several possible explanations. Although all major international observer organizations judged the observed problems with the election to be insignificant, a number of irregularities were documented and described in the post-election reports of international observers. The most plausible explanation for this finding stems from the early closing of polling stations. The official election day was from 7:00 a.m. to 1:00 p.m., but after the first round of the presidential election, the KPU ruled that polling stations could close after 11:30 provided that all eligible voters had voted. If this rule was followed correctly, it should not have produced significant problems, and only those polling stations that reached 100 percent turnout should have closed early. Reports suggest, however, that a number of polling stations closed before all eligible voters had cast a ballot, and well before the earliest legal closing time of 11:30.Footnote 60
The presence of observers could have influenced the decision by election officials to close early by making it more likely that polling stations in visited areas would stay open until the mandated time. Additionally, during the course of their observation, many Carter Center observers announced or implied that they could return later in the day to observe the closing. If Megawati supporters were less likely to turn out to vote without being mobilized to do so by party representatives or election officials, correctly following the regulations surrounding the length of election day would have disproportionately benefited Megawati voters. Local party officials would have more time to mobilize voters, and poll-workers would have had greater incentive to prove that all voters had cast a ballot so that they could close early without violating electoral regulations. One potential explanation is therefore that non-observed villages were more likely to close before less motivated or reluctant voters had shown up, and were less likely to follow the electoral regulations about staying open until 1:00 p.m. or until all registered voters had cast a ballot.
Several additional pieces of evidence support the possibility that Megawati supporters were more reluctant to turn out and also suggest that she was not in control of the party or state machinery that would have been required to engage in widespread election-day fraud. First, her party performed poorly in the April legislative elections and in the first-round presidential elections. Second, in the weeks leading up to the run-off election, it was widely speculated in the media that she would lose, with public opinion polls from several organizations predicting support for SBY at about 60 percent and support for Megawati at around 29 percent.Footnote 61 Third, although Megawati had some incumbency advantages, including the ability to make public appearances throughout the country outside of the legal three-day campaign period, her support from several prominent parties was unstable. For example, Megawati was endorsed by the powerful Golkar party, which won the April 5 legislative elections, and which possessed well developed local party machinery that could have been used to mobilize the vote for Megawati. But several weeks before the election, national and local party leaders publicly split over the decision to endorse Megawati, and before the election analysts predicted that “Golkar will not be able to fully bring its formidable party machinery behind Megawati.”Footnote 62 Post-election polling revealed that the vast majority of Golkar voters who cast a ballot voted against their party's endorsement and for SBY.Footnote 63 Relative to incumbent presidential candidates in other countries, Megawati's election-day advantage was minimal.
If Megawati supporters were reluctant to turn out, she should perform better in those areas where turnout was higher. Scatter plots of votes cast for Megawati vs. turnout across all 1,822 villages included in the experiment (shown in the online appendix) illustrate that Megawati does somewhat better in villages with higher turnout and SBY does worse, on average, in villages with higher turnout. These comparisons do not prove that increasing turnout would have necessarily increased votes for Megawati, but they are consistent with the idea that Megawati's supporters were more reluctant to turn out, and that her performance would have increased if voter mobilization increased.
There are other less plausible potential explanations for Megawati's increased support within monitored villages. Reports from the Carter Center and the EU missions highlight numerous complaints of “money politics,” including vote buying and the inappropriate use of government resources to support particular candidates. Few of these complaints were documented directly by observers, and there is little to suggest that vote-buying on election day was occurring. Of course, successful bribery and intimidation may be invisible to all but the participants. If intimidation was taking place in the second round in favor of SBY, it is possible that voters in monitored villages felt more confident in voting for Megawati when observers were present, although I have found little anecdotal support for this scenario.
Could observers from the Carter Center have caused extra support for Megawati? This explanation is similarly unlikely. Recall that international observers were mobile throughout election day, traveling from polling station to polling station. Their presence is also not pre-announced, and their deployment plans are confidential until they arrive in a village, neighborhood, or specific polling station. It is technically possible that those who were not inclined to vote on their own were drawn to the polling station because foreign observers visited. Thus, it remains a possibility that a visit by observers attracted additional voters to the polls, but it is not clear why this might have disproportionately influenced Megawati supporters.
The results presented here show a clear difference between observed and unobserved villages, but they are subject to interpretation. The most likely explanation for this finding, in my view, is that observers made polling station officials more likely to follow electoral regulations, and therefore caused visited polling stations to stay open later than they would have if observers had not visited. Given that the election was expected to be relatively free of election day irregularities, the fact that any significant effect of observers was found is noteworthy. This result does not imply election fraud. If widespread election fraud by one candidate had taken place, and this fraud were deterred by observers, the cheating candidate should have performed worse in areas that were observed. Even though Megawati benefited from observers, the results do not show that SBY performed significantly worse when observers were present, as would be expected if observers reduced ballot box stuffing or other forms of direct election fraud. Rather, I argue that election officials were more likely to follow the letter of the election law pertaining to closing time after having been visited by international observers.
The Carter Center mission concluded that “voters were able to exercise their democratic rights in a peaceful atmosphere and without significant hindrance.”Footnote 64 The results presented here do not contradict this conclusion. Even so, I demonstrate that international observers had measurable effects on election day behavior, causing localized improvement in the performance of the losing incumbent presidential candidate. This specific result is somewhat idiosyncratic, but the fact that it was unanticipated highlights one of the central advantages of field experiments: they allow researchers to uncover effects of interventions even if they are not anticipated. More generally, the study illustrates the potential use of field experimental methods to evaluate the effects—anticipated or not—of democracy assistance programs such as international election monitoring.
Conclusion
There is a great need for increased learning about the causal effects of a range of democracy promotion programs. Working collaboratively, scholars and practitioners can use experimental methods to confirm the short- and long-term effects of existing programs, uncover unanticipated effects, refine existing programs over time, test the relative efficiency or cost-effectiveness of different methods, and evaluate new programs before they are phased in on a larger scale. The subfield of development economics has experienced a dramatic increase in the use of experimentation, and scholars and policy-makers in some parts of the field are now working together in long-term cooperative relationships aimed at what Abhijit V. Banerjee and Esther Duflo call an “iterated process of policy learning,” whereby field experimentation is employed as a recurrent element of program evaluation and the lessons learned from previous studies help inform future policy making and the design of additional field experiments.Footnote 65 In this model, field experiments are not a one-shot activity, but are built into a long-term plan to understand the conditions under which various programs are effective. A similar model of applied social science is relevant to the democracy promotion field. International organizations, NGOs, and individual states that consistently engage in democracy promotion have the incentive and the opportunity to incorporate field experimentation to pilot “test” their new programs and incorporate ongoing evaluation into their existing programs.
There are scores of such international democracy promoters, and a partial list includes the European Union, the Organization for Security and Cooperation in Europe's Office for Democratic Institutions and Human Rights, the National Democratic Institute, the International Republican Institute, the United States Agency for International Development, the United Kingdom's Department for International Development (DFID), the Millennium Challenge Corporation, the Asia Foundation, the United Nations, and the Organization of American States. There are also hundreds of within-country pro-democracy organizations, such as those that organize domestic election monitoring missions. A number of these organizations have already expressed interest in or begun to incorporate field experimentation into their work.
Many have criticized field experimentation in general, insisting that experiments are unethical and interventionist, that they are unlikely to answer interesting and important questions, and that such efforts are likely to devolve into “mere” program evaluation.Footnote 66 However, democracy promotion is at least one issue area that has enormous potential for mutually beneficial learning and cooperation between academics and practitioners. In response to a similar debate over the utility of field experiments in development economics, Banerjee and Duflo defend their use of experiments and their relationship to policy-making:
To be interesting, experiments need to be ambitious, and need to be informed by theory. This is also, conveniently, where they are likely to be the most useful for policymakers. Our view is that economists' insights can and should guide policy-making … They are sometimes well placed to propose or identify programs that are likely to make big differences. Perhaps even more importantly, they are often in a position to midwife the process of policy discovery, based on the interplay of theory and experimental research.Footnote 67
I have presented a very optimistic view of the potential use of field experiments in democracy promotion. Field experiments provide an opportunity for increased cooperation between policymakers and academics. Such mutually beneficial cooperation has long been a goal of many individuals in the field. Scholars of field experimental methods can provide a clearly defined area of expertise that is not currently abundant among democracy-promoting organizations. To the extent that the specific substantive areas being tested are also interesting to scholars, many academics will be willing to trade their labor and expertise for access to data and permission to publish the findings. Combined with existing research methods, a long-term cooperative relationship would ideally play a central role in revealing the conditions under which various democracy-promotion programs produce their intended effects, identifying which types of democracy promotion are most efficient, and analyzing the conditions under which specific programs are most likely to have positive or negative effects, and whether such interventions have unintended (but potentially positive) consequences. For scholars and practitioners interested in pursuing field experiments in democracy promotion or related areas, a number of other excellent resources are already available.Footnote 68 Indeed, The Annals of the American Academy of Political and Social Science recently published a special issue on field experiments in comparative politics and policy, edited by Donald P. Green and Peter John, which contains a number of relevant essays.Footnote 69
At minimum, I have sought to make clear how random assignment of international election observers can be used to study whether and how international actors influence electoral behavior, and how the knowledge gained through such studies can generate better understanding of democracy-promotion efforts. In the case of the 2004 presidential election in Indonesia, the evidence suggests that on average, the presence of observers caused an increase in total votes cast for the incumbent, Megawati Sukarnoputri, who went on to lose the election and peacefully transfer power to her competitor. These results suggest that even in a relatively clean election, observers can change election-day behavior in a manner that can disproportionately benefit some candidates, and, more importantly, demonstrate that observers can have unanticipated effects on election-day behavior. This experiment and others like it are ideal for replication in other settings, and similar field experimental methods should be applied to advance our understanding of the effects of election observation and of other democracy promotion activities. Although the payoffs from such efforts are far from certain, and debates over the value of field experiments will certainly continue, the potential benefits for both theoretical and practical understanding are enormous.
Supplementary Materials
Explanatory File http://journals.cambridge.org/pps2010018
Estimates Conducted on Votes Cast for SBY http://journals.cambridge.org/pps2010019