1. Introduction
Transitional justice encompasses criminal prosecutions, reparations, institutional reform, truth commissions, commissions of inquiry, and memorialization, among other measures. Each of these processes endeavors to determine accurately what happened in the past. Respect for the memory of victims, allocation of proportional responsibility among perpetrators, and determination of legal responsibility all depend upon an accurate historical memory built on empirical evidence. Quantitative analysis of the patterns and magnitude of past violence is a small but critical piece of this process. Statistical evidence can contribute to this process, as it did, for example, in the 2013 trial of General José Efraín Ríos Montt. Statistical analyses indicated that members of the indigenous population were eight times more likely to be killed by the army than their non-indigenous neighbors. The judges found this to be compelling evidence consistent with the charge of genocide. Perhaps most importantly, the judges noted that the statistical evidence confirmed, “in numerical form, what the victims said.”Footnote 1
In the best of cases, this is precisely what quantitative analyses can offer: both confirmation and generalization of individual experiences. John Hagan, Heather Schoenfeld, and Alberto Palloni describe how the establishment of casualty counts can contribute specifically in the context of international criminal law: “The structure of international criminal law ... holds to the dictum ‘no body, no crime.’ This presents significant problems for lawyers investigating or prosecuting heads of state for crimes against humanity that can be hidden behind the doctrine of state autonomy. To establish legal responsibility, either bodies must be uncovered from mass graves and identified, as was done in Srebrenica, or the number of deaths must be otherwise convincingly established.”Footnote 2 Later in that same piece the authors describe Ball’s presentation of statistical analyses and resultsFootnote 3 to the International Criminal Tribunal for the Former Yugoslavia (ICTY) and optimistically conclude that “[t]his testimony is likely to play a significant role in the judicial panel’s decision about the Kosovo phase of the Milošević case.”Footnote 4
For statistical evidence to contribute to historical clarification, the statistics have to be right. Relying solely on what was observed to draw quantitative comparisons and conclusions is insufficient. This paper focuses on the role statistical analyses of patterns of fatal violence can play in transitional justice mechanisms, and the limitations and potential pitfalls that exist when such analyses are unsupported by the available data.
The remaining sections of this paper outline the kinds of quantitative comparisons that are frequently of interest in transitional justice, explain why most unadjusted observed data are insufficient for these kinds of comparisons, and briefly introduce how to adjust for the limitations of observed data and conduct appropriate analyses using the kinds of data sources typically available to researchers and advocates in a transitional justice setting.
2. Using Observed Data to Draw Conclusions About Unobserved Data
Emerging technology has provided new ways to record and publicize observed human rights violations.Footnote 5 But we remain limited by what is observable, and many human rights violations are either unobserved or unrecorded. An individual shot in the woods may leave behind only the perpetrator as witness. A child who escaped the massacre of her village may be too traumatized and fearful to be able to talk about what she witnessed. Communities living in remote areas may not be reached by documentation efforts. People of a marginalized ethnicity may not trust journalists or even human rights activists from other ethnicities. There are countless other scenarios in which there is no record of a homicide.
Yet we must do our best to account for these missing stories if we plan to use quantitative analyses as inputs to transitional justice mechanisms. As noted above, statistical analyses can contribute evidence to transitional justice processes. When we base these analyses solely on unadjusted observed data, we are implicitly assuming that any violations we did not observe are the same as the violations we did observe.
To be more explicit, using only observed data makes a strong but often unspoken statistical assumption that either every single violation was observed and recorded, or that observed, recorded violations represent (in a statistical sense) those violations that were either unobserved or unrecorded (this is discussed in more detail below). These are both strong assumptions and are generally unmet in transitional justice (among other) settings. Using raw data as a proxy for statistical patterns is very likely to lead to misinterpretation of patterns of violations. Worse yet, by ignoring the unrecorded violations, we do a disservice to victims whose stories have not yet been told. Perversely, the worst events may leave the fewest witnesses, and consequently, these events have the lowest probability of being reported.Footnote 6
Transitional justice mechanisms may be served by asking questions such as: “Did violence increase or decrease when control of a region shifted from one armed group to another?” “Has the indigenous population experienced more violations than the non-indigenous population, consistent with patterns of ethnic targeting or genocide?” “Which armed group is responsible for the majority of violations?”Footnote 7 Answers to such questions can inform determination of legal responsibility, allocation of proportional responsibility among perpetrators, and more generally help to accurately depict the ebb and flow of conflict over time and geographic space. All of the above examples require statistical inference, that is, drawing conclusions about a population based on an observed sample of that population. Statistical inference is only appropriate if one of three conditions is true: (1) when the entire population has been observed, that is, the sample is a complete census; (2) when there is a mathematically known probability relationship between the sample and the population, usually satisfied by drawing a sample randomly; or (3) when the sample is adjusted by one of a set of post-sampling techniques, including raking and multiple systems estimation (MSE), among others. MSE will be discussed in detail in the following sections.
When we use observed, recorded violations to answer such questions, we are assuming either that observed, recorded violations are a complete set of all the violations that have occurred (i.e., that condition (1) is met), or that they are representative, in a statistical sense, of all violations (i.e., that condition (2) is met). In some rare circumstances, one of these assumptions may be true. For instance, there are examples of attempts to completely enumerate victims.Footnote 8 However, it should be noted that in each of those cases assumptions must still be made regarding whether or not every victim has been counted. Philip Verwimp (2010) and Romesh Silva and Patrick Ball (2006) each discuss this challenge directly.Footnote 9 In the case of both the Bosnian Book of the Dead and the Kosovo Memory Book, additional data sources were available to enable evaluations of completeness.Footnote 10,Footnote 11 Additionally, there are many examples of surveys,Footnote 12 which use random samples to represent the underlying population of victims. Many of these include questions used to calculate estimates of direct conflict mortality and/or excess mortality.Footnote 13,Footnote 14
Complete enumeration is very time-consuming and expensive. For example, the projects in Bosnia and Kosovo each required over a decade to complete and depended on extensive pre-conflict literacy and population registration, as well as sustained attention by research teams supported by farsighted donors. Surveys to estimate conflict mortality are technically challenging and potentially fraught with errors.Footnote 15
It is rarely feasible for transitional justice projects to attempt a complete enumeration or to conduct a survey. Consequently, many human rights researchers rely on another type of data: convenience samples. These samples include an unknown proportion of the population and have an unknown probability relationship to the population. A census is a list of every possible element in the population; a survey samples a fraction at random; convenience samples include all other kinds of data.
The key difference between random and convenience samples is the way in which the data are collected. Records in a random sample are selected via a probabilistic mechanism. Every member of the population has a known probability of being selected. When properly implemented, random selection produces samples that are representative of the population of interest. This means that the sample accurately reflects important features of the population, such as the proportion of males to females, children to adults, urban versus rural households, etc.
There are a number of ways to select a random sample: a computer can generate a series of random numbers; dice or playing cards can be used to randomly select events or individuals; or every kth house can be selected from a random starting point, for example. The important feature of random selection is that the sample does not include people based on the subjective choices of the researcher or choices by the individuals to be included or excluded in the sample. Therefore, random samples can be used to mathematically calculate the probability of selection for every person selected in the sample. This probability of selection then tells us how many elements in the population our sampled record represents. In other words, random samples are incomplete in the sense that they do not include the entire population, but they are incomplete in a predictable, measurable way—assuming the random sample was collected correctly.Footnote 16 This makes random samples appropriate for the kinds of quantitative comparisons described earlier.
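The inverse-probability logic above can be sketched in a few lines of Python. All quantities here are hypothetical and purely illustrative, not drawn from any real survey:

```python
# A minimal sketch of why random samples support inference: the design
# fixes a known selection probability, so sample counts can be scaled
# up to the population. All numbers are invented for illustration.
population_size = 10_000            # households in the population
sample_size = 500                   # households selected at random
selection_prob = sample_size / population_size   # 0.05, known by design

reported_in_sample = 37             # sampled households reporting a violation

# Each sampled household represents 1 / selection_prob households,
# so the population total is estimated by inverse-probability scaling.
estimated_total = reported_in_sample / selection_prob
print(estimated_total)              # 740.0
```

The final scaling step is only defined because the sampling design fixes the selection probability. A convenience sample has no known selection probability, so there is no principled way to perform this calculation.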
Despite the name, most convenience samples are very difficult to collect, and many are collected very systematically and rigorously. Examples of convenience samples in human rights work include testimonies to truth commissions, press reports, border crossing records maintained by officials, bureaucratic records kept by police and other security forces, SMS messages sent to an activist network, human rights non-governmental organization (NGO) reports, and messages sent via social media. Countless examples of convenience samples are the result of excellent, well-designed data collection projects conducted under incredibly difficult and harrowing circumstances. These are valuable, important projects.
Unfortunately for statistics, disciplined, systematic, meticulous data collection is not a replacement for random data selection. No matter how rigorously it is managed, human rights data from non-random samples is not representative of all the human rights violations that occur during a conflict, except by coincidence. We may be able to speculate about potential differences between the kinds of violations included and excluded in a convenience sample, but without additional data sources and appropriate statistical analyses, it is impossible to know in any rigorous way what is missing from a convenience sample.
Non-random human rights data are valuable sources of information and contain important contextual, qualitative details, but at the same time entail certain biasesFootnote 17 that make them unsuitable for generalization. For example, individuals who are aware that a truth commission has been formed and choose to tell their stories may not have had the same experiences as those who choose not to or are unable to tell their stories.Footnote 18 Events that are covered by the media may differ from events that are not deemed newsworthy but nonetheless involve the same kinds of violence.Footnote 19 An unknown subset of the population may have internet access, and even more importantly, of those who have internet or cell phone access, a different fraction may feel comfortable using such technology to tell their story.
None of the above concerns implies that these are not important sources of information. Again: all of these are valuable data collection mechanisms. However, convenience sample data does not support conclusions about patterns of violence. Conclusions based on patterns observed in convenience sample data tell us about patterns of reports of violence. But since convenience samples contain an unknown proportion of the population, and bear an unknown relationship to the population, there is no scientific or mathematical basis on which to draw quantitative conclusions from those observed reporting patterns about patterns of actual violence. When we use convenience sample data to infer that more violence occurred in this area than that area, or that this group is responsible for more violence than that group, we are discounting the portions of the population not included in the convenience sample. As a result, we run the risk of drawing the wrong conclusions, making the wrong decisions. And in transitional justice research, that has real implications for policy decisions, resource allocation, and accountability.
Notably, this limitation of observed data has long been understood within the field of criminology. Since the late nineteenth century, an extensive body of literature has examined the unobserved “dark figure” of crime and its effect on observed patterns: “Because of the partial and selective nature of the police data, comparisons based on them of variations in ‘actual crime’ over time, between places, and among components of the population, are all held to be grossly invalid.”Footnote 20,Footnote 21 Police data are a record of crime that is “known to police,” a precise example of a convenience sample. As a result, research in criminology has looked for alternative data sources and analytical methods to account and adjust for missing data. Approaches include victim surveys (which have their own challenges and limitationsFootnote 22) and MSE, the statistical method introduced in section 5.Footnote 23
When we use a single convenience sample to compare violence committed by groups A and B to conclude, for example, that more violence was committed by group A than B, we are implicitly assuming that violence committed by group A was reported at the same rate as violence committed by group B. Otherwise, differences in observed rates of violence might be an artifact of differences in rates at which violence was reported and attributed to each group. This is the challenge we encountered in our work with the Peruvian Comisión de la Verdad y Reconciliación (CVR), where one of the key questions was what proportion of the violence was perpetrated by the guerrillas of the Shining Path and what proportion was perpetrated by agents of the state. Analyses conducted by the American Association for the Advancement of Science (AAAS) used testimonies collected by the CVR and databases collected by the governmental Defensoría del Pueblo and by human rights NGOs. These analyses found that the Shining Path were identified as perpetrators in slightly less than half of the total number of testimonies collected by the CVR and in a much smaller proportion (between 5% and 16%) of the other data sources.Footnote 24 These findings suggest a different reporting rate for Shining Path and state agents across the different sources. However, without access to multiple data sources and appropriate statistical techniques, this would be impossible to detect and adequately adjust for.
Ultimately, the AAAS researchers combined the multiple data sources (using a method similar to the one described in section 4) and conducted statistical analyses (similar to the method introduced in section 5) to conclude that 46 percent of all conflict-related deaths were perpetrated by the Shining Path and 30 percent by agents of the Peruvian state.Footnote 25 It is fundamentally the task of a truth commission to tell the truth, and these statistical findings enabled the CVR to make a much clearer argument about the relative responsibility of the Shining Path and the Peruvian state for gross human rights violations.
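The reporting-rate artifact described above can be made concrete with a small numeric sketch. The counts and rates below are invented for illustration; they are not estimates from the Peruvian data:

```python
# Hypothetical illustration of the reporting-rate artifact: if one
# perpetrator's violence is documented at a higher rate, raw counts
# can reverse the true ordering of responsibility.
true_killings = {"group_A": 1000, "group_B": 2000}
reporting_rate = {"group_A": 0.6, "group_B": 0.2}

observed = {g: true_killings[g] * reporting_rate[g] for g in true_killings}
print(observed)  # {'group_A': 600.0, 'group_B': 400.0}

# Group B committed twice as much violence, yet the observed records
# suggest group A was responsible for the majority of killings.
```

Only with multiple sources (whose reporting rates differ in measurable ways) can such a distortion be detected and corrected.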
The remainder of this paper presents a case study from the ongoing conflict in Syria to highlight several sources of convenience data (in the absence of complete or randomly selected data) and introduce statistical methods necessary to use such data to draw quantitative comparisons.
3. Case Study—Syria
On the heels of the Arab Spring revolutions beginning in December 2010, armed conflict began in Syria in March 2011. What started as protests demanding the resignation of President Bashar al-Assad soon saw the Syrian Army deployed to stop the civilian uprising. Since then, violence has escalated across Syria. Amid this continuing violence and humanitarian crisis, local human rights activists and citizen journalists risk their lives to document human rights violations. The grave challenges they face are compounded by the regime’s active suppression of information flow out of the country. Updated census or other vital statistics are not available, and the current environment makes it extremely dangerous and difficult (if not impossible) to administer a survey (though some information is being collected in refugee campsFootnote 26). As a result, there is considerable uncertainty about the total number of violations and their patterns over time and location.
In early 2012, the United Nations Office for the High Commissioner for Human Rights (OHCHR) commissioned the Human Rights Data Analysis Group (HRDAG) to examine multiple convenience samples collected by Syrian NGOs relying primarily on local networks to document conflict-related deaths in Syria. Three earlier reportsFootnote 27 provide in-depth descriptions of these sources. In this example, we focus on four sources (lists of deaths) that cover the entire length of the ongoing conflict and have continued to share updated records of victims with OHCHR and HRDAG:
• the Syrian Center for Statistics and ResearchFootnote 28 (CSR-SY)
• the Syrian Network for Human RightsFootnote 29 (SNHR)
• the Syria Shuhada WebsiteFootnote 30 (SS)
• the Violations Documentation CentreFootnote 31 (VDC)
For brevity, each list will be referred to by its acronym throughout the following sections.
We conducted basic descriptive statistics looking at each of the datasets separately. As indicated in Figure 1, the distribution of recorded deaths over time looks quite similar for these documentation groups. However, note the very different magnitudes of the respective y-axes for each group.
This appearance of broad agreement across the multiple sources, when aggregated across the entire country, creates the impression that the Syrian conflict is a thoroughly well-documented conflict. And indeed it is, thanks in large part to a highly literate, technologically savvy population willing and able to document the violence occurring in their country. Yet despite this immense work, it is important not to be misled by the apparent consistency into mistakenly relying on any one of these sources to draw conclusions about patterns of violence. Expanding our comparisons to specific times and locations of interest reveals conflicting patterns in the observed data. Furthermore, comparing the observed patterns with estimates of the total deaths shows that the observed patterns can miss peaks and increases at key historical moments, and thereby present exactly the wrong picture. These patterns are the background form, the “macro-truth” that can inform transitional justice mechanisms.Footnote 32 This will be elaborated in the following examples and sections.
In Figure 2, three sources (CSR-SY, SNHR, and SS) all indicate a rise in reports of violence in Deir ez-Zor in August 2011. Records from VDC do not indicate this rise in reports of violence. This time period corresponds with reports of protests and government offensives.Footnote 33
As in the example of the Ríos Montt case in the introduction, quantitative analyses have the potential to support victim narratives. But individual convenience samples may tell conflicting narratives, since each data source captures different snapshots of the violence. During the chaos of August 2011 in Deir ez-Zor, it is entirely possible that each of these documentation groups had access to different segments of the community, were told different stories, or were only able to verify a subset of the reports they received. For quantitative analyses to clarify rather than confuse, we must build from the observed reports and use the differences in these data sources to determine a more accurate picture of what happened. The specific statistical process to achieve this will be described in a later section.
To be clear, each of the sources in Figure 2 is important, and each adds unique events not observed by the others. Our concerns about conflicting narratives are not meant to criticize any of these sources or the efforts of these documentation groups. Rather, the point is that we cannot assume that any single source is sufficient to tell the full quantitative story of violence in Syria. Aggregating sources into a single merged dataset is a step in the right direction. But this merged dataset is still susceptible to the biases present in each contributing dataset. Statistical inference must be used to adjust for these biases. This will be addressed in the following sections.
Figure 3 shows a roughly similar pattern of decreasing reports of deaths in Hama between December 2012 and March 2013, though SNHR and SS indicate a slightly contradictory pattern in February 2013. Much like Figure 1, this is precisely the situation where we might mistakenly conclude that the observed records of deaths are indicating an approximately correct, if not complete, picture of the violence. We might conclude that each source is likely slightly undercounting the number of victims, but that the overall pattern of a decrease in violence between December 2012 and March 2013 is probably accurate. We will return to this example in the following sections, as our preliminary statistical estimates of the total number of victims indicate that this apparent pattern of decreasing violence is in fact dramatically incorrect.
It is important to keep in mind that during this time period Hama was under contested control between rebel groups and the Syrian army. Rebel units were described as launching an “all-out assault on army positions across Hama” in mid-December 2012,Footnote 34 whereas by February 2013 McClatchy was describing a “wave of displacement …when the government, seeking to reverse rebel gains, began a heavy-weapons assault….”Footnote 35 This is precisely the situation where a transitional justice process is likely to involve comparative questions about patterns of violence as a way of examining perpetrator responsibility. Did violence increase or decrease as control over key regions changed hands from opposition groups to the state (and, in some regions, back again)? Similar analyses of patterns of violence were used in Kosovo to answer the question of whether refugees were more likely to be fleeing the NATO bombing campaign, actions by the Kosovo Liberation Army, or something else entirely. This analysis was presented as expert testimony to the ICTY.Footnote 36
4. Aggregating Multiple Sources
Combining multiple sources into a single convenience sample has been a popular approach in human rights work for decades; we have listed here only a few of the hundreds of projects that have used this approach. Truth commissions have incorporated external information at least since the Salvadoran Truth Commission published Anexo II as part of their 1993 report, in which they combined databases from approximately six governmental and non-governmental sources.Footnote 37 Many human rights NGOs around the world have used this technique. For example, the International Center for Human Rights Research in Guatemala (CIIDH) in the 1990s, and the Colombian Commission of Jurists (CCJ) in Colombia in the 2000s combined victim testimonies, other NGOs’ reports, and press sources, and calculated statistics from the combined database.Footnote 38 Many academic projects have combined maps, household surveys, archives, and victim testimonies.Footnote 39 Various media monitoring projects have integrated multiple publicly available sources via human or automated methods.Footnote 40
Automated (or semi-automated) procedures for identifying multiple records that refer to the same individual, potentially within the same source or across multiple sources, are an active topic of research in statistics and computer science; the task is referred to variously as record linkage, database deduplication, or matching.Footnote 41 Not only must multiple records that describe the same individual victim be identified and merged into a single, complete record, but information about which source(s) contributed the original record(s) must also be maintained. This last piece of information is key to the final step, modeling the documentation patterns.
To determine whether multiple records refer to the same individual, we begin with records with sufficiently identifiable information. For this case study, we used records that include the name of the victim, and date and location of his or her death. Additional demographic variables, such as age (or date of birth), sex, and location of birth may be used for the record linkage process. In our experience, at a minimum, a record must include a name, date, and location to be considered sufficiently identifiable for the record linkage process. Unfortunately, this means discarding a large number of records because there is no reliable way to determine if, for example, an unnamed body reported by source A in fact refers to a named victim included in source B. It is impossible to reliably match records that lack sufficient identifying information. This also highlights the importance of the final step, estimation, to account for these unidentified victims.
Determining whether multiple records refer to the same victim using semi-automated methods involves drawing many comparisons between many pairs of records.Footnote 42 The size of this problem scales rapidly with the number of initial records to consider—specifically, if we compare every possible pair of records, we must conduct n²/2 comparisons, where n is the number of records across all sources. For the Syria case study, we currently have approximately half a million records, resulting in more than a hundred billion possible comparisons. This can be reduced somewhat by comparing only those pairs within certain blocks of records (e.g., only comparing pairs of records from the same geographic area or period of time), but generally this still requires tens or hundreds of millions of comparisons. Choosing which records should be compared can be challenging.Footnote 43
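A back-of-the-envelope sketch in Python illustrates the scale of the comparison problem and the effect of blocking. The record count mirrors the approximate half-million figure above; the block structure (1,000 equal blocks) is a hypothetical simplification:

```python
# Compare the number of candidate pairs with and without blocking.
n = 500_000                          # total records across all sources
all_pairs = n * (n - 1) // 2         # every possible pair of records
print(f"{all_pairs:,}")              # 124,999,750,000 comparisons

# Blocking: only compare records sharing a key, e.g. governorate and
# month of death. Suppose records split evenly into 1,000 blocks.
blocks = 1_000
per_block = n // blocks              # 500 records per block
blocked_pairs = blocks * (per_block * (per_block - 1) // 2)
print(f"{blocked_pairs:,}")          # 124,750,000 comparisons
```

Blocking here cuts the workload by roughly a factor of 1,000, consistent with the reduction from over a hundred billion comparisons to the tens or hundreds of millions mentioned above, at the cost of missing any true matches whose records disagree on the blocking key.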
Many kinds of comparisons are then calculated for each pair. For example, some comparison metrics might include the distance between the location of death for each record, the number of days between the reported dates of death, or how phonetically similar the two names are.Footnote 44 Importantly, these are but a few examples; many comparisons are calculated for each pair. A classification model then uses these comparisons to calculate the probability that any two records refer to the same individual. A threshold is selected, and pairs of records with a match probability above this value are considered to refer to the same individual.
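A minimal sketch of such pairwise comparisons, using only the Python standard library, follows. The two records, their field names, and the choice of a generic string-similarity measure are all illustrative; production systems use many more features and richer, language-aware metrics (e.g. phonetic codes suited to Arabic names):

```python
from datetime import date
from difflib import SequenceMatcher

# Two hypothetical records of the same death, transliterated differently.
rec_a = {"name": "Ahmad Khalid", "date": date(2012, 12, 3), "loc": "Hama"}
rec_b = {"name": "Ahmed Khaled", "date": date(2012, 12, 5), "loc": "Hama"}

# Three example comparison features for this pair of records.
name_sim = SequenceMatcher(None, rec_a["name"], rec_b["name"]).ratio()
days_apart = abs((rec_a["date"] - rec_b["date"]).days)
same_loc = rec_a["loc"] == rec_b["loc"]

features = (name_sim, days_apart, same_loc)
# A classifier trained on human-labeled pairs consumes feature vectors
# like this one to estimate the probability that the records co-refer.
```

Here the names are highly similar, the dates are two days apart, and the locations match, so a trained classifier would likely score this pair well above a typical match threshold.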
Another key step in this process is human review in which a person reviews a subset of pairs and labels each pair as referring to the same individual or not. These labeled pairs are used to train the classification model. This is also an iterative process.Footnote 45 Following each run of the classification model a human will review and label another subset of pairs until the decisions made by the classification model match the decisions made by the human.Footnote 46 This makes it possible to scale the record linkage process to millions of pairs. A human cannot review that many pairs, but a human can train a computer to mimic their decision process and thus label millions of pairs.
More than simply producing a single integrated list of uniquely identified victims, this process makes it possible to start examining both overlap and reporting patterns. Figure 4 returns to the data presented in Figure 2 and looks specifically at the number of victims recorded by both SNHR and CSR-SY (the “overlap” between these two sources, the darkest grey shading) as compared to the number of victims recorded by only CSR-SY (the next lighter shade of grey) or only SNHR (the lightest shade of grey). Figure 4 shows that although CSR-SY and SNHR report comparable numbers of victims in Deir ez-Zor, each source is not necessarily reporting all of the same individual victims.
Figure 4 considers the specific overlap patterns between just two sources; Figure 5 provides another way to consider the information provided by the matching process by looking at the total number of sources reporting each victim (in this case returning to the example from Hama shown in Figure 3). The lightest section of each bar in Figure 5 indicates the number of documented deaths recorded in all four datasets. The next darkest section indicates documented deaths recorded in three out of the four datasets, followed by two out of the four, and the darkest grey section of each bar indicates the number of deaths recorded in only one of the datasets.
Figure 5 indicates a similar overall pattern of decreasing violence as seen by each individual source in Figure 3. However, note that none of the individual lines in Figure 3 match exactly the pattern in Figure 5. This is easier to see in Figure 6, which includes the total number of documented deaths identified after matching all four sources (labeled Nk). Figures 4 through 6 are each different ways to visualize the fact that each of the four sources contributes some records that are also included in one or more of the other sources and some records that are only documented in that single source.
Figures 4 and 5 visualize the key piece of information needed to model documentation patterns—the overlap patterns—that change over time. For example, many more victims were reported by two and three sources in Hama in December 2012 than in the other months. By measuring the number of victims recorded by all four sources, different combinations of three sources, different combinations of two sources, and just one source, we can model the documentation pattern itself, and use that model to estimate the victims who are recorded by zero sources. The undocumented victims are in the “dark figure,” those who are not observed by any of these four projects. The following section describes this estimation process.
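Once records are linked, the overlap patterns described above can be tabulated directly. A minimal sketch, with invented victim identifiers and source memberships:

```python
from collections import Counter

# Each linked victim carries the set of sources that documented them.
sources = ("CSR-SY", "SNHR", "SS", "VDC")
victims = {
    "v1": {"CSR-SY", "SNHR", "SS", "VDC"},
    "v2": {"SNHR", "SS"},
    "v3": {"VDC"},
    "v4": {"CSR-SY", "SNHR"},
    "v5": {"SNHR", "SS"},
}

# Encode each victim's documentation pattern as a tuple of 0/1 flags,
# one per source. MSE models these pattern counts to estimate the one
# cell that can never be observed directly: (0, 0, 0, 0).
patterns = Counter(
    tuple(int(src in seen) for src in sources) for seen in victims.values()
)
print(patterns[(0, 1, 1, 0)])  # 2 victims recorded by SNHR and SS only
```

Tables of such pattern counts, computed separately by time period and region, are the direct input to the estimation step introduced in the next section.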
5. How Do We Know What We Don’t Know?
A broad category of methods, referred to collectively as Multiple Systems Estimation (MSE),Footnote 47 uses multiple samples (convenience, random, or combinations of both) to estimate the total population, including cases that have not been documented (i.e., the dark figure), and thus provides a way to draw statistical inferences. MSE has been developed over the past century in a variety of fields, from ecologyFootnote 48,Footnote 49 to demographyFootnote 50,Footnote 51,Footnote 52,Footnote 53 to epidemiologyFootnote 54,Footnote 55,Footnote 56,Footnote 57,Footnote 58,Footnote 59 to human rights.Footnote 60,Footnote 61,Footnote 62,Footnote 63,Footnote 64,Footnote 65 All of these fields rely on MSE methods to use the observed pattern of overlaps, that is, events recorded in two or more samples, to model the underlying population.
These methods were initially developed in ecology to estimate the size of animal populations. Imagine wanting to know how many fish are in a lake, denoted N. It certainly would not be reasonable to catch and count every single fish, and it would be impossible to confirm that every fish had been caught. But it is possible to cast a net into the lake, catch some number of fish, x, tag them, and throw them back. Repeating this process the following day, we catch y fish. Of the y fish caught on the second day, some portion, z, bear the tags from the previous day. These three numbers, x, y, and z, can be used to calculate an estimate of the total number of fish in the lake: ${\rm{\hat N}} = ({\rm{x}} \cdot {\rm{y}})/{\rm{z}}$ (the “hat” on the N indicates that it is an estimate). This estimated total includes the fish caught on either day as well as those never caught; in other words, it includes both observed and unobserved members of the population.
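As a minimal sketch, the two-sample calculation can be written directly; the fish counts below are invented for illustration:

```python
def lincoln_petersen(x, y, z):
    """Two-sample capture-recapture estimate: N-hat = (x * y) / z,
    where x fish are tagged on day one, y fish are caught on day two,
    and z of those y bear day-one tags."""
    if z == 0:
        raise ValueError("No recaptures: the estimate is undefined.")
    return (x * y) / z

# Hypothetical counts: tag 100 fish, catch 80 the next day, 20 of them tagged.
n_hat = lincoln_petersen(x=100, y=80, z=20)  # estimates 400 fish in the lake
```

Intuitively, the fraction of tagged fish in the second catch (z/y) estimates the fraction of the whole population that was tagged (x/N), and solving for N gives the formula.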
This estimate, called the Lincoln-Petersen estimator (after the researchers who originally derived it), assumes exactly two samples and requires additional strong assumptions, which are typically not met in human rights (or many other) applications. Fortunately, as described above, the broad category of MSE methods has been expanded to apply to problems in a variety of fields, and this expansion includes methods appropriate for three or more sources that allow for more realistic assumptions (as well as ways to test how sensitive substantive conclusions are to potential violations of those assumptions).Footnote 66 As these methods have been applied in demography, public health, and ultimately human rights, multiple lists of individuals have replaced captured animals, and record linkage, the process of determining whether multiple records refer to the same individual, has replaced tagging. But the underlying mathematical theory remains the same: patterns of overlapping observed records can be used to estimate the size of an entire population.
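The record-linkage step, deciding whether records on different lists refer to the same individual, is in practice a substantial probabilistic-modeling task in its own right. The sketch below illustrates only the simplest exact-key version of the idea, with invented names and dates:

```python
import unicodedata

def normalize(record):
    """Crude canonical key: strip accents, lowercase the name, keep the date.
    Real record linkage uses probabilistic matching, not exact keys; this
    function is only an illustration of the underlying idea."""
    name = unicodedata.normalize("NFKD", record["name"])
    name = name.encode("ascii", "ignore").decode("ascii")
    return (name.lower().strip(), record["date"])

# Two hypothetical lists of victim records (invented for illustration).
list_a = [{"name": "Maria Pérez", "date": "2013-01-05"},
          {"name": "Omar Haddad", "date": "2013-01-07"}]
list_b = [{"name": "maria perez", "date": "2013-01-05"},
          {"name": "Layla Nasser", "date": "2013-01-09"}]

keys_a = {normalize(r) for r in list_a}
keys_b = {normalize(r) for r in list_b}
overlap = keys_a & keys_b  # records judged to refer to the same individual
```

Here the overlap plays the role of the recaptured tagged fish: individuals "caught" by both lists.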
MSE analyses provide estimates of the entire population, both observed and unobserved. In doing so, the estimates control for many of the biases present in the contributing sources. MSE estimates of the entire population are therefore appropriate for precisely the kinds of comparative analyses described in the preceding sections. With proper statistical inference that accounts for the undocumented victims, we can determine whether observed reporting patterns reflect the true pattern of violence.
Complete MSE analyses are still under development using a number of sources documenting killings in Syria, including those described in the previous sections. But our preliminary analyses,Footnote 67 much like our analyses of conflicts in other countries,Footnote 68 indicate that even in a seemingly well-documented conflict, there are acts of violence that are missed. Figure 7 builds on Figures 3 and 6 by adding the estimated total number of victims as calculated from MSE analyses. The five lines at the bottom of Figure 7 are the four individual sources (SS, SNHR, CSR-SY, and VDC, as shown in Figure 3) plus the total number of recorded victims from the matched dataset (Nk from Figure 6). The solid line in Figure 7 labeled nhat is the estimated total number of victims, both observed and unobserved, based on MSE analyses of the underlying documentation patterns. The grey shading around the solid line represents the 95 percent bootstrapped confidence interval around the estimate.
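The bootstrapped interval in Figure 7 comes from far richer models than the two-sample case, but the basic idea can be sketched with a percentile bootstrap of the simple two-sample estimator. The capture-history counts below are hypothetical:

```python
import random

def lincoln_petersen(x, y, z):
    """Two-sample capture-recapture estimate: N-hat = (x * y) / z."""
    return (x * y) / z

def bootstrap_ci(histories, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the two-list estimate.
    `histories` holds one capture history per observed individual:
    (1, 0) = list A only, (0, 1) = list B only, (1, 1) = both lists."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(histories) for _ in histories]
        x = sum(h[0] for h in resample)              # total seen in list A
        y = sum(h[1] for h in resample)              # total seen in list B
        z = sum(1 for h in resample if h == (1, 1))  # seen in both lists
        if z > 0:  # skip degenerate resamples with no overlap
            estimates.append(lincoln_petersen(x, y, z))
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi

# Hypothetical data: 60 individuals seen only by A, 40 only by B, 20 by both,
# giving x = 80, y = 60, z = 20 and a point estimate of 240.
histories = [(1, 0)] * 60 + [(0, 1)] * 40 + [(1, 1)] * 20
low, high = bootstrap_ci(histories)
```

Resampling the observed capture histories and re-estimating each time yields a distribution of estimates; its 2.5th and 97.5th percentiles form the kind of 95 percent interval shown as grey shading in Figure 7.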
Figure 7 shows dramatically just how misleading observed patterns of violence may be. All four sources and the matched dataset indicated a steady decrease in killings in Hama between December 2012 and March 2013. However, estimates accounting for the dark figure indicate a significant spike in killings in January 2013. These killings are undocumented, at least among the four sources included here. Failing to account for the dark figure ignores the key finding about Hama during this period, that killings increased sharply in January 2013. Using the raw data for statistical inference would lead to exactly the wrong conclusion about the conflict during this period.
6. Conclusions
Convenience samples are a valuable source of contextual details and qualitative information. But they inevitably tell only a portion of the story, making them, on their own, insufficient for the kinds of comparisons that are frequently of interest to transitional justice researchers. Notably, collectors of convenience samples are nearly always knowledgeable and forthright about the incompleteness of their datasets. For example, reports from SNHR frequently include the following statement: “It is noteworthy that there are many cases that we were unable to reach and document particularly in the case of massacres and besieged areas where the Syrian government frequently blocks communication.” Estimation provides a way to include these undocumented victims in transitional justice mechanisms. If we cannot name all the victims, the least we can do is count them.
Appropriate quantitative analyses that account for the hidden dark figure of violence have the potential to contribute to transitional justice mechanisms via empirical evidence: supporting the memory of victims, allocating proportional responsibility among perpetrators, determining legal responsibility, and developing historical memory and clarity. Such analyses are supported only by complete data (e.g., a census), randomly selected data (such as a survey), or projections from multiple sources via statistical modeling (such as MSE or other post-stratification methods).
Although examples of censuses and randomly sampled data exist in the fields of human rights and transitional justice research, they are rare and relatively expensive. We should not abandon quantitative analyses when the only available data are convenience samples, but we also should not naïvely treat these samples as if they are complete or predictably incomplete. Inadequate data analyses that fail to account for what is missing in observed data can confuse decision-making. As demonstrated in the example above from Hama, bad statistics are worse than no statistics. They hide what we do not know behind a presentation of seductive but false precision. Questions driven by transitional justice goals are too important to get wrong; we owe it to victims, witnesses, and communities transitioning out of conflict to apply the best methods of all of our disciplines to get these answers right.