Gerrymandering is the practice of designing political districts whose shapes serve some agenda, often the consolidation of power by a political party or the disenfranchisement of a group such as a minority population. In 2018, litigation relating to gerrymandering was underway in at least twelve U.S. states, with several cases reaching the U.S. Supreme Court. In the same year, Colorado, Michigan, Missouri, Ohio, and Utah approved referendums intended to limit gerrymandering through the use of independent commissions. In 2019, cases from Maryland and North Carolina reached the Supreme Court, which ultimately rejected a role for the federal judiciary in adjudicating partisan gerrymandering claims. With this decision, both major political parties have begun to focus on state-level legislation and legal proceedings. The high-profile nature of these cases and citizens’ demands for solutions have led to interest in developing ways to measure district fairness.
Such measures give concrete axes along which districts and districting plans can be compared. However, the data used to measure a district may have noise or errors. The choices surrounding how a measurement is made also interact with each other to form a “garden of forking paths” (Gelman and Loken 2013) in which each choice affects the outcome of the others. This compounding can have a significant effect on certain scores that appear mathematically reasonable. We demonstrate this issue in a case study by showing that common ways of measuring the shape, or compactness, of districts are affected by several factors irrelevant to fairness or compliance with civil rights law. We further show that an adversary could actively manipulate these scores to affect the assessment of a given plan.
The U.S. Supreme Court has considered the shape of electoral districts in a number of cases including Reynolds v. Sims (1964), Gaffney v. Cummings (1973), Thornburg v. Gingles (1986), Shaw v. Reno (1993), Bush v. Vera (1996), Karcher v. Daggett (1983), and Cooper v. Harris (2017). Aside from legal precedents, 37 states require that their state legislative districts be compact and 18 explicitly require compactness of their congressional districts.
Mathematically, the compactness of a district is a geometric quantity intended to capture how “contorted” or “oddly shaped” a district is. Although compact districts can also be gerrymandered and contorted shapes can arise from geographic or legal necessity, such as rivers or municipal boundaries, poor geometry is often understood as a signal of gerrymandering. For instance, in Bush v. Vera (1996), the Supreme Court condemned districts that were “bizarrely shaped and far from compact.” For these reasons, compactness is quantified during redistricting, though Thornburg v. Gingles (1986) demonstrates that many other considerations must also be made.
Many measures of compactness exist (Niemi et al. 1990; Altman 1998; Chambers and Miller 2010), and mathematicians and legislators continue to debate their relative merits in promoting desirable district shapes. There has been less discussion, however, about how compactness scores should be implemented in practice.
Here, we use the US Census Bureau’s 2015 Cartographic Boundary and TIGER/Line data (United States Census Bureau 2016) to show how the variables used to calculate compactness are complicated by reality and how, even once quantitative scores are defined, confounding factors including geography, topography, cartographic projections, and resolution complicate implementation. Together, the ambiguities we expose provide a high degree of flexibility. We show that this flexibility can be exploited to engineer compactness scores that allow convoluted and gerrymandered districts to meet quantitative standards designed to prevent such abuse.
If policymakers are unaware that quantitative measures of electoral district fairness may be both intentionally and unintentionally manipulated to give a variety of outcomes, they may push to enact standards that are either insufficient or that can be gamed. This problem arose as litigants sought to use the efficiency gap (Stephanopoulos and McGhee 2014) to detect gerrymandering even as it was shown that the measure was problematic (Alexeev and Mixon 2017; Bernstein and Duchin 2017; Chambers, Miller, and Sobel 2017; Veomett 2018). Here, we show that similar problems exist for compactness. We also suggest that implementation flexibility and the accompanying potential for abuse are general properties of trying to quantify electoral fairness.
Section 1 is technical and exposes the full complexity and consequences of the many considerations that must go into calculating aspects of a compactness measurement. Section 2 provides a nontechnical summary of the results and shows how the methods we discuss here can be abused. Section 3 concludes with recommendations for the development and fair characterization of compactness scores. We additionally provide a model software implementation intended to avoid the pitfalls we highlight. All of the examples presented and some of the terminology used stem from United States geopolitics, but our ideas are applicable to districts in any context. Although we focus on compactness, our work provides a cautionary case study revealing challenges in quantifying any measure of gerrymandering.
1 Technical Considerations
In this section, we determine how compactness is affected by (1) the choice of mathematical definition, (2) contiguity, (3) topological holes, (4) the boundaries of political superunits, (5) map projection, (6) topography, (7) data resolution, (8) floating-point calculations, and (9) whether alternative choices were possible in drawing a district’s boundaries. We determine how often issues arise and quantify the impact of each of these considerations on measures of compactness. In Section 2, we quantify the net impact when all of these considerations are combined and find that each affects the quality of the measurements to at least some degree.
1.1 Definitions of Compactness
We identified over 24 different measures of compactness in the literature (Niemi et al. 1990; Altman 1998). Of these, we consider three of the most widely used and their variants. These are illustrated in Figure 1 and are as follows:
1. Polsby–Popper (Polsby and Popper 1991): Given as $4\pi A/P^{2}$, where A is the area of a district and P its perimeter. This score is also known as the “isoperimetric ratio” (DeFord et al. 2018).
2. Reock (Reock 1961): the ratio of a district’s area to the area of its minimum bounding circle. Finding this circle is nontrivial; an efficient algorithm and associated implementation is given by Gärtner (1999).
3. Convex Hull (Niemi et al. 1990): the ratio of a district’s area to the area of its convex hull, the minimum convex shape that completely contains the district.
All of the above scores are in the range $[0,1]$ with higher values indicating greater compactness. Low values may indicate potential gerrymandering.
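The three definitions can be sketched in a few dozen lines of pure Python. This is an illustrative sketch only, assuming a single hole-free polygon given as a list of (x, y) tuples in planar coordinates; it is not the compactnesslib implementation. The function names are hypothetical, and the bounding circle is found with a simple randomized incremental routine rather than the algorithm of Gärtner (1999).

```python
import math
import random

def area(poly):
    """Unsigned shoelace area of a simple polygon [(x, y), ...]."""
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))
    return abs(twice) / 2.0

def perimeter(poly):
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(p, q) for p, q in zip(poly, poly[1:] + poly[:1]))

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def _two(p, q):
    """Circle (cx, cy, r) having segment pq as its diameter."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)

def _three(p, q, r):
    """Circumcircle of three points; falls back to a two-point circle if collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return max(_two(p, q), _two(p, r), _two(q, r), key=lambda c: c[2])
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))

def _inside(c, p, eps=1e-9):
    return math.dist((c[0], c[1]), p) <= c[2] * (1 + eps) + eps

def min_bounding_circle(points, seed=0):
    """Randomized incremental minimum enclosing circle, returned as (cx, cy, r)."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    c = None
    for i, p in enumerate(pts):
        if c is not None and _inside(c, p):
            continue
        c = (p[0], p[1], 0.0)
        for j in range(i):
            q = pts[j]
            if _inside(c, q):
                continue
            c = _two(p, q)
            for r in pts[:j]:
                if not _inside(c, r):
                    c = _three(p, q, r)
    return c

def polsby_popper(poly):
    return 4 * math.pi * area(poly) / perimeter(poly) ** 2

def convex_hull_score(poly):
    return area(poly) / area(convex_hull(poly))

def reock(poly):
    _, _, r = min_bounding_circle(poly)
    return area(poly) / (math.pi * r * r)

# An L-shaped toy "district" (made-up coordinates).
L_SHAPE = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(round(polsby_popper(L_SHAPE), 3),
      round(convex_hull_score(L_SHAPE), 3),
      round(reock(L_SHAPE), 3))  # approximately 0.589 0.857 0.477
```

Even on this toy shape the three scores disagree substantially, which is one reason the choice of definition matters before any other implementation decision is made.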
These scores are purely geometric. It may be that scores incorporating population densities or other demographic data provide a better means of measuring gerrymandering (Eig and Seitzinger 1981; Niemi et al. 1990), but they are outside the scope of our experiments. Regardless, all scores are subject to implementation flexibility of some sort. In fact, incorporating additional data might even exacerbate the issues we discuss, since doing so would create additional opportunities for implementation flexibility.
1.2 Data
In our experiments, we draw geographic information from the US Census Bureau’s 2015 Cartographic Boundary and TIGER/Line data (United States Census Bureau 2016) and use it to explore how implementation choices affect the measurement of the electoral districts of the 114th U.S. Congress. The Bureau’s data comes in several different scales or resolutions: 1:500,000 (500k), 1:5,000,000 (5m), and 1:20,000,000 (20m). Figure 11 depicts data at these different resolutions. High-resolution data (e.g., 500k) capture greater geographic detail at the expense of higher collection, storage, and computation costs whereas lower-resolution data (e.g., 20m) capture less geographic detail while reducing costs.
1.3 Nomenclature
All of the measures we consider assume that an electoral district is described by a single planar polygon, without any holes. This assumption is problematic and leaves the measures under-specified. In reality, districts, such as those with islands (see Figure 1), are often composed of many polygons. While holes in districts are rarer, they can also occur. We have to modify the scores so they can cope with reality, but there are many ways to do this.
We will indicate whether or not contiguity is accounted for in a score by the suffixes PT (polygons together) and PS (polygons separate). Whether or not holes are accounted for will be indicated by the suffixes AH (add holes) and SH (subtract holes). If there is ambiguity regarding whether area, perimeter, or some other quantity is being treated in this way, then terms such as PTaSHp (treat the area of the polygons together, subtract the perimeter of holes) may be used. The suffix B indicates that a score accounts for constraints imposed by the boundaries of political superunits.
1.4 Noncontiguous Districts
There is no federal requirement that districts must be contiguous and many states do not require it. Yet, most compactness measures assume contiguity. There are many ways of incorporating noncontiguity into compactness scores and each has a large effect. Avoiding the issue by requiring contiguity is likely impossible. Islands, such as Hawaii, make districts noncontiguous unless large bodies of water are included in the district, as discussed below. Disconnected districts may also arise in other ways. Civil rights considerations have given Louisiana 01, depicted in Figure 1, two large portions separated by Louisiana 02; Louisiana 02 was drawn as a majority–minority district following the passage of the Voting Rights Act of 1965. Wisconsin’s 61st Assembly District (Figure 2) exemplifies a different situation. The city of Racine, WI, became noncontiguous by annexing a nearby parcel, but both pieces of the city were included in the same district (Altman and McDonald 2011). For the 114th Congress 1:500,000 resolution data, 85 of 441 districts are not contiguous. Of the noncontiguous districts, the largest numbers of subdivisions were 580 (Alaska), 134 (Maine 02), 103 (Michigan 01), and 92 (Florida 26); the median was 5. Seventeen of the noncontiguous districts have portions that are separated by land; these include Kentucky 01 and Louisiana 01 (see Figure 1).
The way we treat noncontiguous districts has significant effects on many proposed ways of measuring gerrymandering, including compactness scores. Treating the district as a single unit by, for example, enclosing it in a single convex hull, will tend to result in lower compactness scores. Treating the district as separate units and summing the areas of the units’ enclosing hulls will result in higher compactness scores.
Mathematically speaking, although Polsby–Popper is usually calculated as $4\pi \frac{A}{P^2}$, there are several possibilities for extending this formula to noncontiguous districts, in particular $4\pi \sum_{i=1}^{n} \frac{A_i}{P_i^2}$, $4\pi \frac{\sum_{i=1}^{n} A_i}{\left(\sum_{i=1}^{n} P_i\right)^2}$, and $4\pi \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} P_i^2}$, where i indexes the n noncontiguous subregions of the district. Although the original Polsby–Popper score is bounded to the range $[0,1]$, this is not true of the first of these alternatives. For a district with n noncontiguous regions, the second alternative has a range of $[\frac{1}{n},1]$; this variant of the score penalizes districts for being noncontiguous. The final variant, which we use to calculate scores in this paper, yields a value of one if each noncontiguous region is a circle, thereby acknowledging that noncontiguity may arise while encouraging each region of a district to be compact.
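The difference between the three extensions is easy to see numerically. The sketch below assumes each subregion is summarized by a hypothetical (area, perimeter) pair; the function names are illustrative and do not come from the paper's software.

```python
import math

def pp_sum_of_scores(parts):
    """First variant: 4*pi * sum_i(A_i / P_i^2). Not bounded by 1."""
    return 4 * math.pi * sum(a / p ** 2 for a, p in parts)

def pp_pooled_perimeter(parts):
    """Second variant: 4*pi * sum_i(A_i) / (sum_i P_i)^2."""
    total_area = sum(a for a, _ in parts)
    total_perimeter = sum(p for _, p in parts)
    return 4 * math.pi * total_area / total_perimeter ** 2

def pp_sum_squared_perimeters(parts):
    """Third variant: 4*pi * sum_i(A_i) / sum_i(P_i^2)."""
    total_area = sum(a for a, _ in parts)
    return 4 * math.pi * total_area / sum(p ** 2 for _, p in parts)

# A district made of two disjoint unit circles: area pi, perimeter 2*pi each.
two_circles = [(math.pi, 2 * math.pi), (math.pi, 2 * math.pi)]
print(pp_sum_of_scores(two_circles))           # about 2.0, outside [0, 1]
print(pp_pooled_perimeter(two_circles))        # about 0.5, i.e. 1/n for n = 2
print(pp_sum_squared_perimeters(two_circles))  # about 1.0
```

For two perfectly circular pieces, the first variant leaves the unit interval, the second reports 1/n, and the third reports one, matching the behavior described above.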
Special attention should be given to noncontiguous districts to determine whether they result from natural features, legal requirements, or electoral engineering. In Figure 3, we calculate both the Convex Hull and Reock compactness scores for instances in which the polygons comprising a district are scored together versus separately, per Figure 1. Although the scores are nominally the same, a wide gap in values results from using the differing interpretations. This gap supports the need for precision in both language and implementation.
1.5 Holes
Holes are relatively rare in districts, but many of the same considerations apply. The city of Racine, WI is noncontiguous due to annexations, as mentioned earlier. Placing the city within a single voting district required Wisconsin’s legislature to draw the 61st State Assembly District in a way that creates both noncontiguity and holes (Figure 2). Texas 18 very nearly surrounds the urban core of Houston and could, in a low-resolution dataset, contain a hole. Holes also appear as artifacts of the digitization process (Figure 4). For the 114th Congress 1:500,000 resolution data, four of 441 districts have holes as artifacts.
1.6 Boundaries
Districts are constrained by borders imposed by higher geopolitical units as well as by nature. Compactness scores that do not account for such constraints may assign a district a low score that is not meaningful. The panhandles of Florida and Oklahoma, as well as Kentucky’s border with the Ohio River (see Figure 11), contain electoral districts whose shape, at least in part, cannot be dictated by politics. The same is true of almost any coastal district since islands and peninsulas with their long perimeters must be included. Louisiana (Figure 11) exemplifies this challenge.
Some scores can be modified to account for this issue (Azavea 2006; Ansolabehere and Palmer 2016). These can be marked with the suffix B (borders accounted for). For example, in the case of the convex hull and Reock scores, if the hull or minimum bounding circle is intersected with a state polygon, the result is a better representation of what was possible and, therefore, a better indicator of whether gerrymandering took place. Taking boundaries into account in this way can have a considerable effect on compactness scores (Figure 5).
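A border-aware convex hull score can be sketched as follows. This is a minimal illustration, assuming planar coordinates, a hole-free district, and a state whose intersection with the hull is a single connected piece; the names `hull_score_bounded` and `clip_to_convex` are hypothetical. The intersection is computed with Sutherland–Hodgman clipping, which works here because the hull is convex; a production implementation would rely on a robust geometry library instead.

```python
def area(poly):
    """Unsigned shoelace area of a simple polygon [(x, y), ...]."""
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))
    return abs(twice) / 2.0

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def clip_to_convex(subject, clip):
    """Sutherland-Hodgman: intersect `subject` with a convex, counterclockwise `clip`."""
    def inside(p, a, b):  # p is on or to the left of the directed edge a->b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0
    def intersect(p, q, a, b):  # point where segment pq crosses the line through ab
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)
    output = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not output:
            break
        current, output = output, []
        for p, q in zip(current, current[1:] + current[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output

def hull_score(district):
    """Plain convex hull score: district area over hull area."""
    return area(district) / area(convex_hull(district))

def hull_score_bounded(district, state):
    """Convex hull score with the hull intersected with the state polygon."""
    hull = convex_hull(district)
    feasible = clip_to_convex(state, hull)
    return area(district) / area(feasible)

# Made-up example: an L-shaped state and a district that hugs its notch.
state = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
district = [(1, 1), (4, 1), (4, 2), (2, 2), (2, 4), (1, 4)]
print(round(hull_score(district), 3))          # about 0.714
print(round(hull_score_bounded(district, state), 3))  # about 1.0
```

In this toy case the district is penalized by the plain score only because its hull spills outside the state; once the hull is clipped to the state, the district fills all of the space that was available to it.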
The boundaries of electoral districts, states, and countries may include large maritime regions, as shown in Figure 6. These regions are difficult or impossible to populate, except near shores, so their inclusion in compactness calculations may hide the effects of gerrymandering. Input data should be cropped to major coastlines to account for this, though doing so is not a panacea. Coastlines tend to be fractal and need to be measured in a way which is insensitive to this effect, as shown in Figure 14.
As Figure 7 shows, boundary data, especially when drawn from disparate sources, may not always co-align. We attempted to quantify this effect by overlaying high-resolution district data with medium-resolution state data and found that the impact was usually small (see Figure 8 for details). Problems can be avoided entirely by using data that are co-aligned, such as the data available from the U.S. Census.
1.7 Projections
Although scores are often defined as though districts exist on a plane, in reality they are wrapped around the curvature of the Earth and local topographical features. Several interpretations of scores are possible: Districts could be mapped to the plane using a projection designed to minimize distortion across an entire country, a subdivision of a country such as a state, or even the district itself. Alternatively, scores could be calculated on the sphere, WGS84 ellipsoid, or a similar body; we do not investigate this possibility here since it is used rarely in practice. As Figure 9 shows, despite all the possibilities, compactness measures appear to be stable to reasonable choices among localized (country-scale) map projections used in practice. Alaska demonstrates what happens when an unreasonable choice is made: its score in a projection suitable for the conterminous United States differs from that of an Alaska-specific projection by up to 20%.
Global projections, such as the standard Mercator, produce scores that differ markedly from local projections; therefore, global projections should not be used for calculating compactness scores. This includes the Web Mercator (EPSG:3857) projection, despite its ubiquitous use on the internet. Across all districts, scores, and projections, the absolute score difference between a district as measured in a locally optimal projection versus a conterminous projection was less than 0.009 in 99% of cases. The other 1% of cases comprise districts such as Alaska and American Samoa, which are outside the region of interest for the conterminous projections. Given this observation, nation-sized projections (excluding outlying states and territories) are likely reasonable choices. Quantitatively, the conterminous Albers Equal Area (EPSG:102003) projection has a maximum scale distortion of 1.25% (Deetz and Adams 1934); this value can reasonably be taken as an upper limit on the acceptable distortion for any projection used to measure compactness.
1.8 Topography
A different effect of mapping electoral districts to a plane is that topography, such as mountains, is left out of quantities such as area and perimeter. As a result, the true land area and overland distance between points is underestimated. Using the 30m USGS National Elevation Dataset (U.S. Geological Survey (USGS) 2016), we calculated the surface area of districts using RichDEM’s implementation (R. Barnes 2016) of an algorithm by Jenness (2004) and modeled perimeter as the summed length of all the raster elevation cells at the edge of a district. The difference in Polsby–Popper scores between the topographic and nontopographic data was less than 0.03 for all districts, with 75% of districts having deviations less than 0.005 (Figure 10).
1.9 Resolution
Resolution can be thought of as the density of points describing a boundary. Figure 11 shows the same district at several resolutions. Lower resolutions obtained using standard simplification tools lead to simpler shapes often with shorter perimeters. The U.S. Census Bureau releases boundary data of Congressional Districts in four resolutions: full, 1:500k, 1:5m, and 1:20m (United States Census Bureau 2016). The full-resolution data are available as “TIGER/Line” data whereas the other resolutions are available as “Cartographic Boundary Shapefiles.” At these resolutions, the perimeters of the districts of the 114th Congress are defined by an average of 8914, 1531, 322, and 70 points, respectively.
We find that the choice of resolution has a substantial impact on compactness scores (Figures 12 and 13), with the popular Polsby–Popper score especially affected. This instability adds to a growing list of challenges for using the Polsby–Popper score in practice (Chambers and Miller 2010; Alexeev and Mixon 2017; DeFord et al. 2018). This suggests that lower-resolution data should be avoided, even if it could otherwise accelerate web and high-performance applications (Tam Cho and Liu 2016).
Since data may be supplied to users by outside sources, adversarial inputs are possible. Such inputs manipulate the data in ways that are sometimes hard to discern in order to alter measurement outcomes (Goodfellow, Shlens, and Szegedy 2014). A high-frequency wave applied to the boundary of a district may be visually imperceptible while introducing substantial alterations to a district’s score. The Koch snowflake is an example of what an adversarial input might look like: It has an arbitrarily long perimeter surrounding a finite area (Figure 14). More practically, data may contain digitization or simplification artifacts that only become apparent under significant magnification, as shown in Figure 4.
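The Koch snowflake's effect on a perimeter-based score can be reproduced directly. The following sketch assumes a counterclockwise equilateral triangle with unit sides as the starting shape; the function names are illustrative. Each refinement multiplies the perimeter by 4/3 while the area stays below 8/5 of the original triangle, so the Polsby–Popper score falls toward zero.

```python
import math

def area(poly):
    """Unsigned shoelace area of a simple polygon [(x, y), ...]."""
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))
    return abs(twice) / 2.0

def perimeter(poly):
    return sum(math.dist(p, q) for p, q in zip(poly, poly[1:] + poly[:1]))

def polsby_popper(poly):
    return 4 * math.pi * area(poly) / perimeter(poly) ** 2

def koch_refine(poly):
    """Replace each edge of a counterclockwise polygon with a four-segment bump."""
    cos60, sin60 = 0.5, math.sqrt(3) / 2
    out = []
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        dx, dy = (x2 - x1) / 3, (y2 - y1) / 3
        a = (x1 + dx, y1 + dy)
        b = (x1 + 2 * dx, y1 + 2 * dy)
        # Rotate the middle third by -60 degrees so the bump points outward.
        peak = (a[0] + cos60 * dx + sin60 * dy, a[1] - sin60 * dx + cos60 * dy)
        out += [(x1, y1), a, peak, b]
    return out

def snowflake(level):
    """Koch snowflake after `level` refinements of a unit equilateral triangle."""
    poly = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    for _ in range(level):
        poly = koch_refine(poly)
    return poly

for level in range(5):
    shape = snowflake(level)
    print(level, len(shape), round(perimeter(shape), 3),
          round(area(shape), 3), round(polsby_popper(shape), 3))
```

The area changes by well under a factor of two across all levels, yet the score drops at every step, which is the sensitivity that an adversarial or merely noisy boundary can exploit.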
1.10 Choice
If only one possible plan exists for a jurisdiction, that jurisdiction cannot be gerrymandered and should be excluded from analysis. In the Census Bureau data used here (United States Census Bureau 2016), 13 states and territories, including Alaska, Delaware, and Vermont, had only one congressional district. No matter how oddly shaped these districts are, they are not gerrymandered.
1.11 Floating-Point Issues
Computers generally store fractional values based on the IEEE 754 specification using either the 32-bit single-precision type, which gives about 7 decimal places of precision, or the 64-bit double-precision type, which gives about 15 decimal places of precision. If geographic boundary data is in the form of decimal degrees of latitude and longitude, as is often the case, then storing such data in a 32-bit type is sufficient to resolve centimeter-scale features; storing such data in a 64-bit type provides nanometer-scale resolution. Thus, 32-bit single-precision types might be sufficient for storing geographic coordinates. However, performing mathematics on fractional numbers, especially 32-bit types, can give erroneous results owing to rounding and other effects (Goldberg 1991).
We tested for floating-point instability by computing all of the scores mentioned here using both 32-bit and 64-bit IEEE 754-compliant types, with the latter taken as the “true” value. Compactness measured in these two systems differed by no more than 0.027%.
1.12 Ordering
The foregoing considerations change not only the values of calculated scores, but also their relative ordering (Figure 15). If ordering is quantified using Spearman’s rank correlation coefficient (Figure 16), it is apparent that different scores give markedly different rankings. Thus, any ranking of districts by compactness is thoroughly tied to and arises from choices made in developing the scores. Figure 17 explores this issue further, as described below.
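As a reminder of what the rank comparison measures, Spearman's coefficient is the Pearson correlation of the ranks of two score lists. The sketch below is a plain-Python illustration with tie-averaged ranks; the two score lists are invented for the example and are not values from the paper.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical scores for the same five districts under two definitions.
polsby_popper_scores = [0.10, 0.25, 0.40, 0.55, 0.70]
convex_hull_scores = [0.60, 0.50, 0.90, 0.70, 0.80]
print(round(spearman(polsby_popper_scores, convex_hull_scores), 3))  # 0.6
```

A coefficient well below one, as in this toy case, means that the two definitions would order the same districts differently, so any "worst districts" list depends on which definition was chosen.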
This section listed many of the major decisions that must be made to measure compactness. These decisions may be made in good faith by people making measurements without awareness of their implications. They may also be made by adversarial actors seeking to affect the outcome of political decisions. The decisions are not independent of each other. In combination they provide more flexibility in outcomes than any one decision does by itself. We explore this below, in Section 2.
2 Results
A number of choices must be made to compute a compactness score. In addition to the choice of (1) compactness definition, we have shown that it is also important to consider how to handle (2) noncontiguous districts, (3) districts with holes, (4) political superunit boundaries, (5) map projections, (6) topography, (7) data resolution, (8) floating-point uncertainty, and (9) whether alternative choices were possible in drawing a district’s boundaries.
In combination, these choices provide unanticipated and undesirable flexibility. This flexibility can be abused. Different implementation choices applied to what is nominally the same score can lead to very different conclusions about the fairness of a districting plan.
To demonstrate this effect, we have selected ten U.S. Congressional Districts widely considered to be gerrymandered. For each district, we performed a grid search over a range of values for each implementation choice, thereby applying the full flexibility detailed in this paper. Similarly to electoral outlier analysis (Ramachandran and Gold 2018), we were able to find sets of implementation decisions for which these districts’ compactness scores are outliers when compared against the full distribution of district scores. We were also able to find sets of decisions which make these districts appear reasonable by locating them near the mean of the distribution. That is, we can exploit implementation flexibility to build seemingly reasonable arguments that these districts are not gerrymandered, as well as to build arguments that they are.
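The structure of such a search can be sketched with a toy example. Everything below is hypothetical: the two implementation choices, their option names, and the score table are invented for illustration and are not results from this paper. The sketch measures how far a target district sits from the mean of the distribution, in standard deviations, under each combination of choices.

```python
import itertools
import statistics

# Invented scores for five districts under each combination of choices.
# Index 0 is the target district whose apparent status we try to shift.
TOY_SCORES = {
    ("polsby_popper", "500k"): [0.05, 0.30, 0.35, 0.40, 0.45],
    ("polsby_popper", "20m"): [0.15, 0.32, 0.36, 0.41, 0.44],
    ("convex_hull", "500k"): [0.55, 0.70, 0.75, 0.80, 0.85],
    ("convex_hull", "20m"): [0.72, 0.70, 0.76, 0.79, 0.86],
}

def z_score(scores, target):
    """Signed distance of the target from the mean, in standard deviations."""
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)
    return (scores[target] - mean) / spread

def grid_search(definitions, resolutions, target=0):
    """Return the choices making the target look most and least like an outlier."""
    results = {}
    for choice in itertools.product(definitions, resolutions):
        results[choice] = abs(z_score(TOY_SCORES[choice], target))
    most_outlying = max(results, key=results.get)
    least_outlying = min(results, key=results.get)
    return most_outlying, least_outlying, results

worst, best, table = grid_search(["polsby_popper", "convex_hull"], ["500k", "20m"])
print("most outlying under:", worst)
print("least outlying under:", best)
```

With these made-up numbers the same district is nearly two standard deviations from the mean under one combination and less than one under another, which is the kind of swing the search described above is designed to find.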
Figure 17 shows the effects of such adversarial choices of parameters. Considered against all districts nation-wide, in the case of NC01, IL04, and PA07, it was possible to move the districts from being obvious outliers to having middle-of-the-pack status. In other cases, such as NC12, NC04, and TX35, it was not possible to move the districts to the mean of the distribution, but they could still be moved considerably closer, potentially obfuscating their outlier status. Similar effects were true when districts were compared only against other districts in their states.
As Table 1 shows, the optimizer does not need to use extreme settings to produce the desired results. For example, TX33 appears most gerrymandered using the CvxHullPTB score at a 500m simplification tolerance in a locally optimized Lambert conformal conic projection with all districts included in the distribution; it appears least gerrymandered using the ReockPT score with a 500m tolerance in a Gall projection with districts comprising an entire state excluded. A sensitivity analysis of the optimizer shows that the choice of score (e.g., Polsby–Popper, Reock) makes the greatest difference in the results, while the other choices all have similar effect sizes.
2.1 Open Source Tools
Of the many compactness scores discussed in the literature, some are better able to cope with the complexities discussed here than others. Many of the more robust metrics, however, are also difficult or impossible to calculate using commonly available software. For instance, QGIS (QGIS Development Team 2017) includes the area of multipolygons as a built-in display field, convex hulls as a function three menu levels deep, and has no functionality to calculate the minimum bounding circles needed for Reock scores.
To address this situation, we have released a family of open source packages which share a common library designed to efficiently, reproducibly, and correctly calculate a variety of compactness scores. The basis of this ecosystem is compactnesslib, a C++ library and associated command-line interface which ingests bulk or single data in a variety of formats and calculates compactness scores. The python-mander Python package (available via pip) and the mandeR R package provide high-level interfaces to this library. In addition, a QGIS plugin provides GIS users an easy means of calculating scores (Archambault and M’ndange-Pfupfu 2017; R. Barnes 2018; Barnes and Connors 2018; Metric Geometry and Gerrymandering Group 2018). This stack was utilized to produce the calculations in this paper: The complete source code for generating all the diagrams presented here is available at https://github.com/r-barnes/Barnes2018-compactness-implementation, as well as in a fully reproducible form on Code Ocean (Barnes and Solomon 2020a) and archived on the Harvard Dataverse (Barnes and Solomon 2020b).
Though this software has the potential to improve the measurement of compactness as embodied by the scores we consider here, it cannot solve gerrymandering on its own: there are many ways to engineer districts each of which has its own flexibility. In this sense the software represents a model of the specificity, accessibility, and transparency necessary for any method of measuring gerrymandering or drawing districts.
3 Discussion
3.1 Best Practices
Our results show the importance of clarity and transparency in the measurements used to evaluate potential voting districts. In general, a mathematical definition alone is not sufficient. Attention must be paid to data and algorithmic quality. As a model for the level of specificity needed to describe quantitative measures of districting plans, we suggest best practices for the calculation of compactness scores. These guidelines are the minimal set any expert would need to explicitly consider when evaluating compactness.
• Scores. Be explicit about what each variable in a compactness score means. Does area include holes? Is it constrained by political superunits? How should noncontiguous districts be handled? Score names should be distinct and informative. Appending a clarifying suffix to the name of a score (e.g., PTSHp) informs readers about algorithmic details. See above for examples.
• Projections. Scale distortion should be limited to only a few percent throughout the region of interest. Reasonable choices of national or local projections usually suffice.
• Resolution. Use the best-available resolution from a trusted source. Simplified or down-scaled data give altered results. Alternatively, choose a score that is robust to changes in resolution, like hull-based scores or recent multiresolution measures (DeFord et al. 2018). The U.S. Census Bureau produces reasonable data designed such that all borders that are at the same resolution align. Ideally, districting data should be drawn from a common, public, trusted, nonpartisan source.
• Border constraints. Scores that do not explicitly account for constraints imposed by superunit boundaries leave out valuable information about what was possible in drawing a district. That is, they may unfairly penalize a district for having an odd shape when no other shape was possible. Use a score that accounts for superunit borders. Be sure that borders are cropped to features such as major coastlines.
• Choice. Before doing statistics on a set of district plans, eliminate those districts that encompass an entire political superunit, as no other choices of shape were possible.
• Topography. We have not found including topography in the calculation of area to be a significant source of variation, assuming the use of low-distortion map projections.
• Border coalignment. Coalignment of borders is a concern, although the effect was small in our data. To avoid problems, datasets used in an analysis should always be at the same resolution and carefully coaligned during their creation. In the U.S., Census data satisfy these requirements.
• Floating-point considerations. We have not found the choice of single- or double-precision floating-point representations to be a significant source of variation in our calculations.
• Transparency. A compactness score should not be accepted and cannot be interpreted without knowing the steps that went into its creation. From a scientific standpoint, this consideration relates strongly to reproducibility: We cannot trust what we cannot reproduce. Therefore, documentation is needed down to the equation level, and the release of source code and data is critical (N. Barnes 2010; Merali 2010; Ince, Hatton, and Graham-Cumming 2012). FAIR principles should be adhered to (Wilkinson et al. 2016).
More broadly, while compactness measures are attractive as quantitative means for analyzing districts, they are just a few of the many tools used to combat gerrymandering. Many other quantitative techniques and statistical measures are appearing in the academic literature and in practice. These can measure not only geometry, but also the effects of demography, voting patterns, and other relevant information. Used together, these scores provide a more complete picture of the consequences of choosing one plan over another. However, they are subject to the same instabilities and potential for abuse identified above. That is, the need for clearly defined and well-understood quantitative criteria for assessing districts and plans extends far beyond geographical issues and should be a central point of discussion while considering new standards or legislation.
3.2 Policy Implications
While the U.S. court system has declared that egregious gerrymandering is unconstitutional (Supreme Court of the United States 1986; United States Federal Courts 2016; Supreme Court of Pennsylvania 2018), it has not yet adopted a quantitative standard by which districts can be judged. In Vieth v. Jubelirer (2004), the Supreme Court left open the possibility that a “workable standard” might exist (Supreme Court of the United States 2004), but more recently the Court has shown skepticism, stating that “partisan gerrymandering claims present political questions beyond the reach of the federal courts” (Rucho v. Common Cause, 2019). This paper demonstrates that any standard must be specified precisely and carefully, since differences in interpretation can have large effects on scores. Furthermore, our work demonstrates that even a well-specified standard may judge unreasonable districts as being reasonable (see Figure 17). Therefore, any legally mandated standard of compactness should leave open the possibility of challenges. Moreover, given the implementation flexibility discussed here and its potential for abuse, courts should not accept quantitative arguments unless the code used to build those arguments is made publicly accessible and inspected by experts.
4 Coda
Geometric compactness can be used as a tool to help detect and quantify gerrymandering. However, numerous engineering and implementation decisions must be made to calculate this quantity. The same is true of other such measurements. Whether used unintentionally or maliciously, this flexibility has strong bearing on the quality of measurements and can be leveraged to shape conclusions about the suitability of a districting plan. A measurement cannot be trusted unless complete information about its implementation is available.
Implementation flexibility, such as that discussed in this paper, has the potential to affect any method of measurement (Ioannidis 2005; Gelman and Loken 2013). Alternative ways of measuring the shapes of districts such as discrete geometries (Duchin and Tenner 2018) or multivalued scores (DeFord et al. 2018) may be more resistant to such problems, but further investigations are needed to ensure that these methods are stable while still providing meaningful measurements.
Beyond providing “best practices” for implementing compactness standards, we intend the open source software accompanying this paper as a first step toward fair and accurate compactness measurement, allowing scientists, politicians, and the public to evaluate plans using reproducible, mathematically well-founded, and computationally stable tools.
Funding
US Department of Energy Computational Science Graduate Fellowship (DE-FG02-97ER25308) to RB; MIT Research Support Committee (“Structured Optimization for Geometric Problems”) to JS; Army Research Office (W911NF12-R-0011) to JS; National Science Foundation (IIS-1838071) to JS; Amazon Research Award to JS; National Science Foundation (ACI-1053575) to RB.
Acknowledgments
The open source software described here had its genesis in the Geometry of Redistricting workshop held at Tufts University August 7–11, 2017. John Connors helped develop the mandeR package. Max Gardner, Aaron Dennis, Daniel McGlone, and Ariel M’ndange-Pfupfu helped develop the python-mander package. Ariel M’ndange-Pfupfu and Vanessa Archambault helped develop the QGIS plugin. Computation and data utilized XSEDE’s Comet supercomputer (Towns et al. 2014), which is supported by the NSF (Grant No. ACI-1053575). Travel funding for RB and research support for JS were provided by a Prof. Amar G. Bose Research Grant and an Amazon Research Award. In-kind support was provided by Isaac B., Hannah J., Kelly K., Vivian L., and Jerry W.
Data Availability Statement
Replication code for this article has been published in Code Ocean, a computational reproducibility platform that enables users to run the code and can be viewed interactively at https://doi.org/10.24433/CO.0469487.v1. A preservation copy of the same code and data can also be accessed via Dataverse at https://doi.org/10.7910/DVN/B8JYZW.