While political science is a diverse field whose practitioners employ a variety of methodologies and tools, a significant portion of the discipline's output includes the study of data and drawing inferences from statistical analyses. As such, the conclusions one draws from political science papers, books, and presentations often hinge on the successful communication of the data a researcher is using and the inferences she is drawing from them. Yet, much more often than not, political scientists choose to present empirical results in the form of tables rather than using graphical displays, a tendency that weakens the clarity of presentation and makes it more difficult for a reader to draw clear and correct inferences.
In this paper we seek to highlight the discipline's reliance on tables and to offer suggestions for how to use graphs instead of tables to improve the presentation of empirical results. Six years ago, King et al.'s influential paper urged social scientists to present quantities of interest rather than parameter estimates from statistical analyses.1
King et al. 2000.
We should note that King et al. (2000) did implicitly urge researchers to use graphs by presenting their results mainly in graphical displays; their main focus, however, was their call for researchers to present quantities of interest rather than how to communicate those quantities.
Other scholars have made similar recommendations.3
But, as we show, political scientists are not heeding the advice to use graphs.4 This neglect may be due in part to the fact that this work is likely to reach only a small subset of the discipline or is narrow in focus. Epstein et al. 2006 and Epstein et al. 2007, which are aimed at legal researchers, appear in the Vanderbilt Law Review and are likely to be seen only by political scientists who study law and courts. Bowers and Drake 2005 does appear in a political science journal (Political Analysis), but its focus is on using exploratory graphical displays to improve inferences drawn from multilevel models, not on the general use of graphs instead of tables. Finally, the main inspiration for our paper—Gelman et al. 2002—was written by and for statisticians, and hence is unlikely to have been seen by many political scientists.
We examined only one issue of the AJPS because of the large number of papers in that issue relative to the other two journals.
Gelman et al. 2002.
This undertaking led to two main conclusions. First, political scientists rely on tables far more than graphs—twice as often, in fact. Second, tables are used mainly to present data summaries and the results of regression models. Indeed, tables presenting parameter estimates and standard errors comprised about 50 percent of the tables in our sample. In addition, we found that political scientists never use graphs to present regression results.
Our goal in this paper is to demonstrate directly how researchers can use graphs to improve the quality of empirical presentations. Unlike previous attempts to promote the use of graphs, we devote a significant portion of our analysis to showing how graphs can greatly improve the communication of regression results, which are almost always presented in tables whose features can strain even the most seasoned journal reader. Rather than presenting an abstract review of the benefits of graphs, we take a sample of tables from the various journal issues and turn them into graphs, showing that it is possible and desirable to do so for any table that presents numeric information, including data summaries and parameter estimates. We show that graphs better communicate relevant information from both data summaries and regression models, including comparing values across variables or models and the sign and significance of predictors. We argue that while graphs are almost never used to present regression results, the benefits from doing so are significant. In particular, graphs are superior at displaying confidence intervals for parameter estimates (and thus their uncertainty) and for making comparisons across models. We believe that scholars who follow our advice will both understand their data better and present their empirical results more clearly to their audience, thereby increasing the value and impact of their research.
The Use of Tables versus Graphs in Political Science
Before presenting examples of using graphs instead of tables, it is useful to examine when and how political scientists currently use each. The five issues we looked at contained 52 articles, 40 of which presented at least one table or graph. These 40 articles contained 150 tables and 89 graphs, a roughly 2-to-1 ratio.7
One point of comparison for this measure can be found in Gelman et al. 2002; the issue of the Journal of the American Statistical Association (March 2000) that they analyzed contained 72 graphs and 60 tables.

Tables and graphs in political science journals.
The left graph depicts the percentage of all graphs and tables presented in five political science journals that fall into the categories on the y-axis (i.e., the number of graphs and tables that fall into each category divided by the total number of tables and graphs); the right graph depicts the percentage of graphs within each category (i.e., the number of graphs in each category divided by the total number of tables and graphs in the respective category). “Estimates and uncertainties” include such quantities as regression coefficients and standard errors; “Summary statistics” include descriptive statistics like means and standard deviations; “Predicted values” include post-regression estimations such as changes in predicted probabilities; “Non-numeric” includes any information that is not quantitative; “Mathematical” generally includes figures from formal models; finally, “Other” is a residual category. The plots show that summary statistics and estimates and uncertainties comprise the majority of graphical and tabular presentation—while the former are sometimes displayed graphically (about 40% of the time in our sample), the latter are always displayed as tables.
The most striking findings center on the presentation of regression results, which comprise more than 30 percent of all tables and graphs combined. We find that more than half the tables in our sample were used to present such results—that is, point estimates and uncertainty, usually accompanied by some combination of asterisks, bold typeface, or letters to indicate statistical significance.8
For clarity, we distinguish these results from “quantities of interest” such as changes in predicted probabilities while still recognizing that the latter result from regression analyses.
We turn next to summary statistics, which include quantities such as means, standard deviations, and frequency summaries. These types of statistics comprised 32 percent of all the graphs and tables in our sample, or roughly the same percentage as regression results. Given that the traditional use of statistical graphics focuses on data summaries,9
we might expect researchers to use graphs frequently to present summary statistics. Nevertheless, the authors in our sample did so only about 40 percent of the time, choosing to use tables more often than not. On the other hand, our results show that political scientists overwhelmingly use graphs to present post-estimation results, such as predicted probabilities. While the reasons for this contrast are unclear, it appears that researchers are comfortable presenting these quantities in graphical form.
Nevertheless, the results are unambiguous: political scientists are far more likely to use tables than graphs, except when presenting post-estimation results. And they never (at least in our sample) use graphs when presenting regression results.
Why Tables?
It is not difficult to discern why researchers choose to present empirical results using tables. Compared to graphs, tables are much easier to produce. In fact, it is often possible to convert statistical output automatically into a typeset-quality table using a single command.10
Stata, for example, has user-written commands available (e.g., estout, by Jann 2005) that convert regression output to a table in LaTeX or text format.
At the same time, it is easy to understand why researchers are reluctant to use graphs. For one, it simply takes more work to produce graphs. With current software, greater knowledge of the nuances of the statistical/graphical packages is needed to produce effective graphs.11
We prepared the graphs in this paper using the R (R Development Core Team 2006) statistical environment. While we used the base graphics package for the majority of graphs, in figure 5 we used Sarkar's (2006) implementation of Trellis (Cleveland 1993; Cleveland et al. 1996) graphics in R, and in figure 8 we used the grid package. For an excellent introduction to R graphics that includes discussion of base, grid, and lattice graphics, see Murrell 2006.
Which is not to say that the quality of tables in political science and elsewhere could not be improved. For advice in constructing good tables, see chapter 10 in Wainer 2000.
Another reason why researchers hesitate to use graphs may be their belief that it is simply not feasible to present certain information graphically. Relatedly, some may believe that graphs take up much more space than tables. Both of these concerns are likely to be particularly salient with respect to regression tables, which can include multiple models involving various combinations of variables, observations, and estimation techniques. Researchers may believe it impossible to present results from regressions graphically.
Why Graphs?
We argue, however, that the costs of producing graphs are outweighed by the benefits, and many of the concerns regarding their production are either overstated or misguided altogether. While producing graphs does require greater effort, the very process of graph creation is one of the main benefits of using graphs instead of tables in that it provides incentives for the researcher to present the results more directly and cleanly. Like Gelman et al., we struggled with several versions of each graph presented in this paper before settling on the versions that appear.13
Gelman et al. 2002.
In addition, concerns about the infeasibility of graphs when presenting certain numeric summaries and about the size of graphs relative to tables are unwarranted. As we illustrate, it is not only possible to present regression results—including multiple specifications—in graphical form, but it is desirable as well. In addition, most of our graphs take up no more room than the tables they replace, including regression tables. And for those that do, we believe the benefit of graphical presentation outweighs the cost of greater size.
Once performed, the extra work put into producing graphs can reap large benefits in communicating empirical results. Extensive experimental research has shown that when the presentation goal is comparison (as opposed to communicating exact values, for which tables are superior), good statistical graphs consistently outperform tables.14
Our goal in this paper is not to add to the voluminous literature systematically investigating the virtues of graphs versus tables. Rather, our approach is practical: we take a sample of representative tables from political science journals, present them graphically, and then qualitatively compare the two. We believe that these examples illustrate that graphs are simply better devices than tables for making comparisons, which is almost always the goal when presenting data and empirical results. And for those who are convinced, we have created a web site (www.tables2graphs.com) that contains complete replication code for producing our graphs, which can help researchers turn their tables into graphs.
Using Graphs Instead of Tables: Descriptive Statistics
We begin our conversion of tables into graphs by analyzing tables with descriptive statistics. Although most of the literature on statistical graphics deals with exploratory data analysis and descriptive statistics, political scientists still choose more often than not to present such information in the form of tables (64 percent of the time in our sample, as seen in figure 1).
When assessing the use of graphs or tables, it is useful to consider why researchers might present descriptive statistics. If the goal is to facilitate replication, and hence allow follow-up researchers to be confident that they are using the same data as the original analysis, then tables are indeed superior. But if the goal is to give an audience a sense of the data in order to lay a foundation for subsequent statistical analyses, then we believe that graphing descriptive statistics is a superior choice. For one, graphs allow for easy comparison of variables, which can be important for assessing regression analyses. Graphs, as we demonstrate, also make it much easier to compare variables across two or more settings, such as different time periods, institutions and countries. Finally, graphs of descriptive statistics also provide a better summary of the distribution of variables.
Using a Mosaic Plot to Present Cross Tabulations
We begin with a table presented in Iversen and Soskice, whose study of redistribution in advanced democracies includes nine tables, seven of which present numerical information.15
Iversen and Soskice 2006.
Their first table (reproduced in our table 1) presents a cross tabulation of electoral systems and government partisanship for a sample of advanced democracies. The key comparison here is whether majoritarian electoral systems are more likely to feature right governments and proportional representation systems are more likely to feature left governments, a comparison presented in numeric form in the last column. The raw numbers that go into this comparison are presented in the main columns. Although the information in this 2-by-2 table is relatively easy to digest, we think that the same information can be more clearly and succinctly presented by using a mosaic plot, a type of graph that is specifically designed to represent contingency tables graphically.16
Iversen and Soskice 2006, table 1: Electoral system and the number of years with left and right governments (1945–98)


Using a Mosaic Plot to Present Cross Tabulations.
Table 1 from Iversen and Soskice (2006) displays a cross-tabulation of electoral systems and government partisanship. We turn the table into a two-dimensional mosaic plot. The top plot depicts the relationship between electoral system and partisanship of government, while the bottom plot depicts how often countries featured center-left governments at least 50% of the time (defined as an “overweight” in the text of the paper), again broken down by electoral system. A key feature of the mosaic plot is that the area of each rectangle is proportional to the number of observations falling within its respective contingency. The mosaic plot clearly displays the key comparisons while retaining the actual counts. Titles above each graph make it clear to the reader what is being compared, unlike in the table.
The first thing to note about the graph is that the main comparison the authors are attempting to make immediately stands out, as the graph shows that proportional systems are significantly more likely to produce left governments, and that nearly every country with a proportional system featured center-left governments more than 50 percent of the time from 1945–1998 (defined as an “overweight” in their paper), while no countries with a majoritarian system featured center-left governments more than 50 percent of the time. We add the raw counts from each category inside the rectangles; the use of a mosaic plot allows us to combine the best feature of a table—its ability to convey exact numbers—with the comparative virtues of a graph. While actual values are frequently not of interest (and hence do not need to be displayed in a graph), the inclusion of counts here alerts the reader that small samples are involved in the calculation of countries with an overweight of left governments, without either adding unnecessary clutter to the graph or increasing its size. Finally, the titles above each plot make it clear what is being plotted, while one has to read the actual text in the original article to understand the table.
This example illustrates that even simple and easy-to-read tables can be improved through graphical presentation. As the complexity of a table grows, the gains from graphical communication will only increase. Indeed, mosaic plots can easily be extended to display multidimensional contingency relationships.17
Friendly 1994.
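The core construction of a mosaic plot, in which each rectangle's area is proportional to its cell count, can be sketched directly. The snippet below is a Python illustration built by hand with matplotlib, not the authors' R code, and the counts are hypothetical placeholders standing in for a 2-by-2 cross tabulation such as electoral system (columns) by government partisanship (rows):

```python
# Minimal hand-built mosaic plot: column widths are proportional to
# column totals, cell heights to within-column shares, so each
# rectangle's area is proportional to its cell count.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

counts = {"Proportional": {"Left": 8, "Right": 1},   # hypothetical counts
          "Majoritarian": {"Left": 0, "Right": 8}}

total = sum(sum(col.values()) for col in counts.values())
fig, ax = plt.subplots()
x, gap = 0.0, 0.02
cell_areas = {}
for system, col in counts.items():
    col_total = sum(col.values())
    width = col_total / total                 # column width ~ column total
    y = 0.0
    for party, n in col.items():
        height = n / col_total if col_total else 0  # cell height ~ cell share
        ax.add_patch(Rectangle((x, y), width, height, fill=False))
        if n:
            ax.text(x + width / 2, y + height / 2, str(n),
                    ha="center", va="center")  # raw count inside rectangle
        cell_areas[(system, party)] = width * height
        y += height
    x += width + gap
ax.set_xlim(0, 1 + gap)
ax.set_ylim(0, 1)
ax.set_axis_off()
```

Because each cell's height is its share of its column and each column's width is its share of the grand total, every rectangle's area reduces to (cell count)/(grand total), the defining property of the mosaic plot.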
Using a Dot Plot to Present Means and Standard Deviations
Another common type of table is one presenting descriptive statistics about central tendencies (e.g., means) and variation (e.g., standard deviations). As we discuss below, presenting only means and standard deviations (along with minimums and maximums) may not be the best choice, depending on the nature of the variables involved. However, even when they are sufficient, it can be difficult to make comparisons across variables and to inspect the distribution of individual variables when data are presented in a table. Graphs, on the other hand, accomplish both goals.
As an example, we turn to Table 1, panel A, from McClurg (reproduced in our table 2), which presents summary statistics from his study of the relationship between social networks and political participation.18
McClurg 2006.
Numerical ordering is especially important when presenting the values of an individual variable, since it in effect presents its empirical distribution. See Wainer (2001), Wainer (2005, ch. 11–12) and Friendly and Kwan (2003) for discussions on graphical ordering.
McClurg 2006, table 1 (panel A): The political character of social networks


Using a Single Dot Plot to Present Summary Statistics.
Table 1 (panel A) from McClurg (2006) presents descriptive statistics from his study of social networks. We turn it into a graph by using a single dot plot, taking advantage of the fact that the scales of all variables are similar. The dots depict the means of each variable, the solid line extends from the mean minus one standard deviation to the mean plus one standard deviation, and the dashed line extends from the minimum to the maximum of each variable. The number of respondents covered under each variable is given under the y-axis labels. The variables are ordered according to their mean values, in descending order. The graph allows for easy lookup and comparison of the means, while the lines visually depict the distribution of each variable.
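The dot plot just described can be sketched in a few lines. The example below is a Python illustration with hypothetical variable names and data on a common scale (the paper's own figures were drawn in R):

```python
# Dot plot of summary statistics: one row per variable, a dot at the
# mean, a solid line spanning mean +/- one standard deviation, and a
# dashed line spanning the minimum to the maximum.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import statistics

data = {  # variable name -> hypothetical observations on a common scale
    "Political talk": [1, 2, 3, 3, 4, 4, 5],
    "Political knowledge": [0, 1, 1, 2, 2, 3, 5],
    "Disagreement": [0, 0, 1, 1, 2, 2, 3],
}

# order variables by mean, descending (largest mean at the top)
rows = sorted(data.items(), key=lambda kv: statistics.mean(kv[1]), reverse=True)

fig, ax = plt.subplots()
for i, (name, xs) in enumerate(reversed(rows)):   # plot bottom-up
    m, s = statistics.mean(xs), statistics.pstdev(xs)
    ax.plot([min(xs), max(xs)], [i, i], linestyle="--", color="gray")  # range
    ax.plot([m - s, m + s], [i, i], color="black")                     # +/- 1 sd
    ax.plot(m, i, "o", color="black")                                  # mean
ax.set_yticks(range(len(rows)))
ax.set_yticklabels([name for name, _ in reversed(rows)])
```

Sorting by mean rather than alphabetically is what makes the plot double as a picture of the variables' relative magnitudes.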
Using Dot Plots and Violin Plots to Present Distributions of Variables with Differing Scales
In many instances, it will be worthwhile to go beyond simple descriptions of variables and present more complete summaries of distributions. As Cleveland and McGill note, “Graphing means and sample standard deviations, the most commonly used graphical method for conveying the distributions of groups of measurements, is frequently a poor method. We cannot expect to reduce distributions to two numbers and succeed in capturing the widely varied behavior that data sets in science can have.”21
Cleveland and McGill 1985, 832.
For an example of when it is useful to depict fuller representations of variables, we turn to table 2 of Kaplan et al. (reproduced in our table 3), which displays summary statistics from their analysis of issue convergence and campaign competitiveness.22
Kaplan, Park, and Ridout 2006.
Kaplan et al. 2006, table 2: Descriptive statistics of campaign and issue-level variables

Scale differences are likely to occur in many datasets, and such situations will require the analyst to be creative about the best way to present her data. We suggest two options, the first of which we pursue here. Instead of using a single graph, we decided to group similar variables and present separate graphs for each group, allowing for comparisons within each graph.
There are three main categories of variables in Kaplan et al.'s table 2: binary variables, those measured in millions, and those measured in percents (including issue convergence, which is defined as the percentage of combined attention that Democratic and Republican candidates give to a particular issue). Since competitiveness and issue salience do not fall into any of these categories, we group them with similarly distributed variables.
We begin the construction of our graph, presented in figure 4, by considering what type of display is best suited for each group of variables. Because binary variables can take on only two values, their distribution is fully characterized by their means and sample size. Accordingly, in the top panel, we plot the means of the binary variables, ordering the variables in descending order and placing sample size in the y-axis labels. (We use diamonds to distinguish them from the median values plotted in the two panels below.) This allows for clear comparison of how often each variable is coded as “1.”

Using a Dot Plot and Violin Plots to Present Summary Statistics.
Table 2 from Kaplan, Park and Ridout (2006) presents summary statistics from their study of issue convergence and campaign competitiveness. We group similar variables and present separate graphs for each group, allowing for comparisons within each graph and greater understanding of variable distributions. The top graph plots the means of the binary variables, ordering the variables in descending order and placing sample size in the y-axis labels. For percentage variables and variables measured in millions, we present individual violin plots of each variable. The points depict median values, the white boxes connect the 25th and 75th percentiles, and the thin black lines connect the lower adjacent value to the upper adjacent value. The shaded area depicts a density trace, plotted symmetrically above and below the horizontal box plot. The violin plots clearly depict the distribution of each variable and allow for the detection of skewness and outliers.
We next consider the variables measured in percents and those measured in millions. While on different scales, both sets of variables are continuous. A range of options exist for graphing the distribution of continuous variables, including histograms, kernel density plots, and boxplots.23
A recent graphical innovation called the “violin plot” combines the virtues of the latter two by overlaying a density trace onto the structure of a box plot.24
Hintze and Nelson 1998.
Letting t = 1.5 × (75th percentile − 25th percentile), the lower adjacent value equals the smallest observation greater than or equal to the 25th percentile − t; the upper adjacent value equals the largest observation less than or equal to the 75th percentile + t. The presence of many values beyond the lower and upper adjacent values can be caused by either outliers or skewness (Cleveland and McGill 1985, 832).
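A small worked example makes the rule concrete (the percentiles and observations below are hypothetical, not values from the Kaplan et al. data):

```python
# Worked example of the adjacent-value rule used by box and violin plots.
q25, q75 = 2.0, 6.0
t = 1.5 * (q75 - q25)                    # t = 1.5 * IQR = 6.0
obs = [-4.0, 1.0, 2.5, 3.0, 5.0, 6.5, 11.0, 20.0]
# lower adjacent value: smallest observation >= 25th percentile - t
lower_adjacent = min(x for x in obs if x >= q25 - t)
# upper adjacent value: largest observation <= 75th percentile + t
upper_adjacent = max(x for x in obs if x <= q75 + t)
# observations beyond the adjacent values (here, 20.0) signal outliers
beyond = [x for x in obs if x < lower_adjacent or x > upper_adjacent]
```

Here q25 − t = −4.0 and q75 + t = 12.0, so the adjacent values are −4.0 and 11.0, and the observation 20.0 falls beyond the upper adjacent value.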
The violin plots reveal several characteristics of the data that are not apparent from the table. First, many of the variables exhibit substantial skewness. For instance, while the medians of both “Issue Convergence” and “Issue Salience” are 0, their tails extend well to the right. In addition, the presence of observations well beyond the upper adjacent values of “Total Spending/Capita” and “Difference Spending/Capita” indicates the existence of outliers (indeed, only one observation of the latter is greater than four).
These are details that one would be hard-pressed to glean from a table, even one that went beyond reporting only means and standard deviations. Greater understanding of distributions can aid researchers as they move from exploratory data analysis to model fitting. And, as these violin plots illustrate, they can increase the reader's understanding of the data that is presented and analyzed.
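A minimal violin-style display can be sketched as follows. This is a Python illustration with hypothetical, right-skewed data meant to mimic the skewness discussed above; the paper's figures were drawn in R:

```python
# Horizontal violin plot with the median overlaid as a point.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import statistics

issue_convergence = [0, 0, 0, 0, 1, 1, 2, 3, 5, 9, 15]  # hypothetical, skewed
fig, ax = plt.subplots()
ax.violinplot([issue_convergence], positions=[1], vert=False,
              showextrema=False)          # density trace only
med = statistics.median(issue_convergence)
ax.plot(med, 1, "o", color="black")       # median marked as a point
ax.set_yticks([1])
ax.set_yticklabels(["Issue Convergence"])
```

Even in this toy example the long right tail of the density trace is visible at a glance, while the median sits near zero, exactly the kind of skewness a mean-and-standard-deviation table conceals.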
Using an Advanced Dot Plot to Present Multiple Comparisons
The second option for graphing descriptive plots of variables with different scales is to alter the scales themselves. We illustrate this option by converting table 2 from Schwindt-Bayer's study of the attitudes and bill initiation behavior of female legislators in Latin America (reproduced in our table 4) into a graph.26
Schwindt-Bayer 2006.
Schwindt-Bayer 2006, table 2: Number of bills sponsored in each thematic area

Instead, we turn the table into an advanced dot plot, presented in figure 5. We begin by converting the counts of each bill introduced in each issue area into a proportion, with the total number of bills initiated serving as the denominator. For each chamber and for each issue area, we present the proportions for both the first and second time periods using different symbols: open circles for the first period, “+”s for the second. (Because the “number of legislators who sponsored at least one bill” and the “total number of bills initiated” seem tangential to the table, we include them only as counts under the name of each country or chamber in the panel strips.) One option, of course, would be simply to scale the x-axes on a linear proportion scale, ranging from 0 to 1; this, however, would mask differences among several issue areas for which relatively few bills were introduced.27
For each chamber and time period, “other bills” constitute a majority of all bills. Another option would be to exclude these bills and use the sum of the other seven issue areas as the denominator.
Cleveland 1985, 4–5.

Using an Advanced Dot Plot to Present Proportions.
Table 2 from Schwindt-Bayer (2006) presents counts of the type of bills initiated in four Latin American legislative chambers in two time periods each. We turn the table into a graph that allows for comparisons across time periods, chambers and issue areas. For each country or chamber, the “o's” indicate the percentage of all initiated bills pertaining to the issue area listed on the y-axis in the first period, while the “+'s” indicate the percentages from the second period. The grey strips indicate the country or chamber analyzed in the panel immediately below, along with the relevant time periods. For each period, the first count in parentheses gives the number of legislators who sponsored at least one bill, while the second count gives the total number of bills initiated. We scale the x-axis on the log base 2 scale: each tick mark is double the percentage of the tick mark to its left.
The graph thus allows for all three types of comparisons. The use of dual symbols allows for easy comparison both across time periods within a given issue area and across issue areas within a single time period. The graph also facilitates cross-legislature comparison by placing all plots in a single column; by reading vertically down the graph, for example, we can see that proportionally more bills pertaining to women's issues were introduced in the Colombia Senate than the Colombia Chamber from 1998–2002. This plot, from the choice of scaling to the tool we utilized, follows the principles outlined in Cleveland, showcasing the ability of Trellis plots to allow comparisons within and across panels.29
Cleveland 1993. For excellent examples of dot plots, sorting, and general guidance on how to use R and lattice graphics to create dot plots, see Jacoby 2006.
Tufte 1983.
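The log base 2 axis used in figure 5 can be sketched as follows. This Python illustration uses hypothetical issue areas and proportions (the paper's figure was drawn in R with Trellis graphics):

```python
# Dot plot of proportions on a log base 2 x-axis, with open circles for
# one period and plus signs for the other; the log scale spreads out
# issue areas that attract only a small share of bills.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

issues = ["Women's issues", "Education", "Health", "Economy", "Other"]
period1 = [0.01, 0.04, 0.08, 0.16, 0.64]   # hypothetical shares, period 1
period2 = [0.02, 0.03, 0.10, 0.12, 0.60]   # hypothetical shares, period 2

fig, ax = plt.subplots()
ys = range(len(issues))
ax.plot(period1, ys, "o", fillstyle="none", label="Period 1")
ax.plot(period2, ys, "+", label="Period 2")
ax.set_xscale("log", base=2)   # each tick doubles the one to its left
ax.set_yticks(list(ys))
ax.set_yticklabels(issues)
ax.legend()
```

On a linear axis the first two issue areas would be nearly indistinguishable; on the log base 2 axis, a doubling from 1% to 2% occupies the same visual distance as a doubling from 32% to 64%.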
In summary, we believe that these examples demonstrate the benefits of graphing descriptive statistics. Of course, there are many other types of descriptive data summaries, such as correlation matrices and time series, that will require different graphing strategies.31
See the paper's web site for examples of these types of graphs.
Using Graphs Instead of Tables: Regression Analyses
On Regression Tables and Confidence Intervals
Regression tables are meant to communicate two essential quantities: point estimates (coefficients) and uncertainty estimates (usually in the form of standard errors, confidence intervals, or null hypothesis tests). In our sample, 74% of the regression tables presented standard errors, usually supplemented by asterisks labeling coefficients that have attained conventional levels of statistical significance. Thus, tables typically seek to draw attention to coefficients that are significant at the p < .01, p < .05 or p < .10 levels.
This tendency likely stems from the discipline's reliance on null hypothesis significance testing in conducting regression analyses. Articles in fields as diverse as wildlife management, psychology, medicine, statistics, forecasting, and political science argue that null hypothesis significance testing and reliance on p-values can lead to serious mistakes in statistical inference.32
Johnson 1999 for wildlife management; Schmidt 1996 for psychology; Sterne et al. 2001 for medicine; Gelman and Stern 2006 for statistics; Armstrong 2007 for forecasting; Gill 1999 for political science.
These criticisms notwithstanding, it is likely that political scientists will continue to rely on null hypothesis significance testing. Given this reality, can graphs improve on standard regression tables? We believe the answer is yes, due to the simple fact that graphs of point estimates and confidence intervals can communicate the same information as standard regression tables while adding the virtues of emphasizing effect size and easy comparison of coefficients. A confidence interval effectively presents the same information as a null hypothesis significance test; for example, a 95 percent confidence interval that does not include the null hypothesis (typically zero) is equivalent to a coefficient being statistically significant at p < .05.33
Greene 1997, 158.
Gill 1999, 663.
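The equivalence is easy to verify numerically. The snippet below uses a hypothetical coefficient and standard error together with the normal approximation:

```python
# A 95% normal-approximation confidence interval excludes zero exactly
# when |estimate / standard error| exceeds 1.96, i.e. exactly when the
# coefficient is statistically significant at p < .05.
beta, se = 0.42, 0.20          # hypothetical estimate and standard error
z = 1.96                       # 97.5th percentile of the standard normal
lo, hi = beta - z * se, beta + z * se
excludes_zero = lo > 0 or hi < 0
significant = abs(beta / se) > z
```

Here the interval runs from 0.028 to 0.812; it excludes zero, and correspondingly the z-ratio of 2.1 exceeds 1.96, so the two criteria agree.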
Of course, it is possible to present confidence intervals in regression tables.35
Indeed, Political Analysis advises authors that when constructing tables, “[i]n most cases, the uncertainty of numerical estimates is better conveyed by confidence intervals or standard errors (or complete likelihood functions or posterior distributions), rather than by hypothesis tests and p-values” (Political Analysis: Information for Authors, 2007).
Ansolabehere and Konisky 2006.
Ansolabehere and Konisky (2006), table 4: Registration effects on turnout in New York and Ohio counties, fixed effects model, 1954–2000

Would the use of confidence intervals improve interpretation? In table 6, we present the same table, but instead of standard errors and asterisks we use confidence intervals. Although inferences are more direct when estimates are presented in this form, it gets confusing quickly when comparing across models. Even a simple question, such as which confidence intervals overlap, demands careful attention to signs, and comparisons must be done one by one.
Presenting regression results with confidence intervals

These difficulties point to a main advantage of the standard regression table: it is less confusing than the alternative. There are others: it is clear from the table which independent variables are present or missing in each model. In addition, information such as model fit statistics and the number of observations can easily be added to the table. In summary, regression tables can display a wealth of information about the models in a very compact and mostly readable format. They also clearly communicate the results of null hypothesis significance testing. It is no surprise that they are so popular.
Could graphs do a better job? As we illustrate below, graphs can easily display point estimates and confidence intervals, including those from multiple models. In doing so, they can clearly convey the results of null hypothesis significance testing. And once we move beyond the sole consideration of whether a coefficient is significantly different from zero, we believe the benefits of graphs become even more apparent, allowing for the proper highlighting of effect size and comparison of coefficients both within and across models.
In summary, as we move to graphing regression results, and given the aforementioned strengths and weaknesses of tables, we look for a good regression graph to satisfy the following criteria:
- It should make it easy to evaluate the statistical significance of coefficients;
- It should also be able to display several regression models in a parallel fashion (as tables currently do);
- Relatedly, when models differ by which variables are included, it should be clear to the reader which variables are included in which models;
- It should be able to incorporate model information;
- Finally, the plot should focus on confidence intervals and not (only) on p-values.
Plotting a Single Regression
We start by plotting a simple regression table. Table 2 from Stevens et al. (reproduced in our table 7) displays results from a single least squares regression.37
Stevens et al. 2006.
Stevens et al. 2006, table 2: Determinants of authoritarian aggression

In figure 6 we condense the same information into a simple dot plot, much like those used in the previous section. We take advantage of the similar scaling across the estimates and display the results in a single plot. The dots represent the point estimates, while the horizontal lines depict 95 percent confidence intervals.38
We could, of course, indicate any number of confidence intervals (using vertical tick marks or thin and thick lines, for example) if desired, including intervals analogous to commonly reported 90 percent significance levels (i.e. when p < .1). See Cleveland and McGill 1985, figure 5, for an example using tick marks to report 50 percent intervals and the paper's website for a version of figure 6 that plots 90 percent and 95 percent confidence intervals.

Presenting a single regression model using a dot plot with error bars.
Table 2 from Stevens et al. 2006 displays a single least squares regression model that examines the relationship between individual authoritarianism and economic perceptions, ideology and demographic variables in six Latin American countries. We turn the table into a single regression graph, taking advantage of the similar scaling across the estimates. The dots represent the point estimates, while the horizontal lines depict 95% confidence intervals. The range of parameter estimates is displayed on the x-axis, while the variable labels are displayed on the y-axis. We place a vertical line at zero for convenience, and make the length of the x-axis symmetric around this reference line for easy comparison of the magnitude of positive and negative coefficients.
This figure shows several advantages of using graphs. First, information about statistical significance is displayed without any asterisks, bars, or superscripted letters. Instead, the error bars visually signal which coefficients are significant: those whose confidence intervals do not cross the reference line at zero. Thus, a vertical scan of the graph allows the reader to quickly assess which estimates differ significantly from the null hypothesis.
In addition, the visual display of the regression results shifts the reader's attention away from statistical significance and towards the more relevant, and perhaps more interesting, information revealed by a regression analysis: the estimated effect sizes and the degree of uncertainty surrounding them. The vertical placement of coefficients makes it easy to compare their relative magnitudes, while the length of the error bars shows how precisely each parameter is estimated.
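A dot plot of this kind is straightforward to construct. The sketch below uses matplotlib with made-up coefficients, standard errors, and variable names (they are purely illustrative, not the Stevens et al. estimates), following the conventions described above: dots for point estimates, horizontal bars for 95 percent intervals, a reference line at zero, and an x-axis made symmetric around it:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical coefficients and standard errors (illustrative values only)
labels = ["Economic perceptions", "Ideology", "Age", "Female", "Education"]
betas  = [0.30, -0.15, 0.02, 0.25, -0.40]
ses    = [0.10, 0.12, 0.05, 0.15, 0.11]

fig, ax = plt.subplots(figsize=(5, 3))
ys = range(len(labels))
# dots = point estimates; horizontal bars = 95% confidence intervals
ax.errorbar(betas, ys, xerr=[1.96 * s for s in ses],
            fmt="o", color="black", capsize=3)
ax.axvline(0, linestyle="--", linewidth=0.8)  # reference line at zero
ax.set_yticks(list(ys))
ax.set_yticklabels(labels)
ax.invert_yaxis()                             # first coefficient on top
# symmetric x-axis for comparing positive and negative magnitudes
xmax = max(abs(b) + 1.96 * s for b, s in zip(betas, ses))
ax.set_xlim(-xmax * 1.1, xmax * 1.1)
fig.tight_layout()
fig.savefig("coefplot.png")
```

A coefficient whose bar does not touch the dashed line is significant at the 5 percent level, so the reader's vertical scan replaces the hunt for asterisks.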
The use of confidence intervals also provides much more information, displayed intuitively, than a regression table. For instance, when confidence intervals do not overlap, we can conclude that two estimates are statistically significantly different from each other. Thus, if one goal of a regression analysis is to compare two or more parameter estimates, then this simple type of graph is sufficient. However, even if the confidence intervals of two coefficients overlap, it is possible that they are still significantly different.39
One possibility would be to move beyond this basic graph and plot all contrasts of interest (for example, the estimated differences between the country coefficients and their corresponding standard errors: b(Argentina) − b(Chile), b(Argentina) − b(Colombia), etc.).40 Multiple comparisons raise various methodological issues. See Hsu and Peruggia 1994 for a discussion of Tukey's multiple comparisons method and a novel graphical display.
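Computing such a contrast is simple. The sketch below (with hypothetical coefficients, not values from the paper) forms the difference between two estimates and its standard error; note that the covariance term defaults to zero, which is appropriate only if the two estimates are independent, so in practice it should come from the model's variance-covariance matrix:

```python
import math

def contrast(b1, se1, b2, se2, cov=0.0):
    """Point estimate and standard error of the difference b1 - b2.

    cov is the sampling covariance of the two estimates; zero assumes
    independence, so take it from the model's vcov matrix in practice.
    """
    diff = b1 - b2
    se = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * cov)
    return diff, se

# hypothetical country coefficients (illustrative values only)
diff, se = contrast(0.50, 0.12, 0.20, 0.15)
ci = (diff - 1.96 * se, diff + 1.96 * se)
```

In this illustration the difference is 0.30 but its 95 percent interval crosses zero, a reminder that two individually significant coefficients need not differ significantly from each other.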
Plotting Multiple Models in a Single Plot
As noted earlier, in our survey of political science tables we found that more often than not researchers present multiple models. Graphing multiple models presents new challenges, as we must be sure that a reader can distinguish the parameter estimates and confidence intervals from each model and that the differences between models in terms of which variables are included are visually apparent.
We begin with the case of two regression models. Table 1 of Pekkanen et al. (replicated in our table 8) displays two logistic regression models that examine the allocation of posts in the LDP party in Japan.41
Pekkanen et al. 2006.
Pekkanen, Nyblade and Krauss (2006), table 1: Logit analysis of electoral incentives and LDP post allocation (1996–2003)


Using parallel dot plots with error bars to present two regression models.
Table 1 from Pekkanen et al. 2006 displays two logistic regression models that examine the allocation of posts in the LDP party in Japan. We turn the table into a graph, and present the two models by plotting parallel lines for each of them grouped by coefficients. We differentiate the models by plotting different symbols for the point estimates: filled (black) circles for Model 1 and empty (white) circles for Model 2.
By plotting the two models together, we can easily compare coefficients both within each model and across the two models. The fact that only single estimates appear for PR Only, Costa Rican in PR, and Vote share margin signals that these predictors enter only one model. Rather than having to compare individual parameter estimates (and the presence or absence of asterisks) across models in a regression table, a vertical scan of the graph shows that the estimates are mostly robust across models, as the parameter estimates and their respective confidence intervals vary little. Finally, plotting the coefficients for the term indicators shows that LDP members in their first term are much less likely to receive leadership posts, all else equal, than members in their third term, an intuitive result that does not necessarily stand out in the crowded regression table.
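The parallel-lines device can be sketched by vertically "dodging" each model's estimates and varying the marker fill. The code below uses invented coefficients and labels (illustrative only, not the Pekkanen et al. results), with `None` marking a predictor absent from a model:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical (estimate, se) pairs for two specifications;
# None marks a predictor that does not enter that model.
labels = ["1st term", "2nd term", "4th term", "PR Only", "Vote share margin"]
model1 = [(-1.9, 0.30), (-0.4, 0.25), (0.20, 0.25), (0.8, 0.40), None]
model2 = [(-1.8, 0.30), (-0.35, 0.25), (0.25, 0.25), None, (0.1, 0.05)]

fig, ax = plt.subplots(figsize=(5, 3))
offset = 0.15  # vertical dodge so the two models' intervals stay distinct
for est, dy, fill, name in [(model1, -offset, "black", "Model 1"),
                            (model2, offset, "white", "Model 2")]:
    ys = [i + dy for i, e in enumerate(est) if e is not None]
    bs = [e[0] for e in est if e is not None]
    ss = [e[1] for e in est if e is not None]
    ax.errorbar(bs, ys, xerr=[1.96 * s for s in ss], fmt="o",
                color="black", markerfacecolor=fill, label=name, capsize=2)
ax.axvline(0, linestyle="--", linewidth=0.8)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.invert_yaxis()
ax.legend(frameon=False)
fig.tight_layout()
fig.savefig("twomodels.png")
```

Filled versus empty circles carry the model distinction without a separate column of numbers, and a predictor present in only one model simply shows a single estimate at its label.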
Using Multiple Plots to Present Multiple Models
As the number of models increases, we suggest a different strategy. Instead of presenting all the models and predictors in a single plot, we use “small multiple” plots to present the results for each predictor separately. The main objective when presenting results from multiple models is to explore how inferences change as we change the model specification, use different subsamples, or both. Using multiple dot plots with error bars allows the researcher to communicate the information in a multi-model regression table with much greater clarity and makes it much easier to compare parameter estimates across models. This strategy also overcomes any problems that might arise from predictors with greatly varying scales, since the coefficients of each independent variable are presented in a separate plot.
As an example, we return to table 4 of Ansolabehere and Konisky (see table 5 above), turning it into the graph presented in figure 8.42
Ansolabehere and Konisky 2006. This plot was inspired by the plot.bugs function available in Sturtz et al. 2005, where it serves a different purpose (convergence diagnostics).

Using “small multiple” plots to present regression results from several models.
Table 4 from Ansolabehere and Konisky 2006 presents regression results from six separate models. We turn the table into a graph that displays a single plot for each predictor, varying across models. The models vary with respect to sample (full sample vs. excluding counties with partial registration) and predictors (with/without state/year dummies and with/without law change). On the x-axis we group each type of model: “full sample,” “excluding counties with partial registration,” and “full sample with state/year dummies.” Within each type, two different regressions are presented: one including the dummy variable law change and one excluding it. For each type, we plot two point estimates and intervals, using solid circles for the models in which law change is included and empty circles for the models in which it is not.
We again choose not to graph the estimates for the constants because they are not substantively meaningful.
This graphing strategy allows us to easily compare point estimates and confidence intervals across models. Although in all the specified models the percent of county with registration predictor is statistically significant at the 95 percent level, it is clear from the graph that estimates from the full sample with state/year dummies models are significantly different from the other four models. In addition, by putting zero at the center of the graph, it becomes obvious which estimates have opposite signs depending on the specification (log population and log median family income). By contrast, it is much more difficult to spot these changes in signs in the original table. Thus, by using a graph it is easy to visually assess the robustness of each predictor—both in terms of its magnitude and confidence interval—simply by scanning across each panel. In summary, the graph appropriately highlights the instability in the estimates depending on the choice of model.
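A small-multiple layout of this kind amounts to one panel per predictor, with the model specifications arrayed on each panel's x-axis. The sketch below simplifies to one specification per sample type and uses invented numbers and labels (illustrative only, not the Ansolabehere and Konisky estimates):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical results: for each predictor, one (estimate, se) per model
# (illustrative numbers only)
results = {
    "% county w/ registration": [(-0.05, 0.01), (-0.04, 0.01), (-0.10, 0.02)],
    "Log population":           [(0.02, 0.03), (-0.01, 0.03), (0.04, 0.02)],
    "Log median income":        [(0.03, 0.02), (0.05, 0.02), (-0.02, 0.02)],
}
model_names = ["Full sample", "Excl. partial reg.", "State/year dummies"]

fig, axes = plt.subplots(1, len(results), figsize=(9, 2.5))
for ax, (predictor, ests) in zip(axes, results.items()):
    xs = range(len(ests))
    ax.errorbar(xs, [b for b, _ in ests],
                yerr=[1.96 * s for _, s in ests],
                fmt="o", color="black", capsize=3)
    ax.axhline(0, linestyle="--", linewidth=0.8)  # zero reference line
    ax.set_title(predictor, fontsize=9)
    ax.set_xticks(list(xs))
    ax.set_xticklabels(model_names, rotation=45, ha="right", fontsize=7)
fig.tight_layout()
fig.savefig("smallmultiples.png")
```

Because each predictor gets its own panel and scale, scanning a row of panels shows at a glance where an estimate's sign or magnitude depends on the specification.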
Conclusion
Political science is a largely empirical discipline, and its quality depends on how well its practitioners can communicate their findings to professional and public audiences. We believe that a turn towards greater use of graphical methods, particularly for regression results, can only increase our ability to communicate successfully.
A potential objection to our approach is that the benefits of graphs in terms of aiding comparisons are outweighed by the corresponding loss of precision in presenting data or results. This objection certainly holds with respect to aiding replication studies, as it is difficult to measure replication against a graph.43
King 1995.
The range of graphs we have presented in this paper is not exhaustive. With respect to descriptive statistics, for instance, we have not discussed scatterplots, which can prove useful in a variety of situations.44
Cleveland and McGill 1984.
See e.g., Gelman et al. 2007.
As our graphical examples and these possibilities indicate, the methods we propose are “more onerous than the methods currently used in political science,” to echo the words of King et al. in their call for researchers to present quantities of interest.46
King et al. 2000, 360.