In 2005, the American Political Science Review published an article that used data on political attitudes from identical and fraternal twins in the United States and Australia to show that genes may play a role in political orientations (Alford, Funk, and Hibbing 2005). This was the first “twin study” published in political science, and a replication study was published shortly thereafter in Behavior Genetics (Hatemi et al. 2007). We followed up these studies of political attitudes with a twin study of political behavior. Using self-reported and validated turnout data from two sources, we showed that patterns of voter turnout and political participation were significantly more similar between identical twins than they were between fraternal twins (Fowler, Baker, and Dawes 2008).
These articles initiated a new field of inquiry that we call “genopolitics,” the study of the genetic basis of political behavior. And although the twin study design used in these early studies has been tested and replicated using alternative methods (Visscher et al. 2006; Yang et al. 2010), twin studies are only a starting point that suggests whether and how much genes matter in general. Ultimately, however, scholars are interested in how genes matter. One way to address this question is to identify specific genes that are associated with a particular trait and so better understand the mechanism by which genes have their effect.
In 2008, we reported the results of the first candidate gene association (CGA) study of political behavior in an article published in the Journal of Politics (Fowler and Dawes 2008). In that study, we used data from the National Longitudinal Study of Adolescent Health (Add Health) to show that voter turnout is significantly associated with variants of the 5HTT and MAOA genes. This was the first study to identify specific genes associated with a political behavior, and it was also the first political science study to test a gene-environment interaction. In a recent issue of the American Political Science Review, Charney and English (2012) (hereafter “CE”) published a general critique of genopolitics, focusing in particular on our 2008 candidate gene study. CE replicated our results from the earlier study and then suggested several alternatively specified models using the same data that we employed.
Here, we offer a reply to this critique. First, we describe candidate gene association tests and their limitations, reemphasizing our initial statement that our findings are only suggestive until they are replicated using independent samples. We then go on to conduct such a replication of our earlier work using new data from Wave IV of Add Health. The results show that the association between turnout and an interaction between 5HTT and church attendance replicates; however, the association with MAOA does not. We then conduct several tests with alternative specifications, showing that the 5HTT result is robust.
Turning to the general critique of genopolitics, we draw on literatures in both political science and genetics to show that many of the critique's arguments are misleading or even incorrect. We then review the potential advantages of genopolitics research and prescribe guidelines for political science journals that are considering whether to publish candidate gene association studies. We conclude on an optimistic note, showing how genopolitics has stimulated an exchange of ideas between biology and politics.
At the outset, we would like to highlight several important points that we discuss later in greater detail. First, many of the criticisms raised by CE were already clearly acknowledged and explained in our original paper. Second, many of the points raised by CE are not specific to candidate gene studies, but are broader critiques of the choices typically made by political scientists doing empirical analysis. Third, we acknowledge that since our study first appeared there has been a growing concern over widespread failed replications of CGA studies (Chabris et al. 2012). These failed replications are likely due to the polygenic architecture of complex political traits, which means there are many genes of small effect that require large samples and/or independent replications to achieve adequate power. In addition, the problem of false-positive results may be more pervasive among candidate gene-environment interaction studies due to the added challenge of determining and measuring environmental moderators (Duncan and Keller 2011). Fourth, unlike CE, we believe that genetic studies, if they are done carefully, can be useful in helping gain a better understanding of political behaviors. Finally, we do not share CE's vision of scientific inquiry. The etiology, genetic or environmental, of any complex behavior is surely complicated, but we do not think most scientists would suggest that this complexity means we should give up this exploration. We advance our understanding of complex systems by studying their parts, with the goal of integrating this knowledge after enough is understood about the basic processes that lie at the heart of any phenomenon.
CANDIDATE GENE ASSOCIATION STUDIES AND THEIR LIMITATIONS
Fowler and Dawes (2008) reported the results of a candidate gene association (CGA) study; it focused on 5HTT and MAOA as candidates because of their role in the serotonin system (Fowler and Dawes 2008, 579):
These two genes transcribe neurochemicals that exert a strong influence on the serotonin system in parts of the brain that regulate fear, trust, and social interaction (Bertolino et al. 2005; Eisenberger et al. 2007; Hariri et al. 2002; Hariri et al. 2005; Heinz et al. 2005; Meyer-Lindenberg et al. 2006). MAOA and 5HTT have been studied for more than 20 years, and much is known about the way different versions of their genes regulate transcription, metabolism, and signal transfers between neurons, all of which have an effect on social interactions (Craig 2007). In particular, the less transcriptionally efficient alleles of these genes have been associated with a variety of antisocial behaviors (Rhee and Waldman 2002).
Given that voting is considered a prosocial activity (Blais and Labbé-St-Vincent 2011; Dawes, Loewen, and Fowler 2011; Edlin, Gelman, and Kaplan 2007; Fowler 2006a; Gerber et al. 2008; Jankowski 2004; Knack 1992), we hypothesized that these genes influence voting. But we were also interested in the possibility that these genes might not act in isolation: “[A]n association between either MAOA and 5HTT and voting may not be direct. Instead, an association between a gene and turnout may be moderated by environmental factors” (Fowler and Dawes 2008, 583).
Although it is possible that there might have been other environmental moderators, we focused on church attendance because of its well-known relationship with voting and because it might have a specific effect on those who prefer to avoid social conflict:
Religious group activity in particular has been singled out as one of the strongest predictors of voter turnout, even more so than socioeconomic status (Olsen 1972, Sallach, Babchuk & Booth 1972). . . . Cassel (1999) suggests that the main reason for the association is that religious groups build a sense of belonging to a larger community. However, it may not be possible to build such a sense in people who are too averse to social conflict, since they will resist appeals to become involved. We therefore hypothesize that MAOA and 5HTT, when interacted with religious group activity, may be significantly associated with turnout. (Fowler and Dawes 2008, 583)
A CGA study tests whether a particular variant of a gene (an “allele”) is found more frequently than can be attributed to chance in a group exhibiting a particular trait compared to those without the trait. In our case, we asked whether the frequency of a particular allele is higher among voters than nonvoters, after noting several limitations of this method:
[A] significant association can mean one of three things: (1) The allele itself influences voting behavior; (2) the allele is in “linkage disequilibrium” with an allele at another locus that influences voting; or (3) the observed association is a false positive signal due to population stratification. (Fowler and Dawes 2008, 584)
“Linkage disequilibrium” refers to the fact that alleles near one another on a strand of DNA may be correlated with one another. Candidate gene studies typically cannot be used to determine whether the causal allele is the one identified or the one in linkage disequilibrium nearby. “Population stratification” occurs when groups have different allele frequencies due to their genetic ancestry. Through the process of natural selection, assortative mating, recombination, ecological adaptation, or genetic drift these groups may develop different frequencies of a particular allele. At the same time, the two groups may also develop divergent behaviors that are influenced not by the allele but completely by the environment in which they live. Once these two groups mix in a larger population, simply comparing the frequency of the allele to the observed behavior would lead to a spurious association.
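A small numerical sketch may make the stratification problem concrete. The counts below are invented purely for illustration (they are not Add Health data), and the group names and the helper `carrier_rate_gap` are hypothetical. Two ancestry groups differ in both allele frequency and turnout, yet within each group carriers and noncarriers vote at the same rate; comparing allele frequencies among voters and nonvoters in the pooled sample nonetheless suggests an association.

```python
# Hypothetical counts (not real data). For each group:
# (carriers who vote, carriers who abstain, noncarriers who vote, noncarriers who abstain)
GROUPS = {
    "group_a": (320, 80, 80, 20),    # 80% carriers, 80% turnout for carriers and noncarriers
    "group_b": (40, 60, 160, 240),   # 20% carriers, 40% turnout for carriers and noncarriers
}

def carrier_rate_gap(carr_vote, carr_abst, non_vote, non_abst):
    """Carrier frequency among voters minus carrier frequency among nonvoters."""
    voters = carr_vote + non_vote
    nonvoters = carr_abst + non_abst
    return carr_vote / voters - carr_abst / nonvoters

# Within each group the allele frequency is the same for voters and nonvoters.
within = {name: carrier_rate_gap(*counts) for name, counts in GROUPS.items()}

# Pooling the groups sums the four cells and creates a gap that neither group has.
pooled_counts = tuple(sum(column) for column in zip(*GROUPS.values()))
pooled = carrier_rate_gap(*pooled_counts)

print(within, pooled)
```

In this constructed example the gap is zero inside each group but positive in the pooled data, which is the spurious pattern described above. Controlling for group membership removes it only when the relevant grouping is actually observed.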
In case-control CGA studies, researchers control for the problem of population stratification by including the race/ethnicity of the subject in the model or by analyzing data from each group separately. But this method does not guard against population stratification within each race or ethnic group, which has been shown to occur even in populations that are thought to be highly homogeneous (Price et al. 2009). Moreover, dividing the sample by group can dramatically reduce the study's power (more on this later). Conversely, family-based CGA studies eliminate the problem of population stratification by using family members, such as parents or siblings, as controls for ancestry. However, a major limitation of family-based studies is that they tend to be even more underpowered (Xu and Shete 2006).
Recent work has provided direct evidence of the long-held assertion that complex traits such as intelligence (Davies et al. 2011) and personality (Vinkhuyzen et al. 2012) exhibit a polygenic architecture, with the heritable variation explained by many genetic variants with small effects. Benjamin et al. (2012b) showed that this is also the case for political attitudes. Combined, these results suggest that, to achieve the necessary power to detect small effects and thus reduce the risk of false-positive results, large samples are necessary. They also suggest that some previous candidate gene studies reporting large effect sizes may have been false positives due to a lack of power. Chabris et al. (2012) were unable to replicate several previously published associations for general intelligence even though they had adequate power to do so.
In Figure 1 we show the relationship between effect size and power for achieving significance at conventional levels (p = 0.05) in candidate gene association studies. The power of a test is the probability of rejecting the null hypothesis when it is false. The red lines show the power to detect effects in a sample of the same size as that reported in Fowler and Dawes (2008), simulated by assuming the same distribution of the dependent variable, independent variables, and their interaction. The yellow lines show the power to detect effects in samples the same size as those reported by CE. Notice that CE's models are relatively underpowered. In contrast, the replication sample we describe later and the combined sample have much greater power. In fact, Figure 1 shows that this new sample is adequately powered to detect effects as small as 0.1 in the MAOA model and 0.2 in the 5HTT interaction model. The main reason for the difference in thresholds is that interaction tests are underpowered relative to tests for main effects (Esarey and Lawrence 2012). What this figure makes clear is that large sample sizes, replication, and meta-analyses are probably necessary to detect the small effects we are seeking, especially in gene-environment interaction models (Duncan and Keller 2011).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128170710-64164-mediumThumb-S0003055413000063_fig1g.jpg?pub-status=live)
FIGURE 1. Power to Detect Genetic Association
Relationship between effect size and power for the sample sizes we previously reported (N = 2,273), those reported by CE (N = 803), the independent Add Health replication sample we use in this article (N = 8,744), and the combined (original and replication) sample (N = 9,821). Left panel shows results for MAOA models, and right panel shows results for 5HTT interaction models. These results show that the replication and combined samples have substantially more power than the original tests. They also show that models tested by CE have very low power, suggesting a high probability of failing to reject the null hypothesis when it is false.
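The qualitative pattern in Figure 1 can be illustrated with a normal-approximation power calculation. This is only a sketch: the curves in the figure were simulated from the observed variable distributions, whereas the code below assumes that the standard error shrinks as `unit_sd / sqrt(n)`, where `unit_sd` is an arbitrary placeholder rather than an estimate from the data. The function names are likewise hypothetical.

```python
from statistics import NormalDist

def approx_power(effect, std_error, alpha=0.05):
    """Power of a two-sided z-test when the estimate is ~ Normal(effect, std_error**2)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / std_error
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def std_error_for(n, unit_sd=6.0):
    """Assumed scaling (placeholder): standard error = unit_sd / sqrt(n)."""
    return unit_sd / n ** 0.5

# Sample sizes taken from the Figure 1 caption.
SAMPLE_SIZES = {"CE": 803, "original": 2273, "replication": 8744, "combined": 9821}

power_at_0_2 = {
    name: approx_power(0.2, std_error_for(n)) for name, n in SAMPLE_SIZES.items()
}
print(power_at_0_2)
```

Whatever value is chosen for `unit_sd`, the ordering is the same: for a fixed effect size, power rises with sample size, so the smallest sample is the least likely to detect a true effect.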
REPLICATION
In our previous work, we wrote that “[a]ssociation studies like ours require further replication before their findings can be truly considered anything more than suggestive; therefore more work needs to be done in order to verify and better understand the specific associations we have identified” (Fowler and Dawes 2008, 590). Fortunately, we recently gained access to data that allow us to conduct a replication of our 2008 findings. In 2012, Add Health released a set of genotype information for an expanded number of subjects in Wave IV who had not previously been genotyped in Wave III; see Fowler and Dawes (2008) or http://www.cpc.unc.edu/projects/addhealth/ for more details about the Add Health study. These new data included 2,297 individuals who were previously genotyped for MAOA and 9,311 who were not, as well as 2,314 individuals who were previously genotyped for 5HTT and 9,410 individuals who were not. Restricting our analysis to those who were genotyped in Wave IV but not in Wave III gave us the opportunity to conduct an out-of-sample test of the significant associations previously reported between these genotypes and voting behavior. Table 1 presents summary statistics for the combined sample. Note that the incidence of genotypes is very similar between the smaller Wave III sample and the larger Wave IV sample; the correlation in genotype for those who were genotyped twice is 0.98 for MAOA and 0.99 for 5HTT (see Smolen et al. 2012, for full details on genotyping the new sample). All other variables were measured exactly as reported in Fowler and Dawes (2008), except where noted later.
TABLE 1. Summary Statistics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128170710-01255-mediumThumb-S0003055413000063_tab1.jpg?pub-status=live)
Note: All summary statistics are based on the combined sample except those denoting Wave III or IV.
Using the same mixed-effects model reported in Fowler and Dawes (2008) on individuals who were not previously genotyped, Table 2 shows that the association between self-reported voting and the interaction of 5HTT and church attendance remained significant in the new sample (that is, excluding all individuals who were in the previously reported model). Church attendees with at least one “long” allele were 21% (SE 7%) more likely to vote than those with two “short” alleles. Moreover, the association replicated in a model that included only white subjects. Figure 2 shows that this interaction is apparent in the raw data as well.
TABLE 2. Association of 5HTT and MAOA with Voter Turnout in a New Sample
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128170710-32740-mediumThumb-S0003055413000063_tab2.jpg?pub-status=live)
Note: Mixed-effects logistic regression models of voter turnout (1 = voted in 2000 election) using newly released subjects from Wave IV of the Add Health data not analyzed in Fowler and Dawes (2008). Cohen's d for Long * Attendance is 0.10 in model 1 and 0.09 in model 2.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170128170710-70021-mediumThumb-S0003055413000063_fig2g.jpg?pub-status=live)
FIGURE 2. Distribution of Voter Turnout by 5HTTLPR and Church Attendance Based on Combined Add Health Sample Data
Vertical bars indicate +/-1 standard error of the mean.
In contrast, Table 2 also shows that the association with MAOA did not replicate, and contrary to the originally reported association, it even produced a slightly negative coefficient. This result is unlikely to be due to low power – as mentioned earlier, Figure 1 shows that we have a sample size sufficient to detect effects as small as 0.1. We therefore suspect that the original MAOA result was a false positive.
Although we believe it would be sufficient to stop here and simply report the results in Table 2, in the remainder of this section on replication we address CE's methodological concerns with additional models that test the robustness of this replication and the overall relationship among 5HTT, church attendance, and voting.
Dealing with Within-Family Correlation
CE question our use of mixed-effect models to cope with multiple observations from the same family. The key focus of this critique is that “‘families’ in the Add Health dataset segregate into different ‘types’ whose members differ systematically” (9) and that this likely biases our result. Instead, they propose randomly choosing one sibling from each family to create an independent sample.
Although their recommendation is likely far too conservative (mixed-effect models are used throughout the social sciences), in the interest of testing whether our replication is sensitive to the choice of modeling framework we created a randomly chosen, fully independent sample with 8,744 unrelated individuals (in other words, we took only one subject from each family) and used a generalized linear model to estimate the association between 5HTT and the Add Health measure of self-reported turnout in the 2000 general election. Table S1 (Tables S1–S7 are in the Online Appendix) shows that this more conservative approach also yielded a significant association.¹ Importantly, none of the 8,744 subjects in this model were included in the original study because we did not have genotype information for them (they were genotyped in Wave IV but not in Wave III). Moreover, none came from households with oversampled twins or siblings, suggesting that CE's conjecture that oversampling would bias the results is incorrect.
In Table S2, we combined the previously genotyped and newly genotyped samples to generate an estimate across both samples; however, we continued to use only one individual per family (N = 9,821). The results show that the estimated effect increased slightly, and the p value on the association was 5 × 10⁻³. Note that the 95% confidence interval for the beta coefficient on the 5HTT interaction in the original study was [0.13 to 0.79] and in the replication study it was [0.06 to 0.32]. Although this suggests that the original sample was underpowered to detect an effect of the size estimated in the replication sample (0.45 > 0.32), it does not invalidate the replication: In both samples, the association was positive and significantly different from zero, and a test of the difference in the coefficients produced by both samples was not quite significant (p = 0.07). Given the much larger sample size in the replication study, however, we suspect that the true coefficient is closer to the one we reported in the replication, which is confirmed by the model reporting results on the combined independent sample, suggesting a confidence interval of [0.07 to 0.33].
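For readers who want to see the mechanics of comparing coefficients from two independent samples, the sketch below backs out approximate point estimates and standard errors from the two confidence intervals quoted above and forms a z statistic for their difference. It assumes symmetric normal-theory intervals and independent samples, and it works from rounded endpoints, so it is an illustration of the technique rather than a reconstruction of the test reported in the text; the function names are hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # critical value for a two-sided 95% interval

def estimate_and_se(ci_low, ci_high):
    """Midpoint and standard error implied by a symmetric 95% confidence interval."""
    return (ci_low + ci_high) / 2, (ci_high - ci_low) / (2 * Z95)

def difference_z(ci_a, ci_b):
    """z statistic for the difference between two independent estimates."""
    b_a, se_a = estimate_and_se(*ci_a)
    b_b, se_b = estimate_and_se(*ci_b)
    return (b_a - b_b) / (se_a ** 2 + se_b ** 2) ** 0.5

# Intervals quoted in the text: original study, then replication study.
z = difference_z((0.13, 0.79), (0.06, 0.32))
two_sided_p = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(two_sided_p, 3))
```

Because the inputs are rounded and the interval construction is assumed, the resulting p value need not match the one reported from the fitted models.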
Genotype Specification
CE assert that the results we originally presented are sensitive to specification of the 5HTT variable. They claim that rather than coding the genotype as a 1 if a person had at least one long allele (combining the LL and Ls genotypes into one group), we should have coded it as 1 if a person had two long alleles (including only those with the LL genotype). In Table S3 we present results for alternative genotype coding criteria by including separate variables for both the LL and Ls genotypes. We found that the effect size of the Ls genotype is about half that of the LL genotype, suggesting an additive process that increases with the number of L alleles. In fact, an additive model of 5HTT produces a better fit (as evidenced by a lower AIC) than a model that includes the Ls and LL genotypes separately (and both models have a lower AIC than a baseline model that omits the genotype variables).
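The three coding schemes at issue can be written out explicitly. The sketch below is illustrative only: the labels "dominant," "recessive," and "additive" and the function names are ours for this example, and the string representation of genotypes ("LL", "Ls", "ss") is an assumption, not the Add Health file format. The `aic` helper states the standard definition used to compare model fit.

```python
def code_5htt(genotype):
    """Return three alternative codings of a 5HTT genotype given as 'LL', 'Ls', or 'ss'."""
    long_count = genotype.count("L")
    return {
        "dominant": int(long_count >= 1),   # 1 if at least one long allele (LL or Ls)
        "recessive": int(long_count == 2),  # 1 only if two long alleles (LL)
        "additive": long_count,             # number of long alleles: 0, 1, or 2
    }

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood

for g in ("LL", "Ls", "ss"):
    print(g, code_5htt(g))
```

Under the additive coding, a single coefficient implies that the Ls effect is half the LL effect, which is why an Ls estimate of about half the LL estimate favors that specification.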
Splitting the Sample
Throughout their article CE argue against our findings by analyzing subsets of the overall sample and demonstrating a lack of statistical significance. However, it is important to consider that the failure to maintain a significant association found based on the total sample may simply be the result of a dramatic reduction in sample size. More concretely, any random culling of the data by X% will increase the standard error by a factor of approximately
$\sqrt{100/(100-X)}$
and, as noted earlier and demonstrated in Figure 1, will cause a large reduction in power. A more appropriate way to determine whether pooling across subsamples is justified is to test whether the effect size varies by group. An easy way to do this is to interact the variable of interest with an indicator variable for the subsample in question.
In their first argument that the sample should be split, CE assert that individuals voting in their first election (ages 18–20) or their second election (ages 20–22) should not be pooled with older individuals. We tested the assertion that the association varies by age group by interacting age-group indicators with the 5HTT-Attendance interaction. Table S4 shows that we could not reject the null hypothesis of no difference in the interaction effect for 18–20 year olds (p = 0.17) or for 20–22 year olds (p = 0.21).
CE also assert that nonwhite individuals should not be pooled with whites because the effect for the whole population might be driven by an effect that only exists for a particular subgroup. We tested the assertion that the effect size varies by race or ethnicity by interacting race and ethnicity indicators with the 5HTT-Attendance interaction. Table S5 shows that we could not reject the null hypothesis of no difference in the interaction effect for blacks (p = 0.49), Hispanics (p = 0.72), and Asians (p = 0.45). The interaction with Native Americans is significant (p = 0.02), but is positive, and the main interaction continues to be strongly significant (p = 2 × 10⁻³), suggesting that Native Americans are not driving the overall association.
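The inflation factor follows from the fact that standard errors scale with the inverse square root of the sample size. A minimal sketch, with hypothetical function names, evaluates it for a 75% cull and for a reduction from the original sample size (N = 2,273) to the size CE report (N = 803); treating the latter as a random subset of the former is a simplification made only for illustration.

```python
import math

def se_inflation(x_percent):
    """Factor by which the standard error grows after randomly dropping x_percent of the data."""
    return math.sqrt(100 / (100 - x_percent))

def percent_removed(n_full, n_subset):
    """Percentage of observations removed when going from n_full to n_subset."""
    return 100 * (1 - n_subset / n_full)

# Dropping three quarters of the data doubles the standard error.
print(se_inflation(75))

# Illustrative comparison using the sample sizes from the Figure 1 caption.
x = percent_removed(2273, 803)
factor = se_inflation(x)
print(round(x, 1), round(factor, 2))
```

The second calculation is equivalent to taking the square root of the ratio of the two sample sizes.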
Alternative Markers for 5HTT
In addition to studying the original 5HTT measure, we used a newly available classification from the recently released Add Health data. In the previous release, 5HTT was classified according to whether individuals had a “short” or a “long” version of the 5HTTLPR allele in the promoter region. In the most recent release, a single nucleotide polymorphism (rs25531) was genotyped that has been shown to interact with 5HTTLPR to affect gene expression levels (Hu et al. 2006). Specifically, only those with the “long” genotype and adenine (A) at rs25531 evinced higher expression of 5HTT. We used this “triallelic” genotype to test the association among 5HTT, church attendance, and voting. In Table S6 we show that this alternative classification also yielded a significant interaction with church attendance (p = 1 × 10⁻³).
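The classification rule just described can be stated compactly. In the sketch below, each allele is represented as a hypothetical pair of a length label ("L" or "s") and the base observed at rs25531; this encoding and the function names are assumptions for illustration and do not describe how the Add Health release stores the data.

```python
def is_high_expressing(length_allele, rs25531_base):
    """True only for a long allele that carries adenine (A) at rs25531."""
    return length_allele == "L" and rs25531_base == "A"

def count_high_expressing(alleles):
    """Count high-expressing alleles among a person's two (length, base) pairs."""
    return sum(is_high_expressing(length, base) for length, base in alleles)

# Hypothetical individuals: one long/A and one long/G allele; two short alleles.
print(count_high_expressing([("L", "A"), ("L", "G")]))
print(count_high_expressing([("s", "A"), ("s", "A")]))
```

Under this rule a long allele with a base other than A is grouped with the short alleles, which is what distinguishes the triallelic classification from the simple short/long coding.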
Alternative Phenotypes
CE assert that voting in a single election is not sufficient to measure voting behavior. We have more to say about the broader issues related to this critique in the next section, but we were able to take advantage of new data to test CE's assertion. Although Add Health does not have additional measures covering voting behavior in other elections, Wave IV does contain a question tapping one's overall propensity to vote: “How often do you usually vote in local or statewide elections?” Four responses were permitted for this question: never, sometimes, often, and always. This question is not strictly comparable to the question we previously used as a measure of voting because the Wave III question is about a national election held in 2000 and the Wave IV question is about state and local elections held through 2008. Nonetheless, the polychoric correlation between these two measured variables was positive and highly significant (0.55, 95% CI 0.53 to 0.57).
To test whether the association with 5HTT is sensitive to phenotype specification, we regressed the Wave IV voting measure on the same independent variables shown in our other models. Table S7 shows that the interaction between the genotype and church attendance remained significant (p = 0.02). This model assumes that the categorical variable is continuous, so to be sure this assumption is not driving the result, we also estimated an ordered logit specification. The interaction between the genotype and church attendance was also significant in this alternative model (p = 0.03).
There are important limitations to note regarding our replication exercise. Although our analysis is based on a much larger sample than previously employed and is therefore more likely to be adequately powered, the previous track record of candidate gene-environment interaction studies suggests that our results should be evaluated cautiously. In addition, we still run the risk of finding false positives due to population stratification. Even though we compared estimates across groups and restricted our analysis to whites, we lacked additional genetic information to control for unobserved population structure. This information could be provided using genome-wide association study (GWAS) samples once they become available. Another alternative is to conduct a replication based on a non-U.S. sample in which the potential confounding relationship between allele frequency and the environmental determinants of voting behavior is unlikely to follow the same pattern. Finally, it should be pointed out that a previously reported association between depression and an interaction of 5HTT and stressful life events has not been reproduced by follow-up studies (Duncan and Keller 2011; Risch et al. 2009). Although we conducted a direct replication (Duncan and Keller 2011) of a relationship among 5HTT, church attendance, and voter turnout, the hypothesis that the effect of 5HTT is moderated by stressful events has clearly been undermined by failed replications.
BROADER ISSUES IN THE STUDY OF VOTER BEHAVIOR
CE make several arguments that suggest it is impossible to measure voter behavior and that we either inadequately defined our measures or we were doing something that was inconsistent with prior literature. In this section we address these critiques with the goal of documenting that the choices we made and the limitations we originally highlighted are consistent with standard practices in the discipline.
For example, CE argue that “FD distinguish their cases—‘voters’—from controls, ‘nonvoters,’ on the basis of their ‘voting behavior,’ but do not define these terms” (3). Yet we clearly stated that our measure of voting behavior is the answer to this question: “Did you vote in the most recent [2000] presidential election?” (Fowler and Dawes 2008, 584). In fact, CE themselves even reference our definition when they assert, “It is apparent that responses to the question, ‘Did you vote in the most recent presidential election?,’ do not provide information concerning what is usually intended by the expression ‘voting behavior’” (3). They also complain that “FD provide no further specification of this new phenotype” (4) and call it “underspecified” (5), suggesting that we were somehow doing things differently than other scholars had in the past.
Their assertions mischaracterize the literature on voting. Self-reported turnout has been used as a measure of voting behavior in dozens (perhaps hundreds) of published studies (see, for example, Bowler and Donovan 2002; Goldstein and Freedman 2002; Jackson and Carsey 2007; Kan and Yang 2001; Koch 1998; Timpone 1998). In fact, the National Election Study has asked this question at each election since 1948 precisely with the hope that it would be used in research on voting behavior.
Relatedly, CE complain that we “dismiss” the problem of overreporting: the tendency for some survey respondents who did not vote to say that they did (5). However, we were not dismissive. In the 2008 article, we wrote,
[I]t would be preferable to have information about validated turnout because of the well-known problem of over-reporting—many people who say they voted actually did not (Karp and Brockington 2005). However, Fowler, Baker, and Dawes (2008) show that a substantial genetic component exists for both validated and self-reported turnout, and they do not find a statistically meaningful difference in the size of the component for the different measures. (Fowler and Dawes 2008, 584)
In fact, in other work we have often relied on validated data as an additional measure of voting behavior (e.g., Bond et al. 2012; Fowler 2005; 2006b). However, we also recognize that validated turnout data are not a panacea. A recent comparison of the two measures concluded, “Using government records in lieu of self-reports, which can be both time-consuming and expensive, appears to inject more error than accuracy into measurements of registration and turnout” (Berent, Krosnick, and Lupia 2011, 72). Thus, we have usually sought to do studies using both validated and self-report measures when they are available (see Fowler, Baker, and Dawes 2008, for an example).
Noting that “inherent imprecision is a natural feature of human language” (5), CE point out that voting is affected by context—in countries where voting is mandatory, dangerous, or irrelevant, people may behave differently than they do in the United States. Yet they draw from this feature the inference that voting is “a different behavior” (5) in each of these circumstances. The goal of CE seems to be to make measurement impossible by persuading the reader that all acts of voting are unique and that we cannot even be sure that we know what it means when someone says “I voted” in a specific context.
Although it is true that semantics makes quantitative evaluation of behavior difficult, systematic application of well-described measures can help us overcome the problem of imprecise language. In our case, we relied on a replicable measure that has been used for several decades. In fact, it is so replicable that CE themselves were able to reproduce our original analysis. We therefore can imagine other scholars testing our hypotheses in different contexts using the same instrument. If other studies produce different results, one possible explanation is that context matters.
CE conclude their argument: “As commonly intended, however, the expression ‘voting behavior’ refers to a quantitative variable (i.e., one votes more or less frequently)” (5). In this statement, not only do CE misconstrue the literature on voting but they also mischaracterize what a quantitative variable is. A yes/no measure is still a “quantity” measured by 1 and 0, and it can be used to infer a latent continuous variable that indicates the probability of taking a certain action. In fact, this is exactly the assumption underlying the replication models they conducted. In logistic regression, the assumption is that there is an unobserved value p that indicates the probability of an event (in this case, a vote) that is a function of several independent variables, and the goal is to fit parameters that define this p such that it maximizes the likelihood of the observed distribution of events (e.g., vote choices; King 1998).
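The logic of the preceding paragraph can be illustrated with a minimal sketch. It is not the replication code used in either study: the data are simulated, there is a single hypothetical predictor, and all names (`fit_logit`, `TRUE_B0`, `TRUE_B1`) are assumptions made for illustration. The sketch shows how a 0/1 outcome is used to recover the parameters of a latent probability p by maximizing the likelihood.

```python
import math
import random


def sigmoid(z):
    """Map a linear predictor onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of the observed 0/1 outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)  # latent probability of the event
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logit(xs, ys, n_iter=25):
    """Newton-Raphson fit of a one-predictor logit; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data: hypothetical parameter values, chosen only for illustration.
random.seed(0)
TRUE_B0, TRUE_B1 = -0.5, 1.0
xs = [random.gauss(0.0, 1.0) for _ in range(4000)]
ys = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logit(xs, ys)
print(f"estimated intercept {b0_hat:.2f}, estimated slope {b1_hat:.2f}")
```

Although every observation is a yes/no value, the fitted parameters describe a continuous probability, which is the sense in which a binary turnout measure remains a quantitative variable.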
BROADER ISSUES IN GENETICS
CE make several key points in their discussion of broader issues in genetics that we believe should be directly addressed because they are misleading or even incorrect.
Findings from Medical Genetics
CE argue that “the search for genes that could predict prevalent and devastating behavioral phenotypes such as schizophrenia and autism, not to mention global killers such as diabetes and hypertension, has to date been unsuccessful” (2). They go on to identify just three single-nucleotide polymorphisms (SNPs) as “the few reliably reproduced associations between an [sic] SNP and a given phenotype of such a magnitude that the SNP can be called a risk factor” (13), but this argument is highly misleading. For example, a recent review of the literature shows that since 2007, 40 genetic loci have been discovered for type I diabetes and 50 have been discovered for type II diabetes (Visscher et al. 2012). These loci are among 384 SNPs associated with autoimmune and metabolic diseases with great enough significance in large samples that they are considered to be reliable associations. In fact, enough progress has been made that “it should be possible to identify groups of individuals who are at a substantially greater-than-average risk for diabetes” (Visscher et al. 2012, 17). Genetic susceptibility scores have already been created and are being applied for other complex phenotypes such as body mass index (Speliotes et al. 2010). We do not claim that medical genetics has a complete understanding of complex diseases or that prediction at the individual level is a viable reality yet; however, neither is it accurate to summarily dismiss what has been accomplished so far.
The Current “Paradigm” in Genetics
CE argue that “CGA studies of complex behavioral traits (such as political behaviors) rely on an outdated genetic paradigm” (2). However, the journal Behavior Genetics recently updated its guidelines on the publication of candidate gene studies (Hewitt 2012) to reflect the latest research in genetics, and it continues to publish CGA studies. Moreover, these studies are still viewed as beneficial, even by critics. The authors of a recent highly critical review of candidate gene studies write that they “may well prove to be important or even central for understanding the etiology of psychiatric disorders. At issue is how to separate the wheat from the chaff” (Duncan and Keller 2011, 1047–48). Thus the “paradigm” is not at issue; it is the methodology that proves challenging. The solution is not to abandon candidate gene studies, but rather to implement guidelines for their publication that reduce the likelihood of false positives.
Gene Transcription and Stability
It is important to understand that what takes place at the molecular level is complex. However, CE's description of this complexity is highly misleading. For example, they write, “Once considered the paragon of stability, DNA is subject to all manner of transformation. For example, retrotransposons or ‘jumping genes’ comprise 45% of the human genome, move about the genome by a copy-and-paste mechanism changing DNA content and structure” (10–11). Yet consider that the rate of mutation per generation per site in human DNA is estimated to be about 1 in 100 million (Lynch 2010). This means that all but a few dozen of the 3 billion base pairs in an individual's DNA will be exactly the same throughout that individual's reproductive lifetime. Thus, by and large, the genes we are born with are the genes we will die with. It is true that the environment plays a critical role in gene expression, myelination, and other biological processes that influence the way genes affect us, but it is extremely disingenuous to argue that these measures of our biological inheritance are unstable.
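The arithmetic behind the phrase “a few dozen” can be written out explicitly. The symbols are introduced here only for illustration: μ is the per-site, per-generation mutation rate cited above, and L is the approximate number of base pairs in one copy of the genome.

```latex
\[
\mu L \;\approx\; 10^{-8} \times \left(3 \times 10^{9}\right) \;=\; 30
\quad \text{expected new mutations per genome copy per generation},
\]
\[
\frac{30}{3 \times 10^{9}} \;=\; 10^{-8} \;=\; 0.000001\% \ \text{of sites changed}.
\]
```

Counting both parental copies of the genome roughly doubles the expected count, which still amounts to only a few dozen sites out of billions.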
CE further assert, “Genes do not regulate the extent to which they are capable of being transcribed in any obvious, unidirectional manner” (10). This claim is directly contradicted by evidence from more than 5,000 genes that shows that transcription explains nearly 40% of the variation in levels of expression in mammals (Schwanhäusser et al. 2011). Although this evidence also suggests an important role for the environment, it shows that heritable DNA sequence variation does have an important impact on expression. In particular, a recent review of the literature on the 5HTTLPR polymorphism shows that the transcriptional efficiency of the “long” version of this genotype is associated with significantly higher expression than the shorter allele in both mice and humans (Murphy and Lesch 2008). Causal evidence of the gene's effect on expression is especially strong in mouse studies, because many of them “knock out” (delete) or modify the gene in question (Murphy and Lesch 2008).
This does not mean that gene-gene and gene-environment effects on transcription are absent, and recent research is exploring a variety of factors that modify gene expression (Cheung and Spielman 2009), including cell type (Dimas et al. 2009). However, it is incorrect to imply that it is impossible to detect a relationship between genotypes and transcription levels. Given that we do not know what many genes do, the best place to start is to study genotypes that we know have a measurable effect on the molecules they transcribe.
Pleiotropy
CE list many phenotypes that have been associated with variation in 5HTT and MAOA and ask, “How is it possible that the same polymorphisms of the same gene could simultaneously predict (or be risk factors for) so many different phenotypes?” (13). Genes playing a role in important hormonal systems that regulate social behavior are likely to influence a wide variety of outcomes, a phenomenon known as “pleiotropy” (Lande 1980). Depression and personality factors are two possible mediating mechanisms, but there may be many others.
CE question the idea of an underlying factor that may cause several of the phenotypic outcomes (an “endophenotype”). However, they neglect to point out that systems such as the serotonin system influence a wide variety of behaviors via different mechanisms (Murphy and Lesch 2008). This means that a single genotype may influence several endophenotypes, each of which is associated with particular phenotypes; there is strong evidence that this is true for 5HTT. Knock-out studies in mice where the 5HTT gene is completely removed show that a large number of phenotypes are affected, and the results are so well accepted that pleiotropy in 5HTT has been proposed as a reason for evolutionary conservation of diversity in this gene (Murphy and Lesch 2008).
Consider a genotype we have already been studying for decades: sex. A wide range of political phenomena are influenced by a person's sex (Hatemi et al. 2012), and although we do not claim that these effects are all purely genetic, neither do we doubt a reported association between sex and one political behavior merely because it also predicts other political behaviors. Sex can influence specific hormonal responses, but it can also induce gender-specific roles in the self and others (Hatemi et al. 2012). Each of these could be considered an endophenotype with many phenotypic consequences.
Another important consideration is that many of the phenotypes listed by CE have not been replicated, so it may be premature to ask what such different phenotypes have in common with one another. For example, our replication efforts here suggest that voting should be removed from the list of those phenotypes associated with MAOA. At the same time, however, our confidence that 5HTTLPR plays a role is now increased by successful replication, so (for now) it belongs on the list.
CE continue: “Consider that the polymorphisms of MAOA-μVNTR and 5-HTTLPR are the only polymorphisms for these two genes for which there are data in a large data set such as Add Health and that MAOA and 5-HTT are two out of the eight genes in total (emphasis in original) for which there are genotypic data. What is the likelihood that these same polymorphisms on these same genes will conveniently turn out to be the genetic key to so much human behavior?” (13). This question is misleading in two ways. First, we never called these genes “the key to human behavior.” Consider what we actually wrote:
It is important to emphasize that there is likely no single “voting gene”—the results presented here suggest that at least two genes do matter and there is some (likely large) set of genes whose expression, in combination with environmental factors, influences political participation. (Fowler and Dawes 2008, 590)
Second, the investigators of the National Longitudinal Study of Adolescent Health did not choose their genes randomly—they focused on genes that had already shown promise in explaining social behavior and that were linked to behavioral and psychiatric outcomes. According to the study researchers,
Saliva samples were collected from full siblings or twins to genotype DNA for seven candidate polymorphisms. These candidate genes have been reported to be associated with individual differences in behavior related to mental health; are reported to be functional, exonic, in promoter regions, or affect gene expression; are expressed in the brain; and have prima facie involvement in neurotransmission. (Smolen and Hewitt 2003, 42–43)
It should thus not be surprising that biomarkers for the genes they chose have an effect on many social behaviors, or that we would have a better-than-random chance of finding more than one association among them.
Stochastic vs. Deterministic Outcomes
CE write, “The problem is that a large set of genes (and if large enough, then we are simply talking about the human genome and hence the human organism), the transcriptional activity of which is influenced by the environment and each others’ functional products, is incompatible with the expectation that two genes could predict voter turnout” (30). This is a disingenuous critique. We clearly acknowledged that there are multiple genetic and environmental causal factors that underlie turnout: “There is some (likely large) set of genes whose expression, in combination with environmental factors, influences political participation” (Fowler and Dawes 2008, 590).
Our usage of the word “predict” in our original article was meant to convey that there is a nonrandom, systematic relationship between at least two (and probably very many more) genes and voter turnout. In this article we show that one of these relationships replicates and one does not. We therefore conclude that one of these genes “predicts” turnout, even though we in no way believe and have never argued that it is the only such gene. We were very careful in 2008 to make this clear: “[I]t is important to emphasize that there is likely no single ‘voting gene’” (Fowler and Dawes 2008, 590). In fact, we deliberately (emphasis added) chose to publish associations with two genes in one paper (rather than two separate papers) to emphasize this point.
CE go on to argue that “CGA studies depend on the assumption that the presence of a particular allele entails that it is turned on. . . . We can no longer assume, however, that the presence of a particular allele entails that it is capable of being transcribed in the manner associated with that allele, because it may be epigenetically silenced” (10). Again, this argument demonstrates a profound misunderstanding of scientific inference. Although it is true that gene expression is variable and it can be sensitive to the environment and other genotypes, this does not mean that the gene plays no role in influencing a particular behavior. Nor does it preclude studies that take this stochasticity into account by testing the hypothesis with many observations and under many conditions.
Polygenic Effects
CE emphasize the multifactorial nature of genetic effects, writing that “most human traits with a genetic component are influenced by a vast number of genes of small effect” (11) and that “[r]esearchers who conduct CGA studies err when they suppose that multifactorial traits exhibit a genotype-phenotype relationship analogous to that of monogenic disorders” (12). They write this in such a way that it implies that we are not aware of these complexities. Yet, note what we wrote four years ago: “[T]he vast majority of phenotypes are ‘polygenic,’ meaning they are influenced by multiple genes (Mackay 2001; Plomin 2008) and are shaped by a multitude of environmental forces” (Fowler and Dawes 2008, 581).
CE conclude that the existence of polygenic effects “calls into question the underlying assumption of CGA studies” (11) and makes genetic studies “particularly problematic” (11). In stark contrast, we argued as follows:
[S]imple association models between genotype and phenotype are an important first step to establish candidate genes, but they are not the end of the story. It is also important to investigate the extent to which genetic associations are moderated by environmental factors (“environmental modifiers”) and other genes (“genetic modifiers”). (Fowler and Dawes 2008, 581)
At the heart of this disagreement is a fundamental difference in the way we and CE approach scientific inquiry. CE claim that because the system we are studying is complex, it is pointless to examine any of its parts. But this philosophy would preclude nearly all of social science. Social systems are inherently complex, and observational and experimental studies attempt to reduce complexity by isolating parts of these systems to determine which factors and processes are most important for a given outcome. The hope is that these factors can then be integrated into a more holistic theory at the system level.
To be clear, we are sympathetic to systems-level approaches. In other work, we advocate for a better understanding of interdependence and its effect on a number of political phenomena (e.g., Christakis and Fowler 2009). However, we would never make the argument that macro-level approaches are a substitute for micro-level research. The truth is that we need both.
BENEFITS OF GENOPOLITICS RESEARCH
There are a number of interesting ways that genopolitics research may contribute to existing literatures in political science. One potentially beneficial application is to use genetic information as a control variable in standard nongenetic studies (Beauchamp et al. 2011). At least part of the variation in political attitudes and behaviors has been demonstrated to be due to genetic variation. Standard models relegate this variation to the residual, and thus accounting for it may allow us to more precisely estimate the effects of other factors of interest.
Another possibility, already being explored by economists (Norton and Han 2008), is to use genetic variants as instrumental variables. The idea is to use the fact that genotypes are randomly passed from parents to offspring during meiosis as a type of natural experiment. In other words, genetic data could be used to construct an instrumental variable allowing causal inference to be based on observational data. However, this approach faces many technical challenges, including credibly meeting the exclusion restriction due to the possibility of pleiotropy (Benjamin et al. 2012b).
Yet another innovation that could potentially come out of genopolitics research is the ability to predict political traits using genetic data (Benjamin et al. 2012a). A major goal of medical genetics is to diagnose and treat diseases before they can otherwise be detected. However, Benjamin et al. (2012b) suggest that our ability to predict political behaviors and attitudes out of sample is presently limited and will ultimately require much larger samples.
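The instrumental-variable idea can be sketched with simulated data. Everything below is hypothetical and is not drawn from any cited study: the allele frequency, the effect sizes, and the names `genotype`, `exposure`, and `outcome` are assumptions for illustration. An unobserved confounder biases the naive regression slope, whereas the ratio of covariances that uses the randomly assigned genotype as an instrument recovers the true effect, provided the genotype affects the outcome only through the exposure.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(1)
N = 50_000
TRUE_EFFECT = 0.5  # hypothetical causal effect of exposure on outcome

genotype, exposure, outcome = [], [], []
for _ in range(N):
    g = sum(random.random() < 0.3 for _ in range(2))  # allele count: 0, 1, or 2
    u = random.gauss(0.0, 1.0)                        # unobserved confounder
    x = 0.5 * g + u + random.gauss(0.0, 1.0)          # exposure depends on genotype
    # Exclusion restriction: g enters the outcome only through x.
    # A pleiotropic genotype would add a direct g term here and bias the IV estimate.
    y = TRUE_EFFECT * x + u + random.gauss(0.0, 1.0)
    genotype.append(g)
    exposure.append(x)
    outcome.append(y)

# Naive regression slope, biased upward by the shared confounder u.
ols_estimate = cov(exposure, outcome) / cov(exposure, exposure)
# Instrumental-variable (Wald ratio) estimate using genotype as the instrument.
iv_estimate = cov(genotype, outcome) / cov(genotype, exposure)

print(f"naive slope {ols_estimate:.2f}, IV estimate {iv_estimate:.2f}")
```

In this simulation the exclusion restriction holds by construction; with real data that assumption is exactly what pleiotropy makes difficult to defend.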
In addition to these potential applications to existing literatures, genopolitics has already helped spur inquiries into a number of phenomena on the causal pathway from genes and environment to political outcomes. For example, the study of personality was recently revived by political scientists who wondered why genes might play a role in political behavior:
Five-factor models of personality trait structure have thrived in psychology for two decades, and rigorous attention to these models offers an obvious complement to recent developments in the exploration of links between politics and genetics. Our concern with this situation has motivated the present investigation. (Mondak et al. 2010, 20)
The study of neuropolitics has been similarly influenced: “The new science of human nature demands that we recognize that genes are the institutions of the human body. They regulate the neurological processes that drive social and political behavior” (Fowler and Schreiber 2008, 914). The new field of political physiology specifically seeks to understand the physiological mechanisms that genes influence: “Indeed, given that political and social attitudes are heritable and that amygdala activity also has been traced to genetics, genetic variation relevant to amygdala activity could affect both physiological responses to threat and political attitudes bearing on threats to the social order” (Oxley et al. 2008, 1669).
Thus, genetic studies of political behavior are beneficial not only in their own right; they may also help integrate the social sciences with one another and may bring us closer to a “consilient” unification with the natural sciences (Wilson 1998) that will allow us to better understand not only politics but also what it means to be human.
RECOMMENDATIONS FOR FUTURE WORK
Many of the questions raised by CE regard matters of empirical analysis in general, rather than genetics in particular. Our 2008 candidate gene study faced the same kinds of challenges that nearly all political science studies face. However, our replication exercise speaks to an incredibly important concern that particularly affects genetic association studies—that of power.
As Benjamin et al. (2012b) recently demonstrated, political traits have a polygenic architecture, meaning that the heritable variation in these traits is explained by many genes with individually small effects. This implies that very large samples are necessary to reliably detect these effects and that underpowered CGA studies run the risk of producing false positives (Benjamin et al. 2012a). This power issue is even more critical when conducting a genome-wide study based on hundreds of thousands of genetic variants because of the required corrections for multiple testing.
The typical approach now in behavior genetics, as well as in most good science, is to search for independent sources of information by which to evaluate hypotheses. In fact, this is the new policy at the journal Behavior Genetics: “To avoid publishing findings that will not replicate, we recommend that authors conduct a direct replication analysis, prior to publication, such that the same predictor(s), outcome variable, and statistical model are tested in an independent sample” (Hewitt 2012, 1).
A recent critical review of candidate gene-environment interaction studies suggests that “direct” replication is the best way to avoid false positives (Duncan and Keller 2011). This means that the phenotype, the environmental modifier, and the genetic marker should all be measured in the same way. The concern is that authors might report “indirect” replications after they discovered that the direct replication did not produce a confirmation of the original result. We recommend that the American Political Science Review and other political science journals adopt the same policy as Behavior Genetics (Hewitt 2012). This will allow political scientists and geneticists to continue to develop the field of genopolitics while minimizing the likelihood of false discovery.
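A rough calculation illustrates why small effects and multiple-testing corrections both push required sample sizes upward. The sketch below uses the standard Fisher z approximation for detecting a correlation; the effect sizes (shares of variance explained) and the genome-wide threshold of 5e-8 are assumptions chosen for illustration, not estimates from any study cited here, and `required_n` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def required_n(r2, alpha, power=0.80):
    """Approximate sample size needed to detect a variant explaining a share
    r2 of trait variance, using a two-sided test at level alpha
    (Fisher z approximation for a correlation of sqrt(r2))."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = std_normal.inv_cdf(power)              # quantile for the target power
    effect = math.atanh(math.sqrt(r2))               # Fisher z of the correlation
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Hypothetical effect sizes: 1% and 0.1% of variance explained.
for r2 in (0.01, 0.001):
    nominal = required_n(r2, alpha=0.05)
    genome_wide = required_n(r2, alpha=5e-8)  # assumed genome-wide threshold
    print(f"r2={r2}: n at alpha=0.05 is {nominal}, n at alpha=5e-8 is {genome_wide}")
```

Under these assumptions, shrinking the effect size or tightening the significance threshold each increases the required sample, and the two requirements compound in a genome-wide design.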
CONCLUSION
The publication of Alford, Funk, and Hibbing's (2005) seminal article in the American Political Science Review on the genetic basis of political attitudes came as something of a shock to many scholars in political science. At the time, the vast majority of the discipline had little exposure to the natural sciences. Meanwhile, many in the behavior genetics community wondered what all the fuss was about, because the original analyses of the same dataset had been published nearly two decades earlier (Martin et al. 1986). But political scientists have been rapidly learning the new methods in genetics that have been developed in the past few years (Cranmer and Dawes 2012; Dawes and Fowler 2009; Hatemi, Funk, et al. 2009; Littvay 2012; Loewen and Dawes 2012; McDermott and Hatemi 2011; Settle et al. 2010; Settle, Dawes, and Fowler 2009; Smith and Hatemi 2012; Stam, Von Hagen-Jamar, and Worthington 2012; Verhulst 2012; Weber, Johnson, and Arceneaux 2011), and we are now working directly with geneticists on a variety of political outcomes and behaviors (Arceneaux, Johnson, and Maes 2012; Benjamin et al. 2012b; Eaves and Hatemi 2008; Fowler, Baker, and Dawes 2008; Hatemi et al. 2007; 2010; 2012; Hatemi, Alford, et al. 2009; Hatemi, Dawes, et al. 2011; Hatemi, Gillespie, et al. 2011; Hatemi, Medland, and Eaves 2009; Klemmensen et al. 2011; McDermott et al. 2009; Medland and Hatemi 2008; Smith et al. 2012; Verhulst and Estabrook 2012; Verhulst, Hatemi, and Eaves 2012a; 2012b; Verhulst, Hatemi, and Martin 2010). As a result, we have attained new literacy in methods including twin studies, candidate gene association studies, and genome-wide analyses that incorporate hundreds of thousands of genetic markers into our analysis.
It is important to note that we chose the title “In Defense of Genopolitics,” rather than “In Defense of Candidate Gene Studies.” We believe that genopolitics has the potential to make important contributions to the study of politics and also that CGA studies, like any other study, are open to criticism when not executed carefully. The CGA study was the dominant paradigm in medical genetics for a long time, but it has recently been supplanted by genome-wide approaches, both because of the rapidly falling cost of genotyping and because many CGA findings have failed to replicate (Chabris et al. 2012; Ioannidis 2005). This failure may be due to a variety of factors including differences in samples and measures, a lack of power, and a publication bias toward significant results. However, these issues can be addressed by designing studies that are adequately powered and/or draw on independent replication samples. The replication presented here illustrates this point; one of our original results held, whereas another one was likely a false positive.
It is also important to point out that, although CGA studies are hypothesis driven, they are constrained by an imperfect understanding of the pathways linking genes to behavior (Pearson and Manolio 2008). Therefore, scholars should be careful not to put too much weight on any given hypothesis based on a single finding. Significant associations are the beginning rather than the end of a long process of inquiry.
In conclusion, we think the future is bright for genopolitics research. Rather than taking an approach that assumes the effect of genes on behavior is hopelessly complicated and too hard to understand, we prefer to conduct open empirical inquiries that lay out the advantages and disadvantages of the methods we use to measure the world. We invite scholars to come forward with their own data and analyses, and together we can build up our theoretical and empirical understanding of the role biology plays in politics.