Dissolving the Missing Heritability Problem

Pierrick Bourrat; Qiaoying Lu

doi:10.1086/694007

Published online by Cambridge University Press: 01 January 2022

Abstract

Heritability estimates obtained from genome-wide association studies (GWAS) are much lower than those of traditional quantitative methods. This phenomenon has been called the “missing heritability problem.” By analyzing and comparing GWAS and traditional quantitative methods, we first show that the estimates obtained from the latter involve some terms other than additive genetic variance, while the estimates from the former do not. Second, GWAS, when used to estimate heritability, do not take into account additive epigenetic factors transmitted across generations, while traditional quantitative methods do. Given these two points we show that the missing heritability problem can largely be dissolved.

Type: Biology
Information: Philosophy of Science, Volume 84, Issue 5, December 2017, pp. 1055–1067

DOI: https://doi.org/10.1086/694007
Copyright: Copyright © The Philosophy of Science Association

1. Introduction

One pervasive problem encountered when estimating the heritability of quantitative traits is that the estimates obtained from genome-wide association studies (GWAS) are much smaller than those calculated by traditional quantitative methods. This problem has been called the missing heritability problem (Turkheimer 2011). Take human height, for example. Traditional quantitative methods deliver a heritability estimate of about 0.8, while the first estimates using GWAS were 0.05 (Maher 2008). More recent GWAS methods have revised this number and estimate the heritability of height to be 0.45 (Yang et al. 2010; Turkheimer 2011).¹ Yet, compared to traditional quantitative methods, close to half of the heritability (0.35 out of 0.8) is still missing.

In quantitative genetics, heritability is defined as the portion of phenotypic variance in a population that is due to genetic difference (Falconer and Mackay 1996; Downes 2015; Lynch and Bourrat 2017). Traditionally, this portion is estimated by measuring the phenotypic resemblance of genetically related individuals without identifying genes at the molecular level (more particularly DNA sequences). GWAS have been developed in order to locate the DNA sequences that influence the target trait and estimate their effects, especially for common complex diseases such as obesity, diabetes, and heart disease (Frazer et al. 2009; Visscher et al. 2012). Almost 300,000 common DNA variants associated with height in human populations have been identified by GWAS (Yang et al. 2010). Although many grant that the heritability estimates obtained by traditional quantitative methods are quite reliable, the methods used in GWAS have been questioned (Eichler et al. 2010).

A number of partial solutions to the missing heritability problem have been proposed, with most of them focusing on improving the methodological aspects of GWAS in order to provide a more accurate estimate (e.g., Manolio et al. 2009; Eichler et al. 2010). Some authors have also suggested that heritable epigenetic factors might account for part of the missing heritability. For instance, in Eichler et al. (2010, 448), Kong notes that “epigenetic effects beyond imprinting that are sequence independent and that might be environmentally induced but can be transmitted for one or more generations could contribute to missing heritability.” Furrow, Christiansen, and Feldman (2011, 1377) also claim that “epigenetic variation, inherited both directly and through shared environmental effects, may make a key contribution to the missing heritability.” Others have made the same point (e.g., Johannes, Colot, and Jansen 2008; McCarthy and Hirschhorn 2008). Yet, in the face of this idea one might notice what appears to be a contradiction: how can epigenetic factors account for the missing heritability, if the heritability is about genes?

To answer this question as well as to analyze the missing heritability problem, we compare the assumptions underlying heritability estimates in traditional quantitative methods with those underlying estimates in GWAS. We make two points. First, traditional methods typically overestimate heritability (narrow-sense heritability, h²) because these estimates do not successfully isolate the additive genetic component of phenotypic variance, which is part of the definition of h² (see sec. 2), from the nonadditive genetic and nongenetic ones and the potential effects of assortative mating. Second, the concept of the gene used in the definition of h² is an evolutionary one, and it differs from the one used in GWAS, which is DNA centered. This means that the heritability estimates obtained from traditional methods can include heritability due to heritable epigenetic factors (which can be regarded as evolutionary genes), while the effects from these factors are not included in the estimates obtained from GWAS. With these two points taken into account, we expect the missing heritability problem to be largely dissolved, which also sets the stage for further discussion.

The remainder of the article is divided into three parts. First, we briefly introduce two ways in which heritability is estimated in traditional methods, namely, twin studies and parent-offspring regression. We show that the estimates obtained by each way include some nonadditive or nongenetic elements (or both) and consequently overestimate h². Second, we outline the basic rationale underlying GWAS and illustrate that they estimate heritability by considering solely DNA variants. By arguing that the notion of additive genetic variance used in traditional methods does not necessarily refer to DNA sequences but can also refer to epigenetic factors, we show that the notion of heritability estimated in GWAS is more restrictive than h². Finally, in section 4, on the basis of the conclusions from sections 2 and 3, we show that the missing heritability problem can be partly dissolved in two ways. One is that if nonadditive and nongenetic variance were removed from the estimates obtained via traditional methods, these estimates would be lower. The other is that if additive epigenetic factors were taken into account by GWAS, the heritability estimates obtained would be higher. We conclude section 4 by demonstrating how our analysis sheds some light on a discussion about the role played by nonadditive factors in the missing heritability problem. Because human height has been “the poster child” of the missing heritability problem (Turkheimer 2011, 232), we will use it to illustrate each of our points.

2. Heritability in Traditional Quantitative Methods

Although there exist different definitions of heritability (Jacquard 1983; Downes 2009; Bourrat 2015), according to the standard model of quantitative genetics, the phenotypic variance (V_P) of a population can be explained by two components, its genotypic variance (V_G) and its environmental variance (V_E). In the absence of gene-environment interaction and correlation, we have

(1)

\begin{matrix} V_{P} = V_{G} + V_{E} . \end{matrix}

From there, broad-sense heritability (H²) is defined as

(2)

\begin{matrix} H^{2} = \frac{V_{G}}{V_{P}}, \end{matrix}

where V_G can further be partitioned into the additive genetic variance (V_A), the dominance genetic variance (V_D), and the epistasis genetic variance (V_I). Thus, equation (1) can be rewritten as

(3)

\begin{matrix} V_{P} = V_{A} + V_{D} + V_{I} + V_{E}, \end{matrix}

where V_A is the variance due to alleles being transmitted from the parents to the offspring that contribute to the phenotype, V_D is the variance due to interactions between alleles at one locus for diploid organisms, and V_I is the variance due to interactions between alleles from different loci. Symbols V_D and V_I together represent the variance due to particular combinations of genes of an organism.

Because genotypes of sexual organisms recombine at each generation via reproduction, the effects of combinations of genes, namely, dominance and epistasis effects (measured respectively by V_D and V_I), are not transmitted across generations; only the effects of the genes independent from their genetic background (measured by V_A) are. By taking only V_A into account, narrow-sense heritability (h²), which “expresses the extent to which phenotypes are determined by the genes transmitted from the parents” (Falconer and Mackay 1996, 123), is defined as

(4)

\begin{matrix} h^{2} = \frac{V_{A}}{V_{P}}, \end{matrix}

where h² is the measure used in breeding studies and by evolutionary theorists who are interested in making evolutionary projections, while broad-sense heritability (H²) is used mostly by behavioral geneticists and psychologists (Downes 2015).
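As a rough numerical illustration of equations (2) to (4), the short sketch below computes both heritabilities from a set of variance components. The function name and the numbers are hypothetical and chosen only for readability; they are not estimates for any real trait.

```python
def heritabilities(v_a, v_d, v_i, v_e):
    """Return (H2, h2) from variance components, following equations (2)-(4)."""
    v_g = v_a + v_d + v_i   # genotypic variance V_G
    v_p = v_g + v_e         # equation (3): no gene-environment interaction or correlation
    return v_g / v_p, v_a / v_p


# Hypothetical variance components in arbitrary units (illustration only).
H2, h2 = heritabilities(v_a=50.0, v_d=10.0, v_i=5.0, v_e=35.0)
print(f"H2 = {H2:.2f}, h2 = {h2:.2f}")
```

With these made-up values the two measures differ by exactly the share of V_D and V_I in V_P, which is the gap that matters in the rest of the argument.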

Following equation (4), to know h², both V_A and V_P should be known. For most quantitative traits (including height), V_P can be directly obtained by measuring phenotypes of individuals. However, traditional quantitative methods do not permit one to obtain V_A directly. It is classically obtained by deduction. This deduction is based on two types of information. First, one needs one or several population-level measures of the phenotypic resemblance between pairs of relatives.² These measures are obtained by calculating the covariance of the phenotypic values for those pairs. Second, one needs the genetic relation between the members of each pair, which indicates the percentage of genetic material the pair is expected to share. With these two pieces of information, assuming a large population with no interaction and correlation between some of the genetic and environmental components, one can estimate how much the genes shared by the two relatives (estimated by V_A) contribute to the phenotypic resemblance. From there, knowing V_P and having an estimate of V_A permits one to estimate h².

As mentioned above, for simplicity, traditional quantitative methods usually assume that there is neither gene-environment interaction nor correlation (Falconer and Mackay 1996, 131). In such cases, the covariance between the phenotypic values (e.g., height) of pairs equals the additive genetic covariance, plus the dominant and epistasis genetic covariances, plus the environmental covariance. Formally, this covariance for the general case can be written as follows:

(5)

\begin{matrix} Cov (P_{1}, P_{2}) = Cov (A_{1} + D_{1} + I_{1} + E_{1}, A_{2} + D_{2} + I_{2} + E_{2}) = Cov (A_{1}, A_{2}) + Cov (D_{1}, D_{2}) + Cov (I_{1}, I_{2}) + Cov (E_{1}, E_{2}), \end{matrix}

where Cov(P₁, P₂) is the covariance between the phenotypic values of one individual with the other, with indexes 1 and 2 representing the two family members for each pair studied. Symbols A, D, I, and E represent additive effects, dominant effects, epistasis effects, and environmental effects, respectively.

The most common pairs of relatives used for estimating heritability are twins (both monozygotic and dizygotic). In twin studies, one already knows that monozygotic twins share almost 100% of their genetic material, while dizygotic twins share about 50%. The environment is typically divided into two parts: one that affects both twins in the same way (the shared environment, C) and the other that affects one twin but not the other (the unique environment, U; Silventoinen et al. 2003). In the absence of interaction and correlation between C and U, we have

(6)

\begin{matrix} E = C + U . \end{matrix}

Assuming epistasis effects to be negligible (a common assumption in twin studies), by inserting equation (6) into equation (5) in the case of twins, we have

(7)

\begin{matrix} Cov (P_{T 1}, P_{T 2}) = Cov (A_{T 1} + D_{T 1} + C_{T 1} + U_{T 1}, A_{T 2} + D_{T 2} + C_{T 2} + U_{T 2}) = Cov (A_{T 1}, A_{T 2}) + Cov (D_{T 1}, D_{T 2}) + Cov (C_{T 1}, C_{T 2}) + Cov (U_{T 1}, U_{T 2}), \end{matrix}

where Cov(P_T1, P_T2) is the covariance between the phenotypic values of one twin with the other, with indexes T1 and T2 representing the two twins for each twin pair studied.

Because each twin’s unique environment is, by definition, independent of that of the other twin, Cov(U_T1, U_T2) is nil for both monozygotic and dizygotic twins. Given that variance is a special case of covariance in which the two variables are identical, and that for monozygotic twins A_T1, D_T1, and C_T1 equal A_T2, D_T2, and C_T2 respectively, we can reformulate equation (7) as follows:

(8)

\begin{matrix} {Cov}_{MT} (P_{T 1}, P_{T 2}) = V_{A} + V_{D} + V_{C}, \end{matrix}

where Cov_MT(P_T1, P_T2) is the covariance between the phenotypic values of the monozygotic twin (MT) pairs studied.

By contrast, dizygotic twins (DT) are expected to share half of their genes, which means that the covariance between the phenotypic values of one twin with the other, Cov_DT(P_T1, P_T2), is expected to be equal to half of the additive genetic variance, a quarter of the dominant variance,³ and all of the shared environmental variance (with Cov(U_T1, U_T2) equal to zero). In this case, we can rewrite equation (7) as

(9)

\begin{matrix} {Cov}_{DT} (P_{T 1}, P_{T 2}) = \frac{1}{2} V_{A} + \frac{1}{4} V_{D} + V_{C} . \end{matrix}

It is classically assumed that, for both monozygotic and dizygotic twin pairs, the shared environment acts in the same way if the pair has been reared together.⁴ That is to say, the term V_C in equations (8) and (9) is the same. Hence, it can be canceled by subtracting equation (9) from equation (8). Heritability can then be estimated as follows:

(10)

\begin{matrix} \hat{h_{TS}^{2}} = \frac{2 \left[ {Cov}_{MT} (P_{T 1}, P_{T 2}) - {Cov}_{DT} (P_{T 1}, P_{T 2}) \right]}{V_{P}} = \frac{V_{A}}{V_{P}} + \frac{(3 / 2) V_{D}}{V_{P}} . \end{matrix}

We label the heritability estimate obtained from twin studies $\hat{h_{TS}^{2}}$, with “ˆ” symbolizing an estimate. It should be noted that this provides an accurate estimate of neither H² nor h², although it is a better estimate of H² than of h² (Falconer and Mackay 1996, 172).
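The derivation of equation (10) can be checked numerically. The sketch below builds the two twin covariances from equations (8) and (9) and doubles their difference. The function name and the variance components are hypothetical, and epistasis is assumed negligible, as in the text.

```python
def twin_estimate(v_a, v_d, v_c, v_u):
    """Twin-study estimate of equation (10), built from equations (8) and (9)."""
    v_p = v_a + v_d + v_c + v_u              # phenotypic variance, epistasis ignored
    cov_mt = v_a + v_d + v_c                 # equation (8), monozygotic twins
    cov_dt = 0.5 * v_a + 0.25 * v_d + v_c    # equation (9), dizygotic twins
    return 2.0 * (cov_mt - cov_dt) / v_p     # equation (10)


# Hypothetical variance components in arbitrary units (illustration only).
v_a, v_d, v_c, v_u = 50.0, 10.0, 15.0, 25.0
v_p = v_a + v_d + v_c + v_u
estimate = twin_estimate(v_a, v_d, v_c, v_u)
print(f"twin estimate = {estimate:.2f}")
print(f"h2 = {v_a / v_p:.2f}, H2 = {(v_a + v_d) / v_p:.2f}")
```

With these made-up values the estimate exceeds both h² and H², and it lies closer to H², in line with the remark above. The shared environment term drops out entirely, as expected from the subtraction.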

Another often used traditional quantitative method to estimate heritability is parent-offspring regression (Falconer and Mackay 1996, 164). This method also assumes neither gene-environment interaction nor correlation. Following these assumptions, we can deduce that the covariance between the height of parents (one or the mean of both, but we will use the case with one parent here) and the mean of their offspring is equal to the additive genetic covariance, dominant covariance (the epistasis covariance is assumed to be small and is not included), plus environmental covariance between the heights of parents and offspring. Formally, in this case, we can write equation (5) as follows:

(11)

\begin{matrix} Cov (P_{P}, P_{O}) = Cov (A_{P} + D_{P} + I_{P} + E_{P}, A_{O} + D_{O} + I_{O} + E_{O}) = Cov (A_{P}, A_{O}) + Cov (D_{P}, D_{O}) + Cov (E_{P}, E_{O}), \end{matrix}

where indexes P and O represent “parents” and “offspring.”

Three further assumptions are then made. The first one is that parents are not related, and consequently no dominant effects are transmitted from the parents to the offspring (Doolittle 2012, 178), which means that Cov(D_P, D_O) is nil. The second one is that there is no correlation between the parents’ environment and the offspring’s environment, so that Cov(E_P, E_O) is also nil. Finally, the third assumption is that there is no assortative mating between parents. Given that, on average, a parent is expected to share 50% of its genes with its offspring, equation (11) reduces to half of the additive genetic variance ( $(1 / 2) V_{A}$ ). Since by definition the slope of the regression of average offspring phenotype on parent phenotype is equal to $Cov (P_{P}, P_{O}) / V_{P}$ , which under these assumptions is equal to $(1 / 2) V_{A} / V_{P}$ , h² can be estimated by doubling the value of this slope.

But the above three assumptions might be violated. First, there is evidence of inbreeding in human populations (Bittles and Black 2010). Without correcting for inbreeding, Cov(D_P, D_O) is likely to be non-nil. Second, because the environments experienced by individuals are likely to be more similar within a family line, Cov(E_P, E_O) might not be nil either. Finally, there is evidence of assortative mating in human populations (Guo et al. 2014). The consequences of assortative mating for estimating heritability are complex. That said, in the case of one parent-offspring regression, when the population is at equilibrium, one effect of assortative mating is the overestimation of the value of V_A. If we take these three factors into consideration, the covariance between the parents and their offspring is equal to half of the additive genetic variance, plus a term representing some effects due to dominance, similarities between environments, and assortative mating. This can be written formally as

(12)

\begin{matrix} Cov (P_{P}, P_{O}) = \frac{1}{2} V_{A} + ε, \end{matrix}

where ε represents the sum of covariance due to some nonadditive genetic factors, environmental factors, and assortative mating.

Heritability, if estimated by performing a parent-offspring regression and doubling its slope, will thus capture the numerator as $V_{A} + 2 ε$ rather than solely V_A. Formally we will have

(13)

\begin{matrix} \hat{h_{POR}^{2}} = \frac{2 Cov (P_{P}, P_{O})}{V_{P}} = \frac{V_{A}}{V_{P}} + \frac{2 ε}{V_{P}} . \end{matrix}
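A small simulation can make equations (12) and (13) concrete. The sketch below is only an illustration under strong assumptions that are ours, not part of the original analysis: random mating, purely additive genes, normally distributed effects, and an ε that comes solely from an environment shared by parent and offspring (so ε = V_C here). All names and parameter values are hypothetical.

```python
import random


def doubled_slope(n, v_a, v_c, v_u, seed=0):
    """Twice the slope of offspring phenotype regressed on one parent's phenotype."""
    rng = random.Random(seed)
    sd_a, sd_c, sd_u = v_a ** 0.5, v_c ** 0.5, v_u ** 0.5
    parents, offspring = [], []
    for _ in range(n):
        a_parent = rng.gauss(0.0, sd_a)                  # additive value of the measured parent
        a_mate = rng.gauss(0.0, sd_a)                    # unrelated mate (no assortative mating)
        segregation = rng.gauss(0.0, (v_a / 2) ** 0.5)   # Mendelian sampling keeps Var(A) = V_A
        a_child = 0.5 * (a_parent + a_mate) + segregation
        shared = rng.gauss(0.0, sd_c)                    # environment common to parent and child
        parents.append(a_parent + shared + rng.gauss(0.0, sd_u))
        offspring.append(a_child + shared + rng.gauss(0.0, sd_u))
    mean_p = sum(parents) / n
    mean_o = sum(offspring) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(parents, offspring)) / (n - 1)
    var_p = sum((p - mean_p) ** 2 for p in parents) / (n - 1)
    return 2.0 * cov / var_p                             # equation (13)


# In both runs V_P = 1 and the true h2 is 0.5 (hypothetical values).
print("no shared environment:  ", round(doubled_slope(20000, 0.5, 0.0, 0.5), 3))
print("shared environment 0.1: ", round(doubled_slope(20000, 0.5, 0.1, 0.4), 3))
```

Under these assumptions the first run should land near the true h² of 0.5, while the second should land near 0.7, that is, h² plus 2ε/V_P. The exact figures vary with the seed and sample size.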

In light of the equations presented for both twin studies and parent-offspring regressions, we can conclude that heritability estimates obtained by these methods will generally overestimate h², such that

(14)

\begin{matrix} \hat{h_{TM}^{2}} = h^{2} + o, \end{matrix}

where the index TM is for “traditional methods,” and o is the part of the estimate contributed by the component(s) other than the ratio of additive genetic variance on phenotypic variance. In the next section, we analyze the main method used in GWAS.

3. Heritability in GWAS

Although any two unrelated individuals share about 99.5% of their DNA sequences, their genomes differ at specific nucleotide locations (Aguiar and Istrail 2013). Given two DNA fragments at the same locus of two individuals, if these fragments differ at a single nucleotide, they represent two variants of a single nucleotide polymorphism (SNP). GWAS focus on SNPs across the whole genome whose variants occur in the population with a frequency larger than 1%; these are referred to as “common SNPs.” If one variant of a common SNP, compared to another, is associated with a significant change in the trait studied, then this SNP is a marker for a DNA region (or a gene) that leads to phenotypic variation.

The development of commercial SNP chips makes it possible to rapidly detect common SNPs of DNA samples from all the participants involved in a study. On the basis of the readings of SNP chips and by using a series of statistical tests, it can be investigated at the population level whether each SNP associates with the target trait. For quantitative traits like height, the test reveals whether the mean height of a group with one variant of a SNP is significantly different from the group with another variant of the same SNP (Bush and Moore 2012).⁵ With all the SNPs associated with differences in phenotype being identified, data from the HapMap project are then used. The HapMap project provides a list of SNPs that are markers for most of the common DNA variants in human populations (International HapMap 3 Consortium 2010), which permits one to identify the exact genomic regions for each SNP. With genetic studies examining those regions, it can then be determined whether the variants of the SNPs associated with a statistically significant difference in height do cause phenotypic variations. These variants are called “causal variants” (Visscher et al. 2012).
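The per-SNP comparison of mean heights described above can be sketched with a simple two-sample statistic. The example below uses Welch's t statistic as a stand-in; this is our simplification, and actual GWAS pipelines may rely on other tests. The heights are invented, and a real study would also have to correct for testing many SNPs at once.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(heights_a, heights_b):
    """Welch's t statistic for a difference in mean height between two groups."""
    standard_error = sqrt(
        variance(heights_a) / len(heights_a) + variance(heights_b) / len(heights_b)
    )
    return (mean(heights_a) - mean(heights_b)) / standard_error


# Hypothetical heights in cm for carriers of two variants of one SNP.
variant_1 = [172.0, 175.5, 169.0, 178.0, 174.5, 171.0]
variant_2 = [168.0, 171.5, 166.0, 173.0, 170.5, 167.0]
print(f"t = {welch_t(variant_1, variant_2):.2f}")
```

A large absolute value of the statistic suggests that the two variants are associated with different mean heights; whether it counts as significant depends on the sample sizes and on the threshold adopted.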

Other statistical tests combined with the ones mentioned above, of which the details would greatly exceed the scope of the article, can also be used to estimate the effects of SNPs that are associated with height, so that the portion of the variance in height explained by these SNPs can be calculated (e.g., Weedon et al. 2008). This portion thus represents the variance contributed by the causal variants. Since biologists classically regard genes as only made up of pieces of DNA, it is assumed that this variance should represent the additive genetic variance. With this assumption, and the ratio of this variance to the total phenotypic variance in the population, one can estimate h² (Visscher et al. 2006; Yang et al. 2010). However, the claim that additive genetic effects are solely based on DNA sequences is problematic when faced with the evidence of epigenetic inheritance.

As was mentioned in section 2, traditional quantitative methods for estimating heritability are based on measuring phenotypic values and genetic relations without reaching the molecular level. The genes are not defined physically but functionally as heritable difference makers (Lu and Bourrat 2017). In other words, they are theoretical units defined by their effects on the phenotype (Griffiths and Neumann-Held 1999, 661; Griffiths and Stotz 2013, 35). With the discovery of DNA structure in 1953, it was thought that the originally theoretical genes were found in the physical DNA molecules. Since then, biologists commonly refer to genes as portions of DNA, as do the geneticists performing GWAS. This step was taken too hastily (Lu and Bourrat 2017). If there is physical material, other than DNA pieces, that can affect the phenotype and be transmitted stably across generations, then it should also be thought to contribute to additive genetic effects.

Many studies have provided evidence for epigenetic inheritance, namely, the stable transmission across multiple generations of epigenetic modifications that affect organisms’ traits (e.g., Youngson and Whitelaw 2008; Dias and Ressler 2014). A classical example is the methylation pattern on the promoter of the agouti gene in mice (Morgan et al. 1999). It shows that mice with the same genotype but different methylation levels display a range of fur colors, and that the patterns of DNA methylation can be inherited through generations, causing heritable phenotypic variations. Epigenetic factors such as self-sustaining loops, chromatin modifications, and three-dimensional structures in the cell can also be transmitted over multiple generations (Jablonka, Lamb, and Zeligowski 2014). Studies on various species suggest that epigenetic inheritance is likely to be “ubiquitous” (Jablonka and Raz 2009).

The increasing evidence of epigenetic inheritance seriously challenges the restriction of the concept of the gene in the evolutionary sense to be materialized only in DNA. Relying on traditional quantitative methods, it is impossible to distinguish whether additive genetic variance is DNA based or based on other material. Some transmissible epigenetic factors, which are neither DNA based nor caused by DNA variation, might de facto be included in the additive genetic variance used to estimate h². This extension of heritable units also echoes the recent suggestion that genetic (assuming genes to be DNA based) and nongenetic heredity should be unified in an inclusive inheritance theory (Day and Bonduriansky 2011; Danchin 2013).

To apply the idea that some epigenetic factors can lead to additive genetic effects, the additive variance term in equation (4) should be decomposed into two terms, namely, the additive variance of DNA sequences ( $V_{A_{DNA}}$ ) and the additive variance of epigenetic factors ( $V_{A_{epi}}$ ), assuming there is no interaction between them, so that

(15)

\begin{matrix} V_{A} = V_{A_{DNA}} + V_{A_{epi}} . \end{matrix}

Inserting equation (15) into equation (4) leads to

(16)

\begin{matrix} h^{2} = \frac{V_{A_{DNA}}}{V_{P}} + \frac{V_{A_{epi}}}{V_{P}} . \end{matrix}

Here we label the first term on the right side of equation (16) “DNA-based narrow-sense heritability” ( $h_{DNA}^{2}$ ), and the second term, “epigenetic-based narrow-sense heritability” ( $h_{epi}^{2}$ ). We thus have

(17)

\begin{matrix} h_{DNA}^{2} = h^{2} - h_{epi}^{2} . \end{matrix}

4. Dissolving the Missing Heritability Problem

As was mentioned in the introduction, since the first successful GWAS was published in 2005 (Klein et al. 2005), there have been many proposals for methodological improvements in GWAS (Manolio et al. 2009; Eichler et al. 2010). Studies have been conducted according to those proposals that permit one to obtain higher heritability estimates. Examples include increasing the sample sizes, which has resulted in more accurate estimates (e.g., Wood et al. 2014); considering all common SNPs simultaneously instead of one by one, which has increased the heritability estimates of height from 0.05 to 0.45 (see Yang et al. 2010); and conducting meta-analyses, which can lead to more accurate results when compared to a single analysis (see Bush and Moore 2012). Biologists have also suggested searching for rare SNPs with frequencies lower than 1% in order to account for a wider range of possible causal variants (Schork et al. 2009).

Apart from these methodological improvements, which would certainly lead to an increase in heritability estimates obtained from GWAS and thus reduce the gap between the estimates obtained from GWAS and traditional quantitative methods, our analysis reveals two other reasons explaining away the missing heritability problem: (a) in traditional quantitative methods heritability is overestimated because the methods used cannot fully isolate the additive genetic variance from other components of variance, and (b) in GWAS, heritability is estimated based on causal DNA variants only, while in traditional quantitative methods the additive effects contributed by epigenetic difference ( $V_{A_{epi}}$ ) are de facto included in the estimates.

These two reasons, as well as the potential methodological flaws, can be expressed formally using the equations presented in sections 2 and 3. Using our terminology, an estimate of the missing heritability ( $\hat{MH}$ ) can be obtained by subtracting the heritability estimates obtained by GWAS, that is, $h_{DNA}^{2}$ plus an error term for the potential methodological flaws in GWAS mentioned above, from the estimates obtained by traditional quantitative methods ( $\hat{h_{TM}^{2}}$ ). We thus have

(18)

\begin{matrix} \hat{MH} = \hat{h_{TM}^{2}} - (h_{DNA}^{2} + e), \end{matrix}

with e representing errors coming from methodological flaws in GWAS (we assume no measurement errors otherwise).

Replacing $\hat{h_{TM}^{2}}$ and $h_{DNA}^{2}$ in equation (18) by the right-hand sides of equations (14) and (17), we obtain

(19)

\begin{matrix} \hat{MH} = h^{2} + o - (h^{2} - h_{epi}^{2} + e) = h_{epi}^{2} + o - e . \end{matrix}

This means that the missing heritability, excluding potential methodological flaws in GWAS, results from the part of heritability originating from additive epigenetic factors, plus the overestimation obtained from family studies, in which the additive genetic term cannot be fully isolated from other terms. Those other terms include nonadditive genetic and nongenetic terms and terms coming from assortative mating.
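The bookkeeping behind equations (14) and (17) to (19) can be traced with a few lines of code. The sketch below is purely illustrative: the function name is ours, and the four input values are invented, chosen only so that the totals match the height figures quoted in the introduction (about 0.8 from traditional methods and 0.45 from GWAS). They are not estimates of the actual components. Under the sign convention of equation (18), a GWAS that underestimates $h_{DNA}^{2}$ has a negative e.

```python
def missing_heritability(h2, h2_epi, o, e):
    """Return (traditional estimate, GWAS estimate, missing heritability)."""
    h2_tm = h2 + o                   # equation (14): traditional methods overestimate h2 by o
    h2_dna = h2 - h2_epi             # equation (17): DNA-based narrow-sense heritability
    gwas_estimate = h2_dna + e       # GWAS estimate including methodological error e
    return h2_tm, gwas_estimate, h2_tm - gwas_estimate   # last item: equation (18)


# Hypothetical decomposition (illustration only, not measured values).
h2_tm, gwas, mh = missing_heritability(h2=0.6, h2_epi=0.1, o=0.2, e=-0.05)
print(f"traditional = {h2_tm:.2f}, GWAS = {gwas:.2f}, missing = {mh:.2f}")

# Equation (19): the same gap expressed as h2_epi + o - e.
assert abs(mh - (0.1 + 0.2 - (-0.05))) < 1e-12
```

In this made-up scenario the gap of 0.35 splits into 0.1 from additive epigenetic factors, 0.2 from overestimation by traditional methods, and 0.05 from GWAS methodology. Other splits would fit the same totals equally well.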

Our illustration of how part of the missing heritability problem can be dissolved by considering nonadditive genetic factors supports the claim that one reason for the existence of missing heritability might be that almost all GWAS to date have focused on additive genetic effects (McCarthy and Hirschhorn 2008). Although there are not enough data to confirm that nonadditive effects do explain away some part of the missing heritability, this claim appears numerous times in discussions on the missing heritability problem (see, e.g., Maher 2008; Frazer et al. 2009; Eichler et al. 2010). Yang et al. (2010, 565) disagree with this claim and respond that “non-additive genetic effects do not contribute to the narrow-sense heritability, so explanations based on non-additive effects are not relevant to the problem of missing heritability.”

We agree with Yang et al. (2010) that nonadditive genetic effects do not contribute to h². That said, because the heritability estimates obtained from traditional quantitative methods do not strictly correspond to h² but include some terms different from V_A, those factors cannot be dismissed as irrelevant in the missing heritability debate. And indeed, Visscher, Hill, and Wray (2008, 258) have pointed out that assumptions made in traditional methods such as twin studies may deliver a heritability estimate biased upward. Although Visscher et al. (2008) only mention shared environmental effects for the upward bias as an example, we showed in section 2 that nonadditive genetic effects could be another one. More recently, Yang et al. (2015) considered this upward bias as one of three hypotheses regarding the missing heritability problem (Bourrat, Lu, and Jablonka 2017).

5. Conclusion

We have explained away the missing heritability problem in two major ways. First, the heritability estimates from traditional quantitative methods are overestimates when compared to the theoretical definition of heritability, namely, h². The resulting estimates would be smaller if the additive genetic component of phenotypic variance were accurately separated from other terms. Second, the theoretical notion of heritability used in GWAS ( $h_{DNA}^{2}$ ) does not strictly correspond to h², for it does not include the additive effects of epigenetic factors on phenotype that are indistinguishable from the effects of DNA sequences. Hence, the heritability estimates obtained from GWAS would be higher if those factors were taken into account. We have deliberately stayed away from the question of whether heritability should be defined strictly relative to DNA sequences or whether it should encompass any factors behaving effectively like evolutionary genes. Our inclination is that there is no principled reason to exclude non-DNA transmissible factors from the definitions of heritability, but our analysis does not bear on this choice.

Footnotes

†

PB’s research was supported under the Australian Research Council’s Discovery Projects funding scheme (project DP150102875). QL’s research was supported by a grant from the Ministry of Education of China (13JDZ004). Both authors contributed equally.

1. According to Yang et al. (2015), GWAS may deliver a higher estimate of the heritability of height in the future.

2. Or the mean values of their class (e.g., offspring), depending on the particular method used.

3. For each given gene with two alleles, the probability that dizygotic twins have the same genotype is one-quarter.

4. Monozygotic twins are often treated more similarly than are dizygotic twins and are more likely to share a placenta. Hence, the shared environments of monozygotic twins are more similar than those of dizygotic twins. This difference in shared environment can be mitigated by using adoption twin studies, in which environments are random on average.

5. For categorical (often binary disease/control) traits, the test used involves measuring an odds ratio, namely, the ratio of the odds of disease for individuals having a specific variant of a SNP to the odds of disease for individuals with another variant of that SNP. If this odds ratio is significantly different from 1, then that SNP is considered to be associated with the disease (Bush and Moore 2012).
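The odds ratio described in note 5 can be illustrated with a minimal sketch. The counts are hypothetical, and the significance check uses a 95% Wald interval on the log odds ratio, which is one common textbook approach and an assumption here; GWAS apply far stricter thresholds to correct for multiple testing.

```python
import math


def odds_ratio(cases_a, controls_a, cases_b, controls_b):
    """Odds of disease with variant A divided by odds of disease with variant B."""
    odds_a = cases_a / controls_a
    odds_b = cases_b / controls_b
    return odds_a / odds_b


def wald_interval(cases_a, controls_a, cases_b, controls_b, z=1.96):
    """Approximate confidence interval for the odds ratio (95% when z = 1.96).

    Uses the standard error of the log odds ratio,
    sqrt(1/a + 1/b + 1/c + 1/d); this choice of test is an assumption.
    """
    log_or = math.log(odds_ratio(cases_a, controls_a, cases_b, controls_b))
    se = math.sqrt(1 / cases_a + 1 / controls_a + 1 / cases_b + 1 / controls_b)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for one SNP, for illustration only.
counts = dict(cases_a=60, controls_a=40, cases_b=40, controls_b=60)
ratio = odds_ratio(**counts)
low, high = wald_interval(**counts)

# The SNP is flagged as associated if the interval excludes 1.
associated = low > 1 or high < 1
print(f"odds ratio = {ratio:.2f}, interval = ({low:.2f}, {high:.2f})")
print(f"associated = {associated}")
```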

References

Aguiar, Derek, and Istrail, Sorin. 2013. “Haplotype Assembly in Polyploid Genomes and Identical by Descent Shared Tracts.” Bioinformatics 29 (13): i352–i360.

Bittles, Alan H., and Black, M. L. 2010. “Consanguinity, Human Evolution, and Complex Diseases.” Proceedings of the National Academy of Sciences 107 (Suppl.): 1779–86.

Bourrat, Pierrick. 2015. “How to Read ‘Heritability’ in the Recipe Approach to Natural Selection.” British Journal for the Philosophy of Science 66:883–903.

Bourrat, Pierrick, Lu, Qiaoying, and Jablonka, Eva. 2017. “Why the Missing Heritability Might Not Be in the DNA.” BioEssays 39 (7).

Bush, William S., and Moore, Jason H. 2012. “Genome-Wide Association Studies.” PLoS Computational Biology 8 (12): e1002822.

Danchin, Étienne. 2013. “Avatars of Information: Towards an Inclusive Evolutionary Synthesis.” Trends in Ecology and Evolution 28 (6): 351–58.

Day, Troy, and Bonduriansky, Russell. 2011. “A Unified Approach to the Evolutionary Consequences of Genetic and Nongenetic Inheritance.” American Naturalist 178 (2): E18–E36.

Dias, Brian G., and Ressler, Kerry J. 2014. “Parental Olfactory Experience Influences Behavior and Neural Structure in Subsequent Generations.” Nature Neuroscience 17 (1): 89–96.

Doolittle, Donald P. 2012. Population Genetics: Basic Principles. Vol. 16. Dordrecht: Springer.

Downes, Stephen M. 2009. “Moving Past the Levels of Selection Debates.” Biology and Philosophy 24 (5): 703–9.

Downes, Stephen M. 2015. “Heritability.” In Stanford Encyclopedia of Philosophy, ed. Zalta, Edward N. Stanford, CA: Stanford University.

Eichler, Evan E., Flint, Jonathan, Gibson, Greg, Kong, Augustine, Leal, Suzanne M., Moore, Jason H., and Nadeau, Joseph H. 2010. “Missing Heritability and Strategies for Finding the Underlying Causes of Complex Disease.” Nature Reviews Genetics 11 (6): 446–50.

Falconer, Douglas S., and Mackay, Trudy F. C. 1996. Introduction to Quantitative Genetics. 4th ed. Essex: Longman.

Frazer, Kelly A., Murray, Sarah S., Schork, Nicholas J., and Topol, Eric J. 2009. “Human Genetic Variation and Its Contribution to Complex Traits.” Nature Reviews Genetics 10 (4): 241–51.

Furrow, Robert E., Christiansen, Freddy B., and Feldman, Marcus W. 2011. “Environment-Sensitive Epigenetics and the Heritability of Complex Diseases.” Genetics 189 (4): 1377–87.

Griffiths, Paul E., and Neumann-Held, Eva M. 1999. “The Many Faces of the Gene.” BioScience 49 (8): 656–62.

Griffiths, Paul E., and Stotz, Karola. 2013. Genetics and Philosophy: An Introduction. Cambridge: Cambridge University Press.

Guo, Guang, Wang, Lin, Liu, Hexuan, and Randall, Thomas. 2014. “Genomic Assortative Mating in Marriages in the United States.” PLoS ONE 9 (11): e112322.

International HapMap 3 Consortium. 2010. “Integrating Common and Rare Genetic Variation in Diverse Human Populations.” Nature 467 (7311): 52–58.

Jablonka, Eva, Lamb, Marion J., and Zeligowski, Anna. 2014. Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of Life. Rev. ed. Cambridge, MA: MIT Press.

Jablonka, Eva, and Raz, Gal. 2009. “Transgenerational Epigenetic Inheritance: Prevalence, Mechanisms, and Implications for the Study of Heredity and Evolution.” Quarterly Review of Biology 84 (2): 131–76.

Jacquard, Albert. 1983. “Heritability: One Word, Three Concepts.” Biometrics 39 (2): 465–77.

Johannes, Frank, Colot, Vincent, and Jansen, Ritsert C. 2008. “Epigenome Dynamics: A Quantitative Genetics Perspective.” Nature Reviews Genetics 9 (11): 883–90.

Klein, Robert J., Zeiss, Caroline, Chew, Emily Y., Tsai, Jen-Yue, Sackler, Richard S., Haynes, Chad, Henning, Alice K., SanGiovanni, John Paul, Mane, Shrikant M., and Mayne, Susan T. 2005. “Complement Factor H Polymorphism in Age-Related Macular Degeneration.” Science 308 (5720): 385–89.

Lu, Qiaoying, and Bourrat, Pierrick. 2017. “The Evolutionary Gene and the Extended Evolutionary Synthesis.” British Journal for the Philosophy of Science. doi:10.1093/bjps/axw035.

Lynch, Kate E., and Bourrat, Pierrick. 2017. “Interpreting Heritability Causally.” Philosophy of Science 84 (1): 14–34.

Maher, Brendan. 2008. “Personal Genomes: The Case of the Missing Heritability.” Nature News 456 (7218): 18–21.

Manolio, Teri A., Collins, Francis S., Cox, Nancy J., Goldstein, David B., Hindorff, Lucia A., Hunter, David J., McCarthy, Mark I., Ramos, Erin M., Cardon, Lon R., and Chakravarti, Aravinda. 2009. “Finding the Missing Heritability of Complex Diseases.” Nature 461 (7265): 747–53.

McCarthy, Mark I., and Hirschhorn, Joel N. 2008. “Genome-Wide Association Studies: Potential Next Steps on a Genetic Journey.” Human Molecular Genetics 17 (R2): R156–R165.

Morgan, Hugh D., Sutherland, Heidi G. E., Martin, David I. K., and Whitelaw, Emma. 1999. “Epigenetic Inheritance at the Agouti Locus in the Mouse.” Nature Genetics 23 (3): 314–18.

Schork, Nicholas J., Murray, Sarah S., Frazer, Kelly A., and Topol, Eric J. 2009. “Common vs. Rare Allele Hypotheses for Complex Diseases.” Current Opinion in Genetics and Development 19 (3): 212–19.

Silventoinen, Karri, Sammalisto, Sampo, Perola, Markus, Boomsma, Dorret I., Cornes, Belinda K., Davis, Chayna, Dunkel, Leo, Lange, Marlies De, Harris, Jennifer R., and Hjelmborg, Jacob V. B. 2003. “Heritability of Adult Body Height: A Comparative Study of Twin Cohorts in Eight Countries.” Twin Research 6 (5): 399–408.

Turkheimer, Eric. 2011. “Still Missing.” Research in Human Development 8 (3–4): 227–41.

Visscher, Peter M., Brown, Matthew A., McCarthy, Mark I., and Yang, Jian. 2012. “Five Years of GWAS Discovery.” American Journal of Human Genetics 90 (1): 7–24.

Visscher, Peter M., Hill, William G., and Wray, Naomi R. 2008. “Heritability in the Genomics Era: Concepts and Misconceptions.” Nature Reviews Genetics 9 (4): 255–66.

Visscher, Peter M., Medland, Sarah E., Ferreira, Manuel A. R., Morley, Katherine I., Zhu, Gu, Cornes, Belinda K., Montgomery, Grant W., and Martin, Nicholas G. 2006. “Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings.” PLoS Genetics 2 (3): e41.

Weedon, Michael N., Lango, Hana, Lindgren, Cecilia M., Wallace, Chris, Evans, David M., Mangino, Massimo, Freathy, Rachel M., Perry, John R. B., Stevens, Suzanne, and Hall, Alistair S. 2008. “Genome-Wide Association Analysis Identifies 20 Loci That Influence Adult Height.” Nature Genetics 40 (5): 575–83.

Wood, Andrew R., Esko, Tonu, Yang, Jian, Vedantam, Sailaja, Pers, Tune H., Gustafsson, Stefan, Chu, Audrey Y., Estrada, Karol, Luan, Jian’an, and Kutalik, Zoltán. 2014. “Defining the Role of Common Variation in the Genomic and Biological Architecture of Adult Human Height.” Nature Genetics 46 (11): 1173–86.

Yang, Jian, et al. 2010. “Common SNPs Explain a Large Proportion of the Heritability for Human Height.” Nature Genetics 42 (7): 565–69.

Yang, Jian, et al. 2015. “Genetic Variance Estimation with Imputed Variants Finds Negligible Missing Heritability for Human Height and Body Mass Index.” Nature Genetics 47:1114–20.

Youngson, Neil A., and Whitelaw, Emma. 2008. “Transgenerational Epigenetic Effects.” Annual Review of Genomics and Human Genetics 9:233–57.
