Introduction
Attention deficit hyperactivity disorder (ADHD) is characterized by developmentally inappropriate and impairing symptoms that aggregate into two dimensions of inattention and hyperactivity-impulsivity (APA, 1994). ADHD is one of the most common childhood psychiatric diagnoses and frequently persists into adulthood (Faraone et al. 2006). Research suggests that ADHD symptoms are distributed continuously throughout the population, with clinical cases derived from extreme symptom scores (Chen et al. 2008; Larsson et al. 2012a).
The methods used to assess ADHD symptoms vary throughout the lifespan. In childhood, symptoms are usually rated by parents and teachers; in late adolescence and adulthood, symptoms are more frequently self-rated (Asherson, 2005). Parent, teacher and self-ratings of ADHD symptoms correlate only moderately, around 0.3 to 0.5 (Achenbach & Rescorla, 2000; Goodman, 2001).
Univariate twin studies suggest that heritability estimates for ADHD symptoms are to some extent informant specific. Parent and teacher ratings of childhood and adolescent ADHD symptom scores typically yield high heritability estimates (70–80%) (Nikolas & Burt, 2010) whereas studies that use self-ratings consistently estimate lower heritability (<50%). This is true of self-ratings obtained in adolescence (Young et al. 2000; Martin et al. 2002; Ehringer et al. 2006) and of retrospective and current self-ratings obtained in adulthood (Schultz et al. 2006; Van Den Berg et al. 2006; Haberstick et al. 2008; Boomsma et al. 2010; Larsson et al. 2012b). Some studies also estimate lower heritability when different teachers rate each twin from a pair, rather than the same teacher rating each twin (Simonoff et al. 1998; Saudino et al. 2005; Derks et al. 2006; Hartman et al. 2007).
One explanation for the lower heritability of self-ratings is that they may be less reliable than other informant ratings of ADHD symptoms. Low reliability leads to lower within-twin correlations for identical (monozygotic, MZ) twins, imposing a ceiling on heritability estimates by increasing measurement error (Plomin et al. 2008). Indeed, measurement error has been proposed as an explanation for the lower heritability estimated for different-teacher ratings of ADHD (Hartman et al. 2007).
Parent ratings of ADHD symptoms often show non-additive in addition to additive genetic influences (Burt, 2009), whereas teacher and self-ratings tend to show only additive genetic influences. This may reflect a rater contrast effect, whereby parents contrast the behavior of their twins and underestimate the similarity of non-identical (dizygotic, DZ) twins (Simonoff et al. 1998; Wood et al. 2010). In genetic modeling, contrast effects and genetic non-additivity both lead to low within-twin correlations for DZ twins. However, contrast effects can be distinguished from genetic non-additivity by greater variance in the behaviors of DZ than MZ twins (Price et al. 2005).
Because of these nuances, an important question is whether different informants rate the same aspects of ADHD-related behaviors. Rater differences can occur because of genuine differences in perspective and/or rater biases (Derks et al. 2006) and can be disentangled through multivariate twin studies that use multiple informant data: unique genetic influences indicate that different informants rate unique but valid aspects of behavior; unique environmental influences may reflect rater-specific bias or measurement error; and overlapping genetic influences and overlapping environmental influences indicate the extent to which different informants rate the same aspects of behavior (Hewitt et al. 1992).
Bivariate twin studies have identified both common and unique genetic influences on parent and teacher ratings of ADHD symptoms. This suggests that the same aspects of behavior are rated by different informants, in addition to unique aspects of behavior (Simonoff et al. 1998; Thapar et al. 2000; Martin et al. 2002; Nadder et al. 2002; Derks et al. 2006; Hartman et al. 2007; McLoughlin et al. 2011). However, there are as yet no studies that have investigated the relationship between parent and teacher ratings and self-ratings of ADHD symptoms.
In the current study we used a large population-based sample of 11–12-year-old twins to investigate the genetic and environmental contributions to individual differences in parent, teacher and self-ratings of ADHD symptoms. Multivariate genetic modeling was used to evaluate the extent to which the different informant ratings reflect the same and/or specific views of ADHD. Characterizing the phenotypic and etiological relationships between self and other informant ratings of ADHD is particularly relevant to our understanding of the developmental course of ADHD, both because self-ratings are increasingly relied upon in the transition from adolescence to adulthood and because etiological research depends on the quality of ratings.
Method
Sample and procedure
Participants were from the Twins Early Development Study (TEDS), a population-representative sample of all twin birth records in the UK for the years 1994 to 1996 (Oliver & Plomin, 2007). Ethical approval was provided by the Research Ethics Committee of the Institute of Psychiatry, King's College London. Twin zygosity was initially determined by parental report using a questionnaire with 95% accuracy (Price et al. 2000) and was subsequently verified using DNA obtained from cheek swabs (Oliver & Plomin, 2007).
Data were collected when twins were aged 11–12 years. Exclusion criteria were severe medical problems at the time of assessment, severe problems at birth or during pregnancy, and unknown or uncertain sex or zygosity. Pairs in which ADHD symptom ratings were unavailable for either twin were also excluded. The final sample thus comprised 6372 twin pairs: parent ratings of ADHD were available for 5590 pairs (including two incomplete pairs); teacher ratings for 5217 pairs (including 1069 incomplete pairs); and self-ratings for 5621 pairs (including 84 incomplete pairs). The number of pairs is presented by sex, zygosity and informant in Table 1. The mean age of participating twins was 11.28 (s.d. = 0.70) years.
Table 1. Descriptive statistics and correlations

MZM, monozygotic male; MZF, monozygotic female; DZM, dizygotic male; DZF, dizygotic female; DZO, dizygotic opposite-sex; P, parent; T, teacher; S, self; rWT, within-twin correlation; rCTCT, cross-twin cross-trait correlation.
n includes complete and incomplete pairs; mean and standard deviation (s.d.) reported for raw (untransformed) data, variances and correlations reported for transformed data regressed on age and sex; variances reported separately for male and female DZO twins; 95% confidence intervals for variances and correlations in parentheses.
Measures
The Strengths and Difficulties Questionnaire (SDQ; Goodman, 2001)
The SDQ is a 25-item questionnaire designed to measure common mental health problems during childhood and adolescence. ADHD symptoms were assessed using the SDQ hyperactivity scale, a five-item measure of inattention (‘easily distracted, concentration wanders’), hyperactivity (‘constantly fidgeting or squirming’) and impulsivity (‘thinks things out before acting’). Ratings were made using a three-point Likert scale. There are insufficient items to provide a valid separation of the inattentive and hyperactive/impulsive symptoms into separate subscales and the loading of all five items onto a hyperactivity scale is supported by factor analysis (Goodman, 2001; Van Roy et al. 2008). As in previous research using the SDQ hyperactivity scale (Price et al. 2005), scores across all five items were averaged to create a total ADHD symptom score. The scale was completed by parents and teachers, and was self-rated by children. Cronbach's α was 0.76 for parent ratings, 0.86 for teacher ratings and 0.69 for self-ratings.
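For readers unfamiliar with the internal consistency statistic reported above, the following is a minimal sketch of how Cronbach's α can be computed from item-level scores. The function name and the ratings are hypothetical and are not TEDS data; the sketch assumes complete data and items that are already scored in the same direction.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(items[0])
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in items]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: six children by five items, each scored 0, 1 or 2.
ratings = [
    [2, 2, 1, 2, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 2, 0],
    [0, 0, 0, 1, 0],
    [2, 1, 2, 2, 2],
    [1, 0, 1, 1, 1],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Because α depends only on the ratio of summed item variances to total-score variance, the same value is obtained whether the scale score is the sum or the mean of the five items.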
Analyses
The twin method (Plomin et al. 2008) was used to decompose phenotypic variance/covariance into additive genetic (A), non-additive genetic (D) and non-shared environmental (E) components. Broad-sense heritability estimates were derived from the sum of A + D. Measurement error was accounted for by the component E. Genetic modeling was conducted using the structural equation modeling program Mx (Neale, 1997), which estimated genetic and environmental correlations within and across twin pairs. It is assumed that both additive genetic (r A) and non-additive genetic (r D) correlations within MZ twin pairs are 1.00 because 100% of genetic variation is shared. Within DZ pairs, r A is assumed to be 0.50 and r D is assumed to be 0.25, reflecting on average 50% additive genetic similarity and 25% non-additive genetic similarity. The non-shared environment (E) is unique to individuals and therefore correlates at zero within MZ and DZ twin pairs.
Prior to genetic modeling, raw data were square-root transformed to correct for non-normal distribution and were regressed to correct for the effects of age and sex, a standard twin modeling procedure (McGue & Bouchard, 1984). All transformed/regressed variables showed approximately normal distribution (skewness and kurtosis within range ± 1). Mx used full-information maximum likelihood estimation, in which a likelihood statistic (−2 log likelihood or −2LL) of the data for each observation was calculated. Likelihood-based confidence intervals (CIs) were used to assess the accuracy and significance of parameter estimates (Neale & Miller, 1997).
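The assumptions above imply simple expected twin correlations for standardized variance components: MZ pairs share all of A and D, DZ pairs share half of A and a quarter of D, and E is never shared. The sketch below illustrates this algebra and its inversion. The function names and input values are hypothetical, and the moment-based inversion is shown only to make the logic concrete; it is not the maximum likelihood estimation performed in Mx.

```python
def expected_twin_correlations(a2, d2):
    """Expected MZ and DZ correlations for standardized A and D components."""
    r_mz = 1.00 * a2 + 1.00 * d2   # r A = r D = 1.00 in MZ pairs
    r_dz = 0.50 * a2 + 0.25 * d2   # r A = 0.50, r D = 0.25 in DZ pairs
    return r_mz, r_dz


def ade_from_correlations(r_mz, r_dz):
    """Solve the two equations above for A and D; E is the remainder."""
    a2 = 4 * r_dz - r_mz
    d2 = 2 * r_mz - 4 * r_dz
    e2 = 1 - r_mz
    return a2, d2, e2


# Hypothetical standardized components, for illustration only.
r_mz, r_dz = expected_twin_correlations(a2=0.30, d2=0.20)
a2, d2, e2 = ade_from_correlations(r_mz, r_dz)
print(r_mz, r_dz, a2, d2, e2)
```

With non-additive influences present, the DZ correlation falls below half the MZ correlation, which is the pattern used later to motivate ADE rather than AE models.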
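The preprocessing step can be sketched as follows: take the square root of each raw score, regress the transformed score on age and sex by ordinary least squares, and carry the residuals forward. This is an illustrative sketch only; the function names, the sex coding (0/1) and the data are assumptions, and the text does not specify the exact software or regression settings used.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def sqrt_transform_and_residualize(scores, ages, sexes):
    """Square-root transform scores, then return residuals after regressing on age and sex."""
    y = [math.sqrt(s) for s in scores]
    design = [[1.0, age, float(sex)] for age, sex in zip(ages, sexes)]
    k = 3  # intercept, age, sex
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(design, y)]


# Hypothetical symptom scores (mean of five 0-2 items), ages in years, sex coded 0/1.
scores = [0.2, 1.0, 0.6, 1.4, 0.0, 0.8, 1.2, 0.4]
ages = [11.0, 11.5, 10.8, 12.1, 11.3, 11.9, 10.9, 11.6]
sexes = [0, 1, 0, 1, 0, 1, 1, 0]
residuals = sqrt_transform_and_residualize(scores, ages, sexes)
print([round(r, 3) for r in residuals])
```

By construction the residuals have zero mean and are uncorrelated with age and sex, so these covariates cannot inflate twin similarity in the subsequent models.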
Univariate genetic modeling
Univariate genetic models decomposed the variance in parent, teacher and self-ratings of ADHD symptoms into the components ADE. Models including contrast effects (b) were fit when low DZ within-twin correlations were observed in the presence of greater variances for DZ than MZ twins. ADE and ADE+b models were tested separately as this provides the greatest power to detect genetic non-additivity (Rietveld et al. 2003).
Full sex-limitation models were used to test whether the genetic and environmental factors influencing males were different to those influencing females (qualitative sex differences), whether the magnitude of factor loadings influencing males and females were different (quantitative sex differences), and whether there were differences in phenotypic variances between males and females (scalar sex differences). The full sex-limitation model (1) contains three nested submodels (2–4) and can be explained as follows (Davis et al. 2008):
(1) The full sex-limitation model allowed quantitative and qualitative differences in the parameter estimates between males and females, and freely estimated either r A or r D.
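To illustrate why a contrast effect produces both lower DZ correlations and greater DZ variance, the sketch below uses the standard sibling-interaction parameterization, in which each twin's observed score is shifted by b times the co-twin's observed score. This parameterization, the function name and the input values are assumptions for illustration; the text does not state the exact specification fitted in Mx.

```python
def sibling_interaction_moments(r, b):
    """Observed variance and covariance for a twin pair under a contrast effect b.

    r is the twin correlation implied by the genetic model before the contrast
    effect, with the pre-contrast variance standardized to 1.
    """
    denom = (1 - b * b) ** 2
    variance = (1 + b * b + 2 * b * r) / denom
    covariance = (r * (1 + b * b) + 2 * b) / denom
    return variance, covariance


# Hypothetical AE model with heritability 0.80 and a small negative contrast effect.
a2, b = 0.80, -0.05
var_mz, cov_mz = sibling_interaction_moments(r=1.0 * a2, b=b)
var_dz, cov_dz = sibling_interaction_moments(r=0.5 * a2, b=b)
corr_mz, corr_dz = cov_mz / var_mz, cov_dz / var_dz
print(var_mz, var_dz, corr_mz, corr_dz)
```

Under these assumed values the DZ variance exceeds the MZ variance and the DZ correlation drops below half the MZ correlation even though no non-additive genetic component is present, which is why the variance difference is needed to separate the two explanations.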
(2) The common effects sex-limitation model allowed quantitative sex differences between males and females but no qualitative differences, fixing r A to 0.5 and r D to 0.25 in the DZ opposite-sex group.
(3) The scalar effects sex-limitation model allowed variance differences between males and females but no qualitative or quantitative differences, fixing r A to 0.5 and r D to 0.25 in the DZ opposite-sex group and constraining the variance components for males to be a scalar multiple of female variance components. (To test for sex differences in contrast effects for this study, parameter estimates for b were equated for males and females as an additional step.)
(4) The null model equated all parameter estimates for males and females, testing the hypothesis that there were no sex differences.
The relative fit of nested models was assessed using the likelihood ratio test (LRT), calculated by comparing the difference in −2LL against a χ2 distribution with degrees of freedom equal to the number of parameters eliminated in the reduced model. A significant result indicated deterioration in model fit. Goodness of fit was also determined using the Bayesian information criterion (BIC) statistic, which favors parsimony in large sample sizes (Raftery, 1995). Lower BIC values indicated better model fit; differences of more than 10 identified a strong preference for the model with the lower BIC value.
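The two fit criteria can be sketched as follows. The χ2 tail probability is computed with a pure-Python incomplete gamma routine so that the example is self-contained. The −2LL values, parameter counts and function names are hypothetical, and the BIC is written in its textbook form (−2LL plus the number of parameters times the log of the sample size), which may be scaled differently from the statistic reported by Mx.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    log_prefactor = a * math.log(half_x) - half_x - math.lgamma(a)
    if half_x < a + 1:
        # Series expansion of the regularized lower incomplete gamma function.
        n, term = a, 1.0 / a
        total = term
        for _ in range(500):
            n += 1
            term *= half_x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the regularized upper incomplete gamma.
    tiny = 1e-300
    b = half_x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def likelihood_ratio_test(minus2ll_full, minus2ll_reduced, params_dropped):
    """Return the chi-square statistic and p value for a nested model comparison."""
    delta = minus2ll_reduced - minus2ll_full
    return delta, chi2_sf(delta, params_dropped)


def bic(minus2ll, n_params, n_obs):
    """Textbook BIC; lower values indicate a better trade-off of fit and parsimony."""
    return minus2ll + n_params * math.log(n_obs)


# Hypothetical comparison: dropping one parameter raises -2LL by 2.5.
delta, p_value = likelihood_ratio_test(10000.0, 10002.5, params_dropped=1)
bic_full = bic(10000.0, n_params=5, n_obs=6372)
bic_reduced = bic(10002.5, n_params=4, n_obs=6372)
print(delta, p_value, bic_full, bic_reduced)
```

In this hypothetical case the LRT is non-significant and the BIC is lower for the reduced model, so both criteria would favor the more parsimonious model.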
Multivariate genetic modeling
Multivariate genetic models were used to examine covariance between parent, teacher and self-ratings of ADHD symptoms. These used cross-twin cross-trait (CTCT) correlations to decompose phenotypic covariation into genetic and environmental components. Contrast effects were included where appropriate, based on the univariate genetic results. The fit of different multivariate models was compared using the BIC statistic, with the decision on whether to accept reduced models based on the LRT. Three classes of model were tested:
(1) Cholesky decomposition. We interpreted the mathematically equivalent correlated factors solution (Fig. 1 a) (Loehlin, 1996). It parameterized the extent to which influences underlying one informant rating of ADHD symptoms (e.g. parent) also influenced other ratings (e.g. teacher, self). Each rating of ADHD symptoms was decomposed into its genetic and environmental components (ADE) and the correlation of these components was estimated across informants.
(2) Independent pathway model (Fig. 1 b). This was based on a biometric model, in which common variance components (ADE) loaded onto the ADHD symptom ratings to account for phenotypic covariance. These represented genetic and environmental influences that contributed to all informant ratings. Residual components (ade) accounted for the remaining variance that was specific to each informant.
(3) Common pathway model (Fig. 1 c). This was based on a psychometric model, in which common components (ADE) loaded onto a latent ADHD factor with variance constrained to 1.00. The latent factor accounted for covariance among the different informant ratings, representing a common, pervasive view of ADHD. Residual components (ade) accounted for the remaining variance that was specific to each informant.
Results
Descriptive statistics and correlations are presented in Table 1. Tests of mean differences were performed on the raw data using robust regressions in Stata (StataCorp, 2007) to control for dependence in the observations from twin pairs (Williams, 2000). Mean ADHD symptom scores were significantly higher for males than females based on ratings from parents (t = 22.24, p < 0.001), teachers (t = 25.20, p < 0.001) and children (t = 17.00, p < 0.001).

Fig. 1. Path diagrams for the multivariate genetic models. (a) Correlated factors solution: depicts only the additive genetic (A) correlations between parent, teacher and self-ratings of attention deficit hyperactivity disorder (ADHD), for illustrative purposes; non-additive genetic (D) and non-shared environmental (E) correlations are similar, except that the coefficient of relatedness between twins is β for D and 0 for E. (b) Independent pathway model: A, D and E are common components of variance; a, d and e are residual components of variance unique to each informant. (c) Common pathway model: A, D and E are components of variance influencing the common latent factor (F); a, d and e are residual components of variance unique to each informant. T1, twin 1; T2, twin 2; A, additive genetic component of variance; b, contrast effect for parent ratings; α, coefficient of additive genetic relatedness between T1 and T2, set to 1.00 for monozygotic (MZ) pairs and 0.5 for dizygotic (DZ) pairs; β, coefficient of non-additive genetic relatedness between T1 and T2, set to 1.00 for MZ pairs and 0.25 for DZ pairs.
Twin variances and correlations were estimated using a saturated model fit to the transformed/regressed data. In this model phenotypic correlations were constrained to be equal regardless of sex or zygosity whereas variances, within-twin and CTCT correlations were estimated separately for the different sex-by-zygosity groups. To test for differences, variances were equated across sex-by-zygosity groups to see whether this resulted in a significant deterioration in model fit based on the LRT. Variances were significantly higher for males than females based on ratings from parents (χ2 = 28.68, p < 0.001) and teachers (χ2 = 200.54, p < 0.001), indicating probable scalar sex differences. For parent ratings variances were also significantly higher for DZ than MZ twins (χ2 = 18.72, p = 0.001), indicating possible contrast effects.
For parent ratings the DZ within-twin correlations were less than half the MZ correlations, further suggesting contrast effects and/or non-additive genetic influences on phenotypic variance. For teacher ratings the DZ correlations were roughly half the MZ correlations, suggesting additive genetic influences. For self-ratings the DZ correlations were less than half the MZ correlations, suggesting some non-additive genetic influences. CTCT correlations for the DZ pairs were less than half of those for the MZ pairs, suggesting non-additive genetic influences on phenotypic covariance. Phenotypic correlations were 0.34 (95% CI 0.32–0.36) for parent with teacher ratings, 0.45 (95% CI 0.45–0.47) for parent with self-ratings and 0.29 (95% CI 0.27–0.31) for teacher with self-ratings.
Univariate genetic results
Full sex-limitation models indicated variance sex differences for all informant ratings of ADHD symptoms. For parent ratings, the most parsimonious model was an AE scalar model, with a contrast effect (b) that was equated for males and females (A2 = 0.82, 95% CI 0.80–0.83; b = −0.04, 95% CI −0.05 to −0.03). The most parsimonious models were an AE scalar model for teacher ratings (A2 = 0.60, 95% CI 0.58–0.63) and an ADE scalar model for self-ratings (A2 = 0.28, 95% CI 0.15–0.41; D2 = 0.20, 95% CI 0.06–0.34). Broad-sense heritability estimates were thus 82%, 60% and 48% respectively. Fit statistics for all univariate models tested are presented in the online supplementary material (Table S1).
Multivariate genetic results
Based on the univariate results the multivariate models included a scalar to account for variance sex differences for all informant ratings of ADHD, in addition to a contrast effect (b) for parent ratings only. The BIC statistic indicated a strong preference for the common pathway model, from which a reduced model parameterizing ADE at the common level, ae at the residual level and b for parent ratings provided the most parsimonious fit. Parameter estimates for this model are presented in Table 2. Fit statistics for all multivariate models tested are presented in the online supplementary material (Table S2).
Table 2. Parameter estimates for the common pathway model (whole sample)

F is the loading of each informant rating onto the latent factor; A2, D2 and E2 are standardized components of variance for latent factor; a2 and e2 are standardized components of variance unique to each informant rating; b is the rater contrast effect unique to parent ratings; the lower section of the table gives the proportion of phenotypic variance explained by common/residual genetic (A/a, D) and non-shared environmental (E/e) factors for each informant, where the proportion of variance from the common factor is calculated as the factor loading multiplied by the standardized parameter estimate multiplied by the factor loading (i.e. Common A = F×A2 × F); 95% confidence intervals in parentheses.
A common factor accounted for similarities among the different informant ratings of ADHD symptoms. This factor was highly heritable (A2 + D2 = 0.84), with the remaining variance explained by the non-shared environment. When examining loadings of each informant rating onto the latent factor, it was apparent that common genetic influences (common A + D) accounted for 43% of the total variance in parent ratings, 17% in teacher ratings and 32% in self-ratings (see footnote of Table 2 for calculations). These results indicate that parent, teacher and self-ratings assessed some of the same aspects of ADHD-related behavior, and that common genetic influences accounted for most of the similarity between informants.
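The footnote formula (common share = F × component × F) can be illustrated with the figures quoted in this paragraph. Because the Table 2 loadings are not reproduced here, the sketch back-calculates an implied loading for each informant from the reported common genetic share and the latent factor heritability of 0.84. These implied loadings are a rough reconstruction, not the published estimates, and the calculation ignores the parent contrast effect and the sex scalar. The function names are hypothetical.

```python
import math


def common_share(loading, component):
    """Footnote formula: factor loading x standardized latent component x factor loading."""
    return loading * component * loading


def implied_loading(share, component):
    """Invert the footnote formula to recover the loading from a reported share."""
    return math.sqrt(share / component)


latent_heritability = 0.84  # A2 + D2 of the latent factor, as reported in the text
reported_genetic_share = {"parent": 0.43, "teacher": 0.17, "self": 0.32}

loadings = {
    rater: implied_loading(share, latent_heritability)
    for rater, share in reported_genetic_share.items()
}

# With the latent variance fixed at 1.00, the correlation between two informants
# that arises through the common factor is the product of their loadings.
implied_correlation = {
    ("parent", "teacher"): loadings["parent"] * loadings["teacher"],
    ("parent", "self"): loadings["parent"] * loadings["self"],
    ("teacher", "self"): loadings["teacher"] * loadings["self"],
}
for pair, value in implied_correlation.items():
    print(pair, round(value, 2))
```

Under these simplifying assumptions the implied correlations are roughly 0.32, 0.44 and 0.28, close to the phenotypic correlations of 0.34, 0.45 and 0.29 reported earlier, which is what would be expected if the latent factor carries essentially all of the covariance among informants.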
The remaining variance for each informant rating was accounted for by residual genetic and environmental factors. The presence of residual genetic influences indicates that all informants rated unique but valid aspects of ADHD-related behavior, whereas residual non-shared environmental influences indicate that different informant reports were also influenced by the unique environment and/or measurement error.
Post-hoc analyses of same/different-teacher ratings
In genetic modeling the heritability estimated for teacher ratings was lower than expected. Previous research indicates that this can occur when same- and different-teacher ratings of ADHD symptoms are combined (Derks et al. 2006). We therefore split the sample based on whether both twins from a pair had the same teacher (n = 1868 pairs) or different teachers (n = 3349 pairs) at school and repeated all genetic modeling separately for these groups.
We first conducted univariate modeling. For both groups, the most parsimonious models were AE scalar models. These estimated higher heritability for same-teacher ratings (A2 = 0.76, 95% CI 0.73–0.78) than different-teacher ratings (A2 = 0.49, 95% CI 0.44–0.53). Non-overlapping CIs indicated that this was a significant difference. Fit statistics are presented in the online supplementary material (Table S3).
We then refit the common pathway model. For both groups a model that parameterized ADE at the common level and ae at the residual level provided the best fit. The model for the different-teacher group also incorporated a contrast effect (b) for parent-rated ADHD symptoms; however, in the same-teacher group the contrast effect for parent ratings was non-significant and could be removed in the interests of model parsimony. Additive genetic influences on the latent factor for the same-teacher group were also non-significant but were retained in the model because it is considered biologically implausible to model genetic non-additivity in the absence of additive genetic effects. Non-significance of these parameter estimates probably reflects the smaller sample size of the same-teacher group.
In both the same-teacher and different-teacher models, a highly heritable latent factor accounted for covariance among parent, teacher and self-ratings of ADHD symptoms (A2 + D2 = 0.85 and 0.83 respectively). This is consistent with results reported for the whole sample. Residual genetic influences (a2) were significantly higher in the same-teacher than the different-teacher models. Parameter estimates are presented in Table 3 and model fit statistics in the online supplementary Table S4.
Table 3. Parameter estimates for the same-teacher and different-teacher common pathway models

F is the loading of each informant rating onto the latent factor; A2, D2 and E2 are standardized components of variance for latent factor; a2 and e2 are standardized components of variance unique to each informant rating; b is the rater contrast effect unique to parent ratings; the lower section of the table gives the proportion of phenotypic variance explained by common/residual genetic (A/a, D) and non-shared environmental (E/e) factors for each informant, where the proportion of variance from the common factor is calculated as the factor loading multiplied by the standardized parameter estimate multiplied by the factor loading (i.e. Common A = F×A2 × F); 95% confidence intervals in parentheses.
Discussion
This study investigated the etiological relationship between parent, teacher and self-ratings of ADHD symptoms. There were two main findings. First, heritability estimates were lower for self-ratings (48%) than for parent (82%) or teacher (60%) ratings, even though all ratings were obtained concurrently during early adolescence. Second, multivariate modeling indicated shared and unique etiological influences on different informant ratings, suggesting shared but also rater-specific views of ADHD-related behaviors.
Previous twin studies of self-rated ADHD symptoms have reported univariate heritabilities below 50% in adolescence and adulthood (Young et al. 2000; Martin et al. 2002; Ehringer et al. 2006; Schultz et al. 2006; Van Den Berg et al. 2006; Haberstick et al. 2008; Boomsma et al. 2010; Larsson et al. 2012b). In the current study we extended these findings to a younger age group, finding similar heritability (48%) for self-ratings of ADHD symptoms in 11–12-year-old twins. This focus on early adolescence indicates that the lower heritability associated with self-ratings is not exclusive to late adolescence and adulthood, and challenges the conclusion that ADHD might be a less heritable phenotype in adults (Boomsma et al. 2010; Saviouk et al. 2011).
As expected from a recent meta-analysis (Nikolas & Burt, 2010), the heritability estimate for parent ratings was high (82%), but was lower than expected for teacher ratings (60%). When we divided our data into two samples, based on whether the behaviors for both twins from a pair were rated by the same or different teachers, we estimated a significantly higher heritability from same-teacher ratings (76% v. 49%). This observation has been reported previously (Simonoff et al. 1998; Saudino et al. 2005; Derks et al. 2006; Hartman et al. 2007) and therefore seems to be a robust finding.
It is noteworthy that the heritability estimates derived from same-teacher ratings were similar to parent ratings whereas the estimates from different-teacher ratings were similar to self-ratings. This suggests that having a single informant rate the behaviors of both twins from a pair (either a parent or the same teacher) leads to higher heritability estimates than having ratings by different informants for each twin (either the children themselves or different teachers). There are several possible conclusions.
One conclusion is that the different-informant ratings may be more sensitive to genuine non-shared environmental influences on behavior, such as peer relationships or teacher characteristics. If this is the case, then different-informant ratings may provide more accurate heritability estimates that better account for non-shared environmental effects. Another conclusion is that of gene–environment interaction, which occurs when genetic influences depend on the environment. This was the conclusion of a recent twin study that suggested that exposure to different teachers and the corresponding classroom environments triggered different externalized behaviors in each twin from a pair (Lamb et al. Reference Lamb, Middeldorp, Van Beijsterveldt and Boomsma2012). A third conclusion is that different-informant ratings may be associated with increased measurement error, a likely scenario because reliability between ratings will always be lower when two raters rather than just one is involved (unless inter-rater reliability approaches 1). If this is the case then the different-informant ratings may underestimate heritability. Unfortunately, we were unable to distinguish genuine non-shared environmental effects from measurement error in this study, so cannot say which of these explanations may be correct.
An additional explanation that must be considered in relation to the low heritability of self-ratings is that children may be unreliable informants of their own behavior. Previous research has shown that the SDQ hyperactivity scale is less reliable for self-ratings than parent or teacher ratings, based on internal consistency and retest stability in children and adolescents (Goodman, Reference Goodman2001). Moreover, the internal consistency of self-ratings from the SDQ hyperactivity scale is found to increase with age, from 10–13 years (α = 0.57) to 13–16 years (α = 0.65) and 16–19 years (α = 0.66) (Van Roy et al. Reference Van Roy, Veenstra and Clench-Aas2008). Children may therefore be less reliable informants than older individuals when rating their own ADHD symptoms. In the present study the internal consistency for self-ratings was acceptable (α = 0.69), although not as good as for parent (α = 0.76) or teacher (α = 0.86) ratings. Nonetheless, this suggests that the children who participated in this study were reasonably reliable when assessing their own ADHD symptomatology.
In the multivariate genetic modeling a highly heritable latent factor accounted for similarity between parent, teacher and self-ratings of ADHD symptoms, indicating that the overlap between different informant ratings was largely due to a common set of genetic effects. Post-hoc analyses showed similar results when same-teacher and different-teacher ratings were considered separately. However, the loading of teacher ratings onto the latent factor was always significantly lower than the loadings of parent or self-ratings, indicating that the greatest similarity was between the parents and children. The weaker association of teacher ratings with this pervasive view is in line with previous studies showing distinct and/or shared etiological influences for parent and teacher ratings of ADHD symptoms (Simonoff et al. Reference Simonoff, Pickles, Hervas, Silberg, Rutter and Eaves1998; Thapar et al. Reference Thapar, Harrington, Ross and McGuffin2000; Martin et al. Reference Martin, Scourfield and McGuffin2002; Nadder et al. Reference Nadder, Rutter, Silberg, Maes and Eaves2002; Derks et al. Reference Derks, Hudziak, Van Beijsterveldt, Dolan and Boomsma2006; Hartman et al. Reference Hartman, Rhee, Willcutt and Pennington2007; McLoughlin et al. Reference McLoughlin, Rijsdijk, Asherson and Kuntsi2011). Because of this, and because of the finding of residual genetic influences on parent, teacher and self-ratings, rater-specific effects are likely to be valid indicators of different aspects of ADHD-related behaviors, perhaps reflecting differences at home and at school.
Finally, we can comment on the role of contrast effects and genetic non-additivity across different informant ratings of ADHD. Consistent with previous research using the SDQ (Price et al. 2001, 2005; Saudino et al. 2005), univariate modeling identified significant contrast effects for parent ratings only. Conversely, there were significant non-additive genetic influences on self-ratings, a finding not reported previously. The multivariate model also included non-additive genetic influences on the common factor, indicating that these were important with regard to the overlap between informants.
The results should be interpreted in the context of several limitations. First, we examined ADHD symptoms in a population-based twin sample, meaning that the findings may not generalize to clinical cases of ADHD. Second, we used a short five-item measure of ADHD symptoms (the SDQ hyperactivity scale) rather than an 18-item questionnaire. We took this approach because self-ratings on more comprehensive measures of ADHD symptoms were unavailable; however, the scale has been used to assess ADHD symptoms in previous twin studies in this sample (Price et al. 2001, 2005; Saudino et al. 2005). Third, because we used the SDQ we were unable to examine the dimensions of inattention and hyperactivity-impulsivity separately and across raters. ADHD is a heterogeneous disorder, and the two dimensions are not perfectly correlated at the phenotypic or genetic level (Greven et al. 2011a,b; Larsson et al. 2012b). Accordingly, one recent twin study found that parents and teachers rated unique aspects of inattentive and hyperactive-impulsive behaviors (McLoughlin et al. 2011).
Two main implications arise from this study. First, the identification of a highly heritable common factor suggests that clinical and etiological investigations of ADHD will benefit from combining data from multiple informants to create a pervasive, more heritable phenotype. This has the effect of reducing measurement error, thereby increasing power for tests of association with genetic, environmental and neurobiological variables. The second implication is for our understanding of the self-rating measures that are used in most adult studies of ADHD. Our findings suggest that self-ratings in childhood, when used as the sole measure of ADHD symptoms, may underestimate heritability. Thus, previous results indicating lower heritability of ADHD in adulthood may be due to a rater effect rather than a true change in the extent of genetic influences over the course of development. Longitudinal family and twin studies are now required to characterize stability and change in the familial and genetic influences on ADHD symptoms throughout the lifespan, and whenever possible should include similar informant ratings to those routinely collected in childhood and adolescence, in addition to self-ratings.
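The claim that combining informants reduces measurement error can be illustrated with the Spearman-Brown formula, which gives the reliability of an average of k ratings from their mean inter-rater correlation. This is a simplified sketch, not the study's method: it assumes the raters behave as parallel measures of a single trait, which the rater-specific genetic effects reported here suggest is only an approximation. The function name is hypothetical, and the input correlations are the approximate 0.3 to 0.5 range cited in the Introduction rather than estimates from this sample.

```python
def composite_reliability(mean_r: float, k: int) -> float:
    """Spearman-Brown reliability of the mean of k parallel ratings.

    mean_r: average correlation between pairs of raters (assumed equal
            across pairs, an idealization).
    k:      number of raters combined.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Illustrative inputs only: cross-informant correlations of 0.3 and 0.5,
# combining one, two or three informants.
for r in (0.3, 0.5):
    for k in (1, 2, 3):
        print(f"r={r}, k={k}: reliability={composite_reliability(r, k):.2f}")
```

With three informants the composite reliability under these assumptions is about 0.56 for r = 0.3 and 0.75 for r = 0.5, compared with 0.3 and 0.5 for a single rater, which is the sense in which a multi-informant phenotype carries less measurement error.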
Supplementary material
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0033291712002978.
Acknowledgements
We gratefully acknowledge the ongoing contribution of the TEDS families. TEDS is supported by a program grant (G0500079) from the UK Medical Research Council (MRC); our work on school environments is supported by a grant from the US National Institutes of Health (NIH) (HD044454).
Declaration of interest
None.