Depression in women during pregnancy is concerning not only because of the distress it causes women but also because of the risk of adverse outcomes for offspring. Given the high prevalence of depressive disorders and elevated depressed mood during pregnancy, many children are at risk. An estimated 20% of pregnant women exceed the established cutoff for clinically significant levels of depressive symptoms (Marcus, Flynn, Blow, & Barry, 2003), and 18.4% of pregnant women are depressed (major or minor depression) at some point during pregnancy, with 12.7% experiencing an episode of major depressive disorder (Gavin et al., 2005). Across numerous studies, women's depression during pregnancy has been associated with adverse functioning in offspring, including birth outcomes, attachment, temperament (especially negative affectivity), emotional and behavioral problems, and biological functioning (autonomic, brain, and neuroendocrine functioning; see Goodman & Halperin, in press, for a review). Among efforts to explain how depression experienced by mothers prenatally increases risk to offspring, preclinical studies with animals support a model of developmental or fetal programming. Among humans, cross-sectional and longitudinal descriptive studies indicate that prenatal depression in mothers is reliably associated with increased risk of adverse outcomes among children, beginning early in infancy and persisting throughout development. Such findings support a model of fetal programming as a mechanism in the transmission of risk for psychopathology from mothers to children; however, testing the model with the human prenatal dyad requires studies in which depression among mothers is manipulated experimentally.
In this paper, we leverage studies of interventions to prevent or treat depression in pregnant women as experimental tests of depression as a potential mechanism in the transmission of risk. That is, we ask: to what extent does treating or preventing depression in pregnant women protect against the emergence of indicators of adverse development in the offspring? In this way, we use intervention trials to test the model of fetal programming that links depression in pregnant women to adverse outcomes in offspring.
Initially proposed by Barker (1998) to explain tendencies toward chronic cardiovascular and other diseases in certain individuals, in utero programming proposes that the developing fetus neurobehaviorally adapts to the prenatal environment in ways that prepare the infant for similar conditions in the postnatal environment. Thus, fetal exposure to adverse conditions will lead to adaptations designed to promote survival in stressful or depriving postnatal environments. As applied to the link between prenatal maternal mood states and risk for offspring developmental psychopathology, the proposal is that such mood states may influence the developing fetus via stress physiology (Glover, 2014; Glover, O'Connor, & O'Donnell, 2010). That is, the mother's mood state is reflected in disturbances in her autonomic nervous system (ANS) and hypothalamic–pituitary–adrenal (HPA) axis, both of which have been implicated not only in depression (including during pregnancy) but also in the development of psychopathology in offspring and are, thus, potential mechanisms. In addition to changes in women's ANS and HPA systems in relation to mood, other potential mechanisms include elevated fetal cortisol levels regardless of mothers’ levels, placental enzymes and other aspects of placental functioning (e.g., inflammatory cytokines), intestinal microbiota, serotonin exposure, and epigenetic changes (Glover, 2014; O'Connor, Monk, & Burke, 2016; Osborne & Monk, 2013; Zijlmans, Korpela, Riksen-Walraven, de Vos, & de Weerth, 2015).
Through these mechanisms, the depression-exposed infant may be born with certain tendencies, vulnerabilities, and capacities that are consistent with the prenatal exposures, conferring certain advantages but also risks for the development of psychopathology. Findings from correlational studies with humans are consistent with this developmental programming model (see O'Connor, Monk, & Fitelson, 2014, for a review).
Direct experimental tests of the developmental programming model of psychopathology risk are needed among human infants exposed to mothers’ depression in utero, yet such studies are scarce. Ethical concerns prohibit study designs commonly used in animal models. As one example, widely cited animal studies of maternal care influencing the offspring's developing brain often rely on cross-fostering designs, including prenatal cross-fostering (embryo transfer), a design that is particularly useful for distinguishing prenatal from postnatal influences (Curley & Champagne, 2016). Other study designs involve manipulation of maternal mood states, such as stress-inducing paradigms (Lemaire, Lamarque, Le Moal, Piazza, & Abrous, 2006), or of fetal exposures, such as the administration of glucocorticoids (Kapoor, Dunn, Kostaki, Andrews, & Matthews, 2006). Although such study designs may guide the development of experiments in humans (e.g., Babb, Deligiannidis, Murgatroyd, & Nephew, 2015), many aspects of these designs preclude their use in studies of pregnant women.
We consider studies of clinical interventions (prevention or treatment) as experiments to test hypotheses from the developmental programming model (Alves, Martins, Fonseca, Canavarro, & Pereira, 2017). Our approach was to systematically review and, where the literature allowed, meta-analyze studies in which depression (symptom level or diagnosis) or risk for depression was manipulated via clinical intervention (prevention or treatment) and fetal and/or infant neurobiological and behavioral outcomes were measured. Relying on the intervention literature in this way is justified by robust evidence that maternal mood can be manipulated with prevention and treatment approaches during pregnancy: several approaches to preventing or treating depression in women during pregnancy have been found to be effective (van Ravesteyn, Lambregtse-van den Berg, Hoogendijk, & Kamperman, 2017).
Our approach to using the clinical intervention literature to test the developmental programming hypothesis has been suggested by others (e.g., O'Connor et al., 2016), and other reviews have examined the literature on treatment of depression in mothers in relation to functioning in older children (Cuijpers, Weitz, Karyotaki, Garber, & Andersson, 2015; Gunlicks & Weissman, 2008); however, no other reviews have addressed our question in relation to prenatal depression exposure and offspring outcomes. Glover's (2014) review of this literature stated: “No study has yet been conducted on interventions specifically designed to reduce maternal depression, anxiety, or stress during pregnancy with a long-term follow-up study of outcomes for the child.” Thus, we aimed to address this gap to advance our understanding of how the prenatal environment may contribute to risk for the development of psychopathology in offspring.
Numerous conceptual and methodological considerations guided our selection of studies to include in our review. We included treatment studies, which directly intervened to reduce depression (i.e., diagnosis or symptom level), and prevention studies, which targeted known risk factors for depression in women during the perinatal period: stress, low social support, co-parenting issues, poverty, and infant dysregulation (e.g., sleep and cry behavior). To inform future studies, we asked whether universal prevention yielded different effect sizes for child outcomes relative to treatment studies.
Our examination of neurobiological and behavioral changes in offspring was limited to those reported in the published studies. We included studies of any offspring indices that have been reliably associated with depression in mothers during pregnancy in an effort to represent the full extent of multifinality (Cicchetti & Rogosch, 1996). These include birth outcomes such as preterm birth and lower birth weight, mother–infant attachment and other indices of compromised relationship quality, fussy/difficult and dysregulated temperament, socioemotional competence, cognitive/intellectual/motor functioning, behavioral and emotional functioning, autonomic functioning, brain functioning, and neuroendocrine functioning (see Goodman & Halperin, in press, for a review). We included only studies that reported sufficient data to extract two effect sizes of the intervention relative to control: the effect of the intervention on pregnant women's depression and the effect on offspring functioning. We would have liked to further consider evidence for mediation of offspring indices by changes in depression among mothers; however, we found no studies that tested this (see Field, Diego, Hernandez-Reif, Schanberg, & Kuhn, 2004, for consideration of this concern). In addition, although we considered including only studies that utilized an intervention control condition, in the spirit of representativeness and given the paucity of literature, we opted to include open trial designs that had no control condition.
Our primary aim was to critically examine the evidence regarding neurobiological and behavioral changes in infants as a function of interventions to prevent or reduce maternal depression during pregnancy, treating interventions as experiments manipulating depression or depression risk in order to test the proposed developmental programming model. In so doing, we evaluated the extent of empirical support for reducing prenatal depression as a way to minimize abnormal fetal neurobiological and behavioral development. First, we expected to find that interventions to treat or prevent depression in pregnant women would be effective, which would also serve as a manipulation check for our second hypothesis. Second, we anticipated that intervention (the manipulation) would be significantly associated with child functioning, as evidenced by a statistically significant overall effect size. We further examined the extent to which the effect size of the interventions on depression would explain significant portions of the variance in the effect size of interventions on child functioning. Third and finally, we hypothesized that the effect size for the association between intervention and child functioning would differ in relation to: (a) whether the intervention was a preventive intervention or a treatment; (b) the approach to intervention (e.g., cognitive behavioral therapy or supportive counseling); (c) the domain of child functioning outcome; (d) child age at time of measurement; and (e) risk of bias in the study design.
Method
Protocol registration
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines throughout the meta-analytic review process (Liberati et al., 2009). Before conducting the search, we registered the protocol for this review on the PROSPERO international prospective register of systematic reviews. The protocol was assigned the registration number CRD42017064502.
Search strategy
We located records using the following databases: PubMed, PsycINFO (ProQuest), Cochrane CENTRAL, BIOSIS Citation Index (Web of Science), Social Sciences Citation Index (Web of Science), ERIC, and Sociological Abstracts (ProQuest). All searches were limited to scholarly journal articles published in English, with no restrictions based on publication date. The following search terms were used, such that retrieved articles would contain at least one word/phrase from each of the following groups in the title and/or abstract: perinatal, prenatal, peripartum, pregnan*, antenatal, antepartum, (maternal AND fetus), (maternal AND fetal), (mother AND fetus), (mother AND fetal); depress*; infant, infancy, fetal, fetus, toddler, neonat*, child*, baby, babies, “Infant Toddler Social Emotional Assessment,” “Strange Situation Procedure,” “Bayley Scales of Infant Development,” “Eyberg Child Behavior Inventory,” “Child Behavior Questionnaire,” “Child Behavior Checklist,” “Neonatal Behavioral Assessment Scale,” “Newborn Behavioral Observations.” We further required that retrieved articles contain one of the following words in the title: therap*, interven*, treat*, prevent*, experiment*, manipulat*, or trial*. Finally, we also included database-specific relevant subject terms, including medical subject headings (MeSH terms; e.g., “Cognitive Therapy”), in each group of words/phrases. The complete search strategy for each database is stored in PROSPERO (see Footnote 1).
Inclusion/exclusion criteria
For inclusion in the review, a study had to meet the following predetermined criteria. First, the study must report on an intervention or experimental manipulation to prevent or treat depression. We excluded case studies, review/commentary articles, qualitative studies, unpublished student theses or dissertations, and conference abstracts. Second, participants must have been human, female adults who were pregnant at the time the intervention began. We excluded studies in which the sample consisted of nonhuman mothers or the intervention began during the postpartum period (see Footnote 2). Third, studies had to have measured the depressive symptoms and/or depressive episodes of the mother. Fourth, studies had to have measured the neurobiological or behavioral functioning of the fetus and/or infant. That is, we excluded studies if the only child functioning outcomes were medical or physical health-related (e.g., Rahman, Malik, Sikander, Roberts, & Creed, 2008). Fifth, we excluded studies that did not provide the statistics needed to calculate effect sizes for the association between the intervention and changes in (a) mothers’ depression and (b) fetal or infant functioning. We also excluded studies that examined the safety or negative effects on offspring of treating mothers’ depression with medications without presenting data on the effect of the medication treatment on (a) depression in the pregnant women and (b) offspring functioning outcomes that had the potential to improve along with maternal depression.
Selection of studies
Our search yielded a total of 6,324 records. After duplicates were removed, 4,608 records were screened. Two of the authors (L.M.R. and K.A.C.) independently reviewed these records and eliminated 4,299 that did not meet the inclusion criteria outlined above. The same two authors then independently reviewed the remaining full-text articles (n = 309) and assessed them for eligibility. The two reviewers resolved discrepancies through discussion and consensus, resulting in a final set of 25 eligible studies (see Figure 1; Moher, Liberati, Tetzlaff, Altman, & PRISMA Group, 2009).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_fig1g.jpeg?pub-status=live)
Figure 1. (Color online) PRISMA flow diagram.
Data extracted
From the included studies, coders extracted data for assessment of study quality and evidence synthesis using a structured form. Two review authors each extracted data independently for 50% of the eligible studies, plus a randomly selected 20% of those studies assigned to the other review author. Reviewers identified and resolved discrepancies through discussion. Extracted data included information on general study characteristics (e.g., study design, study period, and follow-up); study participants (e.g., age, gender, race, etc.); eligibility criteria; details of the intervention and control conditions (e.g., number, length, and nature of treatment and control sessions); study methodology; maternal and fetal/infant outcomes and times of measurement; suggested mechanisms of intervention action on fetal/child outcomes; and study quality. For studies that reported the depression or a child functioning outcome having been measured at more than one follow-up time point, we selected effect sizes for the effect of the intervention on depression and on each child outcome at the first follow-up. For all data, if both adjusted and unadjusted scores were reported, we analyzed the unadjusted scores.
Maternal outcome variables
We coded whether depression was assessed with a diagnostic interview or rating scale, when it was assessed relative to the intervention and in relation to the perinatal time period (i.e., during pregnancy and/or the postpartum period as well as the specific time points), and whether a baseline measurement was administered. For studies that measured depression at more than one follow-up time point, we included only effect sizes for the effect of the intervention on maternal depression at the first follow-up. For all depression data, if both adjusted and unadjusted scores were reported, we analyzed the unadjusted scores.
Intervention variables
We also coded each intervention in terms of intervention type (i.e., being a preventive intervention or a treatment), treatment type/the approach to intervention (i.e., cognitive behavioral therapy, interpersonal therapy, psychoeducation, supportive counseling, humanistic, couple-focused, parent training, yoga/massage, and vitamin or mineral supplementation), and whether or not the intervention was compared to a control condition.
Offspring outcome variables
The included studies measured a wide range of constructs related to birth outcomes and neonatal health, with many having been measured by only a single study; we selected for the meta-analyses those measures that are most commonly reported in association with depression in pregnant women and the development of psychopathology in offspring and excluded idiosyncratic measures of birth outcomes and neonatal health. We coded all measures of child functioning into one of the following a priori defined domains, selected based on the most commonly reported child outcomes associated with depression: (a) birth weight; (b) gestational age/preterm delivery; (c) Apgar (at 5 min); (d) perinatal (obstetrical, birth, or delivery) complications; (e) perinatal mortality; (f) neonatal neurobehavioral functioning; (g) dysregulation (fussy/difficult, problems in eating or sleeping or regulating negative emotional states/soothability, and negative affectivity); (h) socioemotional competence (adaptive behavior, compliance, attention regulation, mastery motivation, empathy, emotional awareness, and prosocial skills); (i) cognitive, motor, language development, and developmental milestones; (j) emotional or behavioral problems (e.g., anxiety or externalizing); (k) qualities of relationship with the mother (engagement, responsiveness, and attachment classification); and (l) cortisol. As an additional check, a clinical developmental psychologist with expertise in assessment of functioning in infants and children grouped the various measures and scores into global categories and agreed with the authors on their particular placements of study variables within these categories. For all included studies, we coded the time of measurement (i.e., immediately after intervention, at delivery, or at later postnatal follow-up) and the offspring age at the time of measurement.
Study quality
We assessed risk of bias for each study using the Methods Guide for Effectiveness and Comparative Effectiveness Reviews and the Cochrane Collaboration's “Risk of Bias” assessment tool (Higgins et al., 2011; Owens et al., 2011). Using the tool's categories of low, moderate, high, or unclear risk of bias, we evaluated each study's risk of bias overall and in five domains: selection, performance, detection, attrition, and reporting. We used the overall bias categorical score in the analyses.
Meta-analytic method
We created a database using the Comprehensive Meta-Analysis (CMA) program, Version 3.3 (Borenstein, Hedges, Higgins, & Rothstein, 2014). Because studies varied in the format in which they reported data on the effect of the intervention on depression in mothers or on child functioning, we implemented a set of decision rules based on established guidelines (Borenstein, Hedges, Higgins, & Rothstein, 2009). First, if researchers reported only that an effect was not significant, without giving an exact p value, but otherwise provided the statistics necessary to calculate an effect size, we used the conservative strategy of assigning a p value of .99. Second, if researchers provided the statistical significance range of a result (e.g., p < .05), we assigned a p value lying at the midpoint of that range, using a strategy described by Stanley and Doucouliagos (2012). Third, if researchers did not provide the data necessary to calculate an effect size for a particular child outcome, that outcome from that study was excluded from the analysis specific to that outcome (e.g., Cooper et al., 2015, for infant mental development, which was reported as a nonsignificant difference without means and standard deviations). Fourth, if studies included nondepressed comparison groups as well as a depressed comparison group (or a group at risk for depression, in the case of preventive interventions), we extracted the data for the comparison between the treatment group and the depressed comparison group.
Fifth, if a study included only a nondepressed control group (or a group not at risk, for preventive intervention studies), the control group was treated as an uncontrolled comparison group, and we relied on the pre-/post- comparison of changes in depression within the treatment group only. Sixth, we took several steps to avoid nonindependence of data, following guidelines from Borenstein et al. (2009, Chap. 22). That is, when the same participant (the pregnant woman or the child) provided data from multiple measures of a particular outcome, treating the data points as though they were independent would result in incorrect estimates of the variance of the overall effect size. To avoid nonindependence, when a study included more than one measure of depression and all were depression rating scales yielding continuous scores, we computed an average score across those measures so that each study contributed only one effect size to the analyses. When a study included both an interview and a rating scale measure of depression, however, we included only the rating scale measure in the meta-analyses. Similarly, when a study measured a child functioning outcome with both continuous and categorical variables (e.g., mean gestational age at delivery and percent with preterm delivery), we used the continuous score in analyses. When studies provided multiple values for a particular child outcome (i.e., multiple scores from a measure of one outcome), we used the CMA program to compute composite scores, which adjusts the variance by taking into account the assumed correlation among the outcomes. To further avoid nonindependence, in studies that yielded effect sizes for both birth weight and gestational age, we included only birth weight, given that birth weight is highly correlated with gestational age and is the stronger predictor of infant survival (Wilcox & Skjaerven, 1992).
Studies provided effect sizes for birth weight more frequently than gestational age, and all studies that included gestational age also reported birth weight. Similarly, for studies that provided effect sizes for both birth weight and rate of preterm deliveries (n = 2), we only included birth weight. Finally, one study provided four cortisol values (i.e., morning, evening, average, and slope); to avoid nonindependence, we only included cortisol slope.
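The extraction rules above can be illustrated with a short Python sketch. It is not the CMA implementation: the helper names (`impute_p`, `g_from_p`, `composite_effect`) are hypothetical, the p-to-g conversion uses a normal approximation rather than the exact t distribution, the midpoint rule assumes that a result reported as p < bound spans the interval (0, bound), and the within-study correlation `r` among outcomes has to be assumed because studies rarely report it.

```python
import math
from itertools import combinations
from statistics import NormalDist
from typing import Optional, Sequence, Tuple


def impute_p(relation: str, bound: Optional[float] = None) -> float:
    """Turn a reported significance statement into a p value (hypothetical helper).

    relation: "ns" (only "not significant" reported), "<" (a range such as
    p < .05), or "=" (an exact p value).
    """
    if relation == "ns":
        return 0.99  # conservative rule: treat the result as essentially null
    if bound is None:
        raise ValueError("a bound is required unless relation is 'ns'")
    if relation == "<":
        return bound / 2  # midpoint of the assumed range (0, bound)
    if relation == "=":
        return bound
    raise ValueError(f"unknown relation: {relation!r}")


def g_from_p(p_two_sided: float, n1: int, n2: int,
             favors_intervention: bool = True) -> float:
    """Approximate Hedges' g from a two-sided p value and the two group sizes."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # normal approximation to t
    d = z * math.sqrt(1 / n1 + 1 / n2)
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)  # small-sample correction factor
    return j * d if favors_intervention else -j * d


def composite_effect(effects: Sequence[float], variances: Sequence[float],
                     r: float) -> Tuple[float, float]:
    """Mean of m correlated effect sizes from one study and its variance.

    Var = (1/m)^2 * (sum of V_i + sum over i != j of r * sqrt(V_i * V_j)),
    using the same assumed correlation r for every pair of outcomes.
    """
    m = len(effects)
    if m == 0 or m != len(variances):
        raise ValueError("need one variance per effect size")
    mean = sum(effects) / m
    total = sum(variances)
    for v_i, v_j in combinations(variances, 2):
        total += 2 * r * math.sqrt(v_i * v_j)  # each unordered pair counted twice
    return mean, total / m ** 2
```

For example, `g_from_p(impute_p("<", 0.05), 50, 50)` gives g of roughly 0.39 under the normal approximation, and `composite_effect([0.2, 0.4], [0.04, 0.04], r=0.5)` returns a mean of 0.3 with variance 0.03, larger than the 0.02 obtained by wrongly treating the two outcomes as independent (r = 0).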
Analysis of effect of intervention on depression in pregnant women
As our manipulation check, we meta-analytically examined the effect of the interventions on depression in pregnant women. We used the CMA program to calculate the pooled mean effect size, an unbiased estimate of the population effect size, and to examine the heterogeneity of effect sizes for the effect of intervention on depression in pregnant women at posttest. Because we expected heterogeneity across the studies, we used the random effects pooling model. For all analyses, we report the range of effects and show forest plots (see Figure 2).
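As an illustration of the pooling step, the sketch below computes Hedges' g from group means and standard deviations and pools effect sizes with the DerSimonian–Laird (method-of-moments) random-effects estimator. We assume, but have not verified, that this matches CMA's default random-effects computation; the sign convention (positive g means lower scores in the intervention group, as for depression symptoms), the function names, and the 1.96 multiplier for a 95% confidence interval are assumptions of the sketch.

```python
import math
from typing import Sequence, Tuple


def hedges_g(mean_tx: float, sd_tx: float, n_tx: int,
             mean_ctl: float, sd_ctl: float, n_ctl: int) -> Tuple[float, float]:
    """Hedges' g and its variance; positive values favor the intervention
    when lower scores are better (e.g., depression symptoms)."""
    df = n_tx + n_ctl - 2
    sd_pooled = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df)
    d = (mean_ctl - mean_tx) / sd_pooled
    var_d = (n_tx + n_ctl) / (n_tx * n_ctl) + d ** 2 / (2 * (n_tx + n_ctl))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction
    return j * d, j ** 2 * var_d


def random_effects(effects: Sequence[float], variances: Sequence[float]) -> dict:
    """DerSimonian-Laird random-effects pooled estimate with a 95% CI."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(effects) - 1
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"g": pooled, "se": se, "tau2": tau2,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se)}
```

For example, `random_effects([0.0, 1.0], [0.01, 0.01])` pools two hypothetical, sharply discordant studies to g = 0.5 with τ² = 0.49, so the random-effects standard error (0.5) is far wider than the fixed-effect one would be.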
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_fig2g.jpeg?pub-status=live)
Figure 2. (Color online) Forest plot for depression outcomes.
To quantify how much of the inconsistency in findings across the included studies reflects true differences in effects rather than chance, we used CMA to calculate the I² statistic, which indicates “the extent to which confidence intervals from the different studies overlap with each other” (Borenstein, Higgins, Hedges, & Rothstein, 2017, p. 11). The I² statistic is the proportion of the observed variance that would remain if we could eliminate sampling error and, thereby, observe the true effect sizes; small values of I² indicate little inconsistency of findings across studies, and larger values indicate greater inconsistency, potentially due to differences between subgroups of studies. It has an advantage over Cochran's Q in that it does not depend on the number of studies (Liberati et al., 2009). To test the effect of study quality, we treated the bias rating as a moderator in meta-regression using CMA.
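The I² calculation can be sketched from Cochran's Q and its degrees of freedom as I² = max(0, (Q − df)/Q) × 100. Because Q is compared with its degrees of freedom, I² expresses the excess dispersion as a percentage rather than as a statistic that grows with the number of studies. The function name is hypothetical, and the toy inputs below are not data from the included studies.

```python
from typing import Sequence, Tuple


def q_and_i2(effects: Sequence[float],
             variances: Sequence[float]) -> Tuple[float, int, float]:
    """Cochran's Q (fixed-effect weights), its df, and I-squared in percent."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Share of the observed dispersion that exceeds what sampling error predicts.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```

Two hypothetical studies with g = 0.0 and 1.0 and variances of 0.01 each give Q = 50 on 1 df and I² = 98%, whereas identical effect sizes give I² = 0.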
Analysis of effect of intervention on child functioning
We used the CMA program to obtain an unbiased estimate of the population effect size and to examine the heterogeneity of effect sizes for the effect of intervening to treat or prevent depression in pregnant women on child functioning. Because we expected heterogeneity across the studies, we used the random effects pooling model. As with the meta-analysis of effects of interventions on depression, we used CMA to calculate the I² statistic to test the extent of inconsistency of findings across the included studies of child functioning. To examine our hypothesized moderators, we used subgroup analyses to test whether effect sizes varied across levels of each proposed moderator.
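A subgroup comparison of this kind can be sketched by pooling each subgroup with a random-effects model and testing the dispersion of the subgroup means with a between-groups Q statistic. This hypothetical sketch estimates τ² separately within each subgroup; a pooled estimate of τ² across subgroups is a common alternative, and which option was used is not stated here. The p value for Q_between would come from a chi-square distribution with the returned df (for example, via `scipy.stats.chi2.sf`), which is omitted to keep the sketch dependency-free.

```python
from typing import Dict, Sequence, Tuple

Group = Tuple[Sequence[float], Sequence[float]]  # (effect sizes, variances)


def pooled_re(effects: Sequence[float],
              variances: Sequence[float]) -> Tuple[float, float]:
    """DerSimonian-Laird pooled effect and its variance for one subgroup."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mean, 1 / sum(w_re)


def q_between(groups: Dict[str, Group]) -> Tuple[float, int, Dict[str, float]]:
    """Between-subgroups Q, its df, and the pooled effect in each subgroup."""
    summaries = {name: pooled_re(y, v) for name, (y, v) in groups.items()}
    weights = {name: 1 / var for name, (_, var) in summaries.items()}
    grand = (sum(weights[n] * m for n, (m, _) in summaries.items())
             / sum(weights.values()))
    q_b = sum(weights[n] * (m - grand) ** 2 for n, (m, _) in summaries.items())
    return q_b, len(groups) - 1, {n: m for n, (m, _) in summaries.items()}
```

With two hypothetical subgroups (say, prevention and treatment) that pool to 0.3 and 0.7, each with variance 0.01, Q_between = 8.0 on 1 df.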
Analysis of the conceptual model: Mediation of effect on children by effect on mothers’ depression
To test the proposed conceptual model, we conducted a meta-regression analysis, treating the effect size for child functioning as the dependent variable and the effect size for the effects of intervention on mothers’ depression as the predictor.
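The meta-regression can be sketched as weighted least squares with the child-outcome effect size as the dependent variable and the maternal-depression effect size as the predictor, weighting each study by 1/(v_i + τ²). For simplicity, this hypothetical sketch takes the residual between-study variance τ² as an input rather than estimating it from the data, and it computes the slope's standard error with the normal-theory formula for known weights; it is not CMA's implementation.

```python
import math
from statistics import NormalDist
from typing import Sequence


def meta_regression(x: Sequence[float], y: Sequence[float],
                    v: Sequence[float], tau2: float = 0.0) -> dict:
    """Weighted least-squares meta-regression of y on a single moderator x.

    x: effect of the intervention on mothers' depression (one per study)
    y: effect of the intervention on child functioning (one per study)
    v: sampling variance of each y; tau2: assumed residual between-study variance
    """
    w = [1 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    se = math.sqrt(1 / sxx)  # SE of the slope, treating the weights as known
    z = slope / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal p value
    return {"intercept": y_bar - slope * x_bar, "slope": slope,
            "se": se, "z": z, "p": p}
```

As a check, toy points lying exactly on y = 0.2 + 0.5x recover a slope of 0.5 and an intercept of 0.2; a positive slope in real data would indicate that studies with larger effects on mothers' depression also tended to show larger effects on child functioning.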
Publication bias
We tested the potential role of publication bias by using the CMA program to compute the fail-safe N, create funnel plots, and conduct Duval and Tweedie's Trim and Fill procedure. The Trim and Fill procedure yields an estimate of the effect size after adjusting for potential publication bias.
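Assuming the classic (Rosenthal) version of the fail-safe N, the statistic asks how many unpublished null studies (mean z = 0) would be needed to bring the combined Stouffer z, Σz_i / √(k + N), down to the critical value; solving gives N = (Σz_i)² / z_crit² − k. In the hypothetical sketch below, α and the number of tails are parameters because the settings used in the analysis are not reported here; the Trim and Fill procedure is not shown.

```python
from statistics import NormalDist
from typing import Sequence


def fail_safe_n(z_values: Sequence[float], alpha: float = 0.05,
                tails: int = 2) -> float:
    """Rosenthal's fail-safe N from per-study z statistics.

    Number of additional studies averaging z = 0 that would raise the
    combined (Stouffer) p value above alpha.
    """
    k = len(z_values)
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    n = sum(z_values) ** 2 / z_crit ** 2 - k
    return max(0.0, n)  # a nonsignificant combined result needs no null studies
```

For example, four hypothetical studies each with z = 2.0 give a fail-safe N of about 12.7 with a two-tailed α of .05 and about 19.7 with a one-tailed α of .05.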
Results
Characteristics of included studies
The 25 eligible studies included a total of 27,342 participants, M = 1,243 (SD = 4,012), with a range from 26 to 19,030 participants per study. Mean age of mothers was 27.32 years (SD = 2.20); mean age of children was 6.65 months (SD = 17.68; see Footnote 3). Additional study characteristics are summarized in Table 1. Although we reliably categorized the approach to intervention, each category of intervention approach was notably broad. For example, among studies coded as “cognitive behavioral therapy” (n = 8), the number of sessions ranged from 4 to 16; 25% delivered group rather than individual sessions; most tailored the cognitive behavioral therapy to be perinatal specific or to focus on stress reduction; interventionists ranged from trained professional mental health providers to community health workers (n = 1) or a range of trainees (n = 1); several of these studies were implemented in poverty samples (in the United States or abroad); and several provided “add-ons” such as including partners in one session or conducting review/booster sessions in the postpartum period. Thus, we consider analyses by intervention approach to be tentative and interpret these findings cautiously. Because many studies reported on more than one child functioning domain, we do not report percentages for those domains; in Table 1, aspects of child functioning that were reported in three or more studies are in bold. Table 2 displays the individual study characteristics.
Table 1. Characteristics of sample of included studies (n = 25)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_tab1.gif?pub-status=live)
Note: One study included two separate samples, and their samples and study characteristics are reported separately in this table; one study reported baseline data for one measure of depression and not the other. Many studies reported on more than one child functioning domain. RCT, randomized clinical trials.
Table 2. All included studies
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_tab2.gif?pub-status=live)
Note: M age, mean age of mothers (in years). Kashanian et al. (2017) and Khalafallah et al. (2012) report median age of mothers. DM, how depression was manipulated (preventive intervention [PI] or treatment [T]). GA, mean gestational age (in weeks) at mother's treatment onset. Sample size, sample size total at enrollment (sample size at enrollment for treatment group/sample size at enrollment for control group), if applicable. TDM, type of depression measure (scale or interview). CLM development, cognitive, language, and motor development and developmental milestones. CSQ, Child Sleep Questionnaire. IBQ, Infant Behavior Questionnaire. PSI, Parenting Stress Index. CTL, control. Bayley-III, Bayley Scales of Infant and Toddler Development, 3rd Edition. ICQ, Infant Characteristics Questionnaire. BSQ, Behavior Screening Questionnaire. SSP, Ainsworth Strange Situation procedure. MDI, Mental Development Index of the Bayley Scales of Infant Development. OCS, Obstetric Complications Scale. FTII, Fagan Test of Infant Intelligence. SCAS, Spence Children's Anxiety Scale. SDQ, Strengths and Difficulties Questionnaire. WPPSI FSIQ, Wechsler Preschool and Primary Scale of Intelligence Full-Scale IQ. ASQ-3, Ages and Stages Questionnaire, 3rd Edition. ASQ-SE, Ages and Stages Questionnaire: Social–Emotional. IBQ-R, Infant Behavior Questionnaire—Revised. *Out of 16, all deaths were fetal deaths except for one at 15 months.
Effects on depression
For our manipulation check, the effect of intervention on depression or depression risk in pregnant women, the 25 studies yielded 27 effect sizes, because two studies compared the intervention group with two different control groups (Field et al., 2012; Kenyon et al., 2016). A random-effects meta-analysis yielded a pooled effect size across all interventions of g = 0.48, 95% CI [0.29, 0.68], p = .001, with individual effect sizes ranging from –0.20 to +2.51 (see Table 3 and Figure 2). Heterogeneity was high, I² = 97.07%, indicating inconsistency across the effect sizes. Given the small number of studies and this large inconsistency, we examined whether removing outliers would reduce it. We reran the analysis without one study (Kenyon et al., 2016), whose very small standard deviations turned a small mean difference between groups into a large effect size. With that study removed, the overall effect size was g = 0.38, 95% CI [0.27, 0.49], p = .001, I² = 87.35%. Because removing that study changed the results only modestly, we report the remaining analyses with all studies included. Subgroup analyses revealed that the overall effect size did not vary by intervention type (see Table 3).
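The sensitivity analysis in the preceding paragraph turns on how a standardized mean difference is computed: g divides the between-group difference by a pooled standard deviation, so very small standard deviations can turn a small raw difference into a large effect size. The Python sketch below uses the standard Hedges' g formula with its small-sample correction. The function name `hedges_g`, the sign convention (positive means lower depression scores in the intervention group), and the example numbers are illustrative assumptions, not code or data from any included study.

```python
import math


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> tuple[float, float]:
    """Return (g, variance of g) for an intervention vs. control comparison.

    Assumed sign convention for this sketch: positive g means the
    intervention group scored lower (better) on the depression measure.
    """
    df = n_t + n_c - 2
    # Pooled within-group standard deviation: the denominator of d.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_c - mean_t) / sd_pooled
    # Hedges' small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


# Same hypothetical 1-point mean difference, n = 50 per group,
# with very small versus moderate standard deviations.
g_small_sd, _ = hedges_g(10.0, 1.0, 50, 11.0, 1.0, 50)
g_large_sd, _ = hedges_g(10.0, 8.0, 50, 11.0, 8.0, 50)
print(round(g_small_sd, 3), round(g_large_sd, 3))
```

With a 1-point difference, standard deviations of 1 give g of about 0.99, whereas standard deviations of 8 give g of about 0.12; the raw difference is identical in both cases.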
Table 3. Effects on depression outcomes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_tab3.gif?pub-status=live)
Note: N, number of effect sizes. CI, confidence interval. I², the percentage of variation across effect sizes attributable to between-study heterogeneity rather than sampling error. Q-value is for the between-groups test.
The fail-safe N was large (3,137), and inspection of the funnel plot did not suggest publication bias. Duval and Tweedie's trim-and-fill method left the overall effect size estimate unchanged. Subgroup analyses revealed that effect sizes differed significantly by study quality (risk of bias), with moderate quality/moderate bias studies yielding larger effect sizes than high quality/low bias studies (see Table 3).
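The paper does not state which fail-safe N procedure was used. As a hedged illustration, the sketch below implements Rosenthal's classic version, which asks how many unpublished studies averaging a null result (z = 0) would be needed to pull the Stouffer combined z down to the one-tailed .05 threshold (z = 1.645). The function name and the example z values are hypothetical.

```python
import math


def rosenthal_fail_safe_n(z_scores: list[float], z_alpha: float = 1.645) -> float:
    """Rosenthal-style fail-safe N.

    The Stouffer combined z for k observed studies plus N null studies is
    sum(z) / sqrt(k + N). Setting that equal to z_alpha and solving for N
    gives N = sum(z)**2 / z_alpha**2 - k.
    """
    k = len(z_scores)
    total_z = sum(z_scores)
    return total_z ** 2 / z_alpha ** 2 - k


# Ten hypothetical studies, each with z = 2.0.
n_fs = rosenthal_fail_safe_n([2.0] * 10)
print(round(n_fs, 1))
```

For these hypothetical inputs, roughly 138 null studies would be needed; larger values, such as those reported here, indicate that the pooled effect is robust to unpublished null results under this method's assumptions.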
Effects on child functioning
For the effect of intervention on child functioning, the 25 included studies yielded 55 effect sizes across the different child functioning measures after we resolved nonindependence and formed composites. A random-effects meta-analysis yielded a pooled effect size across all child functioning constructs of g = 0.15, 95% CI [0.09, 0.21], p < .001 (see Table 4 and Figure 3). Effect sizes ranged from –0.40 to +1.04. Heterogeneity was moderate, I² = 57.87%.
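The pooled g values and I² statistics reported in this section come from random-effects models. The paper does not specify the between-study variance estimator, so the following Python sketch assumes the widely used DerSimonian-Laird method of moments; `random_effects_dl` is a hypothetical helper, and the 1.96 multiplier gives a normal-approximation 95% CI. I² is computed as (Q − df)/Q × 100, truncated at 0.

```python
import math


def random_effects_dl(effects: list[float], variances: list[float]):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled effect, CI lower, CI upper, tau^2, I^2 in percent).
    """
    k = len(effects)
    w = [1 / v for v in variances]                     # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, i2


# Two hypothetical studies with very different effects and equal precision.
print(random_effects_dl([0.0, 1.0], [0.01, 0.01]))
```

In this hypothetical example, τ² = 0.49 and I² = 98%: nearly all of the observed spread is attributed to heterogeneity, and the random-effects CI widens accordingly.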
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_fig3g.jpeg?pub-status=live)
Figure 3. (Color) Forest plot for child outcomes.
Table 4. Effects on child outcomes
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20180925163941905-0010:S0954579418000536:S0954579418000536_tab4.gif?pub-status=live)
Note: N, number of effect sizes. CI, confidence interval. I², the percentage of variation across effect sizes attributable to between-study heterogeneity rather than sampling error. Q-value is for the between-groups test.
The included studies varied considerably in child functioning outcome domain, child age at outcome measurement, intervention type (treatment vs. preventive intervention), and intervention approach, which enabled us to test whether each of these four factors moderated the effect of intervention on child functioning (Table 1). We tested moderation by outcome domain, intervention type, and intervention approach with subgroup analyses, and moderation by child age, a continuous variable, with meta-regression (Table 4).
Of the three subgroup analyses (see Table 4), the first showed that effect sizes varied significantly by child functioning category. The individual functioning domains that had three or more effect sizes and yielded statistically significant effects of intervention were the Brazelton and dysregulation. Second, effect sizes for child functioning did not vary significantly by intervention type. Third, effect sizes for child functioning varied significantly by intervention approach. The approaches that had three or more effect sizes and yielded statistically significant effects of intervention were supportive counseling and yoga/massage.
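The subgroup comparisons rest on a between-groups Q test. A minimal sketch, assuming each subgroup has already been pooled (for example, with a random-effects model) into an estimate and a standard error, treats those subgroup estimates as fixed and compares their dispersion with a chi-square distribution on (number of subgroups − 1) degrees of freedom. This is one common variant, not necessarily the exact procedure the authors' software used. The helper names `chi2_sf` and `q_between` are hypothetical, and the chi-square tail probability uses the standard series for the regularized incomplete gamma function so that only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def q_between(subgroups: dict[str, tuple[float, float]]) -> tuple[float, int, float]:
    """subgroups maps a label to (pooled effect, standard error).

    Returns (Q_between, df, p) for the test of equal subgroup means.
    """
    weights = {k: 1 / se ** 2 for k, (_, se) in subgroups.items()}
    total_w = sum(weights.values())
    grand = sum(weights[k] * est for k, (est, _) in subgroups.items()) / total_w
    q = sum(weights[k] * (est - grand) ** 2 for k, (est, _) in subgroups.items())
    df = len(subgroups) - 1
    return q, df, chi2_sf(q, df)


# Two hypothetical subgroups of effect sizes.
print(q_between({"approach A": (0.0, 0.1), "approach B": (0.5, 0.1)}))
```

In this hypothetical example, Q_between = 12.5 on 1 df (p < .001), so the two subgroup means would be judged to differ.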
In the meta-regression, child age was a statistically significant moderator, explaining 16% of the between-study variance in effect sizes, Q model = 7.29, df = 1, p = .007. Effects of intervention on child functioning were larger for younger than for older children.
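The child-age analysis is a meta-regression: study effect sizes are regressed on a study-level moderator with inverse-variance weights, and Q model tests the slope. The sketch below assumes a single continuous moderator and takes the between-study variance τ² as an input rather than estimating it, which is a simplification. With one predictor, Q model equals the squared z statistic for the slope and is referred to a chi-square distribution on 1 df. `meta_regression_1` and the example data are hypothetical.

```python
import math


def meta_regression_1(effects: list[float], variances: list[float],
                      moderator: list[float], tau2: float = 0.0):
    """Weighted least-squares meta-regression on one moderator.

    Weights are 1 / (v_i + tau2); tau2 is supplied by the caller
    (an assumption of this sketch). Returns (intercept, slope, Q_model, p).
    """
    w = [1 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    var_slope = 1 / sxx                 # sampling variance of the slope
    q_model = slope ** 2 / var_slope    # = z**2 for a single predictor
    # For 1 df, P(chi2 > Q) = P(|Z| > sqrt(Q)) = erfc(sqrt(Q / 2)).
    p = math.erfc(math.sqrt(q_model / 2))
    return intercept, slope, q_model, p


# Hypothetical effects that shrink as the (hypothetical) age moderator increases.
print(meta_regression_1([0.5, 0.4, 0.3, 0.2], [0.04] * 4, [0, 1, 2, 3]))
```

A negative slope, as in this hypothetical example, is the pattern that corresponds to smaller intervention effects for children assessed at older ages.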
The fail-safe N was large (627), and inspection of the funnel plot did not suggest publication bias. Duval and Tweedie's trim-and-fill method left the overall effect size estimate unchanged. Study quality (risk of bias) was not a significant moderator of effect sizes for child outcomes (see Table 4).
Mediation of effect on children by effect on mothers’ depression
Our test of the proposed conceptual model, a meta-regression, revealed that the effect size for intervention effects on mothers’ depression did not significantly predict the effect size for child functioning (the dependent variable), Q model = 2.74, df = 1, p = .10. That is, the magnitude of intervention effects on maternal depression did not explain a significant amount of the variance in effect sizes for child functioning.
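Both meta-regression tests reported in this section give Q model on 1 df. As a quick arithmetic check, a chi-square statistic with 1 df is the square of a standard normal z, so its p value is erfc(√(Q/2)). The short Python sketch below applies this identity to the two reported values; the helper name `q_to_p_df1` is hypothetical.

```python
import math


def q_to_p_df1(q_model: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For df = 1, chi-square = z**2, so p = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2)).
    """
    return math.erfc(math.sqrt(q_model / 2))


# Q model values reported in the text.
for label, q in [("child age", 7.29), ("maternal depression effect size", 2.74)]:
    print(f"{label}: Q = {q}, p = {q_to_p_df1(q):.4f}")
```

The computed p values, about .0069 and .098, round to the reported p = .007 and p = .10.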
Discussion
To understand the extent to which depression in women during pregnancy acts as a mechanism that increases risk for adverse outcomes in offspring, we treated intervention studies as experimental manipulations that test whether changes in women's depression produce changes in child outcomes. In this way, we aimed to add to the literature on humans, which relies on cross-sectional and longitudinal (i.e., correlational) designs. The key limitation of such designs is that they are not experiments and therefore preclude causal inference about the relationship between changes in depression among mothers and offspring outcomes. By contrast, intervention studies directly manipulate the purported mediating variable (depression or depression risk in pregnant women), which allowed us to test causal processes directly. Our conceptual model also supported causal inference through temporal precedence: we examined whether an intervention during pregnancy affects the mediator (depression among mothers during pregnancy) and whether changes in the mediator are associated with changes in offspring functioning after birth (Kraemer, Stice, Kazdin, Offord, & Kupfer, 2001; MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002).
We found that interventions to prevent or treat depression in women during pregnancy were associated with reduced depressive symptom levels, with a medium effect size. This finding is consistent with recent reviews of interventions to treat depression during pregnancy (van Ravesteyn et al., 2017) and with recent tests of interventions to prevent depression during pregnancy (Dimidjian et al., 2016). Having established that the interventions were effective for women's depression, we next tested the extent to which they were associated with child functioning.
Addressing our primary aim, we found that depression interventions for pregnant women had a significant, albeit small, effect on children's functioning. This finding indicates that offspring benefit when women participate in interventions that reduce or prevent perinatal depression, and it supports researchers’ and clinicians’ continuing efforts to develop and deliver such interventions during pregnancy.
The two child functioning constructs that had three or more effect sizes and were significantly associated with intervention for depression both implicated offspring dysregulation. The Brazelton Neonatal Behavioral Assessment Scale (NBAS; Brazelton, 1984) is a widely used measure of the potential consequences of prenatal exposures; it captures newborns’ difficulties attending and orienting to stimuli, including faces and voices, which may relate to the later quality of interactions with the mother and others. The NBAS is also an index of dysregulation, potentially an early sign of self-regulation problems with possible fetal origins (Wachs, Pollitt, Cueto, & Jacoby, 2004). Newborns’ NBAS scores have been prospectively associated with adaptability at 4 months (Tirosh, Harel, Abadi, Berger, & Cohen, 1992), sensory threshold and adaptability at 8 months (Koniak-Griffin & Rummell, 1988), and, at 1 year of age, adaptability (Jones & Parks, 1983) and the likelihood of anxious/resistant rather than secure attachment (Waters, Vaughn, & Egeland, 1980). Consistent with the NBAS being an index of dysregulation, the other key construct (three or more effect sizes and a significant association with intervention) was the set of child functioning measures categorized as dysregulation. Taken together, the findings on the NBAS and on dysregulation suggest that intervening to prevent or treat depression during pregnancy may alter the fetal neurobehavioral functioning implicated in the transmission of risk from depression in pregnant women to psychopathology in their offspring. Thus, our findings are consistent with the Barker fetal programming hypothesis.
We found that both preventive and treatment interventions were associated with statistically significant effect sizes for child functioning, with no significant difference between the two types. This finding supports an approach to intervention during pregnancy in which acutely depressed women are engaged in treatment and women at risk for depression are engaged in preventive interventions. Such an approach requires routine, periodic screening during pregnancy, not only for current (acute) depressive symptom levels but also for depression risk, including a history of depression and psychosocial stressors associated with depression onset or recurrence. More broadly, our findings are consistent with the idea that prevention and treatment of depression in pregnant women are also preventive interventions for the offspring, in terms of both their physical health and their risk for developing psychopathology. This has important implications for clinicians who treat pregnant women and who may not be considering their potential role in preventing psychopathology in the children of the women they treat (Zalewski, Goodman, Cole, & McLaughlin, 2017).
Analyses by child age at outcome measurement revealed that intervention effects on child functioning were larger for younger than for older children. This finding suggests that preventing or treating depression in women during pregnancy affects early life outcomes, which matter because they have important implications for the development of psychopathology. However, the moderating role of child age also suggests that intervention during pregnancy alone may not be sufficient. That is, sustained benefits for older children may require women to engage in ongoing interventions or to receive routine, repeated depression assessments that trigger further treatment as needed (Goodman & Garber, 2017).
The review also revealed a finding that tempers support for the Barker hypothesis. The effect size for the impact of intervention on maternal depression did not significantly predict the effect size for child functioning. In other words, although intervention had a small effect on child functioning, showing that children benefit from intervention for mothers, child effect sizes did not vary as a function of mothers’ effect sizes. A positive interpretation of this finding is that engaging women in effective interventions is what matters for children. Less favorably, the finding fails to support a dose–response relationship: greater efficacy of the intervention for mothers’ depression was not associated with larger differences in children's functioning. This will be an important question to revisit as future interventions demonstrate stronger effects of treatment or prevention on depression in pregnant women. Moreover, more comprehensive tests of fetal programming would also consider the potential role of a mismatch between fetal and postnatal environments, which may foster tendencies that are maladaptive in particular contexts and thereby promote the development of psychopathology (Ellis, Boyce, Belsky, Bakermans-Kranenburg, & van IJzendoorn, 2011). It is also important to consider that some children are likely more sensitive to the environment, including the prenatal environment, and thus children may differ in the extent to which they benefit from their mothers’ depression intervention even during pregnancy.
An encouraging finding from this review is that most studies were of high quality (low bias). Nonetheless, it is concerning that the handful of studies of lower quality (moderate bias) yielded larger effect sizes for intervention effects on depression than the majority of studies, which were of high quality (low bias). Bias was not, however, significantly associated with effect sizes for child functioning.
Our findings are consistent with the broader literature on the benefits for children when mothers receive treatment for depression, during the postnatal period and beyond. In particular, in a meta-analytic review of nine randomized controlled trials that compared psychotherapy with control conditions on children's outcomes, therapy was associated with better mental health in children, with a small effect size (Cuijpers et al., 2015). The effect size we found was also comparable to that of interventions to prevent depression in mothers or fathers on depressive and internalizing symptoms of children (up to 18 years of age) at postintervention (Loechner et al., 2018). Further, the considerable variability across studies suggests that treatment of mothers’ depression alone may not be sufficient. This is consistent with Goodman and Garber's (2017) proposed model for an integrated approach to depression treatment for mothers and their children.
A limitation of our review is that, for numerous aspects of child functioning, there were too few effect sizes to interpret an overall effect size. In particular, we were unable to examine effects of intervention on fetal functioning. The one study that included a measure of fetal functioning, fetal activity at 36 weeks, did not provide the data needed to compare the treated group with matched controls (Field et al., 2004). Broadly speaking, an important next step is for more intervention studies among pregnant women to include measures of child functioning. Special attention should be given to aspects of child functioning that are theorized to indicate risk for, or early signs of, the later development of psychopathology. In particular, future studies would do well to include the measures reported most frequently in the studies in this review, to expand the evidence base and allow for future moderation analyses (e.g., birth outcomes, NBAS, cognitive and motor development, dysregulation, and relationship quality). Nonetheless, it is encouraging that a broad range of child functioning was measured across the eligible studies, consistent with the important role of multifinality in developmental psychopathology (Cicchetti & Rogosch, 1996).
We were limited in our ability to address whether the particular intervention approach mattered for child outcomes. Although we were able to reliably categorize the studies’ therapeutic approaches (e.g., cognitive behavioral therapy and supportive counseling), we found considerable variability within most of the approach categories. Nonetheless, intervention approach was significantly associated with differences in effect sizes for the effect of intervention on child functioning. These specific findings need replication and consideration of possible approach-specific mechanisms of change. Studies classified as cognitive behavioral therapy varied greatly in how the therapy was implemented; that variability likely contributed to their not being associated with significant effect sizes for child outcomes. In a review that included interventions for antenatal depression, postpartum depression, or both, the authors concluded, as we did, that there was limited evidence that any one treatment conferred greater benefits than others on mothers’ depression or infant outcomes (Letourneau, Dennis, Cosic, & Linder, 2017).
Finally, our review was limited to the first time point at which outcomes were measured in the children. Several of the included studies also report longer term outcomes, which would themselves be important to review, and future randomized trials of depression interventions among women would benefit from including long-term follow-up of offspring outcomes. In a review of interventions to prevent depression in parents (mothers or fathers), intervention (relative to control) was associated with small but significant effects on children's (up to age 18 years) depressive and internalizing symptoms at postintervention, but these effects were no longer present at longer term follow-ups, either up to 12 months or 15–72 months postintervention (Loechner et al., 2018). Based on similar concerns about the retention of gains from interventions, Goodman and Garber (2017) recommended a set of steps toward enhancing the durability of intervention effects for parents.
The review suggests several important steps to improve and expand the evidence base on the impact of depression in women on offspring. First, we found no studies that directly tested the extent to which changes in depression among women mediated change in offspring outcomes. Second, almost no studies measured any of the purported mechanisms by which depression among women during pregnancy may affect offspring development. Only 1 of the 25 eligible studies reported intervention-related changes in any such mechanism, which include women's autonomic nervous system (ANS) and hypothalamic–pituitary–adrenal (HPA) axis functioning, fetal cortisol levels, placental enzymes and other aspects of placental functioning (e.g., inflammatory cytokines), intestinal microbiota, serotonin exposure, and epigenetic changes. In that study, Field et al. (2004) proposed a model whereby their intervention (massage therapy) would benefit offspring through increases in serotonin and dopamine and decreases in cortisol and norepinephrine. Although they measured and reported levels of each of these hormones and neurotransmitters, they did not test mediation. Thus, an important next step in this line of research is to propose and test mediational models.
Such tests would address not only pregnancy-related mediators, through experimental interventions designed to change them (e.g., reducing pregnant women's stress reactivity and enhancing their coping skills; see Isgut, Smith, Reimann, Kucuk, & Ryan, 2017; Richter et al., 2012, for two examples), but also postnatal mediators such as the mother's functioning, the quality of her parenting, the stress context of child rearing (Hammen, 2002), and the quality of her relationship with the child's father or her romantic partner. A review of mediation studies of treatments for child depression proposed a ladder of evidence based on various research design features (Maric, Wiers, & Prins, 2012). Because the literature is sparse and we aimed to be as inclusive as was reasonable, many of the studies included in this review fall low on that ladder (i.e., they do not provide the strongest evidence for mediation).
Third, although antidepressants are the most common treatment for depression during pregnancy and are also commonly used preventively among pregnant women who had depression before pregnancy, we could not include antidepressant studies in this review because they did not meet the inclusion/exclusion criteria. Given ethical constraints on randomizing pregnant women to medication, the numerous studies of the effects of medication treatment during pregnancy on infant and later child functioning (e.g., Oberlander & Vigod, 2016) have typically been naturalistic, prospective longitudinal studies that used medication exposure to predict adverse fetal or infant functioning rather than potential benefits to offspring of treating the mother. These studies suffer from confounding by indication: more severely depressed women are more likely to be prescribed antidepressant medication.
In summary, we critically examined the evidence on neurobiological and behavioral changes in fetal and infant functioning as a function of treatment or prevention of depression in women during pregnancy. The findings support the understanding that women's prenatal depression is a critical risk factor for infants’ vulnerabilities to, and early signs of, psychopathology. They provide empirical support for further investment in reducing and preventing depression in women during pregnancy. Further, they justify the design and implementation of studies that test specific putative mechanisms in the transmission of risk from depression in mothers during the prenatal period to the development of psychopathology in offspring.