The commentary by Beauchaine and Slep (2018) on our article, “Does the Incredible Years reduce child externalizing problems through improved parenting? The role of child negative affectivity and serotonin transporter linked polymorphic region (5-HTTLPR) genotype” (Weeland et al., 2018), raises important questions about intervention research: What are the mechanisms of change underlying interventions and how do we test them? What factors moderate intervention effects? Which children might benefit the most from the interventions? The commentary points to several interesting issues in this regard that we would like to address, highlighting key points of agreement and responding to issues that we feel require clarification.
The Incredible Years Parenting Training: The Observational Randomized Control Trial of Childhood Differential Susceptibility (ORCHIDS) Study
We agree with Beauchaine and Slep (2018) that there is extensive evidence that the parenting training program The Incredible Years (IY) is effective in decreasing disruptive behavior in children. Previous publications by our research team have demonstrated the effectiveness of the IY program across different Dutch intervention settings and in different families (Leijten, Raaijmakers, Orobio de Castro, Van den Ban, & Matthys, 2017; Menting, Orobio de Castro, Wijngaards-de Meij, & Matthys, 2014; Posthumus, Raaijmakers, Maassen, van Engeland, & Matthys, 2012). Based on these results, IY received the highest status in the Dutch database of evidence-based interventions (https://www.nji.nl/nl/Databank/Databank-Effectieve-Jeugdinterventies/Erkende-interventies/Incredible-Years-(Basis).html).
IY was therefore specifically selected as the intervention in the ORCHIDS study, a study about Gene × Environment interactions (G × E) in the development of children's disruptive behavior (for our a priori hypotheses, see Chhangur, Weeland, Overbeek, Matthys, & Orobio de Castro, 2012). The ORCHIDS study (N = 387) successfully implemented the 14-session (15 including the booster session) IY basic program in an indicated prevention setting, with families screened for being at risk for the development of early child behavior problems. In total, 197 parents were randomized to the intervention condition and offered the program, of whom 153 actively participated or attended at least 1 session. These participants attended on average 11.01 (SD = 3.69) out of 15 sessions, and 74% attended at least 10 sessions. Across all 197 families randomized to the IY group, the average was indeed 8.6 sessions (see Weeland, Chhangur, van der Giessen, et al., 2017).
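The relation between the two attendance averages can be checked with simple arithmetic. The sketch below assumes that the 44 randomized families outside the group of 153 contributed zero sessions; that assumption, and the function name, are ours for illustration only.

```python
def intent_to_treat_mean(attenders, mean_among_attenders, randomized):
    """Average sessions over all randomized families, assuming (for
    illustration) that families outside the attender group attended 0."""
    total_sessions = attenders * mean_among_attenders
    return total_sessions / randomized


itt_mean = intent_to_treat_mean(attenders=153, mean_among_attenders=11.01,
                                randomized=197)
print(round(itt_mean, 1))  # rounds to the 8.6 sessions reported above
```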
The results of the ORCHIDS study again demonstrated that IY is effective in improving parenting behavior and decreasing child disruptive behavior in a real-world setting (Weeland, Chhangur, van der Giessen, et al., 2017), using stringent intent-to-treat analyses (i.e., accounting for all families enrolled in the study, minimizing Type I error, and allowing for better generalizability of study results to clinical practice; cf. CONSORT guidelines by Moher, Schulz, Altman, & Lepage, 2001). The effect sizes in the ORCHIDS study were fully in line with previous studies in (indicated) prevention settings (Menting, Orobio de Castro, & Matthys, 2013), ranging from small to moderate effects: from Cohen's d = 0.02 (child prosocial behavior at posttest) to 0.46 (parent-reported negative parenting behavior at posttest; see Weeland, Chhangur, van der Giessen, et al., 2017). In the ORCHIDS study the effects of IY were sustained over time, up to 4 months after the intervention (Weeland, Chhangur, van der Giessen, et al., 2017), and recently obtained follow-up data demonstrated that there are even follow-up effects up to 2.5 years after the intervention (Van Aar et al., 2018).
We agree with Beauchaine and Slep that the severity of children's disruptive behavior (in our study ranging from high/borderline to clinical) and the intervention dosage might be important moderators of IY effectiveness. This is why in our 2017 publication we reported analyses on the ORCHIDS data in which these variables, together with children's sex, family socioeconomic status, and parental marital status, were tested as moderators of IY effectiveness (Weeland, Chhangur, van der Giessen, et al., 2017). Results of these moderation analyses indicated that, when controlling for confounding effects of the different moderators, only dosage (i.e., the number of sessions that parents attended) was a moderator of IY intervention effects. Parents who attended more IY sessions reported less negative parenting behavior and reported and showed more positive parenting behavior than parents who attended fewer sessions. Based on these and previous results, we explicitly concluded that IY is an effective prevention strategy to reduce child disruptive behavior. However, we do also state that there is still much to learn about for whom and how IY works. The goal of our paper in Development and Psychopathology was to shed light on this by assessing moderation (for whom does it work) and mediation (how does it work).
For Whom It Works: G × E
Our research aim was to test whether some children benefitted more from the intervention due to their genetic makeup and/or temperament because they are more susceptible to changes in parenting behavior and/or parental affect. Children's genotype and temperament were used as moderators of the intervention effect and parenting behavior and parental affect as mediators (i.e., putative mechanisms of change) of the intervention effects. Although we did not use “traditional” interaction analyses (i.e., by using a cross-over product term between environment and genes) to assess moderation by genotype, we did test G × E interplay. Using a multigroup approach, we first tested whether our models as a whole differed between certain subgroups based on children's genotype and temperament by testing whether a multigroup model (in which all pathways are freely estimated within groups) fitted the data better than a single-group model (in which all pathways were forced to be equal across groups; Ryu & Cheong, 2017; Shelleby & Shaw, 2014). If a multigroup model fitted significantly better than a single-group model, we tested whether the coefficients of the mediation paths were significantly stronger, or less strong, in one group versus the others.
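The logic of this first step can be illustrated with a likelihood-ratio comparison of a "free" and an "equal" model. The sketch below is a deliberately simplified stand-in, not the structural equation models fitted in the ORCHIDS analyses: it uses a single regression path (outcome on condition), simulated data, hypothetical effect sizes, and function names invented for this example. The closed-form chi-square tail used here holds only for even degrees of freedom.

```python
import math
import random


def ols_loglik(x, y):
    """Maximized Gaussian log-likelihood of y = b0 + b1*x + e (ML variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def chi2_sf_even(stat, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    assert df % 2 == 0, "closed form used here requires even df"
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


def multigroup_lr_test(groups):
    """Compare group-specific fits with one pooled fit.

    groups maps a label to (x, y). The free model estimates intercept,
    slope, and residual variance per group; the equal model shares them.
    """
    ll_free = sum(ols_loglik(x, y) for x, y in groups.values())
    pooled_x = [v for x, _ in groups.values() for v in x]
    pooled_y = [v for _, y in groups.values() for v in y]
    ll_equal = ols_loglik(pooled_x, pooled_y)
    df = 3 * (len(groups) - 1)  # three parameters freed per extra group
    stat = 2 * (ll_free - ll_equal)
    return stat, df, chi2_sf_even(stat, df)


def simulate_group(effect, n, rng):
    """Balanced two-arm data with a hypothetical condition effect."""
    x = [i % 2 for i in range(n)]
    y = [effect * xi + rng.gauss(0, 1) for xi in x]
    return x, y


if __name__ == "__main__":
    rng = random.Random(42)
    hypothetical_effects = {"SS": -0.8, "SL": -0.3, "LL": 0.0}
    groups = {g: simulate_group(e, 80, rng)
              for g, e in hypothetical_effects.items()}
    stat, df, p = multigroup_lr_test(groups)
    print(f"LR = {stat:.2f}, df = {df}, p = {p:.3f}")
```

Only when this omnibus comparison favors the free model would one go on to compare individual paths between groups, which mirrors the two-step order described above.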
A multigroup approach to moderation has some important strengths. First of all, testing group differences for the entire model as a first step prevents unnecessary analyses (such as testing many interactions or pathways, or adding many interaction variables to the model) and therefore decreases Type I errors due to chance findings. Second, multigroup models allow testing categorical group differences (such as a three-category genetic variable: SS, SL, and LL). This is an important strength compared to using categorical variables in a cross-over product term. The latter has been criticized, specifically in studies on G × E, because these models assume (a) equal variances across the genotypes, (b) a linear relationship between the genotype and outcome, and (c) (in case an interaction occurs) that the parameters of the different genotypes cross each other at the same point. This thus also comes with the risk that the statistical model mismatches the underlying biology of the interaction and might lead to erroneous results (see for critical notes on cross-over product analyses, Aliev, Latendresse, Bacanu, Neale, & Dick, 2014; Dick et al., 2015; Ryu & Cheong, 2017; Salvatore & Dick, 2015).
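Assumption (b) can be made concrete with a small numerical illustration. The sketch below fits a straight line through three genotype means coded 0, 1, 2 (SS, SL, LL), assuming equal group sizes; the means are hypothetical and the function name is invented for this example.

```python
def linear_coding_misfit(means):
    """Least-squares line through group means coded 0, 1, 2, ...

    Returns (slope, residuals); nonzero residuals show where a linear
    genotype coding cannot reproduce the group means.
    """
    codes = list(range(len(means)))
    n = len(means)
    mean_code = sum(codes) / n
    mean_value = sum(means) / n
    sxx = sum((c - mean_code) ** 2 for c in codes)
    sxy = sum((c - mean_code) * (m - mean_value)
              for c, m in zip(codes, means))
    slope = sxy / sxx
    intercept = mean_value - slope * mean_code
    residuals = [m - (intercept + slope * c) for c, m in zip(codes, means)]
    return slope, residuals


# Hypothetical pattern in which only one homozygous group differs.
slope, residuals = linear_coding_misfit([0.0, 0.0, 1.0])
print(slope, [round(r, 3) for r in residuals])
```

When the means happen to be evenly spaced the residuals vanish; for any other pattern the linear coding misstates at least one group, whereas a multigroup model estimates each group separately.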
As emphasized by Beauchaine and Slep in their commentary, the results of our G × E analyses might be important. Our results show that the intervention interacts with children's genotype in predicting negative parenting behavior at posttest (but not child behavior, positive parenting behavior, or parental affect). This suggests that not all parents benefit equally from IY: some parents show a steeper decrease in negative parenting behavior after participating in IY than others, and this might be predicted by their children's genetic makeup (no moderation by temperament was found). However, as we acknowledge in our paper, the genotype-based groups are small, as are the differences between these groups in the effects of IY on parenting. Moreover, our results show contrasting effects of children's 5-HTTLPR genotype in the models using parent-reported negative parenting behavior as mediator and those using observed negative parenting behavior as mediator. Parents of children homozygous for the 5-HTTLPR long allele reported the largest decrease in negative parenting after participating in IY of all parents. In contrast, compared to parents of short allele carriers, these parents showed smaller observed decreases in negative parenting during the parent–child interactions at posttest. The implications of these findings are further discussed in our paper.
How It Works: Mediation by Parental Behavior and Parental Affect
Testing mechanisms of change underlying intervention effects comes with many challenges. Recent statistical advances have led to more sophisticated strategies for testing mediation compared to the traditional causal step approach (Baron & Kenny, 1986) or joint test of significance (Stone & Sobel, 1990). Specifically, these newer strategies (e.g., macros for various programs such as SPSS and SAS, structural equation modeling, including path models and parallel process growth curve models) allow testing both direct and indirect effects without relying on multiple tests to infer mediation, and most of them enable calculations of bootstrapped confidence intervals for the indirect effects. These different approaches to testing mediation all have different strengths and limitations, possibly leading to different findings. We agree that the cross-lagged panel models we selected are a rigorous test of mediation, which when combined with small to moderate intervention effects might lead to conservative estimates and modest power. Although power estimates for complex path models are not well defined in the literature, there are several easy-to-implement tools to gain some insight into the expected power (e.g., published specifications on needed sample size for testing mediation; Fritz & MacKinnon, 2007). These specifications show that, given that in our model the pathways between the experimental condition and mediator are modest in size and the pathways between mediator and outcome are small in size, a sample size of 368 should result in a .80 power to test mediation (not taking into account the distribution of the data). This (cautiously) suggests that our sample of 387 should be sufficient to test mediation.
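Where published tables do not cover a design, a Monte Carlo simulation gives a rough impression of power. The sketch below is a minimal illustration and does not reproduce the Fritz and MacKinnon specifications or the ORCHIDS models: it uses a single-mediator model, a joint significance test with a normal critical value of 1.96, unit-variance residuals, and path values that are purely hypothetical. All function and parameter names are invented for this example.

```python
import math
import random


def slope_t(x, y):
    """t statistic for the slope in y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    rss = syy - b1 * sxy
    return b1 / math.sqrt(rss / (n - 2) / sxx)


def mediator_t(x, m, y):
    """t statistic for m in y = b0 + b*m + c*x (two-predictor OLS)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    smm = sum((v - mm) ** 2 for v in m)
    syy = sum((v - my) ** 2 for v in y)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    smy = sum((a - mm) * (b - my) for a, b in zip(m, y))
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxy * sxm) / det
    c = (sxy * smm - smy * sxm) / det
    rss = syy - b * smy - c * sxy
    return b / math.sqrt(rss / (n - 3) * sxx / det)


def joint_significance_power(n, a, b, c=0.0, reps=500, seed=1, crit=1.96):
    """Share of simulated trials in which both the condition-to-mediator
    path (a) and the mediator-to-outcome path (b) are significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [i % 2 for i in range(n)]  # balanced two-arm design
        m = [a * xi + rng.gauss(0, 1) for xi in x]
        y = [b * mi + c * xi + rng.gauss(0, 1) for mi, xi in zip(m, x)]
        if abs(slope_t(x, m)) > crit and abs(mediator_t(x, m, y)) > crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical raw path values, chosen only to show the mechanics.
    print(joint_significance_power(n=387, a=0.4, b=0.15))
```

A simulation of this kind can be extended with skewed residuals or missing data, which is where fixed tables (which do not take the distribution of the data into account) are least informative.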
However, in general, the Achilles’ heel of intervention studies on mediation might not be the use of ineffective statistical strategies or a lack of statistical power to assess mediation, but rather not taking into account the timeline of change, and thus not being able to infer causal mechanisms of change (Kazdin & Nock, 2003). In many mediation papers, the mediator and outcome variables are assessed at the same time. For example, in a previous paper by our team, we found that IY was most effective in decreasing disruptive behavior between pretest and follow-up for boys (not girls) who carried more rather than fewer plasticity alleles, and did so especially when parents manifested large positive changes (relative to other parents) in parenting between pretest and follow-up (Chhangur et al., 2017). However, based on this analysis, we cannot be certain that this decrease in externalizing behavior followed the increase in positive parenting behavior of their parents. Similarly, the results of the pioneering study by Bell, Shader, Webster-Stratton, Reid, and Beauchaine (2018; described in the commentary) are very important in showing that IY improved not only parenting behavior but also children's resting respiratory sinus arrhythmia and pre-ejection period reactivity to incentives and that such changes in parenting and child reactivity were interrelated. These results, however, do not allow causal order to be inferred (and thus cannot establish the mechanism of change), as both changes in parenting behavior and children's respiratory sinus arrhythmia and pre-ejection period reactivity were measured between pre- and posttest.
From a statistical point of view, taking into account the timeline of change results in stronger inference about the direction of causation in comparison to strategies in which mediator and outcome are assessed at the same time point or over the same time period (as is done, for instance, in parallel process modeling). In the case of the latter strategy, no temporal order can be established between changes in the mediator and outcome (Kazdin, 2007; Pek & Hoyle, 2016; Selig & Preacher, 2009). Although we might have good reasons to suspect that the intervention-induced changes in parenting behavior precede and are responsible for changes in child behavior and/or reactivity to incentives, strictly speaking, without a timeline between change in the mediator and outcome, we cannot tell why the change occurred (Kazdin, 2007; Kazdin & Nock, 2003).
From a theoretical point of view, we do not expect single, unidirectional causal relations between parenting practices and child behavior; likewise, the putative mechanisms of change underlying parenting intervention might also be more complex (Burke & Loeber, 2016; Kazdin, 2007; Rimestad, O'Toole, & Hougaard, 2017; Settipani, O'Neil, Podell, Beidas, & Kendall, 2013). Although behavioral parent training (BPT) interventions such as IY directly target parenting behavior (and do not directly target child behavior), it is still possible that changes in child behavior during the intervention precede (further) changes in parenting behavior or that both changes in parenting and child behavior are explained by another (unmeasured) variable. For example, a recent Danish study on IY showed that a decline in attention-deficit/hyperactivity disorder symptoms between pre- and midtreatment predicted an increase in parental self-efficacy at posttreatment (and not the other way around). In this case, improved parenting behavior might thus follow, instead of predict, child symptom reductions via an increase in parental self-efficacy (Rimestad et al., 2017). Moreover, the association between changes in child behavior and parenting in the same time interval (during the intervention period) might be explained by another, unmeasured variable. For instance, the intervention might increase parents’ perceptions of support, which in turn might decrease parenting stress, which causes parents to change their evaluation of their own parenting and their children's disruptive behavior.
We would like to argue that using cross-lagged panel models to test mediation has important strengths. Most important, it takes into account prior levels of both the mediator and outcome variables, partialing out stable aspects of, and prior changes in, these variables (thus allowing a timeline of change to be assessed; Cole & Maxwell, 2003; Pek & Hoyle, 2016; Wu, Carroll, & Chen, 2017). In addition to establishing temporal order, a strength of cross-lagged panel models is that they assess bidirectional relations between mediators and outcomes. In our mediation model, we predicted child behavior at 4-month follow-up (controlling for prior levels of child behavior at pretest and immediate posttest) by parenting behavior and parental affect at posttest (directly after the intervention and controlling for parenting behavior at pretest). Beauchaine and Slep are correct that this model tests whether parenting behavior and parental affect at posttest predict subsequent reductions in disruptive child behavior between posttest and follow-up. In this model we thus allowed IY-induced changes in parenting behavior 4 months (which included a “booster” session organized 1 month after the last session) to show their effects on child behavior. By choosing this approach, we opted for a relatively stringent, temporally informative approach to mediation.
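The core of such a model can be written as two regression equations. The notation below is chosen for illustration and simplifies the fitted models (for instance, it omits the reverse paths from child behavior to parenting and uses a single mediator): X is the experimental condition, M the mediator (parenting behavior or parental affect), and Y child behavior.

```latex
\begin{align}
M_{\text{post}} &= \alpha_M + a\,X + s_M\,M_{\text{pre}} + \varepsilon_M \\
Y_{\text{fu}} &= \alpha_Y + b\,M_{\text{post}} + c'\,X
               + s_{1}\,Y_{\text{post}} + s_{2}\,Y_{\text{pre}} + \varepsilon_Y \\
\text{indirect effect} &= a \cdot b
\end{align}
```

Because Y at posttest is controlled in the second equation, b reflects only the part of follow-up child behavior not already present at posttest, which is why this specification is comparatively stringent.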
Although we found that IY has positive effects on parenting behavior and parental affect, in our six cross-lagged panel models (see Footnote 1) we did not find evidence that changes in parenting behavior or parental affect explained the changes in child behavior in our sample. We agree with Beauchaine and Slep that this neither means that parenting and child behavior do not influence each other, nor that changes in parenting behavior or affect do not (indirectly) play a role in the intervention effects on child behavior. Our findings do suggest that in our sample changes in parenting and child behavior after IY might rather be parallel than sequential processes and/or that other (unmeasured) mechanisms of change might also underlie the changes in child disruptive behavior.
In our view cross-lagged panel models are a valid and valuable approach to mediation analyses in examining intervention effects (see Footnote 2). This approach has been used often by other scholars to analyze mediated intervention effects (e.g., Hesser, Hedman-Lagerlöf, Andersson, Lindfors, & Ljótsson, 2018; Mathis & Bierman, 2015; Posthumus et al., 2012; Shaffer, Lindhiem, Kolko, & Trentacosta, 2013; Te Brinke, Deković, Stoltz, & Cillessen, 2017). At the same time, however, we fully concur with Beauchaine and Slep that the outcomes of our models do raise several important questions, specifically about the timing and form of expected changes in parenting and child behavior. For example, are intervention-induced changes in child behavior and parenting behavior sequential or parallel processes? Do we expect change to be gradual and linear or to happen more suddenly (e.g., the “aha-experience”; Aderka, Nickerson, Bøe, & Hofmann, 2012)? Finally, do we expect change to occur in full already during the intervention, or do parents perhaps need more time to practice and implement the strategies and tips they learn and receive during the sessions? Our findings and those of others might suggest most change occurs already during the intervention and then sustains over time (Rimestad et al., 2017; Weeland et al., 2018).
To make the latter question even more complex, in the case of IY, the parenting strategies are discussed in a specific order. The first sessions focus on positive parenting techniques such as play and praise. Limit setting and time-out are discussed and practiced near the end of the program. Should we therefore expect that there is an increase in positive parenting strategies specifically at midtreatment, whereas perhaps decreases in negative parenting behavior and/or increases in effective limit setting should be expected only to occur at posttreatment? We might not expect that any single mechanism has a strong, simple, and linear effect on intervention outcomes, but our selected methods to test them mostly do assume this. Developing much more detailed hypotheses about what changes when and how over the course of parenting interventions might thus be essential in selecting the appropriate research design, assessment tools, and statistical models accordingly.
Future Research
An important next step in assessing change mechanisms might be to assess multiple mediators on the appropriate timescale, and not only before and after but also during the intervention (see Kazdin, 2007). Unfortunately, in the ORCHIDS study we did not collect such data. Here might lie an important challenge for future randomized clinical trials (RCTs). For this, we need to determine the best timing and spacing of assessments of our mediator to capture (critical) points of change (Lemmens, Müller, Arntz, & Huibers, 2016). In addition, we need to find a balance between optimal study design, the burden for participating families, and the risk of measurement artifacts when families are repeatedly asked to fill out the same questionnaires (see for an example on depression, Longwell & Truax, 2005). Because RCT designs are very costly, they might limit the opportunity and available resources for frequent, extensive, and/or multimethod assessment of mechanisms of change. Other designs might therefore be a valuable addition, such as experimental manipulations in micro-trials, component analyses, sequential multiple assignment randomized trials, or single-subject (time-series) designs (Cohen, Feinstein, Masuda, & Vowles, 2014; Falk & Compton, 2016; Leijten et al., 2015; Vannest & Ninci, 2015). Compared to RCTs, these designs have important advantages in terms of the needed resources, the burden on participants, and/or the required number of participants.
Another important issue is that we might have overlooked important mechanisms that, directly or via parenting behavior, might also cause changes in child behavior after BPT. It seems unlikely that any single mechanism leads to the outcomes of a comprehensive intervention such as BPT. Recent studies suggest that besides changes in parenting behavior, changes in parents’ cognitions and emotions (e.g., parental attributions, self-efficacy, emotion regulation, and stress), as well as differential treatment fidelity, might also be important contributors to change in child behavior (Feldman & Werner, 2002; Lebowitz, 2016; Mikami, Chong, Saporito, & Na, 2015; Mouton & Roskam, 2015; Rimestad et al., 2017; Ros, Hernandez, Graziano, & Bagner, 2016). Some of these change mechanisms might, however, be difficult to capture through traditional study designs using questionnaires and/or observations at only pre- and posttest. For some, we might need assessments of day-to-day or week-to-week changes. Using multiple timescale designs will allow examination of dynamic change processes (see Bamberger, 2016). There are exciting methodological and technological innovations making it possible to measure such microlevel mechanisms on the appropriate timescales, such as daily diaries, experience sampling (ESM), and momentary assessments using electronically activated audio recordings (EAR; Aunola, Viljaranta, & Tolvanen, 2017; Geukes, Nestler, Hutteman, Küfner, & Back, 2017; Manson & Robbins, 2017; Mehl, 2017).
Statistical advancements make it possible to effectively model time courses and causal processes using such intensive longitudinal data (Bolger & Laurenceau, 2013).
To conclude, assessing moderation and mediation in RCT studies comes with many challenges. The issues discussed above call for a joint research agenda for studying for whom and how our evidence-based programs work. An important next step might be to assess multiple putative mediators on different and appropriate timescales, not only before and after, but specifically also during the intervention. For this we need to state detailed hypotheses on what we think mechanisms of change are and when (at what time during or after the intervention) they occur. This might help future research to select the appropriate research design, assessment strategy, and statistical model capitalizing on methodological, technological, and statistical innovations.
The commentary by Beauchaine and Slep (Reference Beauchaine and Slep2018) on our article, “Does the Incredible Years reduce child externalizing problems through improved parenting? The role of child negative affectivity and serotonin transporter linked polymorphic region (5-HTTLPR) genotype” (Weeland et al., Reference Weeland, Chhangur, Jaffee, van Der Giessen, Matthys, Orobio de Castro and Overbeek2018), raises important questions about intervention research: What are the mechanisms of change underlying interventions and how do we test them? What factors moderate intervention effects? Which children might benefit the most from the interventions? The commentary points to several interesting issues in this regard that we would like to address, highlighting key points of agreement and responding to issues that we feel require clarification.
The Incredible Years Parenting Training: The Observational Randomized Control Trial of Childhood Differential Susceptibility (ORCHIDS) Study
We agree with Beauchaine and Slep (Reference Beauchaine and Slep2018) that there is extensive evidence that the parenting training program The Incredible Years (IY) is effective in decreasing disruptive behavior in children. Previous publications by our research team have demonstrated the effectiveness of the IY program across different Dutch intervention settings and in different families (Leijten, Raaijmakers, Orobio de Castro, Van den Ban, & Matthys, Reference Leijten, Raaijmakers, Orobio de Castro, Van den Ban and Matthys2017; Menting, Orobio de Castro, Wijngaards-de Meij, & Matthys, Reference Menting, Orobio de Castro, Wijngaards-de Meij and Matthys2014; Posthumus, Raaijmakers, Maassen, van Engeland, & Matthys, Reference Posthumus, Raaijmakers, Maassen, van Engeland and Matthys2012). Based on these results, IY received the highest status in the Dutch database of evidence-based interventions (https://www.nji.nl/nl/Databank/Databank-Effectieve-Jeugdinterventies/Erkende-interventies/Incredible-Years-(Basis).html).
IY was therefore specifically selected as the intervention in the ORCHIDS study, a study about Gene × Environment interactions (G × E) in the development of children's disruptive behavior (see for our a priori hypotheses: Chhangur, Weeland, Overbeek, Matthys, & Orobio de Castro, Reference Chhangur, Weeland, Overbeek, Matthys and Orobio de Castro2012). The ORCHIDS study (N = 387) successfully implemented the 14-session (15 including the booster session) IY basic program in an indicated prevention setting, with families screened for being at risk for the development of early child behavior problems. In total, 197 parents were randomized in the intervention condition and offered the program of which 153 actively participated or attended at least 1 session. These participants attended on average 11.01 (SD = 3.69) out of 15 sessions (74% attended at least 10 sessions) (the average of the total 197 families randomized in the IY group was indeed 8.6 sessions; see Weeland, Chhangur, van der Giessen, et al., Reference Weeland, Chhangur, van der Giessen, Matthys, Orobio de Castro and Overbeek2017).
The results of the ORCHIDS study again demonstrated that IY is effective (we published these findings in Weeland, Chhangur, van der Giessen, et al., Reference Weeland, Chhangur, van der Giessen, Matthys, Orobio de Castro and Overbeek2017) in improving parenting behavior and decreasing child disruptive behavior in a real-world setting, using stringent intent-to-treat analyses (i.e., accounting for all families enrolled in the study, minimizing Type I error, and allowing for better generalizability of study results to clinical practice; cf. CONSORT guidelines by Moher, Schulz, Altman, & Lepage, Reference Moher, Schulz, Altman and Lepage2001). The effect sizes in the ORCHIDS study were fully in line with previous studies in (indicated) prevention settings (Menting, Orobio de Castro, & Matthys, Reference Menting, Orobio de Castro and Matthys2013), ranging from small to moderate effects: from Cohen's d = 0.02 (child prosocial behavior at posttest) to 0.46 (parent-reported negative parenting behavior at posttest; see Weeland, Chhangur, van der Giessen, et al., Reference Weeland, Chhangur, van der Giessen, Matthys, Orobio de Castro and Overbeek2017). In the ORCHIDS study the effects of IY sustained over time, up to 4 months after the intervention (Weeland, Chhangur, van der Giessen, et al., Reference Weeland, Chhangur, van der Giessen, Matthys, Orobio de Castro and Overbeek2017) and recently obtained follow-up data demonstrated that there are even follow-up effects up to 2.5 years after the intervention (Van Aar et al., Reference Van Aar, Leijten, Orobio de Casstro, Weeland, Matthys, Chhangur and Overbeek2018).
We agree with Beauchaine and Slep that the severity of children's disruptive behavior (in our study ranging from high/borderline to clinical) and the intervention dosage might be important moderators of IY effectiveness. This is why in our 2017 publication we reported analyses on the ORCHIDS data in which these variables, together with children's sex, family socioeconomic status, and parental marital status, were tested as moderators of IY effectiveness (Weeland, Chhangur, van der Giessen, et al., Reference Weeland, Chhangur, van der Giessen, Matthys, Orobio de Castro and Overbeek2017). Results of these moderation analyses indicated that, when controlling for confounding effects of the different moderators, only dosage (i.e., the number of sessions that parents attended) was a moderator of IY intervention effects. Parents who attended more IY sessions reported less negative parenting behavior and reported and showed more positive parenting behavior than parents who attended fewer sessions. Based on these and previous results, we explicitly concluded that IY is an effective prevention strategy to reduce child disruptive behavior. However, we do also state that there is still much to learn about for whom and how IY works. The goal of our paper in Development and Psychopathology was to shed light on this by assessing moderation (for whom does it work) and mediation (how does it work).
For Whom It Works: G × E
Our research aim was to test whether some children benefitted more from the intervention due to their genetic makeup and/or temperament because they are more susceptible to changes in parenting behavior and/or parental affect. Children's genotype and temperament were used as moderators of the intervention effect and parenting behavior and parental affect as mediators (i.e., putative mechanisms of change) of the intervention effects. Although we did not use “traditional” interaction analyses (i.e., by using a cross-over product term between environment and genes) to assess moderation by genotype, we did test G × E interplay. Using a multigroup approach, we first tested whether our models as a whole differed between certain subgroups based on children's genotype and temperament by testing whether a multigroup model (in which all pathways were freely estimated within groups) fitted the data better than a single-group model (in which all pathways were forced to be equal across groups; Ryu & Cheong, 2017; Shelleby & Shaw, 2014). If a multigroup model fitted significantly better than a single-group model, we tested whether the coefficients of the mediation paths were significantly stronger, or less strong, in one group versus the others.
A multigroup approach to moderation has some important strengths. First of all, testing group differences for the entire model as a first step prevents unnecessary analyses (i.e., testing many interactions/pathways and/or adding many interaction variables to the model), and therefore decreases Type I errors due to chance findings. Second, multigroup models allow testing categorical group differences (such as a three-category genetic variable: SS, SL, and LL). This is an important strength compared to using categorical variables in a cross-over product term. The latter has been criticized, specifically in studies on G × E, because these models assume (a) equal variances across the genotypes, (b) a linear relationship between the genotype and outcome, and (c) (in case an interaction occurs) that the parameters of the different genotypes cross each other at the same point. This thus also comes with the risk that the statistical model mismatches the underlying biology of the interaction and might lead to erroneous results (for critical notes on cross-over product analyses, see Aliev, Latendresse, Bacanu, Neale, & Dick, 2014; Dick et al., 2015; Ryu & Cheong, 2017; Salvatore & Dick, 2015).
As emphasized by Beauchaine and Slep in their commentary, the results of our G × E analyses might be important. Our results show that the intervention interacts with children's genotype in predicting negative parenting behavior at posttest (but not child behavior, positive parenting behavior, or parental affect). This suggests that not all parents benefit equally from IY: some parents show a steeper decrease in negative parenting behavior after participating in IY than others, and this might be predicted by their children's genetic makeup (no moderation by temperament was found). However, as we acknowledge in our paper, the genotype-based groups are small, as are the differences between these groups in the effects of IY on parenting. Moreover, our results show contrasting effects of children's 5-HTTLPR genotype in the models using parent-reported negative parenting behavior as mediator and those using observed negative parenting behavior as mediator. Parents of children homozygous for the 5-HTTLPR long allele reported the largest decrease in negative parenting after participating in IY of all parents. In contrast, compared to parents of short allele carriers, these parents showed smaller observed decreases in negative parenting during the parent–child interactions at posttest. The implications of these findings are further discussed in our paper.
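As an illustration of the first step, the comparison between a constrained single-group model and a freely estimated multigroup model is commonly made with a chi-square difference (likelihood ratio) test for nested models. The sketch below assumes that test; it is not necessarily the exact procedure or software used in the ORCHIDS analyses, and all fit statistics in it are hypothetical. The function names `chi_square_sf` and `chi_square_difference_test` are our own illustrative choices.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Base cases: df = 1 (a = 0.5) uses erfc, df = 2 (a = 1) uses exp(-y).
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Compare a constrained (paths equal across groups) with a free multigroup model."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi_square_sf(delta_chi2, delta_df)


# Hypothetical fit statistics, NOT taken from the ORCHIDS analyses.
delta_chi2, delta_df, p_value = chi_square_difference_test(58.4, 30, 36.1, 18)
print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.3f}")
```

A significant difference would indicate that freeing the paths across groups improves fit, after which individual paths can be compared. Note that this plain difference test applies to normal-theory maximum likelihood; robust (scaled) test statistics require a corrected difference test.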
How It Works: Mediation by Parental Behavior and Parental Affect
Testing mechanisms of change underlying intervention effects comes with many challenges. Recent statistical advances have led to more sophisticated strategies for testing mediation compared to the traditional causal step approach (Baron & Kenny, 1986) or joint test of significance (Stone & Sobel, 1990). Specifically, these newer strategies (e.g., macros for various programs such as SPSS and SAS, structural equation modeling, including path models and parallel process growth curve models) allow testing both direct and indirect effects without relying on multiple tests to infer mediation, and most of them enable calculations of bootstrapped confidence intervals for the indirect effects. These different approaches to testing mediation all have different strengths and limitations, possibly leading to different findings. We agree that the cross-lagged panel models we selected are a rigorous test of mediation, which when combined with small to moderate intervention effects might lead to conservative estimates and modest power. Although power estimates for complex path models are not well defined in the literature, there are several easy-to-implement tools to gain some insight into the expected power (e.g., published specifications on needed sample size for testing mediation; Fritz & MacKinnon, 2007). These specifications show that, given that in our model the pathways between the experimental condition and mediator are modest in size and the pathways between mediator and outcome are small in size, a sample size of 368 should result in a power of .80 to detect mediation (not taking into account the distribution of the data). This (cautiously) suggests that our sample of 387 should be sufficient to test mediation.
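To give a feel for how such sample size specifications can be derived, the following minimal sketch estimates empirical power for a simple three-variable mediation model by Monte Carlo simulation. It rests on several assumptions that are ours and not taken from the paper: a balanced two-arm design, normally distributed residuals, no direct effect, the joint significance test as the mediation test, a normal critical value of 1.96 as an approximation to the t distribution, and purely hypothetical path sizes `a` and `b`. It is far simpler than the cross-lagged panel models discussed here.

```python
import math
import random


def center(v):
    """Subtract the mean from every element."""
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]


def residualize(v, x):
    """Residuals of v after regressing it on x (intercept included)."""
    vc, xc = center(v), center(x)
    slope = sum(a * b for a, b in zip(xc, vc)) / sum(a * a for a in xc)
    return [vi - slope * xi for vi, xi in zip(vc, xc)]


def slope_z(y, x, n_params):
    """Slope of y on x (both mean-zero) divided by its standard error."""
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(sse / (len(y) - n_params) / sxx)


def simulate_power(a, b, n, reps=500, seed=1):
    """Share of simulated trials in which both the a and b paths are significant."""
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]  # balanced assignment: 0 = control, 1 = intervention
    hits = 0
    for _ in range(reps):
        m = [a * xi + rng.gauss(0, 1) for xi in x]  # mediator
        y = [b * mi + rng.gauss(0, 1) for mi in m]  # outcome, no direct effect
        z_a = slope_z(center(m), center(x), 2)  # M regressed on X
        z_b = slope_z(residualize(y, x), residualize(m, x), 3)  # Y on M, given X
        if abs(z_a) > 1.96 and abs(z_b) > 1.96:
            hits += 1
    return hits / reps


# Hypothetical path sizes; they do not reproduce the published specifications.
power_estimate = simulate_power(a=0.5, b=0.2, n=368)
print(f"estimated power: {power_estimate:.2f}")
```

Varying `n` until the estimate reaches .80 mimics, in simplified form, how required sample sizes for a given combination of path sizes can be tabulated.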
However, in general, the Achilles’ heel of intervention studies on mediation might not be the use of ineffective statistical strategies or a lack of statistical power to assess mediation, but rather not taking into account the timeline of change, and thus not being able to infer causal mechanisms of change (Kazdin & Nock, 2003). In many mediation papers, the mediator and outcome variables are assessed at the same time. For example, in a previous paper by our team, we found that IY was most effective in decreasing disruptive behavior between pretest and follow-up for boys (not girls) who carried more rather than fewer plasticity alleles, and did so especially when parents manifested large positive changes (relative to other parents) in parenting between pretest and follow-up (Chhangur et al., 2017). However, based on this analysis, we cannot be certain that this decrease in externalizing behavior followed the increase in positive parenting behavior of their parents. Similarly, the results of the pioneering study by Bell, Shader, Webster-Stratton, Reid, and Beauchaine (2018; described in the commentary) are very important in showing that IY improved not only parenting behavior but also children's resting respiratory sinus arrhythmia and pre-ejection period reactivity to incentives and that such changes in parenting and child reactivity were interrelated. These results, however, do not allow inferences about causal order (and thus cannot establish the mechanism of change), as both changes in parenting behavior and children's respiratory sinus arrhythmia and pre-ejection period reactivity were measured between pre- and posttest.
From a statistical point of view, taking into account the timeline of change results in stronger inference about the direction of causation in comparison to strategies in which mediator and outcome are assessed at the same time point or over the same time period (as is done, for instance, in parallel process modeling). In the case of the latter strategy, no temporal order can be established between changes in the mediator and outcome (Kazdin, 2007; Pek & Hoyle, 2016; Selig & Preacher, 2009). Although we might have good reasons to suspect that the intervention-induced changes in parenting behavior precede and are responsible for changes in child behavior and/or reactivity to incentives, strictly speaking, without a timeline between change in the mediator and outcome, we cannot tell why the change occurred (Kazdin, 2007; Kazdin & Nock, 2003).
From a theoretical point of view, we do not expect single, unidirectional causal relations between parenting practices and child behavior; likewise, the putative mechanisms of change underlying parenting interventions might also be more complex (Burke & Loeber, 2016; Kazdin, 2007; Rimestad, O'Toole, & Hougaard, 2017; Settipani, O'Neil, Podell, Beidas, & Kendall, 2013). Although behavioral parent training (BPT) interventions such as IY directly target parenting behavior (and do not directly target child behavior), it is still possible that changes in child behavior during the intervention precede (further) changes in parenting behavior or that both changes in parenting and child behavior are explained by another (unmeasured) variable. For example, a recent Danish study on IY showed that a decline in attention-deficit/hyperactivity disorder symptoms between pre- and midtreatment predicted an increase in parental self-efficacy at posttreatment (and not the other way around). In this case, improved parenting behavior might thus follow, instead of predict, child symptom reductions via an increase in parental self-efficacy (Rimestad et al., 2017). Moreover, the association between changes in child behavior and parenting in the same time interval (during the intervention period) might be explained by another, unmeasured variable. For instance, the intervention might increase parents’ perceptions of support, which in turn might decrease parenting stress, which causes parents to change their evaluation of their own parenting and their children's disruptive behavior.
We would like to argue that using cross-lagged panel models to test mediation has important strengths. Most important, it takes into account prior levels of both the mediator and outcome variables, partialing out stable aspects of, and prior changes in, these variables (thus allowing a timeline of change to be assessed; Cole & Maxwell, 2003; Pek & Hoyle, 2016; Wu, Carroll, & Chen, 2017). In addition to establishing temporal order, a strength of cross-lagged panel models is that they assess bidirectional relations between mediators and outcomes. In our mediation model, we predicted child behavior at the 4-month follow-up (controlling for prior levels of child behavior at pretest and immediate posttest) by parenting behavior and parental affect at posttest (directly after the intervention and controlling for parenting behavior at pretest). Beauchaine and Slep are correct that this model tests whether parenting behavior and parental affect at posttest predict subsequent reductions in disruptive child behavior between posttest and follow-up. In this model we thus allowed IY-induced changes in parenting behavior 4 months (which included a “booster” session organized 1 month after the last session) to show their effects on child behavior. By choosing this approach, we opted for a relatively stringent, temporally informative approach to mediation.
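The core of such a model can be written schematically as follows. This is a simplified sketch based only on the verbal description above, not the full model specification from our paper: the symbols are illustrative assumptions, with X the intervention condition, M the mediator (parenting behavior or parental affect), Y child behavior, and the reverse (child-to-parent) paths and residual covariances omitted.

```latex
\begin{align}
M_{\text{post}} &= i_M + a\,X + s_M\,M_{\text{pre}} + e_M \\
Y_{\text{fu}} &= i_Y + b\,M_{\text{post}} + c'\,X + s_{Y1}\,Y_{\text{pre}} + s_{Y2}\,Y_{\text{post}} + e_Y \\
\text{indirect effect} &= a \cdot b
\end{align}
```

Because the second equation conditions on child behavior at posttest, the b path reflects only change in child behavior that occurs after posttest, which is what makes the test temporally informative but also conservative.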
Although we found that IY has positive effects on parenting behavior and parental affect, in our six cross-lagged panel models (Footnote 1) we did not find evidence that changes in parenting behavior or parental affect explained the changes in child behavior in our sample. We agree with Beauchaine and Slep that this neither means that parenting and child behavior do not influence each other, nor that changes in parenting behavior or affect do not (indirectly) play a role in the intervention effects on child behavior. Our findings do suggest that in our sample changes in parenting and child behavior after IY might be parallel rather than sequential processes and/or that other (unmeasured) mechanisms of change might also underlie the changes in child disruptive behavior.
In our view cross-lagged panel models are a valid and valuable approach to mediation analyses in examining intervention effects (Footnote 2). This approach has been used often by other scholars to analyze mediated intervention effects (e.g., Hesser, Hedman-Lagerlöf, Andersson, Lindfors, & Ljótsson, 2018; Mathis & Bierman, 2015; Posthumus et al., 2012; Shaffer, Lindhiem, Kolko, & Trentacosta, 2013; Te Brinke, Deković, Stoltz, & Cillessen, 2017). At the same time, however, we fully concur with Beauchaine and Slep that the outcomes of our models do raise several important questions, specifically about the timing and form of expected changes in parenting and child behavior. For example, are intervention-induced changes in child behavior and parenting behavior sequential or parallel processes? Do we expect change to be gradual and linear or to happen more suddenly (e.g., the “aha-experience”; Aderka, Nickerson, Bøe, & Hofmann, 2012)? Finally, do we expect change to occur in full already during the intervention, or do parents perhaps need more time to practice and implement the strategies and tips they learn and receive during the sessions? Our findings and those of others might suggest that most change occurs already during the intervention and is then sustained over time (Rimestad et al., 2017; Weeland et al., 2018).
To make the latter question even more complex, in the case of IY, the parenting strategies are discussed in a specific order. The first sessions focus on positive parenting techniques such as play and praise. Limit setting and time-out are discussed and practiced near the end of the program. Should we therefore expect that there is an increase in positive parenting strategies specifically at midtreatment, whereas perhaps decreases in negative parenting behavior and/or increases in effective limit setting should be expected only to occur at posttreatment? We might not expect that any single mechanism has a strong, simple, and linear effect on intervention outcomes, but our selected methods to test them mostly do assume this. Developing much more detailed hypotheses about what changes when and how over the course of parenting interventions might thus be essential in selecting the appropriate research design, assessment tools, and statistical models accordingly.
Future Research
An important next step in assessing change mechanisms might be to assess multiple mediators on the appropriate timescale, and not only before and after but also during the intervention (see Kazdin, 2007). Unfortunately, in the ORCHIDS study we did not collect such data. Here might lie an important challenge for future randomized clinical trials (RCTs). For this, we need to determine the best timing and spacing of assessments of our mediator to capture (critical) points of change (Lemmens, Müller, Arntz, & Huibers, 2016). In addition, we need to find a balance between optimal study design, the burden for participating families, and the risk of measurement artifacts when families are repeatedly asked to fill out the same questionnaires (for an example on depression, see Longwell & Truax, 2005). Because RCT designs are very costly, they might limit the opportunity and available resources for frequent, extensive, and/or multimethod assessment of mechanisms of change. Other designs might therefore be a valuable addition, such as experimental manipulations in micro-trials, component analyses, sequential multiple assignment randomized trials, or single-subject (time-series) designs (Cohen, Feinstein, Masuda, & Vowles, 2014; Falk & Compton, 2016; Leijten et al., 2015; Vannest & Ninci, 2015). Compared to RCTs, these designs have important advantages in terms of the needed resources, the burden on participants, and/or the required number of participants.
Another important issue is that we might have overlooked important mechanisms that, directly or via parenting behavior, might also cause changes in child behavior after BPT. It seems unlikely that any single mechanism leads to the outcomes of a comprehensive intervention such as BPT. Recent studies suggest that besides changes in parenting behavior, changes in parents’ cognitions and emotions (e.g., parental attributions, self-efficacy, emotion regulation, and stress), as well as differential treatment fidelity, might also be important contributors to change in child behavior (Feldman & Werner, 2002; Lebowitz, 2016; Mikami, Chong, Saporito, & Na, 2015; Mouton & Roskam, 2015; Rimestad et al., 2017; Ros, Hernandez, Graziano, & Bagner, 2016). Some of these change mechanisms might, however, be difficult to capture through traditional study designs using questionnaires and/or observations at only pre- and posttest. For some, we might need assessments of day-to-day or week-to-week changes. Using multiple timescale designs will allow examination of dynamic change processes (see Bamberger, 2016). There are exciting methodological and technological innovations making it possible to measure such microlevel mechanisms on the appropriate timescales, such as daily diaries, experience sampling methods (ESM), and momentary assessments using electronically activated audio recordings (EAR; Aunola, Viljaranta, & Tolvanen, 2017; Geukes, Nestler, Hutteman, Küfner, & Back, 2017; Manson & Robbins, 2017; Mehl, 2017).
Statistical advancements make it possible to effectively model time courses and causal processes using such intensive longitudinal data (Bolger & Laurenceau, 2013).
To conclude, assessing moderation and mediation in RCT studies comes with many challenges. The issues discussed above call for a joint research agenda for studying for whom and how our evidence-based programs work. An important next step might be to assess multiple putative mediators on different and appropriate timescales, not only before and after, but specifically also during the intervention. For this we need to state detailed hypotheses on what we think mechanisms of change are and when (at what time during or after the intervention) they occur. This might help future research to select the appropriate research design, assessment strategy, and statistical model capitalizing on methodological, technological, and statistical innovations.