Pediatric obsessive-compulsive disorder (OCD) is a debilitating condition that interferes with youths’ educational, social, and emotional development (e.g., Piacentini et al., 2003; Valderhaug & Ivarsson, 2005). Recent meta-analyses provide strong support for clinically meaningful symptom reduction using cognitive behavioral therapy (CBT) with exposure and response prevention (ERP) and pharmacological treatment options (Cervin et al., 2024; Chorpita & Daleiden, 2009; Freeman et al., 2018; Öst et al., 2016). However, a significant number of patients do not show clinically meaningful symptom reduction (McGuire et al., 2015). In addition, although previous meta-analyses have provided important insights about potential predictors and moderators (see Figure 1 for an overview), findings remain inconsistent across studies. To further optimize outcomes for youth with OCD, more work is needed to tailor the selection of existing treatments or treatment combinations to the individual.
Answering this “what works for whom?” question requires comparative effectiveness estimates (what works) in clinically meaningful subpopulations (for whom). The standard approach to generating such estimates would involve running numerous randomized controlled trials (RCTs) comparing each available intervention in each subpopulation. Such time- and resource-intensive studies are prohibitive. The next best option is to use a patchwork of existing trial evidence to conduct meta-analyses. Meta-analytic work to date has identified several patient characteristics that are associated with better outcomes, including younger age, lower symptom severity, lower functional impairment, and better family functioning (Kemp et al., 2021; McGuire et al., 2015; Turner et al., 2018). However, conflicting findings across different meta-analyses highlight significant barriers to evidence synthesis using aggregate-data meta-analytic approaches. For example, one meta-analysis suggests that better outcomes are no longer associated with younger age when trials of the youngest children (ages 3–8) are removed (Öst et al., 2016). Age and treatment response are thus confounded by variation in treatment approach (e.g., high family involvement in treatment for young children) and sample characteristics across trials; these relationships cannot be disentangled via conventional aggregate-data meta-analysis. Findings related to comorbidity are similarly mixed and may depend on the comparison conditions used per trial. For example, McGuire et al. (2015) report improved CBT outcomes for youth with comorbid anxiety (only in trials with nonactive comparison) and youth with tic disorders (only in trials with active comparison).
However, a more recent systematic review suggests attenuated CBT outcomes in the presence of any comorbidity (Kemp et al., 2021).
Inconsistent findings highlight important limitations of conventional aggregate-data meta-analysis in answering “what works, for whom?” One key limitation relates to implementation differences across trials, comparisons of which have been described as “apples to oranges” (Freeman et al., 2018). Individual studies have tested a variety of different CBT formats, content, and doses, in a variety of settings, and using different comparison conditions. Studies have also tested a variety of CBT augmentation treatment strategies, such as serotonin reuptake inhibitors (SRIs; e.g., clomipramine, sertraline, fluoxetine), other medications (e.g., d-cycloserine), and family therapy. Few of the above options have been directly compared in the same trial. Making such comparisons using existing trial data across studies requires causal assumptions. Conventional meta-analysis does not have a clear causal interpretation when the distribution of effect modifiers differs among the populations underlying the included trials and the target population. Results of each trial apply to the population underlying it (reflecting the trial’s eligibility criteria and recruitment practices), and that population typically has a different distribution of effect modifiers than treatment populations (Dahabreh et al., 2020; Pearl & Bareinboim, 2014). Conventional meta-analytic methods do not address these concerns.
Causally interpretable meta-analysis (CI-MA) pairs individual participant data (IPD) from trials with data from target populations, using recent advancements in transportability methods to address limitations of conventional aggregate-data and IPD meta-analysis (see Table 1 for an overview). Transportability methods account for differences between the population underlying a trial and the target population by combining background knowledge, causal assumptions, statistical methods, data from the trials, and a sample of baseline characteristics from a target population to extend causal inferences from the trials to that target population. Robust statistical models are used to address between-trial differences in covariates, enabling the transportation of causal estimates from trial samples to target populations. Under explicit causal and statistical assumptions, which are weaker than the implicit assumptions required by conventional meta-analysis, the transported analyses provide unbiased estimates of how interventions will fare in the target population(s), enabling “apples-to-apples” comparisons of interventions even when two interventions have not been compared directly in a head-to-head trial. In addition, the target population can be defined as subpopulations, enabling evaluation of relevant interventions for those subpopulations. Importantly, this can include information about minoritized subpopulations, who have been historically underrepresented in OCD trial samples (e.g., 91% White and 99% non-Latinx in a systematic review of adult trials; Williams et al., 2010).
Thus, CI-MA represents a powerful advance in evidence synthesis and may better facilitate understanding “what works, for whom?” This approach is critically important for pediatric OCD treatment personalization but also has far-reaching implications for other conditions where low within-trial sample sizes hamper efforts to identify treatment predictors/moderators with greater specificity.
Note. MA, meta-analysis; IPD-MA, individual participant data meta-analysis; IPD-MA-T, transportability analysis.
Despite the promise of CI-MA methods, they are underutilized. This is in part because CI-MA requires well-curated IPD, including detailed records of trial design and implementation, as well as a detailed accounting of the intervention (Barker et al., 2022). While most investigators meet the current requirements of funding agencies to make data available upon request, there continues to be great variability in the quality of documentation provided, because it is costly and time-intensive to integrate datasets and documentation into archival-quality packages that meet modern standards of reproducible research. In addition, IPD needs to be harmonized to a common template (i.e., same variable names, value labels, and scoring procedures). Principal investigators are best positioned to harmonize their data, but only recently have investigators begun allocating appropriate funds to generate well-curated IPD. Moreover, the majority of CBT trials in pediatric OCD occurred prior to federal data-sharing requirements; data for these trials have never been harmonized and are not publicly available. These seminal trials established the efficacy of CBT for pediatric OCD, and trials of this nature will not be repeated. As such, harmonizing these trial data would provide a strong, unparalleled repository of evidence from which to examine treatment response in this and future studies.
Any answers that emerge from CI-MA studies are also at risk of falling through the “leaky pipeline” from research to translation into clinical practice. Unlike RCTs, which provide group-level estimates of what works for patients on average, results from CI-MA studies have the potential to more directly inform clinical decision-making for individual patients. Knowledge about which treatments work for whom could also allow for more efficient leveraging of limited available resources (i.e., offering the most effective treatment to the individual from the beginning of their treatment course). However, this potential clinical utility cannot be tapped without a strong grounding in dissemination and implementation considerations from the project’s outset. An integrated knowledge translation model (Grimshaw et al., 2012; Kothari, McCutcheon, & Graham, 2017) suggests that effective dissemination and implementation efforts must involve partnerships with key end users (i.e., clinicians, parents, patients, and advocates) who can act on study findings. Ideally, partners are involved in each project step and share the power to select meaningful outcomes, interpret analyses, and translate findings into accessible formats for other end users. Indeed, emerging evidence suggests that the authentic engagement of partners as full participants in all phases of a research project can enhance the quality and resulting impact of findings (Woolf et al., 2016). However, to date, few IPD studies have involved key partners at any stage of the research process.
Here we describe harmonization and CI-MA methods for Project Harmony, which will acquire and harmonize individual patient data from 28 RCTs of youth OCD treatments (expected N = 1,900), along with target data from treatment-seeking youth and young adults ages 4–20 with OCD entering the intensive and outpatient treatment programs at Bradley Hospital’s Pediatric Anxiety Research Center (PARC) and the outpatient program at the University of South Florida Rothman Center (USF; expected N = 853), to generate the largest harmonized IPD dataset available to date for pediatric OCD trials. Of note, expectations around the time required for harmonization projects vary widely across reviewers, funders, researchers, and other decision-makers. Because misaligned expectations can be a significant barrier to obtaining funding and completing harmonization projects, we report an estimate of the approximate person-hours required to date for data acquisition and harmonization. Following data harmonization, a process that is still underway for the current project, comparative effectiveness estimates will be generated in the target population. Estimates will also be used to compare treatment effectiveness in subsamples of the target population. Subgroup constructs of primary interest were selected a priori to overlap with those examined in previous meta-analyses of OCD treatment trials (i.e., demographics, family factors, treatment history, comorbidity, OCD symptom severity/impairment, and global functioning/disability). Data-driven approaches to specific variable selection will also be used. In addition, treatment variations in clinical trials will be examined, including CBT variations (i.e., format, content, setting, dose), CBT augmentation or combination treatments, and comparison treatments (i.e., open trial, inactive, psychotherapy, medication, and treatment as usual).
Variables for treatment variations will be selected using both a theoretical and data-driven approach involving detailed coding of each trial’s treatment manual by two independent coders. All de-identified harmonized trial data and syntax will be made publicly available through the NIMH Data Archive (NDA). Throughout the study, the research team will work to disseminate findings through an ongoing strategic partnership with the International OCD Foundation (IOCDF).
Methods
Overview
We describe five study phases: (1) data acquisition, (2) harmonization, (3) coding metadata, (4) IPD analysis, and (5) dissemination (Figure 2), and we detail partner involvement that occurred throughout the project.
Partner involvement
A group of Dissemination Partners (N = 8) was assembled at the project outset to represent the interests of end users and ensure that study findings are made accessible and actionable. Partners consisted of patients, parents, pediatric OCD clinicians, and leadership/advocacy representatives from the IOCDF. The group will meet biannually during three project stages: (1) pre-harmonization, (2) post-harmonization/pre-analyses, and (3) post-analyses/dissemination. Discussions will focus predominantly on the dissemination of findings (see “Dissemination” section).
Data acquisition
The Project Harmony team completed a one-year PCORI planning grant to build consensus for an IPD-MA in pediatric OCD. This resulted in commitments from PIs representing 28 RCTs for pediatric OCD treatments (expected N = 1,900) identified using a recent evidence-based update (Freeman et al., 2018) and an updated search for this project using the same search strategy. Specifically, we searched Medline and PubMed (keywords: obsessive-compulsive disorder OR obsessive behavior; exposure therapy OR behavior therapy OR cognitive-behavior therapy OR treatment; AND children OR adolescents OR pediatric). Studies were included if they (1) involved a sample size >30 (including open trials), (2) focused on children and adolescents 18 years old and younger, (3) required participants to have a primary or co-primary diagnosis of OCD, and (4) reported OCD outcome measures. Studies were excluded if they were (1) non-treatment studies, (2) psychopharmacology interventions only, (3) secondary analyses, (4) case studies, or (5) reviews. Following the primary search, we conducted a hand search of the reference sections of recent meta-analyses, review articles, and studies identified through the primary search. Finally, in some instances, PIs who committed larger trials also volunteered data from smaller trials that they were able to access; thus, a few included studies have sample sizes below the threshold (N > 30) we set with the systematic search. Trials represent the core of currently available evidence for behavioral pediatric OCD interventions (see Table 2 for a summary).
Note: CBT, Cognitive Behavioral Therapy; DCS, D-Cycloserine; Effectiveness, Testing efficacy in community setting; Non-Face-to-Face, Internet or phone delivered CBT; Open, Open Trial; PFIT, Positive Family Interaction Therapy; RCT, Randomized Controlled Trial.
Following preliminary outreach efforts, data-sharing agreements were acquired following the policies of each original trial’s home institution; timelines to procure data-sharing agreements varied, taking up to several months per institution (~18 months total). Once acquired, PIs of original trials provided data dictionaries, de-identified item-level IPD including all measures available (see Table 2 for overlapping measures), and treatment manuals for each trial.
To date, we have received 26 out of an expected 28 data dictionaries and 19 datasets. We are in the process of obtaining five additional datasets. The remaining four studies are international and require alternative methods for analyzing their data on-site due to restrictive data-sharing regulations.
Harmonization
The goal of harmonization is to put variables used across trials on the same metric. This is a multi-step process that includes renaming variables, using logic to equate response options, using normative data to equate measures with clinical interpretations (e.g., dichotomizing using clinical cut-offs, centering variables at clinical cut-offs, standardizing using the minimal clinically important difference), and implementing complex latent variable measurement models that account for between-trial measurement differences. Latent variable approaches to harmonization (e.g., integrative data analysis; Curran & Hussong, 2009) are typically performed on outcomes rather than predictors or covariates. For the purposes of this project, the Children’s Yale-Brown Obsessive-Compulsive Scale (CY-BOCS; Scahill et al., 1997) will be the primary outcome, as it has the advantage of being assessed in all trials (Table 2), is continuously distributed, and has clear, clinically meaningful benchmarks. We will also explore other outcomes, including global improvement, as assessed using the Clinical Global Impressions-Improvement (CGI-I; Guy, 1976) scale. As both measures were assessed in all trials, we will not be implementing latent variable approaches to harmonization in this study and will thus focus this manuscript on the pragmatic process of ensuring common variable names, variable labels, response options, and value labels. While this process is not particularly innovative, it is needed in studies using harmonized data, and there is minimal documentation about how much effort is required to complete these steps. The harmonization of target population data is not reflected in the time estimates below.
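As a concrete illustration of the metric-equating step, the sketch below applies three of the clinical-interpretation transforms described above to a single severity score. The cut-off and MCID values are placeholders chosen for the example, not the project’s actual benchmarks.

```python
# Sketch of metric-equating transforms used in harmonization.
# CUTOFF and MCID are illustrative placeholder values, not project benchmarks.
CUTOFF = 16  # hypothetical clinical cut-off for a severity scale
MCID = 4     # hypothetical minimal clinically important difference

def harmonize_score(score):
    """Return the raw score plus three clinically anchored re-expressions."""
    return {
        "raw": score,
        "above_cutoff": int(score >= CUTOFF),   # dichotomized at the cut-off
        "centered": score - CUTOFF,             # centered at the cut-off
        "mcid_units": (score - CUTOFF) / MCID,  # rescaled to MCID units
    }
```

Expressing each transform alongside the raw score keeps the original metric available while making clinically anchored versions comparable across trials.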
Harmonization was directed by a clinically trained data scientist (DHB) and implemented by a bachelor’s-level research assistant (ARR), with clinical perspective and consensus input provided by a leader in OCD research (KGB). A clinical postdoctoral fellow (LAN) provided additional consensus.
Implementing a common naming convention (trial by trial)
Collectively, data dictionaries (n = 26) contained 55,396 unique variable names. The number of unique variable names per data dictionary ranged from 79 to 21,984; the higher numbers are attributable to item-level data with unique variable names for each assessment point. Data dictionaries were restructured so that each item had the same unique name across assessments. A common naming convention was then applied to the restructured variables (n = 18,896), and a crosswalk was created from each trial to the naming convention. This process took around 22 hours per study (range 8–40) for studies that included item-level variables (n = 22) and 3 hours per study (range 2–3) for studies that did not (n = 4). Time varied according to the number of variables, consistency in naming convention, data dictionary format, document type (e.g., SPSS, Word, PDF), completeness of the dictionary (e.g., missing variable labels or response options, unidentified measure versions), and the number and complexity of survey-style questions relating to demographics and family/medical history. The total time for this step was ~436 hours, predominantly done by ARR, guided and trained by DHB.
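The crosswalk step can be sketched as a simple mapping from a trial’s original variable names to the common convention. The variable names below are hypothetical examples, not the project’s actual naming convention.

```python
# Hypothetical crosswalk from one trial's variable names to the common naming
# convention; real crosswalks were built per trial from the data dictionaries.
CROSSWALK = {
    "cybocs_tot": "cybocs_total",  # illustrative original -> harmonized names
    "cgi_imp": "cgi_i",
}

def apply_crosswalk(record, crosswalk):
    """Rename a record's variables; unmapped names pass through for review."""
    return {crosswalk.get(name, name): value for name, value in record.items()}

harmonized_row = apply_crosswalk({"cybocs_tot": 22, "age": 11}, CROSSWALK)
```

Letting unmapped names pass through unchanged makes it easy to flag variables that still need a crosswalk entry.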
Implementing common value labels and responses (measure by measure)
At the end of this first pass, the total number of unique variables was 18,896; 1,437 focused on treatment processes (i.e., assessed during the course of treatment) and were set aside to be harmonized later, leaving 17,459 for further harmonization. We used a relational database to map trial-level variables (variable names, labels, response options, etc.) onto a shared table of harmonized names, using variable names as a key to link the various tables within the database. The relational database enabled us to quickly access all original variable names, labels, response options, and value labels from each trial for every harmonized name. Working measure by measure, exact wordings of variable labels, response options, and value labels were selected for each harmonized variable. On average, this step took approximately 7.5 hours per measure, with 109 measures harmonized (792 hours total). Work was predominantly done by ARR, in consultation with DHB and KGB. Time depended on the number of items per measure, the number of trials that used the measure, the degree of similarity between wordings used across studies, and measure modifications (e.g., added items, excluded items, changed item orders). The study team met weekly to resolve discrepancies by consensus and to select the variable labels, response options, and value labels most consistent with the published literature.
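The relational mapping might be sketched as follows, with harmonized names keying trial-level variable metadata to a shared table. The table and column names are illustrative, not the project’s actual schema.

```python
import sqlite3

# Minimal sketch of the relational mapping: harmonized names link trial-level
# variable metadata to a shared table. Schema and rows are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE harmonized (harmonized_name TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE trial_vars (trial TEXT, orig_name TEXT, harmonized_name TEXT);
    INSERT INTO harmonized VALUES ('cybocs_total', 'CY-BOCS total severity');
    INSERT INTO trial_vars VALUES ('trial_a', 'cybocs_tot', 'cybocs_total');
    INSERT INTO trial_vars VALUES ('trial_b', 'CYBOCS', 'cybocs_total');
""")
# For any harmonized name, retrieve every original spelling across trials.
rows = con.execute(
    "SELECT t.trial, t.orig_name, h.label "
    "FROM trial_vars AS t JOIN harmonized AS h "
    "ON t.harmonized_name = h.harmonized_name "
    "WHERE h.harmonized_name = ? ORDER BY t.trial",
    ("cybocs_total",),
).fetchall()
```

A join like this is what makes it quick to pull up every trial’s original wording when selecting the harmonized label for a variable.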
Simplifying and harmonizing across similar measures (construct by construct)
Measures were categorized according to the constructs they measured to facilitate cross-measure harmonization within each domain and to align with categories of previously assessed treatment predictors and moderators (Figure 1). Domains are outlined in Table 3 and include (1) demographics, (2) family factors (family mental health history, family accommodation, family functioning), (3) treatment factors (treatment history, concomitant treatment, CBT adherence/fidelity), (4) comorbidity (structured diagnostic interview, externalizing symptoms including inattention and hyperactivity, mood symptoms, anxiety symptoms), (5) OCD symptoms (severity, parent/child report, OCD impairment), and (6) global functioning/disability (global improvement, global severity, quality of life/disability). For trials that had multiple measures of the same construct, measures were prioritized according to how often they were used across all trials. Measures were dropped if they were used in <3 trials or did not measure one of the included constructs. If a trial used a unique measure of a construct that it did not otherwise assess, we retained the measure despite it not being used in other trials. We also retained different versions of the same measure (e.g., CDI versus CDI-2) as separate variables for later harmonization. Demographics and medical/family history were asked in survey formats (i.e., questions are qualitatively distinct and not intended to be combined into scale scores) that differed vastly across trials, precluding conventional harmonization strategies. For example, the way medications were reported ranged from listing every past and current medication by name and dosage, to listing the medication classes, to a single item indicating whether a participant had ever taken medication. 
To address this substantial between-study variability, we treated the process as a chart review: we first inventoried the type of information provided by the trials, then identified key information (e.g., the specificity of medication classes) through consensus meetings, created a template of the information we wanted (i.e., variable name, variable label, value labels), and finally abstracted information trial by trial. The abstraction process also involved coding (e.g., how medications were assigned to classes), which was done in consensus.
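An abstraction template for medication data might look like the following sketch. The class dictionary and field names are toy examples; the project’s actual assignments were made in consensus meetings.

```python
# Illustrative abstraction step for medication history: map free-text entries
# onto classes. The class dictionary is a toy example; the project's actual
# assignments were made in consensus meetings.
MED_CLASSES = {"sertraline": "SRI", "fluoxetine": "SRI", "clomipramine": "SRI"}

def abstract_medications(raw_entries):
    """Template fields: an any-medication flag plus sorted medication classes."""
    classes = sorted({MED_CLASSES.get(entry.strip().lower(), "other")
                      for entry in raw_entries})
    return {"any_medication": bool(raw_entries), "med_classes": classes}

record = abstract_medications(["Sertraline", "risperidone"])
```

The template deliberately reduces trials reporting at different levels of detail (named medications, classes, or a single yes/no item) to a small set of fields that every trial can populate.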
Note: Measures were collected at baseline and post-treatment (at minimum) in each trial
a Demographics include age, gender, race, ethnicity, income, education;
b Family Accommodation measured with the Family Accommodation Scale (FAS; parent report);
c Family functioning measured with parent/child report on the Family Assessment Device (FAD), Family Assessment Measure (FAM), Family Adaptability and Cohesion Evaluation Scale (FACES), and/or Family Environment Scale (FES);
d Diagnostic Interviews were clinician-administered using the Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-Kid), Anxiety Disorders Interview Schedule (ADIS), or Schedule for Affective Disorders and Schizophrenia for School-Age Children (KSADS);
e Externalizing symptoms measured with parent report on the Child Behavior Checklist (CBCL) or Conners’ Parent Rating Scale (CPRS);
f Mood symptoms measured with child-report on the Child Depression Inventory (CDI), Beck Depression Inventory-Youth (BDI-Y), or Children’s Depression Rating Scale (CDRS);
g Anxiety symptoms measured with child and parent report on the Multidimensional Anxiety Scale for Children (MASC), Screen for Child Anxiety Related Disorders (SCARED) and/or Spence Child Anxiety Scale (SCAS);
h Clinician-rated OCD severity measured with the Children’s Yale-Brown Obsessive Compulsive Scale (CY-BOCS);
i Child and parent report of OCD symptoms measured with the Obsessive-Compulsive Inventory (OCI) and/or CY-BOCS self-report;
j OCD impairment measured with parent and child report Child OCD Impact Scale (COIS);
k Global improvement measured with clinician report on the Clinical Global Impression-Improvement Scale (CGI-I);
l Global Severity measured with clinician report on the Clinical Global Impression-Severity Scale (CGI-S) or Children’s Global Assessment Scale (CGAS);
m Quality of Life measured with parent and child report on the Children’s Sheehan Disability Scale (CSDS), Education, Work and Social Adjustment Scale (EWSAS), Pediatric Quality of Life Enjoyment and Satisfaction Questionnaire (PQLESQ), Pediatric Quality of Life Inventory (PedsQL), and/or Questionnaire for Measuring Health-Related Quality of Life in Children and Adolescents (KINDL)
During weekly team meetings, determinations were made at the measure and item level about which variables to keep/drop across studies; any measures that were dropped in the process will be reported on in subsequent manuscripts. Harmonization of measures within each of the 10 domains took an average of 51 hours (~5.5 hours per measure; ~643 hours total). Work was done by ARR with weekly collaboration meetings with DHB and additional weekly consensus meetings with DHB, KGB, and LAN. Challenges encountered at this stage related to between-trial differences in the structure and content of the demographics and medical, medication, treatment, and family history. In particular, studies differed in the amount of information collected, how it was recorded (i.e., numeric, categorical, open-ended text response), level of specificity (e.g., father versus paternal family, medication names versus classes), number of reporters, and how reporters were identified (e.g., primary caregiver, mother, father, other), and definitions of demographic information across countries. After this reduction and simplification process, the harmonized data dictionary included 3,657 variables.
Wrangling IPD (trial by trial)
Working trial by trial, datasets were restructured so that each variable had the same name across assessments, resulting in multiple lines of data for the same participant with an index variable for assessment point. Restructured data retained the original variable naming for each trial. Using crosswalks developed previously, we wrote syntax to change variable names, variable labels, value options, and value labels. Scoring of measures was checked and syntax was generated so that measures had the same scoring across trials. Syntax was also generated to implement the coding of demographic, medical, and family history data. On average, this process took 52 hours per trial (range 11–92 hours; ~1,124 hours total) and was especially time-consuming due to generating unique syntax for each trial and discrepancies between the original data and corresponding data dictionaries. In addition, it was time-consuming to code the differently structured demographic, medical, and family history information. It was also challenging to investigate how various measures were scored, as different versions, item numbers, and scoring conventions were used across trials for the same measure. Work was predominantly done by ARR, guided by DHB, and with additional input from KGB.
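The restructuring step (one row of data per participant per assessment point, indexed by time) can be sketched as follows. The timepoint suffixes and variable names are hypothetical.

```python
# Sketch of the wide-to-long restructuring: one row of data per participant
# per assessment point, with an index variable for time. The timepoint
# suffixes and variable names are hypothetical.
TIMEPOINTS = {"_base": 0, "_post": 1}  # suffix -> assessment index

def restructure(wide_row):
    """Split a one-row-per-participant record into one row per assessment."""
    long_rows = []
    for suffix, timepoint in TIMEPOINTS.items():
        row = {"pid": wide_row["pid"], "assessment": timepoint}
        for name, value in wide_row.items():
            if name.endswith(suffix):
                row[name[: -len(suffix)]] = value  # strip the timepoint suffix
        long_rows.append(row)
    return long_rows

long_data = restructure({"pid": 7, "cybocs_base": 24, "cybocs_post": 12})
```

After this restructuring, the same variable carries the same name at every assessment, which is what allows a single crosswalk and scoring syntax to apply across timepoints.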
Pulling it all together
As of manuscript submission, we are continuing to wrangle the 24 trials. Together the process took about 125 hours per trial, for a total of approximately 3,000 hours. This estimate was considerably higher than what was projected during the planning stages, which further emphasizes that harmonization of multiple variables and domains across trials is a time-consuming process that requires sufficient investment of time and money. Quality control was implemented by DHB checking work performed by ARR, including all harmonized variables and data screening to ensure that variables were within expected ranges. The iterative nature of harmonization means that quality control by multiple members of the team is built into the process.
Data coding
We will use the Cochrane Revised Risk of Bias tool (RoB 2.0) to evaluate the risk of bias within each trial. Decisions about the inclusion of trials identified as being at elevated risk of bias will be made in consensus with the core research team and relevant trial investigators and partners. We will perform sensitivity analyses in which trials with elevated risk of bias are included and excluded from analyses. Information about each trial will be coded in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We will use Covidence to organize and systematically review information about trial design and implementation. This work will be more involved than that of conventional meta-analysis, as we have access to trial manuals and other documentation beyond what is reported in peer-reviewed manuscripts. Trial manuals, study protocols, and primary outcome manuscripts were reviewed to inform the development of a coding template (Guest, MacQueen, & Namey, 2012) aligned with PRISMA guidelines but including more detail about treatment type and implementation to better categorize differences in treatment approaches across trials. This coding template was then reviewed with the core study team, which included experts in pediatric OCD treatment, and updated to reflect key treatment components. Codes were informed by treatment variations in clinical trials reported in previous meta-analyses (“top-down”) and by codes inductively generated following a review of every treatment manual (“bottom-up”) by the primary coder (LAN).
Codes included general information on treatment format (group, individual), mode of delivery (in person, online), treatment duration (number of total sessions, session duration, number of sessions with in-session exposure, weekly/biweekly meetings, etc.), the inclusion of core CBT components (psychoeducation, cognitive restructuring, relaxation, rewards, relapse prevention), and degree of family involvement (type, targeting of accommodation) that will be used to inform subgroup analyses. Once the template is finalized, each trial’s documentation will be double-coded by two independent coders. Codes will be compared and discrepancies resolved recursively, with input from trial investigators as needed.
Analytic approach
Transporting treatment effects from trial samples to target populations requires identifiability conditions and estimators. This work is based on the potential outcomes framework, which posits that each participant has an unrealized or potential outcome under each of the treatment conditions (Petersen & van der Laan, 2014; Rubin, 2005). In a trial, at most one potential outcome may be observed for each individual (because they are assigned to one treatment), while the other potential outcomes (under treatments to which the individual is not assigned) remain unobserved (counterfactual). The causal parameter that allows comparisons among interventions evaluated in different trials is the potential outcome mean (POM), an estimate of how the target population would be expected to respond had its members received the intervention. Generating unbiased estimates of POMs requires making identifiability assumptions, building accurate statistical models to adjust for covariates, and protecting against measured confounding. The identifiability assumptions include within-trial assumptions common to causal inference using RCT designs (consistency, exchangeability of treatment conditions, positivity) and between-trial assumptions that mirror the within-trial assumptions (consistency of potential outcomes across trials, exchangeability of trial participation, and positivity of trial participation; Barker et al., 2022; Barker et al., 2023). These assumptions imply that, after adjusting for covariates, there is no confounding that links the outcome with either trial selection or treatment assignment (Dahabreh et al., 2019; Dahabreh et al., 2020).
When transporting POMs, the covariates of interest are variables that either predict the outcome or modify treatment response and that relate to trial participation or treatment assignment. In other words, to generalize a POM from one sample to another, between-trial differences in baseline characteristics affecting treatment response must be measured and adjusted for using statistical modeling. If all such covariates are measured and the models are accurate, then the resulting estimates are unbiased and POMs can be safely transported to a sample of the target population (Barker et al., Reference Barker, Dahabreh, Steingrimsson, Houck, Donenberg, DiClemente and Brown2022; Barker et al., Reference Barker, Bie and Steingrimsson2023; Dahabreh et al., Reference Dahabreh, Robertson, Tchetgen, Stuart and Hernán2019; Dahabreh et al., Reference Dahabreh, Petito, Robertson, Hernán and Steingrimsson2020).
Importantly, the identifiability conditions required for causal inference when pooling data across trials are not unique to transportability analysis; they are shared with conventional meta-analytic approaches, although they are rarely made explicit. Advances in transportability analysis can capitalize on individual participant data (IPD) from each trial and the target population to make weaker assumptions than those implicitly required by conventional meta-analysis. For example, identifiability assumptions can hold conditional on covariates, and doubly robust estimators can be used to provide more robust statistical estimates (Barker et al., Reference Barker, Bie and Steingrimsson2023). The approach also justifies making inferences in populations that differ from the trial populations and making novel comparisons among treatments within subpopulations, using unbiased estimates of POMs for each treatment in the target population or subpopulations. The advantages of transportability methods over meta-analysis and individual participant data meta-analysis are summarized in Table 1.
Generate comparative effectiveness estimates in the target population
Generating comparative effectiveness estimates in the target population requires identifying key covariates, building working models (i.e., predictive models that are part of a statistical estimator), using the models to transport estimates from trials to the target population, and performing subgroup analyses.
Identifying covariates. Covariates will include treatment modifiers identified by previous literature (see Figure 1). One limitation of the pediatric OCD literature is that within-trial sample sizes do not support robust evaluation of treatment modifiers. Pooling data across trials can help overcome this limitation, but only for variables that were assessed in multiple trials of the same treatment. There is thus a tension between evaluating a wide range of possible covariates within single trials and evaluating a more limited number in a pooled sample. We will use two data-adaptive approaches to variable selection. First, we will fit regularized linear models (i.e., Lasso) to the pooled dataset; this approach is limited to covariates assessed in a majority of the trials. Second, we will fit Lasso models to data from the larger trials only, which will enable us to consider a larger number of potential covariates.
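Lasso-based selection can be sketched as follows. This is a minimal illustration on simulated data, not the project's code: covariates with nonzero penalized coefficients are retained as candidate modifiers.

```python
# Sketch of Lasso-based covariate selection on a pooled dataset.
# All data are simulated; covariate indices are purely illustrative.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 400, 8
X = rng.normal(size=(n, p))                               # candidate baseline covariates
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(size=n)    # outcome driven by covariates 0 and 3

Xs = StandardScaler().fit_transform(X)                    # standardize before penalizing
model = LassoCV(cv=5, random_state=0).fit(Xs, y)          # penalty chosen by cross-validation

# Keep covariates whose penalized coefficient is not shrunk to (near) zero
selected = [j for j, coef in enumerate(model.coef_) if abs(coef) > 1e-6]
print("selected covariate indices:", selected)
```

Standardizing before penalization matters here: the Lasso penalty is scale-dependent, so unscaled covariates would be selected partly on their units rather than their predictive value.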
Building working models. Two types of models are used in causal estimators of treatment effects in the target population: models that predict the outcome and models that predict trial assignment (e.g., propensity scores). It is also possible to combine both types of working models in a doubly robust estimator, which is unbiased if either of the two types of models is correctly specified (Dahabreh et al., Reference Dahabreh, Robertson, Tchetgen, Stuart and Hernán2019; Dahabreh et al., Reference Dahabreh, Petito, Robertson, Hernán and Steingrimsson2020). Within each subset of trials that evaluated the same treatment, we will use Lasso models to build both outcome-prediction and trial-assignment models, enabling the use of doubly robust estimators.
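The two working models can be sketched on simulated data. This is an assumption-laden illustration (not the project's estimator code): a penalized outcome regression fit in trial data, and a trial-membership model distinguishing trial participants from the target sample.

```python
# Sketch of the two working models behind a doubly robust transport
# estimator: (1) an outcome model fit among trial participants and
# (2) a trial-membership model. All data are simulated.
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV

rng = np.random.default_rng(1)
n_trial, n_target, p = 300, 200, 5
X_trial = rng.normal(size=(n_trial, p))
X_target = rng.normal(loc=0.3, size=(n_target, p))        # target differs at baseline
y_trial = X_trial @ np.array([1.0, 0.5, 0, 0, 0]) + rng.normal(size=n_trial)

# (1) Outcome model: predicts Y from baseline covariates in trial data
outcome_model = LassoCV(cv=5, random_state=0).fit(X_trial, y_trial)

# (2) Trial-membership model: predicts P(in trial | X), later used for weighting
X_all = np.vstack([X_trial, X_target])
s = np.concatenate([np.ones(n_trial), np.zeros(n_target)])  # 1 = trial, 0 = target
membership_model = LogisticRegressionCV(cv=5, max_iter=1000).fit(X_all, s)
```

Either model alone supports a (singly robust) estimator; fitting both is what permits the doubly robust combination described in the text.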
Transporting estimates. Once working models are built for each type of treatment, we will fit a doubly robust causal estimator to estimate POMs for the target population (Barker et al., Reference Barker, Bie and Steingrimsson2023; Dahabreh et al., Reference Dahabreh, Robertson, Tchetgen, Stuart and Hernán2019; Dahabreh et al., Reference Dahabreh, Petito, Robertson, Hernán and Steingrimsson2020). Essentially, the baseline covariates from the target population are used in the working models to generate an estimate of what the POM would be, had the target population received the intervention, given the modeled covariates. Average treatment effects will be estimated by taking the difference between POMs. Non-parametric bootstrap sampling will be used to generate standard errors, with participants sampled with replacement from each of the trial/treatment arms. The accuracy of the transported estimates will be evaluated using a type of cross-validation where the observed treatment outcomes in each trial are used as the ground truth for other trials that evaluated the same treatment. For each treatment, each trial will be iteratively set aside and used as the target population for the other trials.
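The transport step can be illustrated end to end with an augmented inverse-probability-weighted (doubly robust) POM and a nonparametric bootstrap standard error. This is a minimal simulated sketch that assumes a single-arm trial for simplicity; all data, names, and tuning choices are ours, not the project's.

```python
# Doubly robust (augmented IPW) transported POM with bootstrap SE.
# Single-arm trial assumed; data are simulated for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(2)
n1, n0 = 400, 300
X1 = rng.normal(size=(n1, 3))                    # trial covariates (S = 1)
X0 = rng.normal(loc=0.4, size=(n0, 3))           # target covariates (S = 0)
beta = np.array([1.0, -0.5, 0.25])
y1 = X1 @ beta + rng.normal(size=n1)             # trial outcomes under the treatment

def transported_pom(X1, y1, X0):
    # Outcome working model, fit in trial data only
    m = LinearRegression().fit(X1, y1)
    # Trial-membership working model
    X_all = np.vstack([X1, X0])
    s = np.concatenate([np.ones(len(X1)), np.zeros(len(X0))])
    pi = LogisticRegression(max_iter=1000).fit(X_all, s).predict_proba(X1)[:, 1]
    # Augmented IPW: model-based mean in the target plus weighted trial residuals
    w = (1 - pi) / np.clip(pi, 1e-6, None)
    aug = np.sum(w * (y1 - m.predict(X1))) / len(X0)
    return m.predict(X0).mean() + aug

est = transported_pom(X1, y1, X0)

# Nonparametric bootstrap: resample trial and target samples separately
boot = []
for _ in range(200):
    i1 = rng.integers(0, n1, n1)
    i0 = rng.integers(0, n0, n0)
    boot.append(transported_pom(X1[i1], y1[i1], X0[i0]))
se = np.std(boot, ddof=1)
print(f"transported POM = {est:.2f} (SE {se:.2f})")
```

The leave-one-trial-out check described above would wrap this same routine, iteratively treating each held-out trial's observed mean as ground truth for the estimate transported from the remaining trials.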
Subgroup analysis. Although other possible modifiers will be explored, POMs will be compared for each treatment condition among subsamples defined by indicators examined in previous meta-analyses, including demographics, family factors, treatment factors, comorbidity, OCD symptoms, and global functioning/disability. POMs within subsamples are estimated using the same working models as the overall estimates, but marginalizing over participants who belong to the subsample instead of over the entire target population sample. Average treatment effects will then be calculated within each subsample. Because the comparative effectiveness estimates within target subsamples use the same working models as the overall target sample, the analytics are a natural extension of those described above. The challenge lies in understanding the implications and nuances of how treatments interact with subpopulations. For example, some treatments might not be appropriate for some subsamples (e.g., younger children). To address this challenge, the research team will work closely with the trial investigators and partners to ensure that relevant features of each treatment are well understood in the context of treating the subpopulation of interest.
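The marginalization step can be shown with a toy outcome-model-only sketch (the full estimator would also use membership weights): the same model predictions are averaged over a subgroup of the target sample instead of the whole sample. The age variable and the age-12 cutoff are hypothetical.

```python
# Toy sketch of subgroup POMs: identical working model, different
# marginalization set. Data and the age subgroup are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X_trial = rng.normal(size=(300, 2))
y_trial = 2.0 * X_trial[:, 0] + rng.normal(size=300)
model = LinearRegression().fit(X_trial, y_trial)   # shared working model

X_target = rng.normal(size=(200, 2))
age = rng.integers(7, 18, size=200)                # hypothetical ages in the target sample

preds = model.predict(X_target)
pom_overall = preds.mean()                         # marginalize over the full target sample
pom_younger = preds[age < 12].mean()               # marginalize over a subgroup only
print(pom_overall, pom_younger)
```

Because only the averaging set changes, subgroup estimates inherit whatever bias protection the overall working models provide, which is why the text describes them as a natural extension.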
Missing data. Two sources of missing data are expected: (1) within-trial missingness due to drop-out or missed assessments (expected 10%–20%; Johnco et al., Reference Johnco, McGuire, Roper and Storch2020) and (2) between-trial or systematic missing data due to differences in assessment batteries. Fortunately, pediatric OCD RCTs have used the CY-BOCS for some time, which provides a consistent outcome and baseline symptom severity measure. Trials also used common diagnostic interviews with strong normative data, enabling us to equate the presence of comorbidities using clinical cut-offs. Other clinical measures will also be harmonized using normative data. We will address the remaining systematic missing data using current recommendations for causally interpretable meta-analysis (CI-MA; Steingrimsson et al., Reference Steingrimsson, Wen, Voter and Dahabreh2024).
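Cutoff-based harmonization reduces to mapping each trial's raw scores onto a common binary flag. The measure names and cutoff values below are hypothetical placeholders, not the trials' actual instruments.

```python
# Toy sketch of harmonizing different measures via clinical cutoffs:
# each trial's raw score becomes a shared binary comorbidity flag.
# Measure names and cutoff values are hypothetical.
cutoffs = {"measure_a": 25, "measure_b": 60}       # hypothetical clinical cutoffs

def comorbid_flag(measure: str, raw_score: float) -> int:
    """Return 1 if the raw score meets the measure's clinical cutoff."""
    return int(raw_score >= cutoffs[measure])

print(comorbid_flag("measure_a", 30), comorbid_flag("measure_b", 55))
```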
Dissemination
Partner involvement
During pre-harmonization, the Dissemination Partner group met twice to review project objectives, define group member roles, and identify any additional partners. During post-harmonization, the group will meet twice to (1) review the list of comparative variables that result from harmonization, (2) provide input on planned analyses to ensure utility of findings (Brownson et al., Reference Brownson, Jacobs, Tabak, Hoehner and Stamatakis2013; Concannon et al., Reference Concannon, Grant, Welch, Petkovic, Selby and Crowe2019; Noar, Harrington, & Aldrich, Reference Noar, Harrington and Aldrich2009), and (3) discuss potential dissemination streams within IOCDF’s communication tool network. During post-analyses/dissemination, the group will use feedback to iteratively improve dissemination content and strategy. Potential strategies for disseminating study findings include (1) co-developing blog and social media posts with the dissemination partner group, identifying potential audiences (e.g., patients/families, providers, policymakers) and designing content best targeted toward these audiences, (2) leveraging outlets like the IOCDF to present findings in a variety of formats (e.g., newsletter, IOCDF social media), and (3) presenting at IOCDF conferences geared towards both research and clinical (e.g., patient/family) audiences. We will continue to hone our dissemination strategy and work with our partners to explore other unique avenues. Partner suggestions to date include creating a graphic novel, a brief animated video, and music.
Publicly available data and code
Our team will facilitate the upload of de-identified harmonized datasets to the NIMH NDA and additional resources (e.g., regression models, analytic syntax) that will allow individuals to replicate study analyses within other target populations. Because data in target populations were not collected under controlled settings, they may not be a good fit for the NDA; if not accepted, data will be made available as part of published manuscripts and/or upon request.
Future clinical decision-making tools
CI-MA has distinct advantages for the development of decision tools to help match patients to optimal treatments, as this approach, unlike other meta-analytic methods, can compare interventions evaluated in different settings, with different treatment and control arms, and among different trial populations (e.g., older samples, more severe symptoms, different comorbidity mixtures). As a result, code from Project Harmony can be used to transport original study findings into new and different settings (e.g., intensive, outpatient) and to new end users. The methods also maximize information from subgroups typically under-represented in RCTs (e.g., different races, ethnicities, and co-occurring autism diagnoses), for which there are not enough data in any given trial to draw substantive conclusions. Thus, when completed, Project Harmony will provide the best available dataset and analytic code upon which to build a precision intervention clinical decision-making tool for youth with OCD.
Discussion
Our experience to date with acquiring and harmonizing IPD from RCTs in pediatric OCD reinforces the idea that careful planning and understanding of harmonization are needed to accurately budget project time and resources; it is easy to be overly optimistic about the projected effort required. Project Harmony will also use data-adaptive approaches to identify treatment modifiers, which will be used together with CI-MA to provide the best comparative effectiveness estimates given current evidence, including comparisons among treatments that have not been evaluated in head-to-head trials. These estimates will be generated for target populations and important subpopulations within pediatric OCD. When completed, this analysis will provide the best evidence available for what works for whom in pediatric OCD. Finally, we will coordinate with partners to facilitate the dissemination of study findings, enabling future work extending inferences to other pediatric OCD populations and developing clinical decision-making tools. Importantly, the outlined approach can also be utilized to answer the question “What works, for whom?” with greater specificity across a range of disorders, facilitating a more personalized approach to treatment.
Acknowledgements
Kristina Aspvall, Karolinska Institutet, Stockholm, Sweden; David H. Barker, Pediatric Anxiety Research Center at Bradley Hospital, East Providence, RI, USA and The Warren Alpert Medical School of Brown University, Providence, RI, USA; Kristen Benito, Pediatric Anxiety Research Center at Bradley Hospital, East Providence, RI, USA and The Warren Alpert Medical School of Brown University, Providence, RI, USA; Scott Compton, Duke University School of Medicine, Durham, NC, USA; Lara J. Farrell, School of Applied Psychology & Griffith University Centre for Mental Health, Griffith University, South East Queensland, Australia; Martin E. Franklin, University of Pennsylvania, Philadelphia, PA, USA and Rogers Behavioral Health, Philadelphia, PA, USA; Jennifer Freeman, Pediatric Anxiety Research Center at Bradley Hospital, East Providence, RI, USA and The Warren Alpert Medical School of Brown University, Providence, RI, USA; Isobel Heyman, University College London, London, UK; David R.M.A. Højgaard, Department of Child and Adolescent Psychiatry, Aarhus University Hospital Psychiatry, Aarhus, Denmark; Joshua Kemp, Pediatric Anxiety Research Center at Bradley Hospital, East Providence, RI, USA and The Warren Alpert Medical School of Brown University, Providence, RI, USA; Fabian Lenhard, Karolinska Institutet, Stockholm, Sweden; Adam B. Lewin, University of South Florida, Tampa, FL, USA; David Mataix-Cols, Karolinska Institutet, Stockholm, Sweden and Lund University, Lund, Sweden; Joseph O’Neill, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA; Tara S. Peris, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA; John Piacentini, David Geffen School of Medicine, University of California at Los Angeles, CA, USA; Eva Serlachius, Lund University, Lund, Sweden and Karolinska Institutet, Stockholm, Sweden; Gudmundur Skarphedinsson, University of Iceland, Reykjavík, Iceland; Eric A. Storch, Baylor College of Medicine, Houston, TX, USA; Nor Christian Torp, Division of Mental Health and Addiction, Vestre Viken Hospital, Drammen, Norway; Peter Tuerk, Sheila C. Johnson Center for Clinical Services, Department of Human Services, University of Virginia, Charlottesville, VA, USA; Cynthia Turner, The Moore Centre, South Brisbane, QLD, Australia
Disclosures
Dr. Storch reports receiving research funding for his institution from the Ream Foundation, International OCD Foundation, and NIH. In the past 12 months, he was a consultant for Brainsway and Biohaven Pharmaceuticals. He owns stock of less than $5000 in NView/Proem for distribution related to the YBOCS scales. He receives book royalties from Elsevier, Wiley, Oxford, American Psychological Association, Guilford, Springer, Routledge, and Jessica Kingsley. Dr. Mataix-Cols reports royalties from contributing articles to UpToDate, Inc., outside the current work. Dr. Piacentini receives book royalties from Oxford University Press, Guilford Press, and Elsevier, and research support from NIH. He is a founder/co-owner of Virtually BetterHealth, LLC, receives salary support, and owns stock of less than $5000 in Lumate Health. Other authors have no disclosures to report.