Introduction
A growing body of evidence has implicated the early infant gut microbiota in neurodevelopment. Reference Pronovost and Hsiao1 Recently, cross-sectional and longitudinal studies have found associations between the human gut microbiota in the first year of life and the neurodevelopmental indicators of temperament Reference Aatsinki, Lahti and Uusitupa2,Reference Christian, Galley, Hade, Schoppe-Sullivan, Kamp Dush and Bailey3 and cognitive development Reference Carlson, Xia and Azcarate-Peril4,Reference Sordillo, Korrick and Laranjo5 to age 3. Although there is no clear consensus over which microbiota characteristics are most relevant, the genus Dialister and the family Ruminococcaceae have emerged as potential mediators of early childhood behavioral problems, Reference Christian, Galley, Hade, Schoppe-Sullivan, Kamp Dush and Bailey3 with both taxa being associated with depression in adults. Reference Valles-Colomer, Falony and Darzi6 The critical period in which the microbiota may influence the gut–brain axis is not well-defined, though microbiota perturbations can alter neuroimmune development throughout prenatal, perinatal, and early life periods. Reference Pronovost and Hsiao1
Infant colic provides a good model for investigating the role of the early infant gut microbiota in neurodevelopment. Colic is defined as crying or fussing for ≥3 h/day in infants aged less than 3 months and affects up to 20% of infants. Reference Hemmi, Wolke and Schneider7 Approximately 10% of infants with colic will continue to have persistent crying beyond 3 months of age. Although multifactorial in etiology, colic is most notably associated with a gastrointestinal pathophysiology including inflammation, visceral hypersensitivity, and gut microbiota alterations. Meta-analysis has shown that the probiotic Lactobacillus reuteri DSM 17938 is an effective treatment for infant colic in exclusively breast-fed but not formula-fed infants. Reference Sung, D’Amico and Cabana8 This supports a role for the gut microbiota in infant crying, since the microbiota is known to vary on the basis of infant feeding mode. Reference Pärtty and Kalliomäki9 Indeed, reductions in bacterial diversity and stability, as well as increased abundance of microorganisms such as Klebsiella and Escherichia (of Enterobacteriaceae family), have all been found in fecal microbiota of infants with colic. Reference Savino, Quartieri and De Marco10
Although colic is a self-limiting condition, it remains controversial as to whether colic increases the risk of poor behavioral outcomes in later childhood. Reference Bell, Hiscock, Tobin, Cook and Sung11 Some studies have reported increased problems in behavior, family functioning, and negative temperamental emotions for several years following the resolution of colic. Reference Savino, Castagno, Bretto, Brondello, Palumeri and Oggero12-Reference Canivet, Jakobsson and Hagander14 However, these studies had important limitations such as the inclusion of children with persistent crying and uncontrolled confounders. Reference Bell, Hiscock, Tobin, Cook and Sung11 The hypothesized reasons for longer-term consequences of colic include biological interactions of the gut–brain axis, such as the role of gut microbes in neurodevelopment, and psychosocial effects on parental mood and parent–child bonding during a critical period of development. Irrespective of these potential longer-term consequences of colic, which remain controversial, colic is a gut–brain condition by definition of its demonstrated gastrointestinal etiology and behavioral symptomatology (i.e., crying and fussing). This broader context of the gut–brain axis in neurodevelopment is highly relevant to understanding the developmental origins of health and disease.
The present study therefore investigates the association between the gut microbiota and behavioral outcomes using data collected from a cohort of 118 infants enrolled in the Baby Biotics randomized controlled trial of a probiotic intervention for infant colic, the primary outcomes of which were reported previously. Reference Sung, Hiscock and Tang15 The aim of this study is first to describe the gut microbiota composition of infants with colic and second to identify gut microbiota signatures that associate with (a) severity of colic, (b) problem crying 4 weeks later, and (c) carer report of behavioral problems at 2 years of age. In addition to modeling bacterial diversity and abundance, we use a random forest classifier to predict future crying time from baseline fecal samples, a method which has not hitherto been used in this context.
Methods
Overview
The analyses are divided into 3 parts. For Aim 1, we perform an exploratory analysis of the community structure of the baseline gut microbiota. For Aim 2, we present an analysis of how bacterial diversity and abundances associate with behavior outcomes at 3 timepoints: baseline, 4-week behavior, and 2-year behavior. For Aim 3, we use the baseline gut microbiota to predict future behavior.
Participants
The sample comprised 120 infants aged less than 3 months (mean (M): 7.4 weeks, standard deviation (SD): 2.7 weeks) enrolled in the 4-week randomized controlled trial of Lactobacillus reuteri DSM 17938 probiotic supplementation in Melbourne, Australia, Reference Sung, Hiscock and Tang15,Reference Sung, Hiscock and Tang16 who a) had 2-year follow-up behavioral data available and b) had fecal samples collected at baseline, prior to randomization to the probiotic or control group, available for gut microbiota analyses. Two samples had insufficient mass of DNA for sequencing, resulting in a final sample size of 118. The mean read count per sample was 30709.6 (SD: 15665.4; range 3467–84382). We found no association between read depth and any of the three outcomes.
Microbiota sample preparation, processing, and sequencing
Fecal samples were collected from infant nappies and placed immediately into the caregiver’s home freezer for storage until transferred on ice following a median of 3 days to a −80°C freezer (for further details, see Reference Nation, Dunne and Joseph17 ).
DNA was extracted from 40 to 200 mg of fecal sample. An initial step of bead beating using 0.1–0.15 mm zirconia/silica beads (BioSpec Products, USA) on the Powerlyzer 24 homogenizer was performed as per the manufacturer’s instructions. About 1.2 ml Lysis buffer was added to the tubes and vortexed. Thirty microliters of Proteinase K was added and samples vortexed and incubated at 70°C for 10 min. Samples were then incubated at 95°C for 5 min, processed on the MoBio Powerlyzer for 5 min at 2000 rpm, and centrifuged for 1 min at 10,000 × g. About 800 μl of supernatant was transferred to a deep 96-well plate. DNA extraction was performed using the Chemagic™ 360 instrument according to the manufacturer’s protocol for Purification for Human Feces using 75 μl washed magnetic beads and eluted into 50 μl of buffer. DNA concentration was measured using a Qubit high-sensitivity assay and was adjusted to a concentration of 5 ng/μl.
Microbiota composition was assessed by sequencing of amplicons across the V3–V4 regions of 16S rRNA genes. Amplicons were generated using the previously described dual-indexing, variable spacer method Reference Oksanen, Blanchet and Kindt18 with the 16S rRNA gene priming sequences: forward ACTCCTACGGGAGGCAGCAG, reverse GGACTACHVGGGTWTCTAAT, and Q5 DNA polymerase (New England Biolabs). Amplification of products was performed in an Eppendorf Mastercycler using the following conditions: 98°C for 60 s then 25 cycles of 98°C for 5 s, 40°C for 30 s, 72°C for 30 s; elongation at 72°C for 10 min then held at 5°C. Sequencing of the pooled amplicons was conducted on the Illumina MiSeq platform, using 2x300 bp paired-end sequencing, according to the manufacturer’s protocol (Illumina Inc., San Diego, CA, USA).
Assessments and Measures
Crying and fussing measurement at baseline and 4 weeks
Presence of colic at study enrolment was determined using modified Wessel’s criteria: crying or fussing of 3 h or more per day for three or more days per week. Reference Wessel19 Participants were enrolled on the basis of parent report of colic using this criterion. Crying and fussing at baseline and 4-week follow-up were determined using a validated parent-reported diary completed over 24 and 48 h, respectively. Reference Barr, Kramer, Boisjoly, McVey-White and Pless20 Problem crying was defined as crying or fussing for 3 h or more per day at 4-week follow-up (categorical with 2 levels: meeting this criterion or not).
Behavioral measurement at 2 years
The Child Behavior Checklist (CBCL) is a validated 99-item screening questionnaire consisting of parent-reported problem items on Internalizing, Externalizing, and Total Problems subscales. Reference Achenbach and Maruish21 The Total Problems subscale comprises scores across all problem items, including those not classified as either Internalizing or Externalizing. An elevation of behavior problems was defined as a score of T ≥ 60 (1 standard deviation above the population normed mean) on one or more of these three subscales, based on prior validation in a longitudinal study of conversion to psychiatric disorders diagnosed later in childhood, which showed high specificity (88–96%) and low to moderate sensitivity (25–34%) with this cutoff. Reference Petty, Rosenbaum and Hirshfeld-Becker22
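The dichotomization described above can be sketched in a few lines. This is a minimal illustration only: the function name and the dictionary keys are hypothetical, and the only element taken from the text is the T ≥ 60 cutoff applied to any of the three subscales.

```python
def has_elevated_behavior(t_scores, cutoff=60):
    """Return True if any of the three CBCL subscale T-scores is >= cutoff.

    t_scores is a dict with hypothetical keys 'internalizing',
    'externalizing' and 'total_problems'.
    """
    subscales = ("internalizing", "externalizing", "total_problems")
    return any(t_scores[name] >= cutoff for name in subscales)


# One subscale at or above the cutoff is enough to flag the child.
example_child = {"internalizing": 58, "externalizing": 61, "total_problems": 55}
print(has_elevated_behavior(example_child))
```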
Covariate measurement and ascertainment
We considered the covariates: mode of birth, sex, birth weight, gestational age, infant feeding type (exclusive breastfeeding, exclusive formula, or mixed), infant age at baseline, antibiotic or probiotic use in 24 h prior to sample collection, baseline crying/fussing time (for analyses of subsequent behavioral outcomes), maternal postpartum depression at baseline and 1 month (as defined by a score of greater than 9 on the Edinburgh Postnatal Depression Scale), maternal mental illness at child aged 2 years (as defined by score above the 95th percentile on the Kessler Psychological Distress Scale-6 Reference Prochaska, Sung, Max, Shi and Ong23 ), and maternal education level.
For the association analyses (Aim 2), covariates were included in multivariable models if they met the criterion for association with the outcome variables of interest (p < 0.1). Sex (for all time points) and randomization group allocation (for 4-week and 2-year follow-up) were included as standard a priori covariates, since both are considered to have strong potential effects on the subsequent gut microbiota and outcomes of interest. Of note, despite known relevance to the gut microbiome, neither infant feeding type nor birth mode met the preregistered inclusion criterion for covariates since they did not associate with the behavioral outcomes of interest.
For predictive analyses (Aim 3), we measured how well the a priori and additional criterion-based covariates could predict future outcomes alone. This is because we expect that the covariates alone can predict future behavior in an independent test set with better than chance accuracy. When evaluating the performance of a microbiota classifier, we wanted to determine whether the model accuracy is superior to the clinical covariate classifier. As such, we used the clinical covariate classifier as a reference for evaluating the goodness of the microbiota classifier. We chose not to evaluate a classifier that was trained on both the microbiota and the covariates because the extra features might make the classifier over-fit given the small sample size.
Analysis
The analysis plan for these data was preregistered on the Open Science Framework (osf.io/rnvej). The presented analysis reflects the aims and analytic strategies outlined in the preregistration with two exceptions. First, UniFrac beta diversity metrics were calculated instead of Bray–Curtis beta diversity in order to take into account phylogenetic distance, rather than dissimilarity on the basis of counts only. Second, the linear discriminant analysis effect size (LEfSe) analyses were replaced with DESeq2, an alternative method for determining differential microbiota composition between groups. We performed a differential abundance analysis using the DESeq2 package (version 1.20.0), Reference Anders and Huber24 chosen because it was shown previously to control the false discovery rate for microbiota data. Reference Anders and Huber24,Reference Gloor, Macklaim, Pawlowsky-Glahn and Egozcue25 Prior to analysis, a number of checks of the sequencing data were undertaken to avoid bias from rare taxa. These included checks for low-abundance taxa based on a cutoff of less than 10 (none removed) and low-prevalence taxa with counts in fewer than 1% of samples (one taxon removed from the genus Parabacteroides). Comparison of demographic and birth characteristics of infants with and without behavior problems at 2 years was conducted using chi-squared tests and t-tests as appropriate.
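The rare-taxa checks can be illustrated with a small filter. This sketch rests on two assumptions that the text does not spell out: the abundance cutoff of 10 is read as a total count across all samples, and prevalence is read as the fraction of samples with a nonzero count. The function and argument names are hypothetical.

```python
def filter_taxa(counts, min_total=10, min_prevalence=0.01):
    """Keep taxa that pass both rarity checks.

    counts maps a taxon name to its list of per-sample read counts.
    Assumed reading: drop a taxon if its summed count is below
    min_total, or if it is present (count > 0) in a smaller fraction
    of samples than min_prevalence.
    """
    kept = {}
    for taxon, row in counts.items():
        total = sum(row)
        prevalence = sum(1 for c in row if c > 0) / len(row)
        if total >= min_total and prevalence >= min_prevalence:
            kept[taxon] = row
    return kept


# Toy table with 4 samples; a stricter prevalence threshold is used
# here only because the toy table is so small.
toy = {
    "taxon_a": [12, 0, 30, 5],   # abundant and prevalent: kept
    "taxon_b": [3, 0, 0, 0],     # total count below 10: dropped
    "taxon_c": [40, 0, 0, 0],    # present in 1 of 4 samples: dropped at 0.5
}
print(sorted(filter_taxa(toy, min_prevalence=0.5)))
```

With 118 samples, a 1% prevalence threshold corresponds to 1.18 samples, so under this reading a taxon seen in only one sample would be removed.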
Aim 1: Exploratory analysis of the microbiota in colic
Five measures of alpha diversity were calculated, measuring richness (count of observed taxa; Chao1 Index) and evenness (Shannon–Weaver Index, Simpson’s Index, Inverted Simpson’s Index). Reference Oksanen, Blanchet and Kindt18 After verifying that these measures all have a consistent association with key descriptive variables (mode of birth and milk feed type), further analyses proceeded with the Chao1 and Shannon–Weaver indices only to avoid multiple testing issues. These two were chosen since they are less likely to be biased by small subgroup sizes and sequencing depth differences. Reference Magurran26 Weighted and unweighted UniFrac distances with PERMANOVA were used to compare the beta diversity of the microbiota on the basis of outcomes of interest.
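For readers unfamiliar with the two indices carried forward, the following sketch computes them from a single sample's taxon counts. It uses the standard Shannon formula with natural logarithms and the bias-corrected form of Chao1; whether the original analysis used exactly these variants is an assumption, and the function names are illustrative.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of taxa observed exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


sample = [5, 3, 1, 1, 2, 0]  # hypothetical counts for six taxa
print(shannon(sample), chao1(sample))
```

Chao1 adds to the observed richness an estimate of unseen taxa based on singletons and doubletons, which is why it is sensitive to sequencing depth in a different way from the Shannon index.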
Aim 2: Association between microbiota and behavior
Analysis of bacterial diversity
Multivariable multiple linear regression and logistic regression models were used to model associations between alpha diversity measures and infant characteristics (i.e., early life factors and crying/behavioral outcomes).
Differential abundance analysis
Differential abundance was tested with the DESeq2 package (version 1.20.0), as described under Analysis. Reference Anders and Huber24 For each behavioral outcome, a model was fitted that adjusted for the covariates in Table 1. Subjects were removed before modeling if there were missing data (removing 13 subjects [baseline], 20 subjects [4 week], and 0 subjects [2 year]; in all cases the missing data were the outcome itself). We report the FDR-adjusted p-values for the Wald test. We refer the reader to the Supplementary materials for a table of the results (S-Materials 1) and for all scripts needed to reproduce the analysis (S-Materials 2).
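The FDR adjustment applied to the per-OTU p-values can be sketched as follows. The text states only that p-values were FDR-adjusted; the Benjamini-Hochberg step-up procedure shown here is assumed as the method, and the function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a
    running minimum is taken from the largest rank downward so that
    adjusted values never decrease with rank.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values for four OTUs
print(benjamini_hochberg(raw))
```

An OTU would then be reported as significant when its adjusted value falls below 0.05, which is the threshold used in the Results.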
Aim 3: Prediction of future behavior
To assess whether the infant gut microbiota can predict future behavior at 4 weeks and 2 years, we used the exprso package (version 0.6.4) for the R programming language Reference Quinn, Tylee and Glatt27 to train and evaluate binary classification pipelines using either the OTU counts or the aggregated genus-level counts. OTU counts were aggregated to genus-level counts under the intuition that they provide a coarser description of the data and therefore may be less prone to over-fit on a small training set. Before classification, we used the zCompositions package to impute zeros, Reference Palarea-Albaladejo and Martin-Fernandez28 and then log-transformed the data using a centered log-ratio transformation. This transformation recognizes the microbiota data as compositional and is applied to each subject independently, thus maintaining test set independence. Reference Fernandes, Reid, Macklaim, McMurrough, Edgell and Gloor29,Reference Quinn, Erb, Gloor, Notredame, Richardson and Crowley30
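A minimal sketch of the centered log-ratio transformation for one subject is given below. Zero handling here uses a fixed pseudo-count purely as a simple stand-in; it is not the imputation performed by the zCompositions package, and the function name and default value are assumptions.

```python
import math


def clr(counts, pseudo=0.5):
    """Centered log-ratio transform of one subject's taxon counts.

    Zeros are replaced by a fixed pseudo-count (a simplification),
    then each log value is centered on the subject's mean log value,
    which equals the log of the geometric mean.
    """
    filled = [c if c > 0 else pseudo for c in counts]
    logs = [math.log(v) for v in filled]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


subject = [120, 30, 0, 850]  # hypothetical genus-level counts
print(clr(subject))
```

Because the transformation uses only the subject's own counts, transforming a test-set subject never draws on information from the training set. The output is also unchanged if every count for a subject is multiplied by the same factor, which is the sense in which it treats the data as compositional.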
The classification pipeline consists of the following steps: (1) data splitting, (2) feature reduction, (3) model training, and (4) model deployment. First, we define a training set by taking a random sample of 80% of the data. Second, we perform feature reduction using either the Student’s t-test, the Wilcoxon Rank-Sum test, or a principal components analysis (PCA). Third, we train a random forest model on the top [3, 5, 10] features using the randomForest R package (version 4.6.14). Reference Liaw and Wiener31 In the case of the t-test and Rank-Sum test, these top features are the 3, 5, or 10 variables with the biggest between-group differences; in the case of PCA, these top features are the first 3, 5, or 10 principal components. Fourth, we choose whether to use 3, 5, or 10 features (based on a fivefold cross-validation of the training set), train a model with these features, and then deploy this model on the withheld test set. We measure performance as accuracy, sensitivity, specificity, precision, and area under the receiver operating characteristic curve (AUC) (computed with the ROCR package Reference Sing, Sander, Beerenwinkel and Lengauer32 ). To get a robust estimate of performance, we repeat steps 1–4 for 100 random cuts of the data and report the average. Note that for each random cut of the data, the test set remains completely independent from the training set and never informs model selection. As a reference, we also train a random forest model using the a priori and additional criterion-based covariates (see Table 1). The difference between the microbiota classifier and the covariate classifier can be interpreted as the comparative prognostic value of the microbiota measurements.
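The four steps above can be sketched as a repeated hold-out loop with an inner cross-validation. To stay dependency-free, this sketch swaps in two simplifications that are not the methods used in the study: features are ranked by the absolute difference in class means (in place of the t-test, Rank-Sum test, or PCA), and a nearest-centroid rule stands in for the random forest. All function names are hypothetical. The point of the sketch is the structure: feature selection and the choice of 3, 5, or 10 features happen inside the training set only.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Order feature indices by absolute difference in class means."""
    def gap(j):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        return abs(mean(pos) - mean(neg))  # assumes both classes present
    return sorted(range(len(X[0])), key=gap, reverse=True)


def fit_centroids(X, y, features):
    """Stand-in model: one mean vector per class over chosen features."""
    return {lab: [mean(r[j] for r, l in zip(X, y) if l == lab) for j in features]
            for lab in (0, 1)}


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[lab]))
    return min((0, 1), key=dist)


def accuracy(X_tr, y_tr, X_te, y_te, k):
    """Select top-k features and fit on training data, score on test data."""
    features = rank_features(X_tr, y_tr)[:k]
    centroids = fit_centroids(X_tr, y_tr, features)
    hits = sum(predict(centroids, r, features) == l for r, l in zip(X_te, y_te))
    return hits / len(y_te)


def choose_k(X, y, ks, folds, rng):
    """Pick the feature count with the best inner cross-validated accuracy."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    scores = {}
    for k in ks:
        fold_scores = []
        for f in range(folds):
            held = set(idx[f::folds])
            tr = [i for i in idx if i not in held]
            fold_scores.append(accuracy([X[i] for i in tr], [y[i] for i in tr],
                                        [X[i] for i in held], [y[i] for i in held], k))
        scores[k] = mean(fold_scores)
    return max(ks, key=lambda k: scores[k])


def repeated_holdout(X, y, n_splits=100, train_frac=0.8, ks=(3, 5, 10),
                     folds=5, seed=0):
    """Average test accuracy over random train/test cuts of the data."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        tr, te = idx[:cut], idx[cut:]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        k = choose_k(X_tr, y_tr, ks, folds, rng)  # test set is never used here
        results.append(accuracy(X_tr, y_tr, [X[i] for i in te],
                                [y[i] for i in te], k))
    return mean(results)
```

Calling `repeated_holdout(X, y)` with a list of transformed feature rows `X` and binary labels `y` returns the mean test-set accuracy. Running the same function on a covariate-only feature table would give the reference value against which the microbiota classifier is compared.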
Results
The cohort comprised 118 infants (53% male) for whom complete sequencing and outcome data were available. Using dichotomized behavioral outcomes, we found no significant differences in the demographic and clinical characteristics of infants with poorer behavior at baseline, 4 weeks, or 2 years (see Table 2). A higher rate of concurrent maternal mental illness was observed in infants who had behavioral problems at age 2 (χ² = 5.06, p = 0.024). No infants had been administered antibiotics, and 3 infants had been administered a probiotic in the 24 h prior to the fecal sample collection (in all cases, the product consumed was the commercially available BioGaia with Lactobacillus reuteri DSM 17938). Sensitivity analyses excluding these three infants (conducted using diversity metrics only, due to sample size restrictions in other analyses) demonstrated the robustness of findings across all outcomes. The alluvial plot shows the trajectory of infants with respect to problem cry/fuss and behavior between baseline, 4 weeks, and 2 years (see S-Material 3), which demonstrates that a large proportion of infants with problem cry/fuss at 4 weeks also had longer cry/fuss durations at baseline.
IQR, interquartile range; SD, standard deviation.
* EPDS score >9.
a Chi-squared test, or t-test as appropriate.
b Kessler 6 score >95th percentile.
c Simulated p value based on 2000 replicates.
d Diary data of cry/fuss time were not available for these infants; however, all infants were reported to have >3-h cry/fuss per day at enrolment.
Exploratory analysis of the microbiota in colic
Alpha and beta diversity varies on the basis of infant sex, birth mode, and feeding
Variability was observed in the distribution of phyla across samples (S-Material 4). Greater alpha diversity was observed in infants who were formula-fed exclusively (Shannon: M: 2.12 [SD: 0.51], p < 0.001; Chao1: M: 111.33 [SD: 26.89], p < 0.001) or in addition to breast milk (Chao1: M: 91.35 [SD: 22.88], p = 0.037), compared to those who were fed breast milk only (Shannon: M: 1.65 [SD: 0.48]; Chao1: M: 79.73 [SD: 21.96]); see Fig. 1. There was weak evidence of greater alpha diversity in infants born by caesarean section (Shannon: M: 1.93 [SD: 0.56] vs 1.77 [SD: 0.51], p = 0.096). In combination, three factors (birth mode, milk feed type, and sex) accounted for 11.9% of the variance in evenness and 21.65% of the variance in richness (R2 = 0.119, F(4, 106) = 4.72, p < 0.01; R2 = 0.217, F(4, 106) = 8.6, p < 0.001, respectively).
Significant separation of microbial community structure was observed by mode of birth (weighted UniFrac: R2 = 0.023, p = 0.026; unweighted UniFrac: R2 = 0.07, p = 0.001) and type of milk feed at baseline (weighted UniFrac: R2 = 0.036, p = 0.042; unweighted UniFrac: R2 = 0.072, p = 0.001; see S-Material 5 in Supplementary materials).
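The R2 values reported for PERMANOVA are the share of the total sum of squared distances that lies between groups. The sketch below computes R2, the pseudo-F statistic, and a permutation p-value from a precomputed distance matrix. It is a simplified single-factor version with illustrative names; in the study the input would be a weighted or unweighted UniFrac matrix, whereas the toy example uses distances between points on a line.

```python
import random


def permanova(dist, groups, n_perm=999, seed=0):
    """Single-factor PERMANOVA on a square distance matrix.

    Returns (R2, pseudo-F, permutation p-value). Assumes at least two
    groups and nonzero within-group distances.
    """
    n = len(groups)
    n_groups = len(set(groups))
    # Total sum of squares: squared pairwise distances divided by n.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n

    def ss_within(labels):
        total = 0.0
        for g in set(labels):
            members = [i for i in range(n) if labels[i] == g]
            sq = sum(dist[i][j] ** 2
                     for a, i in enumerate(members) for j in members[a + 1:])
            total += sq / len(members)
        return total

    def pseudo_f(labels):
        ssw = ss_within(labels)
        return ((ss_total - ssw) / (n_groups - 1)) / (ssw / (n - n_groups))

    f_obs = pseudo_f(groups)
    r2 = 1 - ss_within(groups) / ss_total

    # Permute group labels to build the null distribution of pseudo-F.
    rng = random.Random(seed)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(shuffled) >= f_obs:
            exceed += 1
    return r2, f_obs, (exceed + 1) / (n_perm + 1)


points = [0, 1, 2, 10, 11, 12]  # toy one-dimensional "samples"
dist = [[abs(a - b) for b in points] for a in points]
print(permanova(dist, ["A", "A", "A", "B", "B", "B"]))
```

A small R2 with a small p-value, as reported above for birth mode and feed type, indicates a separation that is statistically detectable but explains only a few percent of the overall variation in community structure.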
Association between microbiota and behavior
Alpha diversity associates with 2-year behavioral outcome
Alpha diversity, defined here as either evenness (Shannon index) or richness (Chao1), was not significantly associated with cry/fuss time at baseline nor with problem crying/fussing at 4-week follow-up (all p > 0.1). Higher microbial evenness in the baseline infant gut microbiota was associated with increased odds of behavioral problems at 2 years of age (Shannon–Weaver index; OR: 2.78, 95% CI: 1.06–8.10, p = 0.046), and this association persisted following adjustment for potential confounders (OR: 3.47, 95% CI: 1.24–10.88, p = 0.023). There was no significant separation in microbial compositions on the basis of either of the crying/fussing timepoints or behavioral outcome (weighted and unweighted UniFrac distances between subgroups, all p > 0.1; S-Materials 6 in Supplement).
Baseline and 4-week crying share common microbiota signatures
In total, 57 OTUs were associated with baseline crying/fussing, 75 OTUs were associated with crying/fussing at 4 weeks, and 54 OTUs were associated with the 2-year behavioral outcome (p < 0.05 after FDR-adjustment; see S-Materials 1 in Supplementary materials). Fig. 2 shows a Venn diagram of the OTUs associated with each outcome. Two OTUs – classified as Parabacteroides distasonis and Bifidobacterium animalis – associated with all 3 outcomes, though not in the same direction. However, baseline crying and 4-week crying share 20 OTUs in common. Of these, 9 OTUs were significantly enriched in both conditions, while 7 OTUs were significantly depleted in both conditions (Table 3).
This table also shows the percentage of zeros found for each OTU along with its median nonzero abundance.
Fig. 2 shows a Venn diagram of the number of OTUs associated with each outcome (p < 0.05 after FDR-adjustment) that also have the same direction of change.
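The overlap counts behind such a Venn diagram can be derived by intersecting the sets of significant OTUs and comparing the sign of their log-fold changes. The sketch below is illustrative: the function name, OTU identifiers, and values are hypothetical and are not taken from the study's results.

```python
def concordant_overlap(results_a, results_b):
    """Split OTUs significant for both outcomes by direction of change.

    Each argument maps an OTU identifier to its signed log-fold change
    (positive = enriched, negative = depleted) and contains significant
    OTUs only. Returns (enriched in both, depleted in both, discordant).
    """
    shared = set(results_a) & set(results_b)
    enriched = {o for o in shared if results_a[o] > 0 and results_b[o] > 0}
    depleted = {o for o in shared if results_a[o] < 0 and results_b[o] < 0}
    discordant = shared - enriched - depleted
    return enriched, depleted, discordant


# Hypothetical results for two outcomes.
baseline = {"otu_1": 1.2, "otu_2": -0.8, "otu_3": 0.5, "otu_4": -2.0}
week4 = {"otu_1": 0.9, "otu_2": -1.1, "otu_3": -0.4, "otu_5": 1.7}
print(concordant_overlap(baseline, week4))
```

Applied to the counts reported in the text, 20 shared OTUs with 9 enriched in both conditions and 7 depleted in both leave 4 that change in opposite directions.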
Genus-level summary reveals associations with all behavioral outcomes
Fig. 3 shows a genus-level summary of the significant OTUs for each condition. Although we did not find a common unidirectional OTU signature for all 3 outcomes, several common unidirectional genus-level signatures were observed. For example, several Clostridium species are enriched in association with all outcomes. Meanwhile, many Bifidobacterium species are variably enriched and depleted. These findings suggest some convergence of microbiota associations at the genus level, even if few OTUs individually associate with all outcomes.
Fig. 3 shows a barplot of the number of OTUs that are significantly enriched or depleted in association with each behavioral outcome, summarized at the genus level.
Baseline microbiota predicts crying at 4 weeks
Random forest models were trained on the baseline microbiota to predict crying at 4 weeks (49 cases and 49 controls). Since this data set has more variables than samples, the features were reduced prior to model training. We trialed three feature reduction methods: genus-level summary, univariate feature selection, and a PCA of the training data. Table 4 shows the average cross-validation accuracy performance for each strategy (selected by inner-fold cross-validation). Training a model on the PCA loadings of the genus-level summary had the best performance during test set validation (64.95% accuracy). As a reference, a random forest model was also trained using the covariates listed in Table 1. All microbiota models had better accuracy than the reference covariate model.
All performance measures are averaged across 100 training-test set splits. The best performances are boldfaced.
Discussion
In this study, we use a combined approach of statistical and machine learning methods to evaluate the relationship between the infant gut microbiota and behavioral outcomes in a cohort of infants with colic. We found several lines of evidence that support the role of gut microbiota in colic severity. Statistical analyses demonstrated that one measure of alpha diversity (the Shannon–Weaver index, a measure of microbial evenness) was associated with increased odds of subsequent behavior problems at age 2, and that the abundances of several OTU-level taxa were associated with both baseline and 4-week cry/fuss time. When summarizing the data at a broader taxonomic resolution, a number of genera associated with concurrent (baseline), 4-week, and 2-year behavioral outcomes with directional concordance. Random forests demonstrated that the infant gut microbiota can predict problem crying at 4-week follow-up with up to 65% accuracy.
Crying-associated microbiota signatures and the previous literature
Although no OTUs were consistently associated with all three outcomes in the same direction, 20 OTUs significantly associated with both baseline and 4-week cry/fuss time, 16 of which are in the same direction (Table 3). Overall, 57 OTUs were associated with baseline crying/fussing. These OTUs were assigned to the 14 genera presented in Table 5. We find that several genera of microbiota – including Bifidobacterium, Clostridium, Lactobacillus, and Klebsiella – associate with colic severity in agreement with the prior literature.
Also presented in this table is the agreement with the previous literature and the overlapping association with 4-week cry/fuss time (if any). Results are derived from an OTU-level analysis but are summarized here at the genus level for ease of interpretation. The number of OTUs enriched or depleted refers to the FDR-adjusted p-values in association with baseline or 4-week crying time. All FDR and log-fold changes can be found in S-Materials 1. The “Association also present with 4-week crying time” refers to the subset of individual OTUs also found to associate with 4-week outcome (16 total; see Table 3). The “Agreement with previous literature” column has 4 categories: Novel finding (no relevant past literature found); Mixed agreement (findings lack a consistent direction); Agreement (findings tend toward the same direction); Disagreement (findings tend toward opposing directions).
For example, Clostridium and Lactobacillus taxa were found to associate with crying in previous studies. Reference Lehtonen, Korvenranta and Eerola33,Reference Pärtty, Kalliomäki, Salminen and Isolauri34 In an early study of gut microbiota, Lehtonen and colleagues produced fatty acid profiles of stool samples from infants with colic using culturing techniques. They reported more frequent colonization by Clostridium difficile in infants with colic than in non-colicky controls, a difference that had passed by age 3 months. A more recent study by Pärtty and colleagues showed associations between Clostridium leptum and Clostridium coccoides and proinflammatory cytokines. Meanwhile, others have speculated that the gas-producing qualities of bacteria like Klebsiella, in conjunction with the proinflammatory properties of Gram-negative bacteria like Eubacterium, may explain their relevance to the intestinal pain and crying that characterizes colic. Reference Zeevenhooven, Browne, L’Hoir, de Weerth and Benninga35 Lactobacillus and Bifidobacterium are typically viewed as beneficial bacteria and have been associated with reduced crying outcomes in previous studies, having a demonstrated benefit for gut epithelium integrity and function as well as gastrointestinal motility. Reference Chichlowski, Guillaume De Lartigue, Raybould and Mills36 Future studies involving whole-genome shotgun sequencing could attain a higher resolution of taxonomic identification, resolve apparently contradictory findings regarding bacteria such as Lactobacillus, and elucidate the functional role that these bacteria may have in colic. For example, deeper sequencing and metabolic analyses may enable the investigation of potential mechanistic explanations of microbial associations with colic, such as hydrogen and lactate metabolism, Reference Salli, Anglenius and Hirvonen37,Reference Pham, Lacroix, Braegger and Chassard38 which we were unable to test in this study.
Association with 2-year behavioral outcome
Although several OTUs were significantly associated with 2-year behavioral outcomes, none of these also associated with baseline colic severity and 4-week crying in the same direction. However, when examining patterns of association at the genus level (irrespective of the OTU in the genus), we see directional concordance across all 3 outcomes. Of note, several Clostridium OTUs are enriched in 2-year behavioral cases, and many OTUs from the Bifidobacterium genus are alternately enriched and depleted. Although these may not represent the same OTUs as those associated with the baseline and 4-week timepoints, the pattern is similar. Interestingly, both Clostridium and Bifidobacterium in the infant gut have been previously associated with neurodevelopmental outcomes. Reference Aatsinki, Lahti and Uusitupa2
Overall microbiota signature, not specific taxa, best predicts 4-week crying
Differential abundance analysis is useful for determining which bacterial taxa, if any, associate with an outcome. However, it does not tell us how well these taxa could diagnose or prognose an outcome given new data. For this, we trained random forest classifiers on the baseline infant microbiota to predict cry/fuss time at the 4-week follow-up. We trialed 3 feature reduction methods to see how they performed for this small patient cohort. Reducing dimensionality of the training set with a genus-level summary and subsequent PCA achieved the best performance at 64.95% accuracy, suggesting that the combination of many small signals within the gut microbiota helps predict future cry/fuss time better than using a few specific OTUs. The moderate accuracy reported here should be interpreted in light of two key considerations. First, the trained model is predicting future crying/fussing for colicky infants 4 weeks from fecal collection (making it a prognostic, not a diagnostic test). Second, problem crying is a behavioral syndrome, not a canonical gut disease. By comparison, classifiers trained to differentiate inflammatory bowel disease pathology from healthy guts in the presence of a standing clinical diagnosis have an AUC of ~75% across 4 gut microbiota data sets (in the presence of imbalanced class labels). Reference Duvallet, Gibbons, Gurry, Irizarry and Alm39
For completeness, we also trained the same random forest classifiers to predict 2-year behavior from the baseline gut microbiota (collected at 2 months of age). The classifiers had no prognostic utility (F1-score < 0.50), which may be due to the small sample size, 2-year time lag, and the multifactorial etiology of behavior. Reference Gardner and Shaw40 Importantly, while some studies report associations between biomarkers and future outcomes, the goodness of these associations is evaluated based on how well the data fit a model, and not based on how well the model generalizes to new data. Even when the data fit the model (i.e., the model is “predictive” with a good R-squared value), the model may not be useful for translation. Indeed, this study found many significant associations between OTU biomarkers and the 2-year behavioral outcome. Yet, these biomarkers do not work as a prognostic test when subjected to cross-validation. Predictive models should be carefully evaluated on new data to determine if they have prognostic utility.
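The performance measures referred to in this section all derive from the four cells of a confusion matrix computed on held-out data. The sketch below shows the standard definitions, with illustrative function and key names; the labels and predictions are toy values, not study data.

```python
def classification_metrics(y_true, y_pred):
    """Compute standard binary classification metrics (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted cases).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


held_out_labels = [1, 1, 1, 0, 0, 0]
held_out_predictions = [1, 1, 0, 0, 0, 1]
print(classification_metrics(held_out_labels, held_out_predictions))
```

Because the F1-score combines precision and sensitivity, it can remain low when a classifier rarely identifies true cases correctly, even if overall accuracy looks acceptable.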
Strengths and Limitations
We have employed a set of rigorous statistical analyses to understand the relationship between the gut microbiota and behavioral phenotypes in human infancy, while adjusting for confounding variables and multiple testing. Although the sample size is relatively large for an association study of its kind, it is relatively small for a machine learning analysis. The classification accuracies reported here are encouraging as they suggest that the curation of large datasets could enable the discovery of risk indices that predict future clinical events based on the early infant gut microbiota. Further, when considering applicability of these findings in independent datasets, it is important to note that our sample comprised mixed infant feeding modalities, and we did not adjust for this in the analysis. Given the differential effects of probiotic supplementation in colic on the basis of feeding modality Reference Sung, D’Amico and Cabana8 , this may be relevant to the interpretation of these results.
Future Directions
Our analyses agree with the previous literature, which suggests that the gut–brain axis in infancy is positively impacted by the genus Lactobacillus, is negatively impacted by the genera Clostridium and Klebsiella, and has mixed associations with members of the Bifidobacterium genus (possibly due to strain-specific effects). Our machine learning results suggest that models trained on dimension-reduced data can better predict future clinical events, even if they preclude a clear understanding of mechanistic contributions. Future research should further examine the relevance of the Bifidobacterium, Clostridium, Lactobacillus, and Klebsiella genera in the infant microbiota, possibly through functional pathway analysis and whole-genome shotgun sequencing. A large dataset of this kind could offer new insights into how the gut microbiota contributes to infant colic and may result in a better predictive model that translates into clinical practice.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/S2040174420000227
Acknowledgments
AL conceived of the study and prepared the first draft of the manuscript. All authors contributed to the research questions, aims, and study design. RJM and TTH conducted the 16S sequencing. AL and TQ designed and conducted the statistical analysis. All authors contributed to the drafting and editing of the final manuscript. The authors acknowledge the contribution of Dr Nicola Angel at the Australian Centre for Ecogenomics, who completed the DNA extraction and provided the corresponding methodological details presented in this manuscript, and thank Dr Martin O’Hely for his useful feedback on the manuscript.
Financial Support
The original Baby Biotics study was funded by the Murdoch Children’s Research Institute, Royal Children’s Hospital Melbourne (Australia) – Centre for Community Child Health and the Georgina Menzies Maconachie Charitable Trust administered by Equity Trustees. The 16S sequencing was funded by an RMIT University Transitional Seed Grant awarded to Dr Amy Loughman, Dr Amy Reichelt, Professor Robert J Moore, and Dr Michelle Rank. AL is supported by the Wilson Foundation. Dr Valerie Sung was supported by an Australian National Health and Medical Research Council (NHMRC) Early Career Fellowship [1125687]. Research at the Murdoch Children’s Research Institute is supported by the Victorian Government’s Operational Infrastructure Support Program.
Conflicts of Interest
AL, TQ, MN, AR, RM, THHV, and VS have no potential conflicts to disclose. MT is a past member of the Medical Advisory Board of the Nestle Nutrition Institute and of the Scientific Advisory Board of Nutricia, has received speaker fees from Abbott Nutrition, and is a co-inventor on the patent “Methods and compositions for determining and for minimizing the likelihood of development of allergy in infants”.
Ethical Standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the Australian national guidelines on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008, have been approved by the Royal Children’s Hospital Melbourne Human Ethics Committee (HREC Ref No: 30111), and were performed in accordance with the approved protocol.