Introduction

Diarrhea is a global problem and causes about 500,000 deaths annually in children below 5 years of age, primarily (90%) in Sub-Saharan Africa and South Asia1,2. In addition to deaths attributed to diarrhea, recurrent enteric infections have far-reaching consequences such as growth faltering, which is associated with decreased cognitive development and increased risk of death from other infectious diseases3,4.

The etiologies of diarrhea are diverse and include pathogenic bacteria, viruses, and protozoa1,5. In the Global Enteric Multicenter Study (GEMS), rotavirus, Shigella, enterotoxigenic Escherichia coli, and Cryptosporidium were identified as the leading causes of moderate-to-severe diarrhea across all seven low- and middle-income countries (LMICs) involved in the study6. However, pathogen detection was common even in asymptomatic children7, suggesting a more complex explanation for the development of diarrhea in many children.

The majority of diarrhea cases are acute diarrhea (AD), i.e., lasting less than 7 days8, and considerable progress has been made in decreasing childhood mortality due to AD9. However, childhood diarrhea remains a serious healthcare burden in LMICs due to the growing multidrug resistance of diarrheal pathogens10, the emergence of new pathogens11, and the absence of well-established management for prolonged diarrhea (ProD; 7–13 days of duration) and persistent diarrhea (PD; ≥14 days of duration), referred to collectively as ProPD12. ProPD is responsible for 36% to 54% of all diarrhea-related mortality13,14,15.
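The duration thresholds above can be written as a small classification rule. The following is a minimal sketch for illustration only: the function names are hypothetical and not taken from the study, and it assumes that duration is recorded in whole days.

```python
def classify_diarrhea(duration_days: int) -> str:
    """Map episode duration (whole days) to the category used in the text."""
    if duration_days < 0:
        raise ValueError("duration_days must be non-negative")
    if duration_days < 7:
        return "AD"    # acute diarrhea: less than 7 days
    if duration_days < 14:
        return "ProD"  # prolonged diarrhea: 7-13 days
    return "PD"        # persistent diarrhea: 14 days or more


def is_propd(duration_days: int) -> bool:
    """ProPD combines prolonged (ProD) and persistent (PD) diarrhea."""
    return classify_diarrhea(duration_days) in {"ProD", "PD"}
```

Under these definitions an episode of 6 days is AD, while episodes of 7 days or more fall into the combined ProPD group.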

The gut microbiota (GM) is established during the first years of life16. This development is influenced by a range of factors, such as mode of birth and nutrition (breast milk vs. formula, timing, and composition of weaning foods)17. A balanced and diverse “eubiotic” GM18 is important for human health19,20 via the metabolization of otherwise indigestible dietary compounds, synthesis of vitamins, immune system regulation, and host resistance against pathogens21. Consequently, GM imbalance may have deleterious effects on the host, such as malnutrition or susceptibility to gut pathogens11,19. In children, AD is associated with GM dysbiosis22, which is characterized by decreased intestinal microbiota diversity23 as confirmed by the large GEMS study that reported decreased bacterial diversity and decreased abundance of beneficial anaerobic bacteria in AD cases compared to non-diarrheal controls11,24. The decreased bacterial diversity was apparent even after recovery from diarrhea25.

Despite many years of effort, mortality remains high for ProPD8,15,26. The role of enteric pathogens as causes of ProPD remains unclear. While some studies have identified a similar range of enteric pathogens in AD and diarrhea of longer duration12,27,28,29, others failed to identify any specific enteric pathogens associated with ProPD30. It has been hypothesized that GM dysbiosis is an important factor in ProPD etiology14,29,31,32. Interestingly, even though the management of ProPD is complex, nutritional interventions have shown promising outcomes13,31,33,34.

In this work, we take advantage of the CRYPTO-POC study, a case-control study of diagnostic accuracy for cryptosporidiosis in children that recruited 2080 children aged 0–59 months with and without diarrhea in Ethiopia35. Large, well-powered studies examining links between diarrhea and GM are relatively scarce in LMICs, and no studies have specifically examined GM composition in children with ProPD. We report the analysis of 1321 fecal samples of children available for DNA extraction and sequencing with the aim of deciphering underlying GM characteristics of children suffering from AD and ProPD compared to healthy controls. We report the GM composition of 1313 Ethiopian children under 5 years of age, a period during which significant GM changes occur. Of these, 650 children have diarrhea and 663 are frequency-matched, non-diarrheal controls. We demonstrate that the GM of diarrhea cases is associated with lower bacterial diversity, an enrichment of putative pathogens, and a depletion of gut commensals compared to non-diarrheal controls. Prolonged and persistent diarrhea cases are characterized by a more pronounced depletion of gut commensals, relative to acute diarrhea.

Results

Study cohort characteristics

A total of 1321 Ethiopian children aged 0–59 months were included in the present study. Out of these, 8 (0.6%) were excluded due to insufficient sequencing depth, meaning that 1313 children (99.4%) were included in the analysis. Of these, 650 (49.5%) were diarrhea cases and 663 (50.5%) were frequency matched non-diarrheal controls. Among the diarrhea cases (one with missing data on the duration of diarrhea), 554 (85%) had AD and 95 (15%) had ProPD. Cohort details including clinical and anthropometric data have been published elsewhere35,36 and are summarized in Table 1.

Table 1 Demographic/clinical factors associated with diarrhea and prolonged or persistent diarrhea in Ethiopian children aged 0–59 months

Briefly, we found that a higher proportion of children in the diarrhea group were born by cesarean section (11.4% vs. 5.7%; p = 0.0004), were born prematurely (6% vs. 1.5%; p < 0.0001), suffered from malnutrition (19.5% vs. 4.5%; p < 0.0001), and had cryptosporidiosis (6.6% vs. 0.6%; p < 0.0001) compared to non-diarrheal controls. There was no significant difference in antibiotic use during the week before enrollment between diarrhea cases and non-diarrheal controls (10% vs. 7.1%; p = 0.08). Among those with data on the type of antibiotic used, 21 (3.2%), 22 (3.4%), and 3 (0.5%) diarrhea cases and 25 (3.8%), 14 (2.1%), and 0 (0%) non-diarrheal controls had taken amoxicillin, cotrimoxazole, and metronidazole, respectively.

We further analyzed the data by logistic regression and found that premature birth (OR 3, 95% CI, 1.37, 7.1), mid-upper arm circumference ≤125 mm (OR 5.8, 95% CI, 3.01, 12.2), and having a caretaker other than the mother (OR 2.8, 95% CI, 1.53, 5.57) were all significantly associated with diarrhea (Table 1).
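To make the meaning of an odds ratio with a 95% confidence interval concrete, the sketch below computes the unadjusted analogue from a 2×2 table using the standard Wald interval on the log-odds scale. This is not the logistic regression reported above, which may include further covariates. The function name and the counts in the usage note are hypothetical and are not study data.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, with hypothetical counts `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25. An association is conventionally called significant at the 5% level when the interval excludes 1, as is the case for all three intervals reported in the text.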

The study included children aged 0–59 months, a period during which the GM undergoes profound changes. The number of observed species and Shannon diversity index increased with age (p < 0.0001) (Fig. 1a) in both diarrhea cases and non-diarrheal controls (Fig. 2a, c). Bray-Curtis dissimilarity metrics also demonstrated a clear progression with age (Fig. 1b, c) for both groups (Fig. 2b, d). For younger children (up to 1 year of age), the GM was characterized by high relative abundance of Escherichia/Shigella, Bifidobacterium, and Streptococcus spp., while obligate anaerobes such as Prevotella, Faecalibacterium, and Dialister dominated the GM at older ages. This was seen among all participants combined (Fig. 1d), as well as in diarrhea cases and non-diarrheal controls when analyzed separately (Supplementary Fig. 1).

Fig. 1: Age has a pronounced effect on gut microbiome diversity and composition of Ethiopian children aged 0–59 months (ntotal = 1313).

a Observed number of zOTUs and Shannon diversity index as influenced by age. Boxplot distributions include median, min, max, 25th and 75th percentiles, and outliers (more than 1.5 IQR); b PCoA plot based on Bray-Curtis dissimilarity metrics as influenced by age; c Pairwise PERMANOVA (two-sided) between different age groups based on Bray-Curtis dissimilarity metrics (false discovery rate (FDR) corrected q values); d Genus level relative abundance of top taxa (≥2.1%) in different age groups. ****, ***, **, and * represent unadjusted p values < 0.0001, <0.001, <0.01, and <0.05, respectively, while ns: not significant (two-sided Wilcoxon rank sum test).

Fig. 2: The gut microbiome of Ethiopian children aged 0–59 months undergoes maturation with increasing age irrespective of their diarrhea status.

a, b (diarrhea cases); c, d (non-diarrheal controls). a, c Observed number of zOTUs and Shannon diversity index increase with age. Boxplot distributions include median, min, max, 25th and 75th percentiles, and outliers (more than 1.5 IQR); b, d PCoA plot and pairwise PERMANOVA (two-sided) (false discovery rate (FDR) corrected q values) based on Bray-Curtis dissimilarity metrics of diarrhea cases and non-diarrheal controls stratified by age. ****, ***, **, and * represent unadjusted p values < 0.0001, <0.001, <0.01, and <0.05, respectively, while ns: not significant (two-sided Wilcoxon rank sum test).

Diarrhea cases have lower gut microbial diversity than non-diarrheal controls

The GM of children with diarrhea combined (AD and ProPD) (Supplementary Fig. 2a), and of cases with AD and ProPD separately (Fig. 3a), was characterized by a lower number of observed species and a reduced Shannon diversity index compared to non-diarrheal controls (p < 0.0001). However, when comparing AD and ProPD cases, we did not observe any significant differences in alpha diversity measures (p > 0.05) (Fig. 3a).
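The two alpha diversity measures compared above can be computed from a per-sample vector of zOTU counts. The sketch below is a minimal illustration and not the authors' pipeline: the function names are hypothetical, and the use of the natural logarithm for the Shannon index is an assumption, since the log base is not stated in this section.

```python
import math


def observed_richness(counts):
    """Number of taxa (e.g. zOTUs) with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)  # natural log is an assumption
    return h
```

A sample dominated by a single taxon has a Shannon index near zero, whereas a sample whose reads are spread evenly over many taxa has a high index, which is the pattern that separates controls from diarrhea cases here.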

Fig. 3: Diarrhea has a pronounced effect on gut microbiome diversity and composition of Ethiopian children aged 0–59 months.

Gut microbiota characterization of children suffering from acute diarrhea (n = 554), prolonged/persistent diarrhea (n = 95), and non-diarrheal controls (n = 663). a Observed number of zOTUs and Shannon diversity index. Boxplot distributions include median, min, max, 25th and 75th percentiles, and outliers (more than 1.5 IQR); b Constrained distance-based Redundancy Analysis (db-RDA) PCoA plot based on Bray-Curtis dissimilarity metrics of diarrhea status conditioned for age in months, enrollment site, sex, enrollment season, WAM index, current breastfeeding or diarrhea in the previous month; c Pairwise PERMANOVA (two-sided) conditioned by age in months, enrollment site, sex, enrollment season, WAM index, current breastfeeding or diarrhea in the previous month with false discovery rate (FDR) corrected q values; d Genus level relative abundance of top taxa (≥1.5%) grouped by diarrhea status. ****, ***, **, and * represent unadjusted p values < 0.0001, <0.001, <0.01, and <0.05, respectively, while ns: not significant (two-sided Wilcoxon rank sum test).

The overall GM composition of diarrhea cases combined (Supplementary Fig. 2b, d) and cases with AD and ProPD separately (Supplementary Fig. 2c, d) differed from non-diarrheal controls (q < 0.001) as determined by pairwise permutation multivariate analysis of variance (PERMANOVA) of Bray-Curtis dissimilarity metrics. Interestingly, we also found that AD and ProPD differed significantly (q < 0.01) in GM composition (Supplementary Fig. 2c, d). Controlling for age, enrollment season, enrollment site, sex, WAM index, current breastfeeding status, and diarrhea in the previous month did not change the difference between cases (AD and ProPD) and non-diarrheal controls (q < 0.01) nor between AD and ProPD cases (q < 0.01) (Fig. 3b, c) as determined by distance-based Redundancy Analysis (db-RDA) of Bray-Curtis dissimilarity metrics.
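The compositional comparisons above rest on the Bray-Curtis dissimilarity between pairs of samples. The sketch below shows the standard formula for two abundance vectors over the same taxa. The function name is hypothetical, and whether the authors applied it to raw counts, rarefied counts, or relative abundances is not stated in this section.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa in the same order")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total
```

PERMANOVA and db-RDA then operate on the matrix of such pairwise values to ask whether samples from the same group (for example AD cases) are more similar to each other than to samples from another group.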

Diarrhea cases are characterized by higher abundance of pathogens and loss of gut commensal bacteria

Differential abundance testing by DESeq2 adjusted for age group, enrollment season, enrollment site, sex, WAM index, current breastfeeding, and diarrhea in the previous month was used to determine taxa differing between children with diarrhea (AD and ProPD) and non-diarrheal controls. We found that diarrhea cases had a higher relative abundance of Escherichia spp. (16.3% vs. 8.2%; q < 0.0001), Campylobacter spp. (3.6% vs. 0.83%; q < 0.001), Streptococcus spp. (8.2% vs. 4.2%; q < 0.0001) and Veillonella dispar (6.3% vs. 3.2%; q < 0.0001). Furthermore, compared to non-diarrheal controls, the diarrhea cases were enriched in taxa associated with the oral cavity microbiota like Haemophilus parainfluenzae (1.5% vs. 1.2%; q < 0.05) (Fig. 4a, Fig. 5, Supplementary Table 1). This contrasted with the relatively lower abundance of obligate anaerobic bacterial taxa in diarrhea cases compared to non-diarrheal controls, in particular Prevotella copri (12.5% vs. 20.6%; q < 0.0001), Faecalibacterium prausnitzii (5.4% vs. 8.3%; q < 0.0001), Dialister succinatiphilus (2.0% vs. 4.5%; q < 0.0001), unclassified Lachnospiraceae (2.4% vs. 3.9%; q < 0.0001) and Bacteroides fragilis (3.6% vs. 5.1%; q < 0.0001) (Fig. 4a, Supplementary Table 1).

Fig. 4: Relative abundance of bacterial taxa at species level selected by DESeq2 differential abundance testing in Ethiopian children aged 0–59 months.

a Diarrhea cases compared with non-diarrheal controls; b AD cases compared with non-diarrheal controls; c ProPD cases compared with non-diarrheal controls; d ProPD cases compared with AD cases. Boxplot distributions include median, min, max, 25th and 75th percentiles, and outliers (more than 1.5 IQR). ****, ***, **, and * represent q values <0.0001, <0.001, <0.01, and <0.05, respectively, while ns: not significant; q values were corrected by the Benjamini-Hochberg method (two-sided Wilcoxon rank sum test). Black middle lines represent the median values. Summary statistics with exact q values are shown in Supplementary Table 1 (diarrhea cases vs. non-diarrheal controls), Supplementary Table 2 (AD vs. non-diarrheal controls), Supplementary Table 3 (ProPD vs. non-diarrheal controls), and Supplementary Table 4 (ProPD vs. AD).

Fig. 5: Specific gut microbiome profiles associated with diarrhea status of Ethiopian children aged 0–59 months.

Heatmap depiction of differentially abundant taxa between diarrhea cases and non-diarrheal controls as determined by DESeq2 (two-sided Wald test) differential abundance testing (q < 0.05 corrected for multiple testing by the Benjamini-Hochberg method) and adjusted for age group, enrollment season, enrollment site, sex, WAM index, current breastfeeding, and diarrhea in the previous month. Subsequently, differences were tested by two-sided Wilcoxon rank sum test and corrected by the Benjamini-Hochberg method resulting in the stated exact q values found in Supplementary Table 1. Clusters: de novo clustering of the diarrhea cases and non-diarrheal controls using Canberra distance metrics and the proportions of diarrhea cases and non-diarrheal controls are determined by Chi-square test (two-sided) in each cluster with unadjusted p values.

To understand whether specific GM signatures characterize diarrhea cases and non-diarrheal controls, we performed de novo clustering analysis of the taxa that DESeq2-based differential abundance analysis found to differ significantly between groups. Four overall clusters were identified. Two of these were dominated by diarrhea cases (I: 56.1% diarrhea cases; p = 0.009; III: 69.3% diarrhea cases; p < 0.0001), while one was dominated by non-diarrheal controls (II: 62.2% non-diarrheal controls; p < 0.0001) (Fig. 5). The clusters dominated by diarrhea cases were enriched in Escherichia spp., Streptococcus spp., unclassified Bifidobacterium (I and III) and Campylobacter spp. (I) and characterized by a low relative abundance of obligate anaerobic gut commensals (P. copri, F. prausnitzii and D. succinatiphilus) (Fig. 6). The de novo clustering was supported by sparse partial least squares discriminant analysis (sPLS-DA), which overall showed the same clustering (Supplementary Fig. 3).

Fig. 6: Specific bacterial taxa that characterize gut microbiome clusters with a high fraction of diarrhea cases (I and III) or non-diarrheal controls (II), respectively.

See Fig. 5 for cluster details (diarrhea cases and non-diarrheal controls). Boxplot distributions include median, min, max, 25th and 75th percentiles, and outliers (more than 1.5 IQR). ****, ***, **, and * represent unadjusted p values < 0.0001, <0.001, <0.01, and <0.05, respectively, while ns: not significant (two-sided Wilcoxon rank sum test).

Microbial taxa differing in AD cases and ProPD cases compared to non-diarrheal controls

When subgrouping the diarrhea cases according to duration of diarrhea, we found that AD cases relative to the healthy controls were characterized by an increased abundance of facultative anaerobes such as Escherichia spp. (16.3% vs. 8.7%; q < 0.0001) and Campylobacter (3.5% vs. 0.83%; q < 0.01) and a decreased relative abundance of obligate anaerobic gut commensals like F. prausnitzii (5.5% vs. 8.3%; q < 0.0001) and Prevotella copri (12.3% vs. 20.6%; q < 0.0001) (Fig. 4b, Fig. 7, Supplementary Table 2). De novo clustering analysis of the AD cases and non-diarrheal controls identified three major clusters, with one of them having a large fraction of AD cases (II: 64%). This cluster was characterized by relatively high abundance of Escherichia, Campylobacter and Streptococcus spp., whereas the cluster with a low proportion of AD cases (I: 36.2%) was characterized by higher relative abundance of P. copri and F. prausnitzii (Fig. 7). Again, sPLS-DA confirmed these patterns (Supplementary Fig. 4).

Fig. 7: Ethiopian children aged 0–59 months with AD are characterized by specific gut microbiome profiles when compared to non-diarrheal controls.

Heatmap depiction of differentially abundant taxa between AD cases and non-diarrheal controls as determined by DESeq2 (two-sided Wald test) differential abundance testing (q < 0.05 corrected for multiple testing by the Benjamini-Hochberg method) and adjusted for age group, enrollment season, enrollment site, sex, WAM index, current breastfeeding, and diarrhea in the previous month. Subsequently, differences were tested by two-sided Wilcoxon rank sum test and corrected by the Benjamini-Hochberg method resulting in the stated exact q values found in Supplementary Table 2. Clusters: de novo clustering of the AD cases and non-diarrheal controls using Canberra distance metrics and the proportions of AD cases and non-diarrheal controls are determined by Chi-square test (two-sided) in each cluster with unadjusted p values.

Further, ProPD cases were characterized by an increased relative abundance of Escherichia spp. (15.9% vs. 8.2%; q < 0.0001), Campylobacter spp. (3.9% vs. 0.83%; q < 0.01), and Streptococcus spp. (9.0% vs. 4.2%; q < 0.0001) compared to non-diarrheal controls (Fig. 4c, Fig. 8, Supplementary Table 3). These findings were confirmed by selecting features using sPLS-DA comparing ProPD and non-diarrheal controls (Supplementary Fig. 5).

Fig. 8: Ethiopian children aged 0–59 months with ProPD are characterized by specific gut microbiome profiles when compared to non-diarrheal controls.

Heatmap depiction of differentially abundant taxa between ProPD cases and non-diarrheal controls as determined by DESeq2 (two-sided Wald test) differential abundance testing (q < 0.05 corrected for multiple testing by the Benjamini-Hochberg method) and adjusted for age group, enrollment season, enrollment site, sex, WAM index, current breastfeeding, and diarrhea in the previous month. Subsequently, differences were tested by two-sided Wilcoxon rank sum test and corrected by the Benjamini-Hochberg method resulting in the stated exact q values found in Supplementary Table 3. Clusters: de novo clustering of the ProPD cases and non-diarrheal controls using Canberra distance metrics and the proportions of ProPD cases and non-diarrheal controls are determined by Chi-square test (two-sided) in each cluster with unadjusted p values.

Microbial taxa differing between ProPD and AD cases

While there was no difference between AD and ProPD cases in alpha diversity measures (p > 0.05; Fig. 3a), the GM composition differed significantly between AD and ProPD cases (q < 0.01; Fig. 3b, c). Compared to AD cases, ProPD cases had a higher relative abundance of unclassified lactobacilli (0.28% vs. 0.13%; q < 0.05) and were further characterized by lower relative abundance of gut commensals such as F. prausnitzii (4.4% vs. 5.5%; q = 0.02), Anaerostipes hadrus (0.27% vs. 0.49%; q < 0.01), unclassified Coriobacteriaceae (0.27% vs. 0.77%; q = 0.02) and unclassified Bacteroides (1.8% vs. 0.64%; q = 0.04) (Fig. 4d, Fig. 9, Supplementary Table 4). Again, these findings were confirmed by sPLS-DA, which additionally identified taxa such as Akkermansia muciniphila (0.94% vs. 1.2%) and Anaerostipes hadrus (0.27% vs. 0.49%) as having somewhat lower relative abundance in ProPD cases compared to AD cases (Supplementary Fig. 6). However, de novo clustering analysis of AD and ProPD cases based on taxa found to be differentially abundant between the two groups did not lead to clear clustering of AD and ProPD cases (Fig. 9). In addition, we performed a one-to-one comparison of ProPD vs. AD cases, which again showed decreased abundance of Anaerostipes hadrus and increased abundance of Limosilactobacillus mucosae (0.06% vs. 0.43%; q < 0.05) in ProPD cases. Interestingly, F. prausnitzii remained significantly lower in abundance in ProPD cases compared to AD cases but only before correcting for multiple testing (p = 0.03; q = 0.10) (Supplementary Fig. 7).
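The gap between an unadjusted p value and its q value, as seen for F. prausnitzii in the one-to-one comparison, follows from the Benjamini-Hochberg procedure named in the figure legends. The sketch below implements the standard step-up adjustment. The function name is hypothetical and the p values in the usage note are made up; the reported q = 0.10 cannot be reproduced here because the full set of tested taxa is not given in this section.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values
```

For the hypothetical input `[0.01, 0.04, 0.03, 0.005]` the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`. The more taxa that are tested, the larger the multiplier applied to small p values, which is how p = 0.03 can become a non-significant q value.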

Fig. 9: Ethiopian children aged 0–59 months with ProPD showed differences in only a few gut commensals compared to AD cases, and ProPD cases did not segregate into a single cluster.

Heatmap depiction of differentially abundant taxa between ProPD cases and AD cases as determined by DESeq2 (two-sided Wald test) differential abundance testing (q < 0.05 corrected for multiple testing by the Benjamini-Hochberg method) and adjusted for age group, enrollment season, enrollment site, sex, WAM index, current breastfeeding status, dysentery, and children’s caretaker. Subsequently, differences were tested by two-sided Wilcoxon rank sum test and corrected by the Benjamini-Hochberg method resulting in the stated exact q values found in Supplementary Table 4. Clusters: de novo clustering of the ProPD cases and AD cases using Canberra distance metrics and the proportions of ProPD cases and AD cases are determined by Chi-square test (two-sided) in each cluster with unadjusted p values.

Demographic and clinical variables associated with GM variation in children

Out of a total of 19 variables, five were significantly associated with GM variation (Table 2). Of these, age had the highest explanatory power, explaining 18.9% of the GM variation (q < 0.01), followed by diarrhea, explaining 4.7% (q < 0.01) of the GM variation. Furthermore, the WAM index (<0.5), current breastfeeding status, and diarrhea status in the previous month explained 1.3%, 0.63%, and 0.35% of the GM variation, respectively. All the significant variables combined explained 26% of the variation in GM in the entire cohort. A subgroup analysis of diarrhea cases also found that age had the highest explanatory power, explaining 14.7% of the GM variation in diarrhea cases (q < 0.01). Furthermore, WAM index <0.5, dysentery, current breastfeeding status, duration of diarrhea, and children’s caretaker each contributed significantly, explaining 1.8%, 1.6%, 1.0%, 0.77%, and 0.66% of the GM variation, respectively (Table 2).
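As a quick arithmetic check, the combined figure for the entire cohort is numerically consistent with adding the five individual fractions. Whether the combined value was obtained by simple addition or from a joint model is not stated, so the additivity in this sketch is an assumption; the dictionary keys are informal labels, not variable names from the study.

```python
# Percent of GM variation explained per significant variable (entire cohort)
explained_pct = {
    "age": 18.9,
    "diarrhea": 4.7,
    "WAM index": 1.3,
    "current breastfeeding": 0.63,
    "diarrhea in previous month": 0.35,
}

# Assumes the fractions are additive, which the text does not confirm
total_explained = sum(explained_pct.values())
print(f"Combined: {total_explained:.2f}%")  # prints "Combined: 25.88%"
```

The sum rounds to the 26% reported, which leaves roughly three quarters of the compositional variation unexplained by the recorded variables.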

Table 2 The gut microbiome structure of Ethiopian children aged 0–59 months is linked to demographic, clinical, and environmental characteristics as determined by distance-based redundancy analysis (db-RDA based on Bray-Curtis dissimilarity metrics) in the entire cohort (diarrhea cases and non-diarrheal controls combined) as well as in diarrhea cases

Discussion

The aim of our study was to decipher what differentiates the GM of children with diarrhea (AD and ProPD) from healthy controls in an LMIC setting. We also explored the role of the GM in prolonged and persistent diarrhea (ProPD) compared to acute diarrhea (AD). This is the first large-scale study on GM in children below 5 years of age with AD and ProPD (650 children with diarrhea) and non-diarrheal controls (663 children, frequency matched) in a low-income setting.

The GM diversity and composition in infants and children are influenced by multiple factors including age16,37, mode of birth, prematurity, breastfeeding, time of weaning, and antibiotic use38. In line with this, we observed strong associations between GM composition and ongoing breastfeeding, in agreement with other studies39. The GM composition of the enrolled children (cases and controls) was also linked with socio-economic factors, water and sanitation, and maternal education, as also seen in Malawian children40. Somewhat surprisingly, we found no association between antibiotic use and GM variation. However, this finding is consistent with studies from Vietnam41 and Bangladesh42 and with a longitudinal cohort study in Western countries39, although other studies have reported associations between antibiotic use and GM composition37,43.

Furthermore, in agreement with previous studies16,37,44, the GM diversity and richness increased with age irrespective of diarrhea status in our study. The GM composition shifted from taxa associated with a milk-rich diet early in life such as Bifidobacterium spp.45 to taxa associated with a solid food-diet with a high content of indigestible carbohydrates such as Prevotella46 and Faecalibacterium47.

Children with diarrhea had a lower bacterial diversity and richness compared to non-diarrheal controls, regardless of the duration of diarrhea. This agrees with previous studies on AD in LMICs11,25,48,49. Repeated flushing of the intestinal lumen leads to a depletion of the microbiota, resulting in reduced bacterial diversity50. This decrease in diversity has also been linked to Clostridioides difficile-associated diarrhea in adults51,52. Importantly, GM diversity and richness did not differ by diarrhea duration, indicating that it is not loss of overall GM diversity that differentiates AD from ProPD. Of note, a recent study in undernourished children in Peru did find that longer duration of diarrhea was linked with reduced bacterial diversity and richness25. It can be speculated that the discrepancy between the two studies reflects different dietary traditions in Peru and Ethiopia.

Diarrhea cases (AD and ProPD cases) were associated with higher relative abundance of facultative anaerobes and microaerophilic bacterial taxa such as Escherichia spp., Campylobacter spp., and Streptococcus spp. than non-diarrheal controls. Both Campylobacter and certain Escherichia coli pathotypes are known to cause diarrhea53,54,55. Campylobacter has also been associated with gut inflammation, increased gut permeability, and impaired linear growth53 and has been implicated in GM dysbiosis in children56. Diarrhea induces an oxygenated environment within the gut57, promoting the proliferation of facultative anaerobic Enterobacteriaceae, as observed both in our study and by others11,49. This proliferation is considered a hallmark of GM dysbiosis19,58.

Further, the inhibition of enterocyte lactase activity due to diarrhea leaves free lactose in the intestinal lumen14. This lactose is efficiently utilized by lactose-fermenting streptococci, which often become predominant in the GM of children with AD11,24,49. The increased relative abundance of Streptococcus spp. in AD and its association with dysentery11, together with the identification of Streptococcus lutetiensis virulence genes in children with AD of unknown etiology59, may support a link between increased abundance of Streptococcus spp. and diarrhea. Another possibility, not mutually exclusive, is increased flow-through from the small intestine, which is rich in streptococci60. Furthermore, we found a proliferation of H. parainfluenzae in diarrhea cases. Other studies have also reported an expansion of H. parainfluenzae in the GM of children with diarrhea61, and the species has been linked to acute gastroenteritis62.

Interestingly, the increased relative abundance of facultative anaerobes in diarrhea cases was coupled with the depletion of potent short-chain fatty acid (SCFA)-producing obligate anaerobes, such as P. copri, F. prausnitzii and D. succinatiphilus. This observation is in line with previous studies on AD11,49,61,63. As in our study, non-diarrheal controls in a previous study were enriched in B. fragilis11. However, in cases that developed into diarrheal dysentery, B. fragilis was found to be enriched relative to non-complicated diarrhea cases11. This might be explained by strain-to-strain variation11. Bacteroides fragilis likely plays a double-edged role in its interaction with the host: non-toxigenic B. fragilis has anti-inflammatory properties through the release of polysaccharide A (PSA) and production of SCFAs via metabolization of complex carbohydrates64, whereas enterotoxigenic B. fragilis subtypes are pathogenic and drive inflammation in host cells65.

Our findings are consistent with a double disease burden, where diarrhea cases suffer both from increased abundance of (or an outright infection due to) pathogens causing diarrhea53,54,55 and from a loss of beneficial gut commensals that would otherwise help to stabilize the GM and ease the transition back to a eubiotic state. These gut commensals are involved in the production of SCFAs (acetate, propionate, butyrate) from otherwise indigestible complex carbohydrates66. SCFAs have numerous functions, such as serving as an energy source for the host (~10% of daily energy from colonic production of acetate), improving intestinal epithelial barrier function, and influencing endocrine and immune signaling67,68,69. Furthermore, SCFAs make the environment less favorable for most pathogens by lowering colonic pH, thereby providing host resistance against pathogens70,71,72.

We also compared AD vs. ProPD cases to investigate whether specific GM patterns characterize ProPD cases relative to AD cases. We found that ProPD cases were characterized by low abundance of gut commensals normally associated with GM eubiosis, such as F. prausnitzii, Anaerostipes hadrus, and Akkermansia muciniphila, when compared to AD cases. Previous studies investigating the etiology of ProPD have identified E. coli and Campylobacter as potential causative agents, but also noted that the presence or absence of specific enteric pathogens was unlikely to fully explain why some children progress to ProPD and others do not27,28,30. Our findings indicate that further loss of gut commensals is associated with the progression of AD into ProPD. Treatment of ProPD is complex, but lactose-free therapeutic food, rice and chicken-based diets, dietary fibers and green banana have shown promising results13,31,33,34. Colonic commensal bacteria ferment fibers and resistant starches to produce SCFAs, which have several effects that may hasten recovery from ProPD: they increase salt and water absorption, strengthen the epithelium, serve as a direct source of energy for the host, and help to restore overall gut eubiosis31,34,73.

The increase in difficult-to-treat multidrug-resistant diarrheal pathogens in LMICs10 and low success rates of oral rehydration and zinc supplementation to treat ProPD cases15 could indicate the future potential of using F. prausnitzii74 or P. copri-based probiotics to ameliorate diarrhea symptoms in children. Therefore, re-establishing healthy gut commensals, either by targeted dietary supplements to enhance the recovery of depleted gut commensals75 and/or by direct replenishment of obligate anaerobes in so-called “next generation probiotics”70, might be a promising new treatment approach for AD and ProPD.

Our study has limitations to consider when interpreting our findings. First and foremost, the design of this study does not allow us to determine any causal relationship between GM and diarrheal subgroups. Even though we found that AD or ProPD was characterized by a depletion of gut commensals, it is possible that this reflects the effect of diarrhea on GM composition rather than being a driving force in the development of AD or ProPD. Secondly, the study design did not include follow-up, and we were therefore unable to determine whether certain GM signatures were associated with development of disease in non-diarrheal controls or with progression or regression of disease severity in diarrhea cases. Prospective follow-up studies with GM characterization would be a logical next step, especially with respect to designing more efficient future treatments. Thirdly, our results were based on compositional data and reported as relative abundance rather than absolute abundance. It remains to be investigated if some of the investigated groups (AD, ProPD, and non-diarrheal controls) differ with respect to changes in the absolute abundance of GM members, which may influence the interpretation of data. Fourthly, we categorized children with AD and ProPD based on the number of days with diarrhea before enrollment, as reported by parents. Some AD cases may thus have progressed to ProPD after being enrolled and delivering the fecal sample. This limits us to showing only effects of diarrhea duration rather than capturing features indicating disease progression from AD to ProPD.

Lastly, we employed short-read sequencing technology targeting only one hypervariable region of the 16S rRNA gene (V4), which limits the possibility of detailed taxonomic identification and functional characterization of bacteria.

In conclusion, the GM of Ethiopian children with AD and ProPD was enriched in putative pathogens as well as facultative anaerobes and was further characterized by a lowered relative abundance of obligate anaerobes. The GM of children with ProPD was characterized by a reduction of obligate anaerobic gut commensals compared to AD, suggesting that gut dysbiosis could be a potential factor in the development of prolonged or persistent diarrhea. Re-establishing healthy gut commensals by microbiota-directed food supplements or next generation probiotics containing obligate anaerobes might be worth exploring as a treatment strategy for diarrhea. We recommend further longitudinal studies characterizing the GM of children before, during, and after a diarrheal episode to identify GM signatures of children who are at risk of developing ProPD. We also suggest further studies to develop and evaluate treatment regimens with targeted dietary supplements and/or next generation probiotics containing obligate anaerobic bacteria for the treatment of diarrhea in LMICs.

Methods

Study design and participant recruitment

The study was conducted in accordance with all ethical standards required for research involving human participants. Ethical approval to conduct the study and publish its results was obtained from the Ethiopian National Ethics Review Committee, Ethiopia (Reference MoSHE/RD/14.1/7188/19), the Regional Committee for Medical and Health Research Ethics of Western Norway (2016/1096/REK Vest), and the Minimal Risk IRB (Health Sciences), University of Wisconsin, USA (ID: 2019-0842). Written informed consent for the children’s participation in the study was obtained from their parents or legal guardians.

The present study is a substudy of a prospective case-control study (CRYPTO-POC) described in detail elsewhere35. Briefly, participants were recruited from February 2017 until July 2018, from two health clinic facilities in Ethiopia, namely Jimma University Specialized Hospital (JUSH), a referral and teaching hospital in Jimma, and Serbo Health Centre (SHC), a health center located in a small town 20 km from Jimma with a more rural catchment area.

Participant recruitment

The detailed procedure for recruitment of the study participants has been published previously35,36. In brief, children 0–59 months old with diarrhea were recruited at JUSH and SHC. Non-diarrheal controls, recruited from communities surrounding JUSH and SHC, were children without diarrhea in the previous 48 h who were frequency matched in strata by residential district, age, and enrollment week. Diarrhea was defined as the passage of three or more watery or loose stools during the 24 h prior to presentation76. The exclusion criterion was inpatient admission for longer than 24 h before enrollment in the study. The duration of diarrhea was assessed from the parents’ or legal guardians’ recall and recorded by a well-trained research nurse. Acute diarrhea (AD) was defined as an episode that had lasted <7 days on presentation8, ProD as 7–13 days36, and PD as a duration of ≥14 days77.

In the present study, ProD and PD were combined into the category “Prolonged or persistent diarrhea” (ProPD; ≥7 days of duration) due to the limited number of PD cases. Demographic and clinical data were collected with standardized and contextualized case report forms completed by research nurses. The Water/sanitation, Assets, and Maternal education (WAM) index was calculated as previously described, based on access to “improved” or “unimproved” water and/or sanitation, the presence or absence of eight household assets, and maternal education78.
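
The duration-based case definitions above (AD, ProD, PD, and the combined ProPD category) can be sketched as a small classifier. This is only an illustration of the thresholds stated in the text; the function names are hypothetical and were not part of the study's analysis code.

```python
def classify_diarrhea(duration_days: int) -> str:
    """Map the reported duration of an episode (in days) to AD, ProD, or PD."""
    if duration_days < 0:
        raise ValueError("duration must be non-negative")
    if duration_days < 7:        # acute diarrhea: < 7 days
        return "AD"
    if duration_days <= 13:      # prolonged diarrhea: 7-13 days
        return "ProD"
    return "PD"                  # persistent diarrhea: >= 14 days


def analysis_group(duration_days: int) -> str:
    """Collapse ProD and PD into the combined ProPD category (>= 7 days)."""
    return "AD" if classify_diarrhea(duration_days) == "AD" else "ProPD"
```

Because duration was recorded at presentation, such a classifier describes the episode only up to the time of enrollment.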

A single stool sample was collected from both diarrhea cases and non-diarrheal controls. Stool samples from children with diarrhea were collected with a nappy lined with plastic film, a single-use bedpan or a potty and transferred to a closed plastic container at JUSH or SHC. Stool samples from non-diarrheal controls were collected at the child’s home, except non-diarrheal controls aged 0–11 months who were enrolled from vaccination rooms in JUSH or SHC. Stool samples collected at JUSH were transported immediately to Jimma University Microbiology Research Laboratory and stored at −80 °C. Samples from SHC were kept at 2–8 °C before same-day transportation to Jimma University Microbiology Research Laboratory for storage at −80 °C. After the completion of study recruitment, samples were shipped on dry ice to Vestfold Hospital Trust, Tønsberg, Norway. DNA was extracted from 500 μL pre-treated stool-buffer suspension supernatant using a MagNA Pure 96 instrument and a MagNA Pure 96 DNA and Viral NA Large Volume Kit, and eluted in 100 μL as previously described35,79.

High throughput 16S rRNA gene amplicon sequencing

Sequencing was done at the University of Minnesota Genomics Center (UMGC) using the Illumina MiSeq sequencing platform by standard methods. Sequencing targeted the V4 hypervariable region of the 16S rRNA gene using a two-step PCR double-indexing protocol80. Estimates of 16S rRNA gene copy numbers were verified with qPCR on an ABI7900 (ThermoFisher Scientific, MA, USA) platform, with reactions performed using KAPA HiFi Polymerase (KAPA Biosystems, Woburn, MA, USA). Based on the qPCR results, samples were normalized to ~167,000 molecules/μL, and 3 μL of each sample was used for PCR1. After the first round of amplification, the PCR1 products were diluted 1:100, and 5 μL of the diluted PCR1 product was used in the second PCR reaction (PCR2). Finally, PCR2 products were normalized using SequalPrep Normalization plates. The pooled sample was denatured with NaOH, diluted to 6 pM in Illumina’s HT1 buffer, spiked with 15% PhiX, and heat denatured at 96 °C for 2 min immediately prior to loading. A MiSeq 600 cycle v3 kit was used to sequence the samples.
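
As a rough worked example of the template normalization described above, the following sketch computes the approximate number of template molecules entering PCR1 and the volumes needed to dilute a sample to the target concentration using C1·V1 = C2·V2. The target concentration and input volume are taken from the text; the function names, the example starting concentration, and the final volume are assumptions made purely for illustration.

```python
TARGET_CONC = 167_000   # molecules per microliter after normalization (from the text)
PCR1_INPUT_UL = 3       # microliters of normalized sample used in PCR1 (from the text)


def pcr1_template_copies(conc_per_ul=TARGET_CONC, volume_ul=PCR1_INPUT_UL):
    """Approximate number of 16S rRNA gene copies added to one PCR1 reaction."""
    return conc_per_ul * volume_ul


def dilution_volumes(measured_conc_per_ul, final_volume_ul, target_conc_per_ul=TARGET_CONC):
    """Return (sample_ul, diluent_ul) that bring a sample to the target concentration."""
    if measured_conc_per_ul < target_conc_per_ul:
        raise ValueError("sample is already below the target concentration")
    # C1 * V1 = C2 * V2, solved for V1 (the volume of undiluted sample).
    sample_ul = target_conc_per_ul * final_volume_ul / measured_conc_per_ul
    return sample_ul, final_volume_ul - sample_ul


# Hypothetical sample measured at ten times the target, diluted to 20 microliters.
copies = pcr1_template_copies()
sample_ul, diluent_ul = dilution_volumes(1_670_000, 20)
```

With the values stated in the text, each PCR1 reaction receives roughly 5 × 10^5 template molecules.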

Statistics and reproducibility

We used available stool samples from a prospective case-control study (CRYPTO-POC) described in detail elsewhere35, and no statistical method was used to predetermine sample size. The investigators were not blinded to allocation during experiments and outcome assessment. Raw sequencing reads were pre-processed using QIIME281. Overlapping sequences were compiled and low-quality reads were removed. Chimeric sequences were removed, and zero-radius Operational Taxonomic Units (zOTUs) were constructed using the UNOISE3 algorithm from Vsearch82. The analysis was based on zOTUs using RDP-based taxonomic assignment at the lowest unambiguous taxonomic rank. Additionally, the taxonomy of all taxa belonging to the genus Lactobacillus was updated according to the new taxonomic description of the genus83. Samples with fewer than 5000 reads (8 samples in total) were removed, and zOTUs found in ≤1% of the samples and with a relative abundance of less than 0.1% were purged from the zOTU table.
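
The sample and zOTU filtering described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the table is represented as a hypothetical nested dictionary (sample → zOTU → read count), and "relative abundance" is assumed here to mean the mean relative abundance across retained samples, which the text does not specify.

```python
def filter_table(counts, min_reads=5000, max_prevalence=0.01, max_mean_rel_abund=0.001):
    """Drop shallow samples, then purge zOTUs that are both rare and low in abundance."""
    # Step 1: remove samples with fewer than min_reads reads in total.
    kept = {s: z for s, z in counts.items() if sum(z.values()) >= min_reads}
    if not kept:
        return {}
    n = len(kept)
    totals = {s: sum(z.values()) for s, z in kept.items()}

    # Step 2: flag zOTUs found in <= 1% of samples AND below 0.1% mean relative abundance.
    all_zotus = {z for sample in kept.values() for z in sample}
    purge = set()
    for z in all_zotus:
        prevalence = sum(1 for s in kept if kept[s].get(z, 0) > 0) / n
        mean_rel = sum(kept[s].get(z, 0) / totals[s] for s in kept) / n  # assumed definition
        if prevalence <= max_prevalence and mean_rel < max_mean_rel_abund:
            purge.add(z)

    # Step 3: return the table without the purged zOTUs.
    return {s: {z: c for z, c in sample.items() if z not in purge}
            for s, sample in kept.items()}
```

Filtering samples before zOTUs means that prevalence is computed only over samples that passed the read-depth threshold; the text does not state the order used, so this ordering is also an assumption.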

Data analysis was done in R version 4.1.2 (2021-11-01) using the packages Vegan v2.5-784, Phyloseq v1.38.085, and DESeq2 v1.34.086. Logistic regression analysis adjusted for enrollment season and enrollment site was used to identify factors associated with diarrhea and with subcategories of diarrhea cases. Putative explanatory variables with a p value less than 0.05 were considered statistically significant.

Alpha diversity measures were based on the number of observed zOTUs and Shannon diversity metrics. Bray-Curtis dissimilarity metrics were determined after cumulative sum scaling (CSS) normalization. Unconstrained principal coordinate analysis (PCoA) and constrained PCoA plots conditioned for age, enrollment site, enrollment season, sex, WAM index, current breastfeeding, and diarrhea in the previous month were subsequently generated. Pairwise permutational multivariate analysis of variance (PERMANOVA) was used for pairwise comparisons based on Bray-Curtis dissimilarity metrics between groups (clinical/demographic data). DESeq2 was used for differential abundance testing between diarrhea cases and non-diarrheal controls and between AD and ProPD (illustrated in heatmaps). AD cases frequency matched by age group and sex with ProPD cases were randomly selected for one-to-one AD vs. ProPD comparison. DESeq2 analysis was controlled for variables that contributed to the GM variation as determined by distance-based Redundancy Analysis (db-RDA) as well as variables known to affect GM from other studies40, including age, enrollment site, enrollment season, sex, WAM index, current breastfeeding, diarrhea in the previous month, dysentery, and children’s caretaker. Subsequently, a Wilcoxon rank sum test was carried out on the bacterial taxa that were found to differ significantly in abundance by DESeq2 (q < 0.05, with the q value corrected by the Benjamini-Hochberg method). All q values corrected by the Benjamini-Hochberg method reported in the differential abundance testing were from the Wilcoxon rank sum test, and relative abundances were presented as means. The DESeq2 results were augmented by sparse partial least squares discriminant analysis (sPLS-DA) with 10-fold cross validation, used to determine the taxa that discriminate best between diarrhea case and non-diarrheal control status as well as between AD and ProPD87.
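
The Benjamini-Hochberg correction applied to the p values above can be illustrated with a short sketch. The study performed this step in R; the Python function below is a hypothetical stand-in that shows only the arithmetic of the procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of the q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A taxon would then be reported as differentially abundant when its q value falls below 0.05, the threshold stated in the text.
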
To further characterize the bacterial community clusters, we used de novo clustering based on community composition. Overall relationships between clinical and demographic parameters and GM composition in children were illustrated using distance-based Redundancy Analysis (db-RDA) conditioned for enrollment site and enrollment season based on Bray-Curtis dissimilarity metrics.
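
For readers less familiar with the diversity measures used in this section, the following sketch shows how observed richness, Shannon diversity, and Bray-Curtis dissimilarity are computed from abundance vectors. It is a simplified illustration with hypothetical function names: the study used the Vegan and Phyloseq R packages, and Bray-Curtis dissimilarities were computed on CSS-normalized data, a normalization step not reproduced here.

```python
import math


def observed_richness(counts):
    """Number of zOTUs with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), using the natural logarithm."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors over the same zOTUs."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same zOTUs in the same order")
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom
```

Bray-Curtis dissimilarity ranges from 0 (identical composition) to 1 (no shared taxa), which is why it is suited as the distance underlying the PCoA, PERMANOVA, and db-RDA analyses.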

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.