Introduction

The human microbiome is the entire ecosystem of all microbes that reside in and on the human body. Human microbiome research has been growing rapidly in both academia and industry owing to recent advances in next-generation sequencing technologies (e.g., 16S ribosomal RNA gene amplicon sequencing1,2 and shotgun metagenomic3 sequencing). Investigators are now actively probing the roles of the microbiome in human health and disease.

There have been many clinical and epidemiologic clues on the relationships between medical treatments/environmental exposures (e.g., diet, residence, smoking, preterm birth, delivery mode, antibiotic/probiotic use) and human diseases (e.g., obesity, intestinal disease, cancers, diabetes, brain disorders). The causal role of the microbiome as a mediator between them is gaining increasing recognition4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19. That is, investigators seek to understand whether the treatment/exposure alters the microbiome, and whether the altered microbiome, in turn, influences the human disease. For this, we have recently introduced a web cloud computing platform named microbiome mediation analysis (MiMed) (http://mimed.micloud.kr)20. MiMed enables comprehensive microbiome causal mediation analysis in a user-friendly web environment20; as such, even non-professional programmers (e.g., microbiologists, medical doctors, public health practitioners) can easily use it. However, MiMed is limited to continuous (e.g., body mass index) or binary (e.g., normal vs. diseased) responses20.

In clinical research, investigators often trace disease progression sequentially in time; as such, time-to-event (e.g., time-to-disease, time-to-cure) responses, known as survival responses, are commonly available in practice as a surrogate variable for human health or disease. Survival responses are typically right-censored, meaning that the time-to-event is known to be above a certain value, but it is unknown by how much; as such, only specialized statistical methods can handle them properly. Thus, in this paper, we introduce a web cloud computing platform named microbiome mediation analysis with survival responses (MiMedSurv), which is an extension of MiMed20 to survival responses. Its two main distinguishing features are as follows. First, MiMedSurv can conduct baseline exploratory non-mediational survival analysis, not involving the microbiome, to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth)/environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). Second, MiMedSurv can identify the mediating roles of the microbiome in various aspects: (i) as a microbial ecosystem using ecological indices (e.g., alpha and beta diversity indices) and (ii) as individual microbial taxa in various hierarchies (e.g., phyla, classes, orders, families, genera, species). We also stress that covariate-adjusted analysis is supported to control for potential confounding factors (e.g., age, sex) and thereby improve causal inference, which is particularly essential for observational studies. MiMedSurv also provides user-friendly step-by-step data preprocessing and analytic modules together with graphical outputs. In addition, MiMedSurv automatically lists all related references for each run of the data preprocessing/analytic modules for user convenience.

There have been many other web-based platforms for microbiome data analysis, for example, MiCloud21 for cross-sectional/longitudinal association analysis, MiPair22 for paired data analysis, MiSurv23 for survival analysis, MiTree24 for classification and regression modelling, MiMultiCat25 for classification modelling and association analysis with multi-categorical responses, and MiMed20 for causal mediation analysis with binary and continuous responses, yet MiMedSurv is the first for microbiome causal mediation analysis with survival responses (Table 1). Recently, many other statistical methods have also been proposed for microbiome causal mediation analysis26,27,28,29,30, yet we excluded them from the comparison because they are not web-based and cannot handle survival responses. The overall workflow of MiMedSurv is summarized in Fig. 1, where the data processing module is the same as the one for MiMed20, while the non-mediational and mediational analysis modules are unique to MiMedSurv.

Table 1 Web-based platforms for microbiome data analysis, MiCloud21, MiPair22, MiSurv23, MiTree24, MiMultiCat25 and MiMed20, and their features/functionalities.
Fig. 1 The overall workflow of MiMedSurv.

The rest of the paper is organized as follows. In the Materials and Methods section, we describe the methodological aspects and web server architecture of MiMedSurv. Then, in the Results section, we illustrate the use of MiMedSurv through an example data analysis to survey whether the gut microbiome mediates the effect of antibiotic treatment on the onset of type 1 diabetes (T1D)11. Finally, in the Discussion section, we summarize and finish with concluding remarks.

Materials and methods

Mediation analysis begins with the observed relationship between a treatment/exposure and a response. Then, it surveys whether there is any underlying mediating mechanism between them. In this research, we refer to the former as ‘non-mediational analysis’ and to the latter as ‘mediational analysis’. Although the non-mediational analysis does not involve the microbiome, it is an essential procedure for baseline exploratory purposes. The mediational analysis, however, is the main analytic module of MiMedSurv.

Non-mediational analysis

The non-mediational analysis module is for some baseline exploratory survival analysis to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth)/environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). For this, MiMedSurv (i) performs the Kaplan–Meier analysis31 coupled with the log-rank test32 or the Wilcoxon test33 for univariate analysis or (ii) fits the Cox proportional hazards model34 for both univariate and covariate-adjusted analyses. MiMedSurv employs the Kaplan–Meier or covariate-adjusted survival curve for graphics.
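To make the Kaplan–Meier step concrete, the following is a minimal Python sketch of the product-limit estimator for right-censored data. It is for illustration only and is not MiMedSurv's implementation (MiMedSurv is written in R); the function name `kaplan_meier` and the toy follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times  : follow-up times
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (time, estimated survival probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data if tt == t]   # everyone leaving the risk set at t
        deaths = sum(tied)
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)     # events and censored subjects both leave
        i += len(tied)
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for time, prob in toy_curve:
    print(f"t = {time}: S(t) = {prob:.3f}")
```

Censored subjects contribute to the risk set until their censoring time but never trigger a drop in the curve, which is how the estimator uses the partial information in right-censored responses.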

Mediational analysis

The mediational analysis module surveys the mediating roles of the microbiome (i) as a microbial ecosystem using ecological indices (e.g., alpha and beta diversity indices) and (ii) as individual microbial taxa in various hierarchies (e.g., phyla, classes, orders, families, genera, species). In this research, we refer to the former as ‘community-level analysis’ and to the latter as ‘taxonomy-level analysis’. We also categorize the community-level analysis further into alpha diversity and beta diversity analyses.

First, for the alpha diversity analysis, as in MiMed20, MiMedSurv employs the Imai method for causal mediation analysis35,36. The only difference is in the outcome model35,36. That is, while MiMed employs the ordinary linear regression model for continuous responses and the logistic regression model for binary responses, MiMedSurv employs the Weibull regression model for survival responses35,36,37. MiMedSurv employs the forest plot for graphics.

Second, for the beta diversity analysis, MiMedSurv first surveys the disparity in beta diversity between treatment groups (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth) using MiRKAT38,39 (treatment model) and the relationship between beta diversity and survival responses adjusting for treatment status using MiRKAT-S39,40 (outcome model). Then, MiMedSurv combines the P-values from the treatment and outcome models using the Divide-Aggregate Composite-null Test (DACT)41, which treats the null hypothesis of no mediation effect as a composite hypothesis to enhance statistical power. That is, based on the two regression models in Eqs. (1) and (2), the null hypothesis of no mediation effect is typically formulated as \({H}_{0}\): \({\alpha }_{1}{\beta }_{1}\) = 0, which indicates no effect of the treatment/exposure on the mediator (\({\alpha }_{1}\) = 0) or no effect of the mediator on the response conditional on the treatment/exposure status (\({\beta }_{1}\) = 0).

$$M_{i} = \alpha_{0} + \alpha_{1} T_{i} + \varepsilon_{i}$$
(1)
$$Y_{i} = \beta_{0} + \beta_{1} M_{i} + \beta_{2} T_{i} + \upsilon_{i} ,$$
(2)

where \({Y}_{i}\) is a health or disease response, \({T}_{i}\) is a treatment/exposure status (e.g., treatment (t = 1) vs. placebo (t = 0), new treatment (t = 1) vs. old treatment (t = 0), and so forth), and \({M}_{i}\) is a mediator, for each individual i = 1, …, n. However, DACT41 formulates it as a composite hypothesis that \({H}_{0}\): (1) \({\alpha }_{1}\) = 0 & \({\beta }_{1}\) ≠ 0; (2) \({\alpha }_{1}\) ≠ 0 & \({\beta }_{1}\) = 0; or (3) \({\alpha }_{1}\) = 0 & \({\beta }_{1}\) = 0. By accounting for these three disjoint null cases, DACT41 gains power to detect a mediation effect. More details can be found in41. MiMedSurv employs the principal coordinate analysis (PCoA) plot for graphics42.
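The idea of combining the two P-values under a composite null can be sketched as follows. This is a deliberately simplified Python illustration and not the DACT software itself: the function name and the three case weights are hypothetical inputs, whereas the actual method41 estimates the case proportions from the data and applies further calibration. The sketch only uses the facts that the P-value for \({\alpha }_{1}\) is the relevant one when \({\alpha }_{1}\) = 0 & \({\beta }_{1}\) ≠ 0, the P-value for \({\beta }_{1}\) is the relevant one when \({\alpha }_{1}\) ≠ 0 & \({\beta }_{1}\) = 0, and the squared maximum of two independent uniform P-values is itself uniform when both effects are null.

```python
def composite_null_pvalue(p_alpha, p_beta,
                          w_alpha_only_null, w_beta_only_null, w_both_null):
    """Weighted combination of case-specific P-values (simplified sketch).

    p_alpha : P-value from the treatment model (tests alpha_1 = 0)
    p_beta  : P-value from the outcome model (tests beta_1 = 0)
    w_*     : non-negative weights for the three null cases; these are
              hypothetical inputs here and are normalized to sum to one.
    """
    total = w_alpha_only_null + w_beta_only_null + w_both_null
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    p_max = max(p_alpha, p_beta)
    return (w_alpha_only_null * p_alpha       # case alpha_1 = 0, beta_1 != 0
            + w_beta_only_null * p_beta       # case alpha_1 != 0, beta_1 = 0
            + w_both_null * p_max ** 2        # case alpha_1 = 0, beta_1 = 0
            ) / total


# Hypothetical P-values and weights, for illustration only.
print(composite_null_pvalue(0.01, 0.03, 1.0, 1.0, 2.0))
```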

Third, for the taxonomy-level analysis, as for the alpha diversity analysis, MiMedSurv employs the Imai method for causal mediation analysis coupled with the Weibull regression model for survival responses35,36,37. MiMedSurv applies the Imai method35,36,37 and the Benjamini–Hochberg (BH) procedure, which controls the false discovery rate (FDR)43, to each taxonomic hierarchy separately. MiMedSurv employs the forest plot and dendrogram for graphics.
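The BH adjustment applied within one taxonomic rank can be sketched in a few lines of Python. The function name and the P-values below are hypothetical, and the sketch is not MiMedSurv's own code; it only illustrates the standard step-up adjustment that turns P-values into Q-values.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values (Q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending P-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that Q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical P-values, grouped by rank so that each rank is adjusted separately.
pvalues_by_rank = {
    "Phylum": [0.01, 0.04, 0.03, 0.005],
    "Genus": [0.20, 0.001, 0.04],
}
qvalues_by_rank = {rank: benjamini_hochberg(p) for rank, p in pvalues_by_rank.items()}
print(qvalues_by_rank)
```

Adjusting each rank on its own means that the number of tests m is the number of taxa at that rank, not the total number of taxa across all ranks.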

Some more methodological details and terminologies for the Imai method35,36 are as follows. Again, for the alpha diversity and taxonomy-level analyses, MiMedSurv employs the Imai method35,36, which is based on the potential outcomes framework of causal inference44, that is, \({Y}_{i}({T}_{i}, {M}_{i}({T}_{i}))\), where \({Y}_{i}\) is a health or disease response, \({T}_{i}\) is a treatment/exposure status (e.g., treatment (t = 1) vs. placebo (t = 0), new treatment (t = 1) vs. old treatment (t = 0), and so forth), and \({M}_{i}\) is a mediator, for each individual i = 1, …, n. Then, for each individual, the Imai method35,36 defines (i) the total treatment effect as \({\tau }_{i}\) in Eq. (3); (ii) the direct effect as \({\zeta }_{i}(t)\) in Eq. (4); and (iii) the indirect effect that represents the ‘causal mediation effect’ as \({\delta }_{i}(t)\) in Eq. (5).

$$\tau_{i} \equiv Y_{i} (1,M_{i} (1)) - Y_{i} (0,M_{i} (0))$$
(3)
$$\zeta_{i} \left( t \right) \equiv Y_{i} (1,M_{i} (t)) - Y_{i} (0,M_{i} (t))$$
(4)
$$\delta_{i} \left( t \right) \equiv Y_{i} (t,M_{i} (1)) - Y_{i} (t,M_{i} (0))$$
(5)

Then, the Imai method35,36 averages the total treatment effects, the direct effects, and the indirect effects, respectively, across all individuals, and returns the average total effect (ATE), average direct effect (ADE), and average causal mediation effect (ACME) as its final outcomes, which satisfy ATE = ADE + ACME. Since we are especially interested here in the role of the microbiome as a mediator, ACME is the main analytic outcome. Further, the Imai papers35,36 introduce both parametric and non-parametric approaches to calculate the P-value and confidence interval, and the R package, mediation 4.5.0 (https://cran.r-project.org/web/packages/mediation), supports both of them. We employed the non-parametric approach based on a bootstrap method45 for robust statistical inference, because microbiome data can be highly skewed and zero-inflated, especially for lower-level taxa (e.g., genera or species)46.
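As a worked illustration of these definitions, the short Python sketch below evaluates the individual-level effects for one hypothetical individual. The two structural functions and their coefficients are invented purely for illustration; the point is only that, at the individual level, the total effect decomposes as \({\tau }_{i}\) = \({\delta }_{i}(t)\) + \({\zeta }_{i}(1-t)\) for t = 0, 1, which underlies the relation between ATE, ADE and ACME.

```python
def mediator(t):
    """Hypothetical potential mediator value M_i(t)."""
    return 3.0 - 1.0 * t


def outcome(t, m):
    """Hypothetical potential outcome Y_i(t, m), with a treatment-mediator interaction."""
    return 10.0 - 2.0 * t - 1.5 * m - 0.5 * t * m


def total_effect():
    # tau_i = Y_i(1, M_i(1)) - Y_i(0, M_i(0))
    return outcome(1, mediator(1)) - outcome(0, mediator(0))


def direct_effect(t):
    # zeta_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t))
    return outcome(1, mediator(t)) - outcome(0, mediator(t))


def indirect_effect(t):
    # delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0))
    return outcome(t, mediator(1)) - outcome(t, mediator(0))


print("total:", total_effect())
print("direct (t=0, t=1):", direct_effect(0), direct_effect(1))
print("indirect (t=0, t=1):", indirect_effect(0), indirect_effect(1))
```

Because the hypothetical outcome includes an interaction term, the direct and indirect effects differ between t = 0 and t = 1, yet both pairings add up to the same total effect.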

In more detail, the Imai method35,36 first generates bootstrap resamples through random sampling with replacement (say, there are B resamples). Then, for each resample, the Imai method35,36 fits the mediator model \({\widehat{f}}_{{M}_{b}}({M}_{i} \mid {T}_{i})\) non-parametrically based on the linear regression model using the method of least squares and the outcome model \({\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid {T}_{i}, {M}_{i})\) based on the Weibull regression model; and then calculates (i) the total treatment effect in Eq. (3) as \({\widehat{\tau }}_{bi} = {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid 1, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid 1)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid 0, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid 0))\); (ii) the direct effect in Eq. (4) as \({\widehat{\zeta }}_{bi}(t) = {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid 1, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid t)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid 0, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid t))\); and (iii) the indirect effect in Eq. (5) as \({\widehat{\delta }}_{bi}(t) = {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid t, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid 1)) - {\widehat{f}}_{{Y}_{b}}({Y}_{i} \mid t, {\widehat{f}}_{{M}_{b}}({M}_{i} \mid 0))\) for each individual, and also their average effects \({\text{ATE}}_{b}\), \({\text{ADE}}_{b}\), and \({\text{ACME}}_{b}\) for b = 1, …, B. Then, statistical inferences are made based on the bootstrap sampling distributions of ATE, ADE and ACME35,36.
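The resampling loop described above can be sketched in Python as follows. This is a simplified stand-in and not the procedure implemented in MiMedSurv or in the mediation package: to stay self-contained, it replaces the Weibull outcome model with an ordinary linear outcome model (an assumption made only for this sketch), under which the ACME reduces to the product of the treatment coefficient in the mediator model and the mediator coefficient in the outcome model. All function names and the simulated data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(design, response):
    """Least-squares coefficients; the design rows already include the intercept."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return solve(xtx, xty)


def acme_linear(t, m, y):
    """ACME under linear mediator and outcome models (stand-in for the Weibull model)."""
    alpha = ols([[1.0, ti] for ti in t], m)                    # mediator model: M ~ T
    beta = ols([[1.0, mi, ti] for mi, ti in zip(m, t)], y)     # outcome model: Y ~ M + T
    return alpha[1] * beta[1]


def bootstrap_acme(t, m, y, n_boot=500, seed=0):
    """Percentile 95% interval for the ACME from non-parametric bootstrap resamples."""
    rng = random.Random(seed)
    n = len(t)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]             # sample with replacement
        tb = [t[i] for i in idx]
        if len(set(tb)) < 2:                                   # skip degenerate resamples
            continue
        try:
            draws.append(acme_linear(tb, [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:                              # singular design, skip
            continue
    draws.sort()
    lower = draws[int(0.025 * len(draws))]
    upper = draws[min(len(draws) - 1, int(0.975 * len(draws)))]
    return lower, upper


# Simulated (hypothetical) data: treatment shifts the mediator, which shifts the outcome.
sim = random.Random(1)
T = [i % 2 for i in range(60)]
M = [1.0 + 2.0 * ti + sim.gauss(0, 1) for ti in T]
Y = [0.5 + 1.5 * mi - 1.0 * ti + sim.gauss(0, 0.5) for mi, ti in zip(M, T)]
print("ACME estimate:", acme_linear(T, M, Y))
print("95% bootstrap interval:", bootstrap_acme(T, M, Y))
```

The percentile interval is read directly from the sorted bootstrap draws, so no distributional form is assumed for the estimator, which is the motivation given above for preferring the bootstrap with skewed and zero-inflated microbiome data.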

Web server architecture

MiMedSurv is written in the R language (version 4.1.1) and released under the GNU General Public License version 3 (GPL3). Its app interfaces were developed using the shiny 1.7.5 package (https://shiny.rstudio.com). A web server is any computer software, together with its underlying hardware, that can interact with an external client through a network protocol. MiMedSurv is based on a typical client–server architecture, for which we employed ShinyProxy 2.6.1 (https://shinyproxy.io) and Apache 2 (https://httpd.apache.org) to deploy it to the web, and a computing device with an Intel i9-1290 (16-core) processor (Intel, Santa Clara, CA, USA) and 64 GB DDR4 memory (Samsung, Seoul, South Korea). The web server is freely available at http://mimedsurv.micloud.kr and does not require users to log in or register. In case the web server is busy or under repair, we also provide a GitHub repository (https://github.com/yj7599/mimedsurvgit) so that users can run MiMedSurv on their local computer. On the GitHub repository, users can also find related instructions, references, prerequisites, and troubleshooting tips.

Results

Example microbiome data

To illustrate the use of MiMedSurv, we employed example microbiome data to survey the mediating roles of the gut microbiome between antibiotic treatment and T1D onset11. T1D is an autoimmune disease, in which the human immune system is overly active and hence attacks even non-pathogenic normal cells. Unfortunately, T1D is increasing in incidence while decreasing in its onset age worldwide. Zhang et al.11 revealed that early-life administration of an antibiotic (tylosin) can accelerate T1D development through gut microbiome dysbiosis. For illustration purposes, we reanalyze the data using (i) the antibiotic (tylosin) treatment as a treatment variable, (ii) the gut microbiome as a mediator, (iii) the time-to-T1D as a survival variable, and (iv) sex and elapse time after the antibiotic (tylosin) treatment as two covariates (Fig. 2). We expected sex and the elapse time after the antibiotic (tylosin) treatment to be potential confounders between the antibiotic (tylosin) treatment and the onset of T1D. In particular, as the elapse time increases, the effect of the antibiotic on the gut microbiome can decrease, and the elapse time can also be linked to the onset of T1D. Here, the time-to-T1D survival variable consists of typical right-censored survival responses with two components: (i) the follow-up time (in weeks) and (ii) the censored/event indicator, with 0 for censored (no event, i.e., T1D-free) and 1 for the event (i.e., T1D onset).

Fig. 2 A graphical representation of our example microbiome causal mediation analysis with survival responses. (i) antibiotic (tylosin) treatment is a treatment variable; (ii) gut microbiome is a mediator; (iii) time-to-T1D is a survival variable; and (iv) sex and elapse time after the antibiotic (tylosin) treatment are covariates.

The 16S ribosomal RNA gene amplicon sequence data are publicly available in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/view/PRJEB14696, accession number: PRJEB14696). We processed them using a bioinformatic pipeline, QIIME 2 (https://qiime2.org)47, and a database, Greengenes (https://greengenes.secondgenome.com), to construct a feature table, taxonomic annotations, and a phylogenetic tree. We stored the final processed data as example data in the Data Input module of MiMedSurv for users to easily confirm data formatting requirements.

In the following sections, we describe all the preprocessing and analytic modules of MiMedSurv step-by-step using this example microbiome data (see Application Note).

Data processing: data input

Users first need to upload their microbiome data with (i) four components: a feature (i.e., operational taxonomic unit (OTU) or amplicon sequence variant (ASV)) table, a taxonomic table, a phylogenetic tree, and metadata or (ii) three components: a feature (i.e., OTU or ASV) table, a taxonomic table, and metadata. Here, the taxonomic table must contain seven taxonomic ranks (kingdom, phylum, class, order, family, genus, and species), while the phylogenetic tree must be a rooted tree that reflects evolutionary relationships across features (i.e., OTUs or ASVs). If users upload their microbiome dataset with three components without a phylogenetic tree, only non-phylogenetic community-level (alpha- and beta-diversity) analyses will later be performed.
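The kind of consistency check that these formatting requirements imply can be sketched as below. This Python sketch is illustrative only and is not the checking code provided in the Data Input module: the nested-dictionary layout, the rank column names, and the function name `check_inputs` are all assumptions made for the example.

```python
# Assumed (hypothetical) column names for the seven taxonomic ranks.
REQUIRED_RANKS = ("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")


def check_inputs(feature_table, taxonomy, metadata):
    """Return a list of human-readable problems (empty if none were found).

    feature_table : feature ID -> {subject ID: read count}
    taxonomy      : feature ID -> {rank name: taxon name}
    metadata      : subject ID -> {variable name: value}
    """
    problems = []
    if set(feature_table) != set(taxonomy):
        problems.append("feature IDs differ between the feature table and the taxonomic table")
    for feature, ranks in taxonomy.items():
        missing = [rank for rank in REQUIRED_RANKS if rank not in ranks]
        if missing:
            problems.append(f"{feature}: missing taxonomic ranks {missing}")
    subjects = {s for counts in feature_table.values() for s in counts}
    if subjects != set(metadata):
        problems.append("subject IDs differ between the feature table and the metadata")
    return problems


# Hypothetical toy input with one feature and two subjects.
toy_features = {"ASV1": {"S1": 10, "S2": 0}}
toy_taxonomy = {"ASV1": {rank: "unclassified" for rank in REQUIRED_RANKS}}
toy_metadata = {"S1": {"T1D": 1}, "S2": {"T1D": 0}}
print(check_inputs(toy_features, toy_taxonomy, toy_metadata))
```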

Users can start by downloading the example microbiome data from the Example Data section. The data are stored in a widely used unified format, called phyloseq48, that can efficiently combine all essential microbiome data components. Alternatively, users can upload the components individually. In this module, we also provide instructions with example code for checking compatible data formats so that users can prepare their microbiome data easily.

Application Note: We uploaded the example microbiome data using the Browse and Upload buttons.

Data processing: quality control

As in MiMed20, users can perform quality controls with respect to (i) a microbial kingdom of interest (default: Bacteria), (ii) a minimum library size (i.e., total read count) for the study subjects to be retained (default: 3000), (iii) a minimum mean proportion for the features (i.e., OTUs or ASVs) to be retained (default: 0.02%), and (iv) errors in taxonomic names to be removed. For reference, users can download the resulting microbiome data after quality controls.
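The two numeric filters, (ii) and (iii), can be sketched in Python as follows. This is an illustration under stated assumptions and not MiMedSurv's R code: the nested-dictionary layout and the function name are hypothetical, the thresholds are treated as inclusive, and mean proportions are computed over the subjects that pass the library-size filter.

```python
def quality_control(counts, min_library_size=3000, min_mean_proportion=0.0002):
    """Filter subjects by library size, then features by mean proportion.

    counts : subject ID -> {feature ID: read count}
    The default min_mean_proportion of 0.0002 corresponds to 0.02%.
    """
    # Step 1: retain subjects whose total read count reaches the threshold.
    kept = {s: f for s, f in counts.items() if sum(f.values()) >= min_library_size}
    if not kept:
        return {}
    # Step 2: compute each feature's mean proportion across retained subjects.
    features = sorted({name for f in kept.values() for name in f})
    keep_features = []
    for name in features:
        proportions = [f.get(name, 0) / sum(f.values()) for f in kept.values()]
        if sum(proportions) / len(proportions) >= min_mean_proportion:
            keep_features.append(name)
    # Step 3: return the reduced table.
    return {s: {name: f.get(name, 0) for name in keep_features} for s, f in kept.items()}


# Hypothetical toy table: subject s2 has too few reads, feature c is too rare.
toy_counts = {
    "s1": {"a": 4000, "b": 1000, "c": 0},
    "s2": {"a": 100, "b": 50, "c": 10},
    "s3": {"a": 2999, "b": 3000, "c": 1},
}
print(quality_control(toy_counts))
```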

Application Note: We performed quality controls using the default settings by simply clicking the Run button. As a result, we retained 519 study subjects and 224 features, covering 7 phyla, 12 classes, 15 orders, 19 families, 25 genera, and 9 species (Fig. 3). We can see that the library sizes (i.e., total read counts) vary dramatically across study subjects, and the mean proportions are highly skewed to the right (Fig. 3).

Fig. 3 The status of the resulting microbiome data after quality controls: (i) sample size, the number of features, the number of phyla, the number of classes, the number of orders, the number of families, the number of genera, and the number of species after quality controls; (ii) histogram and box plot for the distribution of library sizes (i.e., total read counts) for subjects; (iii) histogram and box plot for the distribution of mean proportions for features.

Data processing: data transformation

As in MiMed20, for the community-level analysis, users can compute nine alpha diversity indices (i.e., Observed, Shannon49, Simpson50, Inverse Simpson50, Fisher51, Chao152, abundance-based coverage estimator (ACE)53, incidence-based coverage estimator (ICE)54, phylogenetic diversity (PD)55) and five beta diversity indices (i.e., Jaccard dissimilarity56, Bray–Curtis dissimilarity57, Unweighted UniFrac distance58, Generalized UniFrac distance59, Weighted UniFrac distance60). For the taxonomy-level analysis, users can normalize the data using the widely used centered log-ratio (CLR) transformation method61 to relax the compositional constraint of the data, yet three other normalization methods of arcsine-root, rarefied count62 and proportion are also available. For reference, users can download all the resulting alpha and beta diversity indices and normalized taxonomic data.
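For readers unfamiliar with these quantities, the following Python sketch computes two of the alpha diversity indices, one beta diversity index, and the CLR transformation for small count vectors. It is illustrative only and not MiMedSurv's implementation. Two conventions are assumed here: the Simpson index is taken as one minus the sum of squared proportions, and a pseudocount of 0.5 is added before the log in the CLR transformation to handle zero counts; other conventions exist.

```python
import math


def shannon(counts):
    """Shannon index: -sum(p * ln p) over features with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index, taken here as 1 - sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transformation; the pseudocount is an assumption of this sketch."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for two subjects over four features.
subject_a = [120, 30, 0, 50]
subject_b = [10, 80, 40, 70]
print("Shannon:", shannon(subject_a), shannon(subject_b))
print("Simpson:", simpson(subject_a), simpson(subject_b))
print("Bray-Curtis:", bray_curtis(subject_a, subject_b))
print("CLR:", clr(subject_a))
```

The CLR values of a sample always sum to zero, which is how the transformation removes the unit-sum constraint of proportions while keeping the ratios between features.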

Application Note: We computed all the alpha and beta diversity indices and normalized taxonomic data by simply clicking the Run button.

Data analysis: non-mediational analysis

In this module, users can perform some baseline exploratory survival analysis to survey the disparity in survival time between medical treatments (e.g., treatment vs. placebo, new treatment vs. old treatment, and so forth) or environmental exposures (e.g., rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth). For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, (iv) covariate(s), and (v) an analytic method as we described in the previous section, Materials and Methods: Non-Mediational Analysis.

Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) the Cox model34 as an analytic method. Then, we found a significant disparity in survival (against T1D) rate between the normal control and antibiotic (tylosin) groups adjusting for sex and elapse time after the antibiotic (tylosin) treatment (P-value: < 0.001) (Fig. 4). We also estimated that the hazard rate (toward T1D) is higher for the antibiotic (tylosin) group than for the normal control group (hazard ratio (HR): 1.758 > 1), indicating that the antibiotic (tylosin) treatment is harmful in that it accelerates T1D onset (Fig. 4).

Fig. 4 The results from non-mediational analysis on the disparity in survival (against T1D) rate between the normal control and antibiotic (tylosin) groups adjusting for sex and elapse time after the antibiotic (tylosin) treatment. HR represents the hazard ratio, that is, the hazard rate of T1D for the antibiotic (tylosin) group divided by the hazard rate of T1D for the normal control group.

Data analysis: mediational analysis

In the previous section, Results: Data Analysis: Non-Mediational Analysis, we observed a significant disparity in survival rate between the normal control and antibiotic (tylosin) groups adjusting for sex and elapse time after the antibiotic (tylosin) treatment. In the following sections, we seek to comprehend if the gut microbiome plays a mediating role between the antibiotic (tylosin) treatment and T1D onset with respect to (i) alpha diversity, (ii) beta diversity and (iii) microbial taxa at different hierarchies (i.e., phyla, classes, orders, families, and genera).

Community-level analysis: alpha diversity

In this module, users can perform the microbiome causal mediation analysis to test jointly whether the treatment alters the microbial alpha diversity, and whether the altered microbial alpha diversity, in turn, influences the survival responses. For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, and (iv) covariate(s) for covariate-adjusted analysis (or none for univariate analysis). The available analytic method here is the Imai method35,36 coupled with the Weibull regression model37 as we described in the previous section, Materials and Methods: Mediational Analysis.

Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) the Imai method35,36 as an analytic method. Then, we found a significant mediation effect of microbial alpha diversity with respect to the Simpson index50 between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment (P-value: 0.045 < 0.05) (Fig. 5), yet none of the other alpha diversity indices is statistically significant. We also estimated that as the Simpson index50 level decreases, the T1D onset tends to be accelerated (Est. − 1.720 < 0), indicating that a lower Simpson index level is harmful with respect to T1D (Fig. 5).

Fig. 5 The results from mediational analysis between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment using alpha diversity indices (i.e., Observed, Shannon49, Simpson50, Inverse Simpson50, Fisher51, Chao152, ACE53, ICE54, PD55). ACME represents average causal mediation effect35,36.

Community-level analysis: beta diversity

In this module, users can perform the microbiome causal mediation analysis to test jointly whether the treatment alters the microbial beta diversity, and whether the altered microbial beta diversity, in turn, influences the survival responses. For this, users need to select (i) a survival time variable, (ii) a censored/event indicator variable, (iii) a treatment variable, and (iv) covariate(s) for covariate-adjusted analysis (or none for univariate analysis). The available analytic method here is DACT41 coupled with MiRKAT38,39 for the treatment model and MiRKAT-S39,40 for the outcome model as we described in the previous section, Materials and Methods: Mediational Analysis.

Application Note: We selected (i) ‘T1Dweek’ as a survival time variable, (ii) ‘T1D’ as a censored/event indicator variable, (iii) ‘Antibiotic’ as a treatment variable, (iv) ‘Sex’ and ‘SampleTime’ as covariates, and (v) DACT41 as an analytic method. Then, we could not find any significant mediation effect of microbial beta diversity between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment (Fig. 6).

Fig. 6 The results from mediational analysis between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment using beta diversity indices (i.e., Jaccard dissimilarity56, Bray–Curtis dissimilarity57, Unweighted UniFrac distance58, Generalized UniFrac distance59, Weighted UniFrac distance60). Censored represents the subjects who have not had T1D during the follow-up, while Event represents the subjects who have had T1D during the follow-up.

Taxonomy-level analysis

In this module, users can perform the microbiome causal mediation analysis to test jointly whether the treatment alters microbial taxa, and whether the altered microbial taxa, in turn, influence the survival responses. For this, users need to select (i) a data format (default: CLR61), (ii) a survival time variable, (iii) a censored/event indicator variable, (iv) a treatment variable, (v) covariate(s) for covariate-adjusted analysis (or none for univariate analysis), and (vi) the taxonomic ranks to be analyzed, from phylum to genus for 16S ribosomal RNA gene amplicon sequencing1,2 or from phylum to species for shotgun metagenomics3. The available analytic method here is the Imai method35,36 coupled with the Weibull regression model37 as we described in the previous section, Materials and Methods: Mediational Analysis.

Application Note: We selected (i) ‘CLR’61 as a data format, (ii) ‘T1Dweek’ as a survival time variable, (iii) ‘T1D’ as a censored/event indicator variable, (iv) ‘Antibiotic’ as a treatment variable, (v) ‘Sex’ and ‘SampleTime’ as covariates, (vi) ‘Phylum-Genus’ as taxonomic ranks to be analyzed, and (vii) the Imai method35,36 as an analytic method. Then, we found a significant mediation effect of two phyla (Actinobacteria and Verrucomicrobia), three classes (Erysipelotrichi, Actinobacteria, and Verrucomicrobiae), three orders (Erysipelotrichales, Bifidobacteriales, and Verrucomicrobiales), three families (Erysipelotrichaceae, Bifidobacteriaceae, and Verrucomicrobiaceae), and three genera (Allobaculum, Bifidobacterium, and Akkermansia) between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment (Q-value: < 0.001) (Fig. 7). We also estimated that as their relative abundance level decreases, the T1D onset tends to be accelerated (Est. < 0), indicating that they might be beneficial microbes that help prevent T1D onset (Fig. 7).

Fig. 7 The results from mediational analysis between the antibiotic (tylosin) treatment and T1D onset adjusting for sex and elapse time after the antibiotic (tylosin) treatment using microbial taxa from phylum to genus. Q-value represents FDR-adjusted P-value43.

Discussion

In this paper, we introduced a unified cloud computing platform, MiMedSurv, for comprehensive microbiome causal mediation analysis with survival responses. MiMedSurv is distinguished by its analytic procedures for microbiome data with survival responses. That is, MiMedSurv conducts baseline exploratory non-mediational survival analysis, not involving the microbiome, as an initial check of the disparity in survival time between medical treatments / environmental or behavioral exposures (e.g., treatment vs. placebo, new treatment vs. old treatment, rural vs. urban, smoking vs. non-smoking, calorie restriction vs. ad libitum diet, and so forth) based on the Kaplan–Meier analysis31 or the Cox proportional hazards model34. Then, MiMedSurv identifies the mediating roles of the microbiome in various aspects: (i) as a microbial ecosystem using alpha diversity indices based on the Imai method35,36 and using beta diversity indices based on DACT41 and (ii) as individual microbial taxa in different hierarchies (e.g., phyla, classes, orders, families, genera, species) based on the Imai method35,36. In addition, covariate-adjusted analysis is supported for both non-mediational and mediational analyses to strengthen the causal interpretation of the results, which is crucial especially for observational studies. Moreover, MiMedSurv offers graphical user interfaces and automatically organizes all related references for user convenience. Overall, it is user-friendly; as such, even non-professional programmers (e.g., microbiologists, medical doctors, public health practitioners) can easily perform microbiome causal mediation analysis with survival responses.

We illustrated the use of MiMedSurv step-by-step through an example mediation analysis to see whether the gut microbiome mediates the effect of antibiotic treatment on T1D onset11. We also uploaded the example data in the Data Input module of MiMedSurv so that users can easily confirm compatible data formats. Users can also find all related instructions, references, prerequisites, and troubleshooting tips on our GitHub page (https://github.com/yj7599/mimedsurvgit).

Mediation analysis has recently been spotlighted as a practical and powerful analytic tool to survey the causal role of the microbiome as a mediator that explains the observed relationship between a medical treatment/environmental exposure and a human disease. In clinical studies, researchers often trace disease progression sequentially in time; hence, survival data are among the most common types of data. Therefore, MiMedSurv can be widely used as a practical and powerful analytic tool by researchers in various disciplines (e.g., microbiology, medical science, public health).

However, MiMedSurv has the following limitations. First, its underlying statistical methods, such as the Kaplan–Meier analysis31, Cox proportional hazards model34, Imai method35,36, MiRKAT38,39, MiRKAT-S39,40 and DACT41, are all model-based methods; as such, any model misspecification can lead to spurious outcomes. This means that all the treatment, mediator, response, and covariate variables need to be correctly specified by users. Second, MiMedSurv performs downstream data analysis for many alpha and beta diversity indices and data normalization methods; as such, users can encounter inconsistent outcomes and interpretations across different diversity indices or data normalization methods. It was challenging to resolve this issue because there is no consensus in the microbiome research community on which diversity index or normalization method is the best. Finally, MiMedSurv does not perform gene- or strain-level analyses. We could not incorporate such analytic approaches because of the different data structures and specialized requirements (e.g., software architecture, analytic protocols). MiMedSurv therefore cannot meet every analytic demand, and further development is needed.