Introduction

Prediabetes precedes Type 2 diabetes (T2D) and involves impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT)1,2. Those with prediabetes exhibit insulin resistance (IR), pancreatic β-cell and α-cell dysfunction, and reduced incretin effect3. Lifestyle and pharmacological interventions can achieve glycemic control, preventing or delaying T2D and its complications4,5,6.

Lifestyle intervention is the standard treatment for prediabetes, and certain drugs have also demonstrated efficacy in preventing T2D. The combination of linagliptin, metformin, and lifestyle modifications significantly improved glucose metabolism and pancreatic β-cell function (Pβf), reducing T2D incidence in prediabetic individuals compared to metformin and lifestyle intervention7. Studies in humans and animals have highlighted the influence of gut microbiota (GM) on the pharmacological effects of metformin and Dipeptidyl Peptidase 4 inhibitors (DPP-4is) (e.g., vildagliptin, sitagliptin, and liraglutide); however, the precise mechanisms by which GM affects the patient's metabolic condition remain unclear8,9,10,11,12,13.

Metformin, a biguanide, is commonly used as a first-line treatment for T2D and prediabetes14. Research suggests that metformin reduces GM diversity in murine models of induced T2D while promoting the growth of mucin-degrading bacteria (e.g., Akkermansia muciniphila) and SCFA-producing bacteria (e.g., Roseburia and Faecalibacterium prausnitzii)9. The increased abundance of these bacteria is associated with higher levels of goblet cells (mucin-producing), improved glucose tolerance, and reduced proinflammatory interleukin-6 (IL-6)15.

On the other hand, linagliptin, a widely used DPP-4i in T2D, is known for its cardiovascular and renal safety. Its efficacy in prediabetes has been demonstrated, improving glucose metabolism and pancreatic islet function7,16. Incretin-based therapies utilizing DPP-4is are based on the insulinotropic action of glucagon-like peptide 1 (GLP-1)17. By increasing endogenous GLP-1 and insulin levels and reducing glucagon secretion18, DPP-4i effectively lowers postprandial blood glucose levels by inhibiting incretin hormone degradation. Although some studies suggest a link between DPP-4is and GM, their exact mechanisms remain unclear.

To evaluate how changes in GM are associated with the clinical response, we assessed the impact of linagliptin/metformin (LM) versus metformin alone (M) on GM composition and its association with insulin sensitivity (Matsuda Index, IS) and pancreatic β-cell function (Pβf) in Mexican patients with prediabetes. Our results indicate that linagliptin/metformin is more clinically effective than metformin alone, and the contribution of GM to the clinical response is relatively low.

Results

Participants and clinical outcomes after the intervention

The patients in this study were part of the diabetes prevention trial PRELLIM, a double-blind, randomized parallel clinical trial comparing linagliptin + metformin + lifestyle (LM) to metformin + lifestyle (M) in terms of their effects on glucose metabolism, insulin resistance (IR), and pancreatic islet function [ClinicalTrials.gov ID: NCT03004612 (22/12/2016)]7. All participants in PRELLIM were encouraged to participate in the microbiome study. Then, between August 2018 and December 2019, 222 participants were assessed for eligibility and 167 eligible participants were included. These participants were randomly assigned to one of two distinct treatment groups. Specifically, 51 participants were included in the no-treatment group (basal evaluation), 55 at six months follow-up (35 in the M group and 20 in the LM group), and 61 at 12 months follow-up (28 in the M group and 33 in the LM group) (Fig. 1). Unfortunately, not all participants provided a stool sample during follow-up, and some did not complete the prescribed number of follow-up sessions. Accordingly, we organized the data for analysis as a cross-sectional study of groups at different points in time. Specifically, 205 samples were included in the analysis: 65 in the untreated group (baseline evaluation), 77 at the six-month follow-up (44 in the M group and 33 in the LM group), and 63 at the 12-month follow-up (28 in the group M and 35 in group LM). Since the original objective of the study was to observe changes in GM composition over time and its implications on clinical response, we obtained 38 stool samples corresponding to the participants' follow-up at months 6 (n = 12) and 12 (n = 24). Additional results from a dependent contrast analysis are shown in the supplementary material only in the subgroup of 24 participants who completed follow-up at 6 and 12 months (Table S3). Multiple cardio-metabolic risk factors were present in the whole studied population: obesity (51.6%), high total cholesterol (32%), low-HDL (72%), high triglycerides (63.7%), and high blood pressure (25.5%), without any significant difference between groups. None of the patients took medications or supplements affecting glucose metabolism or gut microbiota composition.

Figure 1
figure 1

Study profile. This work analyzed the data into two distinct analytical categories: (1) The participants were 51 in the no-treatment group during the baseline evaluation, 55 at the six-month follow-up (with 35 in the M group and 20 in the LM group), and 61 at the 12-month follow-up (comprising 28 in the M group and 33 in the LM group). (2) The gut microbiota samples were 65 in the untreated group (baseline evaluation), 77 at the six-month follow-up (44 in the M group and 33 in the LM group), and 63 at the 12-month follow-up (28 in the group M and 35 in group LM). Created with BioRender.com. LM: The combination of linagliptin + metformin + lifestyle. M: only metformin + lifestyle.

Insulin sensitivity and pancreatic β-cell function improved at six and 12 months of follow-up

Age was lower in subjects without pharmacological treatment than at six and 12 months of follow-up (p = 0.0347). Consistent with the original PRELLIM publication, BMI and adiposity showed progressive reductions in this subset of patients at 6 and 12 months. Glucose levels, insulin resistance (IR) index and Pβf significantly improved at six and 12 months of follow-up compared to the baseline group (Table 1) (Tables S1, S2, and S3). Pβf significantly improved in both treatment groups at six months [C-d 1.113 (0.716–1.507), p = 0.00001] and 12 months [C-d 1.056 (0.650–1.457), p = 0.00001], with a more pronounced improvement in the LM group at both six months [G-Δ 0.3597 (0.189–0.798)] and 12 months [C-d 0.2537 (0.104–0.659)] (Fig. 2 and Table S4).

Table 1 Characteristics of the study populations.
Figure 2
figure 2

Insulin sensitivity and pancreatic β-cells function. (A) IS upper panel; Matsuda index and lower panel; HOMA-IR at baseline, six and 12 months in groups M and LM. (B) Basal Pβf, six and 12 months in groups M and LM. Upper left panel; AIR, upper right panel; ORAL-DI, lower left panel; Disp_Index2 and lower right panel; AUCinsgluc_OGTT. (C) Glucose and insulin levels during baseline OGTT, six-month and 12-month follow-up: upper panel; glucose and lower panel; insulin. The monthly follow-up was compared with the monthly follow-up. *P < 0.05, **P < 0.01 and ***P < 0.001, one-way ANOVA. IS insulin sensitivity, Pβf pancreatic β-cells function, HOMA-IR Insulin Resistance Index, AIR acute insulin response, β-Cell function2 (Matsuda*(IncAUCins0-120/IncAUCgluc0-120)), Oral_DI insulin disposition index, AUCinsgluc_OGTT glucose area under the curve, OGTT curve of oral glucose tolerance.

Gut microbiome analysis showed a significant impact of hypoglycemic drugs on humans

A total of 17,355,182 quality reads were generated from 203 samples, resulting in 3,713 amplicon sequence variants (ASVs). Individual rarefaction curves indicated high sampling coverage in all samples (Fig. S1). Ecological analysis evaluated α diversity using the non-parametric Kruskal–Wallis test and found no significant changes between groups for Indices Chao 1 (p = 0.550) (Fig. S2A), Simpson (p = 0.817) (Fig. S2B), Shannon (p = 0.796) (Fig. S2C), and Pielou (p = 0.469) (Fig. S2D). β diversity analysis with Jaccard distances showed no discernible grouping pattern in microbial diversity between the analysis groups (Fig. S3).

Spervised machine learning: explainable analysis approach

Using the Random Forest algorithm with post-hoc explanations, we classified different hypoglycemic drugs (LM and M) (Fig. 3). Therapeutic interventions modified GM composition at six and 12 months of follow-up. AUC-ROC (Area Under the Receiver Operating Characteristic Curve) mean values were 0.79 and 0.74 for basal vs. M groups at six and 12 months, respectively (Table S5). Genera Subdoligranulum, Ruminococcaceae_DTU089, Catabacter, Ruminiclostridium 5, and Escherichia-Shigella identified the basal vs. M group at six months (Fig. 4A). Relevant bacterial genera (Fusicatenibacter, Blautia, and [Ruminococcus] gauvreauii group), all SCFA producers, identified prediabetes subjects treated for six months with M (Fig. 4A). In contrast, we found eight relevant SCFA-producing bacterial genera (Fusicatenibacter, Atopobiaceae_uncultured, Coprococcus 1, Lachnospiraceae ND3007 group, Anaerostipes, Dorea, Lachnospiraceae FCS020 group, and Blautia) in patients treated with M for 12 months (Fig. 4B). The effect of M at 12 months suggests cumulative impact, restructuring GM composition and favoring increased SCFA-producing genera abundance22.

Figure 3
figure 3

Schematic diagram of the proposed procedure for the data's clinical and GM analysis. It consists of (A) PRELLIM data, (B) Taxonomic and ecological analysis, and (C) Explainable Machine Learning analysis, generalized linear mixed models (GLMM), and structural equation models (SEM). The sequences obtained in the sequencing were processed using the QIIME2 (Quantitative Insights Into Microbial Ecology) analysis platform19. The taxonomic classification for end sequence variants was performed using the SILVA database (version 132). The analysis of the GM was performed using the phyloseq object20; through this object, the α and β diversity of the study groups in the different months of follow-up was obtained. Subsequently, with the abundance of the other bacterial genera, the genera that classify each pharmacological treatment in the different months of follow-up were obtained using machine learning algorithms (random forest)21. Once the bacteria that classified the subjects with M and LM treatment were identified, hierarchical linear regression models were performed to eliminate the effect of confounding variables (obesity, age, and gender). In addition, GLMM and, finally, mediation analysis using SEM. Created with BioRender.com.

Figure 4
figure 4

SHAP graph for each hypoglycemic drug and month of follow-up. The figure shows the first ten bacterial genera with the most significant contribution to classifying patients without hypoglycemic drugs (baseline) and those treated with M and LM at six and 12 months of follow-up and between treatments. We have ordered the bacterial genera from most to least relevant from top to bottom. Blue and red represent bacterial genera's low and high abundance, respectively. The higher positive values on the SHAP axis establish the relevance of bacterial genera to classify patients with M or LM at six or 12 months of follow-up, while the negative values establish the relevance of bacterial genera for patients with prediabetes without treatment. (A) Baseline vs M with six-month follow-up. (B) Baseline vs M with 12-month follow-up. (C) Baseline vs LM with six-month follow-up. (D) Baseline vs LM with 12-month follow-up. (E) M vs LM with a six-month follow-up, and (F) M vs LM with a 12-month follow-up.

Next, we compared the basal groups with the LM group at six and 12 months of follow-up; the mean AUC-ROC was 0.6 and 0.66, respectively (Table S5). Seven relevant bacterial genera were identified for classifying subjects with six months of LM treatment: Barnesiella, Coprococcus 1, Rhotia, Roseburia, [Eubacterium] hallii group, Coprococcus 2, and Bifidobacterium. Escherichia-Shigella classified the basal group and reduced it with hypoglycemic drugs23. [Eubacterium] hallii group classified subjects treated with LM at both times. Fusicatenibacter, Lachnospiraceae FCS020 group, and Enterorhabdus were characteristic in the LM group, all SCFA-producing genera11,24,25,26 (Fig. 4C,D).

On the other hand, we identified bacterial genera in both M and LM intervention groups at six and 12 months of follow-up. At six months, Rothia, [Eubacterium] Ruminantium group, Christensenellaceae_uncultured, Ruminiclostridium 5, Turicibacter, Ruminococcaceae UCG-002, Barnesiella, Clostridium Sensu Stricto 1, and Oscillibacter classified the LM group (Fig. 4E). In contrast, at 12 months with LM, Enterobacteriaceae, Holdemania, and Christensenellaceae_uncultured were characteristic genera (Fig. 4F). The mean AUC-ROC for the M vs LM comparison was 0.66 and 0.74, respectively. Interestingly, we observed that the compositional change produced by LM at six and 12 months is more significant in pathogenic bacteria involved in developing T2D. SCFA-producing genera favor classifying subjects who consumed M, regardless of follow-up month.

Hypoglycemic drugs and the change in the clinical-metabolic condition explain the increase in insulin sensitivity and pancreatic β-cell function

After machine learning analysis and classifying bacteria based on hypoglycemic drugs and months of follow-up, we conducted Generalized Linear Mixed Models (GLMM) considering IS indices and Pβf. Thirty-five significant genera were identified for IS (Matsuda index)27: Roseburia (b = − 0.042), Erysipelotrichaceae UCG-003 (b = 0.048), Butyrycicoccus (b = 0.041), [Eubacterium] xylanophilum group (b = − 0.045), and Eggerthellaceae uncultured (b = − 0.055)27. For Pβf (Disp_Index2), significant bacteria included Dorea (b = 0.101), Parabacteroides (b = − 0.111), Catenibacterium (b = 0.233), and Holdemania (b = 0.126) (Table 2) (Fig. 5A). We considered a bidirectional relationship between the response variables (IS and Pβf) and bacteria with statistical significance. The abundance of bacterial genera Butyrycicoccus (b = 0.666), [Eubacterium] xylanophilum group (b = − 0.630), Granulicatella (b = 0.544), Ruminococcaceae UCG-008 (b = − 0.366), Prevotella 6 (b = −  0.480), and Catabacter (b = − 0.508) was modified by the change in IS. Conversely, Negativibacillus (b = 0.972) and Lachnospiraceae UCG-004 (b = − 1.069) were statistically significant for increased Pβf. Models were adjusted for months of follow-up, hypoglycemic drugs, sex, age, BMI, IS, and Pβf (Table S6).

Table 2 GLMM for IS, IR, and Pβf indices and effect size and confidence interval for each bacterium.
Figure 5
figure 5

Heatmap resulted from the analysis of GLMM and SEM with standardized path coefficients. (A) The color scale represents the β coefficient of the GLMM; when it is red, it means a positive β, and negative when it is blue. All bacteria represented in the heatmap are statistically significant. (B) Catabacter SEM, the baseline model, showed a relationship between BMI and IS (Matsuda index) and the relationship between BMI and the genus Catabacter. Lachnospiraceae UCG-004 SEM; the baseline model shows a relationship between BMI and IS (Matsuda index) and the relationship between BMI and the genus Lachnospiraceae UCG-004. M is a model for metformin treatment, and LM is the model for linagliptin/metformin treatment. **p < 0.05**p < 0.05.

We evaluated the Bacteria X Treatment and Bacteria X Time interaction for IS and Pβf. In the Bacteria X Treatment interaction, we found a significant increase in IS in the LM group associated with Erysipelotrichaceae UCG-003 (p = 0.008). For Pβf, Erysipelotrichaceae UCG-003 showed statistical significance in the interaction with both LM (p = 0.025) and M (p = 0.048), while Lachnospiraceae UCG-004 (p = 0.032) and Turicibacter (p = 0.003) showed significance with M. In the Bacteria X Time interaction, a significant increase in IS at six months was observed with LM treatment for the bacterial genera Roseburia (p = 0.031), Erysipelotrichaceae UCG-003 (p = 0.018), Lachnospiraceae NK4A136 (p = 0.035), and Bilophila (p = 0.035). For Pβf, Erysipelotrichaceae UCG-003 (p = 0.004), Lachnospiraceae NK4A136 (p = 0.048), Granulicatella (p = 0.044), and Turicibacter (p = 0.046) showed significance at 6 months, and Bilophila (p = 0.028) and Lachnospiraceae UCG-004 (p = 0.011) showed significance at 12 months. There was no effect of treatment by time interaction.

Statistically significant bacteria from the GLMM models (with bidirectional effects) were used in subsequent statistical analyses: hierarchical linear regression and structural equations model (SEM). These analyses explored the relationship between GM composition, IS, and Pβf in prediabetes patients treated with hypoglycemic drugs. Hierarchical block models for linear regression revealed that hypoglycemic drugs and BMI significantly explained the increase in IS and Pβf. In the IS model, hypoglycemic drugs contributed a 10% increase (model 1 in Table 3). The addition of gender and age increased the explanation to 14% (model 2 in Table 3), and BMI further increased it to 25% (model 3 in Table 3). However, including the 17 representative genera raised the explanation to 33% (model 4 in Table 3). Among the bacteria analyzed, two were statistically significant: Catenibacterium (β = 0.079, p < 0.05) and Catabacter (β = − 0.083, p < 0.05).

Table 3 A hierarchical model for the IS (Matsuda Index).

For Pβf, hypoglycemic drugs explain 22% (model 1 in Table 4). Including gender and age, they reduced the explanation to 21% (model 2 in Table 4), while the addition of BMI explained only 23% (model 3 in Table 4). However, by incorporating the bacterial genera, the variability in the model increased to 35% (model 4 in Table 4). Two bacteria were statistically significant: Negativibacilis (β = 0.061, p < 0.05) and Lachnospiraceae UCG-004 (β = − 0.083, p < 0.001).

Table 4 A hierarchical model for the Pβf. Adjustment for hypoglycemic drugs was set for all models.

The intro method was selected in which all block variables are introduced in a single step. Stepwise regression: hypoglycemic drugs in model 1; gender and age in model 2; BMI in model 3; and bacterial genera in model 4.

Structural equation modeling (SEM)

To assess the mediating effect of GM on the increase in IS and Pβf in subjects treated with hypoglycemic drugs, we conducted SEM using two significant genera from the GLMM analysis along with a hierarchical model. The SEM also considered BMI to evaluate the impact of weight reduction on clinical variables. Catabacter and Lachnospiraceae UCG-004 were selected as mediator genera for IS and Pβf, respectively, based on the highest beta values and statistical significance obtained in model four of the hierarchical model (see Tables 3 and 4).

In the following SEM, the Matsuda index (IS) is considered the dependent variable, and the genus Catabacter is the mediating variable. BMI is strongly associated with the Matsuda index, with significant coefficients in the basal group (b = − 0.24, p = 0.022), the M group (b = − 0.27, p = 0.170), and the LM group (b = 0.48, p = 0.001). However, BMI showed a weak effect on the genus Catabacter in all groups (b = 0.23, p = 0.917 basal group; b = − 0.14, p = 0.411 M group; b = − 0.19, p = 0.960 LM group) and a negligible impact on the Matsuda index (b = − 0.27, p = 0.062 baseline group; b = − 0.064, p = 0.095 M group; b = − 0.18, p = 0.510 LM group), except for the LM group, where it was statistically significant (Fig. 5B).

In contrast, we assessed Pβf with the genus Lachnospiraceae UCG-004 as the mediating variable. BMI had a minor effect on Pβf [BMI—> Pβf (b = − 0.13, p = 0.022) in the basal group, (b = − 0.12, p = 0.170) for the M group, and (b = 0.22, p = 0.001) in the LM group]. BMI did not impact the genus Lachnospiraceae UCG-004 (b = 0.19 basal group, p = − 0.06; b = − 0.64, p = 0.411 M group; b = 0.16, p = 0.960 LM group). Conversely, Lachnospiraceae UCG-004 strongly correlated with Pβf (b = 0.057, p = 0.062 reference group; b = − 0.44, p = 0.095 M group; b = − 0.37, p = 0.510 LM group) in the hypoglycemic drug treatment groups, where they were statistically significant (Fig. 5B). These genera have been classified as keystone taxa in other studies with T2D patients28,29,30. These SEMs reveal that bacterial genera partially mediate clinical indices. Conversely, BMI and hypoglycemic drugs are strongly associated with IS and Pβf indices. Additionally, bacterial genera exhibited a weak association with IS and a strong association with Pβf.

To examine the impact of hypoglycemic drugs and the new metabolic condition on bacterial genera abundance, we conducted GLMM (Tweedie distribution) with bacterial genera as the dependent variable and hypoglycemic drugs as the independent variable. The model was adjusted for gender, age, BMI, and the Matsuda index or Pβf. We observed negative effects for Catabacter on both IS (β = − 0.503, Eta-Squared = 0.031, p = 0.031) (Table S7) and Pβf (β = − 0.990, Eta-Squared = 0.064, p = 0.002) (Table S8). These results indicate that the drug-induced changes in clinical indices significantly influence the abundance of the bacteria. It underscores the importance of considering the variation in the metabolic condition to comprehend the complexity of the ecological structure of the intestinal microbiota in subjects with prediabetes.

Discussion

PRELLIM study was the first randomized clinical trial to evaluate the effect of combined (linagliptin/metformin) therapy on GM in prediabetes subjects with a basal control and metformin monotherapy as a comparator. Consistent with previous reports7, at six and 12 months, both drugs significantly improved anthropometric, biochemical, and clinical parameters. The M group increased IS more, while LM favored Pβf. M and LM interventions impacted GM composition at the genus level, resembling findings in T2D. However, novel and specific changes in microbial structure were identified at six and 12 months with both drugs, affecting bacterial genera and ASVs.

The LM intervention impacts GM composition at the genus level, similar to effects observed in animal models using other DPP-4is. Qian Zhang et al. demonstrated vildagliptin's ability to increase butyrate-producing bacteria in rats induced to T2D31. Lin Wang et al. explored GM modulation by liraglutide (GLP-1 receptor agonist) and saxagliptin (DPP-4i); they reported increased levels of Lactobacillus, Allobaculum, and Turicibacter in mice treated with saxagliptin, indicating possible increment in incretins and their effect on glucose homeostasis32. Our study results are consistent with an enrichment of butyrate-producing bacteria, including Lactobacillus, Roseburia, Fusicatenibacter, and Blautia. These genera promote peptide production in the ileum, indirectly reducing hepatic expression of proinflammatory cytokines in T2D13,33. Notably, changes in GM composition with each hypoglycemic drug over time differed, particularly in the number of increased SCFA-producing bacteria in each group. Nevertheless, microbial functionality was maintained in each group, a characteristic of GM known as redundancy functions, suggesting species interchangeability within a given microbiota in terms of function34.

Current knowledge has established that metformin reduces GM diversity in diabetic mice fed with a high-fat diet, which increases Akkermansia muciniphila abundance and other SCFA-producing and mucin-degrading genera9. A meta-analysis confirmed the alteration of GM composition by metformin35. In our study, Metformin induces GM changes over time in prediabetes patients, partly differing from T2D evidence. At six months, SCFA-producing bacteria (Fusicatenibacter, Blautia, and [Ruminococcus] gauvreauii) increased, inhibiting enteropathogenic and LPS-producing bacteria (Proteobacteria, Escherichia-Shigella, and Enterococcus). These differences may result from distinct physiopathological deterioration between prediabetes and T2D36.

Interestingly, after a 12-month follow-up, we identified eight abundant SCFA-producing genera: Fusicatenibacter, Atopobiaceae, Coprococcus 1, Lachnospiraceae ND 3007 group, Anaerostipes, Dorea, Lachnospiraceae FCS020 group, and Blautia. Surprisingly, Akkermansia and other mucin-degrading bacteria were not significant in subjects with prediabetes, contrary to T2D reports37,38,39. However, the reduction in opportunistic pathogens and T2D-associated genera found in the M-treated group for 12 months aligns with worldwide studies23,38. Contrasting both interventions, LM increased bacterial genera, including SCFA producers and opportunistic pathogens, without establishing a clear pattern on the functional redundancy of the GM. Our results suggest that multiple coexisting and taxonomically distinct organisms perform diverse metabolic functions34,40,41 (see Fig. 4). We observed that the increase in SCFA-producing genera mediated the change in GM composition.

SCFA-producing genera are crucial for host health. Butyrate is vital to human insulin sensitivity (IS) through incretins. In a high-fat diet mouse model, butyrate supplementation prevents weight gain and increases IS. Butyrate and propionate induce intestinal gluconeogenesis, improving peripheral glucose production and IS42. GM changes influence the gut metabolome, affecting butyrate and acetate production43, key gut-derived metabolites in insulin resistance (IR) and glycemic control. Increased intestinal gluconeogenesis from these SCFAs in rodents reduces hepatic gluconeogenesis, appetite, and weight, leading to better glucose homeostasis44.

Most studies focus on short-term GM and glucose homeostasis associations. Moreover, the impact of hypoglycemic drugs on GM changes and metabolic improvement remains unclear45. Our findings show low associations (r = 0.1–0.2) between bacterial abundance of specific genera and clinical variables (fasting glucose, postprandial glucose, glucose AUC, HOMA-RI, or Matsuda Index) in diverse populations and study models. However, hypoglycemic drugs affect GM structure, anthropometric (reduced weight, body fat, and waist circumference), biochemical (increased IS and Pβf), and clinical (lower systolic and diastolic blood pressure) parameters, influencing the associations. In our study, hypoglycemic drugs improved clinical indices with a low GM contribution, as previously reported46. Using SEM, we found a strong relationship between BMI (not total adiposity) with IS and Pβf. Metformin and linagliptin effectively reduced weight and fat percentage in overweight and obese insulin-resistant outpatients47. Weight loss was the primary predictor of improved IS, while weight regain predicted reduced IS. Weight loss maintenance programs are crucial for preserving metabolic benefits. Physical activity and a balanced diet increase IS in patients with obesity and T2D48.

On the other hand, the bacterial abundance of SCFA-producing genera weakly explained the changes in IS and Pβf, with an eta squared of 0.01 in both cases, compared to the effects of hypoglycemic drugs and weight loss (See Table S6 and S7). Thus, we conclude that hypoglycemic drugs strongly impact metabolic conditions (IS and secretion) and moderately influence GM's composition. On the other hand, GM has a lower effect on metabolic changes. To reinforce this, we evaluated bacterial genera abundance using a GLMM, adjusting for hypoglycemic drugs, change in IS, and increased Pβf. We found that the shift in metabolic condition modifies the GM's structure.

Recent findings suggest that gut dysbiosis is linked to metabolic diseases like obesity, diabetes, and non-alcoholic fatty liver disease49. These discoveries support the coevolution theory between humans and the GM, profoundly affecting various host responses. It is clear that multiple variables influence glucose metabolism in prediabetes and diabetes prevention, and it cannot be explained by only one factor; in this context, GM seems to play a limited role, which still has to be elucidated in more detail. A limitation of our study is that we only measured GM composition and didn't analyze metabolites or other microbiota functionality measures. Nonetheless, our study provides the first evaluation of the effect of DPP-4 inhibitors on GM composition in humans with prediabetes.

Conclusions

Our study reveals that changes in GM have a low impact in mediating the effect of lifestyle, metformin, and linagliptin/metformin on glucose metabolism, IS, and Pβf in individuals with prediabetes. Despite the observed increase in SCFA-producing bacteria in the GM following these treatments, the SEM suggests a weak association between specific bacterial genera and improvements in IS and Pβf. Therefore, the primary mechanism of metabolic improvement in prediabetic patients is more directly attributable to the pharmacological effects of the hypoglycemic drugs with only a partial modulation of the GM. Future omics studies with long-term follow-up will determine the extent of drugs' hypoglycemic effect via GM modifications and its role in T2D development and progression.

Methods

Trial design and oversight

This study was part of a randomized, double-blind, placebo-controlled clinical trial [ClinicalTrials.gov ID: NCT03004612 (22/12/2016)]. Participants were enrolled between August 2018 and December 2019 as part of the PRELLIM project7 (Prevention of diabetes with linagliptin, lifestyle, and metformin). Further details are in the PRELLIM article7.

Participants and intervention procedure

Eligible participants with prediabetes (per ADA criteria) and no prior glycemic medication were randomly assigned to two groups in a 1:1 ratio: (i) Linagliptin + metformin + lifestyle (LM group): patients started on linagliptin/metformin 2.5/850 mg once daily for a month, then increased to twice daily until study end. (ii) Metformin + lifestyle (group M): patients began with 850 mg metformin once daily, then increased to twice daily. Identical envelopes contained metformin 850 mg and linagliptin/metformin 2.5/850 mg. Both groups received the same lifestyle program. Monthly follow-up visits assessed adherence and side effects and included nutritional evaluation. OGTTs were done at baseline, six, and 12 months. Primary outcomes were changes in the GM composition; Glucose levels, insulin resistance, and pancreatic β-cell function were secondary outcomes7.

The detailed inclusion criteria

167 patients were screened with anthropometric, nutritional, biochemical, and metabolic evaluation, including oral glucose tolerance test and hyperglycemic clamp at the Metabolic Research Laboratory, Hospital Regional de Alta Especialidad del Bajío. Patients were eligible for enrollment in the study based on the following criteria: (i) IGT (two h glucose levels 140–199 mg/dL) during oral glucose tolerance test, ± IFG (fasting glucose 100–125 mg/dL); (ii) age 18–65 years; (iii) ≥ 2 T2D risk factors per ADA50.

Exclusion criteria: (i) glucose-affecting treatments in past three months; (ii) glucose/metabolism-related conditions; (iii) recent use of antibiotics, proton pump inhibitors, or pharmaceutical-grade probiotics; (iv) fasting plasma triglycerides > 400 mg/dL; (v) pregnancy; (vi) systolic blood pressure > 180 mm Hg or diastolic > 105 mm Hg; (vii) glucose homeostasis-affecting meds/conditions (e.g., thiazides, beta-blockers, glucocorticoids, weight-reducing drugs, Cushing's syndrome, thyrotoxicosis).

To carry out this research, the ethical standards, the Regulations of the General Health Law on Research for Health, and the Declaration of Helsinki of the World Medical Association of the 52nd General Assembly, Edinburgh, Scotland, October 2000 have been considered with clarification note on paragraph 29 added by the General Assembly, Washington 2002 and current international codes and standards of good clinical research practice. Written informed consent was obtained from all participants before enrollment in this study. The Research and Ethical Committee approved the study at the Hospital Regional de Alta Especialidad del Bajío (CI-HRAEB-2017-048 and CEI-22-16 extension), registered with the Mexico Secretary of Health.

Anthropometric measures

Weight and body composition were assessed via the Tanita SC-240 Scale: Monthly weight recording and bioimpedance every six months in fasting conditions. Total body fat was measured in %, and visceral fat was measured in arbitrary units7.

Oral glucose tolerance test (OGTT)

Subjects arrived at the University of Guanajuato's Metabolic Research Laboratory between 7 and 8 a.m., fasting. An intravenous catheter was placed, and the first blood sample was drawn. Next, they ingested 75 g of glucose. Serum samples for glucose and insulin measurement were drawn at − 15 and 0 min and every 30 min afterward for two hours, with 4 ml of blood taken each time7.

Measurements

Glucose was measured with Analox glucoanalyzer GM9 and colorimetric glucose oxidase (Vitros 5600; Ortho Clinical Diagnostics). Lipid levels were measured by dry chemistry with colorimetric method (Vitros 5600; Ortho Clinical Diagnostics). Insulin (μU/ml) and C-peptide (ng/ml) were measured by chemiluminescent immunometric assay (IMMULITE 2000 Immunoassay system, Siemens). HbA1c was determined using high-performance liquid chromatography with DS-5 Analyzer (Drew Scientific, Inc. Miami, FL, USA). Personnel performing measurements were blinded to treatment allocation7.

Calculations

OGTT measurements: Incremental and AUC calculated via trapezoidal rule. Insulin secretion derived from AUCinsulin0_120 (μU/ml) divided by AUCglucose0_120 (mg/dl) during OGTT. Pβf measurements obtained: (i) AIR calculated as insulin change (μU/ml) from 0 to 30 min divided by glucose change (mg/dl) from 0 to 30 min during OGTT; (ii) IS/IR index (DI) during OGTT calculated as (AUCinsulin0_120 (μU/ml)/AUCglucose0_120 (mg/dl))*Matsuda index and (IncAUCinsulin0_120 (μU/ml)/IncAUCglucose0_120 (mg/dl))*Matsuda index; (iii) Oral disposition index calculated as the product of AIR and 1/fasting insulin (μU/ml). Insulin sensitivity during OGTT calculated with Matsuda index (10,000 √[(Glucosefasting (mg/dl) insulinfasting (μU/ml)) × (Glucosemean × Insulinmean)]) and fasting with homeostasis model assessment (HOMA-IR = fasting insulin (μU/ml) × fasting glucose (mg/dl)/405)51.

Randomization and masking

Patients were randomly assigned in a 1:1 ratio to receive a fixed combination of linagliptin/metformin 2.5/850 mg every 12 h + lifestyle modification program or metformin pills of 850 mg every 12 h + lifestyle modification program. Randomization was performed by a nutritionist who was not involved in the patient's follow-up using an electronic random numbers assignment system. Participants and investigators involved in the patient’s follow-up and outcome measurements were masked to treatment allocation during the entire study using identical envelopes for pills7.

Interventions

(i) Linagliptin + metformin + lifestyle (LM group): Patients allocated to this group started fixed combination pills of linagliptin/metformin 2.5/850 mg once daily during the first month, and after that, the dose was increased to 2.5/850 mg twice daily until the end of the study. (ii) Metformin + lifestyle (M group): Patients in this group started taking metformin pills of 850 mg once daily during the first month and increased to 850 mg twice daily until the end of the study. Pills of metformin 850 mg and linagliptin/metformin 2.5/850 mg were prepared using identical envelopes. Both groups received the same lifestyle implementation program based on a prescribed diet to reduce their body weight by at least 5–7%, adjusting their energy requirements based on their weight, and composed from 55 to 60% of carbohydrates, 25–30% fat, and 10–15% proteins. Patients were advised to start with 45 min/week of mild-moderate exercise and increase the duration and frequency or intensity of exercise every two weeks until reaching 150 min/week of moderate activity or 75 min/week of intense activity7.

Fecal sample collection and processing protocol

Fecal samples from intervention and control groups were collected in sterile containers at zero, six, and twelve months. Samples were homogenized and stored at  − 80 °C in sterile 1 ml screw-cap tubes before DNA extraction. DNA extraction and 16S rRNA Gene Amplification and sequencing protocol are shown in supplementary material protocol S123.

Processing of 16S sequencing data

Demultiplexed MiSeq FASTQ files were analyzed in QIIME2 using the DADA2 workflow. High read quality is ensured by filtering and trimming reads before processing. The first 5′ 10 bp of all reads were trimmed, and reads truncated on 3' to max 240 and 200 bp for forward and reverse reads, respectively, due to quality dip. Reads with > 2 expected errors under Illumina base model removed. Filtered and trimmed reads are grouped by sequencing run, and the error model fits separately for each run using DADA2 default parameters. Sequence variants were obtained for each run separately using calculated error models and dereplicated input sequences. Sequence variants and counts joined across all runs in the complete sequence table, and de novo chimera removal runs on the entire table23.

The final sequence variants taxonomy was assigned to DADA2's RDP classifier using the SILVA database (version 132). Species are identified separately via exact sequence matches (SILVA version 132). Joined with clinical metadata and saved as a phyloseq object for downstream analyses23.

Taxonomic and ecological analysis

A Phyloseq object was used to calculate alpha diversity indexes (i.e., Chao 1, Simpson, Shannon, and Pielou indexes) and β diversity index (Jaccard), computed by R Phyloseq library 1.34.052.

Supervised machine-learning: explainability analysis approach

To identify bacterial genera associated with different treatments (LM and M), we used the Random Forest algorithm, an ensemble method based on uncorrelated decision trees using the bagging technique. We compared various algorithms (decision tree, logistic regression, naive Bayes, and XGBoost) and selected Random Forest for its predictive performance and interpretability with SHAP values. Python3 (version 3.9.7) with the software library was used for calculations. We labeled data count matrices for M and LM-treated patients as 0 and 1, respectively. 75% of the samples were randomly chosen as the training dataset and the rest as the test dataset. To validate, the Random Forest model was built and evaluated with K-fold cross-validation (n_split = 5) to ensure independent results (Figueroa et al., 2012; Mentch and Hooker, 2016). This involved dividing the data into five equal proportions, using four for training and one for testing each run. Model performance was assessed with the AUC of ROC curves (see Table S5).

Random Forest classifiers that support the (place which) in the main text are reported in the machine learning section at https://github.com/resendislab/Microbiome_two_treatments_Metformine-Linagliptine.

Moreover, we assess the relevance of bacterial genera with Shap values (Shapley additive explanations) using TreeExplainer for the Random Forest algorithm53. Shap values use a game-theoretic approach for the best model interpretation and explanation.

Finally, we pairwise compared baseline groups vs both treatments (M and LM) and treatments (M vs LM) at 6 and 12 months of follow-up. SHAP values were used to explore the relevance of the classification process for each genus, quantifying its contribution to classification53. Microbiota data faces challenges of technical noise, zero-inflated abundance distribution, and high-dimensionality54. However, the random forest model effectively classifies and analyzes microbiome data under these conditions55. AUC-ROC was 0.79 and 0.74 for the basal vs M group at 6 and 12 months, respectively, indicating successful training and test data set selection.

Statistical analyses for 16S metagenomics and their correlations with clinical parameters

Statistical analyses for clinical parameters

We estimated the required sample size to observe an effect on GM changes among the treated groups. Briefly, the sample size for this study was determined both a priori and a posteriori using different analytical approaches. In the a priori analysis, we employed an ANOVA (Analysis of Variance) with repeated measures, factoring in two effective groups and three time-point measurements. We aimed for an effect size of 0.25 and set the β error (Type II error) at 20%. This calculation indicated a required sample size of 48 patients. To account for potential dropouts and ensure robustness in our data, we increased this number by 20%, including 167 subjects. For a posteriori analysis, our approach differed slightly. We utilized an ANOVA without repeated measures and considered three effective groups. The effect size was set at 0.01 (eta squared). Under these parameters, the power of our sample was calculated to be 75%. Primary analysis: GM composition change and its relation to IR and insulin secretion in prediabetes patients. The effect of hypoglycemic drugs on IS and β-cells function is analyzed at two levels: (1) changes over follow-up months, and (2) considering drugs and follow-up months. The student's t-test for the first level included independent contrast between basal and 6/12-month groups and dependent contrast between 6 to 12-month subjects. Cohen's d (C-d) and 100-repetition bootstrap 95% confidence intervals were calculated as standardized effect sizes. For the second level, one-way ANOVA was performed, comparing means of clinical parameters for three treatment groups and follow-up months. Two-way ANOVA examined the Time X Treatment impact, and Scheffe's post-hoc was used to identify differences between groups. Logarithmic transformation (base 10) for quantitative variables normalization with bias, and Van der Wader transformation for bacterial abundances.

After machine learning analysis identifying bacteria classifying each group by hypoglycemic drugs and follow-up months, GLMM was performed with IS indices and Pβf as dependent variables. Fixed factors included study group, follow-up month, sex, age, bacteria genera used for classification (Random Forest), and anthropometric parameters (BMI, weight, % body fat, waist circumference). Random factors were stool samples of study subjects. Eta-squared and 95% confidence intervals were calculated with 100 repetitions bootstrap. Interactions (bacteria X Treatment, Bacteria X Time, and Treatment X Time) evaluated. For bidirectional effect, models are executed with diversity and microbial abundance as dependent variables, including study groups, sex, age, IS and Pβf indices, or anthropometric parameters (BMI, weight, % body fat, waist circumference) as fixed factors. Bacteria with significance in both models were considered for subsequent statistical and structural equation modeling (SEM) analysis. Statistical analysis used Stata/SE 16.0, IBM-SPSS version 25, and RStudio version 4.1.1.

Statistical analysis of 16S metagenomics and its relationship with clinical parameters

To establish GM composition's relationship with IR and insulin secretion in prediabetes patients on hypoglycemic drugs, hierarchical linear regression, GLMM, and structural equation modeling tests were performed. Van der Wader transformation was applied to each bacterial genus for the tests. Statistical analysis was done using Stata/SE 16.0, IBM-SPSS version 25, and RStudio version 4.1.1.

Linear regression was used with IS indices and Pβf as dependent variables. Initially, regression models estimated treatment and follow-up months (by subject) as the main effects and interactions. Subsequently, models were estimated by visiting, sex, and age. Residuals and effect size estimated for each.

The structural equation model (SEM) was used to mediate between BMI, bacterial abundance, significant components, insulin secretion, and IS indices. Coefficients are estimated by a robust method.

Mixed models were used to study hypoglycemic drugs and the effects of new metabolic conditions on bacterial genera. GLMM (Tweedie distribution) was used, with bacterial genera as the dependent variable and hypoglycemic drugs as the independent variable. The model was adjusted for gender, age, BMI, and Matsuda index or Pβf. A stool sample is used as a random effect to consider the within-patient correlation with repeated measures.