Cardiovascular diseases (CVDs), including ischemic heart disease and stroke, are the leading causes of death worldwide. An estimated 20.5 million people died from CVDs in 2021, accounting for roughly one-third of all global deaths and becoming the leading cause of premature noncommunicable disease mortality, particularly in low- and middle-income countries1. While CVD is also the leading cause of death among diabetic patients, both diabetes and prediabetes are independent risk factors for CVD2, indicating the clinical significance of CVD in patients with prediabetes and diabetes.

The National Diabetes Statistics Report for 2020 estimates that 34.1 million Americans aged ≥ 18 years (13.0% of US adults) have diabetes, with approximately 26.8 million diagnosed and 7.3 million undiagnosed (source: https://www.cdc.gov/diabetes/php/data-research/index.html). Prediabetes, defined as blood glucose levels that are elevated but below the diabetic threshold, is a prelude to diabetes and indicates a high risk of developing it. Although the annual conversion rate from prediabetes to diabetes varies from 5 to 15%, depending on population characteristics and the prediabetes definition used, up to 70% of people with prediabetes are expected to develop diabetes3. Because prediabetes and diabetes substantially increase the risk of CVD, considerable progress has been made in recent years in the pharmacological treatment of diabetes. Newer drugs such as sodium-glucose cotransporter-2 inhibitors and glucagon-like peptide-1 receptor agonists not only improve glycemic control but also lower the risk of cardiovascular events4. These developments highlight the changing landscape of diabetes care and the importance of taking these factors into account when assessing CVD risk in diabetic populations.

While diabetes and prediabetes are two distinct risk factors for CVD, they share some etiological factors4. Diabetes causes an increase in reactive oxygen species (ROS), promoting prothrombotic and inflammatory responses, hastening atherosclerosis, and contributing to the development of macrovascular complications4. Consequently, individuals with diabetes have a high risk of developing CVD, such as myocardial infarction, stroke, and peripheral artery disease5. At the same time, diabetes affects patients with various CVDs6. Similarly, prediabetes is associated with an increased risk of diabetes and CVD in the general population7 and in patients with atherosclerotic CVD8. Despite having a pathology similar to that of diabetes, cardiovascular complications caused by prediabetes are frequently overlooked and undertreated.

CVD is a clinical disorder in which lifestyle choices, genetic predispositions, comorbid conditions, and inflammatory responses converge to affect cardiac and circulatory health. Demographic and socioeconomic variables, such as age, marital status, educational attainment, and income bracket, are closely related to CVD prevalence9,10,11. The condition further correlates with unhealthy habits, including sedentariness, physical inactivity, and excessive weight, along with detrimental practices, such as heavy smoking and alcohol consumption. Comorbidities, such as hypertension, sleep disorders, and gastrointestinal diseases, are associated with cardiovascular health12,13. Moreover, inflammation is increasingly recognized as a pivotal contributor to CVD pathogenesis. In contrast, lifestyle modifications, including regular physical activity, interruption of prolonged sedentary periods, and nutritious diet, exhibit protective and therapeutic benefits against diabetes and CVD, reducing recurrence and aiding health restoration14.

Predicting the risk of CVD in patients with diabetes and prediabetes can enable more effective management of these patients by tailoring therapy to those who are most at risk. However, understanding the multifactorial nature of CVD in patients with prediabetes and diabetes requires a nuanced and thorough approach, and the incorporation of risk factors into a single numerical estimate of CVD risk in these patients has not been thoroughly studied or validated. Therefore, we aimed to develop and validate a Boruta algorithm-based nomogram15,16 for a straightforward assessment of CVD risk in prediabetes and diabetes. This tool is intended to facilitate prompt identification of risk factors, thereby streamlining early intervention strategies and rehabilitation efforts. Specifically, we developed and validated two distinct nomogram models for evaluating CVD risk in separate cohorts of patients with prediabetes and diabetes, enabling a more targeted approach to their clinical management.

Methods

Study design

This cross-sectional study used data from the National Health and Nutrition Examination Survey (NHANES), a nationally representative database that collects information on the health and nutritional status of adults and children living in the United States of America. The NHANES database is updated annually by the National Center for Health Statistics of the Centers for Disease Control and Prevention. The NHANES uses a multistage probability sampling framework to select approximately 5000 participants every 2 years who undergo comprehensive personal interviews and detailed physical examinations. In this study, we pooled data from six survey cycles conducted between 2007 and 2018. After data cleaning and imputation, the final dataset for analysis included 2294 participants with prediabetes and 1037 participants with diabetes, with CVD prevalence rates of 14.8% and 28.2%, respectively.

Data imputation

This study addressed the inherent complexities of the NHANES, which employs a stratified multistage probability sampling design. To handle the weighted nature of the NHANES database, our analyses were performed using the jomo package, an R package designed for multilevel joint modeling and multiple imputation. For imputation, we implemented a survey-weighted generalized linear model within a Bayesian model-fitting framework to account for the data structure. Gibbs sampling was used to generate five imputed datasets after a 1000-iteration burn-in period. This burn-in was intended to ensure stochastic independence among the imputed sets, a necessary condition for valid subsequent statistical inference.

Participant consolidation and cohort assignment

As shown in Fig. 1, the initial aggregation yielded a pooled cohort of 59,842 participants, of whom 15,004 had prediabetes and 6384 had diabetes. After excluding those without complete CVD data (12,710 with prediabetes and 5347 with diabetes), 2294 participants with prediabetes and 1037 participants with diabetes remained eligible for further analysis. Within each cohort, participants were randomly allocated to a training cohort (three-quarters of the data), used to develop and refine the predictive model, or a validation cohort (the remaining quarter), used to test the predictive capacity of the model against a held-out sample and thereby increase the reliability and validity of the study results.
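The exclusion arithmetic and the three-to-one allocation described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R workflow: the function names, the seed, and the use of simple truncation to set the training size are assumptions, so the resulting split sizes need not match the reported ones exactly.

```python
import random

# Counts reported in the participant-selection paragraph.
POOLED = {"prediabetes": 15004, "diabetes": 6384}
EXCLUDED = {"prediabetes": 12710, "diabetes": 5347}


def eligible_counts(pooled, excluded):
    """Participants remaining after removing those without complete CVD data."""
    return {group: pooled[group] - excluded[group] for group in pooled}


def random_split(ids, train_fraction=0.75, seed=0):
    """Shuffle the identifiers and cut them into training and validation sets.

    The seed and the truncation rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


eligible = eligible_counts(POOLED, EXCLUDED)
train_ids, valid_ids = random_split(range(eligible["prediabetes"]))
```

The subtraction reproduces the eligible cohort sizes given in the text (2294 and 1037), which is a quick consistency check on the flow diagram.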

Fig. 1 Selection of study participants.

Variables

Definition of prediabetes, diabetes, and CVD

Prediabetes was defined as the presence of any of the following conditions: a fasting plasma glucose level of 100–125 mg/dL, a glycated hemoglobin (HbA1c) level of 5.7–6.4%, physician-diagnosed prediabetes, impaired fasting glucose, impaired glucose tolerance, borderline diabetes, or a blood sugar level higher than normal but not high enough to be classified as diabetes or sugar diabetes17. Diabetes was defined as the presence of any of the following conditions: a fasting glucose level of ≥ 126 mg/dL, an HbA1c level of ≥ 6.5%, physician-diagnosed diabetes, insulin injection, or intake of antidiabetic pills to lower blood sugar18. Patients with CVD were defined as those with a history of any diagnosed cardiovascular condition, such as stroke, congestive heart failure, coronary heart disease, angina/angina pectoris, or heart attack19.

Definition of risk factors

The risk factors included in the study were comorbidities, health behaviors, complete blood count, and sociodemographic factors. The comorbidities included sleep disorders, stomach and intestinal illnesses, and hypertension. Answering "YES" to the following question indicated sleeping disorders: Have you ever told your doctor or another healthcare professional that you have trouble sleeping? Answering "YES" to the following question indicated stomach or intestinal illness: Have you had stomach or intestinal illness with vomiting or diarrhea within the last 30 days? Answering "YES" to the following question indicated hypertension: Have you ever been told by a doctor or another health professional that you have hypertension or high blood pressure?

Health behaviors included physical activity, sedentary time, body mass index (i.e., underweight, normal, overweight, obese, and morbidly obese), smoking (smokers vs. nonsmokers), heavy alcohol consumption, sleep duration, and oral health. The International Physical Activity Questionnaire was used to assess daily physical activities such as vigorous work or recreational activity (VPA), moderate work or recreational activity (MPA), and light physical activity (LPA) such as walking. The weekly physical activity volume was calculated as the metabolic equivalent of task (METs)-min per week by multiplying the MET value given to the VPA (8.0 METs), MPA (4.0 METs), and LPA (3.0 METs) by the minutes the activity was carried out, and again by the number of days that each activity was undertaken. Physical activity volume was classified as low (< 500 MET-min/week), moderate (500–1000 MET-min/week), or high (> 1000 MET-min/week)20.
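The weekly activity volume described above (MET value × minutes per day × days per week, summed over intensities) can be written out as a short sketch. The MET weights and category cut-offs are taken from the text; the function names and the input layout are assumptions made for illustration.

```python
# MET weights stated in the text for each activity intensity.
MET_VALUES = {"VPA": 8.0, "MPA": 4.0, "LPA": 3.0}


def weekly_met_minutes(activity):
    """Return MET-min/week.

    `activity` maps an intensity label to (minutes per day, days per week);
    this input layout is a hypothetical convenience, not an NHANES format.
    """
    return sum(
        MET_VALUES[intensity] * minutes * days
        for intensity, (minutes, days) in activity.items()
    )


def classify_volume(met_minutes):
    """Apply the low / moderate / high cut-offs given in the text."""
    if met_minutes < 500:
        return "low"
    if met_minutes <= 1000:
        return "moderate"
    return "high"


# Example: 30 min of vigorous activity on 2 days plus 20 min of walking on 5 days.
example = {"VPA": (30, 2), "LPA": (20, 5)}
volume = weekly_met_minutes(example)   # 8*30*2 + 3*20*5 = 780
category = classify_volume(volume)     # "moderate"
```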

Fasting blood samples were collected at NHANES mobile examination sites and sent to the National Center for Environmental Health and the Centers for Disease Control and Prevention. A Beckman Coulter MAXM instrument (UniCel DxH 800 analyzer, Beckman Coulter, Inc., Brea, California, United States) was used to perform a complete blood count (CBC), with CBC parameters calculated by counting, sizing, automatic dilution, and mixing of samples. The systemic immune inflammation index (SII) was calculated as the product of the platelet and neutrophil counts divided by the lymphocyte count. The systemic inflammatory response index (SIRI) was calculated as the product of the monocyte and neutrophil counts divided by the lymphocyte count. The pan-immune inflammation value (PIV) was calculated as the product of the platelet, neutrophil, and monocyte counts divided by the lymphocyte count. The neutrophil/lymphocyte ratio (NLR) was calculated as the number of neutrophils divided by the number of lymphocytes. The platelet/lymphocyte ratio (PLR) was calculated by dividing the platelet count by the lymphocyte count. The monocyte/lymphocyte ratio (MLR) was calculated by dividing the monocyte count by the lymphocyte count. The aggregate index of systemic inflammation (AISI) was calculated as the product of the neutrophil, platelet, and monocyte counts divided by the lymphocyte count. The triglyceride-glucose index (TyG) was calculated as ln[fasting triglycerides (mg/dL) × fasting blood glucose (mg/dL)/2]. White blood cell count, hematocrit, red cell distribution width, hemoglobin, mean platelet volume, albumin, alanine aminotransferase, aspartate aminotransferase, total bilirubin, uric acid, osmolality, lactate dehydrogenase, total cholesterol, high-density lipoprotein cholesterol (HDLC), and low-density lipoprotein cholesterol (LDLC) were also included. Detailed procedures for specimen collection and processing are available on the NHANES website (https://www.cdc.gov/nchs/nhanes/index.htm).
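For concreteness, the composite indices defined above can be computed from the four cell counts as in the sketch below. The function names and the example values are hypothetical. AISI is taken here as the product of the neutrophil, platelet, and monocyte counts divided by the lymphocyte count (its conventional definition), which makes it numerically identical to PIV.

```python
import math


def inflammatory_indices(neutrophils, lymphocytes, monocytes, platelets):
    """Composite CBC-derived indices; all counts must share the same units."""
    return {
        "SII": platelets * neutrophils / lymphocytes,
        "SIRI": monocytes * neutrophils / lymphocytes,
        "PIV": platelets * neutrophils * monocytes / lymphocytes,
        "AISI": neutrophils * platelets * monocytes / lymphocytes,
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "MLR": monocytes / lymphocytes,
    }


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """Triglyceride-glucose index: ln(TG x glucose / 2), both in mg/dL."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


# Hypothetical counts (10^3 cells/uL) chosen only to make the arithmetic easy.
indices = inflammatory_indices(
    neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0
)
tyg = tyg_index(150.0, 100.0)  # ln(7500)
```

With these example counts the sketch gives SII = 500, SIRI = 1.0, PIV = AISI = 250, NLR = 2.0, PLR = 125, and MLR = 0.25. Because several indices share the same numerator terms, strong correlation among them is expected.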

Finally, social and demographic factors included sex (female vs. male), race (non-Hispanic white people, non-Hispanic black people, Mexican Americans, other Hispanics, and other races), education (college graduate or above, less than college), marital status (married/living with partner, widowed/divorced/separated, never married), and family income to poverty ratio (> 4, 1–4, < 1).

Statistical analysis

Normally distributed data are represented using means and standard deviations, whereas non-normally distributed data are represented using medians and quartiles. Categorical variables are represented as counts and percentages (%), and group comparisons were performed using the chi-square test for categorical variables. Pearson’s correlation analysis was performed for all the variables.

We then used the Boruta feature selection algorithm to identify optimal predictors for CVD diagnosis. Multicollinearity was detected by calculating the variance inflation factor (VIF) and was removed by sequentially excluding the variable with the largest VIF from the dataset until all variables had a VIF of < 4. The selected features were used to develop a web-based nomogram to assess CVD risk. In the nomogram model, each risk factor was assigned points based on its odds ratio (OR). The probability of CVD was obtained by summing the points for all risk factors and reading the corresponding value on the total-point scale. The goodness of fit and stability of the nomogram model were tested using the Hosmer–Lemeshow test. The accuracy of the nomogram model was evaluated using internal bootstrap validation with 1,000 random resamples drawn with replacement. The discriminative ability and predictive performance of the nomogram model were evaluated using receiver operating characteristic (ROC) curves and calibration plots. The clinical utility of the model was evaluated using decision curve analysis (DCA) and clinical impact curves (CIC). All statistical analyses and graph visualizations were performed using R-4.3.2 for Windows.

Results

Descriptive statistics of study participants

Overall, this study enrolled 2,294 individuals with prediabetes and 1,037 individuals with diabetes who participated in the 2007–2018 NHANES. Patients with prediabetes were randomly assigned to either a training cohort (n = 1,719) or a validation cohort (n = 575). Similarly, patients with diabetes were randomly assigned to either a training cohort (n = 777) or a validation cohort (n = 260). Considering prediabetes, the prevalence of CVD was 14.8%, with 14.8% in the training cohort (n = 255) and 14.9% in the validation cohort (n = 86). Considering only diabetes, the prevalence of CVD was 28.2%, with 28.1% in the training cohort (n = 219) and 28.1% in the validation cohort (n = 73).

The descriptive statistics of prediabetes and diabetes cohorts according to the CVD status are presented in Table 1. Regarding demographic characteristics, individuals with prediabetes with CVD were older, more likely to be male and non-Hispanic white, had a lower income, were less likely to be married and living with a partner, were more likely to be obese, drank less, smoked less, had poor oral health, had less sleep time, had higher rates of sleep disorders, hypertension, and stomach intestinal illness, were less physically active, and were more sedentary than their CVD-free counterparts. Individuals with diabetes and CVD were older, predominantly non-Hispanic white, less likely to be married or living with a partner, more likely to be former smokers, less likely to be physically active, more likely to be sedentary, had less sleep time, and had higher rates of hypertension and sleep disorders than their CVD-free counterparts.

Table 1 Descriptive statistics of study participants by prediabetes and diabetes status.

The CVD risk predictors in prediabetes and diabetes cohorts are presented in Table 2. In the prediabetes cohort, CVD risk factors included age, smoking status, MLR, platelets, WBCs, red cell distribution width, lactate dehydrogenase (LDH), sleep disorders, and hypertension. In the diabetes cohort, the CVD risk factors were age, smoking, marital status, SIRI, NLR, red blood cell distribution width, LDH, HDLC, sleep disorders, hypertension, and physical activity. Additionally, the heatmap in Fig. 2 shows that the association between the variables was similar in the prediabetes and diabetes cohorts.

Table 2 Nomogram predictors score range by prediabetes and diabetes.

Fig. 2 The correlation between variables in the prediabetes and diabetes cohorts.

Determination of CVD risk factors

Figure 3 presents the outcomes of Boruta feature selection. The Boruta algorithm is a feature selection technique built on the random forest algorithm. It generates shadow features by permuting the original features and fits a random forest model to the combined set. During the selection process, an original feature is considered confirmed if its importance consistently exceeds that of the best shadow feature. Confirmed features are marked in green for further analyses, whereas excluded features are marked in red (“rejected”) or blue (“tentative”). In our study, based on Boruta feature selection, age, smoking status, SII, SIRI, NLR, MLR, AISI, platelet count, white blood cell count, red cell distribution width, LDH, sleep disorders, and hypertension were chosen to develop a nomogram model for the prediabetes cohort (Fig. 3a). Age, marital status, smoking status, SIRI, NLR, red cell distribution width, LDH, HDL cholesterol, sleep disorders, hypertension, and physical activity were selected to develop a nomogram model for the diabetes cohort (Fig. 3b).
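The confirm/reject/tentative decision can be illustrated with a simplified sketch of the comparison against shadow features. The sketch does not fit a random forest: it assumes that importance scores per run are already available and applies a plain binomial test without the multiple-testing correction used by the reference Boruta implementation. All names and numbers are hypothetical.

```python
from math import comb


def binom_tail_ge(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_decisions(importance_runs, shadow_max_runs, alpha=0.05):
    """Label each feature from its record of 'hits' against the best shadow.

    importance_runs: {feature: [importance in run 1, run 2, ...]}
    shadow_max_runs: [largest shadow-feature importance in each run]
    """
    n_runs = len(shadow_max_runs)
    decisions = {}
    for feature, scores in importance_runs.items():
        # A hit is a run in which the feature beats the best shadow feature.
        hits = sum(s > m for s, m in zip(scores, shadow_max_runs))
        if binom_tail_ge(hits, n_runs) < alpha:
            decisions[feature] = "confirmed"   # hits far above chance
        elif binom_tail_ge(n_runs - hits, n_runs) < alpha:
            decisions[feature] = "rejected"    # hits far below chance
        else:
            decisions[feature] = "tentative"
    return decisions


# Hypothetical importance histories over 10 runs.
shadow_max = [0.10] * 10
runs = {
    "age": [0.50] * 10,          # always beats the best shadow
    "noise": [0.01] * 10,        # never beats it
    "mixed": [0.20, 0.05] * 5,   # beats it half the time
}
labels = boruta_decisions(runs, shadow_max)
```

In this toy input, the feature that always beats the shadows is confirmed, the one that never does is rejected, and the one that wins half the time stays tentative.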

Fig. 3 Boruta algorithm feature selection.

Elimination of multicollinearity

After feature selection, multicollinearity was eliminated. Multicollinearity refers to the degree of information shared among predictors, which makes it difficult to identify the individual effect of each independent variable on the dependent variable. We evaluated multicollinearity by calculating the VIF and eliminated it by removing the variable with the highest VIF one at a time until all remaining variables had a VIF < 4. We excluded SII, SIRI, NLR, and AISI from the prediabetes model because they all had a VIF ≥ 4 (Supplemental Table 1). Subsequently, we developed a web-based dynamic nomogram using the selected features.
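The sequential removal rule can be sketched in a few lines. The version below is a dependency-free illustration that uses the identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. It is not the authors' R code, and the toy data and function names are assumptions.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of a {name: [values]} mapping."""
    names = list(columns)
    n = len(columns[names[0]])
    z = {}
    for name in names:
        mu, sd = mean(columns[name]), pstdev(columns[name])
        z[name] = [(v - mu) / sd for v in columns[name]]
    return [[sum(a * b for a, b in zip(z[p], z[q])) / n for q in names] for p in names]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    size = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def vifs(columns):
    """VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return {name: inverse[i][i] for i, name in enumerate(columns)}


def drop_high_vif(columns, threshold=4.0):
    """Remove the predictor with the largest VIF until all are below threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vifs(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return kept


# Toy data: x1 and x2 are nearly collinear, x3 is unrelated to both.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "x3": [3.0, 1.0, 2.0, 2.0, 1.0, 3.0],
}
kept = drop_high_vif(data)
```

In the toy data one of the two collinear columns is dropped and the unrelated column is retained. This mirrors why indices that share numerator terms, such as SII, SIRI, NLR, and AISI, are natural candidates for removal.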

Goodness of fit testing

The Hosmer–Lemeshow (HL) test, a standard method for assessing goodness of fit, was used to evaluate the performance and robustness of the nomogram model21. This test assesses the concordance between observed outcomes and predictions by dividing the predicted probabilities into deciles and comparing the observed frequencies of events within these deciles to the expected frequencies calculated from the model. The HL test yielded p-values of 0.9364 and 0.9475 for the training and validation sets, respectively, in the prediabetes prediction model. The diabetes prediction model had p-values of 0.6157 and 0.7271 for the training and validation sets, respectively. These p-values are well above the conventional threshold of 0.05, indicating no evidence to reject the hypothesis of adequate model fit. These findings support the calibration of the models, indicating that the predicted probabilities of CVD risk closely match the observed outcomes in both the prediabetes and diabetes cohorts.
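A minimal sketch of the HL statistic follows, assuming equal-sized groups formed from the sorted predicted probabilities and a chi-square reference distribution with (groups − 2) degrees of freedom. The closed-form survival function used here is valid only for even degrees of freedom, which covers the usual ten-group case. The function names and the toy data are hypothetical.

```python
from math import exp, factorial


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """HL statistic: sum over groups of (O - E)^2 / (n * pbar * (1 - pbar)).

    Assumes every group is non-empty and has a mean predicted
    probability strictly between 0 and 1.
    """
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


# Toy example with two risk groups (too few groups for a real test).
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = hosmer_lemeshow([1, 0, 0, 0, 0, 1, 1, 1, 1, 0], probs, groups=2)
miscalibrated = hosmer_lemeshow([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], probs, groups=2)
```

With ten groups the p-value would be `chi2_sf_even_df(statistic, 8)`. A large p-value, as reported above, means the test found no evidence of miscalibration. It does not by itself prove that calibration is good.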

Construction of nomograms for predicting CVD risk

Figure 4 shows the construction of nomograms for predicting the risk of CVD in the prediabetes and diabetes cohorts. Each predictor received a score ranging from 0 to 100 points, with a score range that depended on its level. The points for all predictors were summed, and a vertical line was drawn downward from the total to read off the probability of CVD. A higher total score indicated an increased risk of CVD in the prediabetes and diabetes cohorts. We created a web calculator to make it easier and more intuitive to predict CVD risk probability (Prediabetes prediction calculator: https://zhaol2022713269.shinyapps.io/dynnomapp-1/, Diabetes prediction calculator: https://zhaol2022713269.shinyapps.io/dynnomapp-2/).
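How a logistic nomogram turns predictor values into points and points into a probability can be sketched as follows. The intercept, coefficients, and predictor ranges below are invented for illustration and are not the fitted values from this study. The scaling convention, in which the predictor with the widest effect range spans 0 to 100 points, is the usual one but is stated here as an assumption.

```python
from math import exp

# Hypothetical model: NOT the coefficients estimated in the study.
INTERCEPT = -5.0
PREDICTORS = {
    "age": {"beta": 0.05, "low": 20.0, "high": 80.0},
    "hypertension": {"beta": 1.0, "low": 0.0, "high": 1.0},
}


def _span(spec):
    """Width of the predictor's contribution to the linear predictor."""
    return abs(spec["beta"]) * (spec["high"] - spec["low"])


def _floor(spec):
    """Smallest contribution the predictor can make over its range."""
    return min(spec["beta"] * spec["low"], spec["beta"] * spec["high"])


# The predictor with the widest contribution defines the 100-point scale.
MAX_SPAN = max(_span(spec) for spec in PREDICTORS.values())


def points(name, value):
    """Points for one predictor value on the 0-100 scale."""
    spec = PREDICTORS[name]
    return 100.0 * (spec["beta"] * value - _floor(spec)) / MAX_SPAN


def probability_from_points(total_points):
    """Map total points back to the linear predictor, then to a probability."""
    base = INTERCEPT + sum(_floor(spec) for spec in PREDICTORS.values())
    linear_predictor = base + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + exp(-linear_predictor))


patient = {"age": 50.0, "hypertension": 1.0}
total = sum(points(name, value) for name, value in patient.items())
risk = probability_from_points(total)
```

Under these invented numbers, a 50-year-old with hypertension scores about 83 points, which maps to a probability of roughly 0.18. The point of the sketch is only that the total-point axis is a rescaled linear predictor.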

Fig. 4 Nomogram models for predicting the probability of cardiovascular disease (CVD) risk in the prediabetes and diabetes cohorts.

The area under the ROC curve (AUC) was used to determine the predictive performance of the two nomogram models for CVD risk. The nomogram for the prediabetes cohort was effective in detecting CVD, with an AUC of 0.800 (95% confidence interval (CI): 0.780–0.818, p < 0.001), a sensitivity of 82.75%, and a specificity of 63.87% in the training cohort (Fig. 5a). The validation cohort maintained predictive strength, with a higher AUC of 0.842 (95% CI 0.809–0.871, p < 0.001), a sensitivity of 89.53%, and a specificity of 68.30% (Fig. 5b). Similarly, the nomogram for the diabetes cohort showed a significant AUC of 0.779 (95% CI 0.748–0.807, p < 0.001), with a sensitivity of 68.95% and a specificity of 71.86% in the training set (Fig. 5c). The nomogram demonstrated a moderate but significant AUC of 0.728 (95% CI 0.669–0.781, p < 0.001) in the validation cohort, with a sensitivity of 80.82% and a specificity of 60.43% (Fig. 5d).
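For readers less familiar with these metrics, the sketch below computes the AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties count one half), together with sensitivity and specificity at a chosen cut-off. The data and function names are illustrative only.

```python
def auc(y_true, scores):
    """Rank-based AUC: share of (case, non-case) pairs ordered correctly."""
    cases = [s for s, y in zip(scores, y_true) if y == 1]
    controls = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(y_true, scores, threshold):
    """Classify as positive when the score is at or above the threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Four hypothetical participants: two without CVD, two with CVD.
labels = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]
area = auc(labels, risks)                                    # 3 of 4 pairs ordered correctly
sens, spec = sensitivity_specificity(labels, risks, 0.35)
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off visible in the reported pairs (for example, 89.53% sensitivity with 68.30% specificity).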

Fig. 5 Receiver operating characteristic (ROC) analyses of the predictive nomograms for cardiovascular disease (CVD) risk in the prediabetes (a,b) and diabetes cohorts (c,d).

Performance assessment of the nomogram for predicting CVD risk

Internal validation using 1,000 bootstrap samples revealed that the nomogram model for prediabetes had a C-index of 0.800 in the training cohort with a mean absolute error (MAE) of 0.017 (Fig. 6a). The calibration curve in the training cohort showed acceptable calibration, with a Brier score of 0.107, indicating little difference between the actual and predicted probabilities. In the validation cohort, the nomogram model for prediabetes had a C-index of 0.863 and an MAE of 0.03 (Fig. 6b). The calibration curve in the validation cohort also showed acceptable calibration, with a Brier score of 0.095, indicating that the predicted probabilities were closely aligned with the actual CVD rates. In parallel, the nomogram for diabetes had a C-index of 0.779 and an MAE of 0.026 in the training cohort (Fig. 6c), with a Brier score of 0.162. The validation cohort yielded a consistent C-index of 0.780 and a slightly higher MAE of 0.058 (Fig. 6d), with a Brier score of 0.159, supporting the agreement between the predicted probabilities and the actual CVD rates.
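The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better. As a rough point of reference (an addition here, not an analysis from the study), a model that assigns every participant the cohort prevalence π has an expected Brier score of π(1 − π). The sketch below computes both, using the prevalence figures reported in the text.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def prevalence_only_brier(prevalence):
    """Brier score of a constant prediction equal to the prevalence."""
    return prevalence * (1 - prevalence)


# Hypothetical predictions for four participants.
toy_score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])

# Reference values implied by the reported CVD prevalence in each cohort.
reference_prediabetes = prevalence_only_brier(0.148)  # about 0.126
reference_diabetes = prevalence_only_brier(0.282)     # about 0.202
```

Against these references, the reported Brier scores (0.095–0.107 for prediabetes and 0.159–0.162 for diabetes) are lower than what a prevalence-only prediction would give. This is the sense in which they indicate useful probabilistic accuracy.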

Fig. 6 Calibration curves of the nomograms for predicting cardiovascular disease (CVD) risk in the prediabetes (a,b) and diabetes cohorts (c,d).

DCA was used to assess the clinical applicability of the nomogram for prediabetes (Fig. 7a). The DCA in the training cohort (green line) showed that, at threshold probabilities between 1 and 84%, the nomogram provided a greater net benefit than intervening in all patients or in none. The DCA in the validation cohort (red line) showed that the nomogram provided a greater net benefit than either default strategy at threshold probabilities between 1 and 96%. Parallel evaluation of the nomogram for diabetes confirmed this pattern of clinical relevance (Fig. 7b). In the training cohort, the nomogram (green line) outperformed intervening in all patients or in none across threshold probabilities ranging from 1 to 91%. The DCA of the validation cohort (red line) confirmed the applicability of the nomogram, showing an increased net benefit across threshold probabilities from 1 to 93%.
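The net benefit plotted in a decision curve is the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability. The sketch below computes it for a model and for the "intervene in everyone" strategy ("intervene in no one" has a net benefit of zero by definition). The data and function names are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening in every participant."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Five hypothetical participants, two of whom have CVD.
outcomes = [1, 1, 0, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.2, 0.1]
model_nb = net_benefit(outcomes, predicted, threshold=0.5)      # 2/5 - 0 = 0.4
treat_all_nb = net_benefit_treat_all(outcomes, threshold=0.5)   # 0.4 - 0.6 = -0.2
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the comparison summarized by the threshold ranges reported above.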

Fig. 7 Decision curve analysis and clinical impact curves of the nomograms for predicting cardiovascular disease (CVD) risk in the prediabetes (a) and diabetes cohorts (b).

Finally, clinical impact curve (CIC) analysis was used to stratify CVD risk. We observed a similar pattern in the prediabetes and diabetes cohorts at low risk thresholds. In the training cohort, the number of patients classified as having CVD by the nomogram model differed from the number of true CVD cases (Figs. 8a,c). However, in the validation cohort, the number of patients with CVD predicted by the nomogram model at low risk thresholds was similar to the number of true CVD cases (Figs. 8b,d). At high risk thresholds, the predicted and true-positive curves were more closely aligned in both the training and validation cohorts. Overall, our findings suggest that using the nomogram models to predict CVD in patients with prediabetes and diabetes is more clinically useful than intervening in all patients or in none.

Fig. 8 Clinical impact curve analysis for probability stratification of cardiovascular disease (CVD) risk in the training (a,c) and validation cohorts (b,d).

Discussion

This population-based cross-sectional study aimed to develop and validate nomogram models for predicting CVD risk in patients with prediabetes and diabetes. The optimal predictors used in the nomograms were selected using the Boruta feature-selection algorithm. The fit and robustness of the models were assessed using the HL test. The predicted outcomes of the nomograms were in good agreement with the observed outcomes, as supported by the calibration curves, ROC curve analyses, and C-index values. The C-index of the nomogram ranged from 0.800 to 0.863 for prediabetes and was approximately 0.78 for diabetes. The DCA results showed that the nomograms developed in our study have clinically relevant value for improving predictive precision and guiding clinical decision-making regarding CVD risk in patients with prediabetes and diabetes.

Our study findings revealed that both the prediabetes and diabetes cohorts shared some risk factors for CVD, including age, smoking status, red blood cell distribution width, LDH, sleep disorders, and hypertension. We somewhat expected the current findings, given the prevalence of shared risk factors among individuals with prediabetes, diabetes, or CVD. Overall, prediabetes and diabetes can contribute to CVD via various macrovascular complications, such as proinflammation, endothelial dysfunction, stroke, myocardial dysfunction, congestive heart failure, peripheral vascular diseases, and others22,23.

Although we used data from the USA for our study, it is critical to acknowledge the potential variations in cardiovascular disease (CVD) risk factors across different populations, including those in China and Korea. Genetic predispositions, dietary habits, lifestyle factors, and ethnic groups may contribute to these differences. For example, the dietary habits prevalent in East Asian populations, which often include higher intakes of fish, vegetables, and soy products compared to the Western diet, may contribute to ethnic differences in cholesterol levels and hypertension24. Furthermore, smoking rates, a significant CVD risk factor, vary markedly between regions, with traditionally higher prevalence among East Asian men relative to their Western counterparts25. Additionally, genetic polymorphisms affecting lipid metabolism may contribute to ethnic differences in the susceptibility to CVD26. Finally, while traditional CVD risk factors like smoking, blood pressure, and total cholesterol predict risk across ethnic groups, they do not fully account for ethnic differences in CVD27,28, because there have been reports of ethnic differences in visceral adiposity, insulin resistance, diabetes, and novel risk markers such as CRP, adiponectin, and plasma homocysteine. As a result, a future study that takes ethnicity into account will help improve risk prediction models in people with prediabetes and diabetes, as well as create and test ethnicity-specific models for predicting CVD risk29.

When comparing our model to well-established CVD risk prediction tools30,31, it becomes evident that traditional risk factors like age, smoking status, blood pressure, cholesterol levels, and BMI are typically emphasized in these tools. Furthermore, adding new markers like PLR, SIRI, RDW, and LDH to our model is based on new evidence from recent studies32,33 and makes it more accurate than traditional models, especially in high-risk groups like diabetics, where metabolic dysregulation is key for cardiovascular outcomes. Additionally, our findings enhance our understanding of CVD risk by emphasizing the importance of inflammatory indices. This is consistent with recent literature, which emphasizes the importance of inflammation in cardiovascular outcomes. For example, the link between SII and CVD has been extensively studied, highlighting the prognostic value of inflammatory markers in predicting cardiovascular events34,35.

Our CVD risk prediction nomograms have important practical implications in clinical settings. Compared to existing risk assessment tools recommended by the American Diabetes Association (ADA) and the American Heart Association (AHA)36, our model offers enhanced capabilities for early identification and personalized risk stratification of cardiovascular disease in patients with prediabetes and diabetes. By incorporating both traditional and novel risk factors, our model is consistent with the growing body of evidence supporting the inclusion of inflammatory and hematological markers in risk stratification frameworks. This comprehensive approach improves predictive accuracy while also aligning with the current trend toward more integrative and patient-centered care. Our nomogram's use in a variety of clinical settings may enable more effective prevention strategies and optimized therapeutic interventions for those at high risk of CVD, ultimately leading to better long-term cardiovascular health outcomes.

Currently, there are ten classes of orally available pharmacological agents to treat type-2 diabetes mellitus. They include sulfonylureas, meglitinides, metformin, thiazolidinediones, alpha-glucosidase inhibitors, dipeptidyl peptidase IV inhibitors, bile acid sequestrants, dopamine agonists, sodium-glucose transport protein 2 inhibitors, and oral glucagon-like peptide 1 receptor agonists37. All the medications have glucose-lowering effects as well as lipid-lowering effects via various mechanisms of action37. Some medications help to prevent the progression from prediabetes to diabetes and CVD38 and to reduce major adverse cardiovascular events in patients with prediabetes and diabetes, as well as new diabetes onset39. In the larger context of diabetes treatment, recent years have seen significant advances that go beyond the development of new pharmacological agents. These include novel approaches to personalized medicine, advancements in patient monitoring technologies, and the integration of multidisciplinary care teams, all of which help diabetic patients achieve better cardiovascular outcomes. For example, advances in continuous glucose monitoring systems40 and the use of artificial intelligence41,42 to personalize treatment regimens have improved glycemic control while reducing fluctuations that increase cardiovascular risk. Furthermore, studies have demonstrated that multidisciplinary care approaches, involving coordinated management by endocrinologists, cardiologists, and dietitians, enhance treatment adherence and risk factor management, including blood pressure and lipid levels43. These non-pharmacological strategies are increasingly regarded as critical components of comprehensive diabetes care, supplementing the effects of medication. Accordingly, adopting a lifestyle that emphasizes a healthy diet and physical activity in addition to medications remains the foundation of management and can help prevent the progression from prediabetes to diabetes, as well as from diabetes to CVD44.

As our understanding of these treatment advances in the diabetic population is refined, it becomes essential to consider how each predictor contributes to the overall predictive model for CVD. By dissecting the impact of the predictors identified in this study, we can better understand their roles and interactions within this evolving framework.

Our study findings show that age is associated with an increased risk of both prediabetes and diabetes and is an independent risk factor for CVD worldwide45. CVDs, including hypertension, ischemic heart disease, atrial fibrillation, and heart failure, become more common as the population ages. Additionally, age is the most significant risk factor for sudden death due to arteriosclerotic heart disease46. As a result, many CVD risk prediction models have incorporated age as an independent risk factor47,48. Pathologically, aging leads to an accumulation of ROS-induced damage because the body's ability to scavenge ROS declines49. Aging is also associated with the gradual deterioration of homeostatic mechanisms50 and reparative capacities51, making older individuals especially vulnerable to the negative effects of chronic hyperglycemia, which is common in both prediabetes and diabetes. Prolonged exposure to high glucose levels promotes the production of advanced glycation end products, which worsen vascular damage by stiffening the collagen matrix and impairing nitric oxide-mediated vasodilation52.

Cigarette smoking is a leading cause of avoidable morbidity and mortality worldwide9. Our study findings show that the risk scores for current and former smokers are similar, which underscores that cessation lowers risk but does not eliminate the cumulative effect of previous smoking on vascular health53. Consistent with this, previous studies have shown that even years after quitting, the vascular system retains a memory of smoking exposure, as evidenced by persistent endothelial dysfunction and a proatherogenic environment54. Additionally, current smoking poses an ongoing risk because it worsens endothelial dysfunction55, which is likely exacerbated by metabolic disorders such as prediabetes and diabetes.

Red cell distribution width (RCDW) and LDH levels are risk factors for prediabetes, diabetes, and CVD. RCDW measures the variation in erythrocyte volume and size and is used as an index of anemia, a condition that impairs the delivery of oxygen by red blood cells from the lungs to cells throughout the body. Thus, an increase in RCDW may indicate a broader state of hematological dysregulation56 and is linked to increased inflammatory reactions57, nutritional deficiencies58, and oxidative stress59, all of which are common in diabetes and prediabetes. Additionally, a substantial body of evidence suggests that an elevated RCDW value is linked to CVD conditions, such as acute myocardial infarction60, peripheral artery disease61, atrial fibrillation62, heart failure63, hypertension64, and ischemic cerebrovascular disease65.

LDH catalyzes the reversible interconversion of pyruvate and lactate, a redox reaction coupled to the interconversion of NADH and NAD+. Since LDH is normally found at low concentrations in the blood, elevated serum LDH has been used as a biomarker of CVD and atrial fibrillation in healthy Chinese individuals32, a biomarker of renal outcomes and CVD mortality in patients with diabetic kidney disease66, and a predictor of cardiac insufficiency in older patients with acute myocardial infarction67. Furthermore, as an inflammatory marker, LDH may be used as a biomarker for glucose monitoring in those with prediabetes and diabetes68,69. Taken together, high RCDW and LDH values suggest a higher risk of prediabetes, diabetes, and CVD; however, it is unclear how they relate to the pathology of these conditions.

Sleep disorders and hypertension are important risk factors for CVD. Sleep problems have been linked to a higher risk of all-cause and specific CVD morbidity and mortality in the general population70,71 and in patients with diabetes72,73. The adverse effects of sleep problems on CVD have been summarized in recent systematic reviews74 and meta-analyses75. Several adverse factors, such as inflammation, impaired glucose tolerance, hyperinsulinemia, and dyslipidemia, have been implicated in the effect of sleep disorders on CVD risk in prediabetes and diabetes76. Concurrently, it is well established that hypertension is an additive risk factor for CVD in individuals with prediabetes and diabetes77. Poor glycemic control, oxidative stress, insulin resistance, and low-grade inflammation are metabolic complications that contribute to the CVD risk associated with prediabetes and diabetes78. The mechanisms by which hypertension and diabetes contribute to CVD have been summarized in a previous review79.

Our findings showed that MLR, platelet count, and WBC count are unique predictors of CVD risk in the prediabetes cohort. In line with the current findings, a large body of evidence suggests that the MLR and WBC count are inflammatory markers that contribute to the pathology and progression of prediabetes, diabetes, and CVD. For example, MLR is a novel inflammatory biomarker associated with major complications of type 2 diabetes, such as diabetic neuropathy80, diabetic retinopathy81, diabetic kidney injury82, and diabetic foot ulcers83. MLR is also an independent predictor of major adverse cardiovascular and cerebrovascular events in patients with coronary artery disease who underwent percutaneous coronary intervention84. In middle-aged and older Chinese adults, an elevated WBC count is associated with obesity-related metabolic complications, such as impaired glucose tolerance and elevated HbA1c levels85. In a 10-year longitudinal community-based Korean cohort study of non-obese adults, a high WBC count predicted the development of type 2 diabetes86. A high WBC count was also linked to a higher risk of coronary heart disease in a Mendelian randomization study using a genome-wide association study (GWAS) dataset of heart failure involving 47,309 cases and 930,014 controls87. Platelet hyperaggregation is likely to increase the risk of CVDs in patients with altered glucose homeostasis, including both prediabetes and diabetes88,89. Therefore, the inverse relationship between platelet count and CVD risk observed in the prediabetes cohort in this study was unexpected. Its mechanism has not been elucidated, and the finding remains to be confirmed in future studies.

Additionally, our findings showed that marital status, SIRI score, NLR, HDL-C, and physical inactivity were CVD risk factors specific to the diabetes cohort. Consistent with these findings, the risk factors specific to the diabetes cohort are well-known risk factors for CVD. For example, marital status is a sociodemographic risk factor for diabetes and CVD. Living without a partner, for example being separated, divorced, or widowed, is linked to an increased risk of morbidity90 and mortality in those with diabetes91 and CVD92,93. Collectively, the findings from the current and previous studies suggest that marital status may influence the relationship between diabetes and CVD94.
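
For reference, the ratio-based inflammatory indices discussed above are conventionally derived from the differential white blood cell count as sketched below. These are the definitions commonly used in the literature and are assumed here, not confirmed in this section, to match those used in our models. The symbols N, M, and L are introduced only for illustration and denote the absolute neutrophil, monocyte, and lymphocyte counts, all expressed in the same units.

```latex
\[
\mathrm{NLR} = \frac{N}{L}, \qquad
\mathrm{MLR} = \frac{M}{L}, \qquad
\mathrm{SIRI} = \frac{N \times M}{L}
\]
```

Under these definitions, NLR and MLR are dimensionless, whereas SIRI carries the units of a cell count, so SIRI values are comparable across studies only when the underlying counts are reported in the same units.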

HDL-C has a protective effect against CVD95 because of its role in reverse cholesterol transport96,97 and its anti-inflammatory properties98,99. Individuals with diabetes show reduced HDL-C levels and impaired anti-inflammatory capacity. In a diabetic environment with insulin resistance, dyslipidemia, and high oxidative stress due to elevated blood glucose or impaired glucose metabolism, glycation of HDL reduces its anti-inflammatory capacity and cholesterol efflux from peripheral tissues. Glycation also reduces the sphingosine-1-phosphate (S1P) content of HDL. Several studies have indicated that in diabetes, not only the quantity but also the quality of HDL cholesterol is critical100,101; glycated or otherwise modified forms of HDL may be less effective or even pro-inflammatory102,103. Moreover, diabetes can alter lipid metabolism, leading to a complex interplay between HDL levels and the risk of CVD.

Physical inactivity is associated with insulin resistance, dyslipidemia, and a pro-inflammatory state, which are harbingers of cardiovascular events104,105. Diabetes, a condition characterized by metabolic disturbances, intensifies the negative effects of low physical activity levels. Failure to engage in adequate physical activity may potentiate the already heightened risk of cardiovascular complications, as physical inactivity itself can worsen insulin sensitivity and endothelial dysfunction and increase the risk of thrombosis106,107. By contrast, physical activity, with its many physiological benefits, is a cornerstone of cardiovascular health promotion. Regular exercise modulates glucose metabolism, improves endothelial function, and decreases inflammation, effects that collectively reduce the risk of CVDs108,109,110,111. Clinical trials and population studies have consistently demonstrated the benefits of regular physical activity for diabetes management and CVD risk reduction109,112,113. The Diabetes Prevention Program (DPP) and subsequent follow-up studies have shown that lifestyle interventions, including physical activity, significantly reduce the incidence of cardiovascular events in individuals with diabetes114,115,116.

This study had some limitations. First, the cross-sectional design precludes causal inference about the relationship between the risk factors included in the models and CVD. A prospective cohort study is therefore required to ascertain how these risk factors interact and to confirm their value for predicting CVD risk in patients with prediabetes and diabetes. Second, external validation is necessary to determine the reproducibility and generalizability of the prediction model for new and different patients. Third, the NHANES database has missing data entries, which could compromise the accuracy and reliability of the model, although we adopted a suitable approach to data interpolation. Lastly, while traditional CVD risk factors like smoking, blood pressure, and total cholesterol predict risk across ethnic groups, they do not fully account for ethnic differences in CVD27,117,118, because ethnic differences have been reported in visceral adiposity, insulin resistance, diabetes, and novel risk markers such as CRP, adiponectin, and plasma homocysteine. Therefore, future studies that consider these ethnic differences will help improve risk prediction models for people with prediabetes and diabetes and support the development and testing of race-specific models for predicting CVD risk29.

In conclusion, our study is a pioneering endeavor in developing and validating nomograms to predict CVD risk in patients with prediabetes and diabetes. This dual nomogram model encompasses an array of predictive factors, including social and demographic characteristics, comorbidities, health behaviors, metabolic profiles, and inflammatory markers. The strength of our model is its ability to accurately predict CVD risk in patients with prediabetes and diabetes, indicating its significant clinical utility. Importantly, our research underscores the seamless transition from prediabetes to diabetes, highlighting the continuum of risk and the effectiveness of the model in identifying the increasing risk of CVD. The inclusion of variables across a spectrum of health determinants reinforces the comprehensive nature of our nomogram, which extends beyond traditional risk factors and offers a holistic assessment of CVD risk. Our findings have profound clinical implications, providing a tool for the early detection and stratification of CVD risk and facilitating timely interventions that could arrest or reverse the progression of cardiometabolic consequences. Across the continuum of glycemic control from prediabetes to overt diabetes, our models affirm the necessity for proactive health management strategies and demonstrate the potential for individualized patient care under the guidance of robust predictive analytics.