Introduction

Tuberculosis (TB), a global public health emergency, remains a significant threat to human health and has a high mortality rate among infectious diseases1,2. Spinal tuberculosis (STB), the most common form of extrapulmonary tuberculosis, accounts for 50–70% of osteoarticular tuberculosis cases3. STB occurs when Mycobacterium tuberculosis spreads through the bloodstream to the spine4. The early clinical signs of STB are atypical, and as the condition progresses, it causes bone destruction and spinal cord damage, leading to spinal cord injury and kyphosis5,6. Spinal cord injury (SCI) is one of the most common and serious complications in STB patients, causing motor and sensory dysfunction and even paraplegia, and significantly affecting patients' physical and mental health7. Despite substantial efforts in diagnosing and treating STB6, preventing complications in STB patients, especially SCI, remains challenging because of drug-resistant strains and late-stage detection8. Accurate and simple prediction of SCI in STB patients is essential for planning appropriate treatment and helping family members make informed decisions. However, predicting SCI in STB patients is difficult because of the complexity and variability of the condition, which involves multiple risk factors.

Machine learning (ML) algorithms are becoming increasingly important across scientific domains9. ML, a subset of artificial intelligence, has found applications in medicine, pharmacy, biology, and other fields10,11,12,13. Recently, there has been growing interest in applying ML algorithms to STB. For instance, Shuo D et al. used deep learning algorithms to accurately distinguish STB from spinal metastases14, and Li Z et al. developed a deep learning model based on CT images to differentiate STB from spinal metastases15. The same group also developed a diagnostic model for STB using CT image features and deep learning. Moreover, several risk factors associated with STB have been identified using ML algorithms16,17,18. Although ML models have been used to diagnose STB and identify its risk factors, ML-based predictive models for SCI are lacking. There is therefore an urgent need for a predictive model that healthcare professionals can trust to predict SCI in patients with STB.

Our research aimed to create a practical model for predicting SCI in STB patients. To achieve this goal, multiple machine learning algorithms were utilized to develop a predictive model based on clinical data from patients with STB across three different hospitals.

Materials and methods

Patients

We retrospectively reviewed and analyzed the medical records of 373 patients with STB treated at the First Affiliated Hospital of Guangxi Medical University between June 2012 and June 2021 to construct and validate the prediction model. In addition, data from 100 STB patients treated at Baise People's Hospital and Beijing Ji Shui Tan Hospital Guizhou Hospital between July 2021 and January 2023 were collected to form a prospective cohort for testing the model. The inclusion criteria were as follows: (1) a confirmed diagnosis of STB; (2) no history of SCI resulting from other diseases; (3) no history of hematological system diseases; and (4) complete clinical information. The exclusion criteria were as follows: (1) a postoperative pathological diagnosis that did not confirm STB; (2) comorbid diseases other than STB leading to SCI; (3) comorbid tumors, hematological system diseases, or immune system disorders; and (4) only fragmentary clinical information available. Ethical approval for this study was obtained from the ethics committees of the participating hospitals (Supplementary Materials).

Data gathering

The data, including clinical characteristics and laboratory results, were collected from patients at their first admission. General information was recorded, including age, gender, body mass index (BMI), presence of diabetes and hypertension, American Spinal Injury Association (ASIA) grade, Oswestry Disability Index (ODI) score, Japanese Orthopedic Association (JOA) score, and visual analog scale (VAS) score. The ASIA grade and the ODI, JOA, and VAS scores were assessed by two experienced specialists. The laboratory parameters comprised the white blood cell (WBC) count, neutrophil count (NEU), lymphocyte count (LYM), monocyte count (MONO), C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), hemoglobin (HGB), platelet count (PLT), albumin (ALB), total protein (TP), aspartate aminotransferase (AST), alanine transaminase (ALT), urea, serum creatinine (Scr), and uric acid (UA) levels.

Prediction model development, validation, and testing

After exclusions, 329 patients from the First Affiliated Hospital of Guangxi Medical University formed the retrospective cohort used to develop and validate the predictive model, and 80 patients from the two other hospitals formed the prospective cohort used to test it. In each cohort, patients with SCI (ASIA grade A, B, C, or D) were assigned to the SCI group, and the remaining patients were assigned to the No-SCI group. To further assess disease severity, ODI, JOA, and VAS scores were compared between the groups.
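
The grouping rule above can be expressed as a small labeling function. This is a minimal sketch, assuming that every patient outside ASIA grades A to D is ASIA grade E (the remaining grade of the ASIA Impairment Scale); the function name `assign_group` and the dict-based patient records are hypothetical and not taken from the study's code.

```python
# Hypothetical helper: maps an ASIA Impairment Scale grade to the study group.
# Assumption: grades A-D define the SCI group and grade E defines No-SCI.
SCI_GRADES = frozenset({"A", "B", "C", "D"})
VALID_GRADES = SCI_GRADES | {"E"}


def assign_group(asia_grade: str) -> str:
    """Return "SCI" for ASIA grades A-D and "No-SCI" for grade E."""
    grade = asia_grade.strip().upper()
    if grade not in VALID_GRADES:
        raise ValueError(f"unknown ASIA grade: {asia_grade!r}")
    return "SCI" if grade in SCI_GRADES else "No-SCI"


# Usage with hypothetical patient records
patients = [{"id": 1, "asia": "E"}, {"id": 2, "asia": "c"}, {"id": 3, "asia": "A"}]
print([assign_group(p["asia"]) for p in patients])  # ['No-SCI', 'SCI', 'SCI']
```

Raising an error on unrecognized grades, rather than defaulting to No-SCI, keeps data-entry mistakes from silently shifting patients between groups.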

The model was constructed in the following steps.

Dataset partitioning and data imbalance assessment: The 329 patients in the retrospective cohort were randomly divided into a training set (n = 246) and a validation set (n = 83), and the 80 patients in the prospective cohort served as the test set. In the training set, the ratio of the SCI group (n = 141) to the No-SCI group (n = 105) was 1.34:1, indicating no data imbalance issue19.

Screening of characteristic indicators: Univariate analysis in the training set was used to identify the clinical characteristics and laboratory parameters that differed significantly between the groups (p < 0.05).

Systematic analysis of multiple machine learning classifiers: Using the selected indicators, ten supervised ML classifiers were used to construct prediction models: decision tree (DT), random forest (RF), Xtreme Gradient Boosting (XGBoost), least absolute shrinkage and selection operator (LASSO) regression, support vector machine (SVM), multilayer perceptron (MLP), light gradient boosting machine (LightGBM), K-nearest neighbor (KNN), logistic regression, and stacking ensemble learning. To optimize the models, the hyperparameters of each classifier were tuned using grid search, a technique that systematically explores a predefined set of hyperparameter values to identify the best combination. A grid of hyperparameter values is created in which each point represents a unique combination; the model is then trained and evaluated for each combination, and the performance metrics are recorded20.
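
The partitioning, imbalance check, and grid search steps above can be sketched in plain Python. This is an illustrative sketch, not the authors' R code: `train_validation_split`, `class_ratio`, `grid_search`, the example hyperparameter grid, and the toy scoring function are all hypothetical. In practice the score for each combination would come from cross-validated model performance (for example, mean AUC) rather than the toy function used here.

```python
import itertools
import random


def train_validation_split(n, n_train, seed=0):
    """Randomly split patient indices 0..n-1 into training and validation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def class_ratio(n_larger, n_smaller):
    """Ratio of the larger class to the smaller class (e.g. SCI : No-SCI)."""
    return n_larger / n_smaller


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid and keep the best.

    param_grid maps each hyperparameter name to a list of candidate values;
    score_fn takes a dict of hyperparameters and returns a score to maximize.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    n_evaluated = 0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))  # one point of the grid
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


# Retrospective cohort sizes reported in the text
train_idx, val_idx = train_validation_split(329, 246)
print(len(train_idx), len(val_idx))  # 246 83
print(round(class_ratio(141, 105), 2))  # 1.34


# Hypothetical random-forest grid and toy score, for illustration only
def toy_score(params):
    return -abs(params["n_estimators"] - 300) - abs(params["max_depth"] - 5)


grid = {"n_estimators": [100, 300, 500], "max_depth": [3, 5, 8]}
best, score, n = grid_search(grid, toy_score)
print(best, n)  # {'max_depth': 5, 'n_estimators': 300} 9
```

Because grid search evaluates every combination, its cost is the product of the grid sizes (here 3 × 3 = 9 evaluations), which is why grids are usually kept small.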
To account for variation in model performance due to random data splits, fivefold cross-validation was performed in the training cohort21.

Model performance evaluation and optimal model selection: Model performance was evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC), and precision-recall (PR) curves for each model22,23. Calibration curve analysis and decision curve analysis (DCA) were also conducted to assess the robustness and clinical applicability of each model24,25. The optimal model was selected on the basis of the AUC, calibration curve, and DCA results, and its AUC was then calculated in the test set to assess its performance.

Model interpretation and deployment: The Model Agnostic Language for Exploration and Explanation (DALEX) package was used to explain the optimal model, quantify the contribution of each indicator to the predictive model, and rank the importance of each feature. In addition, the SHapley Additive exPlanations (SHAP) method was used to explain predictions for individual patients26,27. Finally, the model was deployed as a web application using R Shiny.
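
The cross-validation and evaluation steps above can be illustrated with small functions: unstratified fivefold splitting, the rank-based (Mann–Whitney) form of the AUC, and the net benefit that decision curve analysis plots against the threshold probability. This is a hedged sketch under stated assumptions, since the study's R implementation is not described; the function names and the toy labels and scores are hypothetical. Net benefit at threshold probability p_t is computed as TP/n - (FP/n) * p_t / (1 - p_t), treating patients whose predicted probability is at least p_t as positive.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Return k (train, test) index pairs; each index appears in exactly one test fold."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    pairs = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pairs.append((train, folds[i]))
    return pairs


def auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight of a false positive relative to a true positive
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in DCA: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (hypothetical labels: 1 = SCI, 0 = No-SCI)
y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(y, scores))  # 0.75
print(round(net_benefit(y, scores, 0.3), 4))  # 0.3929
print(round(net_benefit_treat_all(y, 0.3), 4))  # 0.2857
print([len(test) for _, test in kfold_indices(246)])  # [50, 49, 49, 49, 49]
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and the treat-none line (net benefit 0), which is what the DCA plots compare across models.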

Statistical analysis

Data were analyzed using SPSS (IBM, version 26.0) and R statistical software (version 4.2.2). Normally distributed continuous variables were compared using the t test and are presented as mean ± standard deviation (SD). Non-normally distributed continuous variables were compared using the Mann–Whitney U test and are presented as medians (percentiles). Categorical variables were compared using the chi-square test or Fisher's exact test and are expressed as numbers (percentages). A p-value < 0.05 was considered statistically significant.
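
As a worked illustration of the categorical comparison, the Pearson chi-square statistic for a 2 × 2 table (for example, hypertension yes/no versus SCI/No-SCI) can be computed directly, with the one-degree-of-freedom p-value obtained from the complementary error function. This sketch uses hypothetical counts and omits Yates' continuity correction; SciPy's `scipy.stats.chi2_contingency` applies that correction to 2 × 2 tables by default, so its statistic would differ. The helper `min_expected_count` reflects a common rule of thumb (an assumption here, not stated in the study) of switching to Fisher's exact test when any expected count is below 5.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def min_expected_count(a, b, c, d):
    """Smallest expected cell count under independence of rows and columns."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return min(r * col / n for r in rows for col in cols)


# Hypothetical counts: rows = hypertension yes/no, columns = SCI/No-SCI
stat, p = chi_square_2x2(10, 20, 30, 40)
print(round(stat, 4), round(p, 3))
print(min_expected_count(10, 20, 30, 40))  # 12.0
```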

Ethics statement

The studies involving human participants were approved by the Ethics Department of the First Affiliated Hospital of Guangxi Medical University, the Ethics Department of Beijing Ji Shui Tan Hospital Guizhou Hospital, and the Ethics Department of Baise People's Hospital.

Consent form

Informed consent was obtained from all patients included in this study and/or their legal guardians, in accordance with the requirements of national legislation and the participating institutions. All experiments and methods were performed in accordance with the relevant guidelines and regulations.

Results

Patient clinical characteristics and laboratory results

A total of 473 patients with STB from three medical institutions were initially enrolled. After exclusions, the retrospective cohort consisted of 329 patients (246 in the training set and 83 in the validation set), and the prospective cohort consisted of 80 patients, who formed the testing set (Fig. 1). As shown in Table 1, age (p = 0.0003), comorbid hypertension (p = 0.0006), CRP (p = 0.002), NEU (p = 0.0002), LYM (p = 0.0006), MONO (p = 0.0009), HGB (p = 0.0007), PLT (p = 0.0003), ESR (p = 0.0004), and ALB (p = 0.0005) differed significantly between the groups in the training set, whereas most of these indicators did not differ significantly in the validation and testing sets (Supplementary Tables 1, 2). Notably, MONO differed significantly between the groups in all three sets. Furthermore, as shown in Fig. 2, in both the retrospective and the prospective cohorts, ODI and VAS scores were significantly higher and JOA scores were lower in the SCI group. These findings indicate that patients in the SCI group had greater disease severity.

Figure 1

The flowchart of this study.

Table 1 Baseline characteristics of STB patients with and without SCI in training set.
Figure 2

Comparison of disease severity between STB patients with and without SCI. (A) Differences in ODI, JOA, and VAS scores between the two groups in the retrospective cohort. (B) Differences in ODI, JOA, and VAS scores between the two groups in the prospective cohort. ODI Oswestry Disability Index, JOA Japanese Orthopedic Association, VAS visual analog scale, STB spinal tuberculosis, SCI spinal cord injury. ***p < 0.001.

Feature screening

To improve model performance, univariate analysis was used to screen the candidate variables. As shown in Table 1, age, comorbid hypertension, NEU, LYM, MONO, CRP, ESR, HGB, PLT, and ALB differed significantly between the SCI and No-SCI groups in the training set. These 10 clinical features were subsequently used to construct the predictive models.
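
The screening rule can be written as a simple filter on univariate p-values. The ten p-values below are the training-set values reported in the Results; the entry `hypothetical_feature` is an invented placeholder added only to show a variable being excluded, and `select_features` is a hypothetical helper, not code from the study.

```python
ALPHA = 0.05  # significance threshold used for screening

# Training-set p-values reported in the Results section
p_values = {
    "Age": 0.0003, "Hypertension": 0.0006, "CRP": 0.002, "NEU": 0.0002,
    "LYM": 0.0006, "MONO": 0.0009, "HGB": 0.0007, "PLT": 0.0003,
    "ESR": 0.0004, "ALB": 0.0005,
    "hypothetical_feature": 0.42,  # invented placeholder, not study data
}


def select_features(p_by_feature, alpha=ALPHA):
    """Return features with p < alpha, ordered from smallest to largest p-value."""
    kept = [name for name, p in p_by_feature.items() if p < alpha]
    return sorted(kept, key=lambda name: p_by_feature[name])


selected = select_features(p_values)
print(len(selected), selected[0])  # 10 NEU
```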

Comprehensive analysis of multiple machine learning algorithms

We trained ten models: DT, RF, XGBoost, LASSO regression, SVM, MLP, LightGBM, KNN, logistic regression, and stacking ensemble learning, and first evaluated them using AUC values. In the training cohort, LightGBM, RF, and the stacking ensemble model performed best22, and RF achieved the highest AUC in the validation cohort (Fig. 3A and B). On the precision-recall curves, the LightGBM, RF, stacking ensemble, and KNN models outperformed the other models in the training cohort, while the LightGBM, SVM, stacking ensemble, and RF models did so in the validation cohort (Fig. 3C and D). Decision curve analysis (DCA) and calibration curve analysis were then conducted to assess the clinical utility of each model. DCA indicated that RF, LightGBM, and KNN were better suited for clinical application (Fig. 3E), and, consistently, the calibration curves showed that RF, LightGBM, and KNN were better calibrated (Fig. 3F). Based on these results, RF was selected as the best-performing model.

Figure 3

Comprehensive analysis of multiple ML algorithms. (A) ROC curves and AUC values of the ML models in the training set. (B) ROC curves and AUC values of the ML models in the validation set. (C) PR curves of the ML models in the training set. (D) PR curves of the ML models in the validation set. (E) DCA of the ML models in the validation set. (F) Calibration curves of the ML models in the validation set. ML machine learning, ROC receiver operating characteristic curves, AUC area under the curve, PRs precision-recall curves, DCA decision curve analysis, DT decision tree, RF random forest, XGBoost Xtreme gradient boosting, LASSO least absolute shrinkage and selection operator, SVM support vector machine, MLP multilayer perceptron, LightGBM light gradient boosting machine, KNN K-nearest neighbor, bs Brier score.

Optimal model establishment and assessment

The prediction model was constructed using the random forest (RF) algorithm. Fig. 4A shows the AUC values for the training set (AUC = 0.858), the validation set (AUC = 0.769), and the testing set (AUC = 0.816). Because the AUC in the prospective testing set was not lower than that in the validation set, the model appears to generalize to data from other centers. Furthermore, the learning curve indicated a favorable fit and stable performance between the training and validation sets (Fig. 4B)28. In addition, the calibration curve of the RF model closely followed the ideal 45-degree diagonal (Fig. 3F), indicating that its predicted probabilities were consistent with the observed outcomes across probability levels. The RF model's Brier score was 0.193; for reference, a non-informative model that predicts a probability of 0.5 for every patient has a Brier score of 0.25, and lower values indicate more accurate probability estimates, further supporting the calibration of the model. Therefore, the RF model is a useful approach for predicting SCI in patients with STB.
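
The Brier score and the binned points of a calibration curve can be computed as follows. This is a minimal sketch with hypothetical labels and probabilities; equal-width binning with five bins is an assumption, since the study does not state how its calibration curve was binned, and `brier_score` and `calibration_points` are illustrative names.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_points(y_true, y_prob, n_bins=5):
    """Equal-width bins over [0, 1]; returns (mean predicted, observed rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


# Hypothetical outcomes (1 = SCI) and predicted probabilities
y = [1, 0, 1, 0, 1, 0]
prob = [0.85, 0.3, 0.65, 0.1, 0.7, 0.35]
print(round(brier_score(y, prob), 4))
print(brier_score(y, [0.5] * len(y)))  # 0.25, the uninformative baseline
for mean_pred, observed, count in calibration_points(y, prob):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

Plotting observed rate against mean predicted probability for each bin gives the calibration curve; points on the 45-degree line indicate perfect calibration.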

Figure 4

Random forest model assessment. (A) ROC and AUC value of random forest model in training, validation and testing set. (B) Learning curve. ROC receiver operating characteristic curves, AUC area under the curve, RF random forest.

Prediction model interpretation and deployment

We used the DALEX R package to explain how the selected parameters contribute to predicting SCI in STB patients and to assess their importance in the model. Fig. 5A and B present the importance rankings of the ten features, with MONO emerging as the most important predictor of SCI in STB patients. To further improve interpretability, SHAP was applied to two representative patients: one STB patient without SCI (Fig. 5C) and one with SCI (Fig. 5D). Finally, Fig. 6 shows the web-based deployment of the predictive model (http://127.0.0.1:7806).
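
To make the idea behind the SHAP explanations concrete, exact Shapley values can be computed by enumerating every feature coalition and replacing absent features with a reference (baseline) value. This is a simplified sketch, not the SHAP library or the study's implementation: it uses a single baseline vector instead of averaging over a background dataset, its cost grows exponentially with the number of features, and the model and input values are hypothetical toys. The resulting values satisfy the efficiency property: they sum to f(x) - f(baseline).

```python
import itertools
import math


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value; the cost is O(2^n) model calls.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis


# Hypothetical toy risk score with an interaction between features 0 and 1
def toy_model(z):
    return 0.5 * z[0] + 2.0 * z[1] - z[2] + z[0] * z[1]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phis = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phis])  # [2.0, 3.0, -3.0]
print(round(sum(phis), 6), toy_model(x) - toy_model(baseline))  # 2.0 2.0
```

The interaction term z[0] * z[1] contributes 2.0 to the prediction and is split equally between features 0 and 1, which is how Shapley values attribute joint effects.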

Figure 5

Model interpretation. (A,B) Ranking of feature importance in the model. (C) SHAP interpretation of the prediction for one patient without SCI. (D) SHAP interpretation of the prediction for one patient with SCI. NEU neutrophil count, LYM lymphocyte count, MONO monocyte count, CRP C-reactive protein, ESR erythrocyte sedimentation rate, HGB hemoglobin, PLT platelets, ALB albumin, Hypertension comorbid hypertension.

Figure 6

Model deployment. NEU neutrophil count, LYM lymphocyte count, MONO monocyte count, CRP C-reactive protein, ESR erythrocyte sedimentation rate, HGB hemoglobin, PLT platelets, ALB albumin, Hypertension comorbid hypertension.

Discussion

STB tends to affect younger individuals and is not uncommonly associated with SCI, which can lead to disability and even death29,30. The progression of STB is a major cause of severe spinal complications and a significant obstacle to good outcomes in STB patients. Therefore, timely identification of patients at risk of SCI, and of the key factors involved in disease progression, is crucial in clinical practice. In this study, we identified important clinical and laboratory characteristics and developed an ML-based model to predict the occurrence of SCI in STB patients. We believe that such models are valuable clinical tools for predicting SCI in STB patients because they are noninvasive, rapid, and user-friendly.

SCI results from the progression of STB. Given the low early detection rate and poor prognosis of STB, there is growing interest in identifying biomarkers for the occurrence and development of this disease16,17,31,32, and several clinical indicators related to the progression and prognosis of STB have been investigated. Immune cells play a crucial role in the progression of STB. For instance, Yao Y et al. used unsupervised machine learning to identify two subphenotypes of spinal tuberculosis of varying severity and found significant differences in the infiltration levels of immune cells (lymphocytes, monocytes, and neutrophils), which were related to SCI33. A multicenter study demonstrated that the monocyte-to-lymphocyte ratio (MLR) and neutrophil-to-lymphocyte ratio (NLR) were notably higher in patients with active tuberculosis than in those with latent tuberculosis34. In addition, well-known indicators such as ESR and CRP are commonly used to evaluate the degree of inflammation in patients with tuberculosis35 and are associated with the prognosis of STB36,37. Nutritional status is also a vital factor affecting the progression and prognosis of STB, and the serum ALB concentration has been shown to be an important laboratory marker for predicting SCI and prognosis in STB patients38. In this study, we ultimately selected ten clinical parameters (age, NEU, LYM, MONO, CRP, ESR, HGB, PLT, ALB, and comorbid hypertension) to establish the predictive model. Patients in the SCI group were more likely to be older and to have comorbid hypertension, and had higher NEU, MONO, CRP, ESR, and PLT levels than those in the No-SCI group, whereas LYM, HGB, and ALB levels were higher in the No-SCI group.

Moreover, based on the importance ranking of the ten clinical factors, MONO was identified as the most significant predictor. Recent studies have suggested a close association between the dysregulation of monocytes and disease progression. A retrospective study demonstrated that a high monocyte-to-lymphocyte ratio (MLR) was closely linked to the severity and occurrence of SCI in patients with spinal tuberculosis39. Another recent study reported that elevated monocyte levels contributed to the progression of tuberculosis40. Furthermore, a single-center retrospective study revealed that a high monocyte-to-lymphocyte ratio was a risk factor for clinical progression in patients with pulmonary Mycobacterium avium complex disease41. Additionally, a growing body of research indicates that monocytes and macrophages play crucial roles in the prognosis of non-infectious diseases34,42. Interestingly, the results of the present study align with the findings of the aforementioned studies, revealing a markedly elevated level of monocytes in the SCI group. Therefore, monocytes may be a crucial factor in the progression of spinal tuberculosis and may be involved in the development of spinal cord injury in STB patients.

Monocytes, which originate from precursors in the bone marrow, are recruited to sites of infection, where they differentiate into macrophages in response to antigens to defend against infection43. Macrophages are part of the mononuclear phagocyte system and play crucial roles in host defense, tissue development, and homeostasis44. Their ability to engulf foreign particles, kill bacteria, and produce inflammatory cytokines is important for supporting adaptive immune responses. Monocyte-macrophages act as a "double-edged sword": they are essential for defending the body against pathogens, but their excessive responses can damage normal tissues and organs during infection45. Additionally, they can play both protective and pathogenic roles in various diseases46, depending on the surrounding environment, which regulates their phenotype and function. There are two main macrophage subtypes: (1) classically activated M1 macrophages, which typically produce pro-inflammatory cytokines such as TNF-α, IL-1β, IL-12, and IL-23, promoting local inflammation and helping to eliminate pathogens, virus-infected cells, and transformed cells; and (2) M2 macrophages, which generally produce anti-inflammatory cytokines such as IL-10 and TGF-β to reduce local inflammation and have decreased antigen-presentation capability and limited oxidant production, which helps prevent excessive tissue damage. The M1/M2 balance in an organ during inflammation or injury can determine its fate; when this balance is disrupted, macrophages can contribute to tissue damage and necrosis. STB is a chronic infectious disease, and the local immune status is a significant factor in the survival of Mycobacterium tuberculosis and in tissue destruction.
In the context of chronic inflammation, M1 macrophages produce high levels of inflammatory cytokines and proteolytic enzymes, which can contribute to spinal deformity and SCI. Our team previously found a significant increase in M1 macrophages in STB patients47. It is therefore plausible that monocytes and macrophages are significant contributors to SCI in patients with STB.

Although numerous factors have been associated with SCI in patients with spinal tuberculosis, no predictive model had previously been developed. ML-based predictive models are gaining popularity because of their accuracy and are increasingly applied in the management of spinal diseases48. In this study, ten carefully selected features were used to construct predictive models, and, to improve the reliability of our findings, ten machine learning algorithms were used to build these models. Our comprehensive evaluation, based on the AUC, precision-recall curves, decision curve analysis, and calibration curves, indicated that the RF model outperformed the other nine models. The learning curve also showed that the RF model performed well, underscoring the clinical value of ML models for predicting SCI in patients with STB. Furthermore, additional clinical data were collected to validate the model externally, supporting its generalizability and reproducibility, which are essential for translating our results into clinical practice.

In recent years, advanced machine learning models, particularly black-box models, have substantially advanced the state of the art in various domains49. Models such as deep neural networks and ensemble methods have shown strong performance on complex tasks. However, their widespread adoption has raised a critical concern: their decision-making processes are difficult to understand. This opacity arises from the intricate interplay of features and parameters, and it not only hinders interpretability but also raises important issues of trust, accountability, and ethics. In response, there has been growing emphasis on the development and adoption of explainable artificial intelligence (XAI) methods, such as Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP)50. In our study, to better understand the predictive model and address the opacity of black-box models, we used the DALEX R package to assess feature importance, which revealed the key indicators associated with SCI in STB patients. We also used SHAP to interpret individual predictions, enhancing the transparency and interpretability of our findings.

While the results are promising, this study has several limitations. First, although this was a multicenter study, the sample size was relatively small, which may have biased some of the results. One potentially effective remedy is data augmentation, which artificially creates additional training samples to expand the dataset; techniques such as rotation, scaling, and flipping have been widely used for this purpose51,52, and other approaches include transfer learning, active learning, and adversarial training. Second, the accuracy and performance of the model might be improved by including additional clinical indicators related to the prognosis of spinal tuberculosis, such as radiomic features. Third, although a prospective cohort was included to improve the reliability and generalizability of our findings, differences in data collection across regions may have introduced unavoidable bias. Finally, the molecular mechanisms underlying the key factors that determine the prognosis of STB patients have not been elucidated.

Conclusions

In summary, this study developed a predictive model for SCI in patients with spinal tuberculosis by comparing multiple machine learning algorithms on data from multiple clinical centers. We also established a personalized risk assessment tool for SCI in STB patients and deployed the model on the web. According to the variable importance ranking, monocytes may play a key role in the development of SCI in these patients. This research offers frontline clinicians and patients an efficient and rapid way to predict the risk of SCI in patients with spinal tuberculosis and provides valuable guidance for clinical decision-making.