Introduction

Pancreatic cancer is a malignant tumor of the digestive tract that originates mainly from the ductal epithelium and acinar cells of the pancreas. Approximately 95% of these tumors are pancreatic ductal adenocarcinoma (PDAC), which has the fourth-highest mortality rate among cancers1. According to a global report, there were 495,773 newly diagnosed cases of pancreatic cancer and 466,003 deaths in 2020; the number of deaths was thus almost equal to the number of new cases2. By 2030, pancreatic cancer is projected to become the second most common cause of cancer-related death worldwide3,4. Despite significant advances in diagnosis and treatment, the prognosis of PDAC remains poor, with a 5-year survival rate of only 8–9%5,6. Surgical resection remains the only curative approach, but it is often associated with poor outcomes and many postoperative complications7. Several factors, including age8, tumor stage, tumor size, lymph node metastasis, and treatment modality, have been shown to affect the survival of PDAC patients. Notably, marital status has been reported to be an independent prognostic factor for perioperative and long-term survival among pancreatic cancer patients9. However, previous studies did not adequately account for confounding factors. It is therefore essential to control for confounders when examining the association between marital status and the prognosis of PDAC in clinical practice.

Marriage, as a social phenomenon, represents a form of social support of great significance in people's lives. Previous studies indicate that marital status can considerably influence physical and mental well-being, including cancer incidence and prognosis10. Spousal support, emotional and financial stability, and access to healthcare resources have been proposed to improve cancer outcomes for married patients. In contrast, unmarried individuals are more likely to experience high levels of stress, social isolation, and lack of support, which may lead to poorer survival. Previous research has demonstrated that unmarried individuals have a higher risk of mortality in several types of cancer, including non-small cell lung cancer11, breast cancer12, laryngeal squamous cell carcinoma13, duodenal adenocarcinoma14, and esophageal cancer15.

In this study, we utilized the SEER database to analyze the marital status of PDAC patients at the time of diagnosis and employed propensity score matching (PSM) to investigate the potential association between marital status and prognosis of PDAC. Additionally, we leveraged machine learning techniques to predict the survival time of married patients with PDAC.

Materials and methods

Ethics approval and consent to participate

The SEER database provides publicly available data for this study, which means that obtaining informed consent from participants or ethical approval from an institutional review board is not necessary. We obtained access to the 1979–2019 SEER Research Data File by signing a Data-Use Agreement that outlines the terms and conditions for access.

Data source and patient selection

We used SEER*Stat software (version 8.4.0.1) to extract data submitted to SEER up to November 2021. To identify patients with a primary pancreatic tumor, we used the International Classification of Diseases for Oncology, third edition (ICD-O-3) topographical codes C25.0–C25.3 and C25.7–C25.9. We included patients with ICD-O-3 histology/behavior codes 8140/3 (adenocarcinoma) or 8500/3 (infiltrating duct adenocarcinoma). Patients with tumors originating in the islets of Langerhans (C25.4) were excluded. We also excluded patients with missing, unknown, or undifferentiated data on marital status, AJCC 6th edition stage and T/N/M stage, race, tumor differentiation, treatment, or cause of death, as well as those with an unknown survival time or a survival time of less than 1 month.

Following the application of our selection criteria, we identified a total of 24,044 patients with pancreatic ductal adenocarcinoma (PDAC) for this study. Patients were divided into two groups according to marital status: married and unmarried. A flow diagram of the screening procedure is provided in Fig. 1.

Figure 1

The flow-process diagram for selecting patients based on inclusion and exclusion criteria.

Variable classification

Our analysis included the following variables from the database: sex, age at diagnosis, marital status, race, grade, TNM stage (6th edition), and primary site surgery. Age was dichotomized as < 50 years or ≥ 50 years. Patients were classified as married or unmarried according to their recorded status at the time of diagnosis. The unmarried group comprised patients who were divorced/separated, single, or widowed.

Outcome measurement

We defined overall survival (OS) as the interval from the date of diagnosis to the date of death from any cause or, for patients still alive, the date of last follow-up. Cancer-specific survival (CSS) was defined as the interval from the date of diagnosis to the date of death attributable to PDAC.
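The two endpoints differ only in which deaths count as events. The following is a minimal sketch of that coding, not the study's actual pipeline. The field names (`months`, `vital_status`, `cause`) are hypothetical and are not SEER column names, and treating deaths from other causes as censored for CSS is a common convention assumed here rather than something stated in the text.

```python
from typing import NamedTuple, Tuple

class Patient(NamedTuple):
    months: int         # time from diagnosis to death or last follow-up
    vital_status: str   # "dead" or "alive" (hypothetical coding)
    cause: str          # "pdac", "other", or "" if alive (hypothetical coding)

def os_event(p: Patient) -> Tuple[int, int]:
    """Overall survival: any death is an event; living patients are censored."""
    return p.months, int(p.vital_status == "dead")

def css_event(p: Patient) -> Tuple[int, int]:
    """Cancer-specific survival: only deaths attributed to PDAC are events."""
    return p.months, int(p.vital_status == "dead" and p.cause == "pdac")

# Toy records, made up for illustration.
cohort = [
    Patient(14, "dead", "pdac"),
    Patient(30, "dead", "other"),
    Patient(60, "alive", ""),
]
os_data = [os_event(p) for p in cohort]
css_data = [css_event(p) for p in cohort]
```

The second patient is an event for OS but censored for CSS, which is why the two endpoints can give different estimates for the same cohort.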

Statistical analysis

To minimize potential confounding variables between married and unmarried patients, we gathered data on potential covariates such as sex, age, race, grade, TNM stage (6th), and primary site surgery for 1-to-1 propensity score matching (the nearest-neighbor method with a stringent caliper of 0.001), utilizing the R package of MatchIt. We utilized the chi-square test to assess differences in categorical variables and estimated OS and CSS by generating survival curves using the Kaplan–Meier method. Through the implementation of log-rank tests, we evaluated survival comparisons between distinct groups. To investigate possible prognostic factors and examine the hazard ratios, we employed both univariate and multivariable Cox proportional-hazards regression models.
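The matching step can be illustrated with a short sketch. The study used the R package MatchIt; the Python below is not that implementation. It is a simplified greedy 1:1 nearest-neighbor matcher that takes propensity scores as given (in practice they would come from a model such as logistic regression on the covariates) and applies the caliper on the raw score scale, which is an assumption. All patient identifiers and scores are made up.

```python
def match_one_to_one(treated, control, caliper=0.001):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id) pairs. Each control is used
    at most once, and pairs farther apart than `caliper` are discarded.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest still-unmatched control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Toy propensity scores (made up for illustration).
unmarried = {"u1": 0.3100, "u2": 0.4500, "u3": 0.7000}
married = {"m1": 0.3104, "m2": 0.4520, "m3": 0.6995, "m4": 0.3102}
pairs = match_one_to_one(unmarried, married)
```

Here `u2` stays unmatched because its nearest candidate is 0.002 away, outside the caliper. A tight caliper trades sample size for closer balance, which is consistent with a matched cohort that is smaller than the full cohort.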

With the aim of establishing a machine learning model, patients within the married group were partitioned into a training set and a test set at random, at an 8:2 ratio. Within the training set, we developed the K-nearest neighbor, artificial neural network, Naïve Bayes, and random forest models aimed at predicting the 5-year CSS and OS of married patients with PDAC. K-nearest neighbor (KNN) is a non-parametric algorithm that classifies or predicts outcomes based on the majority class or average of ‘k’ closest data points in the feature space. Artificial Neural Network (ANN) is composed of interconnected nodes (neurons) organized in layers, designed to learn and make predictions by adjusting the weights of connections during the training process. Naïve Bayes is a probabilistic algorithm that leverages Bayes' theorem, assuming independence between features, to calculate the likelihood of a particular class based on the observed data. Random Forest (RF) in machine learning prediction models is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees for robust and accurate predictions.
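As a rough illustration of the random 8:2 partition and of the simplest of the four classifiers, the sketch below implements a train/test split and a majority-vote k-nearest neighbor prediction using only the standard library. It is a toy under stated assumptions: the feature encoding (stage, surgery), the seed, and `k = 3` are hypothetical choices, and the study's actual software and settings for these models are not described here.

```python
import random
from collections import Counter

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (training set, test set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k=3):
    """Predict the majority label among the k nearest training points.

    train: list of (features, label) pairs with numeric feature tuples.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda row: sq_dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up: (stage, surgery) -> alive at 5 years (1) or not (0).
rows = [((1, 1), 1), ((1, 1), 1), ((2, 1), 1), ((1, 0), 1), ((2, 1), 1),
        ((4, 0), 0), ((4, 0), 0), ((3, 0), 0), ((4, 1), 0), ((3, 0), 0)]
train, test = split_train_test(rows)
```

Fixing the seed makes the partition reproducible, which matters when several models are compared on the same held-out test set.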

All statistical analyses were carried out using R software (version 4.1.3) and SPSS software (version 25) with statistical significance set at two-sided P < 0.05.

Results

Pathological features and baseline characteristics

The SEER database provided data on a total of 206,968 PDAC patients for potential inclusion in our study. Ultimately, 24,044 individuals were deemed suitable after a series of screening procedures, as delineated in Fig. 1. Notably, of the eligible patients, 15,024 (62.49%) were classified as married and 9,020 (37.51%) as unmarried. Additional details regarding pathological features are elaborated in Table 1. Moreover, after executing the primary comparisons, significant differences were noted between the married and unmarried cohorts with regard to sex, race, TNM stage, and surgery status, with all values recorded as P ≤ 0.001 (Table 1).

Table 1 Baseline characteristics of patients with PDAC based on marital status.

The primary comparison assessed the impact of marital status on OS and CSS

In univariate Cox regression analysis, mortality rates associated with PDAC were demonstrated to be significantly linked with seven variables, including sex, age, race, grade, TNM stage, primary site surgery, and marital status for both OS and CSS (P < 0.05; Table 2). Upon conducting multivariate Cox regression analysis to further investigate survival factors, we found that marital status, as well as sex, race, grade, TNM stage, and surgery status, emerged as independent prognostic factors that significantly influenced OS and CSS outcomes in patients with PDAC (P < 0.001; Table 3).

Table 2 Univariate analysis to assess the impact of marital status on OS/CSS in PDAC.
Table 3 Multivariate analysis to assess the impact of marital status on OS/CSS in PDAC.

The secondary comparison assessed the impact of marital status on both OS and CSS

To control for potential confounding variables such as age, sex, and race between the married and unmarried groups, we used 1:1 propensity score matching. After matching, 8043 married patients and an equal number of unmarried patients (16,086 individuals in total) were included. The baseline characteristics were well balanced between the two groups, with no significant differences (P > 0.05; Table 4; Fig. 2).

Table 4 Baseline characteristics of patients with PDAC based on marital status after propensity-score matching.
Figure 2

Propensity score matching for married and unmarried groups.

After propensity-score matching, all baseline characteristics except race were significant predictors of both OS and CSS in the univariate analysis (Table 5). Marital status remained a statistically significant predictor of death: married patients, with unmarried patients as the reference, had a lower risk (OS: HR = 0.870, 95% CI 0.842–0.898, P < 0.001; CSS: HR = 0.882, 95% CI 0.853–0.912, P < 0.001). In the multivariate analysis, all variables except sex remained independently significant for OS/CSS, and being married (with reference to unmarried) remained independently associated with better survival (OS: HR = 0.834, 95% CI 0.808–0.861, P < 0.001; CSS: HR = 0.845, 95% CI 0.817–0.873, P < 0.001; Table 5). Patients diagnosed before age 50, those with stage I disease, those with well-differentiated tumors, and those who had undergone surgery had better OS and CSS than their respective reference groups (Table 5).
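A hazard ratio can only be read relative to its reference group: a value below 1 means a lower hazard in the comparison group, and swapping the comparison and reference groups inverts the ratio. The short sketch below applies this to the univariate OS estimate reported above. The function name `invert_hr` is illustrative, and the reciprocal interval is a simple re-expression of the reported figures, not a re-analysis of the data.

```python
def invert_hr(hr, ci_low, ci_high):
    """Re-express a hazard ratio with comparison and reference groups swapped.

    Inverting swaps the bounds: the old upper bound becomes the new lower bound.
    """
    return 1 / hr, 1 / ci_high, 1 / ci_low

# Univariate OS estimate after matching, as reported in the text.
hr, low, high = invert_hr(0.870, 0.842, 0.898)
summary = f"HR = {hr:.3f}, 95% CI {low:.3f}-{high:.3f}"
```

Read this way, a hazard ratio of 0.870 in one direction corresponds to roughly a 15% higher hazard in the other direction.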

Table 5 Univariate and multivariate analysis of the impact of marital status on survival outcomes in PDAC.

The Kaplan–Meier curves in Fig. 3 indicate that unmarried patients had significantly lower survival than married patients (P < 0.001). To further investigate prognosis across different unmarried statuses, we divided unmarried patients into separated/divorced, single, and widowed subgroups. As shown in Fig. 4, OS and CSS differed significantly among the marital status groups (P < 0.001).
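For readers unfamiliar with how such curves are built, the following is a minimal sketch of the Kaplan–Meier (product-limit) estimator in plain Python. The follow-up times are made up, and the code assumes the usual convention that patients censored at a given time are still at risk for deaths occurring at that time.

```python
def kaplan_meier(data):
    """Product-limit survival estimate.

    data: list of (time, event) pairs, event = 1 for death, 0 for censored.
    Returns [(time, survival_probability)] at each time with at least one death.
    """
    curve = []
    survival = 1.0
    at_risk = len(data)
    for t in sorted({time for time, _ in data}):
        deaths = sum(1 for time, event in data if time == t and event == 1)
        leaving = sum(1 for time, _ in data if time == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Toy follow-up in months (made up): three deaths and two censored patients.
toy = [(2, 1), (5, 0), (5, 1), (9, 1), (12, 0)]
curve = kaplan_meier(toy)
```

Censored patients lower the number at risk without causing a drop in the curve, which is how the estimator uses partial follow-up instead of discarding it.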

Figure 3

Kaplan–Meier survival curves of PDAC patients between married and unmarried groups. (A) Overall survival. (B) Cancer-specific survival.

Figure 4

Kaplan–Meier survival curves of PDAC patients among married, separated/divorced, single and widowed. (A) Overall survival. (B) Cancer-specific survival.

In the secondary comparison, we utilized a forest plot to evaluate the impact of different kinds of unmarried statuses versus married status. As illustrated in Fig. 5, separated/divorced patients (OS: aHR = 1.134, 95% CI 1.082–1.189, P < 0.001; CSS: aHR = 1.119, 95% CI 1.066–1.175, P < 0.001), single patients (OS: aHR = 1.142, 95% CI 1.091–1.196, P < 0.001; CSS: aHR = 1.140, 95% CI 1.087–1.195, P < 0.001), and widowed patients (OS: aHR = 1.319, 95% CI = 1.261–1.377, P < 0.001; CSS: aHR = 1.291, 95% CI 1.233–1.352, P < 0.001) exhibit poorer survival outcomes relative to married patients. Additionally, we observed that widowed patients have the highest risk of death among the three unmarried statuses (Figs. 4 and 5).

Figure 5

The impact of marital status on CSS and OS in the secondary comparison. Circles represent the aHRs with the 95% CIs indicated by horizontal bars.

Machine learning-based outcome prediction in married patients

To explore the factors that influence the survival of married patients with PDAC, we used age, sex, race, tumor differentiation, TNM stage, and surgery status as input features for machine learning models predicting 5-year CSS and 5-year OS. The performance metrics of the four models are presented in Table 6. Among them, the random forest model showed the best discrimination. For 5-year CSS, the random forest model achieved an AUROC of 0.734, accuracy of 0.592, recall of 0.552, specificity of 0.806, precision of 0.939, and F1 score of 0.695. For 5-year OS, the corresponding values were 0.795, 0.572, 0.536, 0.940, 0.989, and 0.695. The artificial neural network, naïve Bayes, and k-nearest neighbor models followed, with AUROCs of 0.788, 0.771, and 0.708, respectively. Receiver operating characteristic (ROC) curves and AUROCs of the four models are displayed in Fig. 6. The hyperparameters of the optimal random forest model, identified using GridSearch, were N_estimators = 100, Max_depth = 10, Min_samples_leaf = 2, Min_samples_split = 4, and Max_features = auto (Table S1).
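The threshold-based metrics listed above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions and uses the F1 formula as a consistency check on the reported random forest precision and recall. The function names are illustrative, and the confusion-matrix counts behind the reported values are not given in the text, so none are assumed.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, tn, fn):
    """Standard discrimination metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "precision": precision,
            "f1": f1_from(precision, recall)}

# Consistency check on the reported random forest values.
f1_css = round(f1_from(0.939, 0.552), 3)   # 5-year CSS
f1_os = round(f1_from(0.989, 0.536), 3)    # 5-year OS
```

Both checks reproduce the reported F1 score of 0.695. The combination of high precision with recall near 0.55 also shows why accuracy can be modest even when AUROC is reasonable: these metrics depend on the chosen classification threshold, whereas AUROC does not.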

Table 6 Discrimination tests of four machine learning models for predicting 5-year CSS and 5-year OS.
Figure 6

Comparison of receiver operating characteristics (ROC) curves between the four machine learning models for 5-year CSS and 5-year OS prediction.

The calibration curves demonstrated excellent agreement between predictions and observations (Fig. 7). For 5-year CSS, the k-nearest neighbor, artificial neural network, naïve Bayes, and random forest models gave Brier scores of 0.125, 0.118, 0.134, and 0.118, respectively. For 5-year OS, the same models gave Brier scores of 0.080, 0.073, 0.106, and 0.072, respectively (Table 7).
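The Brier score used here is the mean squared difference between each predicted probability and the observed binary outcome, so lower values indicate better calibrated and sharper predictions. A minimal sketch with made-up predictions:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)

# Toy predictions (made up). A perfect model scores 0; always predicting 0.5
# scores 0.25 regardless of the outcomes.
predicted = [0.9, 0.8, 0.2, 0.1]
observed = [1, 1, 0, 0]
score = brier_score(predicted, observed)
uninformative = brier_score([0.5, 0.5], [1, 0])
```

Because the achievable score depends on how common the outcome is, Brier scores are most informative when compared between models on the same endpoint, as in Table 7, rather than across endpoints.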

Figure 7

Calibration curves for testing the stability of four prediction models. The logical calibration curve is shown in solid blue, and the statistics are displayed in the top left corner of each graph.

Table 7 Calibration tests of four machine learning models for predicting 5-year CSS and 5-year OS.

The clinical usefulness of the four predictive models was assessed using decision curve analysis (DCA) and clinical impact curves. The decision curves (Fig. 8) indicated that the random forest model had a greater net benefit than the "treat none" or "treat all" strategies across a threshold probability range of 0.6 to 1.0. The random forest model also exhibited superior clinical impact compared with the other models. Notably, when the threshold probability was set above 75% (Fig. 9), the number of cases predicted to be positive by the models (i.e., those at high risk) closely matched the number of true-positive cases (i.e., those who actually had high-risk outcomes). Considering all four evaluation metrics, the random forest algorithm performed best for prediction and could offer more precise and systematic treatment guidance and support for married patients with PDAC.
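Decision curve analysis compares strategies by their net benefit, which credits true positives and penalizes false positives by the odds of the threshold probability. The sketch below shows the standard formula with made-up counts; it is not derived from the study's data, and the function names are illustrative.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability (0 < threshold < 1).

    tp, fp: true and false positives among n patients when everyone with a
    predicted risk at or above the threshold is classified as high risk.
    """
    weight = threshold / (1 - threshold)   # cost of a false positive in true-positive units
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of classifying every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy numbers (made up): 1000 patients, 80% event prevalence, threshold 0.75.
model_nb = net_benefit(tp=600, fp=50, n=1000, threshold=0.75)
treat_all_nb = net_benefit_treat_all(prevalence=0.8, threshold=0.75)
treat_none_nb = 0.0   # classifying nobody as high risk has zero net benefit by definition
```

In this toy case the model's net benefit (0.45) exceeds both reference strategies (0.20 and 0), which is the pattern a decision curve displays across a range of thresholds. The weight grows without bound as the threshold approaches 1, so the formula is undefined at exactly 1.0.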

Figure 8

Decision curve analysis of eight prediction models.

Figure 9

Clinical impact curve analysis of eight prediction models.

Discussion

Marital status has been shown to be associated with survival in chronic diseases such as cancer, with married individuals having a longer life expectancy and better quality of life across various diseases. For instance, using a matching method in a cohort spanning 1973 to 2012, Cheng Xu et al. found that married patients with nasopharyngeal carcinoma (NPC) had better 5-year CSS/OS than unmarried patients16. Gino Inverso et al. also observed a significant protective effect of marriage in metastatic oral cancer and laryngeal cancer17. However, although studies using the SEER database have confirmed that marital status is a prognostic factor for pancreatic cancer, they did not adequately control for confounding factors9,18,19. Previous studies have found that sex, age, stage, race20, and surgery21 are associated with the survival of PDAC patients. Therefore, to improve comparability between married and unmarried patients, we performed 1:1 propensity score matching on eligible PDAC patients from the SEER database, which yielded well-matched groups and relatively reliable results. Together with its larger sample size, this allows our study to provide more robust results than previous studies. Male sex, age over 50 years, higher TNM stage, worse tumor differentiation, and no surgical treatment were identified as risk factors for poor prognosis; married patients had better survival outcomes, whereas unmarried patients had significantly poorer OS/CSS.

The current study found that marital status plays a significant role in the prognosis of PDAC patients, and we suggest several possible reasons. First, marriage provides positive social support. Widowed, divorced, and separated individuals lack a legal partner and the support and help a partner provides during diagnosis and treatment, and are hence at higher risk of psychological distress22. Similarly, relative to married patients, unmarried patients are more likely to experience prolonged negative emotional states owing to the absence of social support and partner companionship. The resulting long-term exposure to glucocorticoids and catecholamines could lead to physiological dysfunction, adversely affecting the tumor microenvironment, promoting tumor growth, migration, and angiogenesis, and thus worsening prognosis23. A healthy marriage plays an essential role in establishing a good psychological state, reducing negative emotions such as anxiety and depression, and improving survival10,24. Secondly, stable marital relationships are typically associated with higher economic status, and family members such as spouses and children may provide financial and emotional support for long-term treatment25. A stable marriage can thereby improve patient compliance with the treatment regimen26. In addition, married people with a sound economic base are more likely to purchase health insurance and can receive some Medicaid at the time of diagnosis27. Studies have also found that patients with private health insurance are more often diagnosed at an early stage of cancer and have longer survival and a better prognosis, whereas patients without private health insurance are usually diagnosed at an advanced stage and have a poor prognosis28. Thirdly, married individuals typically adopt healthier lifestyles, with better diets, more exercise, and less substance abuse, contributing to better health outcomes. Habits such as smoking and alcoholism are risk factors for the development of PDAC, and unmarried people are more likely to have these habits26. Lastly, in daily life, family members of married patients are more likely to notice early symptoms of PDAC, leading to earlier detection and diagnosis and benefiting treatment.

Recently, machine learning has been widely used in the medical field29. Some researchers have developed prognostic prediction models for pancreatic cancer using machine learning methods, which have been reported to predict 5-year survival outcomes more accurately than traditional statistical methods30. In this study, the k-nearest neighbor, artificial neural network, naïve Bayes, and random forest algorithms were used to predict 5-year CSS and OS for married patients with PDAC. The random forest algorithm outperformed the other models, particularly in discrimination: its AUROC was the highest, indicating that it could better distinguish between patients who survived and those who died. As an ensemble of decision trees, random forest also tends to generalize well and is less prone to overfitting than a single tree. More generally, random forest models are well suited to prognostic prediction because of their predictive accuracy, robustness to noisy data, interpretability through feature importance analysis, capacity to handle non-linearity, stability, and ease of implementation. To our knowledge, this is the first machine learning-based model for predicting survival in married patients with PDAC; it demonstrates excellent performance and may provide doctors with an accessible and more accurate survival prediction tool for these patients and better guide clinical practice. Admittedly, rapid advances in machine learning, particularly in deep learning, ensemble methods, and reinforcement learning, have led to models with increased predictive power. We therefore expect that our prediction model can be further improved as the field develops.

Although marriage has positive impacts on cancer outcomes, it is vital to note that not all marriages are beneficial for health. Marital conflict and stress may lead to negative effects on health, including increased risk of depression, anxiety, and heart disease. Thus, future studies should investigate the quality of marital relationships and the impact of marital therapy on cancer patient health outcomes.

Limitations

Although the results of this study indicate that marital status is a significant prognostic factor for PDAC, the study has several limitations.

Firstly, the SEER database contains limited clinical information and lacks critical risk factors such as tobacco smoking, alcohol consumption, type 2 diabetes, chronic pancreatitis, and family history of pancreatic cancer31. Without these variables, patients could not be screened or matched on them, which may bias the conclusions. In particular, diabetic individuals have been reported to have a 3.05-fold higher risk of developing PDAC than non-diabetic individuals32. Secondly, the SEER database does not record indicators of patients' quality of life, such as socioeconomic level and living environment, so these could not be included in our analysis. Thirdly, marital status was recorded only at diagnosis, and changes in marital status during treatment and follow-up were not captured, which may introduce information bias. Fourthly, patients living with a partner but not legally married were classified as single, which may underestimate survival in this group, as their outcomes may be better than those of other unmarried or single patients. Although the proportion of such patients is limited, they may affect the conclusions of this study. Finally, the generalizability of this study may be limited to the population under investigation, as cultural variations, disparities in living standards, and economic differences between countries may influence the applicability of these findings to patients in other regions.

Conclusion

Our study provides evidence that marital status is an independent prognostic factor for PDAC. We also developed machine learning models to predict the survival of married patients with PDAC, among which the RF model performed best. Future studies should investigate the mechanisms behind this association and the impact of marital quality and therapy on cancer outcomes.