Introduction

Thyroid cancer is one of the most common endocrine malignancies. The incidence of thyroid cancer has substantially increased worldwide in the last several decades1,2,3. Differentiated thyroid carcinoma (DTC), including papillary thyroid carcinoma (PTC), follicular thyroid carcinoma (FTC), and Hürthle cell carcinoma, is the most common malignancy of the thyroid gland4. In terms of overall incidence, DTC accounts for 1%–1.5% of all cases of cancer, and its incidence is three times higher in female patients than male patients2,5,6.

Generally, DTC has an excellent prognosis because of its indolent features. According to a Surveillance, Epidemiology, and End Results report, patients with DTC have an overall survival rate of 90%–95%7. However, some clinicopathological risk factors, including age, tumor size, lymph node (LN) metastasis, and extrathyroidal extension (ETE), are associated with the aggressive nature and high recurrence of DTC8. The American Thyroid Association (ATA) management guidelines have proposed a clinicopathological risk stratification system that can be used for classifying patients as those who are at low, intermediate, or high risk. Using this system, several factors, such as ETE, LN metastasis, aggressive histology, vascular invasion, and multifocality were found to be associated with increased risk of recurrence9. However, sex is not considered a risk factor for recurrence.

DTC is more prevalent in women than men but the cause is not fully elucidated10. Li et al. reported that male patients had a 0.78-fold risk of developing malignancy and larger tumor size than female patients11. Moreover, male patients had a significantly higher prevalence of advanced-stage thyroid cancer, LN metastasis, and ETE12. Some studies have reported that sex, age, tumor size, aggressive histology, ETE, and LN metastasis were significant prognostic factors for thyroid cancer13,14,15,16,17. The impact of sex on the risk of DTC recurrence is controversial, and results pertaining to this concept vary in the literature14,16,18,19,20,21. Toniato et al. and several other studies have concluded that sex does not significantly contribute to recurrence and survival16,20. By contrast, Zahedi et al. studied patients with DTC who were > 18 years old and showed that the recurrence rate was higher in males than in females, a finding supported by other studies14,18,19. In another novel study, Choi H. et al. summarized secular trends by analyzing results to date and found that the poor outcome of PTC associated with men decreased over time, whereas aggressive pathological features remained the same or increased over time21.

Therefore, the current study aimed to compare the clinicopathological characteristics and long-term oncologic outcomes between females and males with DTC using propensity score matching to reduce selection bias.

Results

Comparison of clinicopathological characteristics between female and male patients before and after propensity score matching

Table 1 shows the baseline clinicopathological characteristics of female and male patients before and after propensity score matching. In terms of age, males were significantly younger than females (45.8 ± 11.8 vs. 46.8 ± 12.2 years, p = 0.017). The surgical extent was significantly more extensive in males than females (p = 0.001). Tumor size was significantly larger in males than females (1.1 ± 0.8 vs. 0.9 ± 0.7 cm, p < 0.001). The incidence of lymphatic and vascular invasions was significantly higher in males than females (34.3% vs. 24.6%, p < 0.001 and 4.0% vs. 1.9%, p < 0.001, respectively). Female patients had a significantly higher prevalence of ETE than male patients (5.7% vs. 3.9%, p = 0.022). Moreover, the proportion of harvested and positive LNs was higher in males than females (12.6 ± 16.7 vs. 10.7 ± 12.1, p < 0.001 and 3.1 ± 4.9 vs. 1.7 ± 3.4, p < 0.001, respectively). Males had significantly more advanced N and TNM stage than females (p < 0.001 and p = 0.021, respectively). A significantly higher recurrence rate was observed in males than females before matching (3.3% vs. 2.2%, p = 0.030).

Table 1 Comparison of clinicopathological characteristics between females and males before and after propensity score matching.

Propensity score matching yielded 1040 matched pairs of patients. There were no significant differences in clinicopathological characteristics, including the recurrence rate, between the matched groups (Table 1).

Univariate and multivariate analyses of the risk factors for recurrence before and after propensity score matching

Table 2 presents the univariate and multivariate Cox regression analysis results for the risk factors of recurrence before propensity score matching. Age ≤ 45 years (hazard ratio (HR) 1.841; 95% confidence interval (CI) 1.278–2.652; p = 0.001), ETE (HR 2.196; 95% CI 1.305–3.695; p = 0.003), vascular invasion (HR 2.504; 95% CI 1.286–4.876; p = 0.007), and number of positive LNs (HR 1.062; 95% CI 1.041–1.083; p < 0.001) were significant predictors of recurrence. Among the various risk factors, N1a stage was the strongest predictor of recurrence (HR 3.159; 95% CI 2.015–4.955; p < 0.001).
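
As a reading aid for the hazard ratios above, the following minimal Python sketch shows how an HR, its 95% CI, and the p value relate under a Wald-type normal approximation on the log scale. This approximation is an assumption made for illustration; the paper does not state how the intervals were computed, and the helper name `wald_from_ci` is hypothetical. The input values are those reported above for age ≤ 45 years.

```python
import math

def wald_from_ci(lower, upper, z_crit=1.96):
    """Recover log-scale summaries from a reported 95% CI.

    Assumes the interval has the form exp(beta +/- z_crit * SE), i.e. a
    Wald interval for a Cox regression coefficient beta (an illustrative
    assumption, not a description of the authors' software output).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2.0            # midpoint on the log scale
    se = (log_hi - log_lo) / (2.0 * z_crit)   # half-width divided by z_crit
    z = beta / se                             # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p value
    return math.exp(beta), se, z, p

# Reported interval for age <= 45 years: 95% CI 1.278-2.652
hr, se, z, p = wald_from_ci(1.278, 2.652)
print(f"implied HR = {hr:.3f}, SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under this assumption, the HR implied by the interval is close to the reported 1.841 and the implied p value is close to the reported 0.001, which suggests the reported interval is symmetric on the log scale.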

Table 2 Univariate and multivariate analyses of risk factors for recurrence before propensity score matching.

After propensity score matching, the number of positive LNs (HR 1.193; 95% CI 1.129–1.260; p < 0.001) and N1a stage (HR 2.401; 95% CI 1.010–5.708; p = 0.047) were found to be significant predictors of recurrence (Table 3). In the Kaplan–Meier analysis, disease-free survival (DFS) differed significantly between the two groups before matching (p = 0.021, Fig. 1). However, there was no significant difference in DFS between the two groups after matching (p = 0.492, Fig. 2).

Table 3 Univariate and multivariate analyses of risk factors for recurrence after propensity score matching.
Figure 1

Disease-free survival curves of female and male groups before propensity score matching (log-rank p = 0.021).

Figure 2

Disease-free survival curves of female and male groups after propensity score matching (log-rank p = 0.492).

Subgroup analysis between female and male patients aged between 20 and 45 years before and after propensity score matching

The baseline clinicopathological characteristics of female and male patients aged between 20 and 45 years are summarized in Table 4. Before propensity score matching, age was considered to be a significant factor. That is, males were significantly older than females (36.8 ± 5.2 vs. 35.8 ± 6.2 years, p = 0.001). Moreover, males underwent significantly more extensive surgeries than females (p = 0.001). The incidence of lymphatic invasion was significantly higher in males than in females (36.8% vs. 30.3%, p = 0.003). Male patients had a significantly higher proportion of harvested and positive LNs than female patients (13.9 ± 16.9 vs. 11.4 ± 13.0, p < 0.001 and 3.6 ± 5.3 vs. 2.2 ± 3.8, p < 0.001, respectively). N stage was significantly more advanced in males than females (p < 0.001). However, there were no significant differences in recurrence rates between the two groups (p = 0.086).

Table 4 Subgroup comparison between female and male participants aged 20–45 years before and after propensity score matching.

Propensity score matching yielded 515 matched pairs of patients. There were no significant differences in clinicopathological characteristics between the matched groups (Table 4).

Table 5 shows the risk factors for recurrence among patients aged between 20 and 45 years after propensity score matching. Lymphatic invasion (HR 2.214; 95% CI 1.118–4.381; p = 0.023) and the proportion of positive LNs (HR 1.071; 95% CI 1.028–1.115; p < 0.001) were considered to be significant predictors of recurrence.

Table 5 Univariate and multivariate analyses of risk factors for recurrence in patients aged 20–45 years after propensity score matching.

Subgroup analysis between female and male patients aged over 45 years before and after propensity score matching

Table 6 presents the baseline clinicopathological characteristics of females and males aged over 45 years before and after propensity score matching. The tumor size was significantly larger in males than in females (1.1 ± 0.9 vs. 1.0 ± 0.7 cm, p < 0.001). Males had a significantly higher prevalence of ETE than females (4.5% vs. 7.5%, p < 0.024). The incidence of lymphatic and vascular invasions was significantly higher in males than in females (31.2% vs. 19.7%, p < 0.001 and 4.5% vs. 1.7%, p < 0.001, respectively). Males had a significantly higher proportion of positive LNs than females (2.4 ± 4.2 vs. 1.3 ± 2.6, p < 0.001). Further, male patients had a significantly higher T, N, and TNM stage than female patients (p < 0.001, p < 0.001, and p < 0.001, respectively). The recurrence rate was significantly higher in males than in females (2.9% vs. 1.3%, p = 0.012). After matching, the recurrence rate did not significantly differ between the two groups (p = 0.301).

Table 6 Subgroup comparison between females and males aged > 45 years before and after propensity score matching.

The risk factors for recurrence in patients aged > 45 years after propensity score matching are shown in Table 7. Multifocality (HR 3.438; 95% CI 1.143–10.342; p = 0.028) and the proportion of positive LNs (HR 1.259; 95% CI 1.117–1.429; p < 0.001) were found to be significant predictors of recurrence.

Table 7 Univariate and multivariate analyses of risk factors for recurrence in patients aged > 45 years after propensity score matching.

Discussion

DTC is the most common endocrine malignancy, and it usually has a favorable prognosis, with a 10-year disease-specific survival rate of > 90%2,7,22,23,24. However, some patients can develop local recurrence or distant metastasis to the lung and/or bone, resulting in death. This finding indicates that DTC can progress via different clinical pathways. Several clinicopathological factors, including age, tumor size, ETE, multifocality, LN metastasis, and histologic type, were found to be significant prognostic factors for DTC25,26,27,28. The ATA management guidelines have proposed a risk stratification system based on significant prognostic factors9. The American Joint Committee on Cancer/Union for International Cancer Control TNM staging system has been widely used for predicting the prognosis of DTC29. However, sex has not been considered a prognostic factor in the commonly used staging systems.

Male sex was associated with lower DFS and disease-specific survival in previous studies18,30,31,32,33. However, some reports showed that overall survival was not affected by sex34,35. The prognostic value of sex has therefore been inconsistent across studies, and the effect of sex on DFS has not yet been validated. Recently, Zahedi et al. reported that the risk of DTC recurrence is higher in men than in women18. Thus, this study aimed to evaluate the effect of sex on the long-term prognosis of DTC.

This study compared the clinicopathological characteristics and long-term oncologic outcomes of male and female patients with DTC. Results showed that compared with females, males had significantly larger tumor sizes, a higher prevalence of ETE and lymphatic and vascular invasions, a greater proportion of positive LNs, and a more advanced TNM stage. This result is consistent with that of previous studies showing that men were more likely to present with more aggressive disease characteristics11,12,36,37. The recurrence rate was significantly higher in males than in females (3.3% vs. 2.2%, p = 0.030). However, this unadjusted comparison does not account for the differences in disease severity between the two groups.

In this study, the impact of sex was assessed in patients with DTC. However, a significant difference was observed between males and females in terms of several clinicopathological characteristics. That is, males were significantly younger than females. Furthermore, male sex was significantly associated with more aggressive tumor characteristics, including larger tumor size, higher incidence of ETE and lymphatic and vascular invasion, and greater TNM stage and proportion of positive LNs. Therefore, the findings of this study were affected by several confounding factors, including selection bias, between the two groups. To minimize the effects of these factors, propensity score matching analysis was performed to adjust for several clinicopathological characteristics. After matching, the recurrence rates of female and male patients did not significantly differ (2.5% vs. 3.0%, p = 0.591).

We performed multivariate Cox regression and Kaplan–Meier analyses to identify whether sex was independently associated with DFS before and after propensity score matching. Before matching, male sex was considered to be a significant predictor of recurrence (HR 1.552; p = 0.022) based on the univariate analysis. A statistically significant difference was also observed in the Kaplan–Meier analysis (log-rank test, p = 0.021). However, sex was not an independent prognostic factor for recurrence in multivariate analysis. Several studies have reported that male sex was not an independent risk factor for survival in DTC20,37. After matching, our analyses confirmed that sex was not a significant predictor of DTC recurrence.

It is difficult to explain why the DFS did not significantly differ between male and female patients with DTC in the current study. However, this result may be attributed to the long-term decreasing prognostic potential of male sex. Choi et al. have assessed secular trends in the prognostic factors for PTC. They reported that the risk for poor outcomes associated with male sex decreased over time in PTC. However, the risk associated with pathologic characteristics remained the same or increased over time21. Further, another study has shown that young women had a better prognosis, and the outcomes were similar between men and those aged > 55 years33. Therefore, a subgroup analysis was conducted to evaluate the differences in prognosis between male and female patients according to age. The patients were divided into two groups: a young age group (≤ 45 years) and an old age group (> 45 years). We validated that sex was not an independent prognostic factor for recurrence in either age group in the multivariate analysis after propensity score matching. The cause of the difference in DTC presentation between males and females is unclear. However, the increased number of screening tests might be related to these changes. In the past, men were less likely to visit hospitals than women; thus, they presented with more advanced stages of thyroid cancer. In addition, since a high proportion of women present with benign thyroid disease, such as thyroiditis, screening is faster due to serial follow-up. Therefore, women with thyroid cancer could be diagnosed at an earlier stage. However, with the higher number of health examinations performed, the number of male patients with incidentally detected thyroid cancer has been increasing recently. Machens et al. emphasized the need for earlier diagnosis and intervention in men to validate whether male sex is an ominous prognostic factor of advanced-stage thyroid cancer12.

Another study showed that sex was an age-specific effect modifier for the incidence of PTC. Thus, the sex gap in the incidence of PTC decreased with age33,38. Since the proportion of elderly male patients is relatively higher, age might be an important confounding factor when conducting a comparison between two groups according to sex. Hence, sex might not be an independent prognostic factor39.

Based on our results, large multicenter studies using propensity score matching should be conducted to confirm that sex is not an independent prognostic factor. To date, most studies, such as that conducted by Zahedi et al., did not perform propensity score matching18. In addition, a future study could include a subgroup analysis according to tumor size. Lee et al. showed that male sex is an independent prognostic factor for recurrence in PTC > 1 cm, but not in thyroid microcarcinoma19. By using propensity score matching to analyze whether sex has a tumor size-specific effect, further research can identify whether the management of males and females should vary based on tumor size.

This study had several limitations. First, it was retrospective in nature. Moreover, there might have been selection bias because data were collected at a single tertiary institution and do not represent the entire patient population. However, propensity score matching was performed to adjust for differences in clinicopathological characteristics and to minimize bias. The loss of data from the 3486 patients who were excluded by propensity score matching should also be considered. Furthermore, the mean follow-up period was relatively short (99.9 ± 18.7 months). Hence, a longer follow-up is required to predict the prognosis of patients with DTC, as it has indolent characteristics. Finally, although all patients were followed up for the appropriate levothyroxine dose, RAI ablation, and serum Tg level, those parameters were not appropriately included in this study. Because these parameters are crucial and may affect the propensity score matching, more reliable results would have been obtained if they had been included.

This study had several important strengths. Each patient was followed up, and a standardized laboratory and imaging protocol was used in a single institution. Moreover, this is one of the largest cohort studies conducted on this topic to date. To the best of our knowledge, few studies have used propensity score matching to evaluate the effect of sex on the long-term prognosis for DTC.

In conclusion, compared with females, male patients were more likely to present with more aggressive disease characteristics, including larger tumor size, higher prevalence of ETE and lymphatic and vascular invasions, greater proportion of positive LNs, more advanced TNM stage, and higher recurrence rate. However, after propensity score matching, no significant difference was observed in the recurrence rates between the matched groups. Therefore, male sex was not found to be an independent prognostic factor for DTC recurrence. Further studies with larger cohorts should be conducted to validate this result.

Methods

Patients

We retrospectively analyzed 5827 patients with DTC who underwent thyroid surgery from January 2009 to December 2015 at Seoul St. Mary’s Hospital (Seoul, Korea). In total, 198 patients with incomplete data and 63 who were lost to follow-up were excluded from the analysis. The medical charts and pathology reports of the remaining 5566 patients were reviewed and analyzed. Patients were categorized into two groups according to sex (female and male groups). The mean follow-up duration was 99.9 ± 18.7 (range 66–137) months. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and was approved by the institutional review board of Seoul St. Mary’s Hospital, The Catholic University of Korea (IRB No: KC20RISI0829). The need for informed consent was waived due to the retrospective nature of this study.
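
The patient counts reported in this paper can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text (the screened cohort, the two exclusion counts, and the 1040 matched pairs reported in the Results); the variable names are illustrative.

```python
# Counts as reported in the paper
screened = 5827            # patients who underwent thyroid surgery
incomplete_data = 198      # excluded: incomplete data
lost_to_follow_up = 63     # excluded: lost to follow-up
matched_pairs = 1040       # female-male pairs after propensity score matching

analyzed = screened - incomplete_data - lost_to_follow_up
matched_patients = 2 * matched_pairs      # one female and one male per pair
unmatched = analyzed - matched_patients   # patients dropped by matching

print(analyzed, matched_patients, unmatched)
```

The result for `analyzed` agrees with the 5566 patients reviewed, and `unmatched` agrees with the 3486 patients mentioned as data loss in the Discussion.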

Preoperative workup

The preoperative workup included physical examination, serum thyroid function test, neck ultrasonography (US), and computed tomography (CT) scan. Thyroid nodules were diagnosed based on preoperative US-guided fine needle aspiration cytology (FNAC). FNAC was performed by experienced radiologists. The diagnoses made by using FNAC were based on the Bethesda system40. To assess the size and location of tumors, presence/absence of ETE, LN status, and other abnormal findings in the neck, all patients diagnosed with DTC were evaluated via preoperative US and neck CT scan. If LNs were palpable or appeared suspicious on preoperative US or neck CT, FNAC and washout thyroglobulin testing were performed. The surgical extent was decided in accordance with the ATA management guidelines.

Postoperative management and follow-up

Postoperative care and follow-up were conducted according to the ATA management guidelines regardless of sex. All patients took suppressive doses of L-thyroxine and were regularly followed up by physical examination, serum thyroid function testing, thyroglobulin (Tg) and anti-Tg antibody concentration measurements, and neck US at 3–6-month intervals and annually thereafter. Radioactive iodine (RAI) ablation was performed at 6–8 weeks postoperatively, and whole body scans were performed at 5–7 days after RAI ablation in patients who underwent total thyroidectomy.

Those with recurrence or distant metastasis on routine follow-up evaluations underwent additional diagnostic imaging, including CT scan, positron emission tomography/CT scan, and/or radioactive iodine whole body scan, to determine the location and extent of suspected recurrence. In cases of suspected recurrence in the remnant thyroid, resection bed, or lymph nodes, diagnosis was confirmed via histologic examination using FNAC or a surgical biopsy specimen.

Primary and secondary endpoints

The primary endpoint was the difference in DFS between males and females after propensity score matching, and the secondary endpoint was the difference in clinicopathological characteristics between males and females before and after propensity score matching.

Statistical analysis

Continuous variables were presented as means with standard deviations, and categorical variables were presented as numbers with percentages. Student’s t-test and Pearson’s chi-square test or Fisher’s exact test were used to compare continuous and categorical variables, respectively. Univariate and multivariate Cox regression analyses were performed to identify the predictors of DFS, and HRs with 95% CIs were calculated. DFS was compared by performing Kaplan–Meier survival analysis with the log-rank test.
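
To illustrate the survival comparison described above, the following is a minimal, dependency-free Python sketch of the Kaplan–Meier estimator and a two-group log-rank test. It is not the authors' implementation (the analyses were run in SPSS); the function names and the toy follow-up data are hypothetical and are not taken from the study.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient (e.g. months)
    events : 1 if recurrence was observed at that time, 0 if censored
    """
    surv, curve = 1.0, []
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - d / at_risk            # product-limit update
        curve.append((t, surv))
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        n = sum(1 for ti in times if ti >= t)        # at risk, both groups
        n_a = sum(1 for ti in times_a if ti >= t)    # at risk, group A
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d_a = sum(1 for ti, ei in zip(times_a, events_a) if ti == t and ei == 1)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p

# Toy data only (months of follow-up, 1 = recurrence, 0 = censored)
female_t, female_e = [66, 80, 95, 110, 120, 137], [0, 1, 0, 0, 0, 0]
male_t, male_e = [66, 70, 85, 100, 115, 137], [0, 1, 1, 0, 0, 0]
print(kaplan_meier(male_t, male_e))
print(log_rank(female_t, female_e, male_t, male_e))
```

With cohorts of several thousand patients, a statistical package would normally be used instead; the sketch is only meant to make explicit what the log-rank p values in Figs. 1 and 2 summarize.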

To reduce the effects of selection bias and potential confounding factors, propensity score matching was performed using various clinicopathological characteristics: age, extent of operation, tumor size, ETE, lymphatic invasion, vascular invasion, harvested LNs, positive LNs, N stage, and TNM stage. Individual patient propensity scores were calculated via logistic regression analysis. After propensity score matching, the clinicopathological characteristics representative of long-term oncologic outcome and recurrence were compared between females and males. A p value of < 0.05 was considered statistically significant. All statistical analyses were performed by using the Statistical Package for the Social Sciences software for Windows version 23.0 (IBM SPSS Statistics for Windows, IBM Corp., Armonk, NY).
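
The matching procedure described above can be sketched in a few lines of Python. The paper states only that propensity scores were obtained by logistic regression; the greedy 1:1 nearest-neighbor matching without replacement and the caliper value used below are assumptions for illustration, as are the function names, the two-covariate toy data, and the gradient-ascent fitting routine.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient ascent; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, w):
    """Predicted probability of being male given the covariates."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def greedy_match(scores, is_male, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each male is paired with the unused female whose score is closest,
    provided the difference is within the caliper (assumed value).
    """
    females = {i for i, m in enumerate(is_male) if m == 0}
    pairs = []
    for i in (i for i, m in enumerate(is_male) if m == 1):
        if not females:
            break
        best = min(females, key=lambda j: (abs(scores[i] - scores[j]), j))
        if abs(scores[i] - scores[best]) <= caliper:
            pairs.append((i, best))
            females.remove(best)
    return pairs

# Toy covariates (age in decades, tumor size in cm); not study data
X = [(4.6, 1.2), (3.9, 1.5), (5.2, 0.9), (4.1, 1.1), (4.7, 0.8),
     (5.0, 0.7), (4.4, 1.0), (3.8, 0.9), (5.5, 1.3), (4.2, 0.6)]
is_male = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = propensity_scores(X, fit_logistic(X, is_male))
pairs = greedy_match(scores, is_male, caliper=0.2)
print(pairs)
```

Patients left without a partner inside the caliper are dropped, which is why matching reduces the analyzable cohort. In the study, the propensity model included all ten covariates listed above rather than the two used in this toy example.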