Introduction

Pertussis, also known as whooping cough, is a highly contagious, vaccine-preventable disease. Despite high vaccination coverage, pertussis remains a significant threat to vulnerable populations, particularly in low- and middle-income countries1. An estimated 24.1 million pertussis cases and 160,700 pertussis deaths occurred in children < 5 years old in 2014. In developing countries, the average case-fatality ratio is estimated to be 4% in infants < 12 months old and 1% in children 1 to 4 years old; moreover, pertussis might account for 1% of the mortality of children < 5 years old2. In addition, incomplete historical vaccine coverage combined with waning immunity induced by vaccination or previous infection has increased the number of people susceptible to pertussis over time, creating the potential for resurgence of the disease at any time3. While approximately 4000 pertussis cases were reported annually in the US in the 1980s, the number of reported cases increased to 25,827, 27,500, and 48,277 in 2004, 2010, and 2012, respectively4. According to recent data, pertussis incidence also rose after 2022 in Australia and the United Kingdom5, and in Spain the incidence over the last 3 months was 75 times higher than in the same period of 2023, indicating a confirmed epidemic outbreak of pertussis6. This phenomenon has aroused worldwide concern and has been termed the resurgence of pertussis. According to surveillance data from the Chinese Center for Disease Control and Prevention, 9611 cases of pertussis were reported from January to December in 2020, and as many as 39,781 cases from January to December in 2021, while the number was only 4475 cases in 2019. Seroepidemiological survey evidence suggests that the true burden is likely to be seriously underestimated7,8. Preventing pertussis therefore remains an important global public health problem.
Early identification and appropriate treatment of pertussis are key to limiting its spread, and pertussis is currently a topic of widespread concern.

However, reliable criteria for the clinical case definition of pertussis are lacking. It has been reported that the clinical case definitions of both the World Health Organization (WHO)2 and the Global Pertussis Initiative9 have limitations, and neither achieves satisfactory sensitivity and specificity10,11,12. Thus, early and accurate diagnosis of pertussis remains a major challenge for clinicians. A meta-analysis reported that peripheral blood leucocyte ratios might be useful infection biomarkers for diagnosing bacterial and viral diseases13. In 2017, Ganeshalingham et al.14 first reported that the neutrophil-to-lymphocyte ratio (NLR) was closely associated with pertussis in infants; an NLR greater than 1 may indicate life-threatening pertussis in infants. The area under the receiver operating characteristic (ROC) curve (AUC) of a multivariable prediction model of malignant pertussis based on white blood cell count and NLR was 0.96 (95% CI 0.91–1.00)15. These studies suggest that peripheral blood parameters have potential clinical utility in the diagnostic prediction of malignant pertussis. However, the potential diagnostic value of peripheral blood parameters in children with pertussis has not been examined. Therefore, in this study, we retrospectively analyzed the clinical data of children with suspected pertussis from Zigong (China) to develop and validate a diagnostic prediction model for pertussis based on blood parameters, with the aim of providing a reference for the early identification of pertussis.

Materials and methods

According to surveillance data from the Chinese Center for Disease Control and Prevention, China has experienced a resurgence or outbreak of pertussis since 2019. We enrolled children aged less than 14 years on admission to Zigong First People’s Hospital (Zigong Academy of Medical Sciences) from January 2020 to December 2021 who met the WHO definition2 of a suspected case of pertussis (n = 500). All participants underwent a polymerase chain reaction test for pertussis on nasopharyngeal swabs or upper respiratory tract aspirates. Participants with duplicate admission information (n = 9), cough duration of more than 3 months (n = 5), congenital heart disease (n = 1), or incomplete blood count data (n = 8) were excluded. A total of 477 children with suspected pertussis were ultimately included (Supplementary Fig. S1). The study was approved by the Ethics Committee of the Zigong First People’s Hospital [Ethics (Research) No. 63, 2022]. Informed consent was obtained from the legal guardians. The study was performed in accordance with the ethical standards established in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

Suspected pertussis cases were identified by the WHO definition for case finding (a suspected case was a person of any age with a cough lasting ≥ 2 weeks or of any duration in an infant or any person in an outbreak setting, without a more likely diagnosis and with at least one of the following symptoms, based on observation or parental report: paroxysms (fits) of coughing, inspiratory whooping, post-tussive vomiting or vomiting without other apparent cause, apnea (only in < 1 year of age) or clinician suspicion of pertussis) 2.

Demographics, immunization status, epidemiological history, complete blood count (first blood parameters on admission), C-reactive protein (CRP) level and Bordetella pertussis polymerase chain reaction (PCR) results were reviewed. The neutrophil-to-lymphocyte ratio (NLR), lymphocyte-to-monocyte ratio (LMR), platelet-to-lymphocyte ratio (PLR), platelet-to-mean platelet volume ratio (PLT-MPV-R), and platelet distribution width-to-mean platelet volume ratio (PDW-MPV-R) were calculated from the complete blood count parameters. PCR was used as the gold standard test for diagnosing pertussis, and the children were divided into pertussis and non-pertussis groups according to their PCR results.
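
As an illustration of how the five derived ratios can be obtained from a single complete blood count, the following minimal Python sketch computes each one as a plain quotient. The class name, field names and example values are hypothetical, and the use of absolute cell counts (rather than percentages) is an assumption, since the text does not specify which form was used.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    """First complete blood count on admission (field names are illustrative)."""
    neutrophils: float  # absolute neutrophil count, x10^9/L (assumed form)
    lymphocytes: float  # absolute lymphocyte count, x10^9/L (assumed form)
    monocytes: float    # absolute monocyte count, x10^9/L (assumed form)
    platelets: float    # platelet count, x10^9/L
    mpv: float          # mean platelet volume
    pdw: float          # platelet distribution width (unit depends on the analyzer)


def derived_ratios(cbc: BloodCount) -> dict:
    """Return the five ratios named in the text as simple quotients."""
    return {
        "NLR": cbc.neutrophils / cbc.lymphocytes,
        "LMR": cbc.lymphocytes / cbc.monocytes,
        "PLR": cbc.platelets / cbc.lymphocytes,
        "PLT-MPV-R": cbc.platelets / cbc.mpv,
        "PDW-MPV-R": cbc.pdw / cbc.mpv,
    }


# Made-up values for illustration only; not patient data from the study.
example = BloodCount(neutrophils=3.0, lymphocytes=6.0, monocytes=0.5,
                     platelets=300.0, mpv=10.0, pdw=12.0)
ratios = derived_ratios(example)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Because the NLR and LMR are ratios of two cell populations from the same differential, they take the same value whether absolute counts or percentages are used; the PLR does not.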

Based on an expected sensitivity and specificity of 50% and a prevalence of 5%, the minimum required sample size was 96. Descriptive epidemiological methods were used to retrospectively analyze and summarize the demographics, immunization status, epidemiological history, complete blood count and CRP level of the children. IBM SPSS Statistics software 22.0 was used to analyze all the data. Normally distributed data are expressed as the mean ± standard deviation (x ± s), and a t test was used for comparisons between groups. Non-normally distributed data are expressed as the median M (Q1, Q3), and the Wilcoxon rank sum test was used for comparisons between groups. Categorical data are presented as percentages (%) and were analyzed by the chi-square test. A P value < 0.05 was considered to indicate statistical significance.
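
The formula behind the minimum sample size is not given in the text. As a hedged illustration, the standard precision-based formula for a proportion, n = z²·p(1 − p)/d², reproduces a value of about 96 when p = 0.50, under the assumptions of a 95% confidence level (z = 1.96) and an absolute margin of error d = 0.10. Both of these values, and the function name, are assumptions made for this sketch; how the 5% prevalence entered the original calculation is not stated and is not modelled here.

```python
def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> float:
    """Precision-based sample size for estimating a proportion p.

    margin is the half-width of the confidence interval (assumed 0.10 below);
    z is the normal quantile for the confidence level (1.96 for 95%).
    """
    return z ** 2 * p * (1.0 - p) / margin ** 2


# Expected sensitivity/specificity of 50% as stated in the text;
# the 0.10 margin is an assumption, not a value reported by the study.
n_required = sample_size_for_proportion(p=0.50, margin=0.10)
print(f"Required sample size: about {n_required:.2f}")
```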

The random number table method was used to assign 75% of the data to the training cohort and the remaining 25% to the validation cohort. R 4.2.3 software was used to develop a pertussis diagnostic prediction model in the training cohort and to validate the model in the validation cohort. The blood parameters of pertussis and non-pertussis children in the training cohort were compared by univariate analysis. The statistically significant parameters in the training cohort were analyzed via stepwise (forward) regression to determine the independent factors associated with pertussis. These factors were subsequently used to develop a diagnostic prediction model. A nomogram, a tool widely used as a prognostic device in medicine16, was developed in R software to visualize the results. To reduce the complexity of the model as much as possible, the number of variables was gradually reduced based on the maximum points assigned to each variable in the nomogram, and the best prediction model was determined by the AUC and the net reclassification improvement (NRI)17,18. An AUC greater than 0.7 is typically taken to indicate reasonable discrimination. The NRI is an alternative to the AUC for assessing improvements in the diagnostic prediction of a new model. R software was used to calculate the NRI. An NRI of 0 indicates no improvement in the new model, an NRI > 0 indicates a positive improvement, and an NRI < 0 indicates a negative improvement. Finally, ROC curve analysis, calibration plots and decision curve analysis (DCA) were used to evaluate the discrimination, calibration and clinical net benefit of the diagnostic prediction model.
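
To make the interpretation of the NRI concrete, the sketch below computes a two-category NRI for a new model relative to a reference model. It is an illustration only: the text does not state whether a categorical or continuous NRI was used, so the single probability cut-off of 0.5, the function name and the toy data are all assumptions.

```python
def categorical_nri(y_true, p_old, p_new, threshold=0.5):
    """Two-category net reclassification improvement of p_new over p_old.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)],
    where "up" means the new model moves a case above the threshold.
    """
    events = nonevents = 0
    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        old_pos, new_pos = old >= threshold, new >= threshold
        moved_up = new_pos and not old_pos
        moved_down = old_pos and not new_pos
        if y == 1:
            events += 1
            up_event += moved_up
            down_event += moved_down
        else:
            nonevents += 1
            up_nonevent += moved_up
            down_nonevent += moved_down
    return ((up_event - down_event) / events
            + (down_nonevent - up_nonevent) / nonevents)


# Toy data (1 = PCR positive); the probabilities are invented for illustration.
y = [1, 1, 1, 1, 0, 0, 0, 0]
reference_model = [0.6, 0.4, 0.7, 0.3, 0.6, 0.2, 0.4, 0.7]
new_model = [0.7, 0.6, 0.8, 0.4, 0.4, 0.3, 0.6, 0.8]
nri = categorical_nri(y, reference_model, new_model)
print(f"NRI = {nri:.2f}")
```

In this toy example one additional event is correctly moved above the cut-off, while one non-event is moved down and another up, so only the event component contributes and the NRI is positive.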

Ethical approval

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the Zigong First People's Hospital (protocol code Ethics (Research) No. 63, 2022).

Informed consent

Informed consent was obtained from the legal guardians involved in the study.

Results

Univariate analysis

A total of 299 patients were confirmed to have pertussis by PCR, and 178 patients were PCR-negative and formed the non-pertussis group (Supplementary Fig. S1). There were no significant differences in age, sex or epidemiological history between the pertussis group and the non-pertussis group. Vaccine coverage (84.95% vs. 91.57%, chi-square [χ²] = 4.451, P = 0.035) and the proportion of children from urban areas (61.20% vs. 73.03%, χ² = 6.921, P = 0.009) were significantly lower in the pertussis group than in the non-pertussis group. The complete blood counts, ratios and CRP levels of these children are summarized in Table 1. The children were randomly divided into a training cohort and a validation cohort at a ratio of 3:1. The training and validation cohorts were comparable in terms of the data shown in Table 1 (P > 0.05).

Table 1 Comparison of demographic and blood parameters of the children.

Stepwise regression analysis

Further statistical analysis revealed statistically significant differences in blood parameters, especially blood cell ratios and CRP levels, between the pertussis group and the non-pertussis group in the training cohort (Table 2). Factors that were significant (P < 0.05) in the univariate analysis of the training cohort were entered into stepwise regression. The stepwise entry probability was 0.05, and the stepwise removal probability was 0.1. The stepwise (forward) regression analysis showed that white blood cell count (WBC) [P = 0.024, odds ratio (OR) = 1.062, 95% confidence interval (CI) 1.008–1.119], hematocrit (HCT) [P = 0.006, OR 1.124, 95% CI 1.035–1.220], lymphocyte percentage (LYMPH) [P = 0.002, OR 1.026, 95% CI 1.010–1.042], CRP [P = 0.014, OR 0.948, 95% CI 0.909–0.989], and PDW-MPV-R [P < 0.001, OR 44.641, 95% CI 8.926–223.265] were independent factors associated with pertussis (Table 3). The variance inflation factor (VIF) values were all < 2, indicating no meaningful collinearity among the selected variables.
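
The odds ratios and confidence intervals above are exponentiated logistic regression coefficients. As a hedged illustration of that relationship, the sketch below recovers an approximate coefficient and standard error from the reported WBC result and then regenerates the interval. It assumes a Wald-type 95% interval (z = 1.96), which the text does not state explicitly; the function names are hypothetical.

```python
import math


def coef_from_odds_ratio(odds_ratio, lower, upper, z=1.96):
    """Approximate the log-odds coefficient and its standard error
    from a reported OR and its Wald-type confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# WBC as reported in the text: OR 1.062, 95% CI 1.008-1.119.
beta_wbc, se_wbc = coef_from_odds_ratio(1.062, 1.008, 1.119)
or_wbc, lo_wbc, hi_wbc = odds_ratio_ci(beta_wbc, se_wbc)
print(f"beta = {beta_wbc:.4f}, SE = {se_wbc:.4f}")
print(f"OR = {or_wbc:.3f} (95% CI {lo_wbc:.3f}-{hi_wbc:.3f})")
```

On this scale, an OR of 1.062 for WBC corresponds to an increase of roughly 0.06 in the log-odds of pertussis per unit increase in WBC, holding the other variables fixed. The regenerated interval matches the reported one only to within rounding, because the published values are rounded to three decimals.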

Table 2 Comparison of demographic and blood parameters in training set.
Table 3 Stepwise regression analyses on variables associated with pertussis.

Development of diagnostic prediction model

WBC (×10⁹/L), HCT (%), LYMPH (%), CRP (mg/L) and PDW-MPV-R were used to develop the nomogram for pertussis in R 4.2.3 software (Fig. 1a). To simplify the model as much as possible, the number of variables was gradually reduced based on the maximum points assigned to each variable in the nomogram. Sorted by maximum points, the variables ranked PDW-MPV-R > CRP > WBC > HCT > LYMPH (Fig. 1a). Models A (including PDW-MPV-R, CRP, WBC, HCT, and LYMPH), B (without LYMPH), C (without HCT and LYMPH), D (including PDW-MPV-R and CRP), and E (including only PDW-MPV-R) were subsequently developed. To determine the best diagnostic prediction model, we first compared the models by AUC. Compared with that of Model A, the AUC values of Models B, C and D decreased progressively, but none of the differences was statistically significant (all P > 0.05). Model E was rejected because its AUC was < 0.7 (P < 0.01). We then evaluated the models again using the NRI. Compared with Model A, Model C had the smallest negative improvement in the NRI (P > 0.05) and was therefore considered the best diagnostic prediction model (Table 4).
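
For readers less familiar with the AUC used to compare Models A to E, the following sketch computes it directly from its rank-based definition: the probability that a randomly chosen pertussis case receives a higher predicted score than a randomly chosen non-pertussis case, with ties counted as one half. The function name and toy scores are illustrative assumptions; the statistical comparison of AUCs between models (the P values above) is not reproduced here.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy predicted probabilities, invented for illustration.
labels = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = auc_by_pairs(labels, predicted)
print(f"AUC = {auc:.3f}")
print("meets the 0.7 criterion" if auc > 0.7 else "below the 0.7 criterion")
```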

Figure 1

Nomograms for pertussis. (a) shows the full nomogram used to estimate the probability of pertussis in children. (b) shows the simplified nomogram used to estimate the probability of pertussis in children.

Table 4 Comparison of diagnostic prediction models.

Validation of diagnostic prediction model

Finally, WBC, CRP and PDW-MPV-R were included in the diagnostic prediction model, and the nomogram was reconstructed from these variables (Fig. 1b). Figure 1b shows an example of using the nomogram to predict the probability that a child has pertussis: the points assigned to each variable are summed to give the total points, from which the probability of pertussis is read. Most children in the present study had total points between 90 and 160, and the probability of pertussis exceeded 50% when the total exceeded 124 points. ROC curve analysis showed that the AUC of the model was 0.77 (95% CI 0.72–0.82) in the training cohort (Fig. 2a) and 0.80 (95% CI 0.73–0.88) in the validation cohort (Fig. 2b). The sensitivity and specificity were 72.1% and 72.6%, respectively, in the training cohort and 74% and 72.1%, respectively, in the validation cohort, indicating favorable discrimination for the diagnostic prediction model. We selected the bootstrap method for internal validation based on the number of patients, and the calibration plots showed high consistency between the predicted and observed probabilities in both the training cohort (Fig. 3a) and the validation cohort (Fig. 3b). The Hosmer–Lemeshow goodness-of-fit test also showed a good fit in both the training cohort (χ² = 8.744, P = 0.364) and the validation cohort (χ² = 8.318, P = 0.403). In summary, the model for pertussis had considerable discriminative and calibration ability. DCA showed that the model provided greater clinical net benefit than either the treat-all-patients scheme or the treat-none scheme in both the training cohort (Fig. 4a) and the validation cohort (Fig. 4b).
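
As a check on how the Hosmer–Lemeshow P values relate to the reported χ² statistics, the sketch below evaluates the upper-tail probability of a chi-square distribution. It assumes 8 degrees of freedom (the conventional 10 risk groups minus 2), which is not stated in the text; under that assumption the reported P values are recovered. The function name is hypothetical, and the closed form used is valid only for even degrees of freedom.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of
    degrees of freedom, using the closed-form series
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistics reported in the text; df = 8 is an assumption.
p_training = chi2_upper_tail_even_df(8.744, df=8)
p_validation = chi2_upper_tail_even_df(8.318, df=8)
print(f"training cohort:   P = {p_training:.3f}")
print(f"validation cohort: P = {p_validation:.3f}")
```

A P value well above 0.05 in this test means that the observed and predicted event counts across risk groups do not differ significantly, which is why it is read as evidence of acceptable calibration.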

Figure 2

ROC curves. (a) and (b) show the ROC curves of the diagnostic prediction model for children with pertussis in the training cohort and validation cohort, respectively. The area under the ROC curve was used to evaluate the discriminative performance of the model.

Figure 3

Calibration plots. (a) and (b) show the calibration plots of the diagnostic prediction model in the training cohort and validation cohort. The diagonal 45-degree line indicates perfect prediction.

Figure 4

Decision curve analysis. (a) and (b) show the decision curve analysis results of the diagnostic prediction model in the training cohort and validation cohort, respectively. The Y-axis represents the net benefit, calculated by adding true positives and subtracting false positives; the X-axis represents the threshold probability. The blue curve shows that the net benefit of the model was greater than that of the treat-all-patients scheme and the treat-none scheme.
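
The net benefit plotted in a decision curve can be sketched as follows, using the standard definition in which false positives are weighted by the odds of the threshold probability: net benefit = TP/n − (FP/n) × pt/(1 − pt). This is a minimal illustration with invented data; the function names are hypothetical, and the study's own curves were produced in R rather than with this code.

```python
def net_benefit_model(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted probability
    is at or above the threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy cohort (1 = pertussis); the probabilities are invented for illustration.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
pt = 0.5
nb_model = net_benefit_model(outcomes, predicted, pt)
nb_all = net_benefit_treat_all(outcomes, pt)
nb_none = 0.0  # treating nobody yields no true or false positives
print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}, treat none: {nb_none:.2f}")
```

A model is considered clinically useful at a given threshold probability when its net benefit exceeds both reference strategies, as it does in this toy example.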

Discussion

Pertussis (whooping cough) is a respiratory infection caused by Bordetella pertussis that can affect people of all ages. Since 1980, the global pertussis case count has fallen by more than 90% as immunization coverage has increased19. However, as the number of typical cases of pertussis has decreased after vaccination, clinicians may be unable to detect pertussis at an early stage using the current clinical case definition, which often results in undiagnosed or misdiagnosed pertussis. Therefore, we constructed a model to predict pertussis in children and confirmed that the model has considerable discriminative and calibration ability. Simply analyzing blood parameters could reliably help clinicians diagnose pertussis in children.

Previous studies on pertussis have proposed several infection biomarkers, such as white blood cells, lymphocytes and neutrophils, for the diagnosis of pertussis. These biomarkers were fully considered in our study. Guinto-Ocampo et al.20 reported that infants with higher white blood cell counts (P = 0.02), a higher percentage of lymphocytes (P = 0.00) and higher absolute lymphocyte counts ([ALC], P = 0.00) were more likely to test positive for pertussis. ROC curve analysis showed that the AUC of the ALC was 0.81 (95% CI 0.72–0.90), with a sensitivity of 89% and a specificity of 75%, indicating that blood cell markers had a good ability to predict pertussis, which was consistent with our results. Al Maani et al.21 also reported that white blood cell count and lymphocytosis could be used as reliable predictors for the diagnosis of pertussis, especially in the absence of specific confirmatory tests. In addition, several studies have suggested that peripheral blood cell indicators are related to the progression of pertussis in children. One study reported that higher WBC, ALC and absolute neutrophil count (ANC) were significant predictors of critical pertussis in children22. Ganeshalingham et al. demonstrated the potential clinical utility of WBC, ALC and ANC in the prediction of malignant pertussis15. Coquaz-Garoudet et al. proposed that rapid leukocyte growth and leukocytosis with neutrophil predominance during acute pertussis infection were associated with death23, findings that should alert clinicians and may help improve expected survival in infants. Therefore, biomarkers based on routine blood counts or ratios may be suitable for clinicians to identify and manage pertussis.

In 2001, Zahorec first reported that the neutrophil-to-lymphocyte ratio may indicate the severity of affliction and could be a marker of the immune response to systemic inflammation or sepsis24. Another study noted that the NLR was a better predictor of bacteremia than routine parameters such as CRP concentration, WBC count and ANC25, which has drawn attention to these markers. Ganeshalingham et al.14 first reported that an NLR greater than 1 may indicate life-threatening pertussis in infants. However, the diagnostic value of the NLR, PLR and LMR for predicting pertussis was not confirmed in our study. Instead, we were surprised to find that PDW-MPV-R was significantly associated with pertussis (P < 0.001, OR 44.641, 95% CI 8.926–223.265). Based on the literature, we speculate that pertussis infection may lead to platelet activation and subsequent changes in platelet shape26, which are mainly reflected in the MPV and PDW. Therefore, the PDW-MPV-R might be a novel biomarker for pertussis infection in children, and additional data are needed to validate this indicator. In addition, we found that the level of CRP, a traditional marker of bacterial infection, was not elevated in patients with pertussis, which was consistent with the results of previous studies27.

Notably, individual markers are imperfect, and multivariate predictive models that make comprehensive use of disease-related data to improve diagnostic accuracy may be more useful. It was reported that a data-driven algorithm including a suspicion of pertussis by a physician, whooping, cyanosis and absence of fever was accurate (79.9%) and specific (94.0%) and had high predictive value for laboratory-confirmed pertussis28. Daluwatte et al.29 developed a machine learning model for pertussis based on signs, diseases and symptoms from clinician notes and demographic information within electronic health-care records; this model predicted pertussis with an area under the precision-recall curve of 0.24, a recall of 0.72 and a specificity of 0.94. These studies illustrated that although the clinical case definition was not sensitive enough, models based on clinical symptom information could still identify pertussis in children. In addition, another study reported that the recall of a machine learning model combining pertussis symptoms and routine blood test results was 0.72, and the sensitivity and specificity were 0.83 and 0.61, respectively30. In our study, we demonstrated that a prediction model that included only blood parameters had a considerable ability to predict pertussis in children: the AUC-ROC was 0.77, the sensitivity was 0.72, and the specificity was 0.73. As far as we know, the current clinical definition of pertussis identifies patients with typical manifestations reasonably well; for patients with atypical presentations, however, it is difficult to distinguish them accurately from patients with other respiratory infections10,11,31,32. The model we developed provides some reference for such clinical situations and might help to address these problems.
Similarly, appropriate management of patients predicted to be positive by our model might play an important role in limiting the spread of pertussis in the population. Furthermore, the treatment of pertussis emphasizes the early administration of medications, which helps to lessen disease severity and duration and improves the quality of life of patients33,34, but these benefits rely on reliable identification early in the course of the disease. Although such models cannot be used to confirm the presence of pertussis in the way that PCR testing can, they might be useful in low-resource settings where laboratory confirmation is unavailable.

Regrettably, statistical prediction models are often difficult to apply in practice because of their complex mathematical formulas. Nomograms can reduce these models to a single numerical estimate of the probability of a clinical event35, which provides a scientific basis for clinical decision making and supports the drive toward personalized medicine16,36. In our study, we constructed a nomogram for pertussis based on blood parameters and demonstrated its clinical net benefit in a limited sample. We also followed the recommendation of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement to use bootstrapping methods in calculating the C-index or AUC and calibration plots. These favorable results were replicated well in the validation cohort. However, we have to recognize the clinical limitations of our prediction model, which cannot provide evidence as conclusive as PCR results and cannot decisively direct treatment in suspected cases of pertussis. In practical clinical settings, the results of this model would serve only as a reference, with treatment decisions based primarily on the patient’s clinical condition.

Although the model based on blood parameters might be a useful tool for evaluating the probability of pertussis, the present study has several limitations. First, we used only single-center data for internal validation, which might limit the wide applicability of the model. Second, to prevent overfitting, we did not incorporate clinical features, which had already been fully considered when selecting the study population. Third, the timing of blood sampling was not uniform, as blood parameters were measured at different stages of the disease, although this may closely reflect real-world clinical practice. Therefore, multicenter clinical validation is needed to evaluate and improve the external utility of our model in the future.

Conclusion

In summary, the prediction model based on blood parameters has considerable discriminative and calibration ability. The model may be useful for predicting the probability of pertussis in children, but in real-world settings clinical decisions should be based on the patient’s clinical condition.