Introduction

Idiopathic pulmonary fibrosis (IPF), the most common form of idiopathic interstitial pneumonia (IIP), is a chronic progressive lung disease of unknown cause with significantly worse prognosis than other forms of IIP1, 2. Medical treatments for this deadly disease are dependent on the accurate diagnosis of IPF3, 4.

Diagnostic guidelines for IPF were updated in 2018 by the American Thoracic Society (ATS), the European Respiratory Society (ERS), the Japanese Respiratory Society (JRS), and the Latin American Thoracic Society (ALAT)5. High-resolution computed tomography (HRCT) patterns for usual interstitial pneumonia (UIP) were further refined to patterns for UIP, probable UIP, indeterminate for UIP and alternative diagnoses. In contrast to the recommendation of Fleischner society which were made against performing surgical lung biopsy in patients with newly detected IIP who has a CT pattern of probable UIP6, updated ATS/ERS/JRS/ALAT guideline conditionally recommends surgical lung biopsy for the patient who has a CT pattern of probable UIP, that deciding whether or not to undergo biopsy in patients depends on clinical likelihood of IPF5, 7. Moreover, performing bronchoalveolar lavage (BAL) is also conditionally recommended in group of patients with CT pattern of probable UIP, indeterminate for UIP and alternative diagnosis prior to surgical lung biopsy to distinguish IPF from the other alternative diagnosis including eosinophilic pneumonia, sarcoidosis, and hypersensitivity pneumonitis (HP). These recommendations, however, are conditional, and it is currently unclear how changes in the guidelines for IPF will affect their diagnostic performance in real clinical practice.

The purpose of the current study was validate the latest 2018 diagnostic guideline of IPF in the cohort of fibrosing interstitial lung disease (ILD) and evaluate the diagnostic and prognostic implications compared to previous 2011 guideline in clinical practice. The interobserver agreement, diagnostic performance and survival outcomes were compared. Furthermore, the added diagnostic value of cellular analysis of BAL fluid was assessed in patients with CT patterns of probable UIP, indeterminate for UIP and alternative diagnoses following the guidline.

Results

The study cohort included 535 patients (mean age 60.8 ± 9.0 years, 348 men) with fibrosing ILD (Fig. 1). The clinical characteristics of these patients are summarized in Table 1.

Figure 1
figure 1

Flow chart of patient inclusion. ILD interstitial lung disease; IPF idiopathic pulmonary fibrosis; UIP usual interstitial pneumonia; NSIP nonspecific interstitial pneumonia; cHP chronic hypersensitivity pneumonitis; MDD multidisciplinary discussion; BAL bronchoalveolar lavage.

Table 1 Demographic and clinical characteristics of patients assorted by diagnostic categories.

Interobserver agreement

Interobserver agreement of 2018 criteria showed moderate reliability (κ = 0.53; range: 0.54‒0.67) comparable to 2011 criteria (κ = 0.56; range: 0.56‒0.70). Interobserver agreement was substantial for UIP (κ = 0.66; range: 0.59‒0.76), fair for probable UIP (κ = 0.40; range: 0.53‒0.60) and moderate for indeterminate for UIP (κ = 0.49; range: 0.39‒0.57) and alternative diagnosis (κ = 0.54; range: 0.47‒0.63). Interobserver agreement across the 2018 diagnostic categories was lower in patients with a more than moderate degree of emphysema than in patients with no or a mild degree of emphysema (Fleiss κ = 0.37 vs 0.55).

Differences in patient classification and diagnostic performances of the 2011 and 2018 diagsnotic criteria

Assessments of CT patterns and changes in classification in the 535 patients according to the 2011 and 2018 criteria are demonstrated in Table 2. Of the 219 patients with possible UIP according to 2011 criteria, 175 (80.0%) were reclassified as having a probable UIP, 42 (19.2%) as an indeterminate for UIP and two (0.9%) as an alternative diagnosis. The proportion of patients diagnosed with IPF was significantly higher in patients with UIP CT pattern than in those with probable UIP CT pattern (89.7% vs 62.9%; p < 0.001), but did not differ significantly in patients with probable UIP compared with patients with indeterminate for UIP (62.9% vs 78.6%; p = 0.05; Supplementary table 1).

Table 2 Difference in categorization of fibrosing interstitial lung disease based on 2018 and 2011 diagnostic criteria.

Assessment of its diagnostic performance showed that the CT pattern of UIP had a high specificity (90.3%; 95% confidence interval [CI]: 85.3–94.1%) and positive predictive value (PPV, 89.7%; 95% CI: 84.8–93.1%) but a low sensitivity (48.7%; 95% CI: 43.2–54.1%) for IPF diagnosis (Table 3). Compared with 2011 criteria for possible UIP, the CT pattern of probable UIP had a lower sensitivity (63.2% vs 82.8%), PPV (62.9% vs 65.8%) and negative predictive value (NPV, 63.7% vs 77.3%) but a higher specificity (63.3% vs 57.6%). IPF diagnosis based on CT patterns of UIP and probable UIP showed a lower specificity (57.1%; 95% CI: 49.9–64.2%) and PPV (76.6%; 95% CI: 73.4–79.5%) but a higher sensitivity (81.1%; 95% CI: 76.5–85.1%) than diagnosis based on UIP alone.

Table 3 Diagnostic performance according to the CT pattern for the diagnosis of idiopathic pulmonary fibrosis.

Characteristics of patients categorized as interminate for UIP

In the 42 patients with a CT pattern of indeterminate for UIP, 28.6% (12/42) showed CT findings of subtle reticulation with or without mild ground-glass opacities as an early ILD. The other 71.4% (30/42) were truly indeterminate for UIP, the CT patterns and/or distribution of lung fibrosis that did not suggest any specific etiology and among such patients, 10 patients (23.8%, 10/42) showed fibrosis mixed with emphysema.

Among patients with IPF, those indeterminate for UIP showed a longer smoking history than those with other CT patterns (43.8 ± 22.6 vs 34.1 ± 19.2 pack-years; p = 0.022). In addition, the percentage with more than a moderate degree of emphysema was significantly higher in patients indeterminate for UIP (27.3%) than in those with UIP (8.5%), probable UIP (6.4%) and alternative diagnoses (6.5%) (p = 0.003). Baseline forced vital capacity (FVC, 82.5% ± 19.7% vs 70.1% ± 18.3%; p = 0.001), diffusing capacity of the lung for carbon monoxide (DLCO, 69.0% ± 23.9% vs 57.3% ± 17.8%; p = 0.002) and forced expiratory volume in 1 s (FEV1, 86.0% ± 20.7% vs 77.9% ± 18.8%; p = 0.029) were also significantly higher in patients with CT pattern of indeterminate for UIP than in those with other CT patterns, suggesting better pulmonary function.

Survival analysis

Significant prognostic differences were observed when patients were grouped by the HRCT criteria of both the 2011 and 2018 guidelines (log-rank test, p < 0.001; Supplementary Fig. 1). Based on 2018 criteria, survival was significantly longer in patients with CT pattern of indeterminate for UIP than in those with the other CT patterns (median survival of patients with CT pattern of indeterminate for UIP, 11.1 years [95% CI: 7.5–14.7] in IPF cohort; 11.5 years [95% CI: 10.5–12.4] in total cohort). After adjusting covariates, patients with CT pattern of indeterminate for UIP and probable UIP showed significantly better prognosis compared with UIP pattern (adjusted hazard ratio [HR] = 0.37 [95% CI: 0.22–0.61], p < 0.001 for CT pattern of indeterminate for UIP; adjusted HR = 0.61 [95% CI: 0.46–0.81], p = 001 for CT pattern of probable UIP; Table 4) in total patients, regardless of diagnosis (i.e. IPF or non-IPF). In IPF cohort, CT pattern of indeterminate for UIP was associated with better survival compared with UIP pattern (adjusted HR = 0.42 [95% CI: 0.25–0.71], p = 0.001) but CT pattern of probable UIP was not significantly associated with survival (adjusted HR = 0.79 [95% CI: 0.58–1.08], p = 0.142). Among the patients with CT pattern of probable UIP, survival difference was observed according to diagnosis. Idiopathic nonspecific interstitial pneumonia (iNSIP) showed better prognosis compared to IPF but chronic HP (cHP) showed comparable survival outcome compared to IPF with same CT pattern (adjusted HR = 0.23 [0.12–0.44], p < 0.001 for iNSIP; adjusted HR = 0.84 [0.50–2.38], p = 0.837 for cHP). Adjusted survival curves according to HRCT crteria of 2011 and 2018 guidelines are shown in Fig. 2.

Table 4 Cox proportional hazard analysis for survival time.
Figure 2
figure 2

Adjusted survival curves of overall survival stratified according to the CT patterns of 2011 and 2018 guidelines for diagnosis of idiopathic pulmonary fibrosis. (a,b) Patients with fibrosing interstitial lung disease (total cohort) and (c,d) patients with idiopathic pulmonary fibrosis as a function of CT pattern according to the 2011 and 2018 diagnostic criteria, respectively. Survival curves were adjusted by age, sex, baseline forced vital capacity, diffusing capacity of the lung for carbon monoxide and use of anti-fibrotics.

Added value of BAL fluid analysis in the diagnosis of IPF

To evaluate the added value of BAL fluid analysis in the diagnosis of IPF, the 336 patients with CT patterns of non-definite UIP (probable UIP, indeterminate for UIP, alternative diagnosis) who underwent BAL fluid analysis were randomly divided into development (n = 142) and validation (n = 94) cohorts. Their clinical characteristics is presented on Supplementary table 2.

In the development cohort, median lymphocyte counts in BAL fluid differed significantly between IPF and non-IPF groups (11 [IQR: 3–22] vs 19 [IQR: 10–55]; p < 0.001) (Supplementary Fig. 2), but median neutrophil counts were similar (4 [IQR: 1–7] vs 3 [IQR: 1–7]; p = 0.815). The optimal cut-off value for lymphocyte counts in BAL fluid for the diagnosis of IPF was 15%. The ability of clinical variables, including age, sex and lymphocyte count in BAL fluid, to predict IPF in the development cohort was tested relative to CT pattern. Older age (≥ 60 years), male sex, low lymphocyte count (≤ 15%) in BAL fluid, and probable UIP pattern were all significant predictors of IPF in univariable and multivariable analysis (Supplementary table 3). Subgroup analysis among the patients with probable UIP CT pattern, showed that male sex and low lymphocyte count (≤ 15%) were significant predictors of IPF in multivariable analysis (Supplementary table 4).

Among the patients with CT patterns of probable UIP who underwent BAL fluid analysis, 73.0% (54/74) in the development cohort and 71.1% (32/45) in the validation cohort were correctly classified as either IPF or non-IPF based on the cut-off value of lymphocyte count of 15% in BAL fluid. Among the patients with CT patterns of probable UIP, 66.2% (49/74) in the development cohort and 46.7% (21/45) were correctly classified as either IPF or non-IPF based on male with age ≥ 60.

In the validation cohort, the addition of lymphocyte count in BAL fluid to CT pattern alone or together with either older age or male sex improved specificity, PPV and diagnostic accuracy (Table 5). Diagnosing IPF when CT pattern showed probable UIP with lymphocyte count ≤ 15% in BAL fluid, and either male sex or age ≥ 60 years showed a high specificity of 90.6% (95% CI: 79.3–96.9) and a PPV of 80.8% (95% CI: 63.4–91.1). The model incorporating result of BAL fluid analysis had higher net benefits persistently than the other models for risk thresholds > 15% in the validation cohort of non-definite UIP CT pattern and for risk thresholds > 20% in the validation cohort of probable UIP CT pattern (Fig. 3). The net reclassification improvement of the model incorporating result of BAL fluid analysis over the model of age and sex was 0.75 (95% CI: 0.19–1.31) in the validation cohort of probable UIP CT pattern, which showed significant improvement in classification accuracy for IPF diagnosis (p = 0.007).

Table 5 Test characteristics of probable UIP pattern on CT alone and together with clinical features for IPF diagnosis in patients with CT pattern of non-definite UIP.
Figure 3
figure 3

Decision curve analysis of prediction models for the diagnosis of idiopathic pulmonary fibrosis in patients with non-definite CT pattern in the validation cohort. (a) Net benefits (proportion of true-positive results minus weighted proportion of false-positive results with weight equal to the ratio of risk threshold to 1 minus risk threshold) of combined model of probable UIP CT pattern, demographic characteristics (sex and age) plus low lymphocyte count in bronchoalveolar lavage fluid were comparable to or higher than those of models with CT pattern alone or combined with age and sex for risk thresholds > 15% in patients with CT pattern of non-definite UIP in the validation cohort. (b) Net benefits of combined model of demographic characteristics (sex and age) plus low lymphocyte count in bronchoalveolar lavage fluid were higher than those of models with demographic characteristics alone for risk thresholds > 20% in patients with CT pattern of probable UIP in the validation cohort.

Discussion

The present study showed that application of the latest 2018 diagnostic criteria for IPF proposed by ARS/ERS/JRS/ALAT resulted in the reclassification of patients categorized as having possible UIP based on 2011 criteria into two categories, probable UIP and indeterminate for UIP. Prognoses differed significantly among patients grouped by both 2018 and 2011 criteria, with patients classified as indeterminate for UIP according to 2018 criteria showing significantly better survival than the other groups. Although diagnosing IPF based on a CT pattern of probable UIP on 2018 criteria showed increase in sensitivity, its specificity and PPV remained insufficient. The latest ARS/ERS/JRS/ALAT guideline recommend cellular analysis of BAL fluid for patients with newly detected ILD of unknown cause who are clinically suspected of having IPF and have an HRCT pattern of probable UIP, indeterminate for UIP or an alternative diagnosis. In our study, the inclusion of BAL fluid analysis increased diagnostic performance, including specificity and PPV, compared with a model that included CT pattern and appropriate clinical features increasing the likelihood of IPF, i.e. old age and male.

A CT pattern of probable UIP has been found to indicate a higher likelihood of UIP on biopsy8,9,10, especially when compared with a CT pattern of indeterminate for UIP (82.4% vs 54.2%; p = 0.01)11. Our study found, however, that the proportion of patients diagnosed with IPF did not differ significantly in patients with CT patterns of probable UIP from those with indeterminate for UIP (62.9% vs 78.6%; p = 0.05). The discrepant results might be due to different methods used for accounting for the outcome (i.e. histologic UIP vs multidisciplinary diagnosis of IPF) and different proportions of diseases in the tested cohort as there were small numbers of patients with NSIP (9%) and patients with connective tissue disease were not clearly excluded in the Chung et al.’s study11. In addition, coexistence of more than moderate degree of emphysema was present in patients who were classified as indeterminate for UIP and the emphysema could affect the evaluation of CT pattern, which could lead to interpret as indeterminate for UIP, finally increasing the proportion of IPF in patients with CT pattern indeterminate for UIP. However, this was not addressed in the 2018 guideline.

This result also affected the diagnostic performance and CT pattern of probable UIP demonstrated a slightly higher specificity (63.3% vs 57.6%) but a lower PPV compared to possible UIP category based on 2011 criteria (62.9% vs 65.8%). As demonstrated in our study, the PPV can be variable and may not be satisfactory regarding the study population of cohort, and among the patients with probable UIP CT pattern, there still can be a significant heterogeneity of underlying disease that certain proportion of iNSIP also can show similar CT findings. This argues the assertion to forgo surgical lung biopsy in patients with probable UIP CT pattern. When we encounter fibrosing ILD without an identifiable cause and suspected to be IPF in clinical practice, the most problematic differential diagnosis, which should be differentiated from IPF would be iNSIP or cHP. Moreover, interobserver agreement for probable UIP was fair (κ = 0.40), which was quite unsatisfying to use as a final confirmative diagnostic criteria in clinical decision making to guide further invasive diagnostic procedures. Among patients with CT pattern of probable UIP, compared to non-IPF patients (i.e. iNSIP), prognosis was also different and the survival time was shorter in IPF patients which was consistent with a previous study12. This emphasizes that the patients with CT pattern of probable UIP are still a heterogeneous group, and increasing the clinical likelihood of IPF is important. Diagnosing IPF based on a probable UIP CT pattern should be applied carefully in a selected population with a high clinical likelihood of IPF. Although our study cohort did not include all diseases that can cause fibrosing ILD but only major diseases, we think that our study cohort better reflects the problems regarding the diagnosis of IPF in real clinical practice.

The discriminatory value of BAL in the real-world population is poorly defined and remains controversial even among experts13,14,15. In this study of Asian cohort, adding BAL to rule out the possibility of non-IPF, provide better diagnostic performance for diagnosing IPF in patients with non-definitive UIP. We found that the optimum lymphocyte count cut-off was 15%, a percentage lower than expected. BAL lymphocyte content is similar in IPF and healthy control subjects16, and when increased in IPF is associated with moderate to severe alveolar septal inflammation, raising the possibility of an alternative disease, such as cHP, iNSIP or connetive tissue disease-associated ILD. Many studies have demonstrated differences in BAL cell counts between lung diseases , but their variability has always been found to be great17, 18, resulting in considerable overlap that might not allow a reliable diagnosis in individual patients regardless of statistical significance. However, it can be used as a safe guard and important complementary method for optimal selection of patients to undergo a diagnostic surgical biopsy in patients with probable UIP CT pattern.

Patients categorized as indeterminate for UIP were a heterogeneous group but including the majority of patients with heavy smokers and patients with emphysema which was in accord with Tzilas et al.19 or early stage ILD showing better pulmonary function. Morevoer, regardless of final diagnoses of IPF or non-IPF, patients with CT pattern of indeterminate for UIP had a good prognosis. Our study showed that prognosis of patients with CT pattern of possible UIP can be better stratified based on 2018 criteria.

There are several limations in our study. A main limitation of our study lies in the lack of an external validation cohort which to confirm our findings. However, the scarcity of well-characterised populations of IPF patients, even in tertiary centers, is well recognised. External validation with a cohort of a different prevalence of disease or a prospective study would be helpful to further provide the evidence of routine use of BAL. Second, although we included patients who underwent surgical lung bipsy to build a cohort with less diagnostic uncertainty, it can cause selection bias, mainly in patients with IPF this could include patients with more atypical features compared to patients who did not undergo lung biopsy, which can cause a lower PPV in patients with probable UIP CT pattern. However, the rate of biopsies has changed over time and many patients with typical clinical and imaging features also underwent biopsies in the early period and are included in the study cohort. Third, we did not involve all fibrosing ILD which we can encounter in the real clinical practice and our study cohort may not fully reflect the real prevalence of fibrosing ILD. However, we included iNSIP and cHP since those are the major fibrosing ILD which should be differentiated from IPF in clinical practice. Lastly, as a single center retrospective study, all three readers, thoracic radiologists, were from the same institution, which can limit the generalizability of the results of the study.

In conclusion, the 2018 ATS/ERS/JRS/ALAT diagnostic criteria for IPF provide a better prognostic stratification in patients with CT pattern of possible UIP than 2011 criteria. However, diagnosis of IPF based on CT pattern for probable UIP remains insufficient. Adding BAL fluid results can improve the diagnostic certainty for IPF diagnosis in patients with probable UIP CT patterns.

Materials and methods

Study population

This retrospective study was approved by the Institutional Review Board of Asan Medical Center (IRB number 2018-1284), which waived the requirement for written informed consent. Patients newly diagnosed with IPF and a clinically relevant control group consisting of patients with iNSIP and cHP who underwent surgical lung biopsy at Asan Medical Center, Seoul, South Korea, between July 1995 and January 2016 were included (Fig. 1). The final diagnoses were revalidated through multidisciplinary discussion by clinician, radiologists and pathologists with reference to the 2011 ATS/ERS/JRS/ALAT guideline20. Clinical data (presentation, antigen exposures, smoking status, associated disease and lung function changes), radiological and histopathological findings were discussed for the final diagnosis. All diagnoses of iNSIP were only made when the histopathologic features suggest fibrotic NSIP with compatible radiological findings21, 22. A diagnosis of cHP was only made when the histopathological features were typical showing the presence of poorly formed nonnecrotizing granulomas with chronic fibrosing interstitial pneumonia or airway-centered fibrosis on surgical lung biopsy23, 24. For diagnosis of cHP, the presence of an inciting antigen, compatible HRCT imaging, and typical histopathologic findings were considered during the multidisciplinary discussion. The results of BAL fluid analysis were not considered for the multidisciplinary diagnosis in our institution. In our center, the clinical diagnosis of IPF is reached by first excluding other known causes of ILD, and then by the presence of UIP pattern on HRCT in patients not subjected to surgical lung biopsy. In patients who underwent surgical lung biopsy, the diagnoses are based on combinations of HRCT and the surgical lung biopsy pattern followed by multidisciplinary discussion. The approximate rate of biopsy in patients with IPF in our center was about 30% between 2004 and 2017 as reported in a previously published study25. However, the rate of biopsies has changed over the long study inclusion period from 1995 to 2016, many patients with typical clinical and imaging features also underwent biopsies in the early period of the study inclusion. In IPF patients of our study cohort, histopathological UIP pattern was present on surgical lung biopsy in patients with CT pattern of alternative diagnosis, and either the histopathological UIP pattern or probable UIP pattern was present in patients with CT pattern indeterminate for UIP. Patients with auto-immune features (i.e., marked lymphoid follicles or plasmacytosis) were excluded priorly and were not screened for study inclusion as we only included a confirmed diagnosis of IPF, iNSIP and cHP. However, 4 patients were excluded who were finally diagnosed with a connective tissue disease and 6 patients were excluded as unclassifiable ILD during the course of a retrospective revalidation of their diagnosis. Patients who were unavailable for histopathologic evaluation or baseline CT and those who were proven to have an isolated cellular NSIP were excluded. Clinical data, including age, sex, smoking history, the results of 6 min walk tests, pulmonary function tests, BAL fluid analysis, histopathologic findings of surgical lung biopsy, and survival were collected. BAL was was performed in accordance with the guidelines26. All methods were performed in accordance with the relevant guidelines and regulations.

CT evaluation

HRCT scans were perfomed in both supine and prone positions. HRCT were indepedently evaluated by three thoracic radiologists (K.D., E.J.C. and J.C., with 18, 15 and 4 years of experience in thoracic radiology, respectively), who were blinded to clinical data and final diagnosis. HRCT scans were simultaneously classified into three categories (UIP, possible UIP or inconsistent with UIP) according to 2011 criteria, or into four categories (UIP, probable UIP, indeterminate for UIP or alternative diagnosis) according to 2018 criteria5, 20. For the discordant cases of all three readers, we performed additional reading session and discordant results resolved by consensus. The extent of emphysema was evaluated as mild (1–10%), moderate (11–25%), severe (26–50%) and very severe (> 50%).

Statistical analysis

All statistical analyses were performed using SPSS (IBM SPSS Statistics for Windows, Version 21.0, IBM Corp., Armonk, N.Y., USA, www.ibm.com/products/spss-statistics) and R (R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org) software. To assess the differences in variables between groups, the t-tests or Mann–Whitney U-tests was used for continuous variables and the χ2 test was used for categorical variables. The generalized interobserver agreement for all three readers was evaluated using Fleiss’ κ. Cox proportional hazards models with adjusted survival curves were used to evaluate the associations between CT pattern and survival time. The prognostic impacts were adjusted by age, gender, treatment history of anti-fibrotic agent, baseline FVC and baseline DLCO.

To evaluate the added value of BAL fluid analysis for diagnosing IPF in patients with non-definite UIP CT patterns (probable UIP, indeterminate for UIP and alternative diagnosis), such patients were randomly divided into development and independent validation cohorts (3:2 ratio). The correlations with a IPF diagnosis and the results of BAL fluid analysis, along with CT pattern and clinical characteristics such as old age (≥ 60 years) and male sex, were assessed by logistic regression analyses5. The best-cutoff for cell count (%) in BAL fluid in the development cohort was determined using Youden’s index27. The sensitivity, specificity, PPV, and NPV for the diagnosis of IPF were calculated for each regression model. Diagnostic performance was quantified by measuring the AUC, and to quantify the improvement of usefulness added by BAL fluid analysis, a net reclassification improvement was also evaluated28. The clinical utility of each prediction model was assessed by decision curve analysis, by quantifying the net benefits at different threshold probabilities29, 30. A p value < 0.05 was considered statistically significant.