Introduction

Ambient air pollution (AAP) is a complex mixture of gaseous pollutants and solid particles1,2. According to the World Health Organization (WHO) Ambient Air Quality Database 2022, more than 80% of people living in urban areas with air quality monitoring are exposed to air quality levels exceeding WHO thresholds3. AAP is estimated to have caused 4.2 million premature deaths globally in 20194. Many studies indicate that AAP can increase the risk of myocardial infarction5, stroke6, headache7,8,9,10, disorders of fetal development11, asthma12, chronic obstructive pulmonary disease (COPD)2, attention-deficit/hyperactivity disorder (ADHD) in children13, and neurological diseases14. Furthermore, in 2013 the International Agency for Research on Cancer (IARC) classified AAP as a Group 1 human carcinogen, mainly on the basis of evidence linking it to lung cancer15.

According to the IARC report for 2020, breast cancer (BC) has become the most commonly diagnosed cancer worldwide, surpassing lung cancer16. It is also one of the leading causes of death globally17,18, and its burden has been increasing in many parts of the world over the past decades19. In 2020, 2.3 million women worldwide were diagnosed with BC, corresponding to 1 in 8 diagnosed cancers, and 685,000 died of it, accounting for 1 in 6 cancer deaths among women18,20,21. At the end of 2020, 7.8 million women were alive who had been diagnosed with BC in the previous 5 years, making it the world's most prevalent cancer21. By 2040, the burden of BC is predicted to exceed 3 million new cases and 1 million deaths annually19.

BC is a complex multifactorial disease with several established risk factors. These include female sex, older age, family history of BC, overweight and obesity, physical inactivity, history of radiation exposure, reproductive history (such as early menarche and late first pregnancy), tobacco and alcohol use, history of benign breast disease, short breastfeeding duration, and use of postmenopausal hormone therapy or oral contraceptives21,22. However, almost half of breast cancers are diagnosed in women with no identifiable risk factor21, which suggests that other risk factors remain to be identified.

There is growing evidence that AAP may be a risk factor for BC23,24. Ecological studies suggest that BC risk is higher in urban areas with more air pollution than in rural areas25. AAP contains many carcinogens that may act as endocrine disruptors and cause oxidative DNA damage, which may affect BC risk25,26,27. However, studies of the impact of AAP on BC have produced inconsistent results23,24,28,29,30. This uncertainty partly reflects how difficult it is to establish causality, given the long latency period of BC and the low doses of environmental exposure31. Differences in the pollutants studied, in exposure assessment methods and in study designs may also partly explain the discrepant findings of published studies32. Therefore, as recommended by Wei et al. (2021), studies with improved exposure assessment and covariate adjustment are needed, especially in developing countries33.

In this regard, many researchers have adopted a multipollutant (rather than single-pollutant) approach to evaluating the effects of air pollution. There are two reasons for this. First, people are usually exposed to a complex mixture of air pollutants34. Second, in single-pollutant models it is difficult to tell whether an observed association reflects the pollutant under investigation or other pollutants that co-occur with it35. Although a few studies have examined the effect of air pollution on the incidence of BC36, little is known about its effect on disease severity. Therefore, the objective of this study was to investigate the effect of multiple air pollutants on the stage of BC at diagnosis in Tehran, Iran. As a secondary objective, we assessed the impact of the pollutants on BC stage across socioeconomic levels.

Methods

Study subjects

Data were obtained from the Cancer Research Center (CRC) of Shahid Beheshti University of Medical Sciences in Tehran. We included all eligible female patients from the districts of Tehran who had a pathologically confirmed diagnosis of BC (ICD-O-3 C50.0–C50.9) between 2008 and 2016. The Institutional Review Board (IRB) of the CRC approved the study protocol.

The following data were available for each patient:

- demographic factors: age at diagnosis, education level, and marital status;
- lifestyle factors: smoking status;
- reproductive factors: age at first menstruation, age at first pregnancy, and numbers of pregnancies and deliveries;
- estrogen/progesterone receptor status;
- clinicopathological information: stage at diagnosis (non-aggressive, stage I, stage II, or stage III & IV), number of metastatic lymph nodes, family history of BC, and diabetes.

The proportion of missing data was low for all variables (≤ 5.5%).

Ethical approval was obtained (code: IR.SBMU.CRC.REC.1400.008) from the Ethics Committee of the CRC of Shahid Beheshti University of Medical Sciences in Tehran, Iran, and all methods were performed in accordance with the relevant guidelines and regulations.

Residence, highway proximity, and neighbourhood socioeconomic status

Residential addresses at diagnosis were geocoded to latitude and longitude coordinates using address or street locators.

An index of socioeconomic status (SES) was created by principal component analysis of sixteen district-based SES indicators for the 22 districts of Tehran, which were extracted from a local study37. Each participant was assigned the SES index of her address at diagnosis, and the index was categorized into quartiles.
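The SES index construction can be sketched as follows. This is a minimal sketch with several assumptions: the index is taken to be the first principal component of the standardized district indicators, and every indicator is assumed to be coded so that higher values mean higher SES. The text does not state which components or loadings were retained, and the function names are hypothetical.

```python
import numpy as np


def ses_index(indicators: np.ndarray) -> np.ndarray:
    """First principal component score for each district.

    indicators: shape (n_districts, n_indicators), e.g. (22, 16).
    Assumes every indicator is coded so that larger values mean higher SES.
    """
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0, ddof=1)
    # PCA on standardized data = eigen-decomposition of the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    pc1 = eigvecs[:, np.argmax(eigvals)]
    # Eigenvector sign is arbitrary; orient it so higher scores mean higher SES
    if pc1.sum() < 0:
        pc1 = -pc1
    return z @ pc1


def quartile_labels(scores: np.ndarray) -> np.ndarray:
    """Label each score 1-4 using the 25th, 50th and 75th percentiles."""
    cuts = np.percentile(scores, [25, 50, 75])
    return np.searchsorted(cuts, scores, side="right") + 1
```

Each patient then takes the quartile of her district, e.g. `quartile_labels(ses_index(X))[district_idx]`. Here `district_idx` is a hypothetical integer array giving each patient's district (0 to 21).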

Distance to the nearest highway was used as a proxy for traffic-related exposure. For each patient, the distance from her address at diagnosis to the nearest main street or highway was calculated using the ESRI 2016 data layer. Distances were grouped into three categories: < 400, 400 to 800, and > 800 m.
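A minimal sketch of the distance assignment follows. It assumes that residences and road centrelines are already in a projected coordinate system in metres and that roads are represented as straight segments. The study itself used the ESRI 2016 data layer in ArcGIS. The function names are hypothetical, and the handling of exact 400 m and 800 m values is an assumption.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (projected metres)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = ab @ ab
    # Project p onto the segment and clamp the projection to the endpoints
    t = 0.0 if denom == 0 else np.clip((p - a) @ ab / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def distance_to_nearest_road(p, segments):
    """Minimum distance from p to any road segment ((x1, y1), (x2, y2))."""
    return min(point_segment_distance(p, a, b) for a, b in segments)


def distance_category(d_m):
    """Categories from the text; 800 m exactly is assumed to fall in 400-800 m."""
    if d_m < 400:
        return "<400 m"
    if d_m <= 800:
        return "400-800 m"
    return ">800 m"
```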

Long-term air pollution exposure assessment

Land use regression (LUR) models were used to estimate exposure to PM10 (particulate matter smaller than 10 μm), SO2 (sulphur dioxide), NO (nitric oxide), NO2 (nitrogen dioxide), and NOx (nitrogen oxides). The models were based on measurements made at 23 regulatory network monitoring sites in Tehran in 201038,39. Volatile organic compound (VOC) concentrations were obtained from spatial models built from long-term measurements at about 180 sites in Tehran, and these models performed very well40. Further details of the exposure assessment methods have been described elsewhere41. Air pollution exposure was estimated for each patient at her geocoded residential location at the time of BC diagnosis.
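In its simplest form, an LUR model is a linear regression of the concentrations measured at monitoring sites on land-use predictors computed around each site. The fitted model is then applied to the same predictors computed around each residence. The sketch below assumes an ordinary least squares fit with predictors that have already been selected, for example hypothetical road-length or land-use buffer variables. The published Tehran models may differ in predictor selection and functional form.

```python
import numpy as np


def fit_lur(site_predictors, site_conc):
    """OLS fit of concentration = b0 + predictors @ b at the monitoring sites."""
    X = np.column_stack([np.ones(len(site_conc)), site_predictors])
    coef, *_ = np.linalg.lstsq(X, site_conc, rcond=None)
    return coef


def predict_lur(coef, residence_predictors):
    """Apply the fitted LUR model to predictors computed at residential addresses."""
    X = np.column_stack([np.ones(len(residence_predictors)), residence_predictors])
    return X @ coef
```

With only 23 sites, the model can support only a few predictors. This is one reason LUR studies rely on careful variable selection and cross-validation.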

The ArcGIS Software (ArcGIS Locator version 10.0, ESRI, Redlands, CA, USA) and Tehran ArcGIS Shapefile Map Layers were used to geocode the residential addresses of the study subjects (X and Y coordinates of addresses).

Statistical analyses

Data were summarized as mean ± standard deviation (SD) for continuous variables and frequency (percentage) for categorical variables. Chi-square tests were used to compare categorical variables across BC stages. The Kolmogorov–Smirnov test was used to assess the normality of the pollutant data. Because the data were not normally distributed, Spearman's correlation coefficients were used to examine the correlations among air pollutants and between air pollutants and SES.
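The descriptive tests can be reproduced with SciPy roughly as follows. This is a sketch with hypothetical helper names, not the Stata/R code used in the study. Note that the KS check below standardizes the data with their own mean and SD, so its p-value is only approximate.

```python
import numpy as np
from scipy import stats


def contingency_table(a, b):
    """Cross-tabulate two categorical arrays (rows: levels of a, columns: levels of b)."""
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((len(a_levels), len(b_levels)), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)
    return table


def chi_square_p(covariate, stage):
    """P-value of the chi-square test of independence between a covariate and BC stage."""
    chi2, p, dof, expected = stats.chi2_contingency(contingency_table(covariate, stage))
    return p


def ks_normality_p(x):
    """KS test against a normal with the sample mean and SD.

    The parameters are estimated from the same data, so the p-value is approximate
    (conservative); a Lilliefors-type correction would be stricter.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue


def spearman(x, y):
    """Spearman rank correlation and its p-value."""
    rho, p = stats.spearmanr(x, y)
    return rho, p
```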

The statistical analysis consisted of three steps. First, we applied weighted quantile sum (WQS) regression to estimate the joint effect of the air pollution mixture on BC stage. Second, we estimated the effect of each pollutant, and of the multipollutant index, on BC stage. These associations were modeled with multinomial logistic regression to estimate odds ratios (ORs) and 95% confidence intervals (95% CIs). Finally, we estimated the effect of each air pollutant on BC stage at different SES levels.

WQS regression was performed with the R package “gWQS”, and quantile-based g-computation was performed with the “qgcomp” package in R. WQS was developed specifically for the analysis of environmental mixtures and is an increasingly common approach to multivariable regression with high-dimensional exposure data. It operates in a supervised framework. It first creates a single score (the weighted quantile sum) that summarizes overall exposure to the mixture, and then includes this score in a regression model to evaluate the overall effect of the mixture on the outcome of interest42. The score is a weighted sum of all exposures, with exposures that have weaker effects on the outcome receiving lower weights. Each exposure is first categorized into quartiles (or more groups), so that extreme values have less influence on the weight estimation. Quantile-based g-computation, introduced by Keil et al. (2020), estimates the overall mixture effect from the same quantized exposures as WQS. However, it estimates the parameters of a marginal structural model rather than those of a standard regression model43. The approach relies on the usual causal inference assumptions of exchangeability, causal consistency, positivity, no interference, and correct model specification, and it improves the causal interpretation of the overall effect44.
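The sketch below makes the idea behind quantile-based g-computation concrete. It works in four steps:

1. Quantize each exposure into quartiles scored 0 to 3.
2. Fit a linear outcome model.
3. Perform g-computation by predicting the outcome for everyone with all exposures set to the same quartile.
4. Regress these counterfactual means on the quartile level to obtain the mixture effect ψ.

The sketch assumes a continuous outcome and a linear model without interactions. In that case ψ equals the sum of the quantized-exposure coefficients, as described by Keil et al. It is a simplified illustration, not the qgcomp package and not the multinomial stage models used in this study. Each exposure's weight is its share of the positive (or negative) partial effect.

```python
import numpy as np


def quantize(X, q=4):
    """Replace each exposure by its quantile score 0..q-1 (quartiles by default)."""
    X = np.asarray(X, dtype=float)
    probs = np.linspace(0, 1, q + 1)[1:-1]
    return np.column_stack([
        np.searchsorted(np.quantile(X[:, j], probs), X[:, j], side="right")
        for j in range(X.shape[1])
    ]).astype(float)


def qgcomp_linear(X, y, covariates=None, q=4):
    Q = quantize(X, q)
    k = Q.shape[1]
    extra = [] if covariates is None else [covariates]
    D = np.column_stack([np.ones(len(y)), Q] + extra)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    b = beta[1:k + 1]  # coefficients of the quantized exposures

    # g-computation: set every exposure to level s for all subjects, average predictions
    levels = np.arange(q)
    means = []
    for s in levels:
        Ds = D.copy()
        Ds[:, 1:k + 1] = s
        means.append((Ds @ beta).mean())
    # Marginal structural model E[Y^s] = a + psi * s; psi is the mixture effect
    psi = np.polyfit(levels, means, 1)[0]

    # Weights: share of the positive (or negative) partial effect
    pos, neg = b[b > 0].sum(), b[b < 0].sum()
    weights = np.zeros(k)
    weights[b > 0] = b[b > 0] / pos
    weights[b < 0] = b[b < 0] / neg
    return psi, b, weights
```

Unlike standard WQS, the coefficients here are not constrained to share one direction. This is why positive and negative weights are reported separately.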

Next, bivariate multinomial logistic regression analyses were conducted to explore the association between each independent variable and BC stage. The outcome, cancer stage, had four categories: non-aggressive, stage I, stage II, and stage III & IV. Non-aggressive was the reference category. Each pollutant was then entered into a multivariable multinomial logistic regression model together with all potential confounders that had P < 0.2 in the bivariate analysis. We also stratified the models by age, using ≥ 50 years as a proxy for postmenopausal status and < 50 years for premenopausal status.
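The sketch below is a baseline-category multinomial logit fitted by maximum likelihood with SciPy. Class 0 stands in for the non-aggressive reference stage. It assumes that covariates are already numeric (confounders dummy-coded) and omits standard errors. In practice a packaged routine such as statsmodels' `MNLogit` or Stata's `mlogit` would be used. Because the results are reported per 10-unit increase, each coefficient is scaled before exponentiating: OR per 10 units = exp(10 × β).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def fit_multinomial(X, y, n_classes):
    """Maximum-likelihood baseline-category logit; class 0 is the reference.

    X: (n, p) covariates; y: integer classes 0..n_classes-1.
    Returns B of shape (n_classes - 1, p + 1): intercept, then slopes, for classes 1..K-1.
    """
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    k = D.shape[1]
    Y = np.eye(n_classes)[y]  # one-hot outcome

    def nll_and_grad(theta):
        B = theta.reshape(n_classes - 1, k)
        eta = np.column_stack([np.zeros(n), D @ B.T])  # reference class has eta = 0
        lse = logsumexp(eta, axis=1)
        P = np.exp(eta - lse[:, None])  # predicted class probabilities
        nll = lse.sum() - eta[np.arange(n), y].sum()
        grad = -((Y - P)[:, 1:].T @ D)
        return nll, grad.ravel()

    res = minimize(nll_and_grad, np.zeros((n_classes - 1) * k), jac=True, method="BFGS")
    return res.x.reshape(n_classes - 1, k)


def odds_ratios_per(B, unit=10.0):
    """ORs versus the reference class for a `unit` increase in each covariate."""
    return np.exp(unit * B[:, 1:])
```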

Because the analysis examined the relations between BC stage and numerous correlated air pollutants, we parameterized air pollutants in two ways: single-pollutant and multipollutant. For the multipollutant index, the lowest quartile was used as the reference category.
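The multipollutant parameterization can be illustrated by converting a continuous mixture index into quartile indicators, with the lowest quartile omitted as the reference. Each coefficient then compares one quartile with Q1. This is a minimal sketch; the function name is hypothetical, and the cut points are sample quartiles.

```python
import numpy as np


def quartile_dummies(index):
    """Return quartile labels (1-4) and indicator columns for Q2, Q3 and Q4 (Q1 = reference)."""
    index = np.asarray(index, dtype=float)
    cuts = np.quantile(index, [0.25, 0.5, 0.75])
    labels = np.searchsorted(cuts, index, side="right") + 1
    dummies = np.column_stack([(labels == level).astype(float) for level in (2, 3, 4)])
    return labels, dummies
```

When these columns, plus confounders, enter a multinomial model, the exponentiated Q4 coefficient is the OR for the highest versus the lowest quartile.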

Finally, to examine whether the associations between air pollutants and BC stage differed by SES, we assessed these associations at low (quartiles 1 and 2) and high (quartiles 3 and 4) SES levels. Because patients were clustered within 22 districts, cluster-robust standard errors (clustered by district) were used in all analyses. Missing values were replaced by the mode or median of the variable. Data description and analyses were conducted using STATA 17 and R (version 4.0.2, License GPLv2).
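Cluster-robust (sandwich) standard errors treat observations within a district as correlated: score contributions are summed within each cluster before the variance is formed. To keep it short, the sketch shows this for a linear model. The same bread-meat-bread structure applies to the multinomial models, for which packaged routines (e.g. Stata's `vce(cluster clustvar)` option) would be used. The small-sample factor G/(G − 1) × (n − 1)/(n − k) follows Stata's convention for linear regression and should be treated as an assumption for other models.

```python
import numpy as np


def ols_cluster_robust(X, y, clusters):
    """OLS coefficients and cluster-robust standard errors (clusters: e.g. district IDs)."""
    D = np.column_stack([np.ones(len(y)), X])
    bread = np.linalg.inv(D.T @ D)
    beta = bread @ D.T @ y
    resid = y - D @ beta

    groups = np.unique(clusters)
    meat = np.zeros_like(bread)
    for g in groups:
        in_g = clusters == g
        score = D[in_g].T @ resid[in_g]  # summed score within cluster g
        meat += np.outer(score, score)

    n, k, G = len(y), D.shape[1], len(groups)
    correction = G / (G - 1) * (n - 1) / (n - k)
    cov = correction * bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

When every observation is its own cluster, this reduces to the familiar heteroskedasticity-robust (HC1) estimator.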

Ethical approval and consent to participate

All participants provided written informed consent. Ethical approval was obtained (IR.SBMU.CRC.REC.1400.008) from the Ethics Committee of the Cancer Research Center (CRC) of Shahid Beheshti University of Medical Sciences, Tehran, Iran. Patient data were anonymized and kept strictly confidential.

Results

Study population characteristics

The study population consisted of 1164 BC cases aged 20 years and older residing in 22 urban districts of Tehran during 2008–2016. We excluded 16 patients who lived in remote suburbs of Tehran, where air pollution was not monitored, which left 1148 cases for analysis.

The distribution of BC patients across the regions of Tehran is shown in Fig. 1. The mean (SD) age at diagnosis was 50.25 (11.67) years. Age at diagnosis, education level, marital status, family history of BC, diabetes, smoking, pregnancy status, and ER/PR status differed significantly among patients diagnosed at different stages of BC (Table 1). The most common stage was stage II (35.6%), followed by stages III & IV (31.4%) (Fig. 2).

Figure 1

Spatial distribution of breast cancer patients in different areas of Tehran in 2008–2016 (n = 1148).

Table 1 The demographic and clinical characteristics of women diagnosed with breast cancer in different areas of Tehran in 2008–2016.
Figure 2

Breast cancer severity status among women diagnosed with breast cancer (n = 1148).

Generalized weighted quantile sum regression analysis (gWQS model)

The Spearman correlations among pollutants and SES are shown in Fig. 3. The highest correlations were among the BTEX compounds: benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene, and total BTEX (TBTEX) (P < 0.05). SES was inversely correlated with all pollutants, with coefficients ranging from −0.01 to −0.70 (Fig. 3).

Figure 3

Spearman coefficients of correlation between air pollutants and SES in Tehran, Iran, during 2008–2016. SES: Socioeconomic Status, * Correlation is significant at P<0.01.

Figure 4a shows the contribution of each pollutant to the composite variable (multipollution) used in the multipollutant model. The red dotted line marks the threshold for identifying influential pollutants. Based on the estimated pollutant weights, ethylbenzene, NO2, and benzene contributed relatively little to the multipollution variable, whereas PM10, p-xylene, o-xylene, and NO contributed relatively strongly. Figure 4b shows the positive and negative weights of the variables in the multipollution variable. Of the four influential pollutants, p-xylene and NO had positive weights, whereas o-xylene and PM10 had negative weights.

Figure 4

Distribution of (a) weighted quantile sum regression index weights and (b) estimated positive and negative weights of each air pollutant (in quartiles) in the multipollutant model, based on the simulated dataset.

Summary statistics for each pollutant, the multipollution variable, and highway proximity among BC cases are shown in Table 2.

Table 2 Summary of air pollution and highway proximity variables.

Severity of BC and ambient air pollutants analysis

Table 3 shows the odds ratios (ORs) and 95% confidence intervals (95% CIs) from the crude multinomial logistic regression analyses of the associations between the independent variables and cancer stage. The multivariable model included education level, smoking status, diabetes, family history of BC, age at first menstruation, numbers of pregnancies and deliveries, and highway proximity. Each of these variables had at least one category with P < 0.2.

Table 3 Crude odds ratios (ORs) for the associations between independent variables and breast cancer stage.

In the adjusted model, the odds of stage I BC compared with non-aggressive BC rose with each 10 μg/m³ increase in VOCs and each 10 ppb increase in NO. The corresponding ORs were:

- ethylbenzene: OR = 10.41 (95% CI 1.32–82.41);
- o-xylene: OR = 4.07 (95% CI 1.46–11.33);
- m-xylene: OR = 2.89 (95% CI 1.08–7.73);
- NO: OR = 1.08 (95% CI 1.00–1.15).

Benzene (OR = 1.16, 95% CI 1.01–1.33) and o-xylene (OR = 1.18, 95% CI 1.02–1.38) were also significantly associated with higher odds of stage III & IV BC compared with non-aggressive BC. In contrast, a 10 ppb increase in SO2 was significantly associated with lower odds of stage III & IV BC in patients aged 50 years or older (Table 4).

Table 4 Adjusted odds ratios and 95% confidence intervals for the associations between each 10-unit increase in air pollutants and breast cancer stage.

In patients younger than 50 years, the multipollutant model showed higher odds of stage I BC in the highest quartile of multiple air pollution than in the lowest quartile (OR = 3.16, 95% CI 1.17–8.53) (Table 4).

Furthermore, the associations between air pollution and the adjusted odds of stage I and stage III & IV BC were stronger among low-SES cases (Table 5):

- stages III & IV vs non-aggressive: OR = 2.69 for o-xylene and 1.83 for m-xylene;
- stage I vs non-aggressive: OR = 3.67 for benzene, 7.15 for ethylbenzene, and 2.49 for o-xylene.

Table 5 Adjusted odds ratios and 95% confidence intervals for the association between each 10-unit increase in air pollutants and cancer stage, by SES category.

Discussion

To the best of our knowledge, this is the first study to examine the association between air pollutants (single and mixtures) and BC severity. The findings suggest that air pollutant exposure was associated with a higher stage of BC at diagnosis, especially in patients diagnosed before 50 years of age. One possible explanation is that air pollutants accelerate the development and progression of breast cancer.

As in various other studies45, the spatial correlations among individual air pollutants were often high (ranging from −0.51 to 0.96), so our study also used a newer approach to investigate the effect of the pollutant mixture. Interest in estimating the simultaneous effect of exposure to multiple pollutants on health outcomes, and in identifying the dominant pollutants, has grown in recent years. Such multipollutant studies may explain health outcomes better than single-pollutant studies45,46,47,48.

Quantile-based g-computation, introduced by Keil et al. (2020), estimates the overall mixture effect in the same way as WQS, but it estimates the parameters of a marginal structural model rather than a standard regression model. This improves the causal interpretation of the overall effect of multiple pollutants40. WQS combines pollutants into a weighted additive index and estimates the overall mixture effect through a bootstrap resampling procedure, which helps avoid overfitting and collinearity problems42. These models have been used in an increasing number of studies42,49,50. In our study, the dominant pollutants in the gWQS model were p-xylene, NO, o-xylene, and PM10.
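The bootstrap procedure of WQS can be sketched for a continuous outcome in five steps:

1. Score each exposure into quartiles.
2. Split the data into training and validation sets.
3. In each bootstrap sample of the training set, estimate weights that are non-negative and sum to 1.
4. Average these weights across bootstrap samples.
5. Estimate the mixture effect in the validation set by regressing the outcome on the weighted index.

This simplified sketch assumes a positive direction of effect and a linear model. Each constrained fit is solved as non-negative least squares by writing b1 × w = c with c ≥ 0. This reparameterization was chosen for brevity and is not how gWQS is implemented. The sketch also averages bootstrap weights equally, whereas gWQS may weight bootstrap samples differently. The 40/60 split and 100 bootstrap samples are illustrative values.

```python
import numpy as np
from scipy.optimize import nnls


def quartile_scores(X):
    """Score each exposure column 0-3 by its sample quartiles."""
    X = np.asarray(X, dtype=float)
    cuts = np.quantile(X, [0.25, 0.5, 0.75], axis=0)  # shape (3, n_exposures)
    return np.column_stack([
        np.searchsorted(cuts[:, j], X[:, j], side="right") for j in range(X.shape[1])
    ]).astype(float)


def wqs_weights(Q, y):
    """Fit y = b0 + b1 * (Q @ w) with w >= 0, sum(w) = 1 and b1 >= 0.

    Writing c = b1 * w turns this into non-negative least squares; centring Q and y
    absorbs the free intercept b0.
    """
    c, _ = nnls(Q - Q.mean(axis=0), y - y.mean())
    total = c.sum()
    return c / total if total > 0 else np.full(Q.shape[1], 1.0 / Q.shape[1])


def wqs(X, y, n_boot=100, train_frac=0.4, seed=0):
    rng = np.random.default_rng(seed)
    Q = quartile_scores(X)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, valid = idx[:n_train], idx[n_train:]

    # Estimate weights in bootstrap samples of the training set and average them
    boot_w = []
    for _ in range(n_boot):
        b = rng.choice(train, size=n_train, replace=True)
        boot_w.append(wqs_weights(Q[b], y[b]))
    w_bar = np.mean(boot_w, axis=0)

    # Mixture effect: regress the outcome on the WQS index in the validation set
    b1, b0 = np.polyfit(Q[valid] @ w_bar, y[valid], 1)
    return w_bar, b1
```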

Our results showed a significant association between air pollution and BC severity after adjustment for smoking status, diabetes, family history of BC, age at first menstruation, number of pregnancies, and highway proximity. In single-pollutant models, BC stage was associated with ethylbenzene, o-xylene, m-xylene, and NO among women younger than 50 years, and with o-xylene and benzene among women aged 50 years or older. Previous studies also suggest that air pollution may be related to breast cancer, particularly among women younger than 50 years and those with a positive family history32,51.

A meta-analysis of 36 effect estimates for PM2.5, PM10, and NO2 concluded that reducing long-term exposure to NO2, or to correlated air pollutants, could lower breast cancer risk. It also found that the associations of NO2 with breast cancer risk were stronger in premenopausal than in postmenopausal women52. In the current study, a significant association was seen between air pollutant exposure and BC severity in premenopausal women. This suggests that the effect of air pollutants on BC may differ across periods of a woman's life and may be stronger before menopause. Differences in tumor morphology or hormonal subtypes between pre- and postmenopausal women might explain this difference53. However, some studies have found positive associations between air pollution and BC in postmenopausal women54,55.

Two recent reviews reported a significantly increased risk of breast cancer with higher levels of nitrogen dioxide (NO2) and nitrogen oxides (NOx), both of which are proxies for traffic exposure56,57. In addition, a nested case–control study within the French E3N cohort showed increased odds of breast cancer with long-term exposure to NO2 air pollution54.

In a nationwide analysis in South Korea (2005–2016), Hwang et al. showed that ambient air pollutant concentrations were positively and significantly associated with the odds of breast cancer, including a 10 ppb increase in NO2 (OR = 1.14, 95% CI 1.12–1.16)58. A cohort study conducted between 1980 and 1985 in urban centers in Canada suggested that NO2 exposure may increase the risk of premenopausal breast cancer. Among premenopausal patients, the rate ratio (RR) for a 9.7 ppb (interquartile range) increase in NO2 was 1.13 (95% CI 0.94–1.37)59.

Studies have shown that NO can directly inhibit caspase activity, which provides an efficient means of blocking apoptosis. NO can also promote breast cancer development through the estrogen and progesterone pathways, both of which are involved in breast carcinogenesis60.

A 2018 review reported that many individual air pollutants are genotoxic and that some are estrogenic or anti-estrogenic. Polycyclic aromatic hydrocarbons (PAHs) are the most studied components of air pollution in relation to breast cancer. They comprise hundreds of compounds and metabolites with different biological activities, some of which are thought to specifically cause mammary gland tumors36. PAHs can also activate CYP3A4 via the pregnane X receptor (PXR) and may thereby affect estrogen metabolism61. Some studies have suggested a role for PAHs in tumor progression62. Studies conducted in other geographic locations have shown that some VOCs are human carcinogens, with strong evidence of genotoxicity, increased PAH–DNA adducts, and TP53 polymorphisms and mutations36.

A nationwide analysis in South Korea showed that SO2 concentrations were positively and significantly associated with the odds of breast cancer (per 1 ppb SO2, OR = 1.04, 95% CI 1.02–1.05)58. In contrast, we found an inverse association between SO2 and BC severity among women aged 50 years or older.

In this study, we found significant negative correlations between air pollution and SES level. Interestingly, significant associations between air pollution and BC severity were also seen in low-SES regions. The strong correlation between SES and air pollutants suggests that low SES may explain part of the apparent effect of air pollutants on BC. Population characteristics, neighborhood deprivation, and air pollution levels are often interconnected, although the direction of these associations may differ between areas63. Recent studies suggest that this pattern could be linked to the composition of the air pollution mixture or to the intrinsic vulnerability of the population52. A recent analysis of the Multiethnic Cohort (MEC) included African American, European American, Japanese American, and Latina American women diagnosed with breast cancer. It reported a harmful impact of air pollutants on breast cancer survival and noted that this association may be confounded by socioeconomic factors64. In a US study, the worst sociodemographic environmental quality was associated with 10% higher odds of distant metastatic breast cancer in non-metro urbanized counties (OR 1.10; 95% CI 1.00–1.20, P = 0.035)65.

In this study, BC patient data were collected from the Cancer Research Center of Shahid Beheshti University of Medical Sciences and from oncologists' private offices, covering patients from all areas of Tehran. However, the data did not include all BC patients in Tehran.

To the best of our knowledge, this is the largest study to date of the association between air pollutants and BC in Iran. It is also the first to explore the associations between BC severity and both single and multiple air pollutants.

A strength of this study is its use of WQS inference and the flexible g-computation method, which allow nonlinear and non-additive effects of individual pollutants, and of the mixture as a whole, to be explored. Quantile g-computation estimates the parameters of a marginal structural model43. Grouped weighted quantile sum (GWQS) regression allows multiple groups of pollutants to be included in the model, and the components of the multipollutant mixture can differ in the magnitude and direction of their effects42.

Although we adjusted for confounders in the analysis, information was unavailable on some potentially important individual confounders. These include genetic predisposition to cancer, diet, physical activity, and exposure to indoor pollutants, and their absence could have led to residual confounding.

Another limitation of this study is that we estimated long-term air pollutant concentrations by applying LUR models to exposure data collected at a single point in time. Nevertheless, studies have demonstrated the temporal stability of such models for traffic-related air pollution. For example, LUR models provided reliable estimates over a 7-year period in Vancouver66.

We had no information on patients' residential history, which may have led to exposure misclassification. Further studies are needed to determine how relocation affects air pollution exposure and the incidence and severity of BC.

A further limitation is that we tested multiple hypotheses, each at a type I error rate of 0.05, so some of the statistically significant results may have arisen by chance.

Conclusion

In summary, we found evidence that higher outdoor concentrations of air pollutants, particularly NO, ethylbenzene, o-xylene, m-xylene, and benzene, were associated with higher odds of a more advanced BC stage at diagnosis. Furthermore, the association between air pollutants and BC severity appeared stronger in premenopausal women. Our work has implications for future environmental justice studies investigating the influence of SES on the association between air pollutants and BC. Further research on this association will improve our understanding of the mechanisms by which air pollutants affect BC severity.