INTRODUCTION

Forest fire smoke is a complex mixture of gases and solids1 that can result in episodes of poor air quality at local, regional, and global scales.2, 3, 4, 5 Exposure to forest fire smoke has been associated with a range of adverse health effects,6 from decreased birth weight7 to premature mortality.8 The most consistent evidence is related to respiratory outcomes including increases in pharmaceutical dispensations,9 physician visits,10 emergency room visits,11, 12 and hospital admissions.13, 14 Over the past decade, the province of British Columbia (BC), Canada has experienced four of the worst wildfire seasons on record,15 partly due to a severe infestation of mountain pine beetle that has left 170,000 km2 of dead timber as fuel.16 This, in combination with more severe fire weather as the global climate changes, means that BC is expecting more intense and frequent fires in the coming decades.17, 18

Public health authorities are aware that forest fire smoke presents a health risk to the populations under their jurisdiction. During the extreme fire season of 2010, many of the BC provincial medical health officers used ad-hoc active surveillance methods to evaluate the health effects of forest fire smoke within their communities, including daily phone calls to local emergency rooms and pharmacies. One mandate of Environmental Health Services at the BC Center for Disease Control (BCCDC) is to support these front-line personnel with relevant information and tools. In a debriefing session following the 2010 fire season, the medical health officers recommended the development of a province-wide system for passive surveillance of the health effects associated with smoke exposure. In 2012, the BCCDC released a pilot version of the BC Asthma Medication Surveillance (BCAMS) product, to detect potential impacts from forest fire smoke.19 The objective of BCAMS is to provide local health authorities with a product they can use to make evidence-based decisions about interventions in their communities. The system reports on daily dispensations of salbutamol sulfate, which is used to relieve acute symptoms of obstructive lung diseases. Previous work found that salbutamol dispensations increased rapidly and markedly during smoke exposure episodes across BC.9 The BCAMS product reports on salbutamol dispensations in 89 geographic local health areas (LHAs) and combines this information with daily PM2.5 measurements, where available. However, less than 50% of the LHAs are covered by the provincial air quality monitoring network, and the 2012 product did not include PM2.5 concentrations for LHAs without monitors. The primary objective of this work was to generate PM2.5 estimates for the vast areas of the province without PM2.5 measurements.

Forest fires in Canada are typically episodic in both space and time, which makes it challenging to estimate population smoke exposure with adequate spatial and temporal resolution. Available tools include air quality monitors, remote sensing observations, physical models of smoke behavior, and information about fire and meteorological conditions. All of these tools have limitations, and none is the universally accepted “gold-standard” for smoke exposure assessment. For example, the air quality monitoring network measures surface PM2.5, but the data are not source-specific, and they have insufficient spatial coverage to capture smoke exposure for the entire population. On the other hand, remote sensing observations cover vast geographic areas, but do not measure surface concentrations and can easily be obscured by cloud. Here we combine the strengths of multiple tools to develop an empirical model for estimation of forest fire smoke-related PM2.5 in areas of BC without air quality monitors. The input data include PM2.5 measurements, remotely sensed aerosols, remotely sensed fires, hand-drawn tracings of smoke plumes from satellite images, fire danger ratings, and the atmospheric venting index. The model is trained on data from severe fire periods, and its performance is tested on low, moderate, and severe fire days by comparing its estimates with monitoring observations. We also develop a method for assigning population-based estimates to each LHA for use in the BCAMS product and for epidemiologic research.

MATERIALS AND METHODS

Study Area and Period

The province of BC (Figure 1) has a population of 4.6 million20 and a total area of 944,735 km2, approximately two-thirds of which is covered by forests. In the last decade, an average of 130,782 km2 has been burned every year.

Figure 1
figure 1

Study area and model estimate base grid. Left panel: location of British Columbia in North America, boundary of our model prediction grid and the boundary within which the sum FRP is calculated to identify days with different intensity of fire activities. Right panel: local health areas in BC (divided by the gray lines), the model training grid cells (where PM2.5 monitors are located) and prediction grid cells.

The study covered fire seasons from 2003 to 2012, and the smoke exposure assessment model was built with data from days with the most severe fire activity. These days were identified using the sum of fire radiative power (FRP) as measured by the Moderate Resolution Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. The FRP is proportional to aerosol emissions,21, 22 such that days with the highest sums of FRP indicate the most smoky conditions. The daily sum of FRP was calculated for the study area (Figure 1) for all days from April to September for the 2003–2012 period. Data from days in the 95th percentile were used to train the statistical model. Data from high, moderate, and low-smoke days were used to evaluate its performance.

Training and Prediction Cells

Given our objective of estimating the population exposure to smoke-related PM2.5, only areas where the population resides were included in the analyses. A grid with a resolution of 5 km × 5 km over the province was created and overlaid with the locations of all BC communities. Grid cells that contained one or more communities and their surrounding eight cells were selected to produce the base grid for the modeling. Herein we refer to the total of 6033 cells as “prediction cells”, and “training cells” refers to the subset of 70 prediction cells that contained at least one surface PM2.5 monitoring station at some time during the study period (Figure 1). The number of training cells varied day-by-day over the study period, because some PM2.5 monitors were added, some were taken offline, and some failed over short- and long-term periods.

Data

The following paragraphs describe the dependent and independent variables used to construct the model.

PM lag0 and PM lag1

We retrieved hourly measurements of PM2.5 concentrations (in μg/m3) from up to 85 monitoring stations from the BC Ministry of Environment website (http://envistaweb.env.gov.bc.ca/). Daily averages were calculated from midnight to midnight, Pacific Standard Time. The current day average (PMlag0) in each available training cell was used as the response variable for the regression model, and the previous day average (PMlag1) was included as one of the potentially predictive variables, based on previous work.23 If there was more than one functional monitoring station in a single training cell on a given training day, the one with the highest PM2.5 concentration was selected to represent the cell. When the final trained model was applied, each prediction cell was assigned the PMlag1 value from the nearest monitoring site, with a median distance of 53 km and maximum distance of 645 km.

AOD

We obtained aerosol optical depth (AOD) at a resolution of 10 km from the National Aeronautics and Space Administration (NASA) (http://ladsweb.nascom.nasa.gov/data/search.html). The AOD is a unitless measure of light absorption and extinction in the atmosphere. Several studies have reported associations between columnar AOD and surface measurements of PM2.5.24, 25 The MODIS instruments overpass most areas of the Earth four times daily, and all data that wholly or partially covered BC were downloaded. Because the AOD measurement for a column can be nullified by the presence of cloud, we assigned the nearest valid AOD value within a 50 km radius, or set the value to null if no valid values were found. A daily average of assigned AOD values was calculated for each training and prediction cell.

FRP

We obtained the FRP also detected by MODIS from the Fire Information Resource management System at NASA. It is one of the attributes recorded for each fire detected by the instruments, indicating the rate of energy emitted from the fire in Gigawatts (GW). Each training and prediction cell was assigned the daily sum of all FRP values within a 100 km radius. Other radii were tested in the exploratory analyses, and the 100 km was found to be most strongly associated with the PM2.5 measurements.

HMS

The Hazard Mapping System (HMS) provides smoke plume shapes hand-drawn by trained analysts at the US National Oceanic and Atmospheric Administration (NOAA), who integrate observations from seven different satellites.26 All plumes observed within one 24-hour period during the daytime were dissolved into a single shape showing areas that had been covered by smoke. Training and prediction cells that had their centers covered by the plume were assigned a value of 1, otherwise a 0. Complete HMS data were not available from 2003 to August 2005, but we included some archived data for the extreme 2003 fire season in the southern interior of BC.10

FDR

We retrieved daily Fire Danger Rating (FDR) values for 268 stations across BC from the provincial Wildfire Management Branch. The FDR is a numeric index between 1 and 5 that indicates the fire risk in an area based on meteorological conditions, fuel availability, fuel moisture, and other indicators.27 The FDR from the nearest station was assigned to the training and prediction cells.

VI

The BC Ministry of Environment and Environment Canada provide daily forecasts of the venting index (VI) at 29 stations across the province. The VI ranges from 1 to 100, indicating the potential for the atmosphere to disperse airborne pollutants, based on the wind speed in the mixed layer and the thickness of the mixed layer. The VI of the nearest station was assigned to the training and prediction cells.

Model Fitting, Selection and Evaluation

All data were processed and analyzed in the R statistical computing environment (R Core Team, Vienna, Austria). First, we used simple linear regression to assess the associations between the PMlag0 response variable and the seven potentially predictive covariates (PMlag1, AOD, FRP, HMS, FDR, and VI) in the training cells. All data were pooled such that each day of data from each available training cell was treated as independent from the others in space and time. These assumptions were checked using mixed-effects models that included random effects for station number and smoke day; no significant changes to the fixed effects of the covariates were observed. Next, multiple linear regressions were fitted with all potentially predictive covariates that were associated with PMlag0 in the simple linear regressions, using a forward stepwise approach to maximize the adjusted R-squared (R2) of the model. Given that1 these models will be used in near-real-time for our surveillance work and2 values for three of the covariates (AOD, HMS, and VI) were/are sometimes missing, we also fitted supplemental models without these variables. In preliminary analyses we found that the highest PMlag0 estimated by the models was 150 μg/m3, and only 12 out of 3305 training observations were in the 150 to 257.8 μg/m3 range. Thus, all observed PMlag0 values over 150 μg/m3 were set to 150 μg/m3 for fitting the final models.

The overall model performance was evaluated using data in time periods with different smoke intensity. The percentiles of the log-normally distributed daily sum of provincial FRP were used to indicate the degree of smokiness. The predicted PMlag0 values were compared with the observed PM2.5 concentrations for high, moderate, and low-smoke days in the 95th, 50th–55th, and 5th percentiles of the FRP distribution, respectively. Because the high-smoke days were used to train the model, we used a leave-one-out approach to evaluate its performance in this range. Data for the high-smoke days came from eight years between 2003 and 2012 (there were no large fires in 2008 and 2011). A model with the variables identified during the selection process was fitted to all of the data excluding those from one year, and then used to make predictions for the excluded year.

Basic model evaluation statistics were calculated for each test to assess the agreement. These included the normalized root mean squared error (NRMSE) and Pearson’s correlation coefficient (r). The NRMSE is a measure of the average difference between predictions and observations, with smaller NRMSE values indicating better prediction accuracy.

Population-Based Exposure Assignment

The BCAMS surveillance system summarizes data aggregated to the LHA geographic units (Figure 1), so we assigned the exposure estimates from the model to each LHA based on the spatial distribution of population within the region. First, estimates were assigned to the geographic centers of dissemination areas (DAs) from the 2006 census, most of which have a population of 400–700 persons.28 Population-weighted averages were calculated using all estimates for the DAs within each LHA boundary, which ranged from 3 to 474 across the LHAs.

RESULTS

A total of 92 high-smoke days were identified from 2003 to 2012 (Table 1). Most of these days occurred in 2003, 2004, 2009, and 2010, which were also recorded as extreme fire seasons by the Wildfire Management Branch.15 Province-wide FRP and PM2.5 concentrations varied across the high, moderate, and low-smoke days (Table 1).

Table 1 Summary of province-wide FRP sums and PM2.5 concentrations on high, moderate, and low-smoke days.

Simple linear regressions between PMlag0 and each of the six potentially predictive covariates all resulted in significant associations at the α=0.05 level. The PMlag1 had the highest variance explained (R2=0.60), followed by AOD (R2=0.23), FRP (R2=0.12), HMS (R2=0.11), VI (R2=0.03) and FDR (R2=0.02).

The best multiple regression model included all candidate variables, except the FDR (Table 2). Similar to results of the simple linear regressions, PMlag1 was the most important variable for explaining the variance in PM2.5, followed by AOD and FRP. The model explained a total of 71% variance in the PMlag0 values. The total number of observations used in the all-variable regression was 2062, with 1048 observations excluded due to missing values mainly in AOD, HMS, and VI. To supplement the final model when missing values occurred, three models excluding one of AOD, HMS or VI, were developed. These models explained 65%, 69%, and 70% of the variance, respectively.

Table 2 Final model summary.

Fitted values for high-smoke days were calculated from the final model where possible, and from the supplemental models where data were missing. When the complete set of estimates was compared with the observed PM2.5 concentrations in the training cells, the correlation was 0.84 and the NRMSE was 55.6%. When estimates omitting one year of data were compared with the observed PM2.5 concentrations on high-smoke days in the omitted year, the mean r and NRMSE values across all tests were 0.70% and 84%, respectively. Better performance was generally observed in years with intense fire seasons (Table 3).

Table 3 Summary of leave-one-out analyses for high-smoke days, omitting one year of data from the training model and using the results to estimate concentrations during the omitted year.

When predicted values for the moderate and low-smoke days were compared with observed values in the training cells, the correlations were 0.59 and 0.57, respectively, and the NRMSE values were 91.8% and 101.7%. The correlations increased and the error decreased with increasing fire intensity. Scatterplots show clear differences in modeled and observed PM2.5 across the smoke day categories (Figure 2).

Figure 2
figure 2

Scatterplots of model estimates against monitor observations.

One day was selected from each of the high, moderate, and low-fire-smoke groups as an example to show the final output of model estimates across the prediction cells (Figure 3a). Uncontrolled fires were burning in the north and central interior on the high-smoke day (sum FPR=67.0GW) and the mean estimate was 8.3 μg/m3, with a range from 2.0 to 38.7 μg/m3. Some small fires were observed in the northeast part of the province on the moderate-smoke day (sum FRP=3.7GW) and the mean estimate was 5.6 μg/m3, with a range from 2.2 to 14.3 μg/m3. No fire was observed on the low-smoke day (sum FRP=0) and the mean estimate across the province was 4.2 μg/m3, with a range from 1.6 to 11.6 μg/m3. The DA population-weighted average of estimates for these three days was assigned to each of the LHAs (Figure 3b). The mean (range) of estimates for the high, moderate, and low-smoke days for all LHAs was 6.1 (2.1–31.4), 6.0 (2.8–12.9) and 4.1 (2.0–5.9) μg/m3, respectively.

Figure 3
figure 3

(a) Examples of model estimate outputs. (b) Model estimates assigned to local health areas.

DISCUSSION

Our objective was to model smoke-related PM2.5 concentrations in populated areas of BC that are not covered by the air quality monitoring network. We constructed a linear regression model using multiple sources of data, including PMlag1 monitor measurements, remotely sensed fire radiative power, aerosol optical depth, smoke plume images, and a venting index that indicated pollutant dispersion potential. The final model explained 71% of the variance in the current day PM2.5 measurements. Comparison between model estimates and monitor observations suggested good agreement on days with different degrees of smokiness, with overall correlations from 0.57 to 0.84 and NRMSE values less than 102%. The results from the leave-one-out approach indicated better model performance during intense forest fire seasons.

Our model performs well compared with other existing PM2.5 models optimized for forest fire smoke. Price et al.23 developed an empirical model for Sydney and Perth, Australia, to estimate smoke-related PM levels based on FRP, fire danger, measured PM2.5, and meteorological variables. The best models explained 56% and 31% of the variance in PM2.5 concentrations at Sydney and Perth, respectively. In addition, our model performance was comparable with that of physical dispersion models. Smoke-related PM10 (PM<10 micrometer in aerodynamic diameter) estimated from the CALPUFF model in Henderson et al.29 had a mean correlation of 0.60 with measured concentrations, and evaluation of smoke forecasting systems in Canada and Europe30, 31 generally had correlations from 0.29 to 0.5 with PM2.5 observations.

One key difference between our work and that by Price et al. is that we developed an empirical model for a vast region rather than a single site. In this model we assumed spatial independence in the observed data from different sites. To evaluate this assumption, we calculated the Moran’s I value32 for the mean regression residuals at the different monitor locations. No significant spatial autocorrelation was found at the α=0.05 level (I=0.07, P=0.11), suggesting that most of the spatial pattern in the PM2.5 measurements was explained by the model variables. One key similarity between our work and that of Price et al. was the importance of PMlag1 for the PMlag0 model. Although this is not a concern for the training cells (all of which contained monitoring stations), many of the predictions cells are distant from those monitoring stations (Figure 1). The median distance between the prediction grids and their nearest monitor was 53 km, with an interquartile range of 22 to 104 km, and a maximum of 645 km. We chose to retain the PMlag0 variable despite this limitation because it allows the models to reflect the known state of province-wide air quality, as observed with a relatively dense monitoring network. Although estimates for prediction cells at greater distances from monitoring stations are likely less accurate than those in closer proximity, inclusion of the other variables offers an improvement over no data, or use of the nearest PM2.5 observation alone. On the basis of the training data, our model explains 71% of the variance, a model with PMlag1 alone explains 60% of the variance, and a model without PMlag1 explains 42% of the variance. To further evaluate the model, it would be ideal to have field measurements in areas far from monitors. However, our primary objective was to estimate population-weighted exposure at the LHA level using DA centroids, and the median distance between the DA population centroids and the nearest monitor was only 5 km, with an interquartile range of 3 to 11 km, so model estimates for grids far removed from monitors did not have much weight in the final results. Overall, our model without the PMlag1 is comparable with other empirical and physical models of forest fire smoke. This suggests that our methods could be used for similar model construction in regions with smaller monitoring networks.

One advantage of our model over smoke forecasting tools is that it captures smoke originating from outside of the modeling domain. In July 2012, smoke from the massive forest fires in Siberia was transported across the Pacific to BC, resulting in province-wide air quality degradation for several days, starting on 7 July (Figure 4). According to our model estimates, the Siberian smoke lingered until 13 July, which is confirmed using news and BCAMS reports from the period.19

Figure 4
figure 4

Model estimates for fire smoke transported from Siberia in July 2012.

Although our model is optimized to capture the impacts of forest fire smoke, the inclusion of variables that are not specific to fire (PMlag0, AOD, and VI) means that it will capture the impacts of other sources. This is not a concern for the BCAMS application because any increase in PM2.5 has public health relevance and may trigger increased salbutamol dispensations. However, this limitation must be considered when using model estimates for epidemiologic studies specific to the health impacts of forest fire smoke. One strategy would be to limit epidemiologic analyses to smoky days based on the sum of FRP, as previously done by Elliott et al.9 Alternately, an empirical model that is even more specific to fire smoke could be developed by adding in further fire-relevant variables, such as concentrations forecast by physical models. The BlueSky Western Canada Wildfire Smoke Forecasting System has been operational since 2010.33 It produces PM2.5 forecasts at a 10 km resolution by inputting smoke emission estimates and meteorological parameters to the HYSPLIT dispersion model, and these estimates have been successfully correlated with population health indicators in BC.31 During the summer of 2013 the BCAMS product included a side-by-side plot of smoke information from our model, BlueSky, and provincial PM2.5 monitors. However, we did not include data from BlueSky in our model because BlueSky is early in its Canadian development, and data are often missing when technical changes are implemented (Figure 5). Furthermore, our surveillance algorithm and epidemiologic studies use data that start in 2003, and BlueSky has only been operational since 2010, which includes only one intense fire season. As the record of BlueSky estimates gets longer and more stable, we will explore their integration into our empirical model. By incorporating BlueSky and some other data sources of hourly information such as the AOD measurements from the Geostationary Operational Environmental Satellite (GOES), it is also possible to refine the model to estimate hourly PM2.5.

Figure 5
figure 5

Example of the side-by-side plot of smoke information in the 2013 British Columbia Asthma Medication Surveillance (BCAMS) product. Blue line indicates PM2.5 measurements from monitor within the local health area, red line and green line indicate the population-weighted average estimates from the model in this study and forecasts from the BlueSky system, respectively.

With our methods to assign model estimates by LHA, we are able to provide improved information on forest fire smoke exposure in the 2013 version of the BCAMS product (Figure 5). In addition, with model estimates of a much finer spatial resolution than many other exposure assessment tools, we are able to conduct epidemiologic studies with health outcome data aggregated at a much finer spatial resolution, such as residential postal codes.

In conclusion, we have developed a smoke-optimized empirical model that estimates PM2.5 concentrations by combining multiple data sources that reflect aerosol measurements, fire information, and atmospheric conditions. Model estimates agreed with PM2.5 observations on high, moderate, and low-smoke days. The spatial resolution and geographic coverage of the model offers improvements over exposure estimates from the air quality monitoring network alone. Its simplicity allows easy application in time-constrained public health surveillance activities, and its coverage of unmonitored areas suggests utility as an exposure assessment tool for epidemiologic studies to further investigate the health effects of forest fire smoke exposure.