Introduction

Air pollution is the world’s largest environmental health risk, accounting for 6.7 million deaths every year. According to recent estimates by the World Health Organisation (WHO), 4.2 million deaths globally in 2019 were due to exposure to atmospheric particulate matter, which causes cardiovascular and respiratory diseases, and cancers1. The greatest number of atmospheric (outdoor) air pollution-related deaths occurred in the Southeast Asia region. The impact of air pollution on human health is of growing concern due to increasing exposure, with almost all (99%) of the world’s population in 2019 living in areas where air pollution levels exceeded the WHO air quality guideline (AQG) limits2. There is also concern about the implications of air pollution for the climate and ecosystems around the world3,4. Therefore, concerted action is urgently needed to reduce air pollution, protect populations from health risks and mitigate climate change.

Effective and sustainable air quality management strategies need to be implemented to address the global air pollution emergency. Research on air pollution has been receiving considerable interest due to the urgency of understanding the influence of air pollution and its trends5. An air quality forecasting model can be an important tool for providing estimates and future predictions of air pollutant concentrations, which policymakers can use to develop legislation and policies to reduce air pollution and to alert the public when air pollution events are expected6. Two common methods used in air quality forecasting are statistical modelling and chemical transport modelling, which can be based on meteorological conditions to account for atmospheric dilution and diffusion capacity5,7. Statistical models are suitable for describing site-specific associations between air pollutants and meteorological parameters; they are easier and faster to use, and have been reported to be more accurate, than chemical transport models8, and they require no costly emission inventories or computer resources9. However, statistical models are highly dependent on time series data and require a large amount of historical data5.

A popular statistical method uses machine learning models10. Lasheras et al. (2020) built and analyzed different statistical and machine learning PM10 forecasting models based on the concentrations of six air pollutants as input variables: PM10, sulfur dioxide (SO2), nitrogen monoxide (NO), nitrogen dioxide (NO2), carbon monoxide (CO) and ozone (O3)11. Deep learning models (a subset of machine learning) such as artificial neural networks (ANNs) tend to have higher accuracy than statistical models, but they are unstable and highly dependent on data5. On the other hand, chemical transport models explicitly describe all major physico-chemical and meteorological processes associated with air pollution12. The drawbacks of chemical transport models are inaccuracies in describing the physico-chemical processes due to inadequate information on pollutant sources, and an inability to accurately predict extreme events, the spatio-temporal variation of air pollutants and time series for short and medium ranges9.

The atmospheric air quality in Brunei Darussalam, a Southeast Asian country, is usually considered clean despite being seasonally affected by the transboundary smoke haze, in which the atmosphere contains high concentrations of particulate matter (PM), especially PM equal to or smaller than 10 µm in diameter (PM10) that can penetrate deep into the lungs. The highest daily PM10 concentration observed across Brunei Darussalam was 100.90 μg/m3 (moderate air quality) in September 2019 during the south-west (SW) monsoon season, which was caused by the transboundary smoke haze from hotspots in the Borneo region13. Although various approaches have been proposed for air pollution forecasting, statistical/empirical models for predicting the concentrations of atmospheric PM in Brunei-Muara district from time and meteorological inputs are yet to be developed.

For this reason, the present study aims to analyze the temporal variations and associations of PM10 concentrations and meteorological parameters, and to develop predictive forecasting models for atmospheric PM10 concentration in Brunei-Muara district that account for changes in meteorology over time. The objectives of the study are:

  1. to determine the correlation between PM10 and the individual meteorological parameters (such as temperature, wind speed, wind direction and total rainfall) in different monsoon seasons as well as the combined effects of different rain and wind conditions on PM10 to examine their contribution to atmospheric PM10 pollution;

  2. to build PM10 predictive forecasting models using four modelling approaches (multiple linear regression (MLR), random forest (RF), extreme gradient boosting (XGBoost) and artificial neural network (ANN)) from the available PM10 concentration and meteorological data;

  3. to add the previous 1, 2 or 3 days’ PM10 concentration to the model to enhance the model’s predictive power in explaining variability14;

  4. to validate and test the models for predicting and forecasting the next 1, 2 or 3 days’ PM10 concentration of the studied area; and

  5. to evaluate the models’ performances in explaining the daily variability of PM10 concentration.

The presented results will provide a better understanding of the effect of meteorological parameters on atmospheric PM10 in Brunei-Muara district. To the best of the author’s knowledge, this study is the first to explore and quantify the combined effects of different rain and wind conditions on PM10 concentrations in Brunei-Muara district, which can be helpful to policymakers when developing air pollution mitigation measures or assessing the effectiveness of those measures for the studied area. The study also intends to demonstrate the significant potential of the proposed PM10 predictive forecasting models in producing more accurate PM10 predictions that can provide early information against unhealthy atmospheric PM10 concentrations to the local community so that the necessary precautionary measures can be taken to minimize exposure to air pollution to protect public health.

Study area and data

Brunei-Muara district is located in the northeast of Brunei Darussalam, bordering the South China Sea and Labuan (Malaysia) to the north, Brunei Bay to the east, Limbang, Sarawak (Malaysia) to the south and the Bruneian district of Tutong to the west15 (Fig. 1). It is the most populated district in Brunei Darussalam, with 318,530 people (as of 2021), containing about 72% of the country’s population although it has the smallest area (570 km2) among the four districts of Brunei Darussalam15,16. Brunei-Muara is where the country’s capital (Bandar Seri Begawan), the seat of government ministries and departmental headquarters, and the center of commercial activities are located15. It also houses the country’s only international airport (Brunei International Airport) and main deep-water port (Muara Port).

Figure 1

Locations of PM10 and meteorological monitoring stations in Brunei-Muara district, Brunei Darussalam.

Daily average PM10 concentrations (μg/m3) and meteorological data measured in Brunei-Muara district from 2009 to 2019 (11 years) were obtained from the Department of Environment, Parks and Recreation (JASTRe), Ministry of Development, Brunei Darussalam and the Brunei Darussalam Meteorological Department (BDMD), Ministry of Transport and Infocommunications, Brunei Darussalam, respectively. The meteorological parameters include the daily average temperature (°C), wind speed (m/s) and wind direction (°), and daily total rainfall (mm). The time parameters are the observations’ day, month and year. The data are classified into four monsoon seasons, namely: north-east (NE) monsoon (from December to March), inter-monsoon 1 (from April to May), south-west (SW) monsoon (from June to September) and inter-monsoon 2 (from October to November).
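The seasonal classification above can be expressed as a simple month lookup. The following is a minimal sketch; the function name and the season labels are illustrative choices and not part of the original analysis, which was carried out in XLSTAT.

```python
def monsoon_season(month: int) -> str:
    """Map a calendar month (1-12) to the monsoon season defined in the text.

    The month boundaries follow the classification described above; the
    function name and returned labels are illustrative.
    """
    if month not in range(1, 13):
        raise ValueError(f"month must be between 1 and 12, got {month}")
    if month in (12, 1, 2, 3):
        return "NE monsoon"
    if month in (4, 5):
        return "inter-monsoon 1"
    if month in (6, 7, 8, 9):
        return "SW monsoon"
    return "inter-monsoon 2"  # October and November
```

For example, `monsoon_season(9)` returns `"SW monsoon"`, so an observation from September 2019 would fall in the SW monsoon subsample.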

Methods

This study employed XLSTAT software for statistical analysis and modelling. The statistical analysis includes descriptive statistics (such as minimum, maximum, mean and standard deviation) of daily average PM10 concentrations and meteorological parameters during different seasons in Brunei-Muara district from 2009 to 2019. To explore the impact of meteorological parameters on PM10 pollution in different seasons, Pearson correlation tests were performed between daily average concentrations of PM10 (\(y\)-variable) and daily average temperature, wind speed and wind direction, and daily total rainfall (\(x\)-variables) that were recorded from 2009 to 2019, with the seasons as the subsamples. The Pearson correlation coefficient \(r\) was calculated using:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}, \quad -1 \le r \le 1$$

where \(n\) is the number of observations, \(x\) is the value of the \(x\)-variable in the sample and \(y\) is the value of the \(y\)-variable in the sample. A Pearson correlation coefficient represents the degree of the linear correlation between two variables. The correlations were also tested at a significance level of 5%. The computed coefficients of determination R2 correspond to the square of the Pearson correlation coefficients, which measure the strength of the correlation.
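As an illustration, the computational formula for \(r\) can be evaluated directly from the sums it contains. The sketch below is not the XLSTAT implementation used in the study; the function name and the sample values in the usage note are made up for demonstration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient using the sum-based formula in the text."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

With the synthetic values `x = [1, 2, 3, 4]` and `y = [2, 4, 6, 8]`, the function returns 1.0 (a perfect positive linear relationship), and squaring the result gives the corresponding R2.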

Data from 2009 to 2018 (10 years) were first modelled using the MLR approach to describe the influence of the time and meteorological parameters on the daily variation of PM10 concentrations for seven scenarios. For Scenario 1/Baseline (i.e., PM10 model with time and meteorological parameters only), the explanatory variables were the time parameters (the observations’ day, month and year) and the meteorological parameters (the daily temperature, wind speed, wind direction and total rainfall). For Scenarios 2 to 4 (i.e., PM10 models with the previous day’s PM10, the average of the previous 2 days’ PM10 and the average of the previous 3 days’ PM10, which are denoted by PM10,t-1, PM10,t-2 and PM10,t-3, respectively), the explanatory variables were all the time and meteorological parameters and the PM10,t-1, PM10,t-2 or PM10,t-3 variable. For Scenarios 5 to 7 (i.e., PM10 models for the next 1 day, 2 days and 3 days, which are denoted by PM10,t+1, PM10,t+2 and PM10,t+3, respectively), the explanatory variables were all the time and meteorological parameters, the corresponding PM10 concentrations and the PM10,t-1 variable. Variables that were either constant or too strongly correlated with other variables used in the model were excluded, and the model’s tolerance value was 0.0001. The interactions in the model were set at 3 and the best model was selected based on the lowest mean square error (MSE). The minimum number of variables for all MLR models was set at 2 and the maximum number of variables was set at 7 for Scenario 1/Baseline, 8 for Scenarios 2 to 4, and 9 for Scenarios 5 to 7. In this study, 90% of the data were used for model learning and 10% of the data were randomly selected for model validation. Once the best model was built, it was tested by making predictions on the 2019 data (1 year).
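The lagged predictors and forecast targets used in Scenarios 2 to 7 can be derived from the daily PM10 series as sketched below. This is an illustrative reading of the definitions above, assuming a complete daily series with no missing days; the function names are hypothetical and the study itself prepared the data for XLSTAT.

```python
def lagged_mean(series, t, k):
    """Mean PM10 of the k days before day index t (PM10,t-k in the text).

    With k = 1 this is simply the previous day's value. Returns None when
    fewer than k earlier observations exist.
    """
    if t < k:
        return None
    window = series[t - k:t]
    return sum(window) / k


def forecast_target(series, t, h):
    """PM10 observed h days after day index t (PM10,t+h in the text).

    Returns None when the series ends before day t + h.
    """
    if t + h >= len(series):
        return None
    return series[t + h]
```

For a synthetic series `[10.0, 12.0, 14.0, 16.0, 18.0]` and day index 3, the previous day's value is 14.0, the 2-day average is 13.0, the 3-day average is 12.0, and the 1-day-ahead target is 18.0.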

Then, the RF approach was applied to build more efficient predictive regression models for daily PM10 concentrations by generating several predictors and then combining their respective predictions. Data from 2009 to 2018 (10 years) were used to develop random forest PM10 models for the seven scenarios. The explanatory variables used in the RF models for Scenarios 1 to 7 were the same as those used in the MLR method. For the forest parameters, the sampling method was random with replacement and the forest type was bagging with 90% of the data used to generate the trees. The number of trees in the forest was set at 300 and the maximum allowable construction time of all trees in the forest was set at 300 s, with convergence of the machine learning algorithm checked every 100 trees. For the tree parameters, the minimum node size was 7 for Scenario 1, 8 for Scenarios 2 to 4 and 9 for Scenarios 5 to 7. The minimum son size was set at 2 and the maximum tree depth was set at 20 with a complexity parameter value of 0.0001 for all the models. All the random forest PM10 models were validated with 10% of the data, which were randomly selected, and tested by making predictions on the 2019 data (1 year).

Next, an XGBoost approach was employed to build combined boosted ensemble regression models for daily PM10 concentration prediction based on the available data from 2009 to 2018 (10 years) for the seven scenarios. The explanatory variables used in the XGBoost models for Scenarios 1 to 7 were the same as those used in the MLR method. The maximum number of iterations of the model was 50 and the learning rate was 0.3 with zero minimum loss reduction. The objective/loss function was quadratic and the metric of the loss function was the root mean square error (RMSE). For the tree parameters, the minimum son size was 2 and the maximum tree depth was 6. Once the PM10 models for Scenarios 1 to 7 were built, they were validated with 10% of the data (selected at random) and then tested by making predictions on the 2019 data (1 year).

Lastly, an ANN approach was employed to build more complex and efficient predictive regression models for daily PM10 concentrations for the seven scenarios. The ANN model consists of an input layer (explanatory variables for Scenarios 1 to 7), two hidden layers and an output layer (PM10 variable for Scenarios 1 to 4, PM10,t+1 variable for Scenario 5, PM10,t+2 variable for Scenario 6 and PM10,t+3 variable for Scenario 7) with a group of interconnected nodes (artificial neurons). The explanatory variables used in the ANN models for Scenarios 1 to 7 were the same as those used in the MLR method. The neuralnet function in XLSTAT-R was used, which calls the neuralnet function from the neuralnet package in R developed by Stefan Fritsch and Frauke Guenther (2022)17. Data from 2009 to 2019 (11 years) were rescaled and randomly split into a training sample (80% of the data), a validation sample (10% of the data) and a test sample (10% of the data). The number of neurons in the hidden layers was 2,2 (corresponding to 2 neurons in the first hidden layer and 2 neurons in the second hidden layer) for Scenario 1, 3,2 for Scenarios 2 to 4, and 4,2 for Scenarios 5 to 7. The threshold value was 0.01 and the maximum number of steps was 100,000. The rprop+ algorithm was chosen, which refers to resilient backpropagation with weight backtracking. The error function was the sum of squared errors and the activation function was logistic with linear output.
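The data preparation for the ANN models can be sketched as follows. The text states only that the data were rescaled and randomly split 80/10/10; the min-max rescaling to the range 0 to 1, the fixed random seed and the function names below are assumptions made for illustration and may differ from what XLSTAT-R applies.

```python
import random


def min_max_rescale(values):
    """Rescale values linearly to [0, 1] (an assumed rescaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant variable carries no signal
    return [(v - lo) / (hi - lo) for v in values]


def split_indices(n, train=0.8, validation=0.1, seed=0):
    """Randomly split row indices into training, validation and test samples.

    The test sample receives whatever remains after the training and
    validation shares are taken. The seed is an illustrative choice.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * validation)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )
```

Because the split is random across all 11 years, the ANN test sample is drawn from the same period as the training sample, unlike the MLR, RF and XGBoost models, which were tested on the 2019 data only.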

The models’ error and accuracy between the predicted and observed values for Scenarios 1 to 7 were evaluated using several performance metrics: the RMSE, R2, mean absolute percentage error (MAPE), Willmott’s index of agreement (WIA) and Legates and McCabe index (LCI). The best model was the one with minimal error (i.e., RMSE close to 0 and/or MAPE close to 0%) and high accuracy (i.e., R2, WIA and/or LCI close to 1).
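For reference, four of these metrics can be computed as in the sketch below. The text does not give the exact formulas, so the definitions here are assumptions: WIA follows Willmott's original squared-error form and LCI follows the absolute-error efficiency of Legates and McCabe. The study's software may use different variants, and the function names are illustrative.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def willmott_index(obs, pred):
    """Willmott's index of agreement (assumed original squared-error form)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den


def legates_mccabe_index(obs, pred):
    """Legates and McCabe index (assumed absolute-error form)."""
    mean_obs = sum(obs) / len(obs)
    num = sum(abs(o - p) for o, p in zip(obs, pred))
    den = sum(abs(o - mean_obs) for o in obs)
    return 1.0 - num / den
```

A perfect prediction gives RMSE = 0, MAPE = 0%, WIA = 1 and LCI = 1, which matches the selection criteria stated above.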

Results and discussion

Table 1 summarizes the seasonal characteristics of PM10 pollution and meteorological conditions in Brunei-Muara district from 2009 to 2019. In this time period, only 1.1% of the daily average PM10 concentrations were greater than the 2006 WHO Global AQG limit for daily average PM10 concentration (i.e., 50 μg/m3)18 and only 1.4% of the daily average PM10 concentrations were greater than the 2021 WHO Global AQG limit for daily average PM10 concentration (i.e., 45 μg/m3)19, mostly during the SW monsoon season in June, August and September, and occasionally continued during the inter-monsoon 2 season in October (Fig. 2a). The maximum daily average concentration of PM10 observed in Brunei-Muara was 97.4 μg/m3 in September 2019, with a seasonal mean concentration of PM10 of 17.9 μg/m3 and a standard deviation of 12.2 μg/m3 during the SW monsoon season (Table 1).

Table 1 Statistical summary of seasonal PM10 pollution and meteorological conditions in Brunei-Muara district from 2009 to 2019.

Figure 2

Monthly variations of PM10 concentrations (a), temperature (b), wind speed (c), wind direction (d) and total rainfall (e), and monthly frequencies of the classified wind types (f), wind (cardinal) directions (g) and rainfall types (h) in Brunei-Muara district from 2009 to 2019.

Brunei-Muara district has a tropical climate and it is usually very warm and wet throughout the year. From 2009 to 2019, the daily average temperatures ranged between 23.2 °C and 31.0 °C, with the hottest day recorded in May 2014 (Table 1 and Fig. 2b). The daily wind speed in Brunei-Muara can be as low as 1.1 m/s (as recorded during the Inter-monsoon 2 season in November 2017) and it can reach as high as 7.1 m/s (as recorded during the NE monsoon season in February 2016) (Fig. 2c). The wind types experienced in Brunei-Muara from 2009 to 2019 were either light air (0.3–1.5 m/s), light breeze (1.6–3.3 m/s), gentle breeze (3.4–5.4 m/s) or moderate breeze (5.5–7.9 m/s) (Fig. 2f) based on the wind classifications described by the World Meteorological Organization (WMO)20. Usually, the wind blows as a light breeze throughout the year (Fig. 2f), with monthly mean wind speeds ranging from 2.20 m/s to 2.74 m/s (Fig. 2c). The prevailing winds were from the north-northeast (NNE) direction throughout the year (Fig. 2g), with mean wind degree directions between 18.8° and 21.3° (Fig. 2d).

The highest daily total rainfall was 275.0 mm in September 2019 (Table 1 and Fig. 2e). The rainfall types that occurred in Brunei-Muara from 2009 to 2019 were either no rain (0 mm/day), drizzle (0.1–19.9 mm/day), light rain (20–59.9 mm/day), moderate rain (60.0–239.9 mm/day) or heavy rain (240.0–1,199.9 mm/day) (Fig. 2h) according to the rainfall classifications recommended by the WMO20. Drizzle was the most frequent type of rainfall (44.9% of the total observations) that often occurs during the NE monsoon (December to March) and both inter-monsoon (April to May and October to November) seasons. The dry season usually occurs during the SW monsoon season (June to September), with 43.9% of the observations in this season without any rain.
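The wind and rainfall classes quoted above can be applied to daily observations with simple threshold checks, as in the following sketch. The class limits are taken from the text; treating each class as extending up to (but not including) the next class's lower limit, and the fallback label for values outside the listed classes, are assumptions made for illustration.

```python
def wind_type(speed_ms):
    """Classify a daily average wind speed (m/s) using the classes in the text."""
    if 0.3 <= speed_ms < 1.6:
        return "light air"
    if 1.6 <= speed_ms < 3.4:
        return "light breeze"
    if 3.4 <= speed_ms < 5.5:
        return "gentle breeze"
    if 5.5 <= speed_ms < 8.0:
        return "moderate breeze"
    return "outside listed classes"  # not observed in the study period


def rainfall_type(rain_mm):
    """Classify a daily total rainfall (mm/day) using the classes in the text."""
    if rain_mm < 0:
        raise ValueError("rainfall cannot be negative")
    if rain_mm == 0:
        return "no rain"
    if rain_mm < 20.0:
        return "drizzle"
    if rain_mm < 60.0:
        return "light rain"
    if rain_mm < 240.0:
        return "moderate rain"
    if rain_mm < 1200.0:
        return "heavy rain"
    return "outside listed classes"
```

Under this scheme, the highest daily total rainfall reported above (275.0 mm) is classed as heavy rain, and the highest daily wind speed (7.1 m/s) as a moderate breeze.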

The Pearson correlation coefficients \(r\) between PM10 concentrations and the selected meteorological parameters for Brunei-Muara district from 2009 to 2019 were computed for different monsoon seasons (Table 2). PM10 was positively and moderately correlated (0.30 \(<r<\) 0.49) with temperature during NE monsoon, inter-monsoon 1 and SW monsoon seasons (December to September), and it was positively and weakly correlated (\(r<\) 0.29) during the inter-monsoon 2 season (October to November). This is consistent with PM10 absorbing sunlight and warming the Earth’s atmosphere21.

Table 2 Pearson correlation coefficients between PM10 concentrations and meteorological parameters during different monsoon seasons in Brunei-Muara district from 2009 to 2019.

Between PM10 and wind speed, a weak positive correlation (\(r<\) 0.29) was observed during the NE monsoon season (December to March) and weak negative correlations (\(r>-\) 0.29) were observed during SW monsoon and both inter-monsoon seasons (April to November). On the other hand, a weak negative correlation was observed between PM10 and wind direction during NE monsoon, and weak positive correlations were observed during SW monsoon and both inter-monsoon seasons. This implies that the atmospheric PM10 concentrations can be diluted and diffused by particle dispersion and/or transported to a greater height or to a nearby area when the wind speed increases22,23 and the wind blows from the NNE direction, generally during SW monsoon and both inter-monsoon seasons (April to November). However, during the NE monsoon season (December to March), more PM10 in the area can be blown away at reduced wind speed when the wind blows from the NNE and N directions.

The correlation between PM10 and total rainfall was negative and weak in all the seasons, indicating that more PM10 in the atmosphere can be washed away when the rain gets heavier24. All the coefficients between PM10 and the selected meteorological parameters, and most of the coefficients between the selected meteorological parameters, were statistically significant (p-values \(<\) 0.0001) at a 0.05 significance level (values with an asterisk * in Table 2). The associations between PM10 and each of the selected meteorological parameters during different monsoon seasons are illustrated in Fig. 3.
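The text does not state how the p-values were obtained. A common choice, assumed here for illustration only, is the two-sided \(t\)-test for a Pearson correlation coefficient, in which the test statistic

$$t = r\sqrt{\frac{n - 2}{1 - r^{2}}}$$

is compared against a Student's \(t\) distribution with \(n - 2\) degrees of freedom, where \(r\) and \(n\) are the correlation coefficient and number of observations defined in the Methods. Under this test, even a weak correlation can be statistically significant when \(n\) is large, as is the case for seasonal subsamples of an 11-year daily record.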

Figure 3

Scatter plots of daily PM10 concentrations against temperature (a–d), wind speed (e–h), wind direction (i–l) and total rainfall (m–p) during different monsoon seasons in Brunei-Muara district from 2009 to 2019. R2 is the coefficient of determination that corresponds to the square of the Pearson correlation coefficient \(r\).

Figure 4 shows the changes in mean daily PM10 concentration (from the previous day) for different rain and wind conditions in Brunei-Muara district from 2009 to 2019. When there was no rain in the day, the mean daily PM10 concentration was increased by only 5% (minimum) during a gentle breeze and by 15% (maximum) during light air (Fig. 4a), suggesting that PM10 in the atmosphere can be blown away as the wind speed increases. A drizzling day with a gentle breeze can reduce the mean daily PM10 concentration by only 6% (Fig. 4b), about 2% more than those during light and moderate rain with a light breeze (Fig. 4a). When the wind speed was increased from light breeze to gentle breeze during light rain, the mean daily PM10 concentration was reduced 2.5 times more (i.e., from 1.7% to 4.3%) (Fig. 4a). From 2009 to 2019, moderate rain with light air in Brunei-Muara district reduced the mean daily PM10 concentrations by 7% from the previous day’s PM10 concentration, the highest among the different rain and wind conditions observed in the area (Fig. 4a).

Figure 4

Combined effects of different rain and wind conditions on mean daily PM10 concentrations for Brunei-Muara district from 2009 to 2019 (a–b).

Four modelling approaches (that include MLR, RF, XGBoost and ANN) were trained, validated and tested for predicting daily PM10 concentrations in Brunei-Muara district in seven scenarios. The first/baseline scenario was PM10 predictive models with time and meteorological parameters only. The models’ performances are presented in Table 3, which showed very low but acceptable accuracy with a good agreement between the predicted and observed daily PM10 concentrations, in which the best results were exhibited by XGBoost during models’ training (RMSE = 2.739 µg/m3, R2 = 0.898, MAPE = 16.67%, WIA = 0.844 and LCI = 0.687) and validation (RMSE = 4.645 µg/m3, R2 = 0.593, MAPE = 27.97%, WIA = 0.706 and LCI = 0.411), and ANN during model testing (RMSE = 7.612 µg/m3, R2 = 0.243, MAPE = 40.20%, WIA = 0.602 and LCI = 0.204). This indicates that some daily variability of PM10 in the studied area is not captured by the model. For Scenario 1, the most important variable in predicting PM10 concentration is the year (with the highest relative contribution of about 30% to the XGBoost model and 29% to the RF model among the variables), suggesting that there is a link between PM10 concentration and the year.

Table 3 Performances of models.

The second to fourth scenarios were PM10 predictive models with the previous 1, 2 or 3 days’ PM10 concentration (PM10,t-1, PM10,t-2 and PM10,t-3, respectively) added to the model. As shown in Table 3, the predictive power of the PM10 model in explaining variability tends to increase when the previous day’s PM10 concentration (also called the PM10 lag effect) is added to the MLR, RF, XGBoost and ANN models. For example, the MLR PM10 model with the PM10,t-1 variable added can explain more variability (by 64% on the training sample, 60% on the validation sample and 92% on the test sample) than the model with the time and meteorological parameters only (i.e., Scenario 1). For Scenario 2, this MLR model appears to fit very well and to be acceptably accurate, with very good agreement between the predicted and observed daily PM10 concentrations (RMSE = 1.549 µg/m3, R2 = 0.984, MAPE = 5.77%, WIA = 0.924 and LCI = 0.847 during model testing), outperforming the RF, XGBoost and ANN models (Table 3), which is in contrast to previous studies by other researchers25,26. This could be due to differences in location/topography and meteorological conditions between the present study area and those of other studies, resulting in varying dispersal of air pollutants. The variability of PM10 in the best MLR model for Scenario 2 was explained by eight interaction variables, which were: (1) (temperature × PM10,t-1), (2) (month × year × PM10,t-1), (3) (month × wind direction × PM10,t-1), (4) (total rainfall × wind direction × PM10,t-1), (5) (wind speed × wind direction × PM10,t-1), (6) (temperature × wind direction × PM10,t-1), (7) (total rainfall × wind speed × wind direction), (8) (total rainfall × wind speed × PM10,t-1). The interaction variables (1) to (5) provide significant information in explaining the variability of PM10 for Scenario 2.

The most influential variables of the PM10 models with the PM10,t-1, PM10,t-2 or PM10,t-3 variable added were the interaction variables (temperature × PM10,t-1), (temperature × PM10,t-2) and (temperature × PM10,t-3), respectively. Figure 5a–d illustrates the effect of adding the PM10,t-1, PM10,t-2 or PM10,t-3 variable to the ANN PM10 model during model testing. For all four modelling approaches, the models’ performances during training, validation and testing tend to decrease as the number of lag days of the PM10 concentration increases (due to increasing errors), and the best model performance was achieved when the PM10,t-1 variable was added to the model. This indicates that the addition of the PM10,t-1 variable to the model is sufficient to produce a reliable PM10 hindcasting capability.

Figure 5

Scatter plots of predicted daily PM10 concentrations against observed daily PM10 concentrations for the artificial neural network (ANN) PM10 models without and with the previous 1, 2 or 3 days’ PM10 lag effect (a–d), and for the ANN PM10,t+1, PM10,t+2 and PM10,t+3 models with the previous day’s PM10 lag effect (e–g), on the test sample for Brunei-Muara district from 2009 to 2019.

The fifth to seventh scenarios were PM10 predictive forecasting models for the next 1, 2 or 3 days (PM10,t+1, PM10,t+2 and PM10,t+3, respectively) with the PM10,t-1 variable added to the model. The models’ performances in Table 3 tend to show some reductions in the predictive power of the PM10 model in explaining variability for the next 1, 2 or 3 days when compared with the second scenario (i.e., PM10 model with the PM10,t-1 variable added to the model). For example, the MLR PM10,t+1 model explains less variability (by 0.5% on the training sample, 0.8% on the validation sample and 17.6% on the test sample) than the MLR PM10 model with the PM10,t-1 variable added to the model. The RF PM10,t+1 model appears to show a good fit and good agreement between the predicted and observed daily PM10 concentrations despite having a low but acceptable accuracy during model testing (RMSE = 5.094 µg/m3, R2 = 0.822, MAPE = 14.85%, WIA = 0.790 and LCI = 0.580), outperforming the MLR, XGBoost and ANN PM10,t+1 models (Table 3). This indicates the capability of the random forest models to forecast PM10 concentrations for the next day. The most important variable in forecasting the next 2 and 3 days’ PM10 concentrations is the PM10 variable (with the highest relative contribution of 50% to the RF PM10,t+1 model among the variables).

For all four modelling approaches, the predictive power in explaining variability during training, validation and testing decreases as the prediction days were increased (due to increasing errors in the model). The ANN PM10,t+2 and PM10,t+3 models were considered to have better model performances with a good agreement between the predicted and observed daily PM10 concentrations despite having a very low but acceptable accuracy (RMSE = 5.107 µg/m3, R2 = 0.603, MAPE = 24.96%, WIA = 0.741 and LCI = 0.482 for ANN PM10,t+2 model and RMSE = 6.657 µg/m3, R2 = 0.504, MAPE = 32.69%, WIA = 0.697 and LCI = 0.395 for ANN PM10,t+3 model) during model testing than the MLR, RF and XGBoost PM10,t+2 and PM10,t+3 models (Table 3). The performances of the ANN PM10,t+1, PM10,t+2 and PM10,t+3 models with PM10,t-1 variable added to the model during model testing are illustrated in Fig. 5e–g, which show a minor increase (1%) in the explanatory power of the ANN PM10,t+2 model and a small decrease (9%) in the explanatory power of the ANN PM10,t+3 model when compared to the ANN PM10,t+1 models.

Conclusions

Temporal variations of PM10 concentrations and meteorological parameters (the daily temperature, wind speed, wind direction and total rainfall) in Brunei-Muara district from 2009 to 2019 were examined in this study. The Pearson correlation analysis showed that PM10 increases with increasing atmospheric temperature. More PM10 can be blown away if the wind speed increases and the wind blows from the NNE direction. Heavier rain can also wash away more PM10 in the atmosphere, thus improving the air quality in the studied area. Observations on the combined effects of rain and wind conditions in Brunei-Muara district from 2009 to 2019 revealed that moderate rain with light air reduced PM10 pollution in the area the most, with a 7% reduction in the mean daily PM10 concentrations from the previous day’s PM10 concentration. The MLR PM10 models, particularly with the previous day’s PM10 lag effect (PM10,t-1), can be used to predict daily PM10 concentrations more accurately than the RF, XGBoost and ANN PM10 models, provided that the meteorological conditions are known. Meanwhile, the RF PM10,t+1 model with the PM10,t-1 variable added showed more accurate forecasts for the next day’s PM10 concentration, and the ANN PM10,t+2 and PM10,t+3 models with the PM10,t-1 variable added showed more accurate forecasts for the next 2 and 3 days’ PM10 concentrations. This research can provide a method for predicting PM10 concentrations for the studied area where PM10 concentration data are not available. Given rapid climate change, it is recommended that future studies improve the ability of the PM10 predictive forecasting models to capture greater daily variability of PM10 by including meteorological and/or PM10 concentration data for Brunei-Muara district beyond 2018 in the models when available.