Various approaches have been proposed to model PM2.5 over the past decade, with
satellite-derived aerosol optical depth, land-use variables, chemical transport model
predictions, and several meteorological variables as major predictor variables. Our
study used an ensemble model that integrated multiple machine learning algorithms
and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across
the contiguous United States. We used a generalized additive model that accounted
for geographic differences to combine PM2.5 estimates from neural network, random
forest, and gradient boosting models. The three machine learning algorithms were based on
multiple predictor variables, including satellite data, meteorological variables,
land-use variables, elevation, chemical transport model predictions, several reanalysis
datasets, and others. The model training results from 2000 to 2015 indicated good
model performance with a 10-fold cross-validated R² of 0.86 for daily PM2.5 predictions.
For annual PM2.5 estimates, the cross-validated R² was 0.89. Our model demonstrated
good performance up to 60 μg/m³. Using the trained PM2.5 model and predictor variables,
we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous
United States. We also used localized land-use variables within 1 km × 1 km grids to
downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty,
we used meteorological variables, land-use variables, and elevation to model the monthly
standard deviation of the difference between daily monitored and predicted PM2.5
for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled
and uncertainty predictions, allows epidemiologists to accurately estimate the adverse
health effects of PM2.5. Compared with the performance of individual base learners,
an ensemble model can achieve better overall estimates. It is worth exploring
other ensemble model formats to synthesize estimates from different models or from
different groups to improve overall performance.
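
The uncertainty step described above models the monthly standard deviation of the difference between daily monitored and predicted PM2.5. A minimal sketch of how that target quantity could be computed at monitored grid cells is given below. It is an illustration only, not the study's implementation: the function name `monthly_residual_sd`, the record layout, and the sample values are all hypothetical, and the later step of regressing this quantity on meteorological variables, land-use variables, and elevation is not shown.

```python
from collections import defaultdict
from statistics import stdev


def monthly_residual_sd(records):
    """Sample SD of (monitored - predicted) per grid cell and calendar month.

    `records` is an iterable of (cell_id, "YYYY-MM-DD", monitored, predicted)
    tuples. This layout is assumed for illustration only.
    """
    residuals = defaultdict(list)
    for cell_id, date, monitored, predicted in records:
        month = date[:7]  # "YYYY-MM"
        residuals[(cell_id, month)].append(monitored - predicted)
    # A sample standard deviation needs at least two daily residuals,
    # so cell-months with a single observation are dropped.
    return {
        key: stdev(values)
        for key, values in residuals.items()
        if len(values) >= 2
    }


# Made-up values in μg/m³, for illustration only.
records = [
    ("cell_A", "2010-01-01", 12.0, 10.0),
    ("cell_A", "2010-01-02", 8.0, 9.0),
    ("cell_A", "2010-01-03", 15.0, 13.0),
    ("cell_A", "2010-02-01", 20.0, 18.0),  # only one day in this month
    ("cell_B", "2010-01-01", 5.0, 6.0),
    ("cell_B", "2010-01-02", 7.0, 6.0),
]
sd = monthly_residual_sd(records)
```

Each entry of `sd` is keyed by a (grid cell, month) pair. In a workflow like the one described, these values would serve as the response variable of the uncertainty model, which could then be applied to grid cells without monitors.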