Coupling mobile phone data with machine learning: How misclassification errors in ambient PM2.5 exposure estimates are produced?

Sci Total Environ. 2020 Nov 25:745:141034. doi: 10.1016/j.scitotenv.2020.141034. Epub 2020 Jul 18.

Abstract

Background: Most studies that rely on time-activity diaries or traditional air pollution modelling approaches cannot show how ignoring individual mobility and air pollution variations contributes to misclassification errors in exposure estimates. Moreover, very few studies have examined whether such impacts differ across socioeconomic groups.

Objectives: We aim to examine how ignoring individual mobility and PM2.5 variations produces misclassification errors in ambient PM2.5 exposure estimates.
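The objective can be stated more concretely with a small formalization. The notation is our own assumption and does not come from the paper. C(l, h) is the PM2.5 concentration at location l in hour h, and l_i(h) is where user i is in hour h. l_i^0 is a single fixed location (for example, a home location) substituted when mobility is ignored. The barred C is the daily mean at a location, substituted when "PM2.5 variations" are read as hourly variations and ignored. Summing over H hours gives four total exposures, the first being the error-free reference:

```latex
\begin{aligned}
E_i^{\text{true}}    &= \sum_{h=1}^{H} C\big(\ell_i(h),\, h\big)   && \text{(mobility and hourly variation)}\\
E_i^{\text{no-mob}}  &= \sum_{h=1}^{H} C\big(\ell_i^{0},\, h\big)  && \text{(fixed location, hourly variation)}\\
E_i^{\text{no-var}}  &= \sum_{h=1}^{H} \bar{C}\big(\ell_i(h)\big)  && \text{(mobility, daily mean only)}\\
E_i^{\text{no-both}} &= H\,\bar{C}\big(\ell_i^{0}\big)              && \text{(fixed location, daily mean only)}
\end{aligned}
\qquad\text{where}\qquad \bar{C}(\ell) = \frac{1}{H}\sum_{h=1}^{H} C(\ell, h)
```

Under this reading, the misclassification produced by ignoring a factor shows up as the gap between the error-free estimate and the corresponding simplified estimate across users.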

Methods: We developed a geo-informed backward propagation neural network model to estimate hourly PM2.5 concentrations from remote sensing and geospatial big data. Combining the estimated PM2.5 concentrations with individual trajectories derived from 755,468 mobile phone users on a weekday in Shenzhen, China, we estimated four types of individual total PM2.5 exposure during weekdays at multiple temporal scales. Each estimate that ignored individual mobility, PM2.5 variations or both was compared with the hypothetical error-free estimate using a paired-sample t-test. We then quantified the exposure misclassification error using Pearson correlation analysis. Moreover, we examined whether the misclassification error differs across socioeconomic groups. Taking the findings on ignoring individual mobility as an example, we further investigated whether they are robust to different selections of time.
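A backward propagation (backpropagation) network of the kind named above can be sketched in a few lines. This is a generic one-hidden-layer regression network trained with stochastic gradient descent, not the authors' geo-informed model. The input features are synthetic stand-ins for remote-sensing and geospatial predictors. The function name `train_bpnn`, the tanh activation, the hidden-layer size, the learning rate and the epoch count are all illustrative assumptions.

```python
import math
import random


def train_bpnn(X, y, hidden=8, lr=0.05, epochs=200, seed=0):
    """Hypothetical helper: one-hidden-layer regression net trained by backpropagation (SGD)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # Small random initial weights; biases start at zero.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # tanh hidden layer followed by a linear output unit.
        a = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
        return a, sum(w * aj for w, aj in zip(W2, a)) + b2

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

    history = [mse()]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, t = X[i], y[i]
            a, pred = forward(x)
            err = pred - t  # gradient of 0.5 * (pred - t)^2 w.r.t. pred
            # Backpropagate the output error through W2 and the tanh derivative
            # (computed before W2 is updated, as backpropagation requires).
            delta_h = [err * W2[j] * (1.0 - a[j] ** 2) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * err * a[j]
                b1[j] -= lr * delta_h[j]
                for k in range(n_in):
                    W1[j][k] -= lr * delta_h[j] * x[k]
            b2 -= lr * err
        history.append(mse())
    return forward, history


# Synthetic example: three made-up predictors in [0, 1] and a smooth target.
rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(150)]
y = [0.6 * a + 0.3 * math.sin(3 * b) - 0.2 * c for a, b, c in X]
predict, history = train_bpnn(X, y, epochs=100)
print(f"training MSE before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The sketch only demonstrates that backpropagation lowers the training error. Engineering the spatial predictors and validating the estimates against observed PM2.5 are outside its scope.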

Results: We found that the estimates ignoring PM2.5 variations, individual mobility or both were statistically different from the hypothetical error-free estimate. Ignoring both factors produced the largest exposure misclassification error. The misclassification error was larger in the estimate ignoring PM2.5 variations than in the estimate ignoring individual mobility. People with high economic status experienced a larger exposure misclassification error. The findings were robust to different selections of time.
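The comparisons behind these results can be sketched with synthetic data. The sketch assumes that ignoring mobility means assigning every hour to one fixed location (here a hypothetical home), and that ignoring PM2.5 variations means replacing hourly values with each location's daily mean. It also assumes that misclassification is summarized by the Pearson correlation between each simplified estimate and the error-free one across users, with a lower r indicating more misclassification. The helper names `exposures`, `paired_t` and `pearson`, the five-location grid and the 09:00 to 18:00 work schedule are hypothetical. The printed numbers say nothing about the paper's findings. Only the t statistic is computed, since a p-value would need the t distribution (for example, `scipy.stats.ttest_rel`).

```python
import math
import random
import statistics


def exposures(traj, home, conc):
    """Four total exposures (sums of hourly PM2.5) for one user.

    traj: location id for each hour; home: fixed location used when mobility
    is ignored; conc: {location: [hourly PM2.5 values]}.
    """
    hours = range(len(traj))
    daily = {loc: statistics.fmean(vals) for loc, vals in conc.items()}
    return {
        "true": sum(conc[traj[h]][h] for h in hours),        # mobility + hourly variation
        "no_mobility": sum(conc[home][h] for h in hours),    # fixed location, hourly values
        "no_variation": sum(daily[traj[h]] for h in hours),  # mobility, daily means only
        "no_both": len(traj) * daily[home],                  # fixed location, daily mean
    }


def paired_t(ref, est):
    """Paired-sample t statistic for est - ref; NaN if all differences are equal."""
    diffs = [e - r for r, e in zip(ref, est)]
    sd = statistics.stdev(diffs)
    return math.nan if sd == 0 else statistics.fmean(diffs) / (sd / math.sqrt(len(diffs)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)
H, locations = 24, ["L0", "L1", "L2", "L3", "L4"]
# Synthetic hourly PM2.5 (ug/m3): location baseline + diurnal cycle + noise.
conc = {
    loc: [20 + 10 * i + 8 * math.sin(2 * math.pi * h / H) + rng.gauss(0, 3) for h in range(H)]
    for i, loc in enumerate(locations)
}
users = []
for _ in range(200):
    home, work = rng.choice(locations), rng.choice(locations)
    # Hypothetical schedule: at work from 09:00 to 17:59, at home otherwise.
    users.append(([work if 9 <= h < 18 else home for h in range(H)], home))

est = [exposures(traj, home, conc) for traj, home in users]
truth = [e["true"] for e in est]
results = {}
for kind in ("no_mobility", "no_variation", "no_both"):
    approx = [e[kind] for e in est]
    results[kind] = (paired_t(truth, approx), pearson(truth, approx))
    print(f"{kind:13s} t = {results[kind][0]:7.2f}  r = {results[kind][1]:.3f}")
```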

Conclusions: Ignoring individual mobility, PM2.5 variations or both leads to misclassification errors in ambient PM2.5 exposure estimates. The misclassification error is larger in the estimate neglecting PM2.5 variations than in the estimate ignoring individual mobility, a finding that has seldom been reported before.

Keywords: Machine learning; Misclassification errors; Mobile phone location data; PM2.5 exposure estimate.