Coupling mobile phone data with machine learning: How misclassification errors in ambient PM2.5 exposure estimates are produced?

Huagui Guo; Qingming Zhan; Hung Chak Ho; Fei Yao; Xingang Zhou; Jiansheng Wu; Weifeng Li

doi:10.1016/j.scitotenv.2020.141034

Sci Total Environ. 2020 Nov 25:745:141034. doi: 10.1016/j.scitotenv.2020.141034. Epub 2020 Jul 18.

Authors

Huagui Guo¹, Qingming Zhan², Hung Chak Ho³, Fei Yao⁴, Xingang Zhou⁵, Jiansheng Wu⁶, Weifeng Li⁷

Affiliations

¹ Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen 518057, PR China. Electronic address: huaguiguo@hku.hk.
² School of Urban Design, Wuhan University, Wuhan 430072, PR China. Electronic address: qmzhan@whu.edu.cn.
³ Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China. Electronic address: hcho21@hku.hk.
⁴ School of GeoSciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom. Electronic address: Fei.Yao@ed.ac.uk.
⁵ College of Architecture and Urban Planning, Tongji University, Shanghai 200092, PR China. Electronic address: zxg@tongji.edu.cn.
⁶ Key Laboratory for Urban Habitat Environmental Science and Technology, Shenzhen Graduate School, Peking University, Shenzhen 518055, PR China; Key Laboratory for Earth Surface Processes, Ministry of Education, College of Urban and Environmental Sciences, Peking University, Beijing 100871, PR China. Electronic address: wujs@pkusz.edu.cn.
⁷ Department of Urban Planning and Design, The University of Hong Kong, Hong Kong, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen 518057, PR China. Electronic address: wfli@hku.hk.

PMID: 32758750

Abstract

Background: Most studies that rely on time-activity diaries or traditional air pollution modelling approaches cannot adequately show how ignoring individual mobility and air pollution variations affects misclassification errors in exposure estimates. Moreover, very few studies have examined whether such impacts differ across socioeconomic groups.

Objectives: We aim to examine how ignoring individual mobility and PM2.5 variations produces misclassification errors in ambient PM2.5 exposure estimates.

Methods: We developed a geo-informed backward propagation neural network model to estimate hourly PM2.5 concentrations from remote sensing and geospatial big data. Combining the estimated PM2.5 concentrations with individual trajectories derived from 755,468 mobile phone users on a weekday in Shenzhen, China, we estimated four types of individual total PM2.5 exposure during weekdays at multiple temporal scales. Each estimate ignoring individual mobility, PM2.5 variations or both was compared with the hypothetical error-free estimate using a paired-sample t-test. We then quantified the exposure misclassification error using Pearson correlation analysis. Moreover, we examined whether the misclassification error differs across socioeconomic groups. Taking the findings on ignoring individual mobility as an example, we further investigated whether such findings are robust to different selections of time.
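The comparison described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the grid-cell layout, the function names, and in particular the reading of "ignoring PM2.5 variations" as replacing each cell's hourly series with its daily mean are assumptions made here for illustration only. The sketch computes the four exposure estimates for one person and provides the paired-sample t statistic and Pearson correlation used to compare an estimate against the reference.

```python
import math
from statistics import mean, stdev


def exposure_estimates(pm, trajectory, home):
    """Return four total-exposure estimates for one person (illustrative only).

    pm:         dict mapping a grid cell to its hourly PM2.5 series (hypothetical layout)
    trajectory: list giving the cell occupied in each hour
    home:       the person's home cell
    """
    hours = range(len(trajectory))
    # Assumption: "ignoring PM2.5 variations" means using each cell's daily mean.
    cell_mean = {cell: mean(series) for cell, series in pm.items()}
    return {
        # Hourly location and hourly concentration (the error-free reference).
        "reference": sum(pm[trajectory[h]][h] for h in hours),
        # Person fixed at home, hourly concentration kept.
        "no_mobility": sum(pm[home][h] for h in hours),
        # Hourly location kept, concentration fixed at the cell's daily mean.
        "no_pm_variation": sum(cell_mean[trajectory[h]] for h in hours),
        # Person fixed at home and concentration fixed at the daily mean.
        "neither": cell_mean[home] * len(trajectory),
    }


def paired_t(x, y):
    """Paired-sample t statistic for two equal-length sequences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: two grid cells, four hours (values are made up, not from the study).
toy_pm = {"A": [10, 20, 30, 40], "B": [50, 50, 50, 50]}
toy_estimates = exposure_estimates(toy_pm, ["A", "A", "B", "B"], home="A")
```

Applied across many users, `paired_t` and `pearson_r` would be called with the list of reference estimates and the list of one simplified estimate; under this reading, a lower correlation with the reference indicates a larger misclassification error.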

Results: We found that the estimate ignoring PM2.5 variations, individual mobility or both was statistically different from the hypothetical error-free estimate. Ignoring both factors produced the largest exposure misclassification error. The misclassification error was larger in the estimate ignoring PM2.5 variations than in the estimate ignoring individual mobility. The exposure misclassification error was larger for people with high economic status. The findings were robust to different selections of time.

Conclusions: Ignoring individual mobility, PM2.5 variations or both leads to misclassification errors in ambient PM2.5 exposure estimates. A larger misclassification error occurs in the estimate neglecting PM2.5 variations than in the estimate ignoring individual mobility, a finding that has seldom been reported before.

Keywords: Machine learning; Misclassification errors; Mobile phone location data; PM2.5 exposure estimate.