      A systematic review of data mining and machine learning for air pollution epidemiology

          Abstract

          Background

          Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology.

          Methods

          We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed.

          Results

          Our search queries returned 400 research articles. Applying our inclusion/exclusion criteria in a fine-grained analysis reduced these to 47 articles, which we separated into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution, air quality or exposure; and 3) generating hypotheses. Early applications favoured artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. As potential new directions, we identified deep learning and geospatial pattern mining as two burgeoning areas of data mining with good potential for future applications in air pollution epidemiology.

          Conclusions

          We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology.

          The potential to support air pollution epidemiology continues to grow with advances in temporal and geospatial data mining and in deep learning. This potential is further supported by new sensors and storage media that enable the collection of larger, higher-quality datasets. This suggests that many more fruitful applications can be expected in the future.

                Author and article information

                Contributors
                cbelling@ualberta.ca
                mohomedj@ualberta.ca
                zaiane@ualberta.ca
                osornio@ualberta.ca
                Journal
                BMC Public Health
                BioMed Central (London)
                ISSN: 1471-2458
                Published: 28 November 2017
                Volume 17, Article 907
                Affiliations
                [1] Department of Computing Science, University of Alberta, Edmonton, Canada (GRID grid.17089.37)
                [2] Department of Paediatrics, University of Alberta, Edmonton, Canada (GRID grid.17089.37)
                Author information
                http://orcid.org/0000-0002-3567-7834
                Article
                DOI: 10.1186/s12889-017-4914-3
                PMCID: PMC5704396
                PMID: 29179711
                © The Author(s) 2017

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 26 April 2017
                Accepted: 14 November 2017
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100000024, Canadian Institutes of Health Research;
                Funded by: FundRef http://dx.doi.org/10.13039/501100000038, Natural Sciences and Engineering Research Council of Canada;
                Funded by: Alberta Machine Intelligence Institute
                Categories
                Research Article

                Subject: Public health
                Keywords: epidemiology, air pollution, exposure, data mining, big data, machine learning, association mining
