      Using Geospatial Data and Random Forest To Predict PFAS Contamination in Fish Tissue in the Columbia River Basin, United States

      research-article


          Abstract

          Decision makers in the Columbia River Basin (CRB) are currently challenged with identifying and characterizing the extent of per- and polyfluoroalkyl substances (PFAS) contamination and human exposure to PFAS. This work aims to develop and pilot a methodology to help decision makers target and prioritize sampling investigations and identify contaminated natural resources. Here we use random forest models to predict ∑PFAS in fish tissue; understanding PFAS levels in fish is particularly important in the CRB because fish can be a major component of the diets of tribal and Indigenous peoples. Geospatial data, including land cover and distances to known or potential PFAS sources and industries, were leveraged as predictors for modeling. Models were developed and evaluated for Washington State and Oregon using limited available empirical data. Mapped predictions show several areas where detectable concentrations of PFAS in fish tissue are predicted to occur but where prior sampling has not yet confirmed them. Variable importance is analyzed to identify potentially important sources of PFAS in fish in this region. The cost-effective methodologies demonstrated here can help address the sparsity of existing PFAS occurrence data in environmental media in this and other regions while also giving insights into potentially important drivers and sources of PFAS in fish.
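          The abstract lists distances to known or potential PFAS sources among the geospatial predictors. As a minimal sketch of how such a predictor could be derived, the snippet below computes the great-circle (haversine) distance from each sampling site to its nearest candidate source. The coordinates, site names, and function names are hypothetical and chosen for illustration only; the article's actual data sources and distance calculations are not described here and may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_source_km(site, sources):
    """Distance from one site to the closest source in a list of (lat, lon) tuples."""
    return min(haversine_km(site[0], site[1], s[0], s[1]) for s in sources)


# Hypothetical coordinates, not taken from the article.
sampling_sites = {"site_A": (45.60, -122.60), "site_B": (46.20, -119.10)}
candidate_sources = [(45.59, -122.59), (47.45, -122.31)]

# One predictor value per site: distance (km) to the nearest candidate source.
features = {
    name: nearest_source_km(coords, candidate_sources)
    for name, coords in sampling_sites.items()
}
```

          A column of such distances, one per source category, could then be combined with land cover variables as inputs to a random forest model.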

          Abstract

          High-exposure risk populations and data-limited regions can use geospatial data and random forest models to efficiently identify potential hotspots of PFAS in environmental media like fish to target their sampling and remediation efforts.



                Author and article information

                Journal
                Environ Sci Technol
                Environmental Science & Technology
                American Chemical Society
                0013-936X
                1520-5851
                05 September 2023
                19 September 2023
                Volume: 57
                Issue: 37
                Pages: 14024-14035
                Affiliations
                Center for Public Health and Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27709, United States
                Region 08, Water Division, U.S. Environmental Protection Agency, Helena, Montana 59626, United States
                Author notes
                Author information
                https://orcid.org/0000-0001-6206-7890
                https://orcid.org/0000-0002-7696-0900
                https://orcid.org/0000-0002-9650-3483
                Article
                DOI: 10.1021/acs.est.3c03670
                PMCID: PMC10515492
                PMID: 37669088
                Not subject to U.S. Copyright. Published 2023 by American Chemical Society

                Permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).

                History
                17 May 2023
                09 August 2023
                08 August 2023
                Categories
                Article
                Custom metadata
                es3c03670

                General environmental science
                variable importance, industry, land cover, sources, Washington, Oregon, tribes
