
      Water quality prediction based on sparse dataset using enhanced machine learning


          Abstract

Water quality in surface water bodies remains a pressing issue worldwide. While some regions have rich water quality data, less attention is given to areas that lack sufficient data. It is therefore crucial to explore novel ways of managing source-oriented surface water pollution under infrequent data collection, such as weekly or monthly sampling. Here we show sparse-dataset-based prediction of water pollution using machine learning. We investigated the efficacy of a traditional Recurrent Neural Network alongside three Long Short-Term Memory (LSTM) models, integrated with the Load Estimator (LOADEST). The research was conducted at a river-lake confluence, an area with intricate hydrological patterns. We found that the Self-Attentive LSTM (SA-LSTM) model outperformed the other three machine learning models in predicting water quality, achieving Nash-Sutcliffe Efficiency (NSE) scores of 0.71 for COD_Mn and 0.57 for NH3-N when utilizing LOADEST-augmented water quality data (referred to as the SA-LSTM-LOADEST model). The SA-LSTM-LOADEST model improved upon the standalone SA-LSTM model, reducing the Root Mean Square Error (RMSE) by 24.6% for COD_Mn and 21.3% for NH3-N. Furthermore, the model maintained its predictive accuracy when data collection intervals were extended from weekly to monthly. The SA-LSTM-LOADEST model also demonstrated the capability to forecast pollution loads up to ten days in advance. This study shows promise for improving water quality modeling in regions with limited monitoring capabilities.
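
The record does not include the authors' code. As a rough illustration of the LOADEST step described above, the NumPy sketch below fits the seven-parameter log-linear rating curve commonly used by LOADEST (log load regressed on centered log flow, its square, annual sine and cosine terms, and a quadratic time trend) to sparse samples, then back-transforms to estimate a daily load series. The function names and the ordinary-least-squares fit are illustrative assumptions; LOADEST itself uses maximum-likelihood estimators that also handle censored observations.

import numpy as np

def design(ln_q, dtime, ln_q_c, t_c):
    """LOADEST-style seven-parameter design matrix: intercept, centered
    ln(flow) and its square, annual sine/cosine, and a quadratic time trend.
    dtime is decimal time in years, so sin/cos give an annual cycle."""
    q = ln_q - ln_q_c
    t = dtime - t_c
    return np.column_stack([np.ones_like(q), q, q**2,
                            np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                            t, t**2])

def fit(ln_q_obs, dtime_obs, ln_load_obs):
    """OLS fit of the rating curve on the sparse (weekly/monthly) samples.
    Returns the coefficients plus the centering constants, which must be
    reused at prediction time."""
    ln_q_c, t_c = ln_q_obs.mean(), dtime_obs.mean()
    X = design(ln_q_obs, dtime_obs, ln_q_c, t_c)
    coef, *_ = np.linalg.lstsq(X, ln_load_obs, rcond=None)
    return coef, ln_q_c, t_c

def predict_daily(coef, ln_q_c, t_c, ln_q_daily, dtime_daily):
    """Estimate a daily load series from daily streamflow records."""
    X = design(ln_q_daily, dtime_daily, ln_q_c, t_c)
    return np.exp(X @ coef)  # back-transform; log-space bias correction omitted

In this setting, ln_load_obs would come from the sampled concentrations multiplied by concurrent flow, and the fitted curve fills in the days between samples, producing the LOADEST-augmented series on which the machine learning models are trained.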

          Highlights

          • The integrated Self-Attention, LSTM, and LOADEST model excels with sparse datasets (see the sketch after these highlights).

          • Our model efficiently predicts daily pollution loads from weekly-to-monthly data.

          • The model reduces error by 20–30% compared to standalone machine learning.
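
Since this record names the architecture but does not publish it, the PyTorch sketch below is one plausible reading of a "self-attentive LSTM" regressor: an LSTM encodes the input window, a single-head scaled dot-product self-attention layer reweights its hidden states, and a linear head predicts the target. The hidden size, single attention head, and last-step pooling are assumptions, not the authors' published design; a helper for the Nash-Sutcliffe Efficiency reported in the abstract is included.

import math
import torch
import torch.nn as nn

class SALSTM(nn.Module):
    """Sketch of an LSTM with scaled dot-product self-attention over its
    hidden states, followed by a linear regression head."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, features)
        h, _ = self.lstm(x)                      # h: (batch, time, hidden)
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.transpose(1, 2) / math.sqrt(k.size(-1))
        ctx = torch.softmax(scores, dim=-1) @ v  # attention-weighted states
        return self.head(ctx[:, -1])             # predict from the final step

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 means no better than
    predicting the mean of the observations."""
    obs, sim = torch.as_tensor(obs), torch.as_tensor(sim)
    return 1 - torch.sum((obs - sim) ** 2) / torch.sum((obs - obs.mean()) ** 2)

A model like this would be trained with a standard MSE loss on sliding windows cut from the LOADEST-augmented daily series.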

          Most cited references (115)


            LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.

          Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

            Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780.

            Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
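
To make the gating mechanism concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. Note this is the now-standard variant: the forget gate was added by Gers et al. (2000) after this 1997 paper, which kept the cell state's self-connection fixed at 1 (the "constant error carousel").

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of an LSTM cell with hidden size H and input size D.
    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked parameters for the
    input (i), forget (f), and output (o) gates and the candidate (g)."""
    z = W @ x + U @ h + b              # stacked pre-activations, shape (4H,)
    H = h.shape[0]
    i = sigmoid(z[0*H:1*H])            # input gate: admit new information
    f = sigmoid(z[1*H:2*H])            # forget gate: decay the old cell state
    o = sigmoid(z[2*H:3*H])            # output gate: expose the cell state
    g = np.tanh(z[3*H:4*H])            # candidate values
    c = f * c + i * g                  # additive path: gradients flow through
                                       # c across many steps without vanishing
    h = o * np.tanh(c)
    return h, c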

              Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19(6):716-723.

                Author and article information

                Journal: Environmental Science and Ecotechnology (Environ Sci Ecotechnol), Elsevier
                ISSN: 2096-9643 (print); 2666-4984 (electronic)
                Published online: 1 March 2024; issue date: July 2024
                Volume 20, article 100402
                Affiliations
                [a ]State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan 430072, China
                [b ]Institute for Water-Carbon Cycles and Carbon Neutrality, Wuhan University, Wuhan 430072, China
                [c ]Department of Civil and Environmental Engineering, National University of Singapore, 117578 Singapore
                [d ]Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
                Author notes
                [* ]Corresponding author. State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan 430072, China. xiajun666@ 123456whu.edu.cn
                [** ]Corresponding author. wangyl@ 123456igsnrr.ac.cn
                Article
                PII: S2666-4984(24)00016-4
                DOI: 10.1016/j.ese.2024.100402
                PMCID: PMC10998092
                PMID: 38585199
                © 2024 The Authors

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                History
                Received: 2 March 2023
                Revised: 18 February 2024
                Accepted: 19 February 2024
                Categories: Original Research

                Keywords: water quality modeling, sparse measurement, river-lake confluence, long short-term memory, load estimator, machine learning
