
      Water quality prediction based on sparse dataset using enhanced machine learning


          Abstract

Water quality in surface water bodies remains a pressing issue worldwide. While some regions have rich water quality data, less attention is given to areas that lack sufficient data. It is therefore crucial to explore novel ways of managing source-oriented surface water pollution under infrequent data collection, such as weekly or monthly sampling. Here we show sparse-dataset-based prediction of water pollution using machine learning. We investigated the efficacy of a traditional Recurrent Neural Network alongside three Long Short-Term Memory (LSTM) models, integrated with the Load Estimator (LOADEST). The research was conducted at a river-lake confluence, an area with intricate hydrological patterns. We found that the Self-Attentive LSTM (SA-LSTM) model outperformed the other three machine learning models in predicting water quality, achieving Nash-Sutcliffe Efficiency (NSE) scores of 0.71 for COD_Mn and 0.57 for NH3-N when utilizing LOADEST-augmented water quality data (referred to as the SA-LSTM-LOADEST model). The SA-LSTM-LOADEST model improved upon the standalone SA-LSTM model, reducing the Root Mean Square Error (RMSE) by 24.6% for COD_Mn and 21.3% for NH3-N. Furthermore, the model maintained its predictive accuracy when data collection intervals were extended from weekly to monthly. The SA-LSTM-LOADEST model also demonstrated the capability to forecast pollution loads up to ten days in advance. This study shows promise for improving water quality modeling in regions with limited monitoring capabilities.
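
The record does not include the authors' code. As a rough illustration of the LOADEST step described above, the NumPy sketch below fits the seven-parameter log-linear rating curve commonly used by LOADEST (log load regressed on centered log flow, its square, annual sine and cosine terms, and a quadratic time trend) to sparse samples, then back-transforms to estimate a daily load series. The function names and the ordinary-least-squares fit are illustrative assumptions; LOADEST itself uses maximum-likelihood estimators that also handle censored observations.

import numpy as np

def design(ln_q, dtime, ln_q_c, t_c):
    """LOADEST-style seven-parameter design matrix: intercept, centered
    ln(flow) and its square, annual sine/cosine, and a quadratic time trend.
    dtime is decimal time in years, so sin/cos give an annual cycle."""
    q = ln_q - ln_q_c
    t = dtime - t_c
    return np.column_stack([np.ones_like(q), q, q**2,
                            np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                            t, t**2])

def fit(ln_q_obs, dtime_obs, ln_load_obs):
    """OLS fit of the rating curve on the sparse (weekly/monthly) samples.
    Returns the coefficients plus the centering constants, which must be
    reused at prediction time."""
    ln_q_c, t_c = ln_q_obs.mean(), dtime_obs.mean()
    X = design(ln_q_obs, dtime_obs, ln_q_c, t_c)
    coef, *_ = np.linalg.lstsq(X, ln_load_obs, rcond=None)
    return coef, ln_q_c, t_c

def predict_daily(coef, ln_q_c, t_c, ln_q_daily, dtime_daily):
    """Estimate a daily load series from daily streamflow records."""
    X = design(ln_q_daily, dtime_daily, ln_q_c, t_c)
    return np.exp(X @ coef)  # back-transform; log-space bias correction omitted

In this setting, ln_load_obs would come from the sampled concentrations multiplied by concurrent flow, and the fitted curve fills in the days between samples, producing the LOADEST-augmented series on which the machine learning models are trained.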

          Highlights

          • The integrated Self-Attention, LSTM, and LOADEST model excels with sparse datasets (see the sketch after these highlights).

          • Our model efficiently predicts daily pollution loads from weekly-to-monthly data.

          • The model reduces error by 20–30% compared to standalone machine learning.
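
Since this record names the architecture but does not publish it, the PyTorch sketch below is one plausible reading of a "self-attentive LSTM" regressor: an LSTM encodes the input window, a single-head scaled dot-product self-attention layer reweights its hidden states, and a linear head predicts the target. The hidden size, single attention head, and last-step pooling are assumptions, not the authors' published design; a helper for the Nash-Sutcliffe Efficiency reported in the abstract is included.

import math
import torch
import torch.nn as nn

class SALSTM(nn.Module):
    """Sketch of an LSTM with scaled dot-product self-attention over its
    hidden states, followed by a linear regression head."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.q = nn.Linear(hidden, hidden)
        self.k = nn.Linear(hidden, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, features)
        h, _ = self.lstm(x)                      # h: (batch, time, hidden)
        q, k, v = self.q(h), self.k(h), self.v(h)
        scores = q @ k.transpose(1, 2) / math.sqrt(k.size(-1))
        ctx = torch.softmax(scores, dim=-1) @ v  # attention-weighted states
        return self.head(ctx[:, -1])             # predict from the final step

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 means no better than
    predicting the mean of the observations."""
    obs, sim = torch.as_tensor(obs), torch.as_tensor(sim)
    return 1 - torch.sum((obs - sim) ** 2) / torch.sum((obs - obs.mean()) ** 2)

A model like this would be trained with a standard MSE loss on sliding windows cut from the LOADEST-augmented daily series.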

          Most cited references (115)


            LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.

          Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

            Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735-1780.

            Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
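
To make the gating mechanism concrete, here is a minimal NumPy sketch of one step of a standard LSTM cell. Note this is the now-standard variant: the forget gate was added by Gers et al. (2000) after this 1997 paper, which kept the cell state's self-connection fixed at 1 (the "constant error carousel").

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One step of an LSTM cell with hidden size H and input size D.
    W: (4H, D), U: (4H, H), b: (4H,) hold the stacked parameters for the
    input (i), forget (f), and output (o) gates and the candidate (g)."""
    z = W @ x + U @ h + b              # stacked pre-activations, shape (4H,)
    H = h.shape[0]
    i = sigmoid(z[0*H:1*H])            # input gate: admit new information
    f = sigmoid(z[1*H:2*H])            # forget gate: decay the old cell state
    o = sigmoid(z[2*H:3*H])            # output gate: expose the cell state
    g = np.tanh(z[3*H:4*H])            # candidate values
    c = f * c + i * g                  # additive path: gradients flow through
                                       # c across many steps without vanishing
    h = o * np.tanh(c)
    return h, c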

              Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974;19(6):716-723.

                Author and article information

                Journal: Environmental Science and Ecotechnology (Environ Sci Ecotechnol), Elsevier
                ISSN: 2096-9643 (print); 2666-4984 (electronic)
                Published online: 1 March 2024; issue date: July 2024
                Volume 20, article 100402
                Affiliations
                [a ]State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan 430072, China
                [b ]Institute for Water-Carbon Cycles and Carbon Neutrality, Wuhan University, Wuhan 430072, China
                [c ]Department of Civil and Environmental Engineering, National University of Singapore, 117578 Singapore
                [d ]Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
                Author notes
                [* ]Corresponding author. State Key Laboratory of Water Resources Engineering and Management, Wuhan University, Wuhan 430072, China. xiajun666@ 123456whu.edu.cn
                [** ]Corresponding author. wangyl@ 123456igsnrr.ac.cn
                Article
                PII: S2666-4984(24)00016-4
                DOI: 10.1016/j.ese.2024.100402
                PMCID: PMC10998092
                PMID: 38585199
                © 2024 The Authors

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                History
                Received: 2 March 2023
                Revised: 18 February 2024
                Accepted: 19 February 2024
                Categories: Original Research

                Keywords: water quality modeling, sparse measurement, river-lake confluence, long short-term memory, load estimator, machine learning
