Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data

Abstract

Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and higher-performing algorithms. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including ensemble ML methods, and compared their performance using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of variable importance showed that the container index was the least important variable. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our results provide a framework for future studies on the use of predictive models in the development of an early warning system.
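The three comparison metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the labels, the scores, the 0.5 decision threshold, and the function names `roc_auc` and `evaluate` are all assumptions made for illustration. AUC is computed here from its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return AUC, accuracy and F1 for one model's predicted scores."""
    # Hypothetical decision rule: predict the positive class at or above the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    f1_denominator = 2 * tp + fp + fn
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels and scores for two hypothetical models.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    model_scores = {
        "model_a": [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1],
        "model_b": [0.6, 0.4, 0.7, 0.5, 0.8, 0.2, 0.3, 0.9],
    }
    for name, scores in model_scores.items():
        print(name, evaluate(labels, scores))
```

Ranking models on all three values at once, as the abstract describes, guards against choosing a model that looks strong on a single threshold-dependent metric only.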

Most cited references 28

The global distribution and burden of dengue

Samir Bhatt, Peter W. Gething, Oliver J Brady … (2013)

Dengue is a systemic viral infection transmitted between humans by Aedes mosquitoes. For some patients dengue is a life-threatening illness. There are currently no licensed vaccines or specific therapeutics, and substantial vector control efforts have not stopped its rapid emergence and global spread. The contemporary worldwide distribution of the risk of dengue virus infection and its public health burden are poorly known. Here we undertake an exhaustive assembly of known records of dengue occurrence worldwide, and use a formal modelling framework to map the global distribution of dengue risk. We then pair the resulting risk map with detailed longitudinal information from dengue cohort studies and population surfaces to infer the public health burden of dengue in 2010. We predict dengue to be ubiquitous throughout the tropics, with local spatial variations in risk influenced strongly by rainfall, temperature and the degree of urbanisation. Using cartographic approaches, we estimate there to be 390 million (95 percent credible interval 284-528) dengue infections per year, of which 96 million (67-136) manifest apparently (any level of clinical or sub-clinical severity). This infection total is more than three times the dengue burden estimate of the World Health Organization. Stratification of our estimates by country allows comparison with national dengue reporting, after taking into account the probability of an apparent infection being formally reported. The most notable differences are discussed. These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue. We anticipate that they will provide a starting point for a wider discussion about the global impact of this disease and will help guide improvements in disease control strategies using vaccine, drug and vector control methods and in their economic evaluation.

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

Davide Chicco, Giuseppe Jurman (2020)

Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets. Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
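The contrast this abstract draws between MCC, accuracy and F1 score can be made concrete with a small sketch. It is an illustration only, not code from the cited paper: the confusion-matrix counts are invented, the function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1 and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    f1_denominator = 2 * tp + fp + fn
    # MCC denominator: geometric mean of the four marginal totals.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
        # Assumed convention: report 0.0 when any marginal total is zero.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


if __name__ == "__main__":
    # Invented imbalanced case: 95 positives, 5 negatives, and a classifier
    # that labels almost everything positive (4 of the 5 negatives are wrong).
    skewed = binary_metrics(tp=95, fp=4, tn=1, fn=0)
    print({name: round(value, 3) for name, value in skewed.items()})
```

In this invented case accuracy and F1 both exceed 0.95 while MCC stays below 0.5, because four of the five negative cases are misclassified. That is the kind of inflated result on imbalanced data that the abstract warns about.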

Comparing different supervised machine learning algorithms for disease prediction

Shahadat Uddin, Arif Khan, Md Ekramul Hossain … (2019)

Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently shown a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods: In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm on single disease prediction. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. Thus, we selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction. Results: We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies) followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed superior accuracy comparatively. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped in 41% of the studies in which it was considered. Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This important information of relative performance can be used to aid researchers in the selection of an appropriate supervised machine learning algorithm for their studies.

Author and article information

Contributors

Song Quan Ong: songquan.ong@ums.edu.my

Journal

Journal ID (nlm-ta): Sci Rep

Journal ID (iso-abbrev): Sci Rep

Title: Scientific Reports

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2045-2322

Publication date (Electronic): 5 November 2023

Publication date PMC-release: 5 November 2023

Publication date Collection: 2023

Volume: 13

Electronic Location Identifier: 19129

Affiliations

[1] Entomology Laboratory, Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah (https://ror.org/040v70252), Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia

[2] Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perak Branch (https://ror.org/05n8tts92), Tapah Campus, 35400 Tapah, Malaysia

[3] Centre for Communicable Diseases Research, Institute for Public Health, National Institutes of Health, Ministry of Health (https://ror.org/05ddxe180), Shah Alam, Malaysia

[4] Entomology and Pest Unit, Federal Territory of Kuala Lumpur and Putrajaya Health Department, Jalan Cenderasari, 50590 Kuala Lumpur, Malaysia

[5] Phytochemistry Unit, Herbal Medicine Research Centre, Institute for Medical Research, National Health Institute (https://ror.org/03bpc5f92), Setia Alam, Malaysia

[6] School of Electrical and Electronics Engineering, Universiti Sains Malaysia (https://ror.org/02rgb2k63), Penang, Malaysia

Article

Publisher ID: 46342

DOI: 10.1038/s41598-023-46342-2

PMC ID: 10625978

PubMed ID: 37926755

SO-VID: bc50fe89-4dca-48d6-b39f-602ab55d64bf

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 29 April 2023

Date accepted: 31 October 2023

Custom metadata

ScienceOpen disciplines: Uncategorized

Keywords: computational models, data mining, machine learning, infectious diseases

Cited by 1

Enabling countries to manage outbreaks: statistical, operational, and contextual analysis of the early warning and response system (EWARS-csd) for dengue outbreaks
Authors: Mikaela Schlesinger, Franklyn Edwin Prieto Alvarado, Milena Edith Borbón Ramos …
