
      Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data

      research-article


          Abstract

          Machine learning (ML) algorithms are receiving considerable attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and higher-performing algorithms. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including ensemble ML methods, and compared their performance using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of the importance of the variables showed that the container index was the least important. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our result provides a framework for future studies on the use of predictive models in the development of an early warning system.
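          The three comparison metrics named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the labels, the model names `model_a` and `model_b`, their scores, and the 0.5 decision threshold are all hypothetical values chosen for illustration. AUC is computed here by its pairwise-ranking definition, the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each (positive, negative) pair counts 1 for a win and 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(labels, scores, threshold=0.5):
    """Accuracy and F1 after thresholding scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    accuracy = (tp + tn) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1


# Hypothetical ground truth (1 = high transmission) and hypothetical model scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.4, 0.7, 0.3, 0.6, 0.2, 0.8, 0.1],
}

# Map each model to (AUC, accuracy, F1), then rank the models by AUC.
results = {
    name: (roc_auc(labels, s), *accuracy_and_f1(labels, s))
    for name, s in model_scores.items()
}
ranking = sorted(results, key=lambda name: results[name][0], reverse=True)

for name in ranking:
    auc, acc, f1 = results[name]
    print(f"{name}: AUC={auc:.4f} accuracy={acc:.2f} F1={f1:.2f}")
```

          AUC is threshold-free, while accuracy and F1 depend on the chosen cut-off, which is one reason to report the three together when comparing classifiers.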


          Most cited references (28)


          The global distribution and burden of dengue

          Dengue is a systemic viral infection transmitted between humans by Aedes mosquitoes [1]. For some patients dengue is a life-threatening illness [2]. There are currently no licensed vaccines or specific therapeutics, and substantial vector control efforts have not stopped its rapid emergence and global spread [3]. The contemporary worldwide distribution of the risk of dengue virus infection [4] and its public health burden are poorly known [2,5]. Here we undertake an exhaustive assembly of known records of dengue occurrence worldwide, and use a formal modelling framework to map the global distribution of dengue risk. We then pair the resulting risk map with detailed longitudinal information from dengue cohort studies and population surfaces to infer the public health burden of dengue in 2010. We predict dengue to be ubiquitous throughout the tropics, with local spatial variations in risk influenced strongly by rainfall, temperature and the degree of urbanisation. Using cartographic approaches, we estimate there to be 390 million (95 percent credible interval 284-528) dengue infections per year, of which 96 million (67-136) manifest apparently (any level of clinical or sub-clinical severity). This infection total is more than three times the dengue burden estimate of the World Health Organization [2]. Stratification of our estimates by country allows comparison with national dengue reporting, after taking into account the probability of an apparent infection being formally reported. The most notable differences are discussed. These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue. We anticipate that they will provide a starting point for a wider discussion about the global impact of this disease and will help guide improvements in disease control strategies using vaccine, drug and vector control methods and in their economic evaluation.

            The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation

            Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets. Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
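            The contrast described above can be seen on a small worked example. The sketch below is illustrative only: the confusion-matrix counts are hypothetical (90 positives, 10 negatives, and a classifier that labels almost everything positive), and returning 0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, F1, MCC) from the four confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Hypothetical imbalanced dataset: all 90 positives are found,
# but only 1 of the 10 negatives is classified correctly.
accuracy, f1, mcc = binary_metrics(tp=90, fn=0, tn=1, fp=9)
print(f"accuracy={accuracy:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

            Accuracy and F1 both exceed 0.9 on these counts, while MCC stays near 0.3 because the negative class is almost entirely misclassified.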

              Comparing different supervised machine learning algorithms for disease prediction

              Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently shown a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods: In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm on single disease prediction. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. Thus, we selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction. Results: We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies) followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed superior accuracy comparatively. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped 41% of the studies in which it was considered. Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This important information on relative performance can be used to aid researchers in the selection of an appropriate supervised machine learning algorithm for their studies.

                Author and article information

                Contributors
                songquan.ong@ums.edu.my
                Journal
                Scientific Reports (Sci Rep)
                Nature Publishing Group UK (London)
                ISSN: 2045-2322
                5 November 2023
                Volume: 13
                Article number: 19129
                Affiliations
                [1] Entomology Laboratory, Institute for Tropical Biology and Conservation, Universiti Malaysia Sabah (https://ror.org/040v70252), Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia
                [2] Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perak Branch (https://ror.org/05n8tts92), Tapah Campus, 35400 Tapah, Malaysia
                [3] Centre for Communicable Diseases Research, Institute for Public Health, National Institutes of Health, Ministry of Health (https://ror.org/05ddxe180), Shah Alam, Malaysia
                [4] Entomology and Pest Unit, Federal Territory of Kuala Lumpur and Putrajaya Health Department, Jalan Cenderasari, 50590 Kuala Lumpur, Malaysia
                [5] Phytochemistry Unit, Herbal Medicine Research Centre, Institute for Medical Research, National Health Institute (https://ror.org/03bpc5f92), Setia Alam, Malaysia
                [6] School of Electrical and Electronics Engineering, Universiti Sains Malaysia (https://ror.org/02rgb2k63), Penang, Malaysia
                Article
                Article ID: 46342
                DOI: 10.1038/s41598-023-46342-2
                PMCID: PMC10625978
                PMID: 37926755
                bc50fe89-4dca-48d6-b39f-602ab55d64bf
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                29 April 2023
                31 October 2023
                Categories
                Article
                Custom metadata
                © Springer Nature Limited 2023

                Uncategorized
                computational models, data mining, machine learning, infectious diseases
