
      Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers


          Abstract

          Diabetes mellitus is a group of metabolic diseases in which blood sugar levels are too high. About 8.8% of the world was diabetic in 2017, and this is projected to reach nearly 10% by 2045. The major challenge is that diabetes data sets contain missing values and outliers, and machine learning-based classifiers applied to such data sets for risk stratification show lower performance. Thus, our objective is to develop an optimized and robust machine learning (ML) system under the hypothesis that replacing both missing values and outliers with computed medians will improve risk stratification accuracy. The ML-based risk stratification system is designed, optimized, and evaluated as follows: features are extracted and optimized using six feature selection techniques (random forest, logistic regression, mutual information, principal component analysis, analysis of variance, and Fisher discriminant ratio) and combined with ten different classifiers (linear discriminant analysis, quadratic discriminant analysis, naïve Bayes, Gaussian process classification, support vector machine, artificial neural network, AdaBoost, logistic regression, decision tree, and random forest). The Pima Indian diabetic dataset (768 patients: 268 diabetic and 500 controls) was used. Our results demonstrate that replacing missing values by group medians and outliers by median values, and then combining random forest (RF) feature selection with RF classification, yields an accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve of 92.26%, 95.96%, 79.72%, 91.14%, 91.20%, and 0.93, respectively. This is an improvement of 10% over previously developed techniques published in the literature. The system was validated for its stability and reliability. The RF-based model showed the best performance when outliers were replaced by median values.
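
The preprocessing and evaluation steps named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: missing values are assumed to be encoded as `None`, outliers are assumed to be detected with the common 1.5 × IQR rule (the abstract does not say which rule was used), and the function names `impute_group_median`, `replace_outliers_with_median`, and `stratification_metrics` are hypothetical. The random forest feature selection and classification stages are omitted.

```python
from statistics import median, quantiles


def impute_group_median(values, labels):
    """Replace None entries with the median of the observed values in the same class."""
    group_medians = {}
    for label in set(labels):
        observed = [v for v, l in zip(values, labels) if l == label and v is not None]
        group_medians[label] = median(observed)
    return [group_medians[l] if v is None else v for v, l in zip(values, labels)]


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the overall median.

    The IQR fence is an assumption made for this sketch.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]


def stratification_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics reported in the abstract (AUC excluded)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not taken from the Pima dataset.
    glucose = [100, None, 110, 150, None, 160]
    diabetic = [0, 0, 0, 1, 1, 1]
    print(impute_group_median(glucose, diabetic))
    print(replace_outliers_with_median([10, 11, 12, 13, 14, 15, 16, 500]))
    print(stratification_metrics(tp=90, fp=10, tn=80, fn=20))
```

Note that group-median imputation uses the class label, so in a real evaluation it would have to be fitted on the training folds only to avoid leaking label information into the test data.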

          Electronic supplementary material

          The online version of this article (10.1007/s10916-018-0940-7) contains supplementary material, which is available to authorized users.



                Author and article information

                Contributors
                (916)-749-5628, jasjit.suri@atheropoint.com
                Journal
                Journal of Medical Systems (J Med Syst)
                Publisher: Springer US (New York)
                ISSN: 0148-5598, 1573-689X
                Published: 10 April 2018
                Volume 42, Issue 5, Article 92
                Affiliations
                [1] Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh (ISNI 0000 0004 0451 7306, GRID grid.412656.2)
                [2] The JiVitA Project of Johns Hopkins University, Gaibandha, Bangladesh
                [3] Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh (GRID grid.443086.d)
                [4] Brown University, Providence, RI, USA (ISNI 0000 0004 1936 9094, GRID grid.40263.33)
                [5] Statistics Discipline, Khulna University, Khulna, Bangladesh (ISNI 0000 0001 0441 1219, GRID grid.412118.f)
                [6] Department of Bioengineering, University of Louisville, Louisville, KY, USA (ISNI 0000 0001 2113 1622, GRID grid.266623.5)
                [7] Stroke Monitoring and Diagnostic Division, AtheroPoint LLC, Roseville, CA, USA
                [8] Knowledge Engineering Center, Global Biomedical Technologies, Roseville, CA, USA
                Article
                Article ID: 940
                DOI: 10.1007/s10916-018-0940-7
                PMCID: PMC5893681
                PMID: 29637403
                af318f0f-345d-4260-aadc-f6efa5887ec9
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

                History
                13 February 2018
                7 March 2018
                14 March 2018
                Categories
                Article
                Custom metadata
                © Springer Science+Business Media, LLC, part of Springer Nature 2018

                Public health
                diabetes, missing values, outliers, risk stratification, feature selection, machine learning
