11
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk factors related to prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD.

          Methods

          A FS framework consisting of two operational levels, base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of FS method were then fed into three ML algorithms: support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) to compare and identify the best feature subset for the prediction of AUD from EHRs.

          Results

          In terms of feature reduction, the embedded FS method could significantly reduce the number of features from 361 to 131. In terms of classification performance, RF based on 272 features selected by our proposed ensemble method (Union FS) with the highest accuracy in predicting patients with AUD, 96%, outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnosis related to digestive organs, bones, muscles and connective tissue, and the nervous systems are important clinical factors related to the prediction of patients with AUD.

          Conclusion

          Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques.

          Related collections

          Most cited references59

          • Record: found
          • Abstract: not found
          • Article: not found

          Random Forests

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            SMOTE: Synthetic Minority Over-sampling Technique

            An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of ``normal'' examples with only a small percentage of ``abnormal'' or ``interesting'' examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Wrappers for feature subset selection

                Bookmark

                Author and article information

                Contributors
                aleb@mmmi.sdu.dk
                ukwiil@mmmi.sdu.dk
                amin@mmmi.sdu.dk
                marm@imada.sdu.dk
                kjeld.andersen@rsyd.dk
                ansnielsen@health.sdu.dk
                Journal
                BMC Med Inform Decis Mak
                BMC Med Inform Decis Mak
                BMC Medical Informatics and Decision Making
                BioMed Central (London )
                1472-6947
                23 November 2022
                23 November 2022
                2022
                : 22
                : 304
                Affiliations
                [1 ]GRID grid.10825.3e, ISNI 0000 0001 0728 0170, SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, , University of Southern Denmark, ; Odense, Denmark
                [2 ]GRID grid.10825.3e, ISNI 0000 0001 0728 0170, Department of Mathematics and Computer Science, , University of Southern Denmark, ; Odense, Denmark
                [3 ]GRID grid.10825.3e, ISNI 0000 0001 0728 0170, Unit for Clinical Alcohol Research, Clinical Institute, , University of Southern Denmark, ; Odense, Denmark
                Article
                2051
                10.1186/s12911-022-02051-w
                9686074
                34983500
                ed3219a8-1d35-4689-9485-b3010097086c
                © The Author(s) 2022

                Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 19 December 2021
                : 16 November 2022
                Funding
                Funded by: 5a DE-DK project Access & Acceleration
                Categories
                Research
                Custom metadata
                © The Author(s) 2022

                Bioinformatics & Computational biology
                alcohol use disorder,clinical factor identification,gender disparity,machine learning,feature selection

                Comments

                Comment on this article