      Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer

      research-article


          Abstract

          Background

          Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in the process of clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms.

          Methods

          We retrieved records of patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, covering 24 features. Eight ML models were applied to the selected dataset to classify EC patients by 5-year survival status: 3 newly developed gradient boosting models (GBM), namely XGBoost, CatBoost, and LightGBM; 2 commonly used tree-based models, gradient boosting decision trees (GBDT) and random forest (RF); and 3 other ML models, artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM). Model performance was measured with 5-fold cross-validation.
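The 5-fold cross-validation described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the abstract does not say whether the folds were stratified by survival status, so stratification is an assumption here, and the names `stratified_k_fold`, `cross_validate`, `fit`, and `score` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep roughly the class ratio.

    Stratification is an assumption; the abstract only states "5-fold".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Average a score over k folds.

    `fit(X, y)` returns a model and `score(model, X, y)` returns a float;
    both are hypothetical callables supplied by the caller.
    """
    scores = []
    for train_idx, test_idx in stratified_k_fold(y, k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(
            score(model, [X[i] for i in test_idx], [y[i] for i in test_idx])
        )
    return sum(scores) / len(scores)
```

Each record is used for testing exactly once and for training in the other four folds, so the averaged score reflects performance on data the model did not see during fitting.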

          Results

          After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the χ² test; however, the experimental results showed that the complete dataset predicted outcomes better than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, which were also the best among the 8 models. For the XGBoost model, SHapley Additive exPlanations (SHAP) values were calculated, and the results indicated that four features had the greatest impact on the predicted outcomes: reason no cancer-directed surgery, Surg Prim Site, age, and stage group.
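The three metrics reported above (AUC, accuracy, and logistic loss) can be computed from predicted probabilities as in the following sketch. It uses only the standard definitions and is not taken from the article; the 0.5 decision threshold for accuracy is an assumption, and the toy labels and probabilities at the end are hypothetical, not study data.

```python
import math


def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def roc_auc(y_true, p_pred):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event of interest, 0 = otherwise.
y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, p_pred), accuracy(y_true, p_pred), log_loss(y_true, p_pred))
```

AUC depends only on how the cases are ranked, while logistic loss also penalizes poorly calibrated probabilities, which is why a model comparison can reasonably report both.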

          Conclusions

          The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC, which may be applicable in clinical practice in the future.

                Author and article information

                Journal
                Journal of Thoracic Disease (J Thorac Dis; JTD)
                AME Publishing Company
                ISSN: 2072-1439, 2077-6624
                November 2021
                Volume 13, Issue 11, Pages 6240-6251
                Affiliations
                [1] Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China;
                [2] Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China
                Author notes

                Contributions: (I) Conception and design: X Gong, C Chen; (II) Administrative support: None; (III) Provision of study materials or patients: None; (IV) Collection and assembly of data: None; (V) Data analysis and interpretation: X Gong, B Zheng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

                Correspondence to: Chun Chen. Department of Thoracic Surgery, Fujian Medical University Union Hospital, 29 Xinquan Road, Fuzhou 350001, China. Email: chenchun0209@fjmu.edu.cn.
                Article
                Publisher ID: jtd-13-11-6240
                DOI: 10.21037/jtd-21-1107
                PMCID: PMC8662490
                PMID: 34992804
                52500129-6513-4f62-94cd-4df5eec2becb
                2021 Journal of Thoracic Disease. All rights reserved.

                Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0.

                History
                Received: 05 July 2021
                Accepted: 24 September 2021
                Categories
                Original Article

                Keywords: esophageal cancer (EC); survival; machine learning (ML); Surveillance, Epidemiology, and End Results (SEER)
