
      Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer

      research-article


          Abstract

          Objective:

          Despite existing prognostic markers, breast cancer prognosis remains a difficult subject due to the complex relationships between many contributing factors and survival. This study seeks to integrate multiple clinicopathological and genomic factors with dimensional reduction across machine learning algorithms to compare survival predictions.

          Methods:

          This is a secondary analysis of the data from a prospective cohort study of female patients with breast cancer enrolled in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). We constructed a series of predictive models: ensemble models (Gradient Boosting and Random Forest), support vector machine (SVM), and artificial neural networks (ANN) for 5-year survival based on clinicopathological and gene expression data after K-means clustering with K-nearest-neighbor (KNN) classification. Model performance was evaluated by receiver operating characteristic (ROC) curve, accuracy, and calibration slope (CS). Model stability was assessed over 10 random runs in terms of ROC, accuracy, CS, and variable importance.
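The dimensional-reduction step described above (K-means fitted on training points, then KNN used to assign validation points to the resulting clusters) can be sketched in plain Python. This is a minimal illustration on toy two-dimensional data, not the authors' implementation: the function names, the choice of two clusters, three neighbors, and the toy values are all assumptions made for the example.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns (centroids, labels) for the training points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def knn_predict(train_points, train_labels, query, n_neighbors=3):
    """Majority cluster label among the nearest training points."""
    nearest = sorted(range(len(train_points)),
                     key=lambda i: sq_dist(query, train_points[i]))[:n_neighbors]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


# Toy "expression profiles" (hypothetical values): two well-separated groups.
train = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
validation = [(0.5, 0.5), (10.5, 10.5)]

# Clusters are learned from training data only; validation points are then
# mapped onto those clusters, so no validation information leaks into the fit.
centroids, labels = kmeans(train, k=2)
val_labels = [knn_predict(train, labels, q) for q in validation]
```

The resulting cluster label would then enter the downstream survival models as a single categorical feature in place of the full gene expression profile.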

          Results:

          The analytic cohort is composed of 1874 patients with breast cancer. Overall, the median age was 62 years; the 5-year survival rate was 75%. ROC and accuracy were not significantly different between models (ROC and accuracy around 0.67 and 0.72 across models, respectively). However, ensemble methods resulted in better fit (CS) with stable measures of variable importance across 10 random training/validation splits. K-means clustering of gene expression profiles on training data points along with KNN classification of validation data points was a robust method of dimensional reduction. Furthermore, the gene expression cluster with the highest mortality risk was an influential factor in model prediction.
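The abstract does not state how the calibration slope was computed. A common definition, assumed here, is the slope from a logistic regression of the observed outcome on the logit of the predicted probability: a slope near 1 suggests well-calibrated predictions, and a slope below 1 suggests predictions that are too extreme. The sketch below fits that two-parameter regression with Newton's method; the function name and the toy probabilities are hypothetical.

```python
import math


def calibration_slope(probs, outcomes, n_iter=25):
    """Slope b from the logistic fit: logit P(y = 1) = a + b * logit(p)."""
    x = [math.log(p / (1 - p)) for p in probs]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(n_iter):
        mu = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(y - m for y, m in zip(outcomes, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(outcomes, mu, x))
        # Entries of the 2x2 information matrix X^T W X.
        w = [m * (1 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update: (a, b) += H^{-1} g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b


# Toy data: 2 of 10 events in a low-risk group, 8 of 10 in a high-risk group.
outcomes = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2

# Predictions that match the observed event rates.
calibrated = [0.2] * 10 + [0.8] * 10
# Predictions that are too extreme for the same outcomes.
overconfident = [0.1] * 10 + [0.9] * 10

slope_calibrated = calibration_slope(calibrated, outcomes)
slope_overconfident = calibration_slope(overconfident, outcomes)
```

In this toy setup the calibrated predictions give a slope of 1, while the overconfident predictions give a slope of ln 4 / ln 9, about 0.63.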

          Conclusions:

          Using machine learning methods to construct predictive models for 5-year survival in patients with breast cancer, we demonstrated discrimination ability across models with new insight into the stability and utility of dimensional reduction on genomic features in breast cancer survival prediction.


                Author and article information

                Journal
                Cancer Informatics (Cancer Inform)
                SAGE Publications (Sage UK: London, England)
                ISSN 1176-9351
                09 November 2018
                Volume 17, article 1176935118810215
                Affiliations
                [1] Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
                [2] Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
                Author notes
                [*] Melissa Zhao, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA. Email: mzhao@hsph.harvard.edu
                Author information
                https://orcid.org/0000-0002-5190-3635
                Article
                DOI: 10.1177/1176935118810215
                PMCID: PMC6238199
                PMID: 30455569
                © The Author(s) 2018

                This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

                History
                : 28 September 2018
                : 3 October 2018
                Categories
                Original Research
                Custom metadata
                January-December 2018

                Oncology & Radiotherapy
                breast cancer, survival, machine learning methods, prediction
