      Is Open Access

      Machine learning applications in cancer prognosis and prediction

      review-article
      Computational and Structural Biotechnology Journal
      Research Network of Computational and Structural Biotechnology
      Abbreviations: ML, Machine Learning; ANN, Artificial Neural Network; SVM, Support Vector Machine; DT, Decision Tree; BN, Bayesian Network; SSL, Semi-supervised Learning; TCGA, The Cancer Genome Atlas Research Network; HTT, High-throughput Technologies; OSCC, Oral Squamous Cell Carcinoma; CFS, Correlation based Feature Selection; AUC, Area Under Curve; ROC, Receiver Operating Characteristic; BCRSVM, Breast Cancer Support Vector Machine; PPI, Protein–Protein Interaction; GEO, Gene Expression Omnibus; LCS, Learning Classifying Systems; ES, Early Stopping algorithm; SEER, Surveillance, Epidemiology and End results Database; NSCLC, Non-small Cell Lung Cancer; NCI caArray, National Cancer Institute Array Data Management System
      Keywords: Machine learning; Cancer susceptibility; Predictive models; Cancer recurrence; Cancer survival

          Abstract

          Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. Early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features in complex datasets underlines their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered in everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present here the most recent publications that employ these techniques to model cancer risk or patient outcomes.

          Most cited references (50)

          Prediction of cancer outcome with microarrays: a multiple random validation strategy

          General studies of microarray gene-expression profiling have been undertaken to predict cancer outcome. Knowledge of this gene-expression profile or molecular signature should improve treatment of patients by allowing treatment to be tailored to the severity of the disease. We reanalysed data from the seven largest published studies that have attempted to predict prognosis of cancer patients on the basis of DNA microarray analysis. The standard strategy is to identify a molecular signature (ie, the subset of genes most differentially expressed in patients with different outcomes) in a training set of patients and to estimate the proportion of misclassifications with this signature on an independent validation set of patients. We expanded this strategy (based on unique training and validation sets) by using multiple random sets, to study the stability of the molecular signature and the proportion of misclassifications. The list of genes identified as predictors of prognosis was highly unstable; molecular signatures strongly depended on the selection of patients in the training sets. For all but one study, the proportion misclassified decreased as the number of patients in the training set increased. Because of inadequate validation, our chosen studies published overoptimistic results compared with those from our own analyses. Five of the seven studies did not classify patients better than chance. The prognostic value of published microarray results in cancer studies should be considered with caution. We advocate the use of validation by repeated random sampling.
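The repeated random sampling strategy described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' analysis: the synthetic cohort, the mean-difference gene selection, the nearest-centroid classifier, the stratified split, and all function names and parameters are assumptions chosen to keep the example self-contained. The point it shows is that the signature is re-selected on each random training set and the misclassification proportion is measured on the held-out patients, so the spread across splits becomes visible.

```python
import random
import statistics

def make_synthetic_cohort(n_patients, n_genes, n_informative, shift, rng):
    """Hypothetical data: each patient is (expression_vector, outcome in {0, 1})."""
    cohort = []
    for i in range(n_patients):
        outcome = i % 2  # balanced outcomes (assumption for the sketch)
        expr = [rng.gauss(shift * outcome if g < n_informative else 0.0, 1.0)
                for g in range(n_genes)]
        cohort.append((expr, outcome))
    return cohort

def stratified_split(cohort, train_fraction, rng):
    """Random split that keeps both outcome classes in the training set."""
    train, valid = [], []
    for label in (0, 1):
        rows = [p for p in cohort if p[1] == label]
        rng.shuffle(rows)
        cut = int(len(rows) * train_fraction)
        train.extend(rows[:cut])
        valid.extend(rows[cut:])
    return train, valid

def select_signature(train, k):
    """Pick the k genes whose class means differ most in the training set."""
    def mean_diff(g):
        good = [x[g] for x, y in train if y == 0]
        poor = [x[g] for x, y in train if y == 1]
        return abs(statistics.fmean(good) - statistics.fmean(poor))
    return sorted(range(len(train[0][0])), key=mean_diff, reverse=True)[:k]

def nearest_centroid_error(train, valid, genes):
    """Proportion of validation patients misclassified by a nearest-centroid rule."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(r[g] for r in rows) for g in genes]
    errors = 0
    for x, y in valid:
        dist = {label: sum((x[g] - c) ** 2 for g, c in zip(genes, cen))
                for label, cen in centroids.items()}
        errors += min(dist, key=dist.get) != y
    return errors / len(valid)

def repeated_random_validation(cohort, n_repeats=50, train_fraction=0.5, k=10, seed=0):
    """Re-select the signature and re-estimate the error on many random splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_repeats):
        train, valid = stratified_split(cohort, train_fraction, rng)
        genes = select_signature(train, k)  # selection uses training patients only
        rates.append(nearest_centroid_error(train, valid, genes))
    return rates

cohort = make_synthetic_cohort(60, 200, 10, 1.0, random.Random(1))
rates = repeated_random_validation(cohort)
print(f"misclassified: mean {statistics.fmean(rates):.2f}, "
      f"min {min(rates):.2f}, max {max(rates):.2f}")
```

Reporting the minimum and maximum alongside the mean is the relevant design choice here: a single training/validation split yields only one of these values, which may happen to be optimistic.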

            Prognostic value of a combined estrogen receptor, progesterone receptor, Ki-67, and human epidermal growth factor receptor 2 immunohistochemical score and comparison with the Genomic Health recurrence score in early breast cancer.

            We recently reported that the mRNA-based, 21-gene Genomic Health recurrence score (GHI-RS) provided additional prognostic information regarding distant recurrence beyond that obtained from classical clinicopathologic factors (age, nodal status, tumor size, grade, endocrine treatment) in women with early breast cancer, confirming earlier reports. The aim of this article is to determine how much of this information is contained in standard immunohistochemical (IHC) markers. The primary cohort comprised 1,125 estrogen receptor-positive (ER-positive) patients from the Arimidex, Tamoxifen, Alone or in Combination (ATAC) trial who did not receive adjuvant chemotherapy, had the GHI-RS computed, and had adequate tissue for the four IHC measurements: ER, progesterone receptor (PgR), human epidermal growth factor receptor 2 (HER2), and Ki-67. Distant recurrence was the primary end point, and proportional hazards models were used with sample splitting to control for overfitting. A prognostic model that used classical variables and the four IHC markers (IHC4 score) was created and assessed in a separate cohort of 786 patients. All four IHC markers provided independent prognostic information in the presence of classical variables. In sample-splitting analyses, the information in the IHC4 score was found to be similar to that in the GHI-RS, and little additional prognostic value was seen in the combined use of both scores. The prognostic value of the IHC4 score was further validated in the second separate cohort. This study suggests that the amount of prognostic information contained in four widely performed IHC assays is similar to that in the GHI-RS. Additional studies are needed to determine the general applicability of the IHC4 score.

              Outcome signature genes in breast cancer: is there a unique set?

              Predicting the metastatic potential of primary malignant tissues has direct bearing on the choice of therapy. Several microarray studies yielded gene sets whose expression profiles successfully predicted survival. Nevertheless, the overlap between these gene sets is almost zero. Such small overlaps have also been observed in other complex diseases, and the variables that could account for the differences have evoked wide interest. One of the main open questions in this context is whether the disparity can be attributed only to trivial reasons such as different technologies, different patients and different types of analyses. To answer this question, we concentrated on a single breast cancer dataset, and analyzed it by a single method, the one which was used by van't Veer et al. to produce a set of outcome-predictive genes. We showed that, in fact, the resulting set of genes is not unique; it is strongly influenced by the subset of patients used for gene selection. Many equally predictive lists could have been produced from the same analysis. Three main properties of the data explain this sensitivity: (1) many genes are correlated with survival; (2) the differences between these correlations are small; (3) the correlations fluctuate strongly when measured over different subsets of patients. A possible biological explanation for these properties is discussed. Contact: eytan.domany@weizmann.ac.il. Supplementary information: http://www.weizmann.ac.il/physics/complex/compphys/downloads/liate/
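The sensitivity of a gene list to the choice of patients can be illustrated with a small simulation. This is a hedged sketch on synthetic data, not the analysis from the paper: the data generator, the ranking by absolute Pearson correlation, and every name and parameter below are assumptions. It mimics the three properties listed in the abstract (many genes weakly correlated with survival, small differences between correlations, strong fluctuation across subsets) and measures how much the top-k lists from different random patient subsets overlap.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def top_genes(expr, survival, patients, k):
    """Indices of the k genes most correlated (in absolute value) with survival."""
    surv = [survival[p] for p in patients]
    scores = []
    for g, row in enumerate(expr):
        vals = [row[p] for p in patients]
        scores.append((abs(pearson(vals, surv)), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}

def signature_overlaps(expr, survival, subset_size, k, n_subsets, seed=0):
    """Fraction of shared genes for every pair of top-k lists from random subsets."""
    rng = random.Random(seed)
    n_patients = len(survival)
    signatures = [
        top_genes(expr, survival, rng.sample(range(n_patients), subset_size), k)
        for _ in range(n_subsets)
    ]
    overlaps = []
    for i in range(n_subsets):
        for j in range(i + 1, n_subsets):
            overlaps.append(len(signatures[i] & signatures[j]) / k)
    return overlaps

# Synthetic data (assumption): every gene carries the same weak survival signal.
data_rng = random.Random(42)
n_patients, n_genes = 80, 300
survival = [data_rng.gauss(0, 1) for _ in range(n_patients)]
expr = [[0.3 * s + data_rng.gauss(0, 1) for s in survival] for _ in range(n_genes)]

overlaps = signature_overlaps(expr, survival, subset_size=40, k=20, n_subsets=10)
print(f"mean pairwise overlap of top-20 lists: {statistics.fmean(overlaps):.2f}")
```

Because all simulated genes are equally informative, any difference between the selected lists comes from sampling fluctuation alone, which is the situation the abstract describes.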

                Author and article information

                Journal
                Computational and Structural Biotechnology Journal (Comput Struct Biotechnol J)
                Research Network of Computational and Structural Biotechnology
                ISSN: 2001-0370
                Published: 15 November 2014
                Volume 13 (2015), pages 8-17
                Affiliations
                [a] Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
                [b] IMBB — FORTH, Dept. of Biomedical Research, Ioannina, Greece
                [c] Molecular Oncology Unit, Department of Biological Chemistry, Medical School, University of Athens, Athens, Greece
                Author notes
                [*] Corresponding author at: Unit of Medical Technology and Intelligent Information Systems, Dept. of Materials Science and Engineering, University of Ioannina, Ioannina, Greece. fotiadis@cc.uoi.gr
                Article
                PII: S2001-0370(14)00046-4
                DOI: 10.1016/j.csbj.2014.11.005
                PMCID: PMC4348437
                PMID: 25750696
                29cf76bc-40f3-4e9a-bda2-64d0b096deb6
                © 2014 Kourou et al. Published by Elsevier B.V. on behalf of the Research Network of Computational and Structural Biotechnology.

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).

                Categories
                Mini Review
