      Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines.

          Abstract

          This study proposes a novel approach for predicting human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: a preprocessor and a predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is used to increase the number of minority-class samples, thereby balancing the dataset. In the predictor stage, the K-nearest neighbor (KNN) and support vector machine (SVM) learning approaches are used to develop hybrid MTD-KNN and MTD-SVM prediction models. The MTD-SVM model provided the best values of accuracy, G-mean and Matthews correlation coefficient of 96.71%, 96.70% and 71.98% for the cancer/non-cancer, breast/non-breast cancer and colon/non-colon cancer datasets, respectively. We found that the hybrid MTD-SVM model is the best with respect to both prediction performance and computational cost. The MTD-KNN model achieved moderately better prediction than the hybrid MTD-NB (Naïve Bayes) model, but at a higher computational cost. The MTD-KNN model is faster than the MTD-RF (random forest) model, but its prediction performance is not better. To the best of our knowledge, these are the best results reported so far for these datasets. The results indicate that the developed models can be used as a tool for cancer prediction. The scheme may also be useful for studying other sequential information, such as protein or nucleic acid sequences.
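
The abstract reports G-mean and the Matthews correlation coefficient alongside accuracy, which is common practice for imbalanced data. The Python sketch below shows why, using the standard textbook definitions of the three measures. The function names and the toy labels are assumptions made for this illustration; they are not taken from the article, and the code is not the authors' evaluation procedure.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced toy set: 5 positive samples, 95 negative samples.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), g_mean(*counts), mcc(*counts))  # prints: 0.95 0.0 0.0
```

On this toy set the majority-class predictor reaches 95% accuracy while G-mean and MCC are both zero, since it never identifies a minority-class sample. This is the kind of failure that balancing the dataset before training is meant to avoid.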

          Author and article information

          Journal
          Computer Methods and Programs in Biomedicine (Comput Methods Programs Biomed)
          ISSN: 1872-7565, 0169-2607
          Date: Mar 2014
          Volume: 113
          Issue: 3
          Affiliations
          [1] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: abdulmajiid@pieas.edu.pk.
          [2] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: safdarali@pieas.edu.pk.
          [3] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: mubashar@pieas.edu.pk.
          [4] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: nabeela.kausar@pieas.edu.pk.
          Article
          PII: S0169-2607(14)00002-9
          DOI: 10.1016/j.cmpb.2014.01.001
          PMID: 24472367
          a09a365b-1e22-4037-805f-4b7bdfc8f2e1
          Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
          Keywords: Breast/colon cancer, K-nearest neighbor, Mega-trend diffusion, Naïve Bayes, Random forest, Support vector machines
