Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines.

Abstract

This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: a preprocessor and a predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is used to generate additional samples of the minority class, thereby balancing the dataset. In the predictor stage, the K-nearest neighbor (KNN) and support vector machine (SVM) learning approaches are used to develop hybrid MTD-SVM and MTD-KNN prediction models. The MTD-SVM model provided the best values of accuracy, G-mean and Matthews correlation coefficient, 96.71%, 96.70% and 71.98%, for the cancer/non-cancer, breast/non-breast cancer and colon/non-colon cancer datasets, respectively. We found that the hybrid MTD-SVM model is the best with respect to both prediction performance and computational cost. The MTD-KNN model achieved moderately better prediction than the hybrid MTD-NB (Naïve Bayes) model, but at a higher computational cost. The MTD-KNN model is faster than MTD-RF (random forest), but its prediction performance is no better than that of MTD-RF. To the best of our knowledge, the reported results are the best obtained so far for these datasets. The results indicate that the developed models can be used as a tool for the prediction of cancer. The scheme may also be useful for the study of other sequential information, such as protein or nucleic acid sequences.
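The three measures reported in the abstract (accuracy, G-mean and the Matthews correlation coefficient) can all be computed from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only to illustrate an imbalanced case.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, G-mean, MCC) for a 2x2 confusion matrix.

    tp/fn count the positive (e.g. cancer) class, fp/tn the negative class.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total

    # Sensitivity and specificity are the per-class recall values.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # G-mean is the geometric mean of the two per-class recalls,
    # so it drops sharply when either class is poorly predicted.
    g_mean = math.sqrt(sensitivity * specificity)

    # MCC uses all four cells; by convention it is set to 0 when
    # any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, g_mean, mcc


# Hypothetical imbalanced example: 10 positives, 90 negatives,
# with half of the positives missed by the classifier.
acc, g_mean, mcc = binary_metrics(tp=5, fn=5, fp=0, tn=90)
print(f"accuracy={acc:.4f}  G-mean={g_mean:.4f}  MCC={mcc:.4f}")
```

In this hypothetical case accuracy stays high because the majority class dominates, while G-mean and MCC are noticeably lower. This is the usual reason for reporting them alongside accuracy on imbalanced data.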

Author and article information

Journal

Journal ID (iso-abbrev): Comput Methods Programs Biomed

Title: Computer methods and programs in biomedicine

ISSN (Electronic): 1872-7565

ISSN (Print): 0169-2607

Publication date (Electronic): Mar 2014

Volume: 113

Issue: 3

Affiliations

[1] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: abdulmajiid@pieas.edu.pk.

[2] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: safdarali@pieas.edu.pk.

[3] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: mubashar@pieas.edu.pk.

[4] Department of Computer & Information Sciences, Pakistan Institute of Engineering & Applied Sciences, Nilore, 45650 Islamabad, Pakistan. Electronic address: nabeela.kausar@pieas.edu.pk.

Article

Publisher Item ID: S0169-2607(14)00002-9

DOI: 10.1016/j.cmpb.2014.01.001

PubMed ID: 24472367

SO-VID: a09a365b-1e22-4037-805f-4b7bdfc8f2e1

Keywords: Breast/colon cancer, K-nearest neighbor, Mega-trend diffusion, Naïve Bayes, Random forest, Support vector machines
