
      IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning


            Abstract

A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses; in this work, we focus on predicting mitochondrial and multifactorial genetic disorders. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and a low white blood cell count. Early detection of genetic diseases is crucial for premature and adolescent development. Although it is difficult to forecast genetic disorders ahead of time, this prediction is critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy, utilizing datasets collected and constructed from a large number of patient medical reports. Many studies have recently employed genome sequencing for illness detection, but fewer studies have been presented using patient medical history, and the accuracy of existing studies that use a patient's history is limited. The Internet of Medical Things (IoMT)-based model proposed in this article for genetic disease prediction uses two machine learning algorithms: support vector machine (SVM) and K-nearest neighbor (KNN). Experimental results show that SVM outperformed KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.

            Main article text

            1. Introduction

Genes are the building blocks of heredity and are passed down through the generations. They contain deoxyribonucleic acid (DNA), which carries protein-making instructions. A mutation is a change that occurs in one or more genes. The mutation alters the gene's instructions for making a protein, causing the protein either to malfunction or not to be produced at all. This can lead to a genetic disorder, which is a serious illness. One or both parents can pass a genetic mutation on to their children, and everybody is susceptible to mutation at some point in their lives [1]. Some illnesses are caused by mutations inherited from the parents and present at birth; other disorders are caused by mutations in a gene or a combination of genes that arise at different times in life. A mutation of this type may occur at random or as a result of environmental factors [2].

            1.1. Multifactor Genetic Disorder

These disorders are caused by mutations in numerous genes and are typically the consequence of a complex interplay of environmental and nutritional factors. They are sometimes referred to as complex or polygenic diseases [3]. Cancer, diabetes, and Alzheimer's disease can all be linked to multifactorial genetic conditions.

            1.2. Mitochondrial Genetic Disorder

It is associated with mutations in nonnuclear mitochondrial DNA. Each mitochondrial genome contains 5 to 10 circular DNA segments. During fertilization, only the egg cell retains its organelles, so this condition is always inherited from the mother [3]. Mitochondrial genetic conditions cause mitochondrial encephalopathy, lactic acidosis, stroke-like episodes, and eye damage. “Every year, about 140 million toddlers are born throughout the world, with ten million of these toddlers being born with a severe birth defect of genetic or partially genetic origin, many of which are identified late,” said Linguraru.

The genetic disease prediction challenge was first handled as a two-class classification problem in machine learning research, with a classification model built from true and false training data. Decision trees, KNN, naïve Bayesian classifiers, and binary SVM classifiers were employed [4]. Positive training samples in binary classification systems contain genes associated with known illnesses, whereas negative samples do not. Machine learning technology may be used to detect the presence of a genetic condition using a facial photograph taken at a point of care, such as a pediatric office, maternity ward, or general practitioner clinic, as well as the patient's medical history [5].

            The major contributions of this study are given below:

1. Proposed an IoMT-based machine learning model to predict mitochondrial and multifactorial genetic disorders.

2. The proposed model improves upon previously used machine learning techniques by tuning different simulation parameters.

3. The proposed framework uses unique data preprocessing techniques to enhance the prediction results.

4. The proposed model uses various statistical metrics to check performance and reliability.

            2. Literature Review

The identification of the most likely disease candidate genes is an important issue in biomedical research, and several methodologies have been proposed [6, 7]. Most early techniques, such as ToppGene [8], highlighted candidate genes by ranking them according to morphological or behavioral systems and correlating these ranks with commonly identified illness genes. These schema techniques have the limitation of being unable to find indirect relationships between genes that do not yet share comparable characteristics or activities. Biological network-driven gene prioritization approaches have recently been developed to solve this issue [6, 9–12].

The growing coverage of functional genomic data, where new high-throughput technologies have provided a huge quantity of interaction data among biological components, has driven the development of such network-based approaches alongside application techniques and protein structures. Machine learning algorithms have recently been applied effectively to many important biomedical problems [13, 14], including genetic code explanation [15], genetic analysis categorization [16, 17], deductive reasoning of gene monitoring networks [18], drug target prognosis [19, 20], revelation of epigenetic interactions in disease statistics [21, 22], and pharmacology [23]. Machine learning has been used to predict disease-associated genes [24, 25]. The challenge is typically framed as a classification task in which known genetic disorders and biological data linked with medical history data are used to build a classification model that is then used to predict emerging genetic illnesses. More pragmatic techniques have therefore been developed; in fact, unary classifiers that can be trained from positive data only have been proposed [26]. To combine data from various sources, this research employed a binary support vector machine. Because the remaining collection may contain genes for unknown disorders, semisupervised learning approaches such as semisupervised binary learning techniques [27] and positive and negative learning [28] were proposed. Previous research used machine learning for genome disorder prediction with the help of DNA sequencing data and unary classification. Sequencing-based results are impactful but not efficient for predicting different kinds of genetic disorders accurately and on time. The major drawback of previous research is its reliance on DNA sequencing data: results vary between paternal and maternal genes and ignore most clinical parameters, such as abortion counts. The authors of [29] employed a fine Gaussian SVM on hepatitis C patients using public data and achieved 97.9% accuracy. A previous study [30] used an IoMT architecture empowered with a deep neural network for intrusion detection and achieved a 15% improvement in test results.

In this research, we used different supervised machine learning approaches with patient medical history to predict mitochondrial and multifactorial genetic inheritance disorders. With this approach, the proposed model overcomes the drawbacks of DNA sequencing and achieves the best prediction accuracy. Table 1 shows the limitations of previous studies. It shows that Asif et al. [31] achieved 79% prediction accuracy using RF and SVM on a miRNA feature dataset, limited by handcrafted features and imbalanced data. Alshamlan et al. [32] achieved 81% prediction accuracy with the GBC algorithm on the SRBCT feature dataset, limited by handcrafted features and imbalanced gene sequence data. Khader et al. [33] achieved 80.5% prediction accuracy with BA and SVM on a gene sequence feature dataset, limited by imbalanced gene sequences.

            3. Materials and Methods

            The ability to forecast genetic disorders allows doctors to provide drugs that are helpful to the patient's health, and patients may easily maintain their health before any severe complications arise. We employed machine learning techniques such as SVM and KNN to predict mitochondrial and multifactorial inheritance gene disease in this research. Following the prediction analysis, we highlighted the model with the best accuracy in this study. Figure 1 shows our workflow from dataset selection to prediction.

The proposed model uses IoMT technology to gather data from numerous hospitals with the help of different digital devices, which can vary from hospital to hospital. With the help of IoMT, the data collection process is easy and beneficial for further simulations. The suggested model is unique in that it selects and downloads a novel tagged dataset of genomic abnormalities from Kaggle. This dataset consists of 12,280 instances, 28 independent features, and one dependent feature (output class). In the early phases of this work, the data were preprocessed by normalizing the data, replacing null or missing values using mean-based techniques, and splitting the dataset into two parts: training and testing.

The proposed model uses two machine learning techniques in the training phase, SVM and KNN, trained on 70% of the dataset. The remaining 30% of the data is used for testing. Based on the best accuracy, we selected the best-performing model, as described in the simulation results section. Before presenting the simulation results, we briefly describe the algorithms employed in this work.
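The preprocessing and 70/30 split described above can be illustrated with a brief Python/scikit-learn sketch. The authors do not publish code, so the file name and column names below are hypothetical placeholders; the sketch only mirrors the described steps (mean imputation of missing values, normalization, and a 70/30 train/test split).

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical file name for the Kaggle genomic-disorder dataset [39].
df = pd.read_csv("genomic_disorders.csv")

# "disorder_class" is an assumed name for the dependent output class.
X = df.drop(columns=["disorder_class"])   # 28 independent features
y = df["disorder_class"]                  # dependent feature (output class)

# Replace null/missing values with column means, then normalize the features.
X = SimpleImputer(strategy="mean").fit_transform(X)
X = StandardScaler().fit_transform(X)

# 70% of the data for training, the remaining 30% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
```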

            3.1. Support Vector Machine

The support vector machine algorithm maps the raw data into a feature space and then generates an optimal separating hyperplane that can discriminate between positive and negative examples. We use a two-class SVM approach in this classification and create the training set using molecular sequences and interaction data, as reported in [27]. The positive training data include all known illness genes, whereas the negative training data include genes linked with new diseases and an additional 10% of genomic sequences.

The study [28] also uncovered PID-related genes using a binary class SVM classifier. Sixty-nine binary characteristics of known PID and non-PID genes were combined to produce the classifier. The trained classifier identified 1,442 potential PID genes. In this work, a binary class SVM is trained on 29 features and 70% of the dataset instances.

To represent the characteristics of $y_i$, linear combination coefficients $\beta_i$ may be used to select the vectors of the SVM hyperplane. The hyperplane relation is defined as [34, 35]:

(1) $\sum_{i} \beta_i\, k(y_i, y) = m$,
            where k is the kernel function k(x, y) and m is a constant.

The polynomial kernel function used for the training dataset is as follows [34–36]:

(2) $k(y_i, y_j) = (y_i \cdot y_j)^{d}$,
            where k is the kernel function and y is the instance of features.

The SVM classifier minimizes the objective using soft margins:

(3) $\dfrac{1}{n}\sum_{i=1}^{n}\max\!\left(0,\; 1 - m_i\!\left(z^{T} l_i - b\right)\right) + \beta\,\lVert z \rVert^{2}.$

The soft-margin minimizing classifier is represented by equation (3) above, whereas the hard-margin classifier is governed by $\beta$. Using a constrained optimization problem, the soft-margin equation (3) can be rewritten as follows [37]:

(4) $\text{minimize}\;\; \dfrac{1}{n}\sum_{i=1}^{n}\zeta_i + \beta\,\lVert z \rVert^{2},$
where $i \in \{1,\ldots,n\}$ and $\zeta_i$ is the smallest nonnegative number satisfying $m_i\!\left(z^{T} l_i - b\right) \geq 1 - \zeta_i$.
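As a concrete illustration of this formulation, the following scikit-learn sketch trains a soft-margin SVM with the polynomial kernel of order 3, automatic kernel scale, and standardized features reported in Table 2. It assumes the `X_train`/`X_test` split from the preprocessing sketch above and is not the authors' implementation.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Degree-3 polynomial kernel; gamma="auto" approximates an automatic kernel
# scale, and StandardScaler corresponds to "standardize = true" in Table 2.
svm_model = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, gamma="auto", C=1.0),
)
svm_model.fit(X_train, y_train)

print("SVM training accuracy:", svm_model.score(X_train, y_train))
print("SVM testing accuracy:", svm_model.score(X_test, y_test))
```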

            3.2. K-Nearest Neighbors

KNN is a nonparametric method developed in 1951 by Evelyn Fix and Joseph Hodges and later extended by Thomas Cover [28]. It is used for both classification and regression. In both cases, the input consists of the k nearest training examples in the dataset, and the output depends on whether KNN is used for classification or regression. To improve prediction outcomes, the suggested model employed KNN for prediction and used 70% of the dataset to train the model on the features, varying the number of neighbors k. The statistical formulation of KNN is given as [38]:

(5) $P(Y = r \mid X = x) = \dfrac{1}{k} \sum_{i \in \mathcal{N}_k(x)} I\!\left(y_i = r\right)$,
where $\mathcal{N}_k(x)$ denotes the k nearest neighbors of $x$ and $I(\cdot)$ is the indicator function.

In the KNN classifier, each of the k nearest neighbors is given a weight of 1/k, while the remainder are given a weight of 0. The jth nearest neighbor is assigned weight $w_{nj}$, with [38]:

(6) $\sum_{j=1}^{n} w_{nj} = 1$.
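Analogously, a minimal sketch of the KNN configuration in Table 2 (5 neighbors, exhaustive neighbor search, Minkowski distance, standardized features), again assuming scikit-learn and the earlier train/test split; the parameter names are that library's, not the authors'.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# algorithm="brute" performs the exhaustive neighbor search; metric="minkowski"
# with p=2 is the Euclidean special case of the Minkowski distance.
knn_model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2, algorithm="brute"),
)
knn_model.fit(X_train, y_train)

print("KNN training accuracy:", knn_model.score(X_train, y_train))
print("KNN testing accuracy:", knn_model.score(X_test, y_test))
```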

            4. Dataset

            We used the genome disorder dataset from Kaggle [39]. This dataset contains the medical histories of 12,280 people who have mitochondrial and multifactorial genetic inheritance disorders. There are 28 independent variables and one dependent variable in the genomic disorder dataset. In data preparation, the suggested model uses several missing value strategies to substitute null values.

            5. Simulation Results and Discussion

SVM and KNN machine learning methods were used to train and test the proposed model. These algorithms are evaluated using classification accuracy, misclassification rate, precision, sensitivity, and F1 score. The suggested model's initial stage involves preprocessing the data, replacing missing values, and dividing the data into two phases: training and testing. The suggested model is then trained using the SVM and KNN machine learning methods and evaluated in the testing phase. The simulation results of the proposed model are detailed below in terms of several prediction parameters. In the first phase, the simulation results present the training and testing confusion matrices for both machine learning algorithms; the comparison of their parameters is presented in the second phase.

Table 2 shows the simulation parameters of the proposed SVM and KNN models. The KNN model uses 5 neighbors with the exhaustive neighbor search (NS) method, the Minkowski distance between neighbors, and standardization enabled. In parallel, the SVM uses a polynomial kernel function of order 3 with automatic kernel scale and standardization enabled.

The training confusion matrices of the SVM and KNN algorithms are shown in Table 3. The trained KNN model's confusion matrix yields 6922, 657, 825, and 191 counts of true positives, true negatives, false positives, and false negatives, respectively. The SVM yields 6959, 1205, 277, and 154 counts of true positives, true negatives, false positives, and false negatives. As a result, the suggested model demonstrates that SVM obtains the greatest true positive rate compared with the KNN model.

Table 4 depicts the prediction outcomes of both machine learning algorithms using the suggested model. The testing confusion matrix of the KNN model yields 3023, 115, 469, and 77 counts of true positives, true negatives, false positives, and false negatives, respectively, while the testing confusion matrix of the SVM yields 2931, 262, 322, and 169 counts of true positives, true negatives, false positives, and false negatives.
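Confusion matrices like those in Tables 3 and 4 can be obtained directly from the fitted models; a short sketch, assuming the scikit-learn models defined in the earlier sketches.

```python
from sklearn.metrics import confusion_matrix

# Rows correspond to true classes and columns to predicted classes.
svm_train_cm = confusion_matrix(y_train, svm_model.predict(X_train))
svm_test_cm = confusion_matrix(y_test, svm_model.predict(X_test))
knn_test_cm = confusion_matrix(y_test, knn_model.predict(X_test))

print("SVM training confusion matrix:\n", svm_train_cm)
print("SVM testing confusion matrix:\n", svm_test_cm)
print("KNN testing confusion matrix:\n", knn_test_cm)
```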

As shown in Figure 2, the suggested SVM model reaches its lowest mean squared error of 0.1089 after 24 epochs. This signifies that the suggested model's prediction results are accurate and efficient. Furthermore, this value was improved by varying the simulation hyperparameters and the dataset over numerous iterations.

In Table 5, the accuracy, misclassification rate, sensitivity, precision, and F1 score values are calculated using the formulas mentioned below [37, 40–51].

(7) $\text{Accuracy} = \dfrac{\text{True Classified Instances}}{\text{Total Instances}}, \quad \text{Misclassification rate} = \dfrac{\text{False Classified Instances}}{\text{Total Instances}}, \quad \text{Sensitivity} = \dfrac{TP}{TP + FN}, \quad \text{Precision} = \dfrac{TP}{TP + FP}, \quad \text{F1 score} = \dfrac{2TP}{2TP + FP + FN}.$
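Applying equation (7) to the SVM testing counts reported in Table 4 reproduces the SVM testing column of Table 5; the short numerical check below uses only those published counts.

```python
# SVM testing counts from Table 4: true positives, true negatives,
# false positives, false negatives.
TP, TN, FP, FN = 2931, 262, 322, 169
total = TP + TN + FP + FN                      # 3684 test instances

accuracy = (TP + TN) / total                   # ≈ 0.866
misclassification_rate = (FP + FN) / total     # ≈ 0.134
sensitivity = TP / (TP + FN)                   # ≈ 0.9454
precision = TP / (TP + FP)                     # ≈ 0.9010
f1_score = 2 * TP / (2 * TP + FP + FN)         # ≈ 0.9226

print(accuracy, misclassification_rate, sensitivity, precision, f1_score)
```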

The proposed model outcomes are analyzed using accuracy, misclassification rate, precision, sensitivity, and F1-score. Table 5 presents a comparison of all analytical parameters for the suggested machine learning models. In training, the proposed KNN model achieves an accuracy, misclassification rate, precision, sensitivity, and F1-score of 88.3%, 11.7%, 89.35%, 97.31%, and 93.15%, respectively, while the proposed SVM-based model achieves 94.99% training accuracy and a misclassification rate, precision, sensitivity, and F1-score of 5.01%, 96.17%, 97.83%, and 96.98%, respectively. Thus, SVM obtains the highest training accuracy compared with the KNN model. The suggested model also outperforms state-of-the-art machine learning techniques in terms of prediction outcomes. In testing, the proposed KNN model achieves a prediction accuracy, misclassification rate, precision, sensitivity, and F1-score of 85.1%, 14.9%, 86.56%, 97.51%, and 91.7%, while the proposed SVM model achieves 86.6%, 13.4%, 90.10%, 94.54%, and 92.26%, respectively. Thus, SVM also obtains the highest prediction accuracy compared with the KNN model. Table 6 shows the comparative analysis of previous studies with the proposed model: Asif et al. [31] achieved 79% prediction accuracy using RF and SVM on a miRNA feature dataset, limited by handcrafted features and imbalanced data; Alshamlan et al. [32] achieved 81% prediction accuracy with the GBC algorithm on the SRBCT feature dataset, limited by handcrafted features and imbalanced gene sequence data; Khader et al. [33] achieved 80.5% prediction accuracy with BA and SVM on a gene sequence feature dataset, limited by imbalanced gene sequences. In contrast, the proposed model achieves 86.6% prediction accuracy with SVM using genetic clinical feature data and IoMT technology. The proposed SVM model achieves the best accuracy with the help of different simulation parameters, far better than previously published articles, showing that by varying the simulation parameters a model can obtain the best training and testing results.

            6. Conclusion and Future Work

Smart machine learning plays a critical role in the early detection of genetic disorders. SVM and KNN techniques were employed in this study to predict mitochondrial and multifactorial genetic inheritance disorders. The medical history of a patient provides significant information about a genetic problem, and this information is used by the suggested model to forecast genetic inheritance disorders. SVM has the highest prediction accuracy of 86.6 percent and outperforms genetic sequence methods in terms of prediction performance. Patients and physicians will benefit from this research since it will allow them to predict gene abnormalities quickly and save lives. In the future, we intend to extend this study to multiclass categorization of cancer, dementia, and diabetes, which will be extremely useful in the health care industry.

            Data Availability

The data used in this paper are available from the corresponding author upon request.

            Disclosure

            Atta-ur-Rahman and Muhammad Umar Nasir are the co-first authors.

            Conflicts of Interest

            The authors declare that there are no conflicts of interest.

            References

            1. 2022. https://www.omicsonline.org/genetics/genetic-diseases-review-articles.php

            2. 2022. https://www.imedpub.com/articles/genetic-disorders-a-literature-review.php?aid=28821

            3. Irom B. S.. Genetic disorders: a literature review. Genetic and Molecular Biology Research . 2022. Vol. 4:

            4. Le D. H., Hoai N. X., Kwon Y. K.. A comparative study of classification-based machine learning methods for novel disease gene prediction. Knowledge and Systems Engineering . 2015. Vol. 236:577–588. [Cross Ref] [2-s2.0-84910633544]

            5. 2022. https://healthitanalytics.com/news/machine-learning-tool-detects-genetic-syndromes-in-children

            6. Wang X., Gulbahce N., Yu H.. Network-based methods for human disease gene prediction. Briefings in Functional Genomics . 2011. Vol. 10(5):280–293. [Cross Ref] [2-s2.0-80054991882] [PubMed]

            7. Tranchevent L. C., Capdevila F. B., Nitsch D., De Moor B., De Causmaecker P., Moreau Y.. A guide to web tools to prioritize candidate genes. Briefings in Bioinformatics . 2010. Vol. 12(1):22–32. [Cross Ref] [2-s2.0-79551643743] [PubMed]

            8. Chen J., Xu H., Aronow B. J., Jegga A. G.. Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics . 2007. Vol. 8(1):p. 392[Cross Ref] [2-s2.0-38049136610]

            9. Le D. H., Kwon Y. K.. GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection. Computational Biology and Chemistry . 2012. Vol. 37:17–23. [Cross Ref] [2-s2.0-84858250709] [PubMed]

            10. Le D. H., Kwon Y. K.. Neighbor-favoring weight reinforcement to improve random walk-based disease gene prioritization. Computational Biology and Chemistry . 2013. Vol. 44:1–8. [Cross Ref] [2-s2.0-84874736224] [PubMed]

            11. Le D. H., Dang V. T.. Ontology-based disease similarity network for disease gene prediction. Vietnam Journal of Computer Science . 2016. Vol. 3(3):197–205. [Cross Ref]

            12. Le D. H., Pham V. H.. HGPEC: a Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network. BMC Systems Biology . 2017. Vol. 11(1):p. 61[Cross Ref] [2-s2.0-85027581838]

13. Yousef A., Moghadam Charkari N.. A novel method based on new adaptive LVQ neural network for predicting protein-protein interactions from protein sequences. Journal of Theoretical Biology . 2013. Vol. 336:231–239. [Cross Ref] [2-s2.0-84884590545] [PubMed]

            14. Li Y., Wu F. X., Ngom A.. A review on machine learning principles for multi-view biological data integration. Briefings in Bioinformatics . 2018. Vol. 19(2):113–340. [Cross Ref] [2-s2.0-85049474548]

            15. Yip K. Y., Cheng C., Gerstein M.. Machine learning and genome annotation: a match meant to be? Genome Biology . 2013. Vol. 14(5):p. 205[Cross Ref] [2-s2.0-84883746688]

            16. Basford K. E., McLachlan G. J., Rathnayake S. I.. On the classification of microarray gene-expression data. Briefings in Bioinformatics . 2013. Vol. 14(4):402–410. [Cross Ref] [2-s2.0-84889059224] [PubMed]

            17. Le D. H., Van N. T.. Meta-analysis of whole-transcriptome data for prediction of novel genes associated with an autism spectrum disorder. In: Proceedings of the 2017 8th International Conference on Computational Systems-Biology and Bioinformatics; December 2017; Nha Trang, Vietnam. p. 56–61. [Cross Ref] [2-s2.0-85040796151]

            18. Maetschke S. R., Madhamshettiwar P. B., Davis M. J., Ragan M. A.. Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Briefings in Bioinformatics . 2013. Vol. 15(2):195–211. [Cross Ref] [2-s2.0-84892376552] [PubMed]

            19. Ding H., Takigawa I., Mamitsuka H., Zhu S.. Similarity-based machine learning methods for predicting drug-target interactions: a brief review. Briefings in Bioinformatics . 2013. Vol. 15(5):734–747. [Cross Ref] [2-s2.0-84928196309] [PubMed]

            20. Le D. H., Nguyen D. P., Dao A. M.. Significant path selection improves the prediction of novel drug-target interactions. In: Proceedings of the Systems pharmacology approaches for prediction of drug-target interactions; December2016; Jeddah, Saudi Arabia. SoICT. p. 30–35. [Cross Ref] [2-s2.0-85007560373]

            21. Upstill-Goddard R., Eccles D., Fliege J., Collins A.. Machine learning approaches for the discovery of gene-gene interactions in disease data. Briefings in Bioinformatics . 2012. Vol. 14(2):251–260. [Cross Ref] [2-s2.0-84875632541] [PubMed]

            22. Okser S., Pahikkala T., Aittokallio T.. Genetic variants and their interactions in disease risk prediction-machine learning and network perspectives. BioData Mining . 2013. Vol. 6(1):p. 5[Cross Ref] [2-s2.0-84874397060]

            23. Chen H., Engkvist O., Wang Y., Olivecrona M., Blaschke T.. The rise of deep learning in drug discovery. Drug Discovery Today . 2018. Vol. 23(6):1241–1250. [Cross Ref] [2-s2.0-85044626626] [PubMed]

            24. Le D. H., Xuan Hoai N. H., Kwon Y. K.. A comparative study of classification-based machine learning methods for novel disease gene prediction. Advances in Intelligent Systems and Computing . 2015. Vol. 50:577–588. [Cross Ref] [2-s2.0-84910633544]

            25. Le D. H., Nguyen M. H.. Towards more realistic machine learning techniques for the prediction of disease-associated genes. In: Proceedings of the Sixth International Symposium on Information and Communication Technology; December2015; Hue, Vietnam. p. 116–120. [Cross Ref] [2-s2.0-84959314500]

            26. Yu S., Tranchevent L. C., De Moor B., Moreau Y.. Gene prioritization and clustering by multi-view text mining. BMC Bioinformatics . 2010. Vol. 11(1):p. 28[Cross Ref] [2-s2.0-77957714934]

            27. Nguyen T. P., Ho T. B.. Detecting disease genes based on semi-supervised learning and protein-protein interaction networks. Artificial Intelligence in Medicine . 2012. Vol. 54(1):63–71. [Cross Ref] [2-s2.0-83555177301] [PubMed]

28. Altman N. S.. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician . 1992. Vol. 46(3):p. 175[Cross Ref]

            29. Ghazal T. M., Anam M., Kamrul Hasan M., et al.. Hep-pred: hepatitis c staging prediction using fine Gaussian svm. Computers, Materials & Continua . 2021. Vol. 69(1):191–203. [Cross Ref]

            30. Swrna Priya R. M., Maddikunta P. K. R., Parimala M., et al.. An effective feature engineering for DNN using hybrid PCA-GWO for intrusion detection in IoMT architecture. Computer Communications . 2020. Vol. 160(2020):139–149. [Cross Ref]

            31. Asif M., Hugo F. M., Vicente A. M., Couto F. M.. Identifying disease gene using machine learning and gene functional similarities assessed through gene ontology. PLoS One . 2018. Vol. 13(12)[Cross Ref] [2-s2.0-85058234561]

            32. Alshamlan H. M., Badr G. H., Alohali Y. A.. Genetic Bee Colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Computational Biology and Chemistry . 2015. Vol. 56:49–60. [Cross Ref] [2-s2.0-84983095712] [PubMed]

33. Khader A. T., Alomari O. A., Al Betar M. A., Abualigah L. M.. Gene selection for cancer classification by combining minimum redundancy maximum relevancy and bat-inspired algorithm. International Journal of Data Mining and Bioinformatics . 2017. Vol. 19(1):p. 32[Cross Ref]

            34. 2022. https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/

            35. Javed A. R, Fahad L. G., Farhan A. A., et al.. Automated cognitive health assessment in smart homes using machine learning. Sustainable Cities and Society . 2021. Vol. 65:[102572] [Cross Ref]

            36. Rehman A., Razzak I., Xu G.. Federated learning for privacy preservation of healthcare data from smartphone-based side-channel attacks. IEEE Journal of Biomedical and Health Informatics . 2022. p. 1 [Cross Ref]

            37. Javed A. R., Sarwar M. U., Beg M. O., Asim M., Baker T., Tawfik H.. A collaborative healthcare framework for shared healthcare plan with ambient intelligence. Human-centric Computing and Information Sciences . 2020. Vol. 10(1):p. 40[Cross Ref]

            38. 2022. https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

            39. 2022. https://www.kaggle.com/datasets/aryarishabh/of-genomes-and-genetics-hackerearth-ml-challenge/code

            40. Rahman A.-U., Abbas S., Gollapalli M., et al.. Rainfall prediction system using machine learning fusion for smart cities. Sensors . 2022. Vol. 22(9):p. 3504 [Cross Ref] [PubMed]

            41. Saleem M., Abbas S., Ghazal T. M., Adnan KhanAdnan Khan M., Sahawneh N., Ahmad M.. Smart cities: fusion-based intelligent traffic congestion control system for vehicular networks using machine learning techniques. Egyptian Informatics Journal . 2022. 1–10. [Cross Ref]

42. Waqas Nadeem M., Guan Goh H., Adnan Khan M., Hussain M., Faheem Mushtaq M., a/p Ponnusamy V.. Fusion-based machine learning architecture for heart disease prediction. Computers, Materials & Continua . 2021. Vol. 67(2):2481–2496. [Cross Ref]

            43. Siddiqui S. Y., Athar A., Khan M. A., et al.. Modelling, simulation and optimization of diagnosis cardiovascular disease using computational intelligence approaches. Journal of Medical Imaging and Health Informatics . 2020. Vol. 10(5):1005–1022. [Cross Ref]

            44. Siddiqui S. Y., Haider A., Ghazal T. M., et al.. IoMT cloud-based intelligent prediction of breast cancer stages empowered with deep learning. IEEE Access . 2021. Vol. 9:146478–146491. [Cross Ref]

45. Amanlou S., Hasan M. K., Bakar K. A. A.. Lightweight and secure authentication scheme for IoT network based on publish-subscribe fog computing model. Computer Networks . 2021. Vol. 199:[108465] [Cross Ref]

            46. Javed A. R., Faheem R., Asim M., Baker T., Beg M. O.. A smartphone sensors-based personalized human activity recognition system for sustainable smart cities. Sustainable Cities and Society . 2021. Vol. 71:[102970] [Cross Ref]

            47. Nasir M. U., Ghazal T. M., Khan M. A., et al.. Breast cancer prediction empowered with fine-tuning. Computational Intelligence and Neuroscience . 2022. Vol. 2022:1–9. [5918686] [Cross Ref]

48. Umar Nasir M., Adnan Khan M., Zubair M., Ghazal T. M., Said R. A., Al Hamadi H.. Single and mitochondrial gene inheritance disorder prediction using machine learning. Computers, Materials & Continua . 2022. Vol. 73(1):953–963. [Cross Ref]

            49. Rahman A.-u., Alqahtani A., Aldhafferi N., et al.. Histopathologic oral cancer prediction using oral squamous cell carcinoma biopsy empowered with transfer learning. Sensors . 2022. Vol. 22(10):p. 3833[Cross Ref] [PubMed]

50. Ghazal T. M., Al Hamadi H., Umar Nasir M., et al.. Supervised machine learning empowered multifactorial genetic inheritance disorder prediction. Computational Intelligence and Neuroscience . 2022. Vol. 2022:1–10. [Cross Ref]

            51. Taleb N., Mehmood S., Zubair M., Naseer I., Mago B., Nasir M. U.. Ovary cancer diagnosing empowered with machine learning. In: Proceedings of the International Conference on Business Analytics for Technology and Security (ICBATS); February 2022; Dubai, United Arab Emirates. IEEE. p. 1–6. [Cross Ref]

            Floating objects

            Figure 1

            IoMT-based proposed model for the prediction of genetic disorder.

            Figure 2

            Mean square error of support vector machine.

            Table 1

            Constraints and comparisons of previous studies.

Study | Model | Used dataset | Accuracy (%) | Constraint | IoMT
Asif et al. [31] | RF, SVM | miRNA (feature) | 79 | Handcrafted features, imbalanced data | No
Alshamlan et al. [32] | GBC algorithm | SRBCT (feature) | 81 | Handcrafted features, imbalanced classes, imbalanced gene sequence | No
Khader et al. [33] | BA, SVM | Gene seq (feature) | 80.5 | Imbalanced gene classes | No
            Table 2

            Simulation parameters of the proposed model of KNN and SVM.

Algorithm | Neighbors | NS method | Distance | Standardize
KNN | 5 | Exhaustive | Minkowski | True

Algorithm | Kernel function | Polynomial order | Kernel scale | Standardize
SVM | Polynomial | 3 | Auto | True
            Table 3

            Training confusion metrics of the proposed model of KNN and SVM.

Total instances (8595) | Predicted class 1 | Predicted class 2

KNN
 Class 1 | 6922 | 191
 Class 2 | 825 | 657

SVM
 Class 1 | 6959 | 154
 Class 2 | 277 | 1205
            Table 4

            Testing confusion metrics of the proposed model of KNN and SVM.

Total instances (3684) | Predicted class 1 | Predicted class 2

SVM
 Class 1 | 2931 | 169
 Class 2 | 322 | 262

KNN
 Class 1 | 3023 | 77
 Class 2 | 469 | 115
            Table 5

            Performance of SVM and KNN models.

Instances (12280) | SVM training (%) (8596 instances) | SVM testing (%) (3684 instances) | KNN training (%) (8596 instances) | KNN testing (%) (3684 instances)
Accuracy | 94.99 | 86.6 | 88.3 | 85.1
Misclassification rate | 5.01 | 13.4 | 11.7 | 14.9
Precision | 96.17 | 90.10 | 89.35 | 86.56
Sensitivity | 97.83 | 94.54 | 97.31 | 97.51
F1-score | 96.98 | 92.26 | 93.15 | 91.7
            Table 6

            Comparative analysis with previous studies.

Study | Model | Dataset | Accuracy (%) | IoMT
Asif et al. [31] | RF, SVM | miRNA (feature) | 79 | No
Alshamlan et al. [32] | GBC algorithm | SRBCT (feature) | 81 | No
Khader et al. [33] | BA, SVM | Gene seq (feature) | 80.5 | No
The proposed model | SVM, KNN | Gene clinical (feature) | 86.6 | Yes

            Author and article information

            Contributors
            Journal
            Comput Intell Neurosci
            Comput Intell Neurosci
            cin
            Computational Intelligence and Neuroscience
            Hindawi
            1687-5265
            1687-5273
            2022
            21 July 2022
            : 2022
            : 2650742
            Affiliations
            1Department of Computer Science (CS), College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
            2Riphah School of Computing and Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore 54000, Pakistan
            3Department of Computer Information Systems (CIS), College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
            4Department of Computer, Deanship of Preparatory Year and Supporting Studies, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
            5College of Computer and Information Sciences (CCIS), Jouf University, Saudi Arabia
            6Department of Software, Gachon University, Seongnam 13120, Republic of Korea
            7John von Neumann Faculty of Informatics, Obuda University, Budapest 1034, Hungary
            8Institute of Information Engineering, Automation and Mathematics, The Slovak University of Technology in Bratislava, Bratislava 81107, Slovakia
            9Faculty of Civil Engineering, TU-Dresden, Dresden 01062, Germany
            Author notes

            Academic Editor: Laxmi Lydia

            Author information
            https://orcid.org/0000-0003-1443-8065
            https://orcid.org/0000-0002-8665-1669
            https://orcid.org/0000-0001-9789-5231
            https://orcid.org/0000-0003-4842-0613
            Article
            10.1155/2022/2650742
            9334098
            35909844
            Copyright © 2022 Atta-ur Rahman et al.

            This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

            History
            : 15 May 2022
            : 4 July 2022
            Categories
            Research Article

            Neurosciences
