      Is Open Access

      An ensemble classification method based on machine learning models for malicious Uniform Resource Locators (URL)

      research-article

          Abstract

          Web applications are important for many online businesses and operations because of their platform stability and low operating cost. The growing use of Internet-of-Things (IoT) devices within networks has contributed to a rise in network intrusions caused by malicious Uniform Resource Locators (URLs). Malicious URLs are typically created to promote scams, attacks, and fraud, which can lead to high-risk intrusions. Previous works have detected malicious URLs with methods such as random forest, regression, and LightGBM. However, most of these works focused on binary classification and were tested on limited URL datasets, so the detection of malicious URLs remains a challenging, open research problem. This work therefore proposes a stacking-based ensemble classifier that performs multi-class classification of malicious URLs on larger URL datasets to demonstrate the robustness of the method. The study obtains lexical features directly from the URL to identify malicious websites. The stacking-based ensemble classifier integrates Random Forest, XGBoost, LightGBM, and CatBoost, and its hyperparameters are tuned with the Randomized Search method. The ensemble aims to take advantage of the performance of each machine learning model and aggregate their outputs to improve prediction accuracy. Applied individually, Random Forest, XGBoost, LightGBM, and CatBoost reach classification accuracies of 93.6%, 95.2%, 95.7%, and 94.8%, respectively. The proposed stacking-based ensemble classifier classifies four URL classes (phishing, malware, defacement, and benign) with an average accuracy of 96.8% when benchmarked against previous works.
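
          The abstract says that lexical features are obtained directly from the URL but does not list them. As a rough illustration only, the sketch below computes a few commonly used lexical features with the Python standard library. The function name `lexical_features` and every feature in it are assumptions made for this example, not the authors' feature set.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute simple lexical features from the URL string alone.

    Hypothetical feature set for illustration; the abstract does not
    state which features the authors actually extracted.
    """
    parts = urlsplit(url)
    if not parts.netloc:
        # Bare URLs such as 'example.com/path' have no recognisable host
        # until they are prefixed with '//', so parse them again.
        parts = urlsplit("//" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in "@-_?=&%" for ch in url),
        # True when the host consists only of digits and dots (IPv4-like).
        "has_ip_host": host.replace(".", "").isdigit(),
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login.php?id=1"))
```

          A feature dictionary like this, computed per URL, could then be turned into a numeric matrix and passed to the base learners of a stacking ensemble. No network request is made; only the URL string is inspected.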

                Author and article information

                Contributors
                Role: Conceptualization, Funding acquisition, Methodology, Writing – review & editing
                Role: Formal analysis, Methodology, Software, Writing – original draft
                Role: Formal analysis, Investigation, Methodology, Validation, Writing – original draft
                Role: Conceptualization, Methodology, Visualization
                Role: Conceptualization, Methodology, Visualization
                Role: Editor
                Journal
                PLOS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                31 May 2024; 19(5): e0302196
                Affiliations
                [1] Department of Computer Science, King Faisal University, Al Ahsa, Kingdom of Saudi Arabia
                [2] Department of Electrical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia
                [3] Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia
                [4] Centre of Intelligent Systems for Emerging Technology (CISET), Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia
                [5] Department of Documents and Archive, Centre of Documents and Administrative Communication, King Faisal University, Al Ahsa, Kingdom of Saudi Arabia
                Universiti Malaysia Sabah, MALAYSIA
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0001-5145-510X
                https://orcid.org/0000-0002-9873-4779
                https://orcid.org/0000-0002-0471-3820
                Article
                PONE-D-23-34415
                10.1371/journal.pone.0302196
                11142511
                38820435
                8463875e-28ac-4aa3-a71a-db4a2a1784d3
                © 2024 Sankaranarayanan et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                : 24 October 2023
                : 30 March 2024
                Page count
                Figures: 7, Tables: 8, Pages: 20
                Funding
                Funded by: Deanship of Scientific Research, King Faisal University (funder ID http://dx.doi.org/10.13039/501100004686); Award ID: 4289
                Funded by: Institut Pengurusan dan Pemantauan Penyelidikan, Universiti Malaya (funder ID http://dx.doi.org/10.13039/501100004781); Award ID: IMG001-2022
                This work was funded by the Deanship of Scientific Research, Vice Presidency of Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (grant number 4289) and by Universiti Malaya, Malaysia (project number IMG001-2022). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Computer and Information Sciences > Computer Security
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees
                Research and Analysis Methods > Decision Analysis > Decision Trees
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees > Decision Tree Learning
                Research and Analysis Methods > Decision Analysis > Decision Trees > Decision Tree Learning
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Decision Tree Learning
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Biology and Life Sciences > Organisms > Eukaryota > Plants > Trees
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Deep Learning
                Physical Sciences > Mathematics > Optimization > Random Searching