16
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: not found
      • Article: not found

      iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.

          Related collections

          Most cited references40

          • Record: found
          • Abstract: found
          • Article: not found

          Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms.

          Information on subcellular localization of proteins is important to molecular cell biology, proteomics, system biology and drug discovery. To provide the vast majority of experimental scientists with a user-friendly tool in these areas, we present a package of Web servers developed recently by hybridizing the 'higher level' approach with the ab initio approach. The package is called Cell-PLoc and contains the following six predictors: Euk-mPLoc, Hum-mPLoc, Plant-PLoc, Gpos-PLoc, Gneg-PLoc and Virus-PLoc, specialized for eukaryotic, human, plant, Gram-positive bacterial, Gram-negative bacterial and viral proteins, respectively. Using these Web servers, one can easily get the desired prediction results with a high expected accuracy, as demonstrated by a series of cross-validation tests on the benchmark data sets that covered up to 22 subcellular location sites and in which none of the proteins included had > or =25% sequence identity to any other protein in the same subcellular-location subset. Some of these Web servers can be particularly used to deal with multiplex proteins as well, which may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic features of this kind are particularly interesting, because they may have some special biological functions intriguing to investigators in both basic research and drug discovery. This protocol is a step-by-step guide on how to use the Web-server predictors in the Cell-PLoc package. The computational time for each prediction is less than 5 s in most cases. The Cell-PLoc package is freely accessible at http://chou.med.harvard.edu/bioinf/Cell-PLoc.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Machine learning in bioinformatics.

            This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mining are also shown.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Recent progress in protein subcellular location prediction.

                Bookmark

                Author and article information

                Journal
                Briefings in Bioinformatics
                Oxford University Press (OUP)
                1477-4054
                April 24 2019
                April 24 2019
                Affiliations
                [1 ]School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
                [2 ]State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, China
                [3 ]Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia
                [4 ]Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
                [5 ]Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
                [6 ]Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
                [7 ]Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
                [8 ]Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
                [9 ]Gordon Life Science Institute, Boston, MA, USA
                [10 ]Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
                [11 ]ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
                Article
                10.1093/bib/bbz041
                31067315
                1c60511a-9029-4d63-95d3-81b572056489
                © 2019

                https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model

                History

                Comments

                Comment on this article