      Predicting RNA 5-Methylcytosine Sites by Using Essential Sequence Features and Distributions


          Abstract

          Methylation is one of the most common and important modifications in biological systems and is mediated by multiple enzymes. Recent studies have identified methylation in many different RNA molecules. RNA methylation takes various forms, including 5-methylcytosine (m5C). However, the functions of individual methylation sites remain to be elucidated. Testing all methylation sites relies heavily on high-throughput sequencing technology, which is expensive and labor-intensive. Computational prediction approaches could therefore serve as a substitute. In this study, multiple machine learning models were used to predict possible RNA m5C sites from mRNA sequences in human and mouse. Each site was represented by features derived from the k-mers of an RNA subsequence centered on that site. The max-relevance and min-redundancy (mRMR) feature selection method was used to analyze these features. The resulting feature list was fed into the incremental feature selection method, which incorporated four classification algorithms, to build efficient models. Furthermore, the sites related to the features used in the models were also investigated.
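
          The feature construction and incremental feature selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the window length, the values of k, the exact feature definitions, or the four classifiers, so the function names (`site_window`, `kmer_features`, `incremental_feature_selection`), the flank size, the `N` padding, and the `evaluate` callback are all assumptions made for illustration. The mRMR ranking step is not implemented here; the sketch simply takes an already ranked feature list as input.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; assumed alphabet for this sketch


def site_window(seq, pos, flank):
    """Return the subsequence of length 2*flank + 1 centered on seq[pos].

    Positions that fall outside the sequence are padded with 'N'
    (the padding scheme is an assumption, not taken from the paper).
    """
    if seq[pos] != "C":
        raise ValueError("candidate m5C site must be a cytosine")
    left = seq[max(0, pos - flank):pos]
    right = seq[pos + 1:pos + 1 + flank]
    return left.rjust(flank, "N") + "C" + right.ljust(flank, "N")


def kmer_features(window, k):
    """Map every possible k-mer over ACGU to its relative frequency in window.

    k-mers containing the padding symbol 'N' still count toward the total
    number of k-mers but do not get a feature of their own.
    """
    total = len(window) - k + 1
    if total <= 0:
        raise ValueError("window is shorter than k")
    counts = Counter(window[i:i + k] for i in range(total))
    features = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total
    return features


def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-1, top-2, ... prefixes of a ranked feature list.

    `evaluate` is a hypothetical callback that trains and validates a
    classifier on the given feature subset and returns a score
    (higher is better). Returns the best prefix and its score.
    """
    best_subset, best_score = [], float("-inf")
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

          In practice, `evaluate` would wrap cross-validation of one classification algorithm on the selected feature columns, and the loop would be repeated once per algorithm; the choice of score is left open here because the abstract does not specify it.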

                Author and article information

                Journal
                BioMed Research International (Biomed Res Int; BMRI)
                Publisher: Hindawi
                ISSN: 2314-6133, 2314-6141
                Published: 13 January 2022
                Volume 2022, Article ID 4035462
                Affiliations
                1. School of Life Sciences, Shanghai University, Shanghai 200444, China
                2. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
                3. College of Food Engineering, Jilin Engineering Normal University, Changchun, China
                4. Department of Biostatistics, University of Copenhagen, Copenhagen 2099, Denmark
                5. Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
                6. Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
                7. CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
                Author notes

                Academic Editor: Hesham H. Ali

                Author information
                https://orcid.org/0000-0003-3068-1583
                https://orcid.org/0000-0003-3825-0796
                https://orcid.org/0000-0003-1975-9693
                https://orcid.org/0000-0001-5664-7979
                Article
                DOI: 10.1155/2022/4035462
                PMCID: 8776474
                PMID: 35071593
                Record ID: 10febd90-9e25-4e15-aed1-baf7a5c2789d
                Copyright © 2022 Lei Chen et al.

                This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 15 September 2021
                Revised: 7 December 2021
                Accepted: 22 December 2021
                Funding
                Funded by: National Key R&D Program of China
                Award ID: 2018YFC0910403
                Funded by: Chinese Academy of Sciences
                Award ID: 202002
                Award ID: XDB38050200
                Award ID: XDA26040304
                Categories
                Research Article
