      Is Open Access

      TransposonUltimate: software for transposon classification, annotation and detection

      research-article
      Nucleic Acids Research
      Oxford University Press


          Abstract

          Most genomes harbor a large number of transposons, and they play an important role in evolution and gene regulation. They are also of interest to clinicians as they are involved in several diseases, including cancer and neurodegeneration. Although several methods for transposon identification are available, they are often highly specialised towards specific tasks or classes of transposons, and they lack common standards such as a unified taxonomy scheme and output file format. We present TransposonUltimate, a powerful bundle of three modules for transposon classification, annotation, and detection of transposition events. TransposonUltimate comes as a Conda package under the GPL-3.0 licence, is well documented and is easy to install through https://github.com/DerKevinRiehl/TransposonUltimate. We benchmark the classification module on the large TransposonDB covering 891,051 sequences to demonstrate that it outperforms the currently best existing solutions. The annotation and detection modules combine sixteen existing software tools, and we illustrate their use by annotating Caenorhabditis elegans, Rhizophagus irregularis and Oryza sativa subsp. japonica genomes. Finally, we use the detection module to discover 29 554 transposition events in the genomes of 20 wild type strains of C. elegans. Databases, assemblies, annotations and further findings can be downloaded from https://doi.org/10.5281/zenodo.5518085.


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                Issue date: 24 June 2022
                Published online: 02 March 2022
                Volume: 50
                Issue: 11
                Article: e64
                Affiliations
                Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
                Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
                Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
                Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02215, USA
                Author notes
                To whom correspondence should be addressed. Tel: +1 857 307 1422; Fax: +1 617 525 5566; Email: mhemberg@bwh.harvard.edu
                Author information
                https://orcid.org/0000-0001-8895-5239
                Article
                Article ID: gkac136
                DOI: 10.1093/nar/gkac136
                PMCID: PMC9226531
                PMID: 35234904
                © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 14 February 2022
                Revised: 09 February 2022
                Received: 09 June 2021
                Page count
                Pages: 13
                Funding
                Funded by: Cancer Research UK, DOI 10.13039/501100000289;
                Award ID: C13474/A18583
                Award ID: C6946/A14492
                Funded by: Wellcome Trust, DOI 10.13039/100010269;
                Award ID: 219475/Z/19/Z
                Award ID: 092096/Z/10/Z
                Categories
                AcademicSubjects/SCI00010
                Narese/7
                Narese/24
                Methods Online

                Genetics
