
      Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

      research-article


          Abstract

          Background

          Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

          Results

          We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.
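The six metrics named above follow from the four cells of a confusion matrix. The sketch below uses the standard textbook definitions. The names `Confusion` and `annotation_metrics` and the example counts are hypothetical and do not come from the article. Whether the cells count bases or whole elements is left open here as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix cells for one annotation method (hypothetical container)."""
    tp: int  # annotated as TE, and TE in the curated reference
    fp: int  # annotated as TE, but not TE in the reference
    tn: int  # not annotated, and not TE in the reference
    fn: int  # not annotated, but TE in the reference


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def annotation_metrics(c: Confusion) -> dict:
    """Compute the six benchmark metrics from standard definitions."""
    sensitivity = ratio(c.tp, c.tp + c.fn)   # true positive rate (recall)
    specificity = ratio(c.tn, c.tn + c.fp)   # true negative rate
    accuracy = ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    precision = ratio(c.tp, c.tp + c.fp)     # positive predictive value
    fdr = ratio(c.fp, c.tp + c.fp)           # false discovery rate = 1 - precision
    f1 = ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)  # harmonic mean of precision and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "fdr": fdr,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = Confusion(tp=90, fp=10, tn=880, fn=20)
    for name, value in annotation_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting specificity and FDR alongside sensitivity matters for TE annotation because a method can reach high sensitivity simply by over-calling repeats. The F1 score condenses that trade-off into a single number.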

          Conclusions

          The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.


          Most cited references (48)


          LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons

          Background: Transposable elements are abundant in eukaryotic genomes and it is believed that they have a significant impact on the evolution of gene and chromosome structure. While there are several completed eukaryotic genome projects, there are only a few high quality genome wide annotations of transposable elements. Therefore, there is a considerable demand for computational identification of transposable elements. LTR retrotransposons, an important subclass of transposable elements, are well suited for computational identification, as they contain long terminal repeats (LTRs).

          Results: We have developed a software tool LTRharvest for the de novo detection of full length LTR retrotransposons in large sequence sets. LTRharvest efficiently delivers high quality annotations based on known LTR transposon features like length, distance, and sequence motifs. A quality validation of LTRharvest against a gold standard annotation for Saccharomyces cerevisiae and Drosophila melanogaster shows a sensitivity of up to 90% and 97% and specificity of 100% and 72%, respectively. This is comparable or slightly better than annotations for previous software tools. The main advantage of LTRharvest over previous tools is (a) its ability to efficiently handle large datasets from finished or unfinished genome projects, (b) its flexibility in incorporating known sequence features into the prediction, and (c) its availability as an open source software.

          Conclusion: LTRharvest is an efficient software tool delivering high quality annotation of LTR retrotransposons. It can, for example, process the largest human chromosome in approx. 8 minutes on a Linux PC with 4 GB of memory. Its flexibility and small space and run-time requirements make LTRharvest a very competitive candidate for future LTR retrotransposon annotation projects. Moreover, the structured design and implementation and the availability as open source provide an excellent base for incorporating novel concepts to further improve prediction of LTR retrotransposons.

            LTR_retriever: a highly accurate and sensitive program for identification of long terminal-repeat retrotransposons

            Long terminal repeat retrotransposons (LTR-RTs) are prevalent in plant genomes. The identification of LTR-RTs is critical for achieving high-quality gene annotation. Based on the well-conserved structure, multiple programs were developed for the de novo identification of LTR-RTs; however, these programs are associated with low specificity and high false discovery rates. Here, we report LTR_retriever, a multithreading-empowered Perl program that identifies LTR-RTs and generates high-quality LTR libraries from genomic sequences. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity (91%), specificity (97%), accuracy (96%), and precision (90%) in rice (Oryza sativa). LTR_retriever is also compatible with long sequencing reads. With 40k self-corrected PacBio reads equivalent to 4.5× genome coverage in Arabidopsis (Arabidopsis thaliana), the constructed LTR library showed excellent sensitivity and specificity. In addition to canonical LTR-RTs with 5'-TG…CA-3' termini, LTR_retriever also identifies noncanonical LTR-RTs (non-TGCA), which have been largely ignored in genome-wide studies. We identified seven types of noncanonical LTRs from 42 out of 50 plant genomes. The majority of noncanonical LTRs are Copia elements, with which the LTR is four times shorter than that of other Copia elements, which may be a result of their target specificity. Strikingly, non-TGCA Copia elements are often located in genic regions and preferentially insert nearby or within genes, indicating their impact on the evolution of genes and their potential as mutagenesis tools.
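The canonical versus noncanonical distinction above rests on the terminal dinucleotides: a canonical element starts with TG at its 5' end and ends with CA at its 3' end. A minimal sketch of that single check follows. The function name `has_canonical_termini` is hypothetical, and the check is not LTR_retriever's actual implementation, which evaluates many more structural features than the termini.

```python
def has_canonical_termini(element: str) -> bool:
    """Return True if the sequence has 5'-TG...CA-3' termini.

    Illustrative only: checks the first and last two bases of the
    given sequence, case-insensitively, and nothing else.
    """
    seq = element.strip().upper()
    # Require at least four bases so the TG and CA ends cannot overlap.
    return len(seq) >= 4 and seq.startswith("TG") and seq.endswith("CA")


if __name__ == "__main__":
    # Toy sequences, not real LTR elements.
    for toy in ("TGTTACGGATCA", "TGTTACGGATTT", "tgca"):
        label = "canonical" if has_canonical_termini(toy) else "non-TGCA"
        print(toy, label)
```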

              Bioconda: sustainable and comprehensive software distribution for the life sciences


                Author and article information

                Contributors
                oushujun@iastate.edu
                weijia@iastate.edu
                liaoy12@uci.edu
                kchougul@cshl.edu
                jagda@boldsystems.org
                ahelling@uoguelph.ca
                sblanco@boldsystems.org
                telliott@boldsystems.org
                ware@cshl.edu
                thomasp@iastate.edu
                jiangn@msu.edu
                cnhirsch@umn.edu
                mhufford@iastate.edu
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596 (print), 1474-760X (electronic)
                Published: 16 December 2019
                2019; 20: 275
                Affiliations
                [1] Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA (ISNI 0000 0004 1936 7312; GRID grid.34421.30)
                [2] Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA (ISNI 0000 0004 1936 7312; GRID grid.34421.30)
                [3] Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697, USA (ISNI 0000 0001 0668 7243; GRID grid.266093.8)
                [4] Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA (ISNI 0000 0004 0387 3667; GRID grid.225279.9)
                [5] Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1, Canada (ISNI 0000 0004 1936 8198; GRID grid.34429.38)
                [6] USDA-ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA (ISNI 0000 0004 1936 877X; GRID grid.5386.8)
                [7] Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA (ISNI 0000 0001 2150 1785; GRID grid.17088.36)
                [8] Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN 55108, USA (ISNI 0000 0004 1936 8657; GRID grid.17635.36)
                Author information
                http://orcid.org/0000-0003-3945-1143
                Article
                1905
                DOI: 10.1186/s13059-019-1905-y
                PMCID: PMC6913007
                PMID: 31843001
                c741fd22-7ecc-4c2e-8a46-ac0dc177fe50
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 24 May 2019
                Accepted: 28 November 2019
                Funding
                Funded by: Division of Molecular and Cellular Biosciences (FundRef http://dx.doi.org/10.13039/100000152)
                Award ID: IOS-1744001
                Award ID: IOS-1546727
                Award ID: IOS-1740874
                Funded by: National Institute of Food and Agriculture (FundRef http://dx.doi.org/10.13039/100005825)
                Award ID: IOW05282
                Funded by: State of Iowa
                Funded by: Canada First Research Excellence Fund Ontario
                Categories
                Research

                Genetics
                transposable element, annotation, genome, benchmarking, pipeline
