Assessing genome assembly quality using the LTR Assembly Index (LAI)

Abstract

Assembling a plant genome is challenging due to the abundance of repetitive sequences, yet no standard is available to evaluate the assembly of repeat space. LTR retrotransposons (LTR-RTs) are the predominant interspersed repeats and are poorly assembled in draft genomes. Here, we propose a reference-free genome metric called the LTR Assembly Index (LAI) that evaluates assembly continuity using LTR-RTs. After correcting for LTR-RT amplification dynamics, we show that LAI is independent of genome size, genomic LTR-RT content, and gene-space evaluation metrics (i.e., BUSCO and CEGMA). By comparing genomic sequences produced by various sequencing techniques, we show that long-read-based techniques yield significantly greater assembly continuity than short-read-based methods. Moreover, LAI can guide iterative assembly improvement, including assembler selection, and can identify low-quality genomic regions. To apply LAI, intact LTR-RTs and total LTR-RTs should contribute at least 0.1% and 5% of the genome size, respectively. The LAI program is freely available on GitHub: https://github.com/oushujun/LTR_retriever.
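The abstract describes LAI only in words, so a minimal Python sketch may help. It assumes that the raw (uncorrected) LAI is the percentage of total LTR-RT sequence length contributed by intact LTR-RTs; the correction for LTR-RT amplification dynamics mentioned above is not reproduced. The applicability check uses the 0.1% and 5% thresholds stated in the abstract, and a sliding-window variant illustrates one way low-quality regions could be localized. The names `LtrSummary`, `raw_lai`, `lai_applicable`, `covered_bp` and `windowed_raw_lai`, and the window/step values, are hypothetical illustrations, not the LTR_retriever API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in bp


@dataclass
class LtrSummary:
    """Hypothetical per-assembly totals in bp (not an LTR_retriever structure)."""
    genome_size_bp: int
    intact_ltr_bp: int  # bases in structurally intact LTR-RTs
    total_ltr_bp: int   # bases in all LTR-RT sequence, intact or fragmented


def raw_lai(s: LtrSummary) -> float:
    """Assumed raw LAI: intact LTR-RT length as a percentage of total LTR-RT length.

    The amplification-dynamics correction from the abstract is NOT applied here.
    """
    if s.total_ltr_bp <= 0:
        raise ValueError("no LTR-RT sequence annotated; LAI is undefined")
    return 100.0 * s.intact_ltr_bp / s.total_ltr_bp


def lai_applicable(s: LtrSummary, min_intact: float = 0.001,
                   min_total: float = 0.05) -> bool:
    """Abstract's thresholds: intact >= 0.1% and total >= 5% of genome size."""
    return (s.intact_ltr_bp / s.genome_size_bp >= min_intact
            and s.total_ltr_bp / s.genome_size_bp >= min_total)


def covered_bp(intervals: List[Interval], lo: int, hi: int) -> int:
    """Bases of [lo, hi) covered by intervals (assumed mutually non-overlapping)."""
    return sum(max(0, min(end, hi) - max(start, lo)) for start, end in intervals)


def windowed_raw_lai(seq_len: int, intact: List[Interval], total: List[Interval],
                     window: int, step: int) -> List[Tuple[int, int, Optional[float]]]:
    """Raw LAI per sliding window; None where a window has no LTR-RT sequence.

    `intact` intervals are assumed to lie inside the `total` LTR-RT intervals.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    out = []
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        t = covered_bp(total, start, end)
        i = covered_bp(intact, start, end)
        out.append((start, end, 100.0 * i / t if t else None))
        if end == seq_len:
            break
        start += step
    return out


if __name__ == "__main__":
    s = LtrSummary(genome_size_bp=1_000_000, intact_ltr_bp=20_000, total_ltr_bp=100_000)
    print(raw_lai(s), lai_applicable(s))  # 20.0 True
    # Toy 100 bp sequence: prints [(0, 50, 25.0), (25, 75, 0.0), (50, 100, None)]
    print(windowed_raw_lai(100, [(10, 20)], [(0, 40)], window=50, step=25))
```

In this toy example the second window contains only fragmented LTR-RT sequence, so its raw score is 0.0; in a real assembly, windows with unusually low scores would be candidates for closer inspection. The window and step sizes are arbitrary toy values chosen for readability.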

Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): 30 November 2018

Publication date (Electronic): 10 August 2018

Publication date PMC-release: 10 August 2018

Volume: 46

Issue: 21

Page: e126

Affiliations

[1 ]Department of Horticulture, Michigan State University, East Lansing, MI 48824, USA

[2 ]Program in Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, MI 48824, USA

[3 ]Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92507, USA

Author notes

To whom correspondence should be addressed. Tel: +1 517 353 0379; Fax: +1 517 353 0890; Email: oushujun@msu.edu

Author information

Shujun Ou http://orcid.org/0000-0001-5938-7180

Article

Publisher ID: gky730

DOI: 10.1093/nar/gky730

PMC ID: 6265445

PubMed ID: 30107434

SO-VID: 3483845e-13dc-423f-9f35-f67f2e6765ea

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted : 31 July 2018

Date revision received : 26 July 2018

Date received : 20 April 2018

Page count

Pages: 11

Funding

Funded by: National Science Foundation 10.13039/100000001

Award ID: MCB-1121650

Award ID: IOS-1126998

Award ID: IOS-1740874

Funded by: Michigan State University 10.13039/100007709

Award ID: MICL02408

