      An improved genome assembly uncovers prolific tandem repeats in Atlantic cod

          Abstract

          Background

          The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies based exclusively on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have produced a multitude of assemblies for complex genomes, although many of these are fragmented, with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies.

          Results

          By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold, and the proportion of gap bases is reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual.

          Conclusions

          The inclusion of PacBio reads, combined with the use of multiple assembly programs, drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12864-016-3448-x) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                o.k.torresen@ibv.uio.no
                bastiaan.star@ibv.uio.no
                sissel.jentoft@ibv.uio.no
                willibr@ibv.uio.no
                brian.walenz@nih.gov
                k.s.jakobsen@ibv.uio.no
                lex.nederbragt@ibv.uio.no
                Journal
                BMC Genomics
                BioMed Central (London)
                ISSN: 1471-2164
                Published: 18 January 2017
                Volume: 18
                Article number: 95
                Affiliations
                [1] Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316, Norway (ISNI 0000 0004 1936 8921; GRID grid.5510.1)
                [2] Department of Natural Sciences, University of Agder, Kristiansand, NO-4604, Norway (ISNI 0000 0004 0417 6230; GRID grid.23048.3d)
                [3] Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432, Norway (ISNI 0000 0004 0607 975X; GRID grid.19477.3c)
                [4] J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA (GRID grid.469946.0)
                [5] Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA (ISNI 0000 0001 2233 9230; GRID grid.280128.1)
                [6] Yale School of Medicine, Yale University, New Haven, CT 06520, USA (ISNI 0000 0004 1936 8710; GRID grid.47100.32)
                [7] Pacific Biosciences, Menlo Park, CA, USA (GRID grid.423340.2)
                [8] Institute of Marine Research, Nordnes, Bergen, NO-5817, Norway (ISNI 0000 0004 0427 3161; GRID grid.10917.3e)
                [9] Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, NO-0316, Norway (ISNI 0000 0004 1936 8921; GRID grid.5510.1)
                Author information
                http://orcid.org/0000-0001-5539-0999
                Article
                3448
                DOI: 10.1186/s12864-016-3448-x
                PMC ID: 5241972
                PubMed ID: 28100185
                87df2d34-77a3-4e19-a402-65a33c597e4a
                © The Author(s) 2017

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 27 July 2016
                Accepted: 20 December 2016
                Funding
                Funded by: Norwegian Research Council
                Award ID: 199806
                Categories
                Research Article
                Genetics
                Keywords: assembly algorithms, assembly consolidation, dinucleotide repeats, Gadus morhua, heterozygosity, indel polymorphism, long-read sequencing technology, microsatellites, PacBio, repetitive DNA
