      Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)


          Abstract

          Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and in global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained, representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203 Mb, 20,450 protein-coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 with the new v.3.1, reconfirming previous results from Olsen et al. (2016) and pointing out new findings.

          Methods: We used the same high-molecular-weight DNA as in the original sequencing of the Finnish clone. A high-quality reference genome was assembled with the MECAT assembly pipeline, combining PacBio long-read sequencing and Hi-C scaffolding.

          Results: In total, 75.97 Gb of PacBio data were produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5 Mb and an N50 of 34.6 Mb, showing high contiguity and few gaps (~0.5%). In this assembly, 21,483 protein-encoding genes are annotated, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins.

          Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding of the structural and physiological adaptations that accompanied the transition from land to marine life.
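
          The N50 and the annotation percentage quoted in the Results can be illustrated with a short sketch. This is not the authors' pipeline: the function name `n50` and the toy scaffold lengths are hypothetical and serve only to show how such a statistic is commonly computed. The only figures taken from the abstract are the gene counts 20,665 and 21,483.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths (in Mb), NOT the real Z. marina scaffolds.
toy_lengths = [40, 35, 30, 25, 20, 10, 5, 3, 2]
toy_n50 = n50(toy_lengths)

# Sanity check of the annotation percentage quoted in the abstract:
# 20,665 functionally annotated genes out of 21,483 predicted genes.
annotated_fraction = 20665 / 21483
annotated_percent = round(100 * annotated_fraction, 1)

print(f"toy N50: {toy_n50} Mb")
print(f"genes with a functional assignment: {annotated_percent}%")
```

          With the toy lengths, the three longest scaffolds (40 + 35 + 30 = 105) already exceed half of the 170 Mb total, so the sketch reports an N50 of 30 Mb. The percentage check reproduces the 96.2% stated in the abstract.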


                Author and article information

                Contributors
                Roles: Data Curation, Formal Analysis, Methodology, Software, Visualization
                Roles: Conceptualization, Funding Acquisition, Project Administration, Resources, Writing – Original Draft Preparation, Writing – Review & Editing
                Roles: Conceptualization, Funding Acquisition, Resources
                Roles: Conceptualization, Resources
                Roles: Resources
                Roles: Resources
                Roles: Project Administration, Resources, Software, Validation
                Roles: Resources
                Roles: Data Curation, Methodology, Resources, Software, Writing – Original Draft Preparation
                Roles: Data Curation, Methodology, Resources, Software
                Roles: Conceptualization, Funding Acquisition, Project Administration, Resources, Supervision, Visualization, Writing – Review & Editing
                Journal
                F1000Research (F1000Res)
                F1000 Research Limited (London, UK)
                ISSN: 2046-1402
                15 April 2021
                2021; 10: 289
                Affiliations
                [1] Department of Plant Biotechnology and Bioinformatics, Ghent University - Center for Plant Systems Biology, VIB, Ghent, 9052, Belgium
                [2] Groningen Institute of Evolutionary Life Sciences, Groningen, 9747 AG, The Netherlands
                [3] GEOMAR Helmholtz Centre for Ocean Research Kiel, Marine Evolutionary Ecology, Kiel, 24105, Germany
                [4] Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Napoli, 80123, Italy
                [5] Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, USA
                [6] HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
                [7] Arizona Genomics Institute, School of Plant Sciences, University of Arizona Tucson, Tucson, AZ, 85721, USA
                [8] Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
                [9] College of Horticulture, Nanjing Agricultural University, Nanjing, 210014, China
                [1] Department of Cell Biology and Radiobiology, The Czech Academy of Sciences, Institute of Biophysics, Brno, Czech Republic
                [1] School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
                Author notes

                No competing interests were disclosed.


                Author information
                https://orcid.org/0000-0003-4787-9318
                Article
                DOI: 10.12688/f1000research.38156.1
                PMCID: PMC8482049
                PMID: 34621505
                52497abd-ac94-4ec5-b3e3-59723df20f7c
                Copyright: © 2021 Ma X et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                1 April 2021
                Funding
                Funded by: the DOE-Joint Genome Institute, Berkeley, CA, USA
                Award ID: 504341
                The work conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under Contract No DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory. This work was supported by the DOE-Joint Genome Institute, Berkeley, CA, USA, Community Sequencing Program 2019, Grant Nr. 504341-Marine Angiosperm Genomes Initiative (MAGI) to YVdP, JLO, TBHR and GP.
                Categories
                Research Article
                Articles

                seagrass, Zostera marina, eelgrass, chromosome-scale genome assembly, annotation
