      Sequence locally, think globally: The Darwin Tree of Life Project

      The Darwin Tree of Life Project Consortium
      Proceedings of the National Academy of Sciences of the United States of America
      National Academy of Sciences
      genome, sequencing, biodiversity, assembly


          Abstract

          The goals of the Earth BioGenome Project (to sequence the genomes of all eukaryotic life on Earth) are as daunting as they are ambitious. The Darwin Tree of Life Project was founded to demonstrate the credibility of these goals and to deliver at-scale genome sequences of unprecedented quality for a biogeographic region: the archipelago of islands that constitute Britain and Ireland. The Darwin Tree of Life Project is a collaboration between biodiversity organizations (museums, botanical gardens, and biodiversity institutes) and genomics institutes. Together, we have built a workflow that collects specimens from the field, robustly identifies them, performs sequencing, generates high-quality, curated assemblies, and releases these openly for the global community to use in future science and conservation efforts.


                Author and article information

                Journal
                Proceedings of the National Academy of Sciences of the United States of America (Proc Natl Acad Sci U S A; PNAS)
                Publisher: National Academy of Sciences
                ISSN: 0027-8424 (print); 1091-6490 (online)
                Published: 18 January 2022; issue date: 25 January 2022
                Volume 119, Issue 4, Article e2115642118
                Author notes
                To whom correspondence may be addressed. Email: Mark L. Blaxter, mb35@sanger.ac.uk.

                Edited by Harris Lewin, Evolution and Ecology and The Genome Center, University of California, Davis, CA; received September 10, 2021; accepted November 1, 2021

                Author contributions: The Darwin Tree of Life Project and M.L.B. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

                Article identifiers
                DOI: 10.1073/pnas.2115642118
                PMCID: PMC8797607
                PMID: 35042805
                Copyright © 2022 the Author(s). Published by PNAS.

                This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

                Page count: 7
                Funding
                Wellcome (funder ID 100010269), Award ID 218328; recipients: The Darwin Tree of Life Project Consortium; Mark L. Blaxter
                Wellcome (funder ID 100010269), Award ID 206194; recipients: The Darwin Tree of Life Project Consortium; Mark L. Blaxter
                Categories
                Perspective; Biological Sciences; Evolution
                Special feature: The Earth BioGenome Project: The Launch of a Moonshot for Biology

