7
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes

      data-paper

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          More than two hundred papers have reported genome-wide data from ancient humans. While the raw data for the vast majority are fully publicly available testifying to the commitment of the paleogenomics community to open data, formats for both raw data and meta-data differ. There is thus a need for uniform curation and a centralized, version-controlled compendium that researchers can download, analyze, and reference. Since 2019, we have been maintaining the Allen Ancient DNA Resource (AADR), which aims to provide an up-to-date, curated version of the world’s published ancient human DNA data, represented at more than a million single nucleotide polymorphisms (SNPs) at which almost all ancient individuals have been assayed. The AADR has gone through six public releases at the time of writing and review of this manuscript, and crossed the threshold of >10,000 individuals with published genome-wide ancient DNA data at the end of 2022. This note is intended as a citable descriptor of the AADR.

          Related collections

          Most cited references253

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Fast and accurate short read alignment with Burrows–Wheeler transform

          Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

            Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: found
              Is Open Access

              A global reference for human genetic variation

              The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
                Bookmark

                Author and article information

                Contributors
                shop@genetics.med.harvard.edu
                reich@genetics.med.harvard.edu
                Journal
                Sci Data
                Sci Data
                Scientific Data
                Nature Publishing Group UK (London )
                2052-4463
                10 February 2024
                10 February 2024
                2024
                : 11
                : 182
                Affiliations
                [1 ]GRID grid.38142.3c, ISNI 000000041936754X, Department of Genetics, , Harvard Medical School, ; Boston, MA 02115 USA
                [2 ]Broad Institute of MIT and Harvard, ( https://ror.org/05a0ya142) Cambridge, MA 02142 USA
                [3 ]Howard Hughes Medical Institute, ( https://ror.org/006w34k90) Boston, MA 02115 USA
                [4 ]Department of Human Evolutionary Biology, Harvard University, ( https://ror.org/03vek6s52) Cambridge, MA 02138 USA
                [5 ]Max Planck Institute for Evolutionary Anthropology, ( https://ror.org/02a33b393) Leipzig, 04103 Germany
                [6 ]BIOMICs Research Group, University of the Basque Country, ( https://ror.org/000xsnr85) 01006 Vitoria-Gasteiz, Spain
                Author information
                http://orcid.org/0000-0002-4531-4439
                http://orcid.org/0000-0001-8987-6436
                http://orcid.org/0000-0002-4884-9682
                http://orcid.org/0000-0002-4094-9347
                http://orcid.org/0000-0002-2660-6807
                http://orcid.org/0000-0002-7037-5292
                Article
                3031
                10.1038/s41597-024-03031-7
                10858950
                38341426
                88880e65-cf34-41c2-a040-966787f8c45a
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 10 August 2023
                : 31 January 2024
                Funding
                Funded by: FundRef https://doi.org/10.13039/100000093, U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health);
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award ID: HG012287
                Award Recipient :
                Funded by: FundRef https://doi.org/10.13039/100000925, John Templeton Foundation (JTF);
                Award ID: 61220
                Award ID: 61220
                Award ID: 61220
                Award ID: 61220
                Award ID: 61220
                Award ID: 61220
                Award Recipient :
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Funded by: U.S. Department of Health & Human Services | NIH | Center for Information Technology (Center for Information Technology, National Institutes of Health)
                Categories
                Data Descriptor
                Custom metadata
                © Springer Nature Limited 2024

                genetic variation
                genetic variation

                Comments

                Comment on this article