Imputation accuracy across global human populations

Most cited references 51

Is Open Access

The mutational constraint spectrum quantified from variation in 141,456 humans

Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao … (2021)

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

Cited 3513 times

Is Open Access

Second-generation PLINK: rising to the challenge of larger and richer datasets

Christopher Chang, Carson Chow, Laurent Tellier … (2015)

PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

Cited 3072 times

Next-generation genotype imputation service and methods.

Sayantan Das, Lukas Forer, Sebastian Schönherr … (2016)

Genotype imputation is a key component of genetic association studies, where it increases power, facilitates meta-analysis, and aids interpretation of signals. Genotype imputation is computationally demanding and, with current tools, typically requires access to a high-performance computing cluster and to a reference panel of sequenced genomes. Here we describe improvements to imputation machinery that reduce computational requirements by more than an order of magnitude with no loss of accuracy in comparison to standard imputation tools. We also describe a new web-based service for imputation that facilitates access to new reference panels and greatly improves user experience and productivity.

Cited 1182 times

Author and article information

Contributors

Jordan L. Cahoon

Charleston W.K. Chiang

Journal

Title: The American Journal of Human Genetics

Abbreviated Title: The American Journal of Human Genetics

Publisher: Elsevier BV

ISSN (Print): 0002-9297

Publication date Created: April 2024

Publication date (Print): April 2024

Article

DOI: 10.1016/j.ajhg.2024.03.011

SO-VID: e8c57079-da2d-498a-ad2e-94c748623dfb

License:

https://www.elsevier.com/tdm/userlicense/1.0/
