GAPIT Version 3: Boosting Power and Accuracy for Genomic Association and Prediction

Abstract

Genome-wide association study (GWAS) and genomic prediction/selection (GP/GS) are the two essential enterprises in genomic research. Because genomic and phenotypic data are large and complex, analytical methods and their associated software packages are updated frequently. GAPIT, the Genomic Association and Prediction Integrated Tool, is a widely used R package. The first version was released to the public in 2012 and implemented the general linear model (GLM), mixed linear model (MLM), compressed MLM (CMLM), and genomic best linear unbiased prediction (gBLUP). The second version was released in 2016 with several new implementations, including enriched CMLM (ECMLM) and settlement of MLMs under progressively exclusive relationship (SUPER). All of the GWAS methods in these two versions are based on single-locus tests. The current release, GAPIT version 3, implements multi-locus test methods for the first time: multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Additionally, two GP/GS methods were implemented, one based on CMLM (named compressed BLUP; cBLUP) and one based on SUPER (named SUPER BLUP; sBLUP). These new implementations not only boost statistical power for GWAS and prediction accuracy for GP/GS, but also improve computing speed and increase the capacity to analyze big genomic data. Here, we document the current upgrade of GAPIT by describing the selection of the recently developed methods, their implementations, and their potential impact. All documents, including source code, user manual, demo data, and tutorials, are freely available at the GAPIT website (http://zzlab.net/GAPIT).

Most cited references 29

PLINK: a tool set for whole-genome association and population-based linkage analyses.

Shaun Purcell, Benjamin M. Neale, Kathe Todd-Brown … (2007)

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

TASSEL: software for association mapping of complex traits in diverse samples.

P J Bradbury, Z. Zhang, D. E. Kroon … (2007)

Association analyses that exploit the natural diversity of a genome to map at very high resolutions are becoming increasingly important. In most studies, however, researchers must contend with the confounding effects of both population and family structure. TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) implements general linear model and mixed linear model approaches for controlling population and family structure. For result interpretation, the program allows for linkage disequilibrium statistics to be calculated and visualized graphically. Database browsing and data importation is facilitated by integrated middleware. Other features include analyzing insertions/deletions, calculating diversity statistics, integration of phenotypic and genotypic data, imputing missing data and calculating principal components.

Efficient methods to compute genomic predictions.

P VanRaden (2008)

Efficient methods for processing genomic data were developed to increase reliability of estimated breeding values and to estimate thousands of marker effects simultaneously. Algorithms were derived and computer programs tested with simulated data for 2,967 bulls and 50,000 markers distributed randomly across 30 chromosomes. Estimation of genomic inbreeding coefficients required accurate estimates of allele frequencies in the base population. Linear model predictions of breeding values were computed by 3 equivalent methods: 1) iteration for individual allele effects followed by summation across loci to obtain estimated breeding values, 2) selection index including a genomic relationship matrix, and 3) mixed model equations including the inverse of genomic relationships. A blend of first- and second-order Jacobi iteration using 2 separate relaxation factors converged well for allele frequencies and effects. Reliability of predicted net merit for young bulls was 63% compared with 32% using the traditional relationship matrix. Nonlinear predictions were also computed using iteration on data and nonlinear regression on marker deviations; an additional (about 3%) gain in reliability for young bulls increased average reliability to 66%. Computing times increased linearly with number of genotypes. Estimation of allele frequencies required 2 processor days, and genomic predictions required <1 d per trait, and traits were processed in parallel. Information from genotyping was equivalent to about 20 daughters with phenotypic records. Actual gains may differ because the simulation did not account for linkage disequilibrium in the base population or selection in subsequent generations.

Author and article information

Contributors

Jiabo Wang

Zhiwu Zhang

Journal

Journal ID (nlm-ta): Genomics Proteomics Bioinformatics

Journal ID (iso-abbrev): Genomics Proteomics Bioinformatics

Title: Genomics, Proteomics & Bioinformatics

Publisher: Elsevier

ISSN (Print): 1672-0229

ISSN (Electronic): 2210-3244

Publication date PMC-release: 04 September 2021

Publication date (Print): August 2021

Publication date (Electronic): 04 September 2021

Volume: 19

Issue: 4

Pages: 629-640

Affiliations

[1] Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu 610041, China

[2] Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA

Author notes

[*] Corresponding authors. 23900011@swun.edu.cn, zhiwu.zhang@wsu.edu

Article

Publisher Item ID: S1672-0229(21)00177-7

DOI: 10.1016/j.gpb.2021.08.005

PMC ID: 9121400

PubMed ID: 34492338

SO-VID: ac58d413-d0be-4c33-a1f5-1b378dd9eb4f

License:

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

History

Date received: 21 May 2020

Date revision received: 26 April 2021

Date accepted: 26 August 2021
