Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on Bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.

Results

A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red respectively) but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only lead to a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).

Conclusion

Combining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.

Related collections

Most cited references 13

Record: found
Abstract: found
Article: not found

Development and Characterization of a High Density SNP Genotyping Assay for Cattle

Lakshmi Matukumalli, Cynthia Lawley, Robert Schnabel … (2009)

The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in human has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.

0 comments Cited 385 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle.

A de Roos, B. Hayes, R J Spelman … (2008)

When a genetic marker and a quantitative trait locus (QTL) are in linkage disequilibrium (LD) in one population, they may not be in LD in another population or their LD phase may be reversed. The objectives of this study were to compare the extent of LD and the persistence of LD phase across multiple cattle populations. LD measures r and r(2) were calculated for syntenic marker pairs using genomewide single-nucleotide polymorphisms (SNP) that were genotyped in Dutch and Australian Holstein-Friesian (HF) bulls, Australian Angus cattle, and New Zealand Friesian and Jersey cows. Average r(2) was approximately 0.35, 0.25, 0.22, 0.14, and 0.06 at marker distances 10, 20, 40, 100, and 1000 kb, respectively, which indicates that genomic selection within cattle breeds with r(2) >or= 0.20 between adjacent markers would require approximately 50,000 SNPs. The correlation of r values between populations for the same marker pairs was close to 1 for pairs of very close markers (<10 kb) and decreased with increasing marker distance and the extent of divergence between the populations. To find markers that are in LD with QTL across diverged breeds, such as HF, Jersey, and Angus, would require approximately 300,000 markers.

0 comments Cited 198 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

High-density marker imputation accuracy in sixteen French cattle breeds

Chris Hozé, Marie-Noëlle Fouilloux, Eric Venot … (2013)

Background Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip. Methods Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the 20% youngest animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software. Results Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend to discard these in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high-density. No improvement was observed when multi-breed imputation was performed. Conclusion In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.

0 comments Cited 54 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Rasmus Froberg Brøndum: rasmusf.brondum@agrsci.dk

Bernt Guldbrandtsen: bernt.guldbrandtsen@agrsci.dk

Goutam Sahana: goutam.sahana@agrsci.dk

Mogens Sandø Lund: mogens.lund@agrsci.dk

Guosheng Su: guosheng.su@agrsci.dk

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central (London )

ISSN (Electronic): 1471-2164

Publication date (Electronic): 27 August 2014

Publication date PMC-release: 27 August 2014

Publication date Collection: 2014

Volume: 15

Issue: 1

Electronic Location Identifier: 728

Affiliations

Centre for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, Tjele, 8830 Denmark

Article

Publisher ID: 6388

DOI: 10.1186/1471-2164-15-728

PMC ID: 4152568

PubMed ID: 25164068

SO-VID: 1c198762-1ca0-4295-bb34-9d984ea2e5d4

License:

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received : 3 March 2014

Date accepted : 18 June 2014

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: imputation,next generation sequencing,cross-validation,allele frequency,pre-phasing

Data availability:

ScienceOpen disciplines: Genetics

Keywords: imputation, next generation sequencing, cross-validation, allele frequency, pre-phasing

Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

Read this article at

Abstract

Background

Results

Conclusion

Related collections

Genome Engineering using CRISPR

Most cited references 13

Development and Characterization of a High Density SNP Genotyping Assay for Cattle

Linkage disequilibrium and persistence of phase in Holstein-Friesian, Jersey and Angus cattle.

High-density marker imputation accuracy in sixteen French cattle breeds

Author and article information

Contributors

Journal

Affiliations

Article

History

Categories

Custom metadata

Comments

Comment on this article

Similar content 239

Cited by 67

Most referenced authors 1,211