Imputation of Unordered Markers and the Impact on Genomic Selection Accuracy

Abstract

Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited for unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods, referred to as k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation, in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on genomic selection accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on genomic selection accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced superior accuracy. In comparison with mean imputation, all four imputation methods we evaluated led to greater genomic selection accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater genomic selection accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.
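To make the contrast between mean imputation and a neighbor-based method concrete, the following is a minimal sketch, not the article's implementation. It assumes markers coded -1/0/1 with `None` for a missing call, uses a made-up toy matrix, and treats individuals (rows) as the neighbors; the same idea applies with markers as neighbors. The names `MARKERS`, `mean_impute`, and `knn_impute` are hypothetical.

```python
from math import sqrt

# Toy marker matrix (made-up values): rows are individuals, columns are
# markers coded -1/0/1, and None marks a missing genotype call.
MARKERS = [
    [1, 1, -1, 1],
    [1, None, -1, 1],
    [-1, -1, 1, -1],
    [-1, -1, 1, None],
    [1, 1, -1, 1],
    [-1, -1, 1, -1],
]


def mean_impute(matrix):
    """Replace each missing call with the mean of that marker's observed calls."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]


def _distance(a, b):
    """Root mean squared difference over markers observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix, k=2):
    """Fill each missing call with the average call of the k nearest individuals.

    Assumption of this sketch: neighbors are individuals, and ties in
    distance are broken by row order. Falls back to the marker mean when
    no comparable neighbor exists.
    """
    fallback = mean_impute(matrix)
    result = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (_distance(row, other), other[j])
                for h, other in enumerate(matrix)
                if h != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != float("inf")]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                result[i][j] = sum(call for _, call in nearest) / len(nearest)
            else:
                result[i][j] = fallback[i][j]
    return result
```

In this toy matrix the individuals fall into two groups with opposite genotypes, so the neighbor-based fill recovers the group's call, while the marker mean lands near zero. That is one way to picture why relatedness and linkage disequilibrium help imputation of unordered markers.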

Most cited references 12

Missing value estimation methods for DNA microarrays.

Annette Hastie, Allison Altman, John P. Brown … (2001)

Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1--20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.

Accuracy of genomic selection in European maize elite breeding populations.

Yusheng Zhao, Manje Gowda, Wenxin Liu … (2012)

Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3-4 locations. As for maize up to three generations are feasible per year, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
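The fivefold cross-validation procedure mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's code: ridge regression with a fixed penalty `lam` stands in for random regression BLUP (where the penalty would come from estimated variance components), the data are simulated, and all function names (`simulate`, `ridge_fit`, `cross_validate`) are hypothetical. Accuracy is taken here as the Pearson correlation between predicted and observed values in each held-out fold.

```python
import random
from math import sqrt
from statistics import mean


def simulate(n=50, seed=1):
    """Simulated data (assumption): 5 markers coded -1/0/1 and a noisy additive trait."""
    rng = random.Random(seed)
    effects = [1.0, -0.5, 0.8, 0.0, 0.3]
    X = [[rng.choice([-1, 0, 1]) for _ in effects] for _ in range(n)]
    y = [sum(b * x for b, x in zip(effects, row)) + rng.gauss(0, 0.1) for row in X]
    return X, y


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ridge_fit(X, y, lam):
    """Estimate marker effects from (X'X + lam*I) b = X'(y - mean(y))."""
    p = len(X[0])
    mu = mean(y)
    xtx = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(row[i] * (yi - mu) for row, yi in zip(X, y)) for i in range(p)]
    return mu, solve(xtx, xty)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def cross_validate(X, y, folds=5, lam=1.0, seed=0):
    """Mean held-out correlation between predicted and observed phenotypes."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        mu, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = [mu + sum(b * x for b, x in zip(beta, X[i])) for i in test]
        accuracies.append(pearson(preds, [y[i] for i in test]))
    return mean(accuracies)
```

Calling `cross_validate(*simulate())` returns one averaged accuracy for the simulated trait. With real data, the held-out correlation is often further divided by the square root of heritability, a step omitted here.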

Genomic selection using low-density marker panels.

D Habier, R. L. Fernando, J Dekkers (2009)

Genomic selection (GS) using high-density single-nucleotide polymorphisms (SNPs) is promising to improve response to selection in populations that are under artificial selection. High-density SNP genotyping of all selection candidates each generation, however, may not be cost effective. Smaller panels with SNPs that show strong associations with phenotype can be used, but this may require separate SNPs for each trait and each population. As an alternative, we propose to use a panel of evenly spaced low-density SNPs across the genome to estimate genome-assisted breeding values of selection candidates in pedigreed populations. The principle of this approach is to utilize cosegregation information from low-density SNPs to track effects of high-density SNP alleles within families. Simulations were used to analyze the loss of accuracy of estimated breeding values from using evenly spaced and selected SNP panels compared to using all high-density SNPs in a Bayesian analysis. Forward stepwise selection and a Bayesian approach were used to select SNPs. Loss of accuracy was nearly independent of the number of simulated quantitative trait loci (QTL) with evenly spaced SNPs, but increased with number of QTL for the selected SNP panels. Loss of accuracy with evenly spaced SNPs increased steadily over generations but was constant when the smaller number of individuals that are selected for breeding each generation were also genotyped using the high-density SNP panel. With equal numbers of low-density SNPs, panels with SNPs selected on the basis of the Bayesian approach had the smallest loss in accuracy for a single trait, but a panel with evenly spaced SNPs at 10 cM was only slightly worse, whereas a panel with SNPs selected by forward stepwise selection was inferior. Panels with evenly spaced SNPs can, however, be used across traits and populations, and their performance is independent of the number of QTL affecting the trait and of the methods used to estimate effects in the training data; they are, therefore, preferred for broad applications in pedigreed populations under artificial selection.

Author and article information

Journal

Journal ID (nlm-ta): G3 (Bethesda)

Journal ID (iso-abbrev): Genetics

Journal ID (hwp): G3: Genes, Genomes, Genetics

Journal ID (pmc): G3: Genes, Genomes, Genetics

Journal ID (publisher-id): G3: Genes, Genomes, Genetics

Title: G3: Genes|Genomes|Genetics

Publisher: Genetics Society of America

ISSN (Electronic): 2160-1836

Publication date (Electronic): 1 March 2013

Publication date Collection: March 2013

Volume: 3

Issue: 3

Pages: 427-439

Affiliations

[* ]Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853

[† ]Department of Agronomy, Kansas State University, Manhattan, Kansas 66506

[‡ ]United States Department of Agriculture-Agricultural Research Service (USDA-ARS), Manhattan, Kansas 66502

[§ ]USDA-ARS, Ithaca, New York 14853

Author notes

Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.112.005363/-/DC1

[1 ]Corresponding author: Department of Plant Breeding and Genetics, 240 Emerson Hall, Cornell, Ithaca, NY 14853-1902. E-mail: mes12@cornell.edu

Article

Publisher ID: GGG_005363

DOI: 10.1534/g3.112.005363

PMC ID: 3583451

PubMed ID: 23449944

SO-VID: 77f428b3-9974-47de-a7b7-1573b4017274

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 04 August 2012

Date accepted: 28 December 2012

ScienceOpen disciplines: Genetics

Keywords: genomic selection, imputation algorithms, genotyping-by-sequencing, genpred, shared data resources
