
      A comparison of genotyping-by-sequencing analysis methods on low-coverage crop datasets shows advantages of a new workflow, GB-eaSy

          Abstract

          Background

          Genotyping-by-sequencing (GBS), a method to identify genetic variants and quickly genotype samples, reduces genome complexity by using restriction enzymes to divide the genome into fragments whose ends are sequenced on short-read sequencing platforms. While cost-effective, this method produces extensive missing data and requires complex bioinformatics analysis. GBS is most commonly used on crop plant genomes, and because crop plants have highly variable ploidy and repeat content, the performance of GBS analysis software can vary by target organism. Here we focus our analysis on soybean, a polyploid crop with a highly duplicated genome, relatively little public GBS data and few dedicated tools.
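
          The fragment-and-sequence step described above can be illustrated with a small in silico digest. This is a minimal sketch, not part of GB-eaSy. It assumes the enzyme is ApeKI (recognition site GCWGC, cut after the first G), a common choice in GBS protocols, although the abstract names no enzyme. The read length and size-selection bounds in `fragment_ends` are hypothetical parameters. Only fragments with a cut site on both sides are kept, because those are the ones that can carry adapters at both ends.

```python
import re

# Assumption for illustration: ApeKI, recognition site GCWGC (W = A or T),
# which cuts after the first base (G^CWGC). The lookahead finds overlapping sites.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # cut position relative to the start of the recognition site


def digest(sequence):
    """Return the fragments lying between consecutive ApeKI cut positions."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in APEKI_SITE.finditer(seq)]
    # Sequence before the first cut and after the last cut is discarded:
    # those pieces have a cut site on only one side.
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def fragment_ends(fragments, read_length=100, min_len=64, max_len=500):
    """Return the (left, right) end sequences that short reads would cover
    for each fragment passing a size filter. All three parameters are
    hypothetical defaults; the right end is returned on the forward strand."""
    return [
        (frag[:read_length], frag[-read_length:])
        for frag in fragments
        if min_len <= len(frag) <= max_len
    ]


if __name__ == "__main__":
    toy = "aaaGCAGCttttttttttGCTGCaaa"  # toy sequence with two ApeKI sites
    frags = digest(toy)
    print(frags)  # ['CAGCTTTTTTTTTTG']
    print(fragment_ends(frags, read_length=4, min_len=10))  # [('CAGC', 'TTTG')]
```

          The sketch also shows one reason GBS produces so much missing data: only positions near cut sites are sequenced at all, and at low coverage many of those fragment ends receive few or no reads in a given sample.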

          Results

          We compared the performance of five GBS pipelines using low-coverage Illumina sequence data from three soybean populations. To address issues identified with existing methods, we developed GB-eaSy, a GBS bioinformatics workflow that incorporates widely used genomics tools, parallelization and automation to increase the accuracy and accessibility of GBS data analysis. Compared to other GBS pipelines, GB-eaSy rapidly and accurately identified the greatest number of SNPs, with SNP calls closely concordant with whole-genome sequencing of selected lines. Across all five GBS analysis platforms, SNP calls showed unexpectedly low convergence but generally high accuracy, indicating that the workflows arrived at largely complementary sets of valid SNP calls on the low-coverage data analyzed.
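
          The two properties the Results contrast, convergence between pipelines and accuracy against whole-genome sequencing, can be made concrete with a short sketch. It assumes each call set is a Python dict mapping a (chromosome, position) site to an unphased VCF-style genotype string such as "0/1". The function names, the use of the Jaccard index as the convergence measure, and the toy data are illustrative choices, not the metrics or code used in the article.

```python
def normalise(gt):
    """Treat a genotype as an unordered allele pair, so '1/0' equals '0/1'."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def site_overlap(calls_a, calls_b):
    """Jaccard index of the SNP sites reported by two pipelines."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def genotype_concordance(gbs_calls, wgs_calls):
    """Return (fraction of agreeing genotypes, number of sites compared),
    using only sites with a non-missing call in both sets."""
    compared = agree = 0
    for site, gt in gbs_calls.items():
        if site not in wgs_calls:
            continue
        g, w = normalise(gt), normalise(wgs_calls[site])
        if "." in g or "." in w:  # missing genotype in either call set
            continue
        compared += 1
        agree += g == w
    return (agree / compared if compared else float("nan")), compared


if __name__ == "__main__":
    # Made-up calls for one sample; "Gm" chromosome names follow soybean convention.
    pipeline_a = {("Gm01", 100): "0/0", ("Gm01", 250): "1/1", ("Gm02", 40): "0/1"}
    pipeline_b = {("Gm01", 250): "1/1", ("Gm03", 7): "0/0"}
    wgs = {("Gm01", 100): "0/0", ("Gm01", 250): "1/1", ("Gm02", 40): "1/0"}
    print(site_overlap(pipeline_a, pipeline_b))   # 0.25
    print(genotype_concordance(pipeline_a, wgs))  # (1.0, 3)
    print(genotype_concordance(pipeline_b, wgs))  # (1.0, 1)
```

          In the toy data the two pipelines share only one of four sites (low convergence), yet every genotype they report agrees with the WGS calls (high accuracy), which is qualitatively the pattern described above.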

          Conclusions

          We show that GB-eaSy is approximately as good as, or better than, other leading software solutions in the accuracy, yield and missing data fraction of variant calling, as tested on low-coverage genomic data from soybean. It also performs well relative to other solutions in terms of the run time and disk space required. In addition, GB-eaSy is built from existing open-source, modular software packages that are regularly updated and commonly used, making it straightforward to install and maintain. While GB-eaSy outperformed other individual methods on the datasets analyzed, our findings suggest that a comprehensive approach integrating the results from multiple GBS bioinformatics pipelines may be the optimal strategy to obtain the largest, most highly accurate SNP yield possible from low-coverage polyploid sequence data.
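
          The closing suggestion, integrating results from several pipelines, can be sketched in many ways. One simple option, offered here as an assumption rather than a procedure the authors describe, is to pool each pipeline's calls for a sample, count how many pipelines report each site, and keep a genotype only when a strict majority of the pipelines that called the site agree on it. The call-set layout (a dict of (chromosome, position) to genotype string) is assumed, and `merge_call_sets` and its `min_support` threshold are hypothetical.

```python
from collections import Counter


def normalise(gt):
    """Treat a genotype as an unordered allele pair, so '1/0' equals '0/1'."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def merge_call_sets(call_sets, min_support=1):
    """Combine SNP calls for one sample from several pipelines.

    call_sets maps a pipeline name to {site: genotype}. Returns
    {site: (genotype, n_supporting_pipelines)} for sites where a strict
    majority of the pipelines calling that site agree and at least
    min_support pipelines back the winning genotype.
    """
    votes = {}
    for calls in call_sets.values():
        for site, gt in calls.items():
            alleles = normalise(gt)
            if "." in alleles:  # ignore missing genotypes
                continue
            votes.setdefault(site, Counter())[alleles] += 1

    merged = {}
    for site, counter in votes.items():
        best, n_best = counter.most_common(1)[0]
        total = sum(counter.values())
        if 2 * n_best > total and n_best >= min_support:
            merged[site] = ("/".join(best), n_best)
    return merged


if __name__ == "__main__":
    # Made-up calls from three hypothetical pipelines for one sample.
    calls = {
        "pipeline_a": {("Gm01", 100): "0/0", ("Gm01", 250): "1/1"},
        "pipeline_b": {("Gm01", 250): "1/1", ("Gm02", 40): "0/1"},
        "pipeline_c": {("Gm01", 250): "0/1", ("Gm02", 40): "1/0"},
    }
    print(merge_call_sets(calls))
    print(merge_call_sets(calls, min_support=2))
```

          Raising `min_support` trades yield for confidence: with the toy data, `min_support=2` drops the site reported by only one pipeline. Whether majority voting, a plain union, or another rule works best on real low-coverage data is an empirical question this sketch does not settle.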

                Author and article information

                Contributors
                wicklan2@illinois.edu
                gbattu@illinois.edu
                Karen.Hudson@ars.usda.gov
                bdiers@illinois.edu
                mhudson@illinois.edu
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 28 December 2017
                Volume 18, article 586 (2017)
                Affiliations
                [1] Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (ISNI 0000 0004 1936 9991; GRID grid.35403.31)
                [2] Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (ISNI 0000 0004 1936 9991; GRID grid.35403.31)
                [3] HudsonAlpha Institute for Biotechnology, 601 Genome Way NW, Huntsville, AL 35806, USA (ISNI 0000 0004 0408 3720; GRID grid.417691.c)
                [4] USDA-ARS Crop Production and Pest Control Research Unit, 915 West State Street, West Lafayette, IN 47907, USA (ISNI 0000 0004 0404 0958; GRID grid.463419.d)
                Article
                DOI: 10.1186/s12859-017-2000-6
                PMCID: PMC5745977
                PMID: 29281959
                85401d19-4236-4205-b6be-808accc0a149
                © The Author(s). 2017

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 28 August 2017
                Accepted: 13 December 2017
                Funding
                Funded by: North Central Soybean Research Program
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                Keywords: GBS, WGS, bioinformatics pipelines, variant calling, soybean, crops