      Is Open Access

      Graph pangenome captures missing heritability and empowers tomato breeding

      research-article


          Abstract

          Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1, 2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3, 4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.

          Abstract

          A precise catalogue of more than 19 million variants from 838 tomato genomes, including 32 new reference-level genome assemblies, advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
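
          The 24% figure quoted in the abstract is a relative increase, not a difference in heritability points: the average estimate rises from 0.33 to 0.41, a gain of 0.08 on a baseline of 0.33. A minimal sketch of that arithmetic follows. The function name `relative_increase` and the constant names are hypothetical and used only for illustration; the two input values are the averages reported in the abstract.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Return the change from baseline to improved as a fraction of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Average estimated trait heritabilities reported in the abstract.
H2_LINEAR_REFERENCE = 0.33  # single linear reference genome
H2_GRAPH_PANGENOME = 0.41   # graph pangenome

increase = relative_increase(H2_LINEAR_REFERENCE, H2_GRAPH_PANGENOME)
print(f"Relative increase: {increase:.0%}")
```

          The absolute gain is 0.08; dividing by the baseline of 0.33 gives roughly 0.24, which matches the percentage stated in the abstract.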

          Most cited references (79)


          WGCNA: an R package for weighted correlation network analysis

          Background: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial.

          Results: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings.

          Conclusion: The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at .

            PLINK: a tool set for whole-genome association and population-based linkage analyses.

            Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.

              A global reference for human genetic variation

              The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

                Author and article information

                Contributors
                huangsanwen@caas.cn
                Journal
                Nature
                Nature Publishing Group UK (London)
                ISSN (print): 0028-0836
                ISSN (electronic): 1476-4687
                Published: 8 June 2022
                Volume: 606
                Issue: 7914
                Pages: 527-534
                Affiliations
                [1] GRID grid.410727.7, ISNI 0000 0001 0526 1937, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
                [2] GRID grid.6341.0, ISNI 0000 0000 8578 2742, Umeå Plant Science Center, Department of Forestry Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
                [3] GRID grid.464357.7, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, and Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
                [4] GRID grid.418524.e, ISNI 0000 0004 0369 6250, Institute of Vegetables, Shandong Academy of Agricultural Sciences, Shandong Province Key Laboratory for Biology of Greenhouse Vegetables, Shandong Branch of National Improvement Center for Vegetables, Huang-Huai-Hai Region Scientific Observation and Experimental Station of Vegetables, Ministry of Agriculture and Rural Affairs, Jinan, China
                [5] GRID grid.22935.3f, ISNI 0000 0004 0530 8290, State Key Laboratory of Agrobiotechnology, College of Horticulture, China Agricultural University, Beijing, China
                [6] Boke Biotech, Wuxi, China
                [7] GRID grid.5386.8, ISNI 0000 0004 1936 877X, Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
                [8] GRID grid.508984.8, Robert W. Holley Center for Agriculture and Health, US Department of Agriculture, Agricultural Research Service, Ithaca, NY, USA
                [9] GRID grid.5801.c, ISNI 0000 0001 2156 2780, Institute of Integrative Biology & Zurich, Basel Plant Science Center, ETH Zurich, Zurich, Switzerland
                [10] GRID grid.266097.c, ISNI 0000 0001 2222 1582, Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
                [11] GRID grid.30064.31, ISNI 0000 0001 2157 6568, Department of Crop and Soil Sciences, Washington State University, Pullman, WA, USA
                [12] GRID grid.7048.b, ISNI 0000 0001 1956 2722, Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
                Author information
                http://orcid.org/0000-0001-9791-7664
                http://orcid.org/0000-0002-9466-9439
                http://orcid.org/0000-0003-3601-460X
                http://orcid.org/0000-0003-1579-4600
                http://orcid.org/0000-0002-1604-1988
                http://orcid.org/0000-0003-3127-4488
                http://orcid.org/0000-0002-9323-1101
                http://orcid.org/0000-0002-1160-1413
                http://orcid.org/0000-0001-9684-1450
                http://orcid.org/0000-0002-5784-9684
                http://orcid.org/0000-0002-0096-9765
                http://orcid.org/0000-0002-8547-5309
                Article
                4808
                DOI: 10.1038/s41586-022-04808-9
                PMCID: PMC9200638
                PMID: 35676474
                91570729-90e3-4b77-9669-5accf1ef51fd
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 4 October 2021
                Accepted: 27 April 2022
                Categories
                Article
                Custom metadata
                © The Author(s), under exclusive licence to Springer Nature Limited 2022

                Uncategorized
                structural variation, genomics, genome-wide association studies, agricultural genetics, plant breeding
