
      Metabolite signatures of diverse Camellia sinensis tea populations


          Abstract

          The tea plant (Camellia sinensis) presents an excellent system to study evolution and diversification of the numerous classes, types and variable contents of specialized metabolites. Here, we investigate the relationship among C. sinensis phylogenetic groups and specialized metabolites using transcriptomic and metabolomic data on the fresh leaves collected from 136 representative tea accessions in China. We obtain 925,854 high-quality single-nucleotide polymorphisms (SNPs) enabling the refined grouping of the sampled tea accessions into five major clades. Untargeted metabolomic analyses detect 129 and 199 annotated metabolites that are differentially accumulated in different tea groups in positive and negative ionization modes, respectively. Each phylogenetic group contains signature metabolites. In particular, CSA tea accessions are featured with high accumulation of diverse classes of flavonoid compounds, such as flavanols, flavonol mono-/di-glycosides, proanthocyanidin dimers, and phenolic acids. Our results provide insights into the genetic and metabolite diversity and are useful for accelerated tea plant breeding.



                Author and article information

                Journal
                Nature Communications
                Nat Commun
                Springer Science and Business Media LLC
                2041-1723
                December 2020
                November 04 2020
                Volume 11, Issue 1
                Article
                10.1038/s41467-020-19441-1
                © 2020

                https://creativecommons.org/licenses/by/4.0

