
      The cacao gene atlas: a transcriptome developmental atlas reveals highly tissue-specific and dynamically-regulated gene networks in Theobroma cacao L.

          Abstract

          Background

          Theobroma cacao, the cocoa tree, is a tropical crop grown for its highly valuable cocoa solids and fat, which are the basis of a 200-billion-dollar annual chocolate industry. However, the long generation time and the difficulties of breeding a tropical tree crop have limited breeders' progress in developing high-yielding, disease-resistant varieties. Developing marker-assisted breeding methods for cacao requires the discovery of genomic regions and specific alleles of genes encoding important traits of interest. To accelerate gene discovery, we developed a gene atlas composed of a large dataset of replicated transcriptomes, with the long-term goal of advancing the breeding of high-yielding elite varieties of cacao.

          Results

          We describe the creation of the Cacao Transcriptome Atlas and its global characterization, and define sets of genes co-regulated in highly organ- and temporally specific manners. RNAs were extracted and transcriptomes sequenced from 123 different tissues and stages of development representing major organs and developmental stages of the cacao lifecycle. In addition, several experimental treatments and time courses were performed to measure gene expression in tissues responding to biotic and abiotic stressors. Samples were collected in 3–5 replicates to enable statistical analysis of gene expression levels, for a total of 390 transcriptomes. To promote wide use of these data, all raw sequencing data, expression read mapping matrices, scripts, and other information used to create the resource are freely available online. We verified our atlas by analyzing the expression of genes with known functions and expression patterns in Arabidopsis (ACT7, LEA19, AGL16, TIP13, LHY, MYB2) and found their expression profiles to be generally similar between the two species. We also successfully identified tissue-specific genes at two thresholds in many of the tissue types represented, as well as a set of genes highly conserved across all tissues.
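          The abstract does not say which metric or which two thresholds were used to call tissue-specific genes. As an illustration only, the sketch below uses the tau index, a widely used tissue-specificity score that is 0 for a gene expressed evenly across tissues and 1 for a gene expressed in a single tissue. The cutoffs (0.80 and 0.95), the function names, and the example values are assumptions made for this sketch and are not taken from the article.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    `expression` holds one non-negative expression value per tissue
    (for example, a mean CPM per tissue). Returns a value in [0, 1].
    """
    values = [max(float(v), 0.0) for v in expression]
    n = len(values)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(values)
    if peak == 0.0:
        # Gene not detected anywhere: treat as not tissue-specific.
        return 0.0
    return sum(1.0 - v / peak for v in values) / (n - 1)


def classify(tau, relaxed=0.80, strict=0.95):
    """Label a gene using two cutoffs (hypothetical values, not the authors')."""
    if tau >= strict:
        return "strict"
    if tau >= relaxed:
        return "relaxed"
    return "broad"


if __name__ == "__main__":
    # Hypothetical per-tissue expression profiles for three made-up genes.
    profiles = {
        "gene_single_tissue": [0, 0, 10],
        "gene_graded": [10, 5, 0],
        "gene_uniform": [5, 5, 5],
    }
    for gene, profile in profiles.items():
        t = tau_index(profile)
        print(gene, round(t, 3), classify(t))
```

          Using two cutoffs, as here, lets a stricter list of near-exclusive genes sit alongside a more permissive list; the actual criteria used for the atlas should be taken from the article's methods.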

          Conclusion

          The Cacao Gene Atlas consists of a gene expression browser with a graphical user interface and open access to raw sequencing data files, as well as the unnormalized and CPM-normalized read count data mapped to several cacao genomes. The gene atlas is a publicly available resource that allows rapid mining of cacao gene expression profiles. We hope this resource will be used to help accelerate the discovery of important genes for key cacao traits such as disease resistance and contribute to the breeding of elite varieties to help farmers increase yields.
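          For readers unfamiliar with the term, CPM (counts per million) rescales each sample's raw read counts by that sample's total mapped reads, so that samples sequenced to different depths can be compared. The following is a minimal sketch of that arithmetic in plain Python. The sample and gene names are hypothetical, and the authors' own pipeline may apply additional normalization factors that are not shown here.

```python
def counts_per_million(counts_by_sample):
    """Convert raw read counts to CPM, one sample at a time.

    `counts_by_sample` maps sample name -> {gene name -> raw read count}.
    CPM = raw count / total counts in that sample * 1,000,000.
    """
    cpm = {}
    for sample, gene_counts in counts_by_sample.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            # Empty library: avoid division by zero and report zeros.
            cpm[sample] = {gene: 0.0 for gene in gene_counts}
            continue
        scale = 1_000_000 / library_size
        cpm[sample] = {gene: count * scale for gene, count in gene_counts.items()}
    return cpm


if __name__ == "__main__":
    # Hypothetical counts for two replicates with different sequencing depths.
    raw = {
        "leaf_rep1": {"geneA": 50, "geneB": 150, "geneC": 800},
        "leaf_rep2": {"geneA": 100, "geneB": 300, "geneC": 1600},
    }
    for sample, values in counts_per_million(raw).items():
        print(sample, values)
```

          In this example the second replicate has twice the raw counts of the first for every gene, yet both yield the same CPM values, which is the point of the normalization.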

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s12870-024-05171-9.

                Author and article information

                Contributors
                mjg9@psu.edu
                Journal
                BMC Plant Biology (BMC Plant Biol)
                BioMed Central (London)
                ISSN: 1471-2229
                26 June 2024
                24: 601
                Affiliations
                [1] Department of Plant Science, The Pennsylvania State University (https://ror.org/04p491231), University Park, PA 16802, USA
                [2] Department of Cell & Systems Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto (https://ror.org/03dbr7087), Toronto, ON, Canada
                [3] Huck Institute of the Life Sciences, The Pennsylvania State University (https://ror.org/04p491231), University Park, PA 16802, USA
                [4] USDA Animal and Plant Health Inspection Service (APHIS) (GRID grid.413759.d, ISNI 0000 0001 0725 8379), Riverdale, MD 20737, USA
                [5] Plant Sciences, Volcani-ARO (Agricultural and Rural Organization), Gilat, Israel
                [6] Children’s Hospital of Philadelphia (https://ror.org/01z7r7q48), Philadelphia, PA 19104, USA
                [7] Mars Inc (GRID grid.467419.9), Davis, CA 95616, USA
                [8] Battelle Memorial Institute (https://ror.org/01h5tnr73), Columbus, OH 43201, USA
                Article
                5171
                10.1186/s12870-024-05171-9
                11201900
                38926852
                efb127f9-ec1a-4a85-8cff-6a2b5fe50334
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 5 September 2023
                Accepted: 19 May 2024
                Funding
                Funded by: Mondelez International, Inc
                Funded by: FundRef http://dx.doi.org/10.13039/100005825, National Institute of Food and Agriculture;
                Award ID: Project #PEN04707
                Categories
                Research
                Custom metadata
                © BioMed Central Ltd., part of Springer Nature 2024

                Plant science & Botany
                transcriptome atlas,tissue-specificity,cacao genomics,gene expression
