Plant Public RNA‐seq Database: a comprehensive online database for expression analysis of ~45 000 plant public RNA‐Seq libraries


Abstract

Dear Editor, High‐throughput RNA sequencing (RNA‐seq) has become the most popular technology for profiling gene expression over the last decade, owing to its low cost and high coverage. As a result, the number of RNA‐seq libraries from the plant community has increased exponentially in recent years (Figure 1a). For major crops such as maize, rice, soybean, wheat and cotton, the plant community had accumulated a total of ~45 000 libraries by 2021 (Figure 1b). Several RNA‐seq databases for plants already exist, for example CoNekT, which hosts 750 rice and 574 maize RNA‐seq libraries (https://conekt.sbs.ntu.edu.sg/). However, these databases host only the already processed data from each study separately. Their expression values therefore cannot be compared directly among projects, because they were derived from different bioinformatic pipelines and were often mapped to different versions of the reference genomes. To take full advantage of this wealth of RNA‐seq data, an effort is urgently needed to integrate all publicly available libraries through a uniform processing pipeline and to curate them into an easy‐to‐use, searchable database. To address this challenge, we present a comprehensive web‐based platform, the Plant Public RNA‐seq Database (PPRD, http://ipf.sustech.edu.cn/pub/plantrna/). PPRD contains RNA‐seq libraries of maize (19 664), rice (11 726), soybean (4085), wheat (5816) and cotton (3483) collected from the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA), European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ) databases (Figure 1b). These libraries were manually curated to annotate the mutants, tissues, developmental stages and abiotic or biotic stresses they represent.
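Why a uniform pipeline matters can be illustrated with transcripts per million (TPM), a widely used within‐library normalization. The sketch below is an assumption for illustration only: it presumes gene‐level read counts and gene lengths are already available, and the quantification tool and expression unit that PPRD actually uses are documented in Appendix S1, not here. Gene names and numbers are invented toy values.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float],
                  lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert per-gene read counts of one library to transcripts per million.

    TPM_i = (c_i / l_i) / sum_j (c_j / l_j) * 1e6, with l in kilobases
    (the kilobase scaling cancels out, but matches the usual definition).
    """
    # Reads per kilobase: length-normalized counts for each gene.
    rpk = {g: counts[g] / (lengths_bp[g] / 1_000) for g in counts}
    total = sum(rpk.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Rescale so that the values within this library sum to one million.
    return {g: v / total * 1_000_000 for g, v in rpk.items()}


# Toy library with hypothetical gene names, counts and lengths.
counts = {"geneA": 100, "geneB": 200, "geneC": 50}
lengths = {"geneA": 1_000, "geneB": 4_000, "geneC": 500}
tpm = counts_to_tpm(counts, lengths)
print({g: round(v, 1) for g, v in tpm.items()})
# {'geneA': 400000.0, 'geneB': 200000.0, 'geneC': 400000.0}
```

Because TPM rescales every library to the same total, values from libraries quantified with the same annotation and procedure can be placed side by side more meaningfully; values computed against different genome versions may not even share gene IDs or lengths, which is the incompatibility described above.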
Besides showing expression patterns across tissues and developmental stages (Figure 1c–e), we annotated mutant‐related and treatment‐related groups in the maize, rice, soybean, wheat and cotton datasets (Figure 1f,g). To reduce quantification biases arising from different bioinformatic processes, we processed the data of each species with a unified pipeline against the most up‐to‐date reference genome (see the ‘Tutorials’ page and Appendix S1 for details). The database also provides hyperlinks for checking the expression levels of homologous genes in other plants and includes a built‐in online Integrative Genomics Viewer (IGV) (Figure 1h; Robinson et al., 2017).

Figure 1 Overview of the Plant Public RNA‐Seq Database. (a) Number of sequenced bases per year for Oryza sativa, Zea mays, Glycine max, Triticum aestivum and Gossypium hirsutum from 2010 to 2020. Bars indicate the bases deposited per year (GB); the line indicates the total number of bases (GB). GB, giga base pairs. (b) Basic summary of the RNA‐seq libraries. ‘Mutant‐related groups’ and ‘treatment‐related groups’ denote the number of groups used to analyse differential expression. (c–e) Tissue‐specific expression of marker genes: endosperm‐specific expression of ZmESR1 in maize (c, left panel), endosperm‐specific expression of Wx in rice (d, middle panel) and root‐specific expression of GmTIP4;1 in soybean (e, right panel). (f) Expression level of OsLecRK3 (LOC_Os04g12580) among the top 10 biotic stresses in rice. (g) Down‐regulated expression of OsLecRK3 (LOC_Os04g12580) among the top 10 treatment groups in rice. (h) Overview of the IGV. The mapped reads of OsLecRK3 show decreased abundance in drought stress‐related samples.

In general, PPRD supports searches by gene ID, library ID, BioProject ID, keyword or any combination of these terms within selected libraries.
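A combined search can be pictured as a conjunction of filters over curated library metadata. The following is a minimal sketch under stated assumptions: the `Library` record, its fields and the `search` function are hypothetical and do not describe PPRD's internal data model, and the library IDs, BioProject IDs and expression values are placeholders, not PPRD data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Library:
    """Hypothetical curated record for one RNA-seq library."""
    library_id: str                # placeholder library accession
    bioproject: str                # placeholder BioProject accession
    description: str               # curated annotation (tissue, stage, stress, ...)
    expression: Dict[str, float] = field(default_factory=dict)  # gene ID -> value


def search(libraries: Iterable[Library],
           gene_id: str,
           library_ids: Optional[Set[str]] = None,
           bioprojects: Optional[Set[str]] = None,
           keyword: Optional[str] = None) -> List[Tuple[str, float]]:
    """Return (library_id, value) for gene_id in every library passing all given filters."""
    hits = []
    for lib in libraries:
        # Filters left as None are skipped; the others must all match (logical AND).
        if library_ids is not None and lib.library_id not in library_ids:
            continue
        if bioprojects is not None and lib.bioproject not in bioprojects:
            continue
        if keyword is not None and keyword.lower() not in lib.description.lower():
            continue
        if gene_id in lib.expression:
            hits.append((lib.library_id, lib.expression[gene_id]))
    return hits


libs = [
    Library("LIB1", "PRJ_A", "leaf, drought stress", {"LOC_Os04g12580": 3.2}),
    Library("LIB2", "PRJ_A", "leaf, control", {"LOC_Os04g12580": 12.5}),
    Library("LIB3", "PRJ_B", "root, drought stress", {"LOC_Os04g12580": 1.1}),
]
print(search(libs, "LOC_Os04g12580", bioprojects={"PRJ_A"}, keyword="drought"))
# [('LIB1', 3.2)]
```

Each additional term only narrows the result set, which mirrors the "any combination of these terms" behaviour in spirit; the real site may combine terms differently.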
The query results are returned as tables and diagrams. To illustrate the database, we use the query results for OsDCL3a (LOC_Os01g68120), a key regulator of plant small RNA biogenesis (Wei et al., 2014). After ‘LOC_Os01g68120’ is entered in the ‘Google‐like’ search box, the ‘Information’ page returns basic information about this gene. PPRD also provides hyperlinks to further information on the corresponding gene in species‐specific websites, such as MaizeGDB for maize (Portwood et al., 2019), RGAP for rice (Kawahara et al., 2013) and SoyBase for soybean (Brown et al., 2021). The ‘Data Table’ page displays detailed information in a table, and various ‘Filter’ options allow users to select specific libraries. The ‘Data Plot’ page shows the results of expression comparisons in multiple interactive diagrams, including expression levels across tissues, developmental stages, and abiotic and biotic stresses, as well as up‐ or down‐regulated expression in mutant‐related or treatment‐related samples. The ‘CoExpression’ page lists genes co‐expressed with the queried gene, and the ‘IGV Online’ page visualizes the read‐mapping landscape of the local genomic region in selected libraries. In addition, a ‘Share’ function makes it easy to share results with others. We validated the database using genes with known tissue‐specific expression. Their expression levels in PPRD are consistent with previous studies, such as the endosperm‐specific expression of ZmESR1 (Zm00001d027820) in maize (Opsahl‐Ferstad et al., 1997), the endosperm‐specific expression of Wx (LOC_Os06g04200) in rice (Sano, 1984) and the root‐specific expression of GmTIP4;1 (Glyma.06G084600) in soybean (Song et al., 2016) (Figure 1c–e). PPRD also enables users to mine this large‐scale dataset efficiently.
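The ‘CoExpression’ page is described only by its output. One common way to produce such a list, assumed here and not taken from PPRD, is to correlate the query gene's expression profile across libraries with every other gene's profile and rank genes by the correlation coefficient. The sketch below uses Pearson correlation on log2(value + 1) profiles; the gene names and values are toy data.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant profile; report 0 here
    return sxy / math.sqrt(sxx * syy)


def coexpressed(matrix: Dict[str, List[float]], query: str,
                top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank genes by correlation with the query gene across the same libraries."""
    # log2(value + 1) dampens the influence of a few very high values.
    logged = {g: [math.log2(v + 1) for v in vals] for g, vals in matrix.items()}
    q = logged[query]
    scores = [(g, pearson(q, prof)) for g, prof in logged.items() if g != query]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:top_n]


# Rows: genes; columns: the same four libraries in the same order (toy values).
matrix = {
    "query": [1, 3, 7, 15],
    "geneX": [3, 7, 15, 31],
    "geneY": [15, 7, 3, 1],
    "geneZ": [1, 1, 1, 1],
}
print(coexpressed(matrix, "query"))
# [('geneX', 1.0), ('geneZ', 0.0), ('geneY', -1.0)]
```

Log‐transforming before correlating keeps a handful of very highly expressed libraries from dominating the coefficient; a rank‐based (Spearman) correlation is a common alternative design choice.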
The brown planthopper (BPH) is the most destructive insect pest of rice and has a massive impact on rice production, in part by transmitting viruses; OsLecRK3 (LOC_Os04g12580) is a crucial gene that confers resistance to BPH (Liu et al., 2015). As expected, OsLecRK3 showed higher expression in some virus‐related libraries (Figure 1f). Surprisingly, OsLecRK3 is down‐regulated in many drought‐related libraries, suggesting that OsLecRK3 may play a role in drought resistance (Figure 1g). The mapping details of this gene can also be visualized in the built‐in IGV browser (Figure 1h). This example shows the power of big data for providing novel insights and rapidly generating robust, testable hypotheses at no experimental cost. In summary, PPRD is a convenient, web‐accessible, user‐friendly RNA‐seq database that allows users to quickly scan gene expression across public RNA‐seq libraries of maize, rice, soybean, wheat or cotton. It returns results in multiple forms, as tables and diagrams, showing expression levels across tissues, developmental stages, and abiotic and biotic stresses, as well as differential expression in different mutants and treatments. Our previous Arabidopsis RNA‐seq database (ARS; Zhang et al., 2020) has also been updated recently, increasing the number of libraries from 20 068 to 28 164. We plan to update PPRD regularly by adding new libraries and new plant species. We believe PPRD will make transcriptome big data more available and accessible to members of the plant community.

Conflicts of interest

The authors declare no conflicts of interest.

Author contributions

H.Z., Y.Y., Y.L. and Y.S. analysed the data; H.Z. and Y.Y. processed the data and built the database and website; and J.Z. oversaw the study. Y.Y., H.Z. and J.Z. wrote the manuscript.

Supporting information

Appendix S1 Supplementary Methods.
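The OsLecRK3 example amounts to ranking curated treatment groups by how strongly a gene changes between control and treated libraries. The sketch below assumes a simple log2 fold change of group means with a pseudocount of 1; how PPRD actually scores its mutant‐related and treatment‐related groups is not described in this letter, and the group layout, group names and values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical group layout: expression values of one gene in the control and
# treatment libraries of a single curated comparison.
Group = Dict[str, List[float]]


def log2_fold_change(group: Group, pseudocount: float = 1.0) -> float:
    """log2((mean treatment + pseudocount) / (mean control + pseudocount))."""
    treated = mean(group["treatment"]) + pseudocount
    control = mean(group["control"]) + pseudocount
    return math.log2(treated / control)


def most_down_regulated(groups: Dict[str, Group],
                        top_n: int = 10) -> List[Tuple[str, float]]:
    """Return up to top_n groups with negative log2 fold change, most negative first."""
    scored = [(name, log2_fold_change(g)) for name, g in groups.items()]
    down = [s for s in scored if s[1] < 0]
    down.sort(key=lambda t: t[1])
    return down[:top_n]


groups = {
    "drought_1": {"control": [15.0, 15.0], "treatment": [3.0, 3.0]},
    "drought_2": {"control": [7.0, 7.0], "treatment": [3.0, 3.0]},
    "heat_1": {"control": [3.0, 3.0], "treatment": [7.0, 7.0]},
}
print(most_down_regulated(groups))
# [('drought_1', -2.0), ('drought_2', -1.0)]
```

A plain fold change ignores replicate variance; a full differential expression analysis would normally use a replicate‐aware statistical model such as those implemented in DESeq2 or edgeR.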


Most cited references 10


Kawahara Y, de la Bastide M, Hamilton JP, … (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. doi:10.1186/1939-8433-6-4



Robinson J, Thorvaldsdóttir H, Wenger A, … (2017). Variant review with the Integrative Genomics Viewer. Cancer Res 77(21): e31–e34.



Liu Y, Wu HK, Chen H, … (2015). A gene cluster encoding lectin receptor kinases confers broad‐spectrum and durable insect resistance in rice.


Author and article information

Contributors

Jixian Zhai:

ORCID: https://orcid.org/0000-0002-0217-0666

zhaijx@sustech.edu.cn

Journal

Journal ID (nlm-ta): Plant Biotechnol J

Journal ID (iso-abbrev): Plant Biotechnol J

Journal ID (doi): 10.1111/(ISSN)1467-7652

Journal ID (publisher-id): PBI

Title: Plant Biotechnology Journal

Publisher: John Wiley and Sons Inc. (Hoboken)

ISSN (Print): 1467-7644

ISSN (Electronic): 1467-7652

Publication date (Electronic): 06 March 2022

Publication date (Print): May 2022

Volume: 20

Issue: 5 ( doiID: 10.1111/pbi.v20.5 )

Pages: 806-808

Affiliations

[¹] Harbin Institute of Technology, Harbin, China

[²] Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China

[³] Institute of Plant and Food Science, Southern University of Science and Technology, Shenzhen, China

[⁴] Key Laboratory of Molecular Design for Plant Cell Factory of Guangdong Higher Education Institutes, Southern University of Science and Technology, Shenzhen, China

Author notes

* Correspondence (Tel +86‐755‐88018403; email zhaijx@sustech.edu.cn)

† These authors contributed equally to this work.


Article

Publisher ID: PBI13798

DOI: 10.1111/pbi.13798

PMC ID: 9055819

PubMed ID: 35218297

SO-VID: c038778f-4fdc-4f27-9860-7760258f6535

License:

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

History

Date received : 10 November 2021

Date accepted : 17 February 2022

Page count

Figures: 1, Tables: 0, Pages: 3, Words: 1712

Funding

Funded by: Program for Guangdong Introducing Innovative and Entrepreneurial Teams

Award ID: 2016ZT06S172


ScienceOpen disciplines: Biotechnology

Keywords: RNA‐seq, Oryza sativa, Zea mays, Glycine max, Triticum aestivum, Gossypium hirsutum, transcriptome, database


