The Genomes of Oryza sativa: A History of Duplications


Abstract

We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, an almost 1,000-fold improvement over the drafts of 2002. When tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes align, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. After using the available EST data to adjust for residual errors in the predictions, we estimate the gene count to be at least 38,000–40,000. Only 2%–3% of the genes are unique to either subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. We also introduce a more inclusive approach for analyzing duplication history. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
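The SNP densities quoted above are ratios of differing bases to compared bases, scaled to 1,000 bp. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes two sequences that are already aligned column by column, a hypothetical per-column region label (such as `coding` or `TE`), and a hypothetical function name `snp_rates_per_kb`. Gap (`-`) and ambiguous (`N`) columns are skipped, and any quality filtering the study may have applied is not reproduced. The toy alignment is invented for illustration and is not data from the paper.

```python
from collections import defaultdict

# Only unambiguous nucleotides count as comparable sites (assumption of this sketch).
VALID_BASES = set("ACGT")


def snp_rates_per_kb(seq_a, seq_b, labels):
    """Hypothetical helper: return {label: SNPs per kb} for two pre-aligned sequences.

    seq_a, seq_b: equal-length aligned strings; '-' marks a gap, 'N' an unknown base.
    labels: one region label per alignment column (e.g. 'coding', 'TE').
    Columns where either sequence lacks an A/C/G/T base are skipped, so indels and
    unresolved bases contribute neither SNPs nor compared sites. Labels with no
    comparable columns are omitted from the result (avoids dividing by zero).
    """
    if not (len(seq_a) == len(seq_b) == len(labels)):
        raise ValueError("alignment rows and labels must have equal length")
    compared = defaultdict(int)  # comparable aligned sites per region label
    snps = defaultdict(int)      # mismatching sites per region label
    for a, b, label in zip(seq_a.upper(), seq_b.upper(), labels):
        if a in VALID_BASES and b in VALID_BASES:
            compared[label] += 1
            if a != b:
                snps[label] += 1
    # SNP/kb = 1000 * SNPs / compared sites, computed separately for each region class.
    return {label: 1000.0 * snps[label] / n for label, n in compared.items()}


if __name__ == "__main__":
    # Toy alignment: first five columns labelled coding, last five TE; one gap in TE.
    rates = snp_rates_per_kb("ACGTACGTAC", "ACGAACG-AT", ["coding"] * 5 + ["TE"] * 5)
    print(rates)  # {'coding': 200.0, 'TE': 250.0}
```

Stratifying by region label mirrors the abstract's contrast between coding regions and transposable elements; on real data the denominator would be the aligned portion of each region class, which is why unalignable sequence does not enter the rate.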


Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes.


Author and article information

Contributors

Role: Academic Editor

Journal

Journal ID (nlm-ta): PLoS Biol

Journal ID (publisher-id): pbio

Title: PLoS Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1544-9173

ISSN (Electronic): 1545-7885

Publication date (Print): February 2005

Publication date (Electronic): 1 February 2005

Volume: 3

Issue: 2

Electronic Location Identifier: e38

Affiliations

[1] Beijing Institute of Genomics of the Chinese Academy of Sciences, Beijing Genomics Institute, Beijing Proteomics Institute, Beijing, China

[2] James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou Genomics Institute, Key Laboratory of Genomic Bioinformatics of Zhejiang Province, Hangzhou, China

[3] College of Life Sciences, Peking University, Beijing, China

[4] Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China

[5] Beijing North Computation Center, Beijing, China

[6] BioInformatics Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

[7] Department of Statistics and Financial Mathematics, College of Mathematical Sciences, Beijing Normal University, Beijing, China

[8] Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China

[9] National Hybrid Rice R & D Center, Changsha, China

[10] Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America

[11] UW Genome Center, Department of Medicine, University of Washington, Seattle, Washington, United States of America

University of Georgia, United States of America

Article

DOI: 10.1371/journal.pbio.0030038

PMC ID: 546038

PubMed ID: 15685292

SO-VID: 4ed320aa-6775-4bc9-8a5c-27b36b522e28

Copyright: © 2005 Yu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
