phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data


Abstract

Background

The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization, and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are frequently difficult (or impossible) for peer researchers to reproduce independently. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high-throughput microbiome census data.

Results

Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.
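The core idea of an object-oriented census representation is that the abundance table, the sample metadata, and the taxonomy travel together, so that subsetting, filtering, and agglomeration keep all three consistent. The following is a minimal Python sketch of that idea under stated assumptions; it is not phyloseq's R API. The class `CensusData`, its methods `keep_samples`, `drop_rare_taxa`, and `agglomerate`, and the toy data are all hypothetical, chosen only to be loosely analogous in spirit to phyloseq's `subset_samples`, `filter_taxa`, and `tax_glom`.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class CensusData:
    """Hypothetical container binding the three tables of a microbiome census."""

    otu: dict[str, dict[str, int]]          # otu[taxon][sample] -> read count
    sample_data: dict[str, dict[str, str]]  # sample_data[sample] -> metadata
    tax: dict[str, dict[str, str]]          # tax[taxon][rank] -> taxon name

    def keep_samples(self, keep: Callable[[dict[str, str]], bool]) -> CensusData:
        """Keep samples whose metadata satisfy `keep`; prune taxa left with no reads."""
        samples = {s for s, meta in self.sample_data.items() if keep(meta)}
        otu = {
            t: {s: c for s, c in row.items() if s in samples}
            for t, row in self.otu.items()
        }
        otu = {t: row for t, row in otu.items() if sum(row.values()) > 0}
        return CensusData(
            otu,
            {s: self.sample_data[s] for s in samples},
            {t: self.tax[t] for t in otu},
        )

    def drop_rare_taxa(self, min_total: int) -> CensusData:
        """Remove taxa whose total count across all samples is below `min_total`."""
        otu = {t: row for t, row in self.otu.items() if sum(row.values()) >= min_total}
        return CensusData(otu, dict(self.sample_data), {t: self.tax[t] for t in otu})

    def agglomerate(self, rank: str) -> CensusData:
        """Sum counts of all taxa sharing the same name at `rank` (e.g. "Phylum")."""
        merged: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for t, row in self.otu.items():
            # Assumption: taxa lacking an assignment at this rank are pooled as "unclassified".
            group = self.tax[t].get(rank, "unclassified")
            for s, c in row.items():
                merged[group][s] += c
        otu = {g: dict(row) for g, row in merged.items()}
        return CensusData(otu, dict(self.sample_data), {g: {rank: g} for g in otu})


# Hypothetical toy data: three taxa counted in two samples.
toy = CensusData(
    otu={
        "otu1": {"A": 10, "B": 0},
        "otu2": {"A": 5, "B": 7},
        "otu3": {"A": 1, "B": 0},
    },
    sample_data={"A": {"site": "soil"}, "B": {"site": "gut"}},
    tax={
        "otu1": {"Phylum": "Firmicutes"},
        "otu2": {"Phylum": "Firmicutes"},
        "otu3": {"Phylum": "Bacteroidetes"},
    },
)

if __name__ == "__main__":
    print(toy.agglomerate("Phylum").otu)  # otu1 and otu2 are pooled into Firmicutes
    print(toy.keep_samples(lambda m: m["site"] == "gut").otu)
```

Each method returns a new object instead of modifying the existing one, so every intermediate step of an analysis can be kept, inspected, and re-run, which is the property that makes such a pipeline straightforward to document and reproduce.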

Conclusions

The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.


Author and article information

Contributors

Michael Watson (Role: Editor)

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS ONE

Journal ID (publisher-id): plos

Journal ID (pmc): plosone

Title: PLoS ONE

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Electronic): 1932-6203

Publication date (Collection): 2013

Publication date (Electronic): 22 April 2013

Volume: 8

Issue: 4

Electronic Location Identifier: e61217

Affiliations

[1] Department of Statistics, Stanford University, Stanford, California, United States of America

The Roslin Institute, University of Edinburgh, United Kingdom

Author notes

* E-mail: susan@stat.stanford.edu

Competing Interests: The authors have declared that no competing interests exist.

Designed and wrote the software described: PJM. Conceived and designed the experiments: PJM SH. Performed the experiments: PJM SH. Analyzed the data: PJM SH. Contributed reagents/materials/analysis tools: PJM SH. Wrote the paper: PJM SH.

Article

Publisher ID: PONE-D-12-31789

DOI: 10.1371/journal.pone.0061217

PMC ID: 3632530

PubMed ID: 23630581

SO-VID: efc22cbb-b83d-4112-8b5b-e565cdf80426

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 17 October 2012

Date accepted: 6 March 2013

Page count

Pages: 11

Funding

This work was supported by grant NIH-R01GM086884. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
