      phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data

      research-article
      PLoS ONE
      Public Library of Science


          Abstract

          Background

          The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.

          Results

          Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.

          Conclusions

          The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.


                Author and article information

                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 22 April 2013
                Volume: 8
                Issue: 4
                Article number: e61217
                Affiliations
                [1] Department of Statistics, Stanford University, Stanford, California, United States of America
                Editor's affiliation: The Roslin Institute, University of Edinburgh, United Kingdom
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Designed and wrote the software described: PJM. Conceived and designed the experiments: PJM SH. Performed the experiments: PJM SH. Analyzed the data: PJM SH. Contributed reagents/materials/analysis tools: PJM SH. Wrote the paper: PJM SH.

                Article
                Manuscript number: PONE-D-12-31789
                DOI: 10.1371/journal.pone.0061217
                PMCID: PMC3632530
                PMID: 23630581
                efc22cbb-b83d-4112-8b5b-e565cdf80426
                Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 17 October 2012
                Accepted: 6 March 2013
                Page count
                Pages: 11
                Funding
                This work was supported by grant NIH-R01GM086884. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Genomics
                Metagenomics
                Biological Data Management
                Sequence Analysis
                Genomics
                Metagenomics
                Microbiology
                Applied Microbiology
                Microbial Ecology
                Population Biology
                Population Ecology
                Population Genetics
                Computer Science
                Programming Languages
                High Level Languages

