      A field guide for the compositional analysis of any-omics data


          Abstract

          Background

          Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared.
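
The dependence of counts on sequencing depth can be illustrated with a minimal sketch. The counts below are hypothetical, and `closure` is an illustrative helper name rather than a function from the article: the same sample "sequenced" at two depths yields different magnitudes but identical proportions.

```python
import math

def closure(counts):
    """Rescale a vector of counts so that its parts sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts for three features in one sample.
shallow = [10, 30, 60]
# The same sample at five times the sequencing depth (assumed, for illustration).
deep = [c * 5 for c in shallow]

# Magnitudes differ, but the relative abundances are the same.
same_proportions = all(
    math.isclose(a, b) for a, b in zip(closure(shallow), closure(deep))
)
print(sum(shallow), sum(deep), same_proportions)
```

Only the relative information survives the change in depth, which is why the raw magnitudes cannot be compared between samples directly.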

          Results

          Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data.

          Conclusions

          In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, “Relative to some important activity of the cell, what is changing?”
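
As a rough sketch of what a log-ratio transformation does (not the authors' implementation), the hypothetical helper `log_ratio` below expresses each feature relative to a reference: the geometric mean of the sample by default (the centered log-ratio), or a single chosen feature. It assumes strictly positive counts; zeros would have to be handled before taking logarithms.

```python
import math

def log_ratio(counts, ref_index=None):
    """Log-ratio transform a vector of strictly positive counts.

    With ref_index=None the reference is the geometric mean of the sample
    (centered log-ratio); otherwise it is the feature at ref_index.
    """
    logs = [math.log(c) for c in counts]
    if ref_index is None:
        log_ref = sum(logs) / len(logs)  # log of the geometric mean
    else:
        log_ref = logs[ref_index]        # log of the chosen reference feature
    return [v - log_ref for v in logs]

sample = [10, 30, 60]    # hypothetical counts
deeper = [50, 150, 300]  # same sample at five times the depth (assumed)

clr_a = log_ratio(sample)
clr_b = log_ratio(deeper)
ref_a = log_ratio(sample, ref_index=0)  # everything relative to feature 0
```

In this sketch `clr_a` and `clr_b` agree up to floating-point error, because the arbitrary depth factor cancels in every ratio. The choice of reference determines what "relative to" means when the transformed values are interpreted.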


                Author and article information

                Journal
                GigaScience
                Oxford University Press
                ISSN: 2047-217X
                Published: 23 September 2019
                Volume 8, Issue 9, giz107
                Affiliations
                [1] Bioinformatics Core Research Group, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
                [2] Centre for Molecular and Medical Research, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
                [3] Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
                [4] Department of Biochemistry, University of Western Ontario, 1151 Richmond Street, London ON N6A 3K7, Canada
                [5] Genomics Centre, School of Life and Environmental Sciences, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
                [6] Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia
                [7] Poultry Hub Australia, University of New England, Elm Avenue, Armidale New South Wales 2351, Australia
                Author notes
                Correspondence address. Thomas P. Quinn, Deakin University, 1 Gheringhap Street, Geelong Victoria 3220, Australia. E-mail: contacttomquinn@gmail.com

                Mark F. Richardson and Tamsyn M. Crowley contributed equally.

                Author information
                http://orcid.org/0000-0003-0286-6329
                http://orcid.org/0000-0001-5803-3380
                http://orcid.org/0000-0003-1461-0988
                http://orcid.org/0000-0002-3698-8917
                Article
                Article ID: giz107
                DOI: 10.1093/gigascience/giz107
                PMC ID: 6755255
                PubMed ID: 31544212
                © The Author(s) 2019. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 14 February 2019
                Revised: 10 July 2019
                Accepted: 12 August 2019
                Page count
                Pages: 14
                Categories
                Technical Note
