
      Clustering gene expression time series data using an infinite Gaussian process mixture model

      research-article


          Abstract

          Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. A common first step in analyzing transcriptional response data is to cluster genes with similar responses. Here, we present a nonparametric model-based method, the Dirichlet process Gaussian process mixture model (DPGP), which jointly models data clusters with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison to state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal shared regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.
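
The abstract states that a Dirichlet process models the data clusters, which is what allows the number of clusters to be inferred rather than fixed in advance. The following is a minimal sketch of that idea only, using the Chinese restaurant process view of a Dirichlet process prior. It is not the DPGP software's implementation; the function name `sample_crp_assignments` and the values of `alpha` and `n_genes` are assumptions chosen for illustration.

```python
import numpy as np

def sample_crp_assignments(n_genes, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process.

    alpha is the concentration parameter (illustrative value, not from the paper).
    """
    labels = np.empty(n_genes, dtype=int)
    counts = []  # counts[k] = number of genes currently assigned to cluster k
    for i in range(n_genes):
        # An existing cluster k has weight counts[k]; a brand-new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        labels[i] = k
    return labels

rng = np.random.default_rng(0)
labels = sample_crp_assignments(n_genes=200, alpha=1.0, rng=rng)
n_clusters = int(labels.max()) + 1
print(f"{n_clusters} clusters for 200 simulated genes")
```

In this sketch the number of clusters is an outcome of the sampling rather than an input: larger values of `alpha` tend to open more clusters, and large clusters attract new members in proportion to their size.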

          Author summary

          Transcriptome-wide measurement of gene expression dynamics can reveal regulatory mechanisms that control how cells respond to changes in the environment. Such measurements may identify hundreds to thousands of responsive genes. Clustering genes with similar dynamics reveals a smaller set of response types that can then be explored and analyzed for distinct functions. Two challenges in clustering time series gene expression data are selecting the number of clusters and modeling dependencies in gene expression levels between time points. We present a methodology, DPGP, in which a Dirichlet process clusters the trajectories of gene expression levels across time, where the trajectories are modeled using a Gaussian process. We demonstrate the performance of DPGP compared to state-of-the-art time series clustering methods across a variety of simulated data. We apply DPGP to published microbial expression data and find that it recapitulates known expression regulation with minimal user input. We then use DPGP to identify novel human gene expression responses to the widely prescribed synthetic glucocorticoid hormone dexamethasone. We find distinct clusters of responsive transcripts that are validated by considering between-cluster differences in transcription factor binding and histone modifications. These results demonstrate that DPGP can be used for exploratory data analysis of gene expression time series to reveal novel insights into biomedically important gene regulatory processes.
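
The summary describes trajectories modeled with a Gaussian process, which captures dependencies between time points, combined with cluster assignment. The sketch below illustrates one way such a combination can work: a gene's trajectory is scored against each cluster with a Gaussian process likelihood, and the score is weighted by cluster size. Several things here are assumptions rather than details from the source: the squared exponential kernel, the hyperparameter values, the use of fixed cluster mean trajectories, and the names `sq_exp_kernel` and `gp_log_likelihood`. It is not the authors' inference procedure.

```python
import numpy as np

def sq_exp_kernel(t1, t2, signal_var=1.0, length_scale=1.0):
    """Squared exponential covariance between time points (assumed kernel choice)."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_log_likelihood(t, y, mean, noise_var=0.1):
    """Log density of trajectory y under a GP with the given mean trajectory."""
    K = sq_exp_kernel(t, t) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)            # K = L @ L.T
    a = np.linalg.solve(L, y - mean)     # a @ a equals (y - mean)^T K^{-1} (y - mean)
    log_det_half = np.log(np.diag(L)).sum()  # equals 0.5 * log|K|
    return -0.5 * a @ a - log_det_half - 0.5 * len(t) * np.log(2 * np.pi)

# Hypothetical, unevenly spaced sampling times and two hypothetical cluster means.
t = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
cluster_means = [np.sin(t / 2.0), -np.sin(t / 2.0)]
cluster_sizes = np.array([30, 10])

# A gene whose trajectory resembles the first cluster.
y = np.sin(t / 2.0) + 0.05

# Unnormalized log posterior: log(cluster size) + GP log likelihood.
log_post = np.array([
    np.log(n) + gp_log_likelihood(t, y, m)
    for n, m in zip(cluster_sizes, cluster_means)
])
post = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
post /= post.sum()
print(post)
```

Because the kernel depends on the time difference between samples, unevenly spaced time points are handled without interpolation. A fuller treatment would also include the option of opening a new cluster and would integrate over each cluster's latent trajectory instead of fixing it.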


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Methodology, Software
                Roles: Data curation, Resources
                Roles: Investigation, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Data curation, Funding acquisition, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Date: 16 January 2018
                Volume: 14, Issue: 1, Article: e1005896
                Affiliations
                [1 ] Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina, United States of America
                [2 ] Center for Genomic & Computational Biology, Duke University, Durham, North Carolina, United States of America
                [3 ] Department of Biostatistics & Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America
                [4 ] Biology Department, Duke University, Durham, North Carolina, United States of America
                [5 ] Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
                [6 ] Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey, United States of America
                University of California Irvine, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0003-1532-6786
                http://orcid.org/0000-0001-5821-8000
                Article
                Manuscript number: PCOMPBIOL-D-17-00706
                DOI: 10.1371/journal.pcbi.1005896
                PMC ID: 5786324
                PubMed ID: 29337990
                1b40b62d-99a4-4f6d-940b-96baf35bd679
                © 2018 McDowell et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 3 May 2017
                Accepted: 25 November 2017
                Page count
                Figures: 5, Tables: 0, Pages: 27
                Funding
                National Institutes of Health (funder ID http://dx.doi.org/10.13039/100000002): R00 HG006265, R01 MH101822, U01 HG007900, F31 HL129743, 5T32GM071340
                Alfred P. Sloan Foundation (funder ID http://dx.doi.org/10.13039/100000879): Sloan Faculty Fellowship
                National Science Foundation (funder ID http://dx.doi.org/10.13039/100000001): MCB 1417750
                BEE was funded by National Institutes of Health R00 HG006265, National Institutes of Health R01 MH101822, National Institutes of Health U01 HG007900, and a Sloan Faculty Fellowship. CMV, ICM, and TER were funded by National Institutes of Health U01 HG007900. CMV was also funded by National Institutes of Health F31 HL129743. DM was funded by National Institutes of Health training grant 5T32GM071340. AKS was funded by National Science Foundation MCB 1417750 and NSF CAREER 1651117. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Genetics > Gene Expression
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Epigenetics > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Gene Expression > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Gene Expression > Histone Modification
                Research and Analysis Methods > Simulation and Modeling
                Physical Sciences > Mathematics > Probability Theory > Random Variables > Covariance
                Biology and Life Sciences > Genetics > Gene Expression > Gene Regulation
                Biology and Life Sciences > Genetics > Gene Expression > DNA Transcription
                Biology and Life Sciences > Cell Biology > Cell Processes > Cell Cycle and Cell Division
                Physical Sciences > Mathematics > Probability Theory > Probability Distribution
                Custom metadata
                Version-of-record update to uncorrected proof: 2018-01-26
                Data availability: All raw RNA-seq files are available from the Gene Expression Omnibus database (accession number GSE104714).

                Quantitative & Systems biology
