hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data

Summary

Biological systems are immensely complex, organized into a multi-scale hierarchy of functional units based on tightly regulated interactions between distinct molecules, cells, organs, and organisms. While experimental methods enable transcriptome-wide measurements across millions of cells, popular bioinformatic tools do not support systems-level analysis. Here we present hdWGCNA, a comprehensive framework for analyzing co-expression networks in high-dimensional transcriptomics data such as single-cell and spatial RNA sequencing (RNA-seq). hdWGCNA provides functions for network inference, gene module identification, gene enrichment analysis, statistical tests, and data visualization. Beyond conventional single-cell RNA-seq, hdWGCNA is capable of performing isoform-level network analysis using long-read single-cell data. We showcase hdWGCNA using data from autism spectrum disorder and Alzheimer’s disease brain samples, identifying disease-relevant co-expression network modules. hdWGCNA is directly compatible with Seurat, a widely used R package for single-cell and spatial transcriptomics analysis, and we demonstrate the scalability of hdWGCNA by analyzing a dataset containing nearly 1 million cells.

Highlights

• hdWGCNA constructs co-expression networks in high-dimensional transcriptomics data

• hdWGCNA provides tools for statistics, visualization, and downstream interpretation

• hdWGCNA is an open-source R package that uses Seurat data structures

• hdWGCNA in human diseases demonstrates real-world analysis in complex datasets

Motivation

Single-cell and spatial transcriptomics assays are commonly used to profile the molecular signatures of biological systems, yielding high-dimensional datasets that can be used to model gene regulation across cell types, cell states, and spatial niches. Many statistical tools for high-dimensional transcriptomics data analysis focus on individual features rather than the underlying network structure, ignoring potential interactions between transcripts or genes. Here, we introduce hdWGCNA, a comprehensive methodological framework for the inference, analysis, and interpretation of gene co-expression networks in high-dimensional transcriptomics data. hdWGCNA is implemented as an open-source R package that extends the Seurat ecosystem of data analysis tools.

Abstract

Morabito et al. present hdWGCNA, an open-source R package for gene co-expression network analysis in single-cell and spatial transcriptomics data. hdWGCNA builds networks of genes using correlation information in specific cell subpopulations and spatial domains. Applications of hdWGCNA in autism spectrum disorder and Alzheimer’s disease revealed disease-associated gene networks.
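
To make the idea of building a gene network from correlations within one cell subpopulation concrete, here is a minimal sketch in plain Python. It is a generic WGCNA-style illustration and not hdWGCNA's actual implementation or API: the function names, the toy expression values, the cell-type labels, and the soft power of 6 are all assumptions chosen for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def weighted_adjacency(expr, cell_groups, group, soft_power=6):
    """Unsigned weighted adjacency |cor|^soft_power between gene pairs,
    computed only from the cells labelled `group`.

    expr        : dict mapping gene name -> list of values, one per cell
    cell_groups : list of group labels, one per cell
    soft_power  : illustrative exponent (an assumption, not a tuned value)
    """
    keep = [i for i, g in enumerate(cell_groups) if g == group]
    genes = sorted(expr)
    sub = {g: [expr[g][i] for i in keep] for g in genes}
    return {
        (g1, g2): abs(pearson(sub[g1], sub[g2])) ** soft_power
        for g1 in genes
        for g2 in genes
        if g1 < g2
    }


# Toy data, invented for illustration: three genes measured in six cells.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 9.0, 1.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 0.0, 5.0],
    "geneC": [4.0, 1.0, 3.0, 2.0, 7.0, 7.0],
}
cell_groups = ["astro", "astro", "astro", "astro", "neuron", "neuron"]

# Network restricted to the hypothetical "astro" subpopulation.
adj = weighted_adjacency(expr, cell_groups, "astro")
for pair, weight in sorted(adj.items()):
    print(pair, round(weight, 6))
```

Raising the absolute correlation to a power suppresses weak correlations while keeping strong ones close to 1, which is the usual motivation for a weighted rather than a hard-thresholded network. Restricting the calculation to one group of cells is what lets the resulting network reflect a specific subpopulation instead of differences between cell types.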

Author and article information

Contributors

Vivek Swarup

Journal

Journal ID (nlm-ta): Cell Rep Methods

Journal ID (iso-abbrev): Cell Rep Methods

Title: Cell Reports Methods

Publisher: Elsevier

ISSN (Electronic): 2667-2375

Publication date PMC-release: 12 June 2023

Publication date Collection: 26 June 2023

Publication date (Electronic): 12 June 2023

Volume: 3

Issue: 6

Electronic Location Identifier: 100498

Affiliations

[1] Mathematical, Computational, and Systems Biology (MCSB) Program, University of California, Irvine, Irvine, CA, USA

[2] Center for Complex Biological Systems (CCBS), University of California, Irvine, Irvine, CA, USA

[3] Institute for Memory Impairments and Neurological Disorders (MIND), University of California, Irvine, Irvine, CA, USA

[4] Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA

[5] Department of Neurobiology and Behavior, University of California, Irvine, Irvine, CA, USA

Author notes

[∗] Corresponding author: vswarup@uci.edu

[6] Lead contact

Article

Publisher Item ID: S2667-2375(23)00127-3

Publisher ID: 100498

DOI: 10.1016/j.crmeth.2023.100498

PMC ID: 10326379

PubMed ID: 37426759

SO-VID: 07c073bc-4d4a-42a7-a0a4-6d7e0cb81682

License:

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History

Date received: 5 October 2022

Date revision received: 13 February 2023

Date accepted: 16 May 2023
