GSEApy: a comprehensive package for performing gene set enrichment analysis in Python

Abstract

Motivation

Gene set enrichment analysis (GSEA) is a commonly used algorithm for characterizing gene expression changes. However, currently available GSEA tools have a limited ability to analyze large datasets, which is particularly problematic for single-cell data. To overcome this limitation, we developed a GSEA package in Python (GSEApy) that can efficiently analyze large single-cell datasets.

Results

We present a package (GSEApy) that performs GSEA from either the command line or a Python environment. GSEApy uses a Rust implementation to calculate the same enrichment statistic as GSEA for a collection of pathways. The Rust implementation is 3-fold faster than the NumPy version of GSEApy (v0.10.8) and uses >4-fold less memory. GSEApy also provides an interface between Python and the Enrichr web services, as well as BioMart. The Enrichr application programming interface enables GSEApy to perform over-representation analysis for an input gene list. Furthermore, GSEApy consists of several tools, each designed to facilitate a particular type of enrichment analysis.
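The two statistics mentioned above can be illustrated with a minimal pure-Python sketch. It is not GSEApy's Rust code or its API: the function names `enrichment_score` and `ora_pvalue` are hypothetical, the first follows the weighted running-sum statistic described by Subramanian et al. (2005), and the second uses a one-sided hypergeometric test as a local stand-in for over-representation analysis (Enrichr computes its own statistics server-side). Both assume that every gene belongs to a common background of known size.

```python
from math import comb


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score (illustrative sketch).

    ranked_genes: gene names sorted by descending ranking metric.
    scores: ranking metric for each gene, in the same order.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    # Normalizer: total weighted metric over the genes that are in the set.
    norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if norm == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up on a hit (weighted by the metric), step down on a miss.
        running += abs(s) ** weight / norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


def ora_pvalue(gene_list, gene_set, background_size):
    """One-sided hypergeometric P(overlap >= observed) (illustrative sketch)."""
    query, members = set(gene_list), set(gene_set)
    k, n, K, N = len(query & members), len(query), len(members), background_size
    upper_tail = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return upper_tail / comb(N, n)


ranked = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(ranked, metric, {"A", "B"})
es_bottom = enrichment_score(ranked, metric, {"E", "F"})
p_overlap = ora_pvalue(["A", "B", "C"], ["A", "B", "D", "E"], background_size=10)
```

In this toy input, a set concentrated at the top of the ranking gives a positive score and a set concentrated at the bottom gives a negative one. A full GSEA run would additionally normalize the score and estimate significance by permutation, which this sketch omits.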

Availability and implementation

The new GSEApy with the Rust extension is deposited in PyPI: https://pypi.org/project/gseapy/. The GSEApy source code is freely available at https://github.com/zqfang/GSEApy, and the documentation is available at https://gseapy.rtfd.io/.

Supplementary information

Supplementary data are available at Bioinformatics online.

Author and article information

Contributors

Zhuoqing Fang:

ORCID: https://orcid.org/0000-0002-7418-1313

Xinyuan Liu:

ORCID: https://orcid.org/0000-0002-9754-0593

Gary Peltz:

ORCID: https://orcid.org/0000-0001-6191-7697

Zhiyong Lu: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date Collection: January 2023

Publication date (Electronic): 25 November 2022

Publication date PMC-release: 25 November 2022

Volume: 39

Issue: 1

Electronic Location Identifier: btac757

Affiliations

Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA

Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA 94305, USA

Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA

Author notes

To whom correspondence should be addressed. Email: gpeltz@stanford.edu

Article

Publisher ID: btac757

DOI: 10.1093/bioinformatics/btac757

PMC ID: 9805564

PubMed ID: 36426870

SO-VID: ec92aa66-4ec0-4454-b18a-7f8922b4d118

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 17 August 2022

Date revision received: 04 November 2022

Date: 20 November 2022

Date accepted: 22 November 2022

Date: 06 December 2022

Page count

Pages: 3

Funding

Funded by: National Institute of Health

Funded by: National Institute for Drug Addiction

Award ID: 5U01DA04439902
