
      GSEApy: a comprehensive package for performing gene set enrichment analysis in Python

      brief-report
      Bioinformatics
      Oxford University Press


          Abstract

          Motivation

          Gene set enrichment analysis (GSEA) is a commonly used algorithm for characterizing gene expression changes. However, the currently available tools used to perform GSEA have a limited ability to analyze large datasets, which is particularly problematic for the analysis of single-cell data. To overcome this limitation, we developed a GSEA package in Python (GSEApy), which could efficiently analyze large single-cell datasets.

          Results

          We present a package (GSEApy) that performs GSEA in either the command line or Python environment. GSEApy uses a Rust implementation to enable it to calculate the same enrichment statistic as GSEA for a collection of pathways. The Rust implementation of GSEApy is 3-fold faster than the NumPy version of GSEApy (v0.10.8) and uses >4-fold less memory. GSEApy also provides an interface between Python and Enrichr web services, as well as for BioMart. The Enrichr application programming interface enables GSEApy to perform over-representation analysis for an input gene list. Furthermore, GSEApy consists of several tools, each designed to facilitate a particular type of enrichment analysis.
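
To make the enrichment statistic mentioned above more concrete, the following is a minimal pure-Python sketch of the weighted running-sum score described in the GSEA method. It is not GSEApy's Rust implementation and does not use GSEApy's API; the function name `enrichment_score`, its parameters, and the toy gene names are assumptions made purely for illustration, and permutation-based significance testing is left out.

```python
from typing import Dict, Iterable, List, Tuple


def enrichment_score(
    ranking: Dict[str, float],
    gene_set: Iterable[str],
    weight: float = 1.0,
) -> Tuple[float, List[float]]:
    """Illustrative GSEA-style running-sum enrichment score.

    ranking  : hypothetical mapping of gene name -> ranking metric
    gene_set : genes belonging to the pathway being tested
    weight   : exponent applied to |metric| for hits (0 gives an
               unweighted Kolmogorov-Smirnov-like statistic)
    Returns the signed maximum deviation from zero and the full curve.
    """
    # Rank genes from the most positive to the most negative metric.
    ranked = sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)
    members = set(gene_set) & set(ranking)

    n_total = len(ranked)
    n_hit = len(members)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Normalisation so that the hit increments sum to 1.
    hit_norm = sum(abs(score) ** weight for gene, score in ranked if gene in members)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    # Each miss lowers the running sum by an equal step; misses sum to 1.
    miss_step = 1.0 / (n_total - n_hit)

    running = 0.0
    best = 0.0
    curve: List[float] = []
    for gene, score in ranked:
        if gene in members:
            running += abs(score) ** weight / hit_norm
        else:
            running -= miss_step
        curve.append(running)
        # Track the largest deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best, curve


if __name__ == "__main__":
    # Toy data: gene names and metric values are made up for illustration.
    toy_ranking = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0}
    es, curve = enrichment_score(toy_ranking, {"A", "B"})
    print("enrichment score:", es)
    print("running sum:", curve)
```

In this sketch a gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score; a real analysis would additionally estimate significance by permutation and normalise scores across gene sets.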

          Availability and implementation

          The new GSEApy with Rust extension is deposited in PyPI: https://pypi.org/project/gseapy/. The GSEApy source code is freely available at https://github.com/zqfang/GSEApy. Also, the documentation website is available at https://gseapy.rtfd.io/.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


          Most cited references (19)


          Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles

          Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

            Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

            Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

              SCANPY: large-scale single-cell gene expression data analysis

              Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                ISSN (print): 1367-4803
                ISSN (online): 1367-4811
                January 2023
                25 November 2022
                Volume: 39
                Issue: 1
                Article ID: btac757
                Affiliations
                Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
                Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA 94305, USA
                Department of Anesthesia, Pain and Perioperative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
                Author notes
                To whom correspondence should be addressed. Email: gpeltz@stanford.edu
                Author information
                https://orcid.org/0000-0002-7418-1313
                https://orcid.org/0000-0002-9754-0593
                https://orcid.org/0000-0001-6191-7697
                Article
                DOI: 10.1093/bioinformatics/btac757
                PMC ID: PMC9805564
                PubMed ID: 36426870
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                17 August 2022
                04 November 2022
                20 November 2022
                22 November 2022
                06 December 2022
                Page count
                Pages: 3
                Funding
                Funded by: National Institutes of Health
                Funded by: National Institute for Drug Addiction
                Award ID: 5U01DA04439902
                Categories
                Applications Note
                Data and Text Mining

                Bioinformatics & Computational biology
