      Comparative analysis of common alignment tools for single-cell RNA sequencing

          Abstract

          Background

          With the rise of single-cell RNA sequencing, new bioinformatic tools have been developed to handle its specific demands, such as quantifying unique molecular identifiers and correcting cell barcodes. Here, we benchmarked the most common alignment tools for single-cell RNA sequencing data on several datasets. We evaluated differences in whitelisting, gene quantification, and overall performance, as well as potential variations in clustering and in the detection of differentially expressed genes. We compared Cell Ranger version 6, STARsolo, Kallisto, Alevin, and Alevin-fry on 3 published human and mouse datasets, sequenced with different versions of the 10X sequencing protocol.
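To make the two demands named above concrete, the following is a minimal Python sketch of cell barcode correction against a whitelist and of counting unique molecular identifiers (UMIs) per cell and gene. It is an illustration under simplifying assumptions, not the algorithm of any benchmarked tool: a barcode is rescued only if exactly one whitelisted barcode lies within one substitution, UMIs are deduplicated by exact match, and the function names and read tuples are hypothetical.

```python
from collections import defaultdict


def hamming_neighbors(barcode, alphabet="ACGT"):
    """Yield every sequence exactly one substitution away from `barcode`."""
    for i, base in enumerate(barcode):
        for alt in alphabet:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def correct_barcode(barcode, whitelist):
    """Return a whitelisted barcode, or None if no unambiguous rescue exists.

    Assumption: only single-substitution errors are corrected, and only
    when exactly one whitelisted candidate is found.
    """
    if barcode in whitelist:
        return barcode
    candidates = {n for n in hamming_neighbors(barcode) if n in whitelist}
    return candidates.pop() if len(candidates) == 1 else None


def count_umis(reads, whitelist):
    """Count unique UMIs per (cell, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples; the result
    is a nested dict {cell: {gene: number_of_unique_umis}}.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        cell = correct_barcode(barcode, whitelist)
        if cell is not None:  # reads with unrescuable barcodes are dropped
            seen[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)
```

Real tools differ in exactly these choices (how the whitelist is derived, how ambiguous barcodes are resolved, whether UMIs with sequencing errors are collapsed), which is one plausible source of the differences in reported cell numbers.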

          Results

          Striking differences were observed in the overall runtime of the mappers. In addition, Kallisto and Alevin showed variation in the number of valid cells and in the number of detected genes per cell. Kallisto reported the highest number of cells; however, we observed an overrepresentation of cells with low gene content and unknown cell type. Conversely, Alevin rarely reported such low-content cells. Further variations were detected in the set of expressed genes. While STARsolo, Cell Ranger 6, Alevin-fry, and Alevin produced similar gene sets, Kallisto detected additional genes from the Vmn and Olfr gene families, which are likely mapping artefacts. We also observed differences in the mitochondrial content of the resulting cells when comparing a prefiltered annotation set to the full annotation set that includes pseudogenes and other biotypes.
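The per-cell quantities compared above (genes detected per cell, mitochondrial content, and genes reported by only one tool) can be computed from a count matrix along the following lines. This is a hedged sketch rather than the study's analysis code: the nested-dict input format, the function names, and the use of the `MT-`/`mt-` gene name prefixes to identify mitochondrial genes are assumptions made for illustration.

```python
def cell_qc(counts, mito_prefixes=("MT-", "mt-")):
    """Compute simple per-cell quality metrics.

    `counts` maps cell -> {gene: umi_count}. Returns
    {cell: (n_genes_detected, mitochondrial_fraction)}.
    Assumption: mitochondrial genes are recognised by name prefix.
    """
    qc = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        mito = sum(n for g, n in genes.items() if g.startswith(mito_prefixes))
        n_genes = sum(1 for n in genes.values() if n > 0)
        qc[cell] = (n_genes, mito / total if total else 0.0)
    return qc


def tool_specific_genes(detected):
    """Return, for each tool, the genes that no other tool reported.

    `detected` maps tool name -> set of detected gene names.
    """
    specific = {}
    for tool, genes in detected.items():
        others = set().union(*(g for t, g in detected.items() if t != tool))
        specific[tool] = genes - others
    return specific
```

For example, with toy input `{"toolA": {"Actb", "Olfr1"}, "toolB": {"Actb"}}`, `tool_specific_genes` flags `Olfr1` as specific to `toolA`; genes flagged this way are candidates for closer inspection, not proof of a mapping artefact.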

          Conclusion

          Overall, this study provides a detailed comparison of common single-cell RNA sequencing mappers and shows their specific properties on 10X Genomics data.


                Author and article information

                Journal
                GigaScience
                Oxford University Press
                ISSN: 2047-217X
                Published: 27 January 2022
                Volume: 11
                Article ID: giac001
                Affiliations
                Institute of Cardiovascular Regeneration, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
                Cardio-Pulmonary Institute (CPI), Theodor-Stern-Kai 7, 60590 Frankfurt, Germany
                German Center for Cardiovascular Research (DZHK), Potsdamer Str. 58, 10785 Berlin, Germany
                Author notes
                Correspondence address. David John, Institute for Cardiovascular Regeneration, Centre of Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, 60590 Frankfurt, Germany. E-mail: john@med.uni-frankfurt.de
                Author information
                https://orcid.org/0000-0002-5867-576X
                https://orcid.org/0000-0002-1252-3656
                https://orcid.org/0000-0002-1045-2436
                https://orcid.org/0000-0003-3217-5449
                Article
                giac001
                DOI: 10.1093/gigascience/giac001
                PMCID: PMC8848315
                PMID: 35084033
                b3c20687-42f3-462e-b462-3ef18fb8d633
                © The Author(s) 2022. Published by Oxford University Press GigaScience.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 30 April 2021
                Revised: 07 October 2021
                Accepted: 27 December 2021
                Page count
                Pages: 12
                Funding
                Funded by: Dr. Robert Schwiete Foundation;
                Funded by: Cardio-Pulmonary Institute Frankfurt;
                Funded by: German Center for Cardiovascular Research;
                Categories
                Research
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

                Keywords: benchmarking, single-cell RNA sequencing, mapping-algorithms, aligners, transcriptomics, mappers
