
      ODGI: understanding pangenome graphs

          Abstract

          Motivation

          Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult because existing tools do not support it well. Hence, fast and versatile software is required to ask advanced questions of such data in an efficient way.

          Results

          We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs.
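
          To make the variation-graph model concrete, the following is a minimal Python sketch of how nodes, edges and paths in a Graphical Fragment Assembly (GFA v1) file relate to each other. It does not use ODGI or reproduce its in-memory representation; the toy graph, the path names `sample_a` and `sample_b`, and the helper functions `parse_gfa`, `path_sequence` and `node_depth` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph in GFA v1: a single-base bubble (A vs. G)
# between two nodes shared by both paths.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tA",
    "S\t3\tG",
    "S\t4\tTTCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\tsample_a\t1+,2+,4+\t*",
    "P\tsample_b\t1+,3+,4+\t*",
])

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_gfa(text):
    """Collect segments (S), links (L) and paths (P) from GFA v1 text."""
    nodes, edges, paths = {}, [], {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # S <id> <sequence>
            nodes[fields[1]] = fields[2]
        elif fields[0] == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            edges.append((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # P <name> <id+/-,id+/-,...> <overlaps>
            steps = [(step[:-1], step[-1]) for step in fields[2].split(",")]
            paths[fields[1]] = steps
    return nodes, edges, paths


def path_sequence(nodes, steps):
    """Spell out the sequence a path walks through the graph."""
    parts = []
    for node_id, orient in steps:
        seq = nodes[node_id]
        if orient == "-":
            # Reverse strand: reverse-complement the node sequence.
            seq = seq.translate(COMPLEMENT)[::-1]
        parts.append(seq)
    return "".join(parts)


def node_depth(paths):
    """Count how many path steps visit each node."""
    depth = defaultdict(int)
    for steps in paths.values():
        for node_id, _ in steps:
            depth[node_id] += 1
    return dict(depth)


nodes, edges, paths = parse_gfa(TOY_GFA)
sequences = {name: path_sequence(nodes, steps) for name, steps in paths.items()}
depth = node_depth(paths)
print(sequences)
print(depth)
```

          In this toy graph, nodes visited by only one path mark the site where the two embedded sequences differ. The sketch ignores overlaps and optional tags and would not scale to gigabase-sized graphs.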

          Availability and implementation

          ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda (https://bioconda.github.io/recipes/odgi/README.html) or GNU Guix (https://github.com/pangenome/odgi/blob/master/guix.scm).

          Supplementary information

          Supplementary data are available at Bioinformatics online.

                Author and article information

                Journal
                Bioinformatics
                Oxford University Press
                ISSN: 1367-4803 (print), 1367-4811 (online)
                01 July 2022
                13 May 2022
                Volume: 38
                Issue: 13
                Pages: 3319-3326
                Affiliations
                Genomics Research Centre, Human Technopole, Milan 20157, Italy
                Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
                Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
                Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
                Author notes

                The authors wish it to be known that, in their opinion, Andrea Guarracino and Simon Heumos should be regarded as Joint First Authors.

                To whom correspondence should be addressed. Email: egarris5@uthsc.edu
                Author information
                https://orcid.org/0000-0001-9744-131X
                https://orcid.org/0000-0003-3326-817X
                https://orcid.org/0000-0002-4375-0691
                https://orcid.org/0000-0003-3821-631X
                Article
                Publisher ID: btac308
                DOI: 10.1093/bioinformatics/btac308
                PMC ID: 9237687
                PubMed ID: 35552372
                SO-VID: 7b9d4491-0a91-48f3-8b84-e58fe49ced55
                © The Author(s) 2022. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 09 November 2021
                : 18 March 2022
                : 23 April 2022
                : 28 May 2022
                Page count
                Pages: 8
                Funding
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Award ID: NIDA U01DA047638
                Funded by: NIGMS, DOI 10.13039/100000057;
                Award ID: R01GM123489
                Funded by: NSF PPoSS;
                Award ID: #2118709
                Funded by: Federal Ministry for Economic Affairs and Energy of Germany;
                Funded by: BMBF, DOI 10.13039/501100002347;
                Funded by: German Network for Bioinformatics Infrastructure, DOI 10.13039/501100018929;
                Award ID: 031A537B
                Award ID: 031A533A
                Award ID: 031A538A
                Award ID: 031A533B
                Award ID: 031A535A
                Award ID: 031A537C
                Award ID: 031A534A
                Award ID: 031A532B
                Categories
                Original Papers
                Genome Analysis
                AcademicSubjects/SCI01060

                Bioinformatics & Computational biology
