
      Fast and flexible bacterial genomic epidemiology with PopPUNK

          Abstract

          The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software package implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates’ divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species’ diverse evolutionary patterns. PopPUNK can process 10³–10⁴ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
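The abstract states that variable-length k-mer comparisons separate divergence in shared sequence (core) from divergence in gene content (accessory). The minimal Python sketch below illustrates that idea under an assumed model that the abstract itself does not spell out: the fraction of k-mers two genomes share at length k is taken to follow p_k = (1 − a)(1 − π)^k, so log p_k is linear in k, the slope yields a core divergence π and the intercept an accessory divergence a. The function names (kmer_set, jaccard, fit_core_accessory, core_accessory_distances) and the default k range are hypothetical choices for illustration only. Exact k-mer sets are compared here; a tool meant to handle thousands of genomes would instead estimate these fractions from compact sketches.

```python
import math
from typing import Sequence, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fit_core_accessory(ks: Sequence[int], probs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(p_k) = log(1 - a) + k * log(1 - pi).

    Assumed model (illustrative): returns (pi, a), where pi comes from the
    slope (core divergence) and a from the intercept (accessory divergence).
    """
    # Zero matches have no logarithm, so those k values are skipped.
    pts = [(k, math.log(p)) for k, p in zip(ks, probs) if p > 0]
    if len({k for k, _ in pts}) < 2:
        raise ValueError("need non-zero matches at two or more distinct k")
    n = len(pts)
    mean_k = sum(k for k, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((k - mean_k) ** 2 for k, _ in pts)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    # Clamp at zero: noisy real data can give a slightly positive slope or intercept.
    pi = max(0.0, 1.0 - math.exp(slope))
    a = max(0.0, 1.0 - math.exp(intercept))
    return pi, a


def core_accessory_distances(
    seq1: str, seq2: str, ks: Sequence[int] = range(13, 30, 4)
) -> Tuple[float, float]:
    """Compare two sequences at several k (hypothetical default: 13, 17, ..., 29)."""
    probs = [jaccard(kmer_set(seq1, k), kmer_set(seq2, k)) for k in ks]
    return fit_core_accessory(list(ks), probs)
```

For two assembled genomes held as strings, `core_accessory_distances(genome_a, genome_b)` returns the `(pi, a)` pair. Fitting across several k values, rather than using a single k, is what allows the two kinds of divergence to be separated: substitutions in shared sequence remove more k-mers as k grows, whereas gene gain or loss removes a roughly fixed fraction at every k.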

                Author and article information

                Journal
                Genome Research (Genome Res)
                Cold Spring Harbor Laboratory Press
                ISSN: 1088-9051, 1549-5469
                February 2019
                Volume 29, Issue 2, pages 304-316
                Affiliations
                [1] Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA;
                [2] Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom;
                [3] Department of Biostatistics, University of Oslo, 0372 Oslo, Norway;
                [4] Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland;
                [5] Institute of Infection and Global Health, University of Liverpool, Liverpool L7 3EA, United Kingdom;
                [6] Department of Pathology, University of Cambridge, Cambridge CB2 1QP, United Kingdom;
                [7] MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, United Kingdom
                Author information
                http://orcid.org/0000-0001-5360-1254
                http://orcid.org/0000-0003-1512-6194
                http://orcid.org/0000-0003-4397-2224
                http://orcid.org/0000-0002-7993-3051
                http://orcid.org/0000-0002-2182-0222
                http://orcid.org/0000-0001-7168-8090
                http://orcid.org/0000-0002-7752-1942
                http://orcid.org/0000-0001-8094-3751
                http://orcid.org/0000-0001-6303-8768
                Article ID: 9509184
                DOI: 10.1101/gr.241455.118
                PMCID: PMC6360808
                PMID: 30679308
                © 2019 Lees et al.; Published by Cold Spring Harbor Laboratory Press

                This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 5 July 2018
                Accepted: 10 December 2018
                Page count
                Pages: 13
                Funding
                Funded by: United States Public Health Service (Open Funder Registry 10.13039/100007197)
                Award IDs: AI038446, AI105168
                Funded by: Wellcome
                Award ID: 098051
                Funded by: Bill and Melinda Gates Foundation (Open Funder Registry 10.13039/100000865)
                Funded by: European Research Council (Open Funder Registry 10.13039/100010663)
                Award ID: 742158
                Funded by: Wellcome and the Royal Society
                Award ID: 104169/Z/14/Z
                Categories
                Method
