
      MUMmer4: A fast and versatile genome alignment system

      research-article


          Abstract

          The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems, including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.
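          The core change described above replaces a suffix tree with a suffix array. The Python sketch below illustrates how a suffix array exposes exact-match anchors. It finds maximal unique matches (MUMs) between a reference and a query using a suffix array and an LCP array over the concatenated, separator-delimited sequences. For scale, 2^47 ≈ 1.41 × 10^14, which matches the 141 Tbp figure quoted above.

          This is a textbook construction under stated assumptions, not MUMmer4's implementation:
          • The function names `suffix_array`, `lcp_array` and `find_mums` are hypothetical.
          • Suffix sorting is naive, with quadratic memory.
          • Only the forward strand is searched.
          • The clustering and extension steps that nucmer applies on top of such anchors are omitted.

```python
# Illustrative sketch only (hypothetical names): maximal unique matches (MUMs)
# via a plain suffix array + LCP array. Not MUMmer4's actual implementation.

def suffix_array(s: str) -> list[int]:
    # Naive construction: sort suffix start positions by suffix text.
    # Quadratic memory; suitable for toy inputs only.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[r] = longest common prefix of the suffixes
    # at ranks r-1 and r; lcp[0] = 0. Linear time given sa.
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for p in range(n):
        r = rank[p]
        if r == 0:
            h = 0
            continue
        q = sa[r - 1]
        while p + h < n and q + h < n and s[p + h] == s[q + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


def find_mums(ref: str, query: str, min_len: int) -> list[tuple[int, int, int]]:
    """Return sorted (ref_pos, query_pos, length) tuples for MUMs >= min_len.

    A MUM occurs exactly once in ref and once in query and cannot be
    extended left or right. Forward strand only.
    """
    if min_len < 1:
        raise ValueError("min_len must be >= 1")
    if any(c in ref + query for c in "#$"):
        raise ValueError("'#' and '$' are reserved as separators")
    s = ref + "#" + query + "$"  # unique separators stop matches at sequence ends
    n, qstart = len(s), len(ref) + 1
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    mums = []
    for i in range(1, n):
        a, b, l = sa[i - 1], sa[i], lcp[i]
        if l < min_len:
            continue
        # Exactly one of the two adjacent suffixes must start in ref.
        if (a < len(ref)) == (b < len(ref)):
            continue
        # Uniqueness: the block of suffixes sharing this length-l prefix
        # must be exactly ranks i-1 and i (no third occurrence anywhere).
        if lcp[i - 1] >= l or (i + 1 < n and lcp[i + 1] >= l):
            continue
        r, q = (a, b) if a < len(ref) else (b, a)
        # Left-maximality: preceding characters must differ (right-maximality
        # already holds because l is the full common prefix length).
        if r > 0 and s[r - 1] == s[q - 1]:
            continue
        mums.append((r, q - qstart, l))
    return sorted(mums)
```

          For example, `find_mums("TTTTGATTACAGGGG", "CCCCGATTACACCCC", 5)` returns `[(4, 4, 7)]`: the shared `GATTACA` at position 4 in both sequences. Its suffixes `ATTACA` and `TTACA` are rejected because they can be extended to the left.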


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Writing – original draft, Writing – review & editing
                Roles: Data curation, Software, Validation, Writing – review & editing
                Roles: Data curation, Software, Validation, Writing – review & editing
                Roles: Formal analysis, Validation
                Roles: Funding acquisition, Methodology, Validation, Writing – review & editing
                Roles: Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X (print), 1553-7358 (electronic)
                Published: 26 January 2018 (January 2018 issue)
                Volume 14, Issue 1, e1005944
                Affiliations
                [1 ] Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
                [2 ] Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
                [3 ] Center for Computational Biology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
                [4 ] National Human Genome Research Institute, Bethesda, Maryland, United States of America
                [5 ] Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, Maryland, United States of America
                University of Technology Sydney, AUSTRALIA
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-5083-5925
                http://orcid.org/0000-0001-5091-3092
                Article
                Manuscript ID: PCOMPBIOL-D-17-01370
                DOI: 10.1371/journal.pcbi.1005944
                PMCID: PMC5802927
                PMID: 29373581

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 15 August 2017
                Accepted: 1 January 2018
                Page count
                Figures: 1, Tables: 6, Pages: 14
                Funding
                Funded by: National Institutes of Health (funder ID http://dx.doi.org/10.13039/100000002); Award ID: R01 GM083873
                Funded by: Gordon and Betty Moore Foundation (funder ID http://dx.doi.org/10.13039/100000936); Award ID: GBMF4554
                Funded by: National Science Foundation (funder ID http://dx.doi.org/10.13039/100000001); Award ID: IOS-1238231
                Funded by: National Science Foundation (funder ID http://dx.doi.org/10.13039/100000001); Award ID: ABR-PG-144893
                This research was supported in part by the U.S. National Institutes of Health under grant R01 GM083873 to Steven Salzberg, in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4554 to Carl Kingsford, and in part by National Science Foundation Grants IOS-1238231 to Jan Dvorak, IOS-144893 to Herbert Aldwinckle, Keithanne Mockaitis, Aleksey Zimin, James Yorke and Marcela Yepes. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Article type: Research Article
                Subject areas: Sequence Alignment; Multiple Alignment Calculation; Plant Genomics; Human Genomics; Animal Genomics; Arabidopsis Thaliana; Chimpanzees; Computer Software
                Custom metadata
                Version of record (update to uncorrected proof): 2018-02-07
                The data used for this paper is available from the NCBI SRA https://www.ncbi.nlm.nih.gov/sra, and from the Cold Spring Harbor Laboratory web site http://schatzlab.cshl.edu/data/ectools/.

                Quantitative & Systems biology
