      Is Open Access

      A De Novo Genome Sequence Assembly of the Arabidopsis thaliana Accession Niederzenz-1 Displays Presence/Absence Variation and Strong Synteny

      research-article


          Abstract

          Arabidopsis thaliana is the most important model organism for fundamental plant biology. The genome diversity of different accessions of this species has been intensively studied, for example in the 1001 Genomes Project, which led to the identification of many single nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels). In addition, presence/absence variation (PAV), copy number variation (CNV) and mobile genetic elements contribute to genomic differences between A. thaliana accessions. To address larger genome rearrangements between the A. thaliana reference accession Columbia-0 (Col-0) and another accession of about average distance to Col-0, we created a de novo next-generation sequencing (NGS)-based assembly of the accession Niederzenz-1 (Nd-1). The result was evaluated with respect to assembly strategy and synteny to Col-0. We provide a high-quality genome sequence of the A. thaliana accession Nd-1 (LXSY01000000). The assembly displays an N50 of 0.590 Mbp and covers 99% of the Col-0 reference sequence. Scaffolds from the de novo assembly were positioned on the basis of sequence similarity to the reference. Errors in this automatic scaffold anchoring were manually corrected by analyzing reciprocal best BLAST hits (RBHs) of genes. Comparison of the final Nd-1 assembly to the reference revealed duplications and deletions (PAV). We identified 826 insertions and 746 deletions in Nd-1. Randomly selected PAV candidates were experimentally validated. Our Nd-1 de novo assembly allowed reliable identification of larger genic and intergenic variants, which is difficult or error-prone with short-read mapping approaches alone. While overall sequence similarity as well as synteny is very high, we detected short and larger (affecting more than 100 bp) differences between Col-0 and Nd-1 based on bi-directional comparisons. The de novo assembly provided here, together with additional assemblies that will certainly be published in the future, will make it possible to describe the pan-genome of A. thaliana.
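
Two quantities named in the abstract, the N50 of an assembly and reciprocal best BLAST hits (RBHs), can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the `(query, subject, bit score)` row layout, and the gene identifiers in the demo are all hypothetical, and real BLAST output would need to be parsed into this form first.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical row layout: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def best_hits(rows: Iterable[Hit]) -> Dict[str, str]:
    """Keep, for every query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    a_to_b = best_hits(a_vs_b)
    b_to_a = best_hits(b_vs_a)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


if __name__ == "__main__":
    # Toy scaffold lengths, not values from the Nd-1 assembly.
    print(n50([5, 4, 3, 2, 1]))

    # Toy hits with made-up identifiers and scores.
    nd1_vs_col0 = [("nd1_g1", "col0_gA", 200.0),
                   ("nd1_g1", "col0_gB", 150.0),
                   ("nd1_g2", "col0_gB", 180.0)]
    col0_vs_nd1 = [("col0_gA", "nd1_g1", 210.0),
                   ("col0_gB", "nd1_g1", 160.0),
                   ("col0_gB", "nd1_g2", 140.0)]
    print(reciprocal_best_hits(nd1_vs_col0, col0_vs_nd1))
```

In the toy data, `nd1_g2` has `col0_gB` as its best hit, but `col0_gB` scores higher against `nd1_g1`, so that pair is not reciprocal and is excluded. Requiring agreement in both directions is what makes RBHs a conservative anchor for checking scaffold placement.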


                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS ONE
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 6 October 2016
                Volume: 11
                Issue: 10
                Article number: e0164321
                Affiliations
                [1] Faculty of Biology, Bielefeld University, Bielefeld, Germany
                [2] Center for Biotechnology, Bielefeld University, Bielefeld, Germany
                Universiteit Gent, BELGIUM
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceptualization: BP DH TRS BW.

                • Data curation: BP DH TRS BW.

                • Formal analysis: BP DH.

                • Funding acquisition: BW.

                • Investigation: BP PV.

                • Methodology: BP DH BW.

                • Project administration: DH BW.

                • Resources: RS DH BW.

                • Software: BP.

                • Supervision: DH BW.

                • Validation: BP DH RS TRS BW.

                • Visualization: BP.

                • Writing – original draft: BP DH BW.

                • Writing – review & editing: BP DH RS BW.

                Author information
                http://orcid.org/0000-0002-7635-3473
                Article
                Manuscript number: PONE-D-16-31523
                DOI: 10.1371/journal.pone.0164321
                PMCID: 5053417
                PMID: 27711162
                © 2016 Pucker et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 6 August 2016
                Accepted: 22 September 2016
                Page count
                Figures: 2, Tables: 2, Pages: 23
                Funding
                This work was supported by institutional funds of the Chair of Genome Research at Bielefeld University. We acknowledge the financial support of the German Research Foundation (DFG) and the Open Access Publication Fund of Bielefeld University for the article processing charge. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Computational Biology > Genome Analysis > Sequence Assembly Tools
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Sequence Assembly Tools
                Biology and Life Sciences > Organisms > Plants > Brassica > Arabidopsis Thaliana
                Research and Analysis Methods > Model Organisms > Plant and Algal Models > Arabidopsis Thaliana
                Biology and Life Sciences > Genetics > Genomics > Plant Genomics
                Biology and Life Sciences > Biotechnology > Plant Biotechnology > Plant Genomics
                Biology and Life Sciences > Plant Science > Plant Biotechnology > Plant Genomics
                Biology and Life Sciences > Genetics > Plant Genetics > Plant Genomics
                Biology and Life Sciences > Plant Science > Plant Genetics > Plant Genomics
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Libraries
                Biology and Life Sciences > Genetics > Genomics > Repeated Sequences
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes > Chromosome Structure and Function > Centromeres
                Biology and Life Sciences > Computational Biology > Genome Analysis
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                Custom metadata
                The data sets supporting the results of this article are included within the article and its additional files. In addition, the assembly is available at GenBank/ENA under the accession number LXSY00000000.
