      The de novo genome assembly and annotation of a female domestic dromedary of North African origin

      research-article


          Abstract

          The single‐humped dromedary (Camelus dromedarius) is the most numerous and widespread of the domestic camel species and is a significant source of meat, milk, wool, transportation and sport for millions of people. Dromedaries are particularly well adapted to hot, desert conditions and harbour a variety of biological and physiological characteristics with evolutionary, economic and medical importance. To understand the genetic basis of these traits, an extensive resource of genomic variation is required. In this study, we assembled, at 65× coverage, a 2.06 Gb draft genome of a female dromedary whose ancestry can be traced to an isolated population from the Canary Islands. We annotated 21 167 protein‐coding genes and estimated ~33.7% of the genome to be repetitive. A comparison with the recently published draft genome of an Arabian dromedary resulted in 1.91 Gb of aligned sequence with a divergence of 0.095%. An evaluation of our genome against the reference revealed that our assembly contains more error‐free bases (91.2%) and fewer scaffolding errors. We identified ~1.4 million single‐nucleotide polymorphisms with a mean density of 0.71 × 10⁻³ per base. An analysis of demographic history indicated that changes in effective population size corresponded with recent glacial epochs. Our de novo assembly provides a useful resource of genomic variation for future studies of the camel's adaptations to arid environments and economically important traits. Furthermore, these results suggest that draft genome assemblies constructed with only two differently sized sequencing libraries can be comparable to those sequenced using additional library sizes, highlighting that additional resources might be better placed in technologies alternative to short‐read sequencing to physically anchor scaffolds to genome maps.
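
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' pipeline: the function names are hypothetical, and it assumes that SNP density is the SNP count divided by the number of sites examined, that coverage is total sequenced bases divided by assembly size, and that divergence multiplied by aligned length approximates the number of differing sites.

```python
# Back-of-envelope checks on figures quoted in the abstract.
# All function names are hypothetical; the formulas are simplifying assumptions.

GENOME_SIZE_BP = 2.06e9      # draft assembly size (2.06 Gb)
COVERAGE = 65                # reported sequencing depth (65x)
SNP_COUNT = 1.4e6            # ~1.4 million SNPs
SNP_DENSITY = 0.71e-3        # SNPs per base
ALIGNED_BP = 1.91e9          # sequence aligned to the Arabian dromedary genome
DIVERGENCE = 0.095 / 100     # 0.095% expressed as a fraction


def implied_sites(snp_count, density_per_base):
    """Number of sites implied by a SNP count and a per-base density."""
    return snp_count / density_per_base


def total_sequenced_bases(genome_size_bp, coverage):
    """Approximate raw sequence volume implied by a coverage depth."""
    return genome_size_bp * coverage


def expected_differences(aligned_bp, divergence):
    """Approximate number of differing sites across an alignment."""
    return aligned_bp * divergence


if __name__ == "__main__":
    print(f"Sites implied by SNP density: {implied_sites(SNP_COUNT, SNP_DENSITY) / 1e9:.2f} Gb")
    print(f"Sequence implied by coverage: {total_sequenced_bases(GENOME_SIZE_BP, COVERAGE) / 1e9:.1f} Gb")
    print(f"Differing sites between assemblies: {expected_differences(ALIGNED_BP, DIVERGENCE):,.0f}")
```

Under these assumptions the SNP density implies roughly 1.97 Gb of examined sequence, slightly less than the 2.06 Gb assembly, which would be consistent with variants being called over only part of the assembly.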


                Author and article information

                Journal
                Mol Ecol Resour
                10.1111/(ISSN)1755-0998
                MEN
                Molecular Ecology Resources
                John Wiley and Sons Inc. (Hoboken)
                1755-098X
                1755-0998
                24 July 2015
                January 2016
                Volume: 16
                Issue: 1 (doiID: 10.1111/men.2016.16.issue-1)
                Pages: 314-324
                Affiliations
                [1] Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, Vienna 1210, Austria
                [2] Department of Mathematics and Statistics, University of Helsinki, Helsinki FIN‐0014, Finland
                [3] Present address: Department of Biology, Duke University, Durham, NC 27708, USA
                Author notes
                [*] Correspondence: Robert R. Fitak, Fax: 919‐660‐7293; E‐mail: robert.fitak@duke.edu
                Article
                MEN12443
                10.1111/1755-0998.12443
                4973839
                26178449
                © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd

                This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

                History
                : 02 February 2015
                : 22 June 2015
                : 25 June 2015
                Page count
                Pages: 11
                Funding
                Funded by: Academy of Finland to COIN centre of excellence
                Award ID: 251170
                Funded by: Austrian Science Foundation (FWF)
                Award ID: P24706‐B25
                Funded by: Austrian Academy of Sciences
                Award ID: 11506
                Categories
                Resource Article
                Permanent Genetic Resources

                Ecology
                adaptation, Camelus dromedarius, demography, domestication, next‐generation sequencing
