      De novo assembly of the cattle reference genome with single-molecule sequencing

      research-article
      GigaScience
      Oxford University Press
      bovine genome, reference assembly, cattle, Hereford


          Abstract

          Background

          Major advances in selection progress for cattle have been made following the introduction of genomic tools over the past 10–12 years. These tools depend upon the Bos taurus reference genome (UMD3.1.1), which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies.

          Results

          We present the new reference genome for cattle, ARS-UCD1.2, based on the same animal as the original to facilitate transfer and interpretation of results obtained from the earlier version, but applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly includes 2.7 Gb and is >250× more continuous than the original assembly, with contig N50 >25 Mb and L50 of 32. We also greatly expanded supporting RNA-based data for annotation that identifies 30,396 total genes (21,039 protein coding). The new reference assembly is accessible in annotated form for public use.
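The continuity figures quoted above (contig N50 and L50) follow from a simple calculation over contig lengths. Sort the contigs from longest to shortest and accumulate their lengths until the running total reaches half of the assembly size. N50 is the length of the contig at which this happens, and L50 is the number of contigs needed to get there. The following is a minimal sketch of that standard definition, not the authors' pipeline. The function name `n50_l50` and the toy contig lengths are illustrative assumptions and do not come from ARS-UCD1.2.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the contig at which the cumulative length of the
         longest-first sorted contigs reaches half of the total.
    L50: number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # (70, 2): 80 + 70 = 150, half of the 300 total
```

Read this way, an L50 of 32 with contig N50 >25 Mb means that the 32 longest contigs, each longer than 25 Mb, together cover at least half of the assembled sequence.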

          Conclusions

          We demonstrate that improved continuity of assembled sequence warrants the adoption of ARS-UCD1.2 as the new cattle reference genome and that increased assembly accuracy will benefit future research on this species.


                Author and article information

                Journal
                GigaScience
                Oxford University Press
                ISSN: 2047-217X
                Published: 19 March 2020
                Volume: 9
                Issue: 3
                Article ID: giaa021
                Affiliations
                [1] Animal Genomics and Improvement Laboratory, USDA-ARS, 10300 Baltimore Ave, Beltsville, MD 20705-2350, USA
                [2] Dairy Forage Research Center, USDA-ARS, 1925 Linden Drive, Madison, WI 53706, USA
                [3] Division of Animal Sciences, University of Missouri, 162 Animal Science Research Center, Columbia, MO 65211, USA
                [4] Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA
                [5] Pacific Biosciences, 1305 O'Brien Drive, Menlo Park, CA 94025, USA
                [6] The Davies Research Centre, School of Animal and Veterinary Sciences, University of Adelaide, Roseworthy, SA 5371, Australia
                [7] Johns Hopkins University, Welch Library of Medicine, Ste 105, 1900 E. Monument St., Baltimore, MD 21205, USA
                [8] Livestock Improvement Corporation, Private Bag 3016, Hamilton 3240, New Zealand
                [9] Department of Computer Science, University of Maryland, 8125 Paint Branch Drive, College Park, MD 20742, USA
                [10] Department of Animal and Veterinary Sciences, University of Vermont, Burlington, VT 05405, USA
                [11] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
                [12] Department of Animal and Veterinary Science, University of Idaho, 875 Perimeter Drive MS 2330, Moscow, ID 83844-2330, USA
                [13] U.S. Meat Animal Research Center, USDA-ARS, 844 Road 313, Clay Center, NE 68933, USA
                [14] The Pirbright Institute, Pirbright, Woking, Surrey, UK
                [15] Division of Livestock Sciences, University of Natural Resources and Life Sciences, Gregor Mendel Str. 33, A-1180 Vienna, Austria
                [16] Animal Science Department, Lilongwe University of Agriculture and Natural Resources, P.O. Box 219, Lilongwe, Malawi
                [17] Department of Animal and Food Sciences, Oklahoma State University, 101 Animal Science Building, Stillwater, OK 74078, USA
                [18] Computomics GmbH, Christophstr. 32, 72072 Tübingen, Germany
                [19] Department of Animal Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA
                Author notes
                Correspondence address. Benjamin D. Rosen, Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA. E-mail: ben.rosen@usda.gov
                Correspondence address. Timothy P.L. Smith, U.S. Meat Animal Research Center, USDA-ARS, Clay Center, NE, USA. E-mail: tim.smith2@usda.gov

                These authors contributed equally to this work.

                Author information
                http://orcid.org/0000-0001-9395-8346
                http://orcid.org/0000-0003-2223-9285
                http://orcid.org/0000-0001-5018-7641
                http://orcid.org/0000-0002-1472-8962
                http://orcid.org/0000-0002-4248-7713
                http://orcid.org/0000-0002-1074-5095
                http://orcid.org/0000-0001-5347-7695
                http://orcid.org/0000-0002-0749-765X
                http://orcid.org/0000-0001-5091-3092
                http://orcid.org/0000-0001-7410-9410
                http://orcid.org/0000-0001-6490-8227
                http://orcid.org/0000-0002-9006-0634
                http://orcid.org/0000-0003-1381-4081
                http://orcid.org/0000-0001-8675-3473
                http://orcid.org/0000-0001-6282-9728
                http://orcid.org/0000-0001-5321-1133
                http://orcid.org/0000-0002-2213-3248
                http://orcid.org/0000-0003-2057-1831
                http://orcid.org/0000-0002-2507-8133
                http://orcid.org/0000-0001-8295-020X
                http://orcid.org/0000-0001-8488-906X
                http://orcid.org/0000-0001-9103-5150
                http://orcid.org/0000-0003-2983-8934
                http://orcid.org/0000-0003-1242-4401
                http://orcid.org/0000-0002-8416-2087
                http://orcid.org/0000-0003-0192-6705
                http://orcid.org/0000-0003-1611-6828
                http://orcid.org/0000-0001-7113-3183
                Article
                Article ID: giaa021
                DOI: 10.1093/gigascience/giaa021
                PMCID: PMC7081964
                PMID: 32191811
                2fd1f830-4d73-4296-8cef-59439b8af23f
                © The Author(s) 2020. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 24 September 2019
                Revised: 31 January 2020
                Accepted: 14 February 2020
                Page count
                Pages: 9
                Funding
                Funded by: U.S. Department of Agriculture, DOI 10.13039/100000199;
                Award ID: 8042-31000-001-00-D
                Award ID: 8042-31000-002-00-D
                Award ID: 5090-31000-026-00-D
                Award ID: 3040-31000-100-00-D
                Funded by: National Institute of Food and Agriculture, DOI 10.13039/100005825;
                Award ID: 5090-31000-026-06-I
                Award ID: 2016-68004-24827
                Award ID: 2013-67015-21202
                Award ID: MO-HAAS0001
                Award ID: 2015-67015-23183
                Funded by: U.S. National Library of Medicine, DOI 10.13039/100000092;
                Funded by: National Institutes of Health, DOI 10.13039/100000002;
                Funded by: Biotechnology and Biological Sciences Research Council, DOI 10.13039/501100000268;
                Award ID: BB/M027155/1
                Award ID: BBS/E/I/00007035
                Award ID: BBS/E/I/00007038
                Award ID: BBS/E/I/00007039
                Funded by: National Human Genome Research Institute, DOI 10.13039/100000051;
                Funded by: Korea Health Industry Development Institute, DOI 10.13039/501100003710;
                Funded by: Ministry of Health, DOI 10.13039/501100004726;
                Award ID: HI17C2098
                Categories
                Data Note
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

