
      Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes

      research-article


          Significance

          Advancements in long-read DNA sequencing technologies provide more comprehensive views of genomes. We used long-read sequences to assemble a Great Dane dog genome that provides several improvements over the existing reference derived from a Boxer. Assembly comparisons revealed that gaps in the Boxer assembly often occur at the beginning of protein-coding genes and have high GC content, which likely reflects limitations of previous technologies in resolving GC-rich sequences. Dimorphic LINE-1 and SINEC retrotransposons represent the predominant differences between the Great Dane and Boxer assemblies. Proof-of-principle experiments demonstrated that expression of a canine LINE-1 could promote the retrotransposition of itself and a SINEC_Cf consensus sequence in cultured human cells. Thus, ongoing retrotransposon activity is a major contributor to canine genetic diversity.

          Abstract

          Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
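The abstract reports two simple per-gap measurements: the GC content of the sequence that fills each resolved CanFam3.1 gap, and whether a gap overlaps a transcription start site (TSS). The Python sketch below illustrates one way such measurements could be computed. It is not the authors' pipeline: the function names `gc_fraction` and `overlaps_any_tss`, the toy coordinates and sequences, the use of 0-based half-open intervals, and the choice to skip N bases when computing GC are all assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import median


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (assumption: N and other IUPAC codes are skipped)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return gc / (gc + at)


def overlaps_any_tss(start: int, end: int, sorted_tss: list[int]) -> bool:
    """True if some TSS coordinate lies in the half-open interval [start, end).

    sorted_tss must be sorted ascending; bisect_left finds the first TSS >= start,
    so only that one candidate needs to be checked against end.
    """
    i = bisect_left(sorted_tss, start)
    return i < len(sorted_tss) and sorted_tss[i] < end


# Hypothetical toy data: gap intervals on one chromosome (0-based, half-open),
# each paired with the sequence that fills it in a new assembly.
gaps = [
    (1_000, 1_010, "GCGCGGCCGA"),  # 9 of 10 bases are G or C
    (5_000, 5_008, "ATGCGCAT"),    # 4 of 8
    (9_000, 9_006, "GGCCNN"),      # Ns skipped, so 4 of 4
]
tss = sorted([1_005, 7_500, 12_000])  # hypothetical TSS coordinates

gc_values = [gc_fraction(seq) for _, _, seq in gaps]
n_tss = sum(overlaps_any_tss(s, e, tss) for s, e, _ in gaps)
print(f"median GC: {median(gc_values):.2%}")            # median GC: 90.00%
print(f"gaps overlapping a TSS: {n_tss} of {len(gaps)}")  # gaps overlapping a TSS: 1 of 3
```

Sorting the TSS list once and using binary search keeps each overlap query logarithmic in the number of TSSs. Claiming that gaps hit TSSs or recombination hotspots "more often than expected by chance", as the abstract does, would additionally require a null model (for example, randomly placed intervals of matching size); that step is not shown here.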


                Author and article information

                Journal: Proceedings of the National Academy of Sciences of the United States of America (Proc Natl Acad Sci U S A; PNAS)
                Publisher: National Academy of Sciences
                ISSN: 0027-8424 (print); 1091-6490 (online)
                Issue date: 16 March 2021; published online: 8 March 2021
                Volume 118, Issue 11, e2016274118
                Affiliations
                [1] Department of Biological Sciences, Bowling Green State University, Bowling Green, OH 43403;
                [2] Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109;
                [3] Université Côte d’Azur, CNRS, INSERM, Institut de Recherche sur le Cancer et le Vieillissement de Nice, F-06100 Nice, France;
                [4] Université de Rennes 1, CNRS, Institut de Génétique et Développement de Rennes - UMR 6290, F-35000 Rennes, France;
                [5] Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109;
                [6] Department of Biomedical Sciences, Cornell University, Ithaca, NY 14850;
                [7] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
                Author notes
                2 To whom correspondence may be addressed. Email: jmkidd@umich.edu.

                Edited by Mary-Claire King, University of Washington, Seattle, WA, and approved January 25, 2021 (received for review July 31, 2020)

                Author contributions: J.V.H., A.L.P., F.S., A.J.D., J.V.M., A.R.B., and J.M.K. designed research; J.V.H., A.L.P., F.S., A.J.D., T.D., C.H., B.M., E.S., S.E., and J.M.K. performed research; J.V.H., A.J.D., B.M., E.S., and S.E. contributed new reagents/analytic tools; J.V.H., A.L.P., F.S., T.D., C.H., L.E.K., and J.M.K. analyzed data; and J.V.H., A.L.P., A.J.D., J.V.M., and J.M.K. wrote the paper.

                1 J.V.H., A.L.P., and F.S. contributed equally to this work.

                Author information
                https://orcid.org/0000-0001-7221-4874
                https://orcid.org/0000-0002-8211-5789
                https://orcid.org/0000-0003-1714-437X
                https://orcid.org/0000-0002-2443-153X
                https://orcid.org/0000-0002-6413-6332
                https://orcid.org/0000-0002-5308-4864
                https://orcid.org/0000-0002-9125-6874
                Article
                Publisher ID: 202016274
                DOI: 10.1073/pnas.2016274118
                PMCID: PMC7980453
                PMID: 33836575
                Copyright © 2021 the Author(s). Published by PNAS.

                This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).

                Page count: 9
                Funding
                Funded by: HHS | NIH | National Institute of General Medical Sciences (NIGMS), Funder ID 100000057
                Award ID: R01GM103961
                Award recipients: Julia Vera Halo; Adam R Boyko; Jeffrey M. Kidd
                Funded by: HHS | NIH | National Institute of General Medical Sciences (NIGMS), Funder ID 100000057
                Award ID: R15GM122028
                Award recipients: Julia Vera Halo; Adam R Boyko; Jeffrey M. Kidd
                Funded by: HHS | NIH | National Human Genome Research Institute (NHGRI), Funder ID 100000051
                Award ID: T32HG00040
                Award recipient: Amanda L Pendleton
                Categories
                Biological Sciences
                Genetics

                Keywords: Canis familiaris, long-read assembly, mobile elements, structural variation
