      Is Open Access

      Use of a Candida albicans SC5314 PacBio HiFi reads dataset to close gaps in the reference genome assembly, reveal a subtelomeric gene family, and produce accurate phased allelic sequences

      brief-report


          Abstract

          Candida albicans SC5314 is the strain most often used for molecular manipulation of the species. The SC5314 reference genome sequence is the result of considerable effort from many scientists and has advanced research into fungal biology and pathogenesis. Although the resource is highly developed and presented in a phased diploid format, the sequence includes gaps and does not extend to the telomeres on its eight chromosome pairs. Accurate SC5314 genome assembly is complicated by the presence of extensive repeated sequences and considerable allelic length variation at some loci. Advances in genome sequencing technology provide the tools to obtain highly accurate long-read data that span even the most-difficult-to-assemble genome regions. Here, we describe derivation of a PacBio HiFi data set and creation of a collapsed haploid telomere-to-telomere assembly of the SC5314 genome (ASM3268872v1) that revealed previously unknown features of the strain. ASM3268872v1 subtelomeric distances were up to 19 kb larger than in the reference genome and revealed a family of highly conserved DNA helicase-encoding genes at 10 of the 16 chromosome ends. We also describe alignments of individual HiFi reads to deduce accurate diploid sequences for the most notoriously difficult-to-assemble C. albicans genes: the agglutinin-like sequence (ALS) gene family. We provide a tutorial that demonstrates how the HiFi reads can be visualized to explore any region of interest. Availability of the HiFi reads data set and the ASM3268872v1 comparative guide assembly will streamline research efforts because accurate diploid sequences can be derived using simple in silico methods rather than time-consuming laboratory-bench approaches.
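          The abstract's idea of exploring a region of interest through individual read alignments can be illustrated with a small sketch. It is not the authors' tutorial or pipeline. It only assumes that the reads have already been aligned to an assembly and are available as SAM-format text, and it picks out the reads whose alignments fully span a chosen interval, the kind of reads from which a complete allele could be read off. The function names, the contig name `chr6`, and all coordinates are hypothetical and chosen purely for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return (start, end), 1-based inclusive, covered on the reference."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar)
                  if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def reads_spanning(sam_lines, chrom, start, end):
    """Names of mapped reads whose alignment covers chrom:start-end entirely."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue                      # unmapped, or on another contig
        a_start, a_end = reference_span(pos, cigar)
        if a_start <= start and a_end >= end:
            names.append(name)
    return names


# Hypothetical toy alignments (sequence and quality fields left as "*").
example_sam = [
    "@SQ\tSN:chr6\tLN:1000000",
    "readA\t0\tchr6\t100\t60\t50M2D50M\t*\t0\t0\t*\t*",
    "readB\t16\tchr6\t150\t60\t10S40M\t*\t0\t0\t*\t*",
    "readC\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
spanning = reads_spanning(example_sam, "chr6", 120, 200)
```

          In this toy input only `readA` covers the whole interval 120-200. A real analysis would normally read BAM files through an established library and would also need to handle secondary and supplementary alignments, which this sketch ignores.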


                Author and article information

                Contributors
                URI: https://loop.frontiersin.org/people/22685
                URI: https://loop.frontiersin.org/people/718682
                Journal
                Frontiers in Cellular and Infection Microbiology (Front. Cell. Infect. Microbiol.)
                Publisher: Frontiers Media S.A.
                ISSN: 2235-2988
                Published: 01 February 2024
                Volume: 14
                Article number: 1329438
                Affiliations
                [1] Department of Pathobiology, College of Veterinary Medicine, University of Illinois Urbana-Champaign, Urbana, IL, United States
                [2] Department of Mathematics and Computational Sciences, Millikin University, Decatur, IL, United States
                [3] Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, IL, United States
                Author notes

                Edited by: Victoriano Garre, University of Murcia, Spain

                Reviewed by: Gavin Sherlock, Stanford University, United States

                David A. Cisneros, Queen’s University Belfast, United Kingdom

                *Correspondence: Lois L. Hoyer, lhoyer@illinois.edu
                Article
                DOI: 10.3389/fcimb.2024.1329438
                PMCID: PMC10867151
                PMID: 38362496
                f518be99-57b0-46f3-bf8b-84fbe521e91d
                Copyright © 2024 Hoyer, Freeman, Hogan and Hernandez

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 29 October 2023
                Accepted: 05 January 2024
                Page count
                Figures: 2, Tables: 2, Equations: 0, References: 40, Pages: 10, Words: 5289
                Funding
                Funded by: National Institute of Dental and Craniofacial Research, doi 10.13039/100000072
                The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was funded by R15 DE026401 from the National Institute of Dental and Craniofacial Research, National Institutes of Health.
                Categories
                Cellular and Infection Microbiology
                Brief Research Report
                Custom metadata
                Fungal Pathogenesis

                Infectious disease & Microbiology
                Keywords: genome sequence, Candida albicans, pathogenic yeast genomes, PacBio sequence data, allelic sequences, telomere-to-telomere
