
      The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual

      research-article


          Abstract

          We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.

                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                G3 (Bethesda)
                Genetics
                g3journal
                G3: Genes|Genomes|Genetics
                Oxford University Press (US)
                2160-1836
                March 2023
                11 January 2023
                Volume: 13
                Issue: 3
                Article ID: jkac321
                Affiliations
                Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
                Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
                Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
                Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
                Author notes
                Corresponding author: 3100 Wyman Park Dr., Wyman Park Building, Room S217, Baltimore, MD 21211, USA. Email: kh.chao@cs.jhu.edu
                Corresponding author: 3100 Wyman Park Dr., Wyman Park Building, Room S220, Baltimore, MD 21211, USA. Email: salzberg@jhu.edu

                Conflicts of interest: The authors declare no conflict of interest.

                Article
                Article ID: jkac321
                DOI: 10.1093/g3journal/jkac321
                PMCID: PMC9997556
                PMID: 36630290
                dc50a665-4313-4228-abdd-886e2b767ad4
                © The Author(s) 2023. Published by Oxford University Press on behalf of the Genetics Society of America.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 8 August 2022
                : 3 November 2022
                : 31 January 2023
                Page count
                Pages: 9
                Funding
                Funded by: U.S. National Institutes of Health;
                Award ID: R01-HG006677
                Award ID: R35-GM130151
                Funded by: U.S. National Science Foundation, doi 10.13039/100000001;
                Award ID: IOS-1744309
                Award ID: DBI-1759518
                Categories
                Genomic Prediction
                AcademicSubjects/SCI01180
                AcademicSubjects/SCI01140

                Genetics
                genome assembly, annotation, dna sequencing, reference genome, variant calling
