      Characterization of genome-wide STR variation in 6487 human genomes

      research-article

          Abstract

          Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3′UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.

          Abstract

          Short tandem repeat studies in humans have often focused on European populations. Here, the authors report a comprehensive map of 366,013 polymorphic short tandem repeats in Chinese individuals and their mutational patterns, functional properties, gene regulatory effects and population characteristics.

                Author and article information

                Contributors
                xutao@ibp.ac.cn
                heshunmin@ibp.ac.cn
                Journal
                Nat Commun
                Nature Communications
                Nature Publishing Group UK (London)
                2041-1723
                12 April 2023
                2023
                Volume: 14
                Article number: 2092
                Affiliations
                [1] GRID grid.9227.e, ISNI 0000000119573309, Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101 China
                [2] GRID grid.410726.6, ISNI 0000 0004 1797 8419, University of Chinese Academy of Sciences, Beijing, 100049 China
                [3] GRID grid.410726.6, ISNI 0000 0004 1797 8419, College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049 China
                [4] GRID grid.9227.e, ISNI 0000000119573309, National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101 China
                [5] GRID grid.410587.f, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117 Shandong China
                Author information
                http://orcid.org/0000-0002-9694-8159
                http://orcid.org/0000-0002-8260-9754
                http://orcid.org/0000-0002-7294-0865
                Article
                37690
                10.1038/s41467-023-37690-8
                10097659
                37045857
                17f14e8c-a1bb-4a26-adba-96785465666c
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 4 August 2022
                Accepted: 27 March 2023
                Funding
                Funded by: FundRef https://doi.org/10.13039/501100001809, National Natural Science Foundation of China (National Science Foundation of China);
                Award ID: 91940306
                Award ID: 31871294
                Award ID: 31970647
                Award ID: 32200478
                Funded by: Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38040300 (S.-M.H)]
                Funded by: National Key R&D Program of China [2021YFF0703701 (S.-M.H)]
                Funded by: 14th Five-year Informatization Plan of Chinese Academy of Sciences [CAS-WX2021SF-0203 (S.-M.H)]
                Funded by: National Key R&D Program of China [2021YFF0704500 (P.Z)]
                Funded by: Special investigation on science and technology basic resources of the MOST, China [2019FY100102 (P.Z)]
                Funded by: China Postdoctoral Science Foundation [2022M713311 (Y.-Y.L)]
                Categories
                Article
                Uncategorized
                genetic variation, bioinformatics