TogoVar: A comprehensive Japanese genetic variation database


Abstract

TogoVar (https://togovar.org) is a database that integrates allele frequencies derived from Japanese populations and provides annotations for variant interpretation. First, a scheme to reanalyze individual-level genome sequence data deposited in the Japanese Genotype-phenotype Archive (JGA), a controlled-access database, was established to make allele frequencies publicly available. As more Japanese individual-level genome sequence data are deposited in JGA, the sample size employed in TogoVar is expected to increase, contributing to genetic study as reference data for Japanese populations. Second, public datasets of Japanese and non-Japanese populations were integrated into TogoVar to easily compare allele frequencies in Japanese and other populations. Each variant detected in Japanese populations was assigned a TogoVar ID as a permanent identifier. Third, these variants were annotated with molecular consequence, pathogenicity, and literature information for interpreting and prioritizing variants. Here, we introduce the newly developed TogoVar database that compares allele frequencies among Japanese and non-Japanese populations and describes the integrated annotations.
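The comparison of allele frequencies between populations can be illustrated with a minimal sketch. It assumes the usual definition of allele frequency as allele count divided by allele number (AF = AC / AN). The class name, function name, dataset labels, and counts below are hypothetical, made up for illustration; they are not TogoVar's API or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlleleCount:
    """Allele count (AC) and allele number (AN) for one variant in one dataset."""
    dataset: str  # hypothetical dataset label
    ac: int       # number of observed alternate alleles
    an: int       # total number of called alleles at the site

    @property
    def frequency(self) -> float:
        """Allele frequency, AF = AC / AN."""
        if self.an <= 0:
            raise ValueError(f"{self.dataset}: allele number must be positive")
        return self.ac / self.an


def frequency_ratio(a: AlleleCount, b: AlleleCount) -> Optional[float]:
    """Return AF(a) / AF(b), or None when the variant is absent from b."""
    if b.frequency == 0:
        return None
    return a.frequency / b.frequency


# Made-up counts for a single variant in two hypothetical datasets.
japanese = AlleleCount("japanese_example", ac=30, an=2000)
other = AlleleCount("non_japanese_example", ac=5, an=10000)

for record in (japanese, other):
    print(f"{record.dataset}: AF = {record.frequency:.4f}")
print(f"frequency ratio: {frequency_ratio(japanese, other):.1f}")
```

A ratio far from 1 flags a variant whose frequency differs markedly between the two populations, which is the kind of contrast a side-by-side frequency view is meant to expose.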

Genomic database: Genetic variation patterns in Japan

A comprehensive database of genome sequence differences found in Japanese individuals is helping researchers uncover the genetic basis of diseases occurring in the Japanese population. Nobutaka Mitsuhashi from the Database Center for Life Science in Chiba, Japan, and colleagues describe the development of TogoVar, a web-based resource that includes genetic data from more than 200,000 Japanese individuals, plus many others of non-Japanese ancestry for comparison. From the millions of tracked DNA differences, many associated with disease, researchers can search for genetic variants of interest and find information on variant frequency, clinical importance, genomic context, related publications, and more. First established in 2018, TogoVar (https://togovar.org) now provides a one-stop shop for researchers looking to interpret genomic variation data in Japanese populations.


Author and article information

Contributors

Nobutaka Mitsuhashi:

ORCID: http://orcid.org/0000-0003-3300-7308

mitsuhashi@dbcls.rois.ac.jp

Journal

Journal ID (nlm-ta): Hum Genome Var

Journal ID (iso-abbrev): Hum Genome Var

Title: Human Genome Variation

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2054-345X

Publication date (Electronic): 12 December 2022

Publication date PMC-release: 12 December 2022

Publication date Collection: 2022

Volume: 9

Electronic Location Identifier: 44

Affiliations

[1] GRID grid.418987.b, ISNI 0000 0004 1764 2181, Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems; University of Tokyo Kashiwanoha-campus Station Satellite 6F, 178-4-4, Wakashiba, Kashiwa, Chiba 277-0871 Japan

[2] GRID grid.444016.3, ISNI 0000 0004 0374 5235, Toyama University of International Studies; 65-1, Higashi-Kuromaki, Toyama, Toyama 930-1292 Japan

[3] GRID grid.419082.6, ISNI 0000 0004 1754 9200, Department of NBDC Program, Japan Science and Technology Agency; Science Plaza, 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-8666 Japan

[4] GRID grid.410825.a, ISNI 0000 0004 1770 8232, Toshiba Corporation; 1-1, Shibaura 1-Chome, Minato-ku, Tokyo 105-8001 Japan

Article

Publisher ID: 222

DOI: 10.1038/s41439-022-00222-9

PMC ID: 9744889

PubMed ID: 36509753

SO-VID: 12817cda-8efb-4308-85bb-dab19cd95d16

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 1 September 2022

Date revision received: 3 November 2022

Date accepted: 7 November 2022

Custom metadata

Keywords: genetic databases, genetic variation
