TiSAn: estimating tissue-specific effects of coding and non-coding variants

Abstract

Motivation

Model-based estimates of general deleteriousness, such as CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have recently been inferred for dozens of tissues and cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations.

Results

Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyLine.
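As a rough illustration of how genome-wide tissue scores might be used to contextualize and filter variants, the following Python sketch looks up a score for each variant position and keeps those above a cutoff. It is not the TiSAn interface: the score table layout (chromosome, half-open start and end, score between 0 and 1), the function names, the toy values and the 0.5 threshold are all assumptions made for this example. The software and vignettes linked under Availability and implementation describe the actual usage.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(scored_intervals):
    """Index hypothetical (chrom, start, end, score) records by chromosome.

    Intervals are assumed to be half-open [start, end) and non-overlapping.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, score in scored_intervals:
        by_chrom[chrom].append((start, end, score))
    index = {}
    for chrom, records in by_chrom.items():
        records.sort()
        # Keep the start coordinates separately so they can be bisected.
        index[chrom] = ([r[0] for r in records], records)
    return index


def lookup_score(index, chrom, pos, default=0.0):
    """Return the score of the interval containing pos, or default if none."""
    if chrom not in index:
        return default
    starts, records = index[chrom]
    i = bisect_right(starts, pos) - 1
    if i >= 0:
        start, end, score = records[i]
        if start <= pos < end:
            return score
    return default


def prioritize(variants, index, threshold=0.5):
    """Keep variants scoring at or above threshold, highest score first."""
    scored = [(chrom, pos, lookup_score(index, chrom, pos))
              for chrom, pos in variants]
    kept = [v for v in scored if v[2] >= threshold]
    return sorted(kept, key=lambda v: v[2], reverse=True)


# Toy data, invented for illustration only.
toy_scores = [
    ("chr1", 100, 200, 0.9),
    ("chr1", 200, 300, 0.2),
    ("chr2", 50, 150, 0.6),
]
toy_variants = [("chr1", 150), ("chr1", 250), ("chr2", 60), ("chr3", 10)]

toy_index = build_index(toy_scores)
for chrom, pos, score in prioritize(toy_variants, toy_index):
    print(chrom, pos, score)
```

In this sketch, positions not covered by any scored interval fall back to a default of 0.0, so they are dropped by any positive threshold. A real analysis would need to decide explicitly how to treat unscored positions.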

Availability and implementation

Software and vignettes are available at http://github.com/kevinVervier/TiSAn.

Supplementary information

Supplementary data are available at Bioinformatics online.

Related collections

Most cited references 17

Association between microdeletion and microduplication at 16p11.2 and autism.

Lauren A Weiss, Yiping Shen, Joshua M. Korn … (2008)

Autism spectrum disorder is a heritable developmental disorder in which chromosomal abnormalities are thought to play a role. As a first component of a genomewide association study of families from the Autism Genetic Resource Exchange (AGRE), we used two novel algorithms to search for recurrent copy-number variations in genotype data from 751 multiplex families with autism. Specific recurrent de novo events were further evaluated in clinical-testing data from Children's Hospital Boston and in a large population study in Iceland. Among the AGRE families, we observed five instances of a de novo deletion of 593 kb on chromosome 16p11.2. Using comparative genomic hybridization, we observed the identical deletion in 5 of 512 children referred to Children's Hospital Boston for developmental delay, mental retardation, or suspected autism spectrum disorder, as well as in 3 of 299 persons with autism in an Icelandic population; the deletion was also carried by 2 of 18,834 unscreened Icelandic control subjects. The reciprocal duplication of this region occurred in 7 affected persons in AGRE families and 4 of the 512 children from Children's Hospital Boston. The duplication also appeared to be a high-penetrance risk factor. We have identified a novel, recurrent microdeletion and a reciprocal microduplication that carry substantial susceptibility to autism and appear to account for approximately 1% of cases. We did not identify other regions with similar aggregations of large de novo mutations. Copyright 2008 Massachusetts Medical Society.

DANN: a deep learning approach for annotating the pathogenicity of genetic variants.

Daniel Quang, Yifei Chen, Xiaohui Xie (2015)

Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology. All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Defining functional DNA elements in the human genome.

M Kellis, B Wold, M Snyder … (2014)

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

Author and article information

Contributors

Bonnie Berger: Role: Associate Editor

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (iso-abbrev): Bioinformatics

Journal ID (publisher-id): bioinformatics

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 September 2018

Publication date (Electronic): 18 April 2018

Publication date PMC-release: 18 April 2018

Volume: 34

Issue: 18

Pages: 3061-3068

Affiliations

Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA

Author notes

To whom correspondence should be addressed. Jacob-Michaelson@uiowa.edu

Author information

Jacob J Michaelson http://orcid.org/0000-0001-9713-0992

Article

Publisher ID: bty301

DOI: 10.1093/bioinformatics/bty301

PMC ID: 6137979

PubMed ID: 29912365

SO-VID: 7fbea664-fe45-4a46-86db-55e93ac1e77d

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received: 08 November 2017

Date revision received: 04 April 2018

Date accepted: 16 April 2018

Page count

Pages: 8

Funding

Funded by: National Institutes of Health 10.13039/100000002

Award ID: MH105527

Award ID: DC014489
