
      TiSAn: estimating tissue-specific effects of coding and non-coding variants

      Research article, Bioinformatics, Oxford University Press


          Abstract

          Motivation

          Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have been recently inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations.

          Results

          Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyLine.
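The Results paragraph outlines a supervised workflow: tissue-related data sources become per-variant features, a classifier learns to separate tissue-relevant variants from irrelevant ones, and the resulting scores are used to filter variants. Below is a minimal sketch of that general pattern under clearly hypothetical assumptions. The `Variant` type, the two toy features, the helper names and the choice of logistic regression trained by gradient descent are illustrative stand-ins, not TiSAn's actual features, model or code. The repository listed under Availability holds the real implementation.

```python
"""Toy sketch: score variants for tissue relevance, then filter them.

Hypothetical throughout: the Variant type, the two features, and the use
of logistic regression are stand-ins, not TiSAn's actual model or inputs.
"""
import math
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    # Hypothetical features, e.g. [overlaps a tissue regulatory region (0/1),
    #                              scaled expression of nearest gene in tissue]
    features: list


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_feat = len(xs), len(xs[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(variant, w, b):
    """Tissue-relevance score in [0, 1] for one variant."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, variant.features)) + b)


def filter_variants(variants, w, b, threshold=0.5):
    """Keep variants whose score is at or above the threshold."""
    return [v for v in variants if score(v, w, b) >= threshold]


# Toy labelled training set: 1 = tissue-relevant, 0 = not relevant.
train_x = [[1, 0.9], [1, 0.8], [1, 0.6], [0, 0.1], [0, 0.2], [0, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(train_x, train_y)

candidates = [Variant("chr1", 100, [1, 0.85]), Variant("chr2", 200, [0, 0.15])]
for v in filter_variants(candidates, weights, bias):
    print(f"{v.chrom}:{v.pos}", round(score(v, weights, bias), 3))
```

Raising `threshold` trades sensitivity for a shorter, more tissue-focused candidate list. In a real analysis the cut-off would more sensibly be chosen on held-out labelled variants than fixed at 0.5.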

          Availability and implementation

          Software and vignettes are available at http://github.com/kevinVervier/TiSAn.

          Supplementary information

          Supplementary data are available at Bioinformatics online.



                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics (Oxford University Press)
                ISSN 1367-4803 (print), 1367-4811 (online)
                Published online 18 April 2018; issue date 15 September 2018
                Volume 34, Issue 18, Pages 3061-3068
                Affiliations
                Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
                Author notes
                To whom correspondence should be addressed: Jacob-Michaelson@uiowa.edu
                Author information
                http://orcid.org/0000-0001-9713-0992
                Article identifiers
                Publisher ID: bty301
                DOI: 10.1093/bioinformatics/bty301
                PMCID: PMC6137979
                PMID: 29912365
                © The Author(s) 2018. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Received: 08 November 2017
                Revised: 04 April 2018
                Accepted: 16 April 2018
                Page count
                Pages: 8
                Funding
                Funded by: National Institutes of Health (Funder ID 10.13039/100000002)
                Award ID: MH105527
                Award ID: DC014489
                Categories
                Discovery Note
                Genome Analysis

                Bioinformatics & Computational biology
