
      Accurate splice site prediction using support vector machines

      research-article
      BMC Bioinformatics
      BioMed Central
      NIPS workshop on New Problems and Methods in Computational Biology
      8 December 2006


          Abstract

          Background

          For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.

          Results

          In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments that compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods, including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be incorporated into a gene finder.
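
          To illustrate the weighted degree kernel named above, here is a minimal sketch. The abstract does not spell out the kernel, so the form used here is an assumption taken from the commonly cited definition: for two equal-length sequences, count the substrings of length d = 1..D that match at the same position, and weight each count by beta_d = 2(D - d + 1) / (D(D + 1)). The function names, the default degree, and the example windows are hypothetical and are not taken from the authors' tool.

```python
def wd_weights(max_degree):
    """Assumed weights beta_d = 2 * (D - d + 1) / (D * (D + 1)), d = 1..D."""
    D = max_degree
    return [2.0 * (D - d + 1) / (D * (D + 1)) for d in range(1, D + 1)]


def wd_kernel(x, y, max_degree=3):
    """Weighted degree kernel between two equal-length sequences.

    For every substring length d up to max_degree, count the positions
    at which both sequences carry the same substring, then sum the
    counts weighted by beta_d.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for d, beta in enumerate(wd_weights(max_degree), start=1):
        matches = sum(
            1
            for pos in range(len(x) - d + 1)
            if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total


if __name__ == "__main__":
    # Hypothetical fixed-length windows around candidate acceptor sites.
    window_a = "TTTCAGGTAC"
    window_b = "TTTCAGGCAC"
    print(wd_kernel(window_a, window_b, max_degree=3))
```

          A kernel matrix built from such pairwise values could be passed to any SVM implementation that accepts precomputed kernels. The window length and maximum degree would be chosen by model selection, and the settings the authors used are not stated in this abstract.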

          Availability

          Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.


                Author and article information

                Conference
                BMC Bioinformatics
                BioMed Central
                ISSN 1471-2105
                21 December 2007
                Volume 8, Suppl 10, S7
                Affiliations
                [1] Fraunhofer Institute FIRST, Kekuléstr. 7, 12489 Berlin, Germany
                [2] Friedrich Miescher Laboratory of the Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany
                [3] Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany
                [4] Max Planck Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany
                Article
                1471-2105-8-S10-S7
                DOI: 10.1186/1471-2105-8-S10-S7
                PMC ID: 2230508
                PubMed ID: 18269701
                ff95a7c9-c573-4b10-8122-fcff3910e457
                Copyright © 2007 Sonnenburg et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                NIPS workshop on New Problems and Methods in Computational Biology
                Whistler, Canada
                8 December 2006
                Categories
                Proceedings

                Bioinformatics & Computational biology