
      Quantifying similarity between motifs

      research-article


          Abstract

          Tomtom provides a statistical measure of similarity between pairs of motifs, which makes it possible to search a motif database with a query motif.

          Abstract

          A common question within the context of de novo motif discovery is whether a newly discovered, putative motif resembles any previously discovered motif in an existing database. To answer this question, we define a statistical measure of motif-motif similarity, and we describe an algorithm, called Tomtom, for searching a database of motifs with a given query motif. Experimental simulations demonstrate the accuracy of Tomtom's E values and its effectiveness in finding similar motifs.
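          The search described in the abstract can be pictured with a minimal sketch. It is not Tomtom's actual statistic: the abstract does not give the scoring function or the E value computation, so this version simply slides the query across each database motif without gaps and ranks targets by mean column-wise Euclidean distance. The function names, the `min_overlap` parameter, and the distance choice are all assumptions made for illustration.

```python
import math

# A motif is a list of columns; each column holds probabilities for A, C, G, T.
# Illustrative only: plain Euclidean distance, no p-values or E values.

def column_distance(p, q):
    """Euclidean distance between two probability columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_alignment(query, target, min_overlap=2):
    """Slide query across target (ungapped) and return (mean distance, offset)
    for the best overlap of at least min_overlap columns.
    offset is the target position aligned with the first query column."""
    best = (math.inf, 0)
    first = -(len(query) - min_overlap)
    last = len(target) - min_overlap
    for offset in range(first, last + 1):
        pairs = [
            (query[i], target[i + offset])
            for i in range(len(query))
            if 0 <= i + offset < len(target)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_distance(p, q) for p, q in pairs) / len(pairs)
        if score < best[0]:
            best = (score, offset)
    return best

def search(query, database):
    """Rank database motifs by best-alignment distance, most similar first."""
    hits = [(name, *best_alignment(query, motif)) for name, motif in database.items()]
    return sorted(hits, key=lambda hit: hit[1])
```

          A raw distance like this is not comparable across motifs of different widths or column compositions, which is one reason a statistical measure of the kind the abstract describes is useful for database search.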


                Author and article information

                Journal
                Genome Biology (Genome Biol)
                Publisher: BioMed Central (London)
                ISSN: 1465-6906, 1465-6914
                Published: 26 February 2007
                Volume 8, Issue 2, Article R24
                Affiliations
                [1] Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Box 355065, Seattle, WA 98195, USA
                [2] Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
                [3] Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Box 352350, Seattle, WA 98105, USA
                Article
                Publisher ID: gb-2007-8-2-r24
                DOI: 10.1186/gb-2007-8-2-r24
                PMCID: PMC1852410
                PMID: 17324271
                37ef3921-7a3b-4b11-88a6-38a2b3e22326
                Copyright © 2007 Gupta et al.; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 13 September 2006
                : 5 January 2007
                : 26 February 2007
                Categories
                Method

                Genetics
