      Is Open Access

      Area under Precision-Recall Curves for Weighted and Unweighted Data

      research-article
      1 , * , 2 , 3 , 2
      PLoS ONE
      Public Library of Science


          Abstract

          Precision-recall curves are highly informative about the performance of binary classifiers, and the area under these curves is a popular scalar performance measure for comparing different classifiers. However, for many applications class labels are not provided with absolute certainty, but with some degree of confidence, often reflected by weights or soft labels assigned to data points. Computing the area under the precision-recall curve requires interpolating between adjacent supporting points, but previous interpolation schemes are not directly applicable to weighted data. Hence, even in cases where weights were available, they had to be neglected when assessing classifiers using precision-recall curves. Here, we propose an interpolation for precision-recall curves that can also be used for weighted data, and we derive conditions for classification scores yielding the maximum and minimum area under the precision-recall curve. We investigate agreements and differences between the proposed interpolation and previous ones, and we demonstrate that taking into account existing weights of test data is important for the comparison of classifiers.
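
To make the weighted setting concrete, the following Python sketch computes weighted precision and recall from soft labels and sums a simple step-wise area under the resulting points. It is an illustration only and not the interpolation proposed in the article. The function names `weighted_pr_points` and `step_area`, the convention that each weight is the confidence that a data point is positive (so that `1 - w` counts towards the negatives), and the step-wise summation (in the style of average precision) are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def weighted_pr_points(
    scores: Sequence[float], weights: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) pairs, one per distinct score threshold.

    Assumption for this sketch: weights[i] in [0, 1] is the confidence that
    item i is positive, so it adds weights[i] to the positives and
    1 - weights[i] to the negatives. Hard labels are the special case 0 or 1.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_pos = sum(weights)
    if total_pos <= 0:
        raise ValueError("total positive weight must be greater than zero")

    # Visit items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = 0.0  # accumulated positive weight above the threshold
    fp = 0.0  # accumulated negative weight above the threshold
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Items with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            w = weights[order[i]]
            tp += w
            fp += 1.0 - w
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def step_area(points: Sequence[Tuple[float, float]]) -> float:
    """Step-wise area: each recall increment times the precision reached there.

    This is a deliberately simple stand-in, not the article's interpolation.
    """
    area = 0.0
    prev_recall = 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    hard_labels = [1.0, 0.0, 1.0, 0.0]  # weights ignored (thresholded)
    soft_labels = [0.9, 0.2, 0.7, 0.1]  # weights taken into account
    print(step_area(weighted_pr_points(scores, hard_labels)))
    print(step_area(weighted_pr_points(scores, soft_labels)))
```

Running the two calls on the same scores gives different areas for hard and soft labels, which is the kind of discrepancy the abstract refers to when it says that existing weights matter for comparing classifiers.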


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 20 March 2014
                Volume: 9
                Issue: 3
                Article number: e92209
                Affiliations
                [1] Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) – Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
                [2] Institute of Computer Science, Martin Luther University Halle–Wittenberg, Halle (Saale), Germany
                [3] German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
                Indiana University Bloomington, United States of America
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: JK JG. Performed the experiments: JK JG. Analyzed the data: JK IG JG. Contributed reagents/materials/analysis tools: JK JG. Wrote the paper: JK IG JG. Implemented the software: JK JG.

                Article
                PONE-D-13-41224
                10.1371/journal.pone.0092209
                3961324
                24651729
                84155430-463d-4b59-b312-f69be832abe2
                Copyright © 2014

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 9 October 2013
                Accepted: 20 February 2014
                Page count
                Pages: 13
                Funding
                No current external funding sources for this study.
                Categories
                Research Article
                Biology and Life Sciences
                Computational Biology
                Computer and Information Sciences
                Computer Modeling
                Computing Methods
                Information Technology
                Databases
                Data Mining
                Software Engineering
                Software Tools
                Engineering and Technology
                Signal Processing
                Physical Sciences
                Mathematics
                Applied Mathematics
                Algorithms
                Decision Theory
                Statistics (Mathematics)
                Contingency Tables
