
      The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

      research-article
      BMC Bioinformatics
      BioMed Central
      The Third BioCreative, Critical Assessment of Information Extraction in Biology Challenge
      13-15 September 2010


          Abstract

          Background

          Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, as well as measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations and tried to address how the end user would oversee the generated output, for instance by providing ranked results and textual evidence for human interpretation, or by measuring the time saved by using automated systems. Detecting articles that describe complex biological events such as PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. For this purpose the BCIII-ACT corpus was provided; it includes training, development and test sets totalling over 12,000 PPI-relevant and non-relevant PubMed abstracts, labeled manually by domain experts, together with the recorded human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and the interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.

          Results

          A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. Of the 52 runs submitted for the ACT, the highest Matthews correlation coefficient (MCC) measured was 0.55 at an accuracy of 89%, and the best area under the interpolated precision/recall curve (AUC iP/R) was 68%. Most ACT teams explored machine learning methods; some also used lexical resources such as MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manual annotations generated by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53% and the best MCC score was 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-score of 55%.
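
The relation between accuracy and MCC on unbalanced data can be illustrated with a minimal Python sketch. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the BCIII-ACT evaluation, and the function names are assumptions of this example.

```python
import math

def accuracy(tp, fp, tn, fn):
    """Fraction of all articles that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def f_score(tp, fp, fn):
    """Balanced F-score (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical test set: 1000 abstracts, 150 of them PPI-relevant.
system = dict(tp=90, fp=50, tn=800, fn=60)
# Trivial baseline that labels every abstract as non-relevant.
all_negative = dict(tp=0, fp=0, tn=850, fn=150)

for name, counts in [("system", system), ("all-negative", all_negative)]:
    print(name,
          "accuracy=%.2f" % accuracy(**counts),
          "MCC=%.2f" % mcc(**counts))
```

In this toy setting the trivial all-negative baseline still reaches a high accuracy while its MCC is zero, which is one reason MCC is informative for collections that reflect a realistic class imbalance.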

          Conclusions

          The ACT results of BioCreative III indicate that classifying large article collections that reflect the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than a quarter of the time needed with unranked results. Detecting associations between full-text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and differing granularity of the method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
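
The effect of ranking on manual selection effort can be sketched as follows. This is a toy illustration under stated assumptions: the relevance labels are invented, effort is counted in articles inspected rather than in the curator classification times used by the task, and `auc_ipr` follows one common definition of the area under the interpolated precision/recall curve (interpolated precision at a recall level is the highest precision observed at that or any higher recall). Both function names are hypothetical.

```python
import math

def docs_to_reach_recall(ranked_labels, target=0.5):
    """Number of articles read from the top until `target` recall is reached."""
    total_pos = sum(ranked_labels)
    needed = math.ceil(target * total_pos)
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        hits += label
        if hits >= needed:
            return rank
    return len(ranked_labels)

def auc_ipr(ranked_labels):
    """Area under the interpolated precision/recall curve of a ranked list."""
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0
    precisions = []  # precision at the rank of each relevant article
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Interpolate: replace each precision by the maximum at equal or higher recall.
    best = 0.0
    interpolated = []
    for p in reversed(precisions):
        best = max(best, p)
        interpolated.append(best)
    # Each relevant article advances recall by 1 / total_pos.
    return sum(interpolated) / total_pos

# 1 = PPI-relevant, 0 = non-relevant (invented labels, 4 relevant out of 10).
ranked = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]    # output of a hypothetical ranker
unranked = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1]  # same articles in arbitrary order

print(docs_to_reach_recall(ranked), docs_to_reach_recall(unranked))
print(auc_ipr(ranked), auc_ipr(unranked))
```

With these invented labels, half of the relevant articles are found after reading far fewer items from the ranked list than from the unranked one, which mirrors the kind of saving discussed above without reproducing the reported figures.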


                Author and article information

                Conference
                BMC Bioinformatics (BioMed Central), ISSN 1471-2105
                2011, 12(Suppl 8): S3 (3 October 2011)
                Affiliations
                [1] Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
                [2] Australian Regenerative Medicine Institute, Monash University, Australia
                [3] School of Biological Sciences, University of Edinburgh, Edinburgh, UK
                [4] Department of Biology, University of Rome Tor Vergata, Rome, Italy
                [5] IRCCS, Fondazione Santa Lucia, Rome, Italy
                [6] Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
                [7] School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, Arizona, USA
                [8] Department of Biomedical Informatics, Arizona State University, Tempe, Arizona, USA
                [9] Institute of Electronics and Telematics Engineering of Aveiro, University of Aveiro, Campus Universitario de Santiago, 3810-193 Aveiro, Portugal
                [10] National Center for Biotechnology Information (NCBI), 8600 Rockville Pike, Bethesda, Maryland, 20894, USA
                [11] School of Informatics and Computing, Indiana University, 919 E. 10th St, Bloomington, IN 47408, USA
                [12] Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
                [13] Department of Computer Science and Engineering, IIT Madras, Chennai-600 036, India
                [14] Medical Informatics, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
                [15] National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK
                [16] Department of Computer Science, Tufts University, 161 College Ave, Medford, MA 02155, USA
                [17] Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
                [18] Computational Biology and Data Mining Group, Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, 13125 Berlin, Germany
                Article
                Publisher ID: 1471-2105-12-S8-S3
                DOI: 10.1186/1471-2105-12-S8-S3
                PMCID: PMC3269938
                PMID: 22151929
                Copyright ©2011 Krallinger et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                The Third BioCreative, Critical Assessment of Information Extraction in Biology Challenge
                Bethesda, MD, USA
                13-15 September 2010
                Categories
                Research

                Bioinformatics & Computational biology
