
      BioCreative III interactive task: an overview

      research-article
      BMC Bioinformatics
      BioMed Central
      The Third BioCreative, Critical Assessment of Information Extraction in Biology Challenge
      13-15 September 2010


          Abstract

          Background

          The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

          Results

          A User Advisory Group (UAG) actively participated in the IAT design and assessment. The task focused on gene normalization (identifying gene mentions in the article and linking these genes to standard database identifiers), gene ranking based on the overall importance of each gene mentioned in the article, and gene-oriented document retrieval (identifying full-text papers relevant to a selected gene). Six systems participated and all processed and displayed the same set of articles. The articles were selected based on content known to be problematic for curation, such as ambiguity of gene names, coverage of multiple genes and species, or introduction of a new gene name. Members of the UAG curated three articles for training and assessment purposes, and each member was assigned a system to review. A questionnaire related to the interface usability and task performance (as measured by precision and recall) was answered after systems were used to curate articles. Although the limited number of articles analyzed and users involved in the IAT experiment precluded rigorous quantitative analysis of the results, a qualitative analysis provided valuable insight into some of the problems encountered by users when using the systems. The overall assessment indicates that the system usability features appealed to most users, but the system performance was suboptimal (mainly due to low accuracy in gene normalization). Some of the issues included failure of species identification and gene name ambiguity in the gene normalization task, leading to an extensive list of gene identifiers to review, which, in some cases, did not contain the relevant genes. The document retrieval suffered from the same shortfalls. The UAG favored achieving high performance (measured by precision and recall), but strongly recommended the addition of features that facilitate the identification of the correct gene and its identifier, such as contextual information to assist in disambiguation.
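
          As a rough illustration of the precision and recall measures mentioned above, the sketch below compares a system's proposed gene identifiers for one article against a curator's list. It is not the evaluation code used in the IAT; the function name `precision_recall` and all identifiers are hypothetical and invented for this example.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of gene identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of proposed identifiers that the curator accepted.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of curated identifiers that the system proposed.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical identifiers, made up for illustration only.
curated_ids = {"GENE:1", "GENE:2", "GENE:3", "GENE:4"}
system_ids = {"GENE:1", "GENE:2", "GENE:9"}

precision, recall = precision_recall(system_ids, curated_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

          In this toy case the system proposes three identifiers, two of which are correct, and misses two curated genes. A long candidate list that omits the relevant genes, as described above, would lower both measures.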

          Discussion

          The IAT was an informative exercise that advanced the dialog between curators and developers and increased the appreciation of challenges faced by each group. A major conclusion was that the intended users should be actively involved in every phase of software development, and this will be strongly encouraged in future tasks. The IAT provides the first steps toward the definition of metrics and functional requirements that are necessary for designing a formal evaluation of interactive curation systems in the BioCreative IV challenge.


                Author and article information

                Conference
                BMC Bioinformatics (BioMed Central), ISSN 1471-2105
                3 October 2011; 12(Suppl 8): S4
                Affiliations
                [1] Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
                [2] Pfizer Research Technology Center, Cambridge, Massachusetts, USA
                [3] Medical Informatics, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
                [4] Department of Computer Science, The University of Iowa, Iowa City, Iowa, USA
                [5] University of Rome Tor Vergata, Italy
                [6] IRCCS Fondazione Santa Lucia, Italy
                [7] Wellcome Trust Centre for Cell Biology, University of Edinburgh, UK
                [8] Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
                [9] CALIPHO group, Swiss Institute of Bioinformatics, Geneva, Switzerland
                [10] dictyBase, NIBIC, Northwestern University, Chicago, IL, USA
                [11] University of Maryland, Baltimore, MD, USA
                [12] TAIR, Carnegie Institution for Science, Washington, DC, USA
                [13] Structural and Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
                [14] Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
                [15] National Center for Biotechnology Information (NCBI), Bethesda, MD, USA
                [16] MGI, The Jackson Laboratory, Bar Harbor, ME, USA
                [17] Department of Computer Science, University of Tokyo, Japan
                [18] Department of Computer and Information Science, NTNU, Trondheim, Norway
                [19] Australian Regenerative Medicine Institute, Monash University, Melbourne, Victoria, Australia
                [20] Developmental Biology Institute of Marseille Luminy (IBDML), Université de la Méditerranée, Campus de Luminy, Marseille, France
                [21] Merck KGaA, Darmstadt, Germany
                [22] Information Technology Center, The MITRE Corporation, Bedford, MA, USA
                Article
                Publisher ID: 1471-2105-12-S8-S4
                DOI: 10.1186/1471-2105-12-S8-S4
                PMC ID: 3269939
                PMID: 22151968
                6d7a76f1-adee-4d5a-8964-c2c2ad3a20d6
                Copyright ©2011 Arighi et al; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                The Third BioCreative, Critical Assessment of Information Extraction in Biology Challenge
                Bethesda, MD, USA
                13-15 September 2010
                Categories
                Research

                Bioinformatics & Computational biology
