Inviting an author to review:
Find an author and click ‘Invite to review selected article’ near their name.
Search for authorsSearch for similar articles
45
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Crowdsourcing and curation: perspectives from biology and natural language processing

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Crowdsourcing is increasingly utilized for performing tasks in both natural language processing and biocuration. Although there have been many applications of crowdsourcing in these fields, there have been fewer high-level discussions of the methodology and its applicability to biocuration. This paper explores crowdsourcing for biocuration through several case studies that highlight different ways of leveraging ‘the crowd’; these raise issues about the kind(s) of expertise needed, the motivations of participants, and questions related to feasibility, cost and quality. The paper is an outgrowth of a panel session held at BioCreative V (Seville, September 9–11, 2015). The session consisted of four short talks, followed by a discussion. In their talks, the panelists explored the role of expertise and the potential to improve crowd performance by training; the challenge of decomposing tasks to make them amenable to crowdsourcing; and the capture of biological data and metadata through community editing.

          Database URL: http://www.mitre.org/publications/technical-papers/crowdsourcing-and-curation-perspectives

          Related collections

          Most cited references16

          • Record: found
          • Abstract: found
          • Article: not found

          The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

          The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge.

            After more than a decade since microarrays were used to predict phenotype of biological samples, real-life applications for disease screening and identification of patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying best approaches to develop such prediction models was reaffirmed in a competition style international collaboration called IMPROVER Diagnostic Signature Challenge whose results we describe herein. Fifty-four teams used public data to develop prediction models in four disease areas including multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the best overall three performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge as well as additional simulations that we have performed revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results datasets and methods have to be carefully matched. Biomedical factors such as the disease severity and confidence in diagnostic were found to be associated with the misclassification rates across the different teams. The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Crowdsourcing in biomedicine: challenges and opportunities.

              The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
                Bookmark

                Author and article information

                Journal
                Database (Oxford)
                Database (Oxford)
                databa
                databa
                Database: The Journal of Biological Databases and Curation
                Oxford University Press
                1758-0463
                2016
                08 August 2016
                08 August 2016
                : 2016
                : baw115
                Affiliations
                1The MITRE Corporation, Bedford, MA, USA
                2University of Paris-Sorbonne/STIH Team, Paris, France
                3Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
                4Joint Genome Institute, Walnut, Creek, CA, USA
                5National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
                6University of Colorado, School of Medicine, Denver, CO, USA
                Author notes
                *Corresponding author: Email: lynette@ 123456mitre.org

                Citation details: Hirschman,L., Fort,K., Boué,S. et al. Crowdsourcing and curation: perspectives from biology and natural language processing. Database (2016) Vol. 2016: article ID baw115; doi:10.1093/database/baw1115

                Article
                baw115
                10.1093/database/baw115
                4976298
                27504010
                fa116138-b585-4c7a-853f-4b72ce47d70d
                © The Author(s) 2016. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 19 January 2016
                : 28 June 2016
                : 11 July 2016
                Page count
                Pages: 11
                Categories
                Original Article

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article