      Is Open Access

      Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges

      letter


          Abstract

          Challenges are achieving broad acceptance for addressing many biomedical questions and enabling tool assessment. But ensuring that the methods evaluated are reproducible and reusable is complicated by the diversity of software architectures, input and output file formats, and computing environments. To mitigate these problems, some challenges have leveraged new virtualization and compute methods, requiring participants to submit cloud-ready software packages. We review recent data challenges with innovative approaches to model reproducibility and data sharing, and outline key lessons for improving quantitative biomedical data analysis through crowd-sourced benchmarking challenges.

          Electronic supplementary material

          The online version of this article (10.1186/s13059-019-1794-0) contains supplementary material, which is available to authorized users.


          Most cited references (8)


          A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1.

          To molecularly define high-risk disease, we performed microarray analysis on tumor cells from 532 newly diagnosed patients with multiple myeloma (MM) treated on 2 separate protocols. Using log-rank tests of expression quartiles, 70 genes, 30% mapping to chromosome 1 (P < .001), were linked to early disease-related death. Importantly, most up-regulated genes mapped to chromosome 1q, and down-regulated genes mapped to chromosome 1p. The ratio of mean expression levels of up-regulated to down-regulated genes defined a high-risk score present in 13% of patients with shorter durations of complete remission, event-free survival, and overall survival (training set: hazard ratio [HR], 5.16; P < .001; test cohort: HR, 4.75; P < .001). The high-risk score also was an independent predictor of outcome endpoints in multivariate analysis (P < .001) that included the International Staging System and high-risk translocations. In a comparison of paired baseline and relapse samples, the high-risk score frequency rose to 76% at relapse and predicted short postrelapse survival (P < .05). Multivariate discriminant analysis revealed that a 17-gene subset could predict outcome as well as the 70-gene model. Our data suggest that altered transcriptional regulation of genes mapping to chromosome 1 may contribute to disease progression, and that expression profiling can be used to identify high-risk disease and guide therapeutic interventions.

            The NCI Genomic Data Commons as an engine for precision medicine.

            The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.

              A gene expression signature for high-risk multiple myeloma.

              There is a strong need to better predict the survival of patients with newly diagnosed multiple myeloma (MM). As gene expression profiles (GEPs) reflect the biology of MM in individual patients, we built a prognostic signature based on GEPs. GEPs obtained from newly diagnosed MM patients included in the HOVON65/GMMG-HD4 trial (n=290) were used as training data. Using this set, a prognostic signature of 92 genes (EMC-92-gene signature) was generated by supervised principal component analysis combined with simulated annealing. Performance of the EMC-92-gene signature was confirmed in independent validation sets of newly diagnosed (total therapy (TT)2, n=351; TT3, n=142; MRC-IX, n=247) and relapsed patients (APEX, n=264). In all the sets, patients defined as high-risk by the EMC-92-gene signature show a clearly reduced overall survival (OS) with a hazard ratio (HR) of 3.40 (95% confidence interval (CI): 2.19-5.29) for the TT2 study, 5.23 (95% CI: 2.46-11.13) for the TT3 study, 2.38 (95% CI: 1.65-3.43) for the MRC-IX study and 3.01 (95% CI: 2.06-4.39) for the APEX study (P<0.0001 in all studies). In multivariate analyses this signature was proven to be independent of the currently used prognostic factors. The EMC-92-gene signature is better or comparable to previously published signatures. This signature contributes to risk assessment in clinical trials and could provide a tool for treatment choices in high-risk MM patients.

                Author and article information

                Contributors
                justin.guinney@sagebase.org
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                1474-7596
                1474-760X
                10 September 2019
                2019; 20: 195
                Affiliations
                [1] ISNI 0000 0000 9758 5690, GRID grid.5288.7, Biomedical Engineering, Oregon Health and Science University, Portland, OR 97239, USA
                [2] ISNI 0000 0004 6023 5303, GRID grid.430406.5, Sage Bionetworks, Seattle, WA, USA
                [3] GRID grid.481554.9, IBM Research, Yorktown Heights, NY, USA
                [4] ISNI 0000 0001 2097 4281, GRID grid.29857.31, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, State College, PA, USA
                [5] ISNI 0000 0001 0740 6917, GRID grid.205975.c, University of California, Santa Cruz, Santa Cruz, CA, USA
                [6] ISNI 0000 0001 2190 4373, GRID grid.7700.0, Institute for Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Bioquant, Heidelberg, Germany
                [7] ISNI 0000 0001 0728 696X, GRID grid.1957.a, Joint Research Center for Computational Biomedicine, RWTH Aachen University, Faculty of Medicine, Aachen, Germany
                [8] ISNI 0000 0004 0626 690X, GRID grid.419890.d, Ontario Institute for Cancer Research, Toronto, Canada
                [9] ISNI 0000 0001 2157 2938, GRID grid.17063.33, Departments of Medical Biophysics and Pharmacology & Toxicology, University of Toronto, Toronto, Canada
                [10] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Departments of Human Genetics and Urology, University of California, Los Angeles, CA, USA
                [11] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Jonsson Comprehensive Cancer Centre, University of California, Los Angeles, CA, USA
                [12] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Institute for Precision Health, University of California, Los Angeles, CA, USA
                [13] ISNI 0000 0001 2298 6657, GRID grid.34477.33, Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
                Author information
                http://orcid.org/0000-0003-1477-1888
                Article
                1794
                10.1186/s13059-019-1794-0
                6737594
                31506093
                97cf2ea3-5d6f-4961-91ff-35c18b587080
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 25 April 2019
                : 13 August 2019
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000054, National Cancer Institute;
                Award ID: P30CA016042
                Award ID: R01CA180778
                Award ID: 5U24CA209923
                Categories
                Open Letter

                Genetics
