      Systematic benchmarking of omics computational tools

      research-article


          Abstract

          Computational omics methods packaged as software have become essential to modern biological research. The increasing dependence of scientists on these powerful software tools creates a need for systematic assessment of these methods, known as benchmarking. Adopting a standardized benchmarking practice could help researchers who use omics data to better leverage recent technological innovations. Our review summarizes benchmarking practices from 25 recent studies and discusses the challenges, advantages, and limitations of benchmarking across various domains of biology. We also propose principles that can make computational biology benchmarking studies more sustainable and reproducible, ultimately increasing the transparency of biomedical data and results.
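          The abstract does not fix a particular scoring procedure. As a minimal sketch, assume each tool's output can be reduced to a set of discrete calls (for example variant or gene identifiers) and that a trusted gold standard is available. Benchmarking several tools on the same data might then look like the code below. The tool names, identifiers, and the choice of precision, recall, and F1 as metrics are all illustrative assumptions, not the authors' protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_against_gold(predicted: set[str], gold: set[str]) -> Scores:
    """Compare one tool's calls with a gold-standard set of true calls."""
    tp = len(predicted & gold)  # calls the tool got right
    fp = len(predicted - gold)  # calls absent from the gold standard
    fn = len(gold - predicted)  # true calls the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Scores(precision, recall, f1)


def benchmark(outputs: dict[str, set[str]], gold: set[str]) -> list[tuple[str, Scores]]:
    """Score every tool on the same gold standard, ranked by F1 (highest first)."""
    results = [(name, score_against_gold(calls, gold)) for name, calls in outputs.items()]
    return sorted(results, key=lambda item: item[1].f1, reverse=True)


if __name__ == "__main__":
    # Hypothetical gold standard and hypothetical tool outputs.
    gold = {"v1", "v2", "v3", "v4"}
    outputs = {"tool_a": {"v1", "v2", "v3", "x1"}, "tool_b": {"v1", "v2"}}
    for name, s in benchmark(outputs, gold):
        print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} F1={s.f1:.2f}")
    # tool_a: precision=0.75 recall=0.75 F1=0.75
    # tool_b: precision=1.00 recall=0.50 F1=0.67
```

          Every tool is scored against the same gold standard with the same metrics. This keeps the comparison fair. A real benchmark would also record tool versions, parameters, and inputs so that others can reproduce the results.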

          Abstract

          Benchmarking studies are important for comprehensively understanding and evaluating different computational omics methods. Here, the authors review practices from 25 recent studies and propose principles to improve the quality of benchmarking studies.


                Author and article information

                Contributors
                smangul@ucla.edu
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 27 March 2019
                Volume: 10
                Article number: 1393
                Affiliations
                [1] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
                [2] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Institute for Quantitative and Computational Biosciences, University of California Los Angeles, 611 Charles E. Young Drive East, Los Angeles, CA 90095, USA
                [3] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
                [4] ISNI 0000 0004 1936 7400, GRID grid.256304.6, Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
                [5] ISNI 0000 0001 2288 8774, GRID grid.448878.f, The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
                [6] ISNI 0000 0000 9632 6718, GRID grid.19006.3e, Department of Human Genetics, University of California Los Angeles, 695 Charles E. Young, Los Angeles, CA, USA
                Author information
                http://orcid.org/0000-0002-6881-5770
                http://orcid.org/0000-0003-4424-4691
                Article identifiers
                DOI: 10.1038/s41467-019-09406-4
                PMCID: PMC6437167
                PMID: 30918265
                © The Author(s) 2019

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 23 July 2018
                Accepted: 6 March 2019
                Categories
                Review Article
