      Is Open Access

      The ProteomeXchange consortium in 2020: enabling ‘big data’ approaches in proteomics

      research-article


          Abstract

          The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. In addition to the four PX members existing at that time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have since joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to cover all six members. Next, using current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. By the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, more than 9 500 of them in just the last three years. In parallel, an unprecedented increase in data re-use activities in the field, including ‘big data’ approaches, is enabling novel research and new data resources. Finally, we outline some of our plans for the coming years.


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Publisher: Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                Publication dates: 08 January 2020; 05 November 2019
                Volume: 48
                Issue: D1
                Pages: D1145-D1152
                Affiliations
                [1] Institute for Systems Biology, Seattle, WA 98109, USA
                [2] Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
                [3] Department of Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
                [4] Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
                [5] University of Washington, Seattle, WA 98195, USA
                [6] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
                [7] Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930–1292, Japan
                [8] Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277–0871, Japan
                [9] Niigata University Graduate School of Medical and Dental Sciences, Niigata 951–8510, Japan
                [10] State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
                [11] Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606–8501, Japan
                Author notes
                To whom correspondence should be addressed. Tel: +44 1223 492686; Fax: +44 1223 484696; Email: juan@ebi.ac.uk

                The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.

                Author information
                http://orcid.org/0000-0001-8479-0262
                http://orcid.org/0000-0002-3905-4335
                Article
                Publisher ID: gkz984
                DOI: 10.1093/nar/gkz984
                PMC ID: 7145525
                PMID: 31686107
                3779ca8d-2327-4133-a4e8-3176196bc626
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Accepted: 14 October 2019
                Revised: 11 October 2019
                Received: 14 September 2019
                Page count
                Pages: 8
                Funding
                Funded by: Wellcome Trust 10.13039/100010269
                Award ID: WT101477MA
                Award ID: 208391/Z/17/Z
                Funded by: BBSRC 10.13039/501100000268
                Award ID: BB/N022440/1
                Award ID: BB/N022432/1
                Award ID: BB/P024599/1
                Funded by: NIH 10.13039/100000002
                Award ID: R24 GM127667-01
                Award ID: R01 GM103551
                Award ID: R01 GM121696
                Award ID: U54 HG008097
                Award ID: U01DK121289
                Award ID: R01GM087221
                Award ID: R24GM127667
                Award ID: U54EB020406
                Award ID: 5P41GM103484-07
                Funded by: H2020 EU EPIC-XS
                Award ID: 823839
                Funded by: ELIXIR
                Funded by: NIA 10.13039/100000049
                Award ID: U19AG02312
                Funded by: NSF 10.13039/100003187
                Award ID: 1922871
                Award ID: 1933311
                Funded by: MOST 10.13039/501100002855
                Award ID: 2016YFB0201702
                Award ID: 2016YFC0901701
                Funded by: University of Washington 10.13039/100007812
                Award ID: UWPR95794
                Funded by: National Science Foundation 10.13039/100000001
                Award ID: ABI 1759980
                Funded by: National Bioscience Database Center 10.13039/501100004696
                Award ID: 18063028
                Categories
                Database Issue

                Genetics
