      Is Open Access

      Protein complex-based analysis is resistant to the obfuscating consequences of batch effects --- a case study in clinical proteomics

      research-article
      BMC Genomics
      BioMed Central
      The Fifteenth Asia Pacific Bioinformatics Conference (APBC 2017)
      16-18 January 2017
      Proteomics, Bioinformatics, Principal component analysis, Heterogeneity, Batch effects


          Abstract

          Background

          In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research.

          Results

          Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs that are highly correlated with batch but not with class effect.
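          The PC-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the correlation cut-offs (`batch_min`, `class_max`) and the simulated data are all assumptions made for illustration. The sketch computes PCs by SVD, measures how strongly each PC's sample scores correlate with binary batch and class labels, and reconstructs the data without the PCs that track batch but not class.

```python
import numpy as np


def abs_corr_with_labels(scores, labels):
    """Absolute Pearson correlation of each PC's scores with a 0/1 label vector."""
    labels = np.asarray(labels, dtype=float)
    lc = labels - labels.mean()
    sc = scores - scores.mean(axis=0)
    denom = np.linalg.norm(sc, axis=0) * np.linalg.norm(lc)
    denom = np.where(denom > 1e-12, denom, np.inf)  # zero-variance PCs get r = 0
    return np.abs(sc.T @ lc) / denom


def remove_batch_pcs(X, batch, klass, batch_min=0.7, class_max=0.3):
    """Drop PCs correlated with batch but not class (cut-offs are hypothetical).

    X is samples x proteins; batch and klass are 0/1 label arrays.
    Returns the reconstructed matrix and a boolean mask of dropped PCs.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U * S  # sample coordinates on each PC
    r_batch = abs_corr_with_labels(scores, batch)
    r_class = abs_corr_with_labels(scores, klass)
    drop = (r_batch >= batch_min) & (r_class <= class_max)
    kept = scores.copy()
    kept[:, drop] = 0.0  # zero out the batch-associated components
    return kept @ Vt + mean, drop


# Simulated example (assumed design): 40 samples, 50 proteins, batch and
# class balanced against each other.
rng = np.random.default_rng(0)
batch = np.tile([0, 1], 20)    # alternating batches
klass = np.repeat([0, 1], 20)  # first 20 samples class 0, last 20 class 1
X = rng.normal(size=(40, 50))
X[:, :10] += 3.0 * klass[:, None]  # class effect on proteins 0-9
X[:, 10:] += 3.0 * batch[:, None]  # batch effect on proteins 10-49
X_clean, dropped = remove_batch_pcs(X, batch, klass)
```

          In this toy design the batch and class effects fall on disjoint proteins, so the batch-associated PC is easy to isolate. When batch and class are confounded, or the batch effect is subtle, no single PC may pass both cut-offs.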

          However, neither PC-based nor existing batch effect-correction methods adequately address subtle batch effects, which are difficult to eradicate. Both also involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch effect-resistant methods and demonstrate how such methods, when they incorporate protein complexes, are particularly resistant to batch effects without compromising data integrity.

          Conclusions

          Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance against batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12864-017-3490-3) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                +86-22-27401021 , wilson.goh@tju.edu.cn , goh.informatics@gmail.com
                +65-65162902 , wongls@comp.nus.edu.sg
                Journal: BMC Genomics
                Publisher: BioMed Central (London)
                ISSN: 1471-2164
                Published: 14 March 2017
                Volume: 18
                Issue: Suppl 2
                Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editors declare that they have no competing interests.
                Article number: 142
                Affiliations
                [1] ISNI 0000 0004 1761 2484, GRID grid.33763.32, School of Pharmaceutical Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, 300072, People’s Republic of China
                [2] ISNI 0000 0001 2180 6431, GRID grid.4280.e, Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore
                [3] ISNI 0000 0001 2180 6431, GRID grid.4280.e, Department of Pathology, National University of Singapore, Singapore, Singapore
                Article
                3490
                DOI: 10.1186/s12864-017-3490-3
                PMC ID: 5374662
                PubMed ID: 28361693
                3a741f71-0e51-4c2d-ae65-3423912158dd
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                The Fifteenth Asia Pacific Bioinformatics Conference
                APBC 2017
                Shenzhen, China
                16-18 January 2017
                Categories
                Research