      Open Access

      Controlling false discoveries in high-dimensional situations: boosting with stability selection

      research-article


          Abstract

          Background

          Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n ≪ p). These data sets pose new challenges to statistical analysis: variable selection becomes one of the most important tasks in this setting. Similar challenges arise in modern data sets from observational studies, e.g., in ecology, where flexible, non-linear models are fitted to high-dimensional data. We assess the recently proposed flexible framework for variable selection called stability selection. Using resampling procedures, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provide insights into the usefulness of this combination. We elaborate on the interpretation of the error bounds used and give insights for practical data analysis.
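          The resampling idea behind stability selection can be sketched in a few lines. This is a minimal illustration in plain Python, not the stabs or mboost code: it assumes componentwise L2 boosting on centred data (no intercept) as the base selector, stopped once q distinct variables have entered the model, and complementary-pairs subsampling, i.e. each random split of the data into two disjoint halves yields two fits. The function names, the step length nu, B and the toy data are hypothetical choices for illustration only.

```python
import random
from statistics import mean


def componentwise_boost(X, y, q, nu=0.1, max_iter=1000):
    """Componentwise L2 boosting, stopped once q distinct variables are selected.

    X is a list of n rows with p entries, y a list of n responses. In every
    iteration each centred column is fitted to the current residuals by least
    squares, and only the column with the largest RSS reduction is updated.
    """
    p = len(X[0])
    y_bar = mean(y)
    resid = [v - y_bar for v in y]
    cols, sums_sq = [], []
    for j in range(p):
        col = [row[j] for row in X]
        m = mean(col)
        col = [v - m for v in col]
        cols.append(col)
        sums_sq.append(sum(v * v for v in col))
    selected = []  # variables in order of first selection
    for _ in range(max_iter):
        best_j, best_gain, best_beta = None, -1.0, 0.0
        for j, col in enumerate(cols):
            if sums_sq[j] == 0:
                continue  # a constant column carries no information
            cr = sum(c * r for c, r in zip(col, resid))
            gain = cr * cr / sums_sq[j]  # RSS reduction of a full LS step
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, cr / sums_sq[j]
        if best_j is None:
            break
        # Damped update: move only a fraction nu towards the LS fit.
        resid = [r - nu * best_beta * c for c, r in zip(cols[best_j], resid)]
        if best_j not in selected:
            selected.append(best_j)
            if len(selected) == q:
                break
    return set(selected)


def stability_selection(X, y, q, cutoff, B=25, seed=1):
    """Complementary pairs subsampling: 2*B fits on disjoint halves of the data.

    Returns the selection frequency of every variable and the sorted list of
    variables whose frequency reaches the cutoff (the "stable" set).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(B):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        for part in (idx[:half], idx[half:2 * half]):
            sel = componentwise_boost([X[i] for i in part], [y[i] for i in part], q)
            for j in sel:
                counts[j] += 1
    freq = [c / (2 * B) for c in counts]
    stable = sorted(j for j, f in enumerate(freq) if f >= cutoff)
    return freq, stable


# Hypothetical toy data: only x0 and x1 influence y, 13 variables are noise.
data_rng = random.Random(0)
n, p = 50, 15
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0, 0.5) for row in X]
freq, stable = stability_selection(X, y, q=4, cutoff=0.75)
print(stable)
```

          Stopping each boosting fit after q variables is what ties the procedure to the error bound: the bound is stated in terms of q, the number p of candidate variables and the frequency cutoff, not in terms of the boosting step length.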

          Results

          Stability selection with boosting was able to detect influential predictors in high-dimensional settings while controlling the given error bound in various simulation scenarios. We investigated the dependence on parameters such as the sample size, the number of truly influential variables, and the tuning parameters of the algorithm. The method was then applied to phenotype measurements in patients with autism spectrum disorders, using a log-linear interaction model fitted by boosting. Stability selection identified five differentially expressed amino acid pathways.

          Conclusion

          Stability selection is implemented in the freely available R package stabs (http://CRAN.R-project.org/package=stabs). It proved to work well in high-dimensional settings with more predictors than observations for both linear and additive models. The original version of stability selection, which controls the per-family error rate, is quite conservative; this is much less the case for its improvement, complementary pairs stability selection. Nevertheless, care should be taken to specify the error bound appropriately.
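          To see why the error bound must be specified with care, it helps to compute it. The sketch below assumes the original bound of Meinshausen and Bühlmann on the per-family error rate, E(V) ≤ q² / ((2π_thr − 1) p), where V is the number of falsely selected variables, q the number of variables selected per subsample fit, p the number of candidate variables and π_thr the selection-frequency cutoff in (0.5, 1]. That bound rests on exchangeability and "better than random guessing" assumptions; the tighter bounds of complementary pairs stability selection need further assumptions and are not implemented here. The helper names are hypothetical, not part of stabs.

```python
import math


def pfer_bound(q, p, cutoff):
    """Upper bound on E(V), the expected number of falsely selected variables.

    Original stability selection bound: E(V) <= q^2 / ((2*cutoff - 1) * p),
    valid only for cutoff in (0.5, 1] and under the method's assumptions.
    """
    if not 0.5 < cutoff <= 1:
        raise ValueError("cutoff must lie in (0.5, 1]")
    return q * q / ((2 * cutoff - 1) * p)


def cutoff_for(q, p, pfer):
    """Cutoff at which the bound equals the requested PFER (larger is stricter)."""
    cutoff = (q * q / (pfer * p) + 1) / 2
    if cutoff > 1:
        raise ValueError("q is too large to reach this PFER with any cutoff")
    return cutoff


def q_for(cutoff, p, pfer):
    """Largest q per fit that keeps the bound at or below the requested PFER."""
    return math.floor(math.sqrt(pfer * (2 * cutoff - 1) * p))


# Hypothetical setting: p = 1000 candidate variables.
print(pfer_bound(10, 1000, 0.75))  # → 0.2
print(cutoff_for(10, 1000, 1.0))   # → 0.55
print(q_for(0.9, 1000, 1.0))       # → 28
```

          Dividing the PFER by p gives the per-comparison error rate; a PFER of 1 with p = 1000 thus corresponds to a per-comparison error rate of 0.001, which illustrates how strongly the interpretation depends on the number of candidate variables.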

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-015-0575-3) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                benjamin.hofner@fau.de
                lboccuto@ggc.org
                markus.goeker@dsmz.de
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                1471-2105
                6 May 2015
                2015; Volume 16, Issue 1, Article 144
                Affiliations
                Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nuremberg, Waldstraße 6, Erlangen, 91054 Germany
                Greenwood Genetic Center, 113 Gregor Mendel Circle, Greenwood, 29646 SC USA
                Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7b, Braunschweig, 38124 Germany
                Article
                575
                10.1186/s12859-015-0575-3
                4464883
                25943565
                © Hofner et al. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 7 November 2014
                Accepted: 16 April 2015
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                boosting, error control, variable selection, stability selection
