Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems


Abstract

Background

Variable selection on high-throughput biological data, such as gene expression or single nucleotide polymorphism (SNP) data, has become essential for selecting relevant information and, therefore, for better characterizing diseases or assessing genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas machine learning wrapper approaches can be used for predictive purposes. When many variables are highly correlated, another option is to use multivariate exploratory approaches, which can give more insight into cell biology, biological pathways or complex traits.

Results

A simple extension of a sparse partial least squares (PLS) exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework.
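The Results paragraph summarizes the method in a single sentence. As a rough illustration only, the Python sketch below assumes the usual sPLS-DA recipe: the class labels are recoded as an n x K dummy (indicator) matrix Y, and the first loading vector for X is obtained from the cross-product matrix XᵀY by alternating updates in which a lasso-type soft-thresholding keeps at most `keep_x` variables. The function names, the `keep_x` parameter and the stopping rule are hypothetical choices for this sketch; it is not the mixOmics code and computes only the first component.

```python
import numpy as np


def dummy_encode(labels):
    """Recode class labels as an n x K indicator (dummy) matrix."""
    classes = np.unique(labels)
    Y = (np.asarray(labels)[:, None] == classes[None, :]).astype(float)
    return Y, classes


def soft_threshold_keep(z, keep):
    """Soft-threshold z so that at most `keep` entries stay non-zero."""
    if keep >= z.size:
        return z.copy()
    # Threshold at the (keep + 1)-th largest absolute value.
    thresh = np.sort(np.abs(z))[::-1][keep]
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)


def first_sparse_component(X, labels, keep_x, n_iter=100, tol=1e-8):
    """Hypothetical sketch: first sparse loading vector u (for X), loading v (for Y) and scores t."""
    Y, classes = dummy_encode(labels)
    # Centre both blocks before forming the cross-product matrix.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc  # p x K cross-product matrix
    # Start from the leading singular vectors of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        # Sparse update of the X loading, then ordinary update of the Y loading.
        u_new = soft_threshold_keep(M @ v, keep_x)
        u_new /= np.linalg.norm(u_new)
        v = M.T @ u_new
        v /= np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    t = Xc @ u  # component scores of the samples
    return u, v, t, classes
```

The non-zero entries of `u` are the selected variables, and the scores `t` are what a sample plot would display. In mixOmics the number of variables kept per component is likewise chosen by the user (the `keepX` argument of `splsda`); `keep_x` here only mirrors that idea.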

Conclusions

sPLS-DA has a classification performance similar to that of other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of the interpretability of its results, thanks to informative graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
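Classification with sPLS-DA happens in the low-dimensional space spanned by the components. As an illustration only, the sketch below assumes one simple prediction rule: a new sample is assigned to the class whose centroid of component scores is nearest in Euclidean distance. The function names are hypothetical, and this is not necessarily the rule used for the benchmarks summarized above.

```python
import numpy as np


def fit_centroids(T_train, labels):
    """Mean component score of each class (one row per class)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.vstack([T_train[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(T_new, classes, centroids):
    """Assign each new sample to the class with the closest centroid."""
    # Distances from every new sample (rows) to every class centroid (columns).
    d = np.linalg.norm(T_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

Because the rule only needs the component scores, the same two or three components used for the sample plots also drive the predictions, which is part of what keeps the results easy to interpret.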


Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2011

Publication date (Electronic): 22 June 2011

Volume: 12

Page: 253

Affiliations

[1] Queensland Facility for Advanced Bioinformatics, University of Queensland, 4072 St Lucia, QLD, Australia

[2] UMR444 Laboratoire de Génétique Cellulaire, INRA, BP 52627, F-31326 Castanet Tolosan, France

[3] Institut de Mathématiques de Toulouse, Université de Toulouse et CNRS (UMR 5219), F-31062 Toulouse, France

Article

Publisher ID: 1471-2105-12-253

DOI: 10.1186/1471-2105-12-253

PMC ID: 3133555

PubMed ID: 21693065

SO-VID: 441e888f-9835-442c-ac59-2a12bf5ec3d1

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
