A Machine Learning Approach to Parkinson’s Disease Blood Transcriptomics

Abstract

The increased incidence of Parkinson’s disease (PD) and its significant health burden have stimulated substantial research efforts towards the identification of effective treatments and diagnostic procedures. Despite technological advancements, a cure is still not available, and PD is often diagnosed long after onset, when irreversible damage has already occurred. Blood transcriptomics represents a potentially disruptive technology for the early diagnosis of PD. We used transcriptome data from the PPMI study, a large cohort study with early PD subjects and age-matched healthy controls (HC), to classify PD vs. HC in around 550 samples. Using a nested feature selection procedure based on Random Forests and XGBoost, we reached an AUC of 72% and found 493 candidate genes. We further discuss the importance of the selected genes through a functional analysis based on Gene Ontology (GO) terms and KEGG pathways.
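For readers less familiar with the metric reported above, the following minimal sketch shows one common way to compute the area under the ROC curve (AUC) for a binary PD vs. HC classifier, using the rank-based Mann-Whitney formulation. It is an illustration only, not the authors' pipeline: the function name `roc_auc`, the label encoding (1 for PD, 0 for HC), and the toy scores are assumptions introduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for PD, 0 for HC (hypothetical encoding, assumed here).
    scores: classifier scores, where higher means "more likely PD".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one sample of each class")

    # Sort sample indices by score, then assign 1-based ranks,
    # giving tied scores the average of the ranks they span.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U counts the (PD, HC) pairs in which the PD sample scores higher,
    # with ties counted as one half.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data (invented for illustration): three PD and three HC samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of the 9 PD/HC pairs are ordered correctly
```

Read this way, an AUC of 72% means that a randomly chosen PD sample receives a higher score than a randomly chosen HC sample in roughly 72% of such pairs, against 50% for a classifier that guesses at random.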

Most cited references 55

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael Love, Wolfgang Huber, Simon Anders (2014)

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html. Electronic supplementary material: the online version of this article (doi:10.1186/s13059-014-0550-8) contains supplementary material, which is available to authorized users.

STAR: ultrafast universal RNA-seq aligner.

Alexander Dobin, Carrie A. Davis, Felix Schlesinger … (2013)

Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.

limma powers differential expression analyses for RNA-sequencing and microarray studies

Matthew E. Ritchie, Belinda Phipson, Di Wu … (2015)

limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

Author and article information

Contributors

Ester Pantaleo

Alfonso Monaco

Nicola Amoroso

Angela Lombardi

Loredana Bellantuono

Claudio Lo Giudice

Ernesto Picardi

Benedetta Tafuri

Salvatore Nigro

Graziano Pesole

Sabina Tangaro

Journal

Journal ID (publisher-id): GENEG9

Title: Genes

Abbreviated Title: Genes

Publisher: MDPI AG

ISSN (Electronic): 2073-4425

Publication date Created: May 2022

Publication date (Electronic): April 21 2022

Volume: 13

Issue: 5

Page: 727

Article

DOI: 10.3390/genes13050727

PubMed ID: 35627112

SO-VID: 4ee3b801-63c3-4a5c-b1e5-e55ad2946ba6

License:

https://creativecommons.org/licenses/by/4.0/
