SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering

Abstract

A Directed Acyclic Graph (DAG) offers a simple way to define causal structures among a set of nodes: causal links are represented by arrows between the variables, leading from cause to effect. DAG structure learning from observational data has recently attracted close attention in both industry and academia, and many techniques have been proposed to address the problem. We provide a two-step approach, named SEMdag(), that can be used to quickly learn high-dimensional linear SEMs. It is included in the R package SEMgraph and employs a two-stage order-based search that uses either prior knowledge (Knowledge-based, KB) or a data-driven method (Bottom-up, BU), under the assumption of a linear SEM with equal-variance error terms. We evaluated our framework's ability to find plausible DAGs against six well-known causal discovery techniques (ARGES, GES, PC, LiNGAM, CAM, NOTEARS). We conducted a series of experiments on observed expression (RNA-seq) data, using a pair of training and testing datasets for each of four distinct diseases: Amyotrophic Lateral Sclerosis (ALS), Breast cancer (BRCA), Coronavirus disease (COVID-19) and ST-elevation myocardial infarction (STEMI). The results show that the SEMdag() procedure can recover a graph structure with good disease prediction performance, as evaluated by a conventional supervised learning algorithm (RF). When the initial graph is sparse, the BU approach may be a better choice than the KB one; when the graph is denser, both BU and KB report high performance, with the highest score for the KB approach based on topological layers. Besides its superior disease predictive performance compared to previous research, SEMdag() lets the user choose among distinct structure learning algorithms and can handle high-dimensional problems with a lower computational load. The SEMdag() function is implemented in the R package SEMgraph, available at https://CRAN.R-project.org/package=SEMgraph.
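To make the two-stage, order-based idea more concrete, here is a minimal Python sketch. It is not the SEMdag() code, which is written in R and is not reproduced on this page. It relies on a known property of linear SEMs with equal error variances: a sink node (one with no children) has the smallest diagonal entry of the precision matrix. Stage 1 repeatedly removes such a node to build a bottom-up ordering. Stage 2 regresses each node on its predecessors in that ordering and keeps the non-zero coefficients as edges. All function names, the toy DAG and its weights are hypothetical. As a simplifying assumption, the sketch works on the exact covariance implied by the toy DAG rather than on an estimate from data, and it uses thresholded least-squares coefficients where a real analysis would need a regularized estimator.

```python
# Illustrative sketch only: NOT the SEMdag() implementation.
# Assumption: linear SEM X_j = sum_i w[i][j] * X_i + e_j with equal error variances.

def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * x for v, x in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def implied_cov(w, sigma2=1.0):
    """Covariance implied by the SEM: sigma2 * A A^T with A = (I - W^T)^{-1}."""
    n = len(w)
    a = mat_inv([[float(i == j) - w[j][i] for j in range(n)] for i in range(n)])
    return [[sigma2 * sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bottom_up_order(cov):
    """Stage 1: peel off sinks (smallest precision diagonal); return sources first."""
    remaining = list(range(len(cov)))
    sinks_first = []
    while remaining:
        prec = mat_inv([[cov[i][j] for j in remaining] for i in remaining])
        k = min(range(len(remaining)), key=lambda t: prec[t][t])
        sinks_first.append(remaining.pop(k))
    return sinks_first[::-1]

def select_parents(cov, order, tol=1e-8):
    """Stage 2: regress each node on its predecessors; keep non-zero coefficients."""
    edges = set()
    for pos, child in enumerate(order):
        preds = order[:pos]
        if not preds:
            continue
        inv_pp = mat_inv([[cov[i][j] for j in preds] for i in preds])
        for r, parent in enumerate(preds):
            beta = sum(inv_pp[r][c] * cov[preds[c]][child] for c in range(len(preds)))
            if abs(beta) > tol:
                edges.add((parent, child))
    return edges

# Hypothetical toy DAG: 2 -> 0, 2 -> 3, 0 -> 1, 3 -> 1 (w[i][j] is the weight of i -> j).
true_w = [[0.0, -0.7, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.8, 0.0, 0.0, 0.5],
          [0.0, 0.6, 0.0, 0.0]]
cov = implied_cov(true_w)
order = bottom_up_order(cov)
edges = select_parents(cov, order)
print(order, sorted(edges))
```

In this toy setting the ordering starts at the source node 2 and ends at the sink node 1, and the recovered edge set matches the DAG that generated the covariance. Nodes 0 and 3 are interchangeable in the ordering, since neither is an ancestor of the other.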

Author and article information

Contributors

Mario Grassi: Roles: Conceptualization, Data curation, Formal analysis, Methodology, Software, Supervision, Validation, Writing – original draft

Barbara Tarantino:

ORCID: https://orcid.org/0000-0001-6384-2561

Roles: Data curation, Formal analysis, Investigation, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

Tao Huang: Role: Editor

Journal

Journal ID (nlm-ta): PLoS One

Journal ID (iso-abbrev): PLoS One

Journal ID (publisher-id): plos

Title: PLOS ONE

Publisher: Public Library of Science (San Francisco, CA, USA)

ISSN (Electronic): 1932-6203

Publication date Collection: 2025

Publication date (Electronic): 8 January 2025

Volume: 20

Issue: 1

Electronic Location Identifier: e0317283

Affiliations

[001] Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy

Chinese Academy of Sciences, CHINA

Author notes

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: barbara.tarantino@unipv.it

Article

Publisher ID: PONE-D-24-15014

DOI: 10.1371/journal.pone.0317283

PMC ID: 11709272

PubMed ID: 39775401

SO-VID: 23e30c14-28c4-4c2c-9fa0-066c5e5f8764

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received : 14 April 2024

Date accepted : 25 December 2024

Page count

Figures: 5, Tables: 4, Pages: 24

Funding

The author(s) received no specific funding for this work.

Custom metadata

Data Availability: Code to reproduce all results of the analysis, together with the data used in this study, can be found in the supplementary files available at: https://github.com/fernandoPalluzzi/SEMgraph/tree/master/SEMdag.
