
      Large language models identify causal genes in complex trait GWAS

      Preprint

          Abstract

          Identifying underlying causal genes at significant loci from genome-wide association studies (GWAS) remains a challenging task. Literature evidence for disease-gene co-occurrence, whether through automated approaches or human expert annotation, is one way of nominating causal genes at GWAS loci. However, current automated approaches are limited in accuracy and generalizability, and expert annotation is not scalable to hundreds of thousands of significant findings. Here, we demonstrate that large language models (LLMs) can accurately identify genes likely to be causal at GWAS loci. By evaluating the performance of GPT-3.5 and GPT-4 on datasets of GWAS loci with high-confidence causal gene annotations, we show that these models outperform state-of-the-art methods in identifying putative causal genes. These findings highlight the potential of LLMs to augment existing approaches to causal gene discovery.

          Author and article information

          Journal: medRxiv
          Date: May 31 2024
          DOI: 10.1101/2024.05.30.24308179
          Record ID: 14e69630-41e5-4421-b756-37c5ea33c01d
          © 2024