Open Access

      Hybrid Value-Aware Transformer Architecture for Joint Learning from Longitudinal and Non-Longitudinal Clinical Data

      Preprint
      research-article


          Abstract

The Transformer is the latest deep neural network (DNN) architecture for sequence data learning and has revolutionized the field of natural language processing. This success has motivated researchers to explore its application in the healthcare domain. Despite the similarities between longitudinal clinical data and natural language data, clinical data presents unique complexities that make adapting the Transformer to this domain challenging. To address this issue, we have designed a new Transformer-based DNN architecture, referred to as the Hybrid Value-Aware Transformer (HVAT), which can jointly learn from longitudinal and non-longitudinal clinical data. HVAT is unique in its ability to learn from the numerical values associated with clinical codes/concepts, such as laboratory results, and in its use of a flexible longitudinal data representation called clinical tokens. We trained a prototype HVAT model on a case-control dataset, achieving high performance in predicting Alzheimer’s disease and related dementias as the patient outcome. The result demonstrates the potential of HVAT for broader clinical data learning tasks.
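The abstract does not spell out how the numerical values enter the model. As a rough illustration only, a minimal PyTorch sketch of one plausible value-aware embedding for clinical tokens is shown below; the class and variable names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class ValueAwareEmbedding(nn.Module):
    """Hypothetical sketch: embed a clinical token (e.g. a lab code) together with
    its numerical value, so the same code with different values yields different
    input vectors for a downstream Transformer encoder."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)  # one vector per clinical code
        self.value_proj = nn.Linear(1, d_model)             # lifts the scalar value into model space

    def forward(self, token_ids: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq) integer clinical-token ids
        # values:    (batch, seq) associated numerical values (assume 1.0 where no value applies)
        return self.token_emb(token_ids) + self.value_proj(values.unsqueeze(-1))

# Two patients sharing the same clinical codes but with different lab values
emb = ValueAwareEmbedding(vocab_size=10_000, d_model=128)
ids = torch.tensor([[12, 57, 903], [12, 57, 903]])
vals = torch.tensor([[1.0, 7.2, 1.0], [1.0, 9.8, 1.0]])
x = emb(ids, vals)  # shape (2, 3, 128), ready for a standard Transformer encoder
```

This only illustrates the value-aware token idea; how HVAT actually combines longitudinal and non-longitudinal inputs is described in the paper itself.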


Most cited references (17)


          Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
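For readers unfamiliar with the mechanism this abstract refers to, the core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A minimal sketch, without the multi-head projections or dropout of a full implementation:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V -- the attention operation at the core of the Transformer."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # similarity of each query to each key
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)             # attention weights sum to 1 over keys
    return weights @ v                                  # weighted sum of value vectors
```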

            Language Models are Few-Shot Learners

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
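The "text interaction" described here is in-context (few-shot) prompting: a handful of solved demonstrations and the new query are concatenated into a single prompt, and the frozen model simply continues the text. A schematic sketch with made-up demonstrations (not taken from the paper):

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt: the task is specified purely as text,
    with a few solved examples followed by the new query."""
    lines = [task_description]
    for x, y in demonstrations:
        lines.append(f"Q: {x}\nA: {y}")
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    "Unscramble the letters to form an English word.",
    [("lpepa", "apple"), ("anaanb", "banana")],
    "raepg",
)
# The prompt is sent to the frozen language model; its completion of the final
# "A:" line is taken as the answer -- no gradient updates or fine-tuning involved.
```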

              Language models are unsupervised multitask learners


                Author and article information

Journal
medRxiv (Cold Spring Harbor Laboratory)
14 March 2023: 2023.03.09.23287046
                Affiliations
                [1 ]George Washington University, Washington, DC
                [2 ]Washington DC VA Medical Center, Washington, DC
                [3 ]Rutgers University, New Brunswick, NJ
[4 ]University of Utah, Salt Lake City, UT
                [5 ]Irvine Clinical Research, Irvine, CA
                [6 ]Georgetown University, Washington, DC
                Author notes
[* ]Corresponding authors: yshao@gwu.edu; zengq@gwu.edu
Article
DOI: 10.1101/2023.03.09.23287046
PMC ID: 10055462
PMID: 36993767

                This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator.

                Funding
                Funded by: NIH/NIA
                Award ID: RF1AG069121
                Funded by: NIH/NIA
                Award ID: 1R01AG073474-01A1
                Categories
                Article
