      Structured information extraction from scientific text with large language models

      research-article

          Abstract

          Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.
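
As a minimal sketch of the "list of JSON objects" output format mentioned in the abstract, the snippet below parses a model completion into typed records for the dopant/host linking task. The schema (the `host` and `dopants` keys), the `DopingRecord` class, the `parse_completion` helper, and the example completion string are all hypothetical illustrations; they are not taken from the article and do not represent the authors' schema or code. No model is called here: the completion is a hard-coded string standing in for the output of a fine-tuned model.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DopingRecord:
    """One host material and the dopants linked to it (hypothetical schema)."""
    host: str
    dopants: tuple


def parse_completion(completion: str) -> list:
    """Parse a completion that is expected to contain a JSON list of objects.

    Malformed or unexpected output yields an empty list rather than an
    exception, since a generative model is not guaranteed to emit valid JSON.
    """
    try:
        raw = json.loads(completion)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []

    records = []
    for item in raw:
        if not isinstance(item, dict):
            continue  # skip entries that are not JSON objects
        host = item.get("host")
        dopants = item.get("dopants", [])
        valid_dopants = isinstance(dopants, list) and all(
            isinstance(d, str) for d in dopants
        )
        if isinstance(host, str) and valid_dopants:
            records.append(DopingRecord(host=host, dopants=tuple(dopants)))
    return records


# Hypothetical completion for a sentence about Al- and Ga-doped ZnO.
example_completion = '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]'
records = parse_completion(example_completion)
```

Validating each object before accepting it is one possible design choice: it keeps malformed completions out of the resulting database at the cost of silently dropping them, so a real pipeline would probably also log what was rejected.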

          Abstract

          Extracting scientific data from published research is a complex task requiring specialised tools. Here the authors present a scheme based on large language models to automate the retrieval of information from text in a flexible and accessible manner.

                Author and article information

                Contributors
                ajain@lbl.gov
                Journal
                Nature Communications (Nat Commun)
                Nature Publishing Group UK (London)
                ISSN: 2041-1723
                Published: 15 February 2024
                Volume: 15
                Article number: 1418
                Affiliations
                [1] Lawrence Berkeley National Laboratory (https://ror.org/02jbv0t02), Berkeley, CA, USA
                [2] Materials Science and Engineering Department, University of California, Berkeley, CA, USA (GRID grid.47840.3f; ISNI 0000 0001 2181 7878)
                Author information
                http://orcid.org/0000-0003-2181-4815
                http://orcid.org/0000-0002-8567-1879
                http://orcid.org/0000-0002-0141-7006
                http://orcid.org/0000-0003-2495-5509
                http://orcid.org/0000-0001-5893-9967
                Article
                45563
                DOI: 10.1038/s41467-024-45563-x
                10869356
                38360817
                8ded5ab0-4433-47ba-8b92-cdb91c8514f9
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 17 March 2023
                Accepted: 22 January 2024
                Categories
                Article
                Custom metadata
                © Springer Nature Limited 2024

                Uncategorized
                materials science,theory and computation,scientific data,databases