      Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem

      research-article

          Abstract

          The literature knowledge panels developed and implemented in PubChem are described. These help to uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, summarizing these in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
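          The co-occurrence idea in the abstract can be illustrated with a small sketch. It assumes entity matching has already been done, so each record is simply a set of chemical names and a set of disease names found in one abstract. The toy records, the function names `cooccurrence_scores` and `top_neighbors`, and the choice of pointwise mutual information (PMI) as the score are all assumptions made for illustration. The abstract does not specify the statistical method used in PubChem, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus: entities already matched in each abstract.
records = [
    {"chemicals": {"aspirin"}, "diseases": {"headache", "stroke"}},
    {"chemicals": {"aspirin", "ibuprofen"}, "diseases": {"headache"}},
    {"chemicals": {"ibuprofen"}, "diseases": {"fever"}},
    {"chemicals": {"aspirin"}, "diseases": {"stroke"}},
]


def cooccurrence_scores(records):
    """Count document-level co-occurrences and score each pair with PMI.

    PMI is an illustrative stand-in for the (unspecified) statistics
    used in the knowledge panels.
    """
    n = len(records)
    chem_counts = Counter()
    dis_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        chem_counts.update(rec["chemicals"])
        dis_counts.update(rec["diseases"])
        # Every chemical-disease pair in the same abstract co-occurs once.
        pair_counts.update(product(rec["chemicals"], rec["diseases"]))
    scores = {}
    for (chem, dis), n_pair in pair_counts.items():
        # PMI = log2( P(chem, dis) / (P(chem) * P(dis)) )
        scores[(chem, dis)] = math.log2(
            n_pair * n / (chem_counts[chem] * dis_counts[dis])
        )
    return pair_counts, scores


def top_neighbors(records, chemical, k=3):
    """Return up to k (disease, score, count) tuples for one chemical."""
    pair_counts, scores = cooccurrence_scores(records)
    ranked = sorted(
        (
            (dis, scores[(chem, dis)], pair_counts[(chem, dis)])
            for (chem, dis) in scores
            if chem == chemical
        ),
        # Highest score first, then highest count, then name for stable ties.
        key=lambda t: (-t[1], -t[2], t[0]),
    )
    return ranked[:k]


if __name__ == "__main__":
    for disease, score, count in top_neighbors(records, "ibuprofen"):
        print(f"{disease}: PMI={score:.3f}, co-occurrences={count}")
```

          PMI on raw counts tends to overrate rare pairs, which is one reason a production system would need additional filtering and redundancy removal of the kind the abstract mentions.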


                Author and article information

                Journal: Frontiers in Research Metrics and Analytics (Front. Res. Metr. Anal.)
                Publisher: Frontiers Media S.A.
                ISSN: 2504-0537
                Published: 12 July 2021
                Volume 6, article 689059
                Affiliations
                National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
                Author notes

                Edited by: Karin Verspoor, RMIT University, Australia

                Reviewed by: Bridget McInnes, Virginia Commonwealth University, United States

                Nansu Zong, Mayo Clinic, United States

                *Correspondence: Leonid Zaslavsky, leonid.zaslavsky@nih.gov

                This article was submitted to Text-mining and Literature-based Discovery, a section of the journal Frontiers in Research Metrics and Analytics

                Article
                Article ID: 689059
                DOI: 10.3389/frma.2021.689059
                PMCID: PMC8311438
                PMID: 34322655
                Record ID: 1cd54d70-a1b1-4148-80f8-f659533af058
                Copyright © 2021 Zaslavsky, Cheng, Gindulyte, He, Kim, Li, Thiessen, Yu and Bolton.

                This work is authored by Zaslavsky*, Cheng, Gindulyte, He, Kim, Li, Thiessen, Yu and Bolton on behalf of the U.S. Government and, as regards Zaslavsky*, Cheng, Gindulyte, He, Kim, Li, Thiessen, Yu and Bolton and the U.S. Government, is not subject to copyright protection in the United States. Foreign and other copyrights may apply. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 31 March 2021
                Accepted: 17 June 2021
                Funding
                Funded by: U.S. National Library of Medicine (funder ID: 10.13039/100000092)
                Categories
                Research Metrics and Analytics
                Original Research

                Keywords: data mining, knowledge discovery, knowledge summarization, information retrieval, natural language processing, knowledge panels, knowledge graph, PubChem
