Discovering and Summarizing Relationships Between Chemicals, Genes, Proteins, and Diseases in PubChem

Abstract

This article describes the literature knowledge panels developed and implemented in PubChem. The panels help uncover and summarize important relationships between chemicals, genes, proteins, and diseases by analyzing co-occurrences of terms in biomedical literature abstracts. Named entities in PubMed records are matched with chemical names in PubChem, disease names in Medical Subject Headings (MeSH), and gene/protein names in popular gene/protein information resources, and the most closely related entities are identified using statistical analysis and relevance-based sampling. Knowledge panels for the co-occurrence of chemical, disease, and gene/protein entities are included in PubChem Compound, Protein, and Gene pages, where they summarize these relationships in a compact form. Statistical methods for removing redundancy and estimating relevance scores are discussed, along with the benefits and pitfalls of relying on automated (i.e., not human-curated) methods operating on data from multiple heterogeneous sources.
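
The co-occurrence idea summarized above can be illustrated with a minimal sketch. It assumes that entity matching has already been done, so each abstract is reduced to a set of entity identifiers. The abstract does not name the statistic used for relevance scoring, so pointwise mutual information (PMI) is used here purely as a stand-in; the function names `cooccurrence_scores` and `top_neighbors` and the toy data are hypothetical and are not part of PubChem.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(docs):
    """Score entity pairs by pointwise mutual information (PMI).

    docs: iterable of collections of entity identifiers, one per abstract.
    Returns {(a, b): pmi} with a < b for every pair seen together at least once.
    PMI is an illustrative assumption, not the scoring used by PubChem.
    """
    docs = [set(d) for d in docs]
    n_docs = len(docs)
    single = Counter()  # number of abstracts mentioning each entity
    pair = Counter()    # number of abstracts mentioning both entities
    for entities in docs:
        single.update(entities)
        pair.update(combinations(sorted(entities), 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
        # estimated as document frequencies divided by n_docs.
        scores[(a, b)] = math.log2(n_ab * n_docs / (single[a] * single[b]))
    return scores


def top_neighbors(scores, entity, k=3):
    """Return up to k (neighbor, score) tuples for one entity, best first."""
    neighbors = []
    for (a, b), score in scores.items():
        if a == entity:
            neighbors.append((b, score))
        elif b == entity:
            neighbors.append((a, score))
    neighbors.sort(key=lambda item: item[1], reverse=True)
    return neighbors[:k]


# Toy input: each set stands for the entities matched in one abstract.
abstracts = [
    {"aspirin", "pain"},
    {"aspirin", "pain"},
    {"aspirin", "ptgs2"},
    {"glucose", "diabetes"},
]
scores = cooccurrence_scores(abstracts)
neighbors_of_glucose = top_neighbors(scores, "glucose")
```

Raw PMI tends to favor rare pairs, which is one reason a production system would need the additional relevance estimation and redundancy removal that the abstract mentions; this sketch does not attempt either.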

Author and article information

Contributors

Leonid Zaslavsky: URI : https://loop.frontiersin.org/people/1140471/overview

Tiejun Cheng: URI : https://loop.frontiersin.org/people/1382008/overview

Asta Gindulyte: URI : https://loop.frontiersin.org/people/1382982/overview

Siqian He: URI : https://loop.frontiersin.org/people/1382097/overview

Sunghwan Kim: URI : https://loop.frontiersin.org/people/1288973/overview

Qingliang Li: URI : https://loop.frontiersin.org/people/1382084/overview

Paul Thiessen: URI : https://loop.frontiersin.org/people/1382134/overview

Bo Yu: URI : https://loop.frontiersin.org/people/1382040/overview

Evan E. Bolton: URI : https://loop.frontiersin.org/people/1383482/overview

Journal

Journal ID (nlm-ta): Front Res Metr Anal

Journal ID (iso-abbrev): Front Res Metr Anal

Journal ID (publisher-id): Front. Res. Metr. Anal.

Title: Frontiers in Research Metrics and Analytics

Publisher: Frontiers Media S.A.

ISSN (Electronic): 2504-0537

Publication date (Electronic): 12 July 2021

Publication date Collection: 2021

Volume: 6

Electronic Location Identifier: 689059

Affiliations

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States

Author notes

Edited by: Karin Verspoor, RMIT University, Australia

Reviewed by: Bridget McInnes, Virginia Commonwealth University, United States

Nansu Zong, Mayo Clinic, United States

*Correspondence: Leonid Zaslavsky, leonid.zaslavsky@nih.gov

†ORCID:

Leonid Zaslavsky

orcid.org/0000-0001-5873-4873

Tiejun Cheng

orcid.org/0000-0002-4486-3356

Asta Gindulyte

orcid.org/0000-0001-9600-5305

Siqian He

orcid.org/0000-0002-1707-4167

Sunghwan Kim

orcid.org/0000-0001-9828-2074

Qingliang Li

orcid.org/0000-0002-6453-236X

Paul Thiessen

orcid.org/0000-0002-1992-2086

Bo Yu

orcid.org/0000-0003-3952-8921

Evan Bolton

orcid.org/0000-0002-5959-6190

This article was submitted to Text-mining and Literature-based Discovery, a section of the journal Frontiers in Research Metrics and Analytics

Article

Publisher ID: 689059

DOI: 10.3389/frma.2021.689059

PMC ID: 8311438

PubMed ID: 34322655

SO-VID: 1cd54d70-a1b1-4148-80f8-f659533af058

License:

This work is authored by Zaslavsky*, Cheng, Gindulyte, He, Kim, Li, Thiessen, Yu and Bolton on behalf of the U.S. Government and, as regards Zaslavsky*, Cheng, Gindulyte, He, Kim, Li, Thiessen, Yu and Bolton and the U.S. Government, is not subject to copyright protection in the United States. Foreign and other copyrights may apply. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received : 31 March 2021

Date accepted : 17 June 2021

Funding

Funded by: U.S. National Library of Medicine 10.13039/100000092
