Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria


Abstract

The growth and survival of an organism in a particular environment depend heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that rely on sulfate reduction to meet their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize essential genes based on their key metabolic pathways. Herein, we report a feedback-loop framework for gene-of-interest discovery, from bio-problem to gene set of interest, that combines expert annotation with computational prediction. The defined bio-problem was used to retrieve SRB genes from literature databases (PubMed and PubMed Central), and the retrieved genes were annotated to the genome of OA G20. The retrieved gene list was then used for protein–protein interaction enrichment and corroborated against pangenome analysis to categorize the enriched gene sets and their respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis showed the gene distribution: 69.83% of the 116 enriched genes were mapped under “persistent,” suggesting that these genes are essential. Likewise, 21.55% of the enriched genes, notably the formate dehydrogenases and metallic hydrogenases, appeared under “shell.” Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help establish a baseline before any experimental studies are performed.
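As a quick consistency check on the pangenome figures quoted in the abstract, the reported percentages can be converted back to whole-gene counts. The sketch below assumes that both percentages are shares of the 116 enriched genes, rounded to two decimals. The names `TOTAL_ENRICHED`, `reported_pct`, and `pct_to_count` are illustrative and do not come from the article. The abstract does not say which partition holds the remaining genes, so the sketch leaves them unlabeled.

```python
# Back-calculate gene counts from the percentages reported in the abstract.
# Assumption: each percentage is a share of the 116 enriched genes,
# rounded to two decimal places.
TOTAL_ENRICHED = 116

reported_pct = {"persistent": 69.83, "shell": 21.55}


def pct_to_count(pct, total):
    """Return the whole-gene count whose share of `total` rounds to `pct`."""
    count = round(pct / 100 * total)
    # Sanity check: the integer count must reproduce the reported percentage.
    assert round(100 * count / total, 2) == pct
    return count


counts = {name: pct_to_count(p, TOTAL_ENRICHED) for name, p in reported_pct.items()}

# Genes not covered by the two partitions named in the abstract.
unassigned = TOTAL_ENRICHED - sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n} genes")
print(
    f"not covered by the two reported partitions: {unassigned} genes "
    f"({100 * unassigned / TOTAL_ENRICHED:.2f}%)"
)
```

Under these assumptions, the two reported shares correspond to 81 and 25 genes, which leaves 10 of the 116 enriched genes outside the "persistent" and "shell" partitions.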


Author and article information

Contributors

Priya Saxena: https://loop.frontiersin.org/people/1489693/overview

Shailabh Rauniyar: https://loop.frontiersin.org/people/470862/overview

Payal Thakur: https://loop.frontiersin.org/people/1476032/overview

Ram Nageena Singh: https://loop.frontiersin.org/people/287271/overview

Alain Bomgni: https://loop.frontiersin.org/people/1900579/overview

Mathew O. Alaba: https://loop.frontiersin.org/people/2261059/overview

Etienne Z. Gnimpieba: https://loop.frontiersin.org/people/1421699/overview

Rajesh Kumar Sani: https://loop.frontiersin.org/people/52607/overview

Journal

Journal ID (nlm-ta): Front Microbiol

Journal ID (iso-abbrev): Front Microbiol

Journal ID (publisher-id): Front. Microbiol.

Title: Frontiers in Microbiology

Publisher: Frontiers Media S.A.

ISSN (Electronic): 1664-302X

Publication date (Electronic): 13 April 2023

Publication date Collection: 2023

Volume: 14

Electronic Location Identifier: 1086021

Affiliations

[1] Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States

[2] Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States

[3] 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States

[4] Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States

[5] BuG ReMeDEE Consortium, South Dakota School of Mines and Technology, Rapid City, SD, United States

Author notes

Edited by: George Tsiamis, University of Patras, Greece

Reviewed by: Hongwei Liu, Sun Yat-sen University, Zhuhai Campus, China; Yuejun Wang, University of California, San Francisco, United States

*Correspondence: Etienne Z. Gnimpieba, etienne.gnimpieba@usd.edu

Rajesh Kumar Sani, rajesh.sani@sdsmt.edu

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Article

DOI: 10.3389/fmicb.2023.1086021

PMC ID: 10133479

PubMed ID: 37125195

SO-VID: 2fbd5780-08d9-48a6-94f6-afdfd78cafa5

License:

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

History

Date received: 31 October 2022

Date accepted: 23 March 2023

Page count

Figures: 8, Tables: 3, Equations: 0, References: 117, Pages: 17, Words: 13581

Funding

Funded by: National Science Foundation, doi 10.13039/501100008982;

Award ID: #1736255

Award ID: #1849206

Award ID: #1920954
