Computational approaches to predict bacteriophage–host relationships

Abstract

Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods contain significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications.

Abstract

New viruses infecting bacteria are increasingly being discovered in many environments through sequence-based explorations. To understand their role in microbial ecosystems, computational tools are indispensable to prioritize and guide experimental efforts. This review assesses and discusses a range of bioinformatic approaches to predict bacteriophage–host relationships when all that is known is their genome sequence.
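The compositional signal mentioned in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes that a phage genome tends to resemble its host in k-mer usage, and it ranks candidate hosts by the Euclidean distance between normalized k-mer frequency profiles. The function names (`kmer_profile`, `rank_hosts`), the choice of Euclidean distance, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return normalized frequencies of all 4**k k-mers over the ACGT alphabet."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence is too short or has no unambiguous k-mers")
    return [counts[m] / total for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equal-length frequency profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_hosts(phage_seq, host_seqs, k=4):
    """Return candidate host names sorted from most to least similar profile."""
    phage = kmer_profile(phage_seq, k)
    scored = [
        (euclidean(phage, kmer_profile(seq, k)), name)
        for name, seq in host_seqs.items()
    ]
    return [name for _, name in sorted(scored)]


if __name__ == "__main__":
    # Toy sequences, not real genomes (illustrative assumption).
    hosts = {"host_AT": "TATATATTAT", "host_GC": "GCGCGGCCGC"}
    print(rank_hosts("ATATATATAT", hosts, k=2))
```

On real data the profiles would be computed from whole genomes, and the ranking would be only one line of evidence alongside homology-based and abundance-based signals.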

Author and article information

Journal

Journal ID (nlm-ta): FEMS Microbiol Rev

Journal ID (iso-abbrev): FEMS Microbiol. Rev

Journal ID (publisher-id): femsre

Title: FEMS Microbiology Reviews

Publisher: Oxford University Press

ISSN (Print): 0168-6445

ISSN (Electronic): 1574-6976

Publication date (Electronic): 09 December 2015

Publication date (Print): 01 March 2016

Publication date (PMC-release): 09 December 2015

Volume: 40

Issue: 2

Pages: 258-272

Affiliations

[1] Department of Computer Science, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA

[2] Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, CEP 21941-902, Brazil

[3] Division of Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL 60439, USA

[4] Department of Microbiology and Immunology, Rega Institute KU Leuven, Herestraat 49, 3000 Leuven, Belgium

[5] VIB Center for the Biology of Disease, VIB, Herestraat 49, 3000 Leuven, Belgium

[6] Laboratory of Microbiology, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

[7] Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands

[8] Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, the Netherlands

Author notes

[*] Corresponding author: Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands. Tel: +31-30-2534212

Article

DOI: 10.1093/femsre/fuv048

PMC ID: 5831537

PubMed ID: 26657537

SO-VID: a41836f3-d29a-4a53-86ae-9e771c54b8f3

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 11 November 2015

Date revision received: 29 April 2015

Page count

Pages: 15
