16
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Graph Theory-Based Sequence Descriptors as Remote Homology Predictors

      review-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Despite a new set of graph theory-derived sequence/structural descriptors that have been gaining relevance in the detection of remote homology, they have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the achieved experiences in remote homology detection by both the most popular AF tools and other less known ones, we provide our own using the graphical–numerical methodologies, MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.

          Related collections

          Most cited references153

          • Record: found
          • Abstract: not found
          • Article: not found

          Identification of common molecular subsequences.

            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Community structure and metabolism through reconstruction of microbial genomes from the environment.

            Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              On the Complexity of Finite Sequences

                Bookmark

                Author and article information

                Journal
                Biomolecules
                Biomolecules
                biomolecules
                Biomolecules
                MDPI
                2218-273X
                23 December 2019
                January 2020
                : 10
                : 1
                : 26
                Affiliations
                [1 ]CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n 4450-208 Porto, Portugal
                [2 ]Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
                [3 ]Departamento de Ciencia de la Computación. Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba; deborah@ 123456uclv.edu.cu
                [4 ]Centro de Bioactivos Químicos (CBQ), Universidad Central ¨Marta Abreu¨ de Las Villas (UCLV), Santa Clara 54830, Cuba; reymolina@ 123456uclv.edu.cu
                [5 ]Programa de Doctorado en Fisicoquímica Molecular, Facultad de Ciencias Exactas, Universidad Andrés Bello, Av. República 239, Santiago 8370146, Chile; eancedeg@ 123456gmail.com
                [6 ]EpiDisease S.L. Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain; giselle.perez@ 123456epidisease.com
                [7 ]Laboratorio de Biotecnología Aplicada S. de R.L. de C.V., GRECA Inc., Carretera La Piedad-Carapán, km 3.5, La Piedad, Michoacán 59300, Mexico; griva_2010@ 123456hotmail.com
                [8 ]Tecnológico Nacional de México, Instituto Tecnológico de la Piedad, Av. Ricardo Guzmán Romero, Santa Fe, La Piedad de Cavadas, Michoacán 59370, Mexico
                Author notes
                [* ]Correspondence: gchapin@ 123456ciimar.up.pt (G.A.-C.); aantunes@ 123456ciimar.up.pt (A.A.); Tel.: +351-22-340-1813 (A.A.)
                Author information
                https://orcid.org/0000-0002-9908-2418
                https://orcid.org/0000-0002-3478-6035
                https://orcid.org/0000-0002-1328-1732
                Article
                biomolecules-10-00026
                10.3390/biom10010026
                7022958
                31878100
                76d913c5-512b-496a-8a5c-1c42538062ff
                © 2019 by the authors.

                Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).

                History
                : 19 November 2019
                : 18 December 2019
                Categories
                Review

                qsar,topological indices,alignment-free,bioinformatics,big data

                Comments

                Comment on this article