
      PhylomeDB V5: an expanding repository for genome-wide catalogues of annotated gene phylogenies

      research-article


          Abstract

          PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.
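          PhylomeDB provides orthology and paralogy relationships derived from its gene trees. The Python code below is a minimal sketch of one common way to obtain such tree-based relationships, the species-overlap rule. It is an illustrative assumption and not necessarily PhylomeDB's exact pipeline. Under this rule, an internal node whose two child subtrees share at least one species is labelled a duplication, and any other node is labelled a speciation. Gene pairs whose last common ancestor is a speciation node are called orthologs, and pairs split at a duplication node are called paralogs. Several details are hypothetical and were introduced only for this example: the nested-tuple tree format, the "SPECIES_gene" leaf naming convention, and the `classify` and `species_of` functions. The tree is also assumed to be rooted and binary.

```python
from __future__ import annotations

from itertools import product
from typing import Union

# Hypothetical tree format for this sketch: a rooted, binary gene tree written
# as nested 2-tuples, with leaves named "SPECIES_gene".
Tree = Union[str, tuple]


def species_of(leaf: str) -> str:
    """Return the species code, assumed to be the text before the first '_'."""
    return leaf.split("_", 1)[0]


def classify(tree: Tree):
    """Label internal nodes with the species-overlap rule.

    Returns (duplications, orthologs, paralogs):
      duplications: leaf sets of nodes labelled as duplications
      orthologs:    unordered gene pairs whose last common ancestor is a speciation
      paralogs:     unordered gene pairs whose last common ancestor is a duplication
    """
    duplications: list[frozenset[str]] = []
    orthologs: set[frozenset[str]] = set()
    paralogs: set[frozenset[str]] = set()

    def visit(node: Tree) -> tuple[list[str], set[str]]:
        # Returns the genes and the species found below this node.
        if isinstance(node, str):
            return [node], {species_of(node)}
        left, right = node
        genes_l, sp_l = visit(left)
        genes_r, sp_r = visit(right)
        # Shared species between the two children implies a duplication.
        is_duplication = bool(sp_l & sp_r)
        target = paralogs if is_duplication else orthologs
        # This node is the last common ancestor of every left/right gene pair.
        for a, b in product(genes_l, genes_r):
            target.add(frozenset((a, b)))
        if is_duplication:
            duplications.append(frozenset(genes_l + genes_r))
        return genes_l + genes_r, sp_l | sp_r

    visit(tree)
    return duplications, orthologs, paralogs


if __name__ == "__main__":
    example = (("HUMAN_a", "MOUSE_a"), ("HUMAN_b", "MOUSE_b"))
    dups, orth, para = classify(example)
    print(len(dups), len(orth), len(para))  # prints: 1 2 4
```

          In the example tree ((HUMAN_a, MOUSE_a), (HUMAN_b, MOUSE_b)), both child subtrees of the root contain human and mouse genes. The root is therefore labelled a duplication, which gives two ortholog pairs and four paralog pairs. Real analyses must also root the tree and handle multifurcations, and this sketch omits both steps.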

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res)
                Oxford University Press
                ISSN: 0305-1048 (print); 1362-4962 (online)
                Published: 07 January 2022 (issue); 30 October 2021 (online)
                Volume 50, Issue D1, pages D1062-D1068
                Affiliations
                Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona 29, 08034 Barcelona, Spain
                Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
                Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
                Author notes
                To whom correspondence should be addressed. Tel: +34 934021077; Fax: +34 934037114; Email: toni.gabaldon@crg.eu

                The authors wish it to be known that, in their opinion, these authors should be regarded as joint First Authors.

                Author information
                https://orcid.org/0000-0003-2229-6853
                https://orcid.org/0000-0002-0309-604X
                https://orcid.org/0000-0003-0019-1735
                Article
                Article ID: gkab966
                DOI: 10.1093/nar/gkab966
                PMCID: PMC8728271
                PMID: 34718760
                ScienceOpen ID: 6d1066bd-334c-49b8-8dc5-73698aff694e
                © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                05 October 2021
                02 October 2021
                20 September 2021
                Page count
                Pages: 7
                Funding
                Funded by: Spanish Ministry of Science and Innovation, DOI 10.13039/501100004837;
                Award ID: PGC2018-099921-B-I00
                Funded by: Catalan Research Agency;
                Award ID: SGR423
                Funded by: European Union's Horizon 2020 research and innovation programme;
                Award ID: ERC-2016-724173
                Funded by: Gordon and Betty Moore Foundation, DOI 10.13039/100000936;
                Award ID: GBMF9742
                Funded by: Instituto de Salud Carlos III, DOI 10.13039/501100004587;
                Award ID: PT17/0009/0023
                Funded by: H2020 Marie Skłodowska-Curie Actions, DOI 10.13039/100010665;
                Award ID: H2020-MSCA-IF-2017-793699
                Funded by: MICINN, DOI 10.13039/501100004837;
                Award ID: IJC2019-039402-I
                Categories
                AcademicSubjects/SCI00010
                Database Issue

                Genetics
