
      MGnify: the microbiome analysis resource in 2020

      research-article


          Abstract

          MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past 2 years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource. Recently, an updated approach to data analysis has been unveiled (version 5.0), replacing the previous single pipeline with multiple analysis pipelines that are tailored according to the input data, and that are formally described using the Common Workflow Language, enabling greater provenance, reusability, and reproducibility. MGnify's new analysis pipelines offer additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer regions (ITS1/2) and expanded protein functional annotations. Biochemical pathways and systems predictions have also been added for assembled contigs. MGnify's growing focus on the assembly of metagenomic data has also seen a six-fold increase in the number of datasets it has assembled and analysed. The non-redundant protein database constructed from the proteins encoded by these assemblies now exceeds 1 billion sequences. Meanwhile, a newly developed contig viewer provides fine-grained visualisation of the assembled contigs and their enriched annotations.


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res, nar)
                Oxford University Press
                ISSN: 0305-1048, 1362-4962
                08 January 2020
                07 November 2019
                Volume 48, Issue D1, Pages D570-D578
                Affiliations
                [1 ] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) , Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                [2 ] Wellcome Sanger Institute , Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
                [3 ] Common Workflow Language, a project of the Software Freedom Conservancy , Inc. 137 Montague Street, Suite 380, Brooklyn, NY 11201-3548, USA
                [4 ] Center for Algorithmic Biotechnologies, Saint Petersburg State University , Russia
                Author notes
                To whom correspondence should be addressed. Tel: +44 12234 92679; Email: rdf@ebi.ac.uk
                Author information
                http://orcid.org/0000-0001-8655-7966
                http://orcid.org/0000-0001-8803-0893
                http://orcid.org/0000-0002-2703-8936
                http://orcid.org/0000-0001-7954-7057
                http://orcid.org/0000-0002-3655-5660
                http://orcid.org/0000-0002-7458-3072
                http://orcid.org/0000-0002-2937-9259
                http://orcid.org/0000-0001-8626-2148
                Article
                Article ID: gkz1035
                DOI: 10.1093/nar/gkz1035
                PMCID: PMC7145632
                PMID: 31696235
                7a800aad-2a10-40c0-bd22-f32595249218
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 23 October 2019
                : 30 September 2019
                Page count
                Pages: 9
                Funding
                Funded by: Horizon 2020 10.13039/100010661
                Award ID: 676559
                Award ID: 817729
                Funded by: Biotechnology and Biological Sciences Research Council
                Award ID: BB/M011755/1
                Award ID: BB/R015228/1
                Award ID: BB/N018354/1
                Funded by: ELIXIR
                Funded by: Russian Fund for Basic Research
                Award ID: 18-54-74004
                Funded by: European Molecular Biology Laboratory 10.13039/100013060
                Categories
                Database Issue

                Genetics
