      Is Open Access

      NRPSpredictor2—a web server for predicting NRPS adenylation domain specificity

      research-article


          Abstract

          The products of many bacterial non-ribosomal peptide synthetases (NRPS) are highly important secondary metabolites, including vancomycin and other antibiotics. The ability to predict substrate specificity of newly detected NRPS Adenylation (A-) domains by genome sequencing efforts is of great importance to identify and annotate new gene clusters that produce secondary metabolites. Prediction of A-domain specificity based on the sequence alone can be achieved through sequence signatures or, more accurately, through machine learning methods. We present an improved predictor, based on previous work (NRPSpredictor), that predicts A-domain specificity using Support Vector Machines on four hierarchical levels, ranging from gross physicochemical properties of an A-domain’s substrates down to single amino acid substrates. The three more general levels are predicted with an F-measure better than 0.89 and the most detailed level with an average F-measure of 0.80. We also modeled the applicability domain of our predictor to estimate for new A-domains whether they lie in the applicability domain. Finally, since there are also NRPS that play an important role in natural products chemistry of fungi, such as peptaibols and cephalosporins, we added a predictor for fungal A-domains, which predicts gross physicochemical properties with an F-measure of 0.84. The service is available at http://nrps.informatik.uni-tuebingen.de/.
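          The F-measure figures quoted in the abstract can be made concrete with a small calculation. The following is a minimal sketch, assuming the balanced F-measure (F1, the harmonic mean of precision and recall) and an unweighted average over classes; the abstract does not state which variant or averaging scheme was used. The function names and the per-class counts are hypothetical and are not taken from NRPSpredictor2.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts.

    Assumption: beta=1 gives the balanced F-measure (F1).
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_measure(counts_per_class):
    """Unweighted mean of per-class F-measures (an assumed averaging scheme)."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in counts_per_class.values()]
    return sum(scores) / len(scores)


# Hypothetical (tp, fp, fn) counts for two substrate classes, for illustration only.
example_counts = {"class_a": (8, 2, 2), "class_b": (9, 1, 1)}
print(round(f_measure(8, 2, 2), 2))
print(round(macro_f_measure(example_counts), 2))
```

          With counts like these, a class with precision 0.8 and recall 0.8 has an F-measure of 0.8; an "average F-measure" over substrate classes is then the mean of such per-class values under the averaging assumption stated above.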

          Most cited references (11)


          The Universal Protein Resource (UniProt) in 2010

          The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.

            Structural basis for the activation of phenylalanine in the non-ribosomal biosynthesis of gramicidin S.

            The non-ribosomal synthesis of the cyclic peptide antibiotic gramicidin S is accomplished by two large multifunctional enzymes, the peptide synthetases 1 and 2. The enzyme complex contains five conserved subunits of approximately 60 kDa which carry out ATP-dependent activation of specific amino acids and share extensive regions of sequence similarity with adenylating enzymes such as firefly luciferases and acyl-CoA ligases. We have determined the crystal structure of the N-terminal adenylation subunit in a complex with AMP and L-phenylalanine to 1.9 Å resolution. The 556 amino acid residue fragment is folded into two domains with the active site situated at their interface. Each domain of the enzyme has a similar topology to the corresponding domain of unliganded firefly luciferase, but a remarkable relative domain rotation of 94 degrees occurs. This conformation places the absolutely conserved Lys517 in a position to form electrostatic interactions with both ligands. The AMP is bound with the phosphate moiety interacting with Lys517 and the hydroxyl groups of the ribose forming hydrogen bonds with Asp413. The phenylalanine substrate binds in a hydrophobic pocket with the carboxylate group interacting with Lys517 and the alpha-amino group with Asp235. The structure reveals the role of the invariant residues within the superfamily of adenylate-forming enzymes and indicates a conserved mechanism of nucleotide binding and substrate activation.

              Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs)

              We present a new support vector machine (SVM)-based approach to predict the substrate specificity of subtypes of a given protein sequence family. We demonstrate the usefulness of this method on the example of aryl acid-activating and amino acid-activating adenylation domains (A domains) of nonribosomal peptide synthetases (NRPS). The residues of gramicidin synthetase A that are 8 Å around the substrate amino acid and corresponding positions of other adenylation domain sequences with 397 known and unknown specificities were extracted and used to encode this physico-chemical fingerprint into normalized real-valued feature vectors based on the physico-chemical properties of the amino acids. The SVM software package SVMlight was used for training and classification, with transductive SVMs to take advantage of the information inherent in unlabeled data. Specificities for very similar substrates that frequently show cross-specificities were pooled to the so-called composite specificities and predictive models were built for them. The reliability of the models was confirmed in cross-validations and in comparison with a currently used sequence-comparison-based method. When comparing the predictions for 1230 NRPS A domains that are currently detectable in UniProt, the new method was able to give a specificity prediction in an additional 18% of the cases compared with the old method. For 70% of the sequences both methods agreed, for <6% they did not, mainly on low-confidence predictions by the existing method. None of the predictive methods could infer any specificity for 2.4% of the sequences, suggesting completely new types of specificity.

                Author and article information

                Journal
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                1 July 2011
                9 May 2011
                Volume: 39
                Issue: Web Server issue
                Pages: W362-W367
                Affiliations
                1 Applied Bioinformatics, Center for Bioinformatics, Department of Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany
                2 Department of Microbial Physiology,
                3 Groningen Bioinformatics Center, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 7, 9747AG Groningen, The Netherlands
                4 Interfaculty Institute of Microbiology and Infection Medicine, University of Tübingen, Auf der Morgenstelle 28
                5 Algorithms in Bioinformatics Group, Center for Bioinformatics/Department of Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany
                Author notes
                *To whom correspondence should be addressed. Tel: +49 7071 29 70464; Fax: +49 7071 29 5152; Email: roettig@informatik.uni-tuebingen.de
                Article
                Publisher ID: gkr323
                DOI: 10.1093/nar/gkr323
                PMCID: 3125756
                PMID: 21558170
                © The Author(s) 2011. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 15 March 2011
                Revised: 12 April 2011
                Accepted: 20 April 2011
                Page count
                Pages: 6
                Categories
                Articles

                Genetics
