
      Genome Analysis of Legionella pneumophila Strains Using a Mixed-Genome Microarray


          Abstract

          Background

          Legionella, the causative agent of Legionnaires’ disease, is ubiquitous in both natural and man-made aquatic environments. The distribution of Legionella genotypes among clinical strains differs significantly from that among environmental strains. Novel genotypic methods that can distinguish clinical from environmental strains could help focus control efforts on the more relevant (virulent) Legionella species. Mixed-genome microarray data can be used for comparative-genome analysis of strain collections, and advanced statistical approaches, such as the Random Forest algorithm, are available to process these data.

          Methods

          Microarray analysis was performed on a collection of 222 Legionella pneumophila strains, comprising patient-derived strains from notified cases in the Netherlands in 2002–2006 and the environmental strains collected during the source investigations for those patients within the Dutch National Legionella Outbreak Detection Programme. The Random Forest algorithm, combined with a logistic regression model, was used to select predictive markers and to construct a predictive model that discriminates between strains of different origin: clinical or environmental.
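The two-stage idea described in the Methods (rank markers with a Random Forest, then fit a logistic regression on the top-ranked ones) can be sketched roughly as follows. This is a minimal illustration on synthetic presence/absence data, assuming NumPy and scikit-learn are available. The synthetic data, the number of trees, the use of impurity-based importances, and the helper name `select_and_fit` are all assumptions made for illustration; the abstract does not specify the authors' actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def select_and_fit(X, y, n_markers=4, seed=0):
    """Rank markers with a Random Forest, then fit a logistic
    regression on the top-ranked ones (hypothetical helper)."""
    forest = RandomForestClassifier(n_estimators=500, random_state=seed)
    forest.fit(X, y)
    # Indices of the n_markers most important features, best first.
    top = np.argsort(forest.feature_importances_)[::-1][:n_markers]
    model = LogisticRegression(max_iter=1000)
    model.fit(X[:, top], y)
    return top, model


# Synthetic stand-in for microarray data: 222 strains x 200 binary markers.
rng = np.random.default_rng(0)
n_strains, n_total = 222, 200
y = rng.integers(0, 2, size=n_strains)  # 1 = clinical, 0 = environmental
X = rng.integers(0, 2, size=(n_strains, n_total)).astype(float)
# Make the first four markers informative: they match the label ~90% of the time.
for j in range(4):
    flip = rng.random(n_strains) < 0.1
    X[:, j] = np.where(flip, 1 - y, y)

top, model = select_and_fit(X, y)
pred = model.predict(X[:, top])
# Fraction of each class predicted correctly (on the training data only).
clinical_rate = np.mean(pred[y == 1] == 1)
environmental_rate = np.mean(pred[y == 0] == 0)
```

The rates here are computed on the same strains used for fitting, so they are optimistic; a real evaluation would use held-out strains or cross-validation.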

          Results

          Four genetic markers were selected that correctly predicted 96% of the clinical strains and 66% of the environmental strains collected within the Dutch National Legionella Outbreak Detection Programme.
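The two percentages above are per-class rates: the share of clinical strains predicted as clinical, and the share of environmental strains predicted as environmental. A minimal sketch of that calculation is shown below; the labels are a small made-up example, not the study's data, and `per_class_rates` is a hypothetical helper name.

```python
def per_class_rates(true_labels, predicted_labels,
                    classes=("clinical", "environmental")):
    """Return, for each class, the fraction of its members predicted correctly."""
    rates = {}
    for c in classes:
        members = [i for i, t in enumerate(true_labels) if t == c]
        correct = sum(1 for i in members if predicted_labels[i] == c)
        rates[c] = correct / len(members)
    return rates


# Hypothetical toy labels: 5 clinical and 4 environmental strains.
true_labels = ["clinical"] * 5 + ["environmental"] * 4
predicted = (["clinical"] * 4 + ["environmental"]      # 4 of 5 clinical correct
             + ["environmental"] * 3 + ["clinical"])   # 3 of 4 environmental correct

rates = per_class_rates(true_labels, predicted)
```

Reporting the two rates separately, rather than a single overall accuracy, shows when a model recognises one class much better than the other, as is the case in the Results.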

          Conclusions

          The Random Forest algorithm is well suited for the development of prediction models that use mixed-genome microarray data to discriminate between Legionella strains of different origin. The identification of these predictive genetic markers could make it possible to identify virulence factors within the Legionella genome, which in the future may be implemented in the daily practice of controlling Legionella in the public health environment.


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA )
                1932-6203
                2012
                18 October 2012
                7(10): e47437
                Affiliations
                [1 ]Regional Public Health Laboratory Kennemerland, Haarlem, The Netherlands
                [2 ]Department of Community Medicine, United Arab Emirates University, Al-Ain, United Arab Emirates
                [3 ]TNO Microbiology and Systems Biology, Zeist, The Netherlands
                University of Louisville, United States of America
                Author notes

                Competing Interests: Vitens water supply company (www.vitens.nl) has provided financial support for this study, and this does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.

                Conceived and designed the experiments: SE NN FS JD. Performed the experiments: SE NN FS RJ. Analyzed the data: SE NN FS RJ JD. Contributed reagents/materials/analysis tools: RJ NN. Wrote the paper: SE NN FS RJ JD.

                Article
                PONE-D-12-16769
                10.1371/journal.pone.0047437
                3475688
                23094048
                40e88cbd-1572-4ea5-8357-db86bb9c8853
                Copyright © 2012

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 11 June 2012
                Accepted: 17 September 2012
                Page count
                Pages: 6
                Funding
                This study was supported by a grant from Vitens water supply company (www.vitens.nl). This company had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Microarrays
                Genomics
                Comparative Genomics
                Genome Analysis Tools
                Genome Sequencing
                Mathematics
                Applied Mathematics
                Algorithms
                Medicine
                Epidemiology
                Infectious Diseases
                Bacterial Diseases
                Legionella
