      Is Open Access

      Finished Genome of the Fungal Wheat Pathogen Mycosphaerella graminicola Reveals Dispensome Structure, Chromosome Plasticity, and Stealth Pathogenesis

      research-article
      PLoS Genetics
      Public Library of Science


          Abstract

          The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed “mesosynteny” is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, which was more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.

          Author Summary

          The plant-pathogenic fungus Mycosphaerella graminicola causes septoria tritici blotch, one of the most economically important diseases of wheat worldwide and a potential threat to global food production. Unlike most other plant pathogens, M. graminicola has a long latent period during which it seems able to evade host defenses, and its genome appears to be unstable with many chromosomes that can change size or be lost during sexual reproduction. To understand its unusual mechanism of pathogenicity and high genomic plasticity, the genome of M. graminicola was sequenced more completely than that of any other filamentous fungus. The finished sequence contains 21 chromosomes, eight of which were different from those in the core genome and appear to have originated by ancient horizontal transfer from an unknown donor. The dispensable chromosomes collectively comprise the dispensome and showed extreme plasticity during sexual reproduction. A surprising feature of the M. graminicola genome was a low number of genes for enzymes that break down plant cell walls; this may represent an evolutionary response to evade detection by plant defense mechanisms. The stealth pathogenicity of M. graminicola may involve degradation of proteins rather than carbohydrates and could have evolved from an endophytic ancestor.


          Most cited references (37)


          Ab initio gene finding in Drosophila genomic DNA.

          Ab initio gene identification in the genomic sequence of Drosophila melanogaster was obtained using Fgenes (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions to model a real situation for finding new genes because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes, protein products of which have clear homologs in protein databases, predictions were recomputed by the Fgenesh+ program. The first annotation serves as the optimal computational description of new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting the PCR primers for experimental work for gene structure verification. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons and 89% including overlapping exons with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on std1 set and specificity on std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs additional improvement using gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the last variant of Fgenesh can analyze the whole chromosome sequence). This analysis has demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
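The sensitivity and specificity figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction benchmarks, where Sn = TP / (TP + FN) and Sp = TP / (TP + FP); the abstract itself does not spell out its definitions, and the function name and counts below are hypothetical, chosen only so that the result lands near the base-level values reported above.

```python
def sensitivity_specificity(true_pos: int, false_neg: int, false_pos: int) -> tuple[float, float]:
    """Return (Sn, Sp) under the assumed gene-prediction convention.

    Sn = TP / (TP + FN): fraction of real coding bases (or exons, genes) that were predicted.
    Sp = TP / (TP + FP): fraction of predicted bases (or exons, genes) that are real.
    """
    sn = true_pos / (true_pos + false_neg)
    sp = true_pos / (true_pos + false_pos)
    return sn, sp


# Hypothetical base-level counts, not taken from the paper.
sn, sp = sensitivity_specificity(true_pos=980, false_neg=20, false_pos=160)
print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")
```

Under this convention Sp is what is elsewhere called precision, which is why a predictor can score high Sn and still report a large share of false positives.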

            A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

            Background

            Comparative analysis of genomes from distant species provides new insights into gene functions, genome evolution and phylogeny. In particular, the comparative genomics of prokaryotes has revealed previously underappreciated major trends in genome evolution, namely, extensive lineage-specific gene loss and horizontal gene transfer (HGT) [1-7]. To efficiently extract functional and evolutionary information from multiple genomes, rational classification of genes based on homologous relationships is indispensable. The two principal classes of homologs are orthologs and paralogs [8-11]. Orthologs are defined as homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes, which, at some stage of evolution, have evolved by duplication of an ancestral gene. Orthology and paralogy are intimately linked because, if a duplication (or a series of duplications) occurs after the speciation event that separated the compared species, orthology becomes a relationship between sets of paralogs, rather than individual genes (in which case, such genes are called co-orthologs). Correct identification of orthologs and paralogs is of central importance for both the functional and evolutionary aspects of comparative genomics [12,13]. Orthologs typically occupy the same functional niche in different organisms; in contrast, paralogs evolve to functional diversification as they diverge after the duplication [14-16]. Therefore, robustness of genome annotation depends on accurate identification of orthologs. A clear demarcation of orthologs and paralogs is also required for constructing evolutionary scenarios, which include, along with vertical inheritance, lineage-specific gene loss and HGT [5,7].
In principle, orthologs, including co-orthologs, should be identified by means of phylogenetic analysis of entire families of homologous proteins, which is expected to define orthologous protein sets as clades [17-19]. However, for genome-wide protein sets, such analysis remains extremely labor-intensive, and error-prone as well. Accordingly, procedures have been developed for identifying sets of likely orthologs without explicit referral to phylogenetic analysis. These procedures are based on the notion of a genome-specific best hit (BeT), that is, the protein from a target genome that is most similar (typically in terms of similarity scores computed using BLAST or another sequence-comparison method) to a given protein from the query genome [20,21]. The assumption central to this approach is that orthologs have a greater similarity to each other than to any other protein from the respective genomes. When multiple genomes are analyzed, pairs of probable orthologs detected on the basis of BeTs are combined into orthologous clusters represented in all or a subset of the analyzed genomes [20,22]. This approach, amended with additional procedures for detecting co-orthologous protein sets and for treating multidomain proteins, was implemented in the database of Clusters of Orthologous Groups (COGs) of proteins [20,23,24]. The current COG set includes approximately 70% of the proteins encoded in 69 genomes of prokaryotes and unicellular eukaryotes [25]. The COGs have been used for functional annotation of new genomes [26-29], target selection in structural genomics [30-32], identification of potential drug targets [33,34] and genome-wide evolutionary studies [4,13,35-38]. Sonnhammer and co-workers independently developed a similar methodology for identification of co-orthologous protein sets from pairwise genome comparisons and applied it to the sequenced eukaryotic genomes [39]. 
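The genome-specific best hit (BeT) idea described above can be sketched in a few lines. This is a deliberately simplified illustration that keeps only symmetric (reciprocal) best hits between genomes; it is not the COG construction procedure, which additionally merges hits across multiple genomes and handles co-orthologs and multidomain proteins. The `genome|protein` identifier format, the function names and the toy scores are assumptions made for the example.

```python
def best_hits(scores):
    """Map (query protein, target genome) -> highest-scoring target protein.

    `scores` maps (query, target) pairs to a similarity score (e.g. a BLAST bit score).
    Protein ids are assumed to look like 'genome|protein'.
    """
    best = {}
    for (query, target), score in scores.items():
        q_genome, t_genome = query.split("|")[0], target.split("|")[0]
        if q_genome == t_genome:
            continue  # only hits between different genomes count as BeTs
        key = (query, t_genome)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return the set of protein pairs that are each other's best hit."""
    bet = best_hits(scores)
    pairs = set()
    for (query, _), target in bet.items():
        q_genome = query.split("|")[0]
        if bet.get((target, q_genome)) == query:
            pairs.add(frozenset((query, target)))
    return pairs


# Toy similarity scores (hypothetical).
toy = {
    ("A|a1", "B|b1"): 200, ("A|a1", "B|b2"): 90,
    ("B|b1", "A|a1"): 195, ("B|b2", "A|a1"): 88,
    ("A|a2", "B|b1"): 50,  ("B|b1", "A|a2"): 50,
}
print(reciprocal_best_hits(toy))
```

In the toy data only `a1` and `b1` are mutual best hits; `b2` and `a2` each point at a protein whose own best hit lies elsewhere, so they are left unpaired.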
A central notion introduced in the context of the COG analysis is that of a phyletic pattern, that is, the pattern of representation (presence-absence) of analyzed species in each COG [13,20]. Similar concepts have been independently developed and applied by others [40,41]. The COGs show a remarkable scatter of phyletic patterns, with only a small minority represented in all sequenced genomes. A recent quantitative study showed that parsimonious evolutionary scenarios for most COGs involve multiple events of gene loss and HGT [7]. Both similarity and complementarity among the phyletic patterns of COGs, in conjunction with other information, such as conservation of gene order, have been successfully employed to predict gene functions [13,42,43]. The comparison of phyletic pattern has been formalized in set-theoretical algorithms and systematically applied to the computational and experimental analysis of bacterial flagellar systems, which demonstrated the considerable robustness of this approach [44]. We recently extended the system of orthologous protein clusters to complex, multicellular eukaryotes [25]. Here, we examine the phyletic patterns of KOGs in connection with known and predicted protein functions. In-depth analysis of some of these KOGs resulted in prediction of previously uncharacterized, but apparently essential, conserved eukaryotic protein functions. We also reconstruct the parsimonious scenario of evolution of the crown-group eukaryotes by assigning the loss of genes (KOGs) and emergence of new genes to the branches of the phylogenetic tree and explicitly delineate the minimal gene sets for various ancestral forms. To our knowledge, this is the first systematic, genome-wide examination of the sets of orthologous genes in eukaryotes. 
Results and discussion

KOGs for seven sequenced eukaryotic genomes: functional and evolutionary implications of phyletic patterns

Eukaryotic KOGs were constructed on the basis of the comparison of proteins encoded in the genomes of three animals (Homo sapiens [45], the fruit fly Drosophila melanogaster [46] and the nematode Caenorhabditis elegans [47]), the green plant Arabidopsis thaliana (thale cress) [48], two fungi (budding yeast Saccharomyces cerevisiae [49] and fission yeast Schizosaccharomyces pombe [50]) and the microsporidian Encephalitozoon cuniculi [51]. The procedure for KOG construction was a modification of the one previously used for COGs [20,24] and is described in greater detail elsewhere ([25]; see also Materials and methods). An important difference stems from the fact that complex eukaryotes encode many more multidomain proteins than prokaryotes and, furthermore, orthologous eukaryotic proteins often differ in domain composition, with additional domains accrued in more complex forms [3,45]. Accordingly, and unlike the original COG construction procedure, probable orthologs with different domain architectures were assigned to one KOG and were not split if they shared a common core of domains. In addition to the KOGs, which consisted of at least three species, clusters of putative orthologs from two species (TWOGs) and lineage-specific expansions (LSEs) of paralogs from each of the analyzed genomes were identified ([25,52]; see also Materials and methods). In most of the analyses discussed below, KOGs and TWOGs are treated together, unless otherwise specified. Figure 1 shows the assignment of the proteins from each of the analyzed eukaryotes to KOGs with different numbers of species, TWOGs and LSEs. The fraction of proteins assigned to KOGs tends to decrease with the increasing genome size, from 81% for S. pombe to 51% for the largest, the human genome.
(For reasons that remain unclear, but might be related to its intracellular parasitic lifestyle, E. cuniculi has a relatively small fraction of conserved proteins that belonged to KOGs: approximately 60%.) The contribution of LSEs shows the opposite trend, being the greatest in the largest genomes, that is, human and Arabidopsis, and minimal in the microsporidian (Figure 1). A notable difference was observed between eukaryotes in terms of their representation in KOGs found in different numbers of species. While the three unicellular organisms are represented mainly in the highly conserved seven- or six-species KOGs, a much larger fraction of the gene set in animals and Arabidopsis is accounted for by LSEs, and by KOGs found in three or four genomes. These include animal-specific genes and genes that are shared by plants and animals but not by fungi and the microsporidian (Figure 1). The large number of KOGs in the latter group (700 KOGs represented in Arabidopsis and at least two animal species) is notable and probably results from massive, lineage-specific loss of genes during eukaryotic evolution (see below). The phyletic patterns of KOGs reveal both the existence of a conserved eukaryotic gene core and substantial diversity. The 'pan-eukaryotic' genes, which are represented in each of the seven analyzed genomes, account for around 20% of the KOGs, and approximately the same number of KOGs include all species except for the microsporidian, an intracellular parasite with a highly degraded genome [51]. Among the remaining KOGs, a large group includes representatives of the three analyzed animal species (worm, fly and humans) but a substantial fraction (approximately 30%) are KOGs with unexpected patterns, for example, one animal, one plant and one fungal species (see [53] and examples in Table 1). 
During the manual curation of the KOG set, the KOGs with unexpected patterns were scrutinized in an effort to detect potential highly diverged members from one or more of the analyzed genomes. Some of these unexpected patterns might indicate that a gene is still missing in the analyzed set of protein sequences from one or more of the species included; reports of newly discovered genes have appeared since the release of the initial reports on genome sequences of complex eukaryotes, for example, as a result of massive sequencing of human cDNAs [54], exhaustive annotation of the Drosophila genome [55] and comparative analysis of closely related yeast genomes [56]. The unexpected phyletic patterns seem, however, largely to reflect the extensive, lineage-specific gene loss that is characteristic of eukaryotic evolution [57]; on many occasions, this scenario is supported by the presence of orthologs in other eukaryotic lineages and/or in prokaryotes (Table 1). However, interesting exceptions to the multiple loss explanation might exist as exemplified by the ATP/ADP-translocase, which is present in Arabidopsis and Encephalitozoon and could have evolved via independent HGT from intracellular bacterial parasites ([58] and Table 2). Common phyletic patterns of genes that otherwise were not suspected to be functionally linked might suggest the existence of such connections and prompt additional analysis leading to concrete functional predictions [42,59-61]. The pair of KOG5324 and KOG4246 is a case in point that has not been described previously. The initial observation that these KOGs share the same unusual pattern of presence-absence in eukaryotes, and have similar phyletic patterns in prokaryotes, with a ubiquitous presence in archaea, prompted a more detailed examination of the multiple alignments of the respective proteins and the conservation of the (predicted) operon organization in archaea and bacteria (Table 2 and data not shown). 
The combination of clues from these analyses suggests that the two proteins interact in a still uncharacterized pathway of RNA processing, which also includes RNA 3'-phosphate cyclase (KOG3980) [62] and cytosine-C5-methylase (NOL1/NOP2 in eukaryotes; KOG1122). The proteins in KOG3833 and KOG4528 are likely to represent novel enzyme families, possibly a kinase-phosphatase pair (E.V.K. and L. Aravind, unpublished data). Notably, these predicted new enzymes are present in animals and E. cuniculi but not in Arabidopsis or yeasts. In contrast, KOG3980 is present in all analyzed eukaryotic genomes except for Arabidopsis, whereas KOG1122 is pan-eukaryotic. These differences in the phyletic patterns of the components of the predicted pathway are concordant with the patterns in eukaryotes in that. Figure 2 shows the distribution of known and predicted functions of eukaryotic proteins among 20 functional categories for the entire set of KOGs and, separately, for KOGs represented in six or seven species and the animal-specific KOGs. Compared to the functional breakdown of prokaryotic COGs [25], the prevalence of signal transduction is notable among eukaryotes. This feature is particularly prominent in animal-specific KOGs, whereas the highly conserved set is comparatively enriched in proteins that are involved in translation, transcription, chaperone-like functions, cell cycle control and chromatin dynamics (Figure 2). The large number of KOGs for which only general functional prediction was feasible, and those whose functions remain unknown, even among the subset that is represented in six or seven eukaryotic species, emphasizes that our current understanding of eukaryotic biology is seriously lacking, even with respect to the functions of highly conserved genes. The distribution of KOGs by the number of paralogs in each genome is shown in Figure 3.
The preponderance of lineage-specific duplication of conserved genes, that is, intra-KOG LSEs, in multicellular eukaryotes is obvious. Cases when a single gene in yeast or, particularly, Encephalitozoon, has two or more co-orthologs in animals and/or plants are most common in KOGs, whereas the reverse situation is rare. These observations support the notion of the major contribution of LSE to the evolution of eukaryotic complexity [52]. However, 131 KOGs are represented by a single ortholog in all genomes compared (Table 2) and a substantial number of KOGs have one member from a majority of the genomes (data not shown). Recent theoretical modeling of the evolution of paralogous families has suggested that, in general, ancient protein families tend to have multiple paralogs [5,63]. Therefore, whenever a KOG has a single member in all or most species, this should be attributed to selection against duplication of this particular gene. A prominent cause of such selection could be the involvement of the respective gene products in essential multisubunit complexes, such that imbalance between subunits leads to deleterious effects [64].

Known and new functions of single-member, pan-eukaryotic KOGs

We examined in greater detail the 131 KOGs that are represented by a single gene in each of the seven genomes (Table 2). As can be envisaged from their presence in diverse eukaryotic taxa, including the 'minimal' genome of Encephalitozoon, and as shown by comparison with the knockout phenotype data (Table 2 and see below), these pan-eukaryotic KOGs are of particular biological importance. For the great majority of these KOGs (113 of the 131), the function has been experimentally determined or confidently predicted to a varying degree of detail using computational methods (Table 2).
However, around 20 KOGs from this set remained uncharacterized at the time of this analysis and, for all but two of these, substantial functional inferences could be drawn through a combination of sequence-profile analysis, structure prediction and genomic-context analysis of prokaryotic homologs (Table 2). Some of these predicted new functions are variations on well-known themes, such as two predicted PP-loop ATPases, which are probably involved in novel, essential RNA modifications (KOGs 2522 and 2316) or two predicted E3 components of ubiquitin ligases (KOGs 0396 and 3800). Other predicted functions appear to be completely new, such as proteins in KOG3176 and 3303 which are likely to be essential components of eukaryotic replication and/or repair systems. Each of these uncharacterized but ubiquitous and largely essential eukaryotic genes is an attractive target for experimental studies. Examination of the experimentally characterized and predicted functions of pan-eukaryotic, single-member KOGs leads to interesting conclusions. Nearly all the functionally characterized KOGs in this set consist of proteins that are subunits of known multiprotein complexes (Table 2). The most prominent of these are the complexes involved in rRNA processing and ribosome assembly, such as the recently discovered rRNA processosome and the pre-40S subunit, as well as the spliceosome, and various complexes involved in transcription (Table 2). Accordingly, this set of KOGs is markedly enriched for proteins involved in various forms of RNA processing, assembly of ribonucleoprotein (RNP) particles and transcription. In addition, KOGs in the single-member pan-eukaryotic set include subunits of molecular complexes that are not directly related to RNA processing, such as the proteasome, the TCP-1 chaperonin complex [65] and the TRAPP complex involved in protein trafficking [66]. 
Altogether, more than 80% of the yeast proteins in the pan-eukaryotic, single-member KOGs belong to known macromolecular complexes included in the MIPS database [67], as compared to around 64% for all yeast proteins in the KOGs, which is a moderate but statistically highly significant excess (data not shown). This preponderance of multiprotein complex formation among the single-member pan-eukaryotic KOGs is fully compatible with the balance hypothesis [64]. The most unexpected observation regarding the single-member, pan-eukaryotic KOGs, is probably that in 14 of these proteins, the only detectable domain was the WD40 repeat (Table 2). This is particularly notable because WD40-repeat proteins, which are extremely abundant in eukaryotes and are present in several prokaryotic lineages as well [68], are not generally known to form well-defined, one-to-one orthologous relationships. The WD40 proteins in the pan-eukaryotic KOGs listed in Table 2 are exceptions, which is probably due to their unique and essential roles in the assembly of RNA-processing complexes. It has recently been demonstrated that, in S. cerevisiae, seven of these proteins are subunits of the 18S rRNA processosome, or at least are involved in ribosomal assembly [69,70]. Taking these results together with the unusual phyletic pattern, it seems possible to predict with considerable confidence that those WD40 proteins in the 131-KOG set that remain uncharacterized belong to the same or similar RNA-processing complexes (Table 2). With some notable exceptions, such as the WD40 proteins, the KOGs in the single-member, pan-eukaryotic set show remarkable patterns of evolutionary conservation: they are either (nearly) ubiquitous in the three kingdoms of life, for example, RNA polymerase subunits, or are universally conserved in eukaryotes and archaea but missing in bacteria, such as most of the proteins implicated in RNA processing (Table 2). 
Thus, it appears that elaborate molecular machines central to the functioning of the eukaryotic cell have evolved, largely from ancestral archaeo-eukaryotic components, at the onset of eukaryotic evolution, and both loss and duplication of the respective genes have been strongly selected against throughout the rest of eukaryotic evolution.

Variation of evolutionary rates among KOGs

Genome-wide analysis of protein evolutionary rates shows a broad range of variation [71]. Here, we investigate the variation of evolutionary rates among the ubiquitous KOGs represented in all seven analyzed genomes and the connection between the evolutionary rate and protein function in the KOG set. The characteristic evolutionary rate of each KOG, which included a member(s) from Arabidopsis, was determined by measuring the mean evolutionary distance from Arabidopsis (the outgroup in the phylogenetic tree; see below) to the other species. Even among the KOGs that include all seven species and, accordingly, appear to represent the conserved core of eukaryotic genes, the evolutionary rates differ by a factor of 20 between the fastest- and the slowest-evolving KOGs. Excluding 5% of the KOGs from each tail of the distribution still leaves almost a fourfold difference in evolutionary rates (Figure 4a). We then compared the distributions of evolutionary rates for different functional categories of KOGs (Tables 3,4 and Figure 4b). Although all the distributions substantially overlapped, there was a statistically highly significant difference between the evolutionary rates for proteins with different functions (Tables 3,4 and Figure 4b). The slowest-evolving proteins are those involved in translation and RNA processing, the fastest-evolving ones are involved in cellular trafficking and transport, whereas components of replication and transcription systems have intermediate evolutionary rates (Tables 3,4 and Figure 4b).

A parsimonious scenario of gene loss and emergence in eukaryotic evolution and reconstruction of ancestral eukaryotic gene sets

Assuming a particular species tree topology, methods of evolutionary parsimony analysis can be used to construct a parsimonious scenario of evolution, that is, mapping of different types of evolutionary events onto the branches of the tree. With prokaryotes, the problem is confounded by the major contributions from both lineage-specific gene loss and HGT to genome evolution, with the relative likelihoods of these events remaining uncertain [5,7]. The possibility of substantial HGT between major lineages of eukaryotes can apparently be safely disregarded, providing for an unambiguous most parsimonious scenario that includes only gene loss and emergence of new genes as elementary events. Some crucial aspects of the phylogenetic tree of the eukaryotic crown group remain a matter of contention. The consensus of many phylogenetic analyses appears to point to an animal-fungal clade and clustering of microsporidia with the fungi. However, a major uncertainty remains with respect to the topology of the animal tree: the majority of studies on protein phylogenies support a coelomate (chordate-arthropod) clade [72-74], whereas rRNA phylogeny and some protein family trees point to the so-called ecdysozoan (arthropod-nematode) clade [75-78]. We treated the phyletic pattern of each KOG as a string of binary characters (1 for the presence of the given species and 0 for its absence in the given KOG) and constructed the parsimonious scenarios of gene loss and emergence during evolution of the eukaryotic crown group for both the coelomate and the ecdysozoan topologies of the phylogenetic tree. For the purpose of this reconstruction, the Dollo parsimony approach was adopted [79]. Under this approach, gene loss is considered irreversible; thus, a gene (a KOG member) can be lost independently in several evolutionary lineages but cannot be regained.
This assumption is justified by the implausibility of HGT between eukaryotes (the Dollo approach is not valid for reconstruction of prokaryotic ancestors). In the resulting parsimonious scenarios, each branch was associated with both gene loss and emergence of new genes, with the exception of the plant branch and the branch leading to the common ancestor of fungi and animals, to which gene losses could not be assigned with the current set of genomes (Figure 5a,b). There is little doubt that, once genomes of early-branching eukaryotes are included, gene loss associated with these branches will become apparent. The principal features of the reconstructed scenarios include massive gene loss in the fungal clade, with additional elimination of numerous genes in the microsporidian; emergence of a large set of new genes at the onset of the animal clade; and subsequent substantial gene loss in each of the animal lineages, particularly in the nematodes and arthropods (Figure 5a,b). The estimated number of genes lost in S. cerevisiae after its divergence from the common ancestor with the other yeast species, S. pombe, closely agreed with a previous estimate produced by a different approach [57]. The switch from the coelomate topology of the animal sub-tree to the ecdysozoan topology resulted in relatively small changes in the distribution of gains and losses: the most notable difference was the greater number of genes lost in the nematode lineage and the smaller number of genes lost in the insect lineage under the ecdysozoan scenario compared to the coelomate scenario (Figure 5a,b). The parsimony analysis described above involves explicit reconstruction of the gene sets of ancestral eukaryotic genomes. Under the Dollo parsimony model, which was used for this analysis, an ancestral gene (KOG) set is the union of the KOGs that are shared by the respective outgroup and each of the remaining species. 
Thus, the gene set for the common ancestor of the crown group includes all the KOGs in which Arabidopsis co-occurs with any of the other analyzed species. Similarly, the reconstructed gene set for the common ancestor of fungi and animals consists of all KOGs in which at least one fungal species co-occurs with at least one animal species. These are conservative reconstructions of ancestral gene sets because, as already indicated, gene losses in the lineages branching off the deepest bifurcation could not be detected. Under this conservative approach, 3,413 genes (KOGs) were assigned to the last common ancestor of the crown group (Figure 5a,b). More realistically, it appears likely that a certain number of ancestral genes have been lost in all, or all but one, of the analyzed lineages during subsequent evolution, such that the gene set of the eukaryotic crown group ancestor might have been close in size to those of modern yeasts. In terms of the functional composition, the reconstructed core gene set of the crown-group ancestor resembled more the highly conserved KOGs than the animal-specific KOGs (Figure 3) in being enriched in housekeeping functions such as translation, transcription and RNA processing (data not shown). The functional profiles of the gene sets that were lost in different lineages showed substantial differences (Table 5). Thus, for example, in the lineage leading to the common ancestor of the animals, the greatest loss among genes assigned to functional categories was seen in amino acid and coenzyme metabolism; in contrast, in the fly and the nematode, more substantial degradation was observed among transcription factors and proteins with chaperone-like functions. Genes for proteins involved in RNA processing and translation are, in general, not heavily affected by loss except in the highly degraded parasite E. cuniculi. 
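The Dollo reconstruction described above can be illustrated with a small sketch. It assumes the coelomate topology mentioned in the text (fly grouped with human, microsporidian grouped with the two yeasts, Arabidopsis as outgroup), encodes the tree as nested pairs, and treats each phyletic pattern as a binary string. The species abbreviations, their order in the string and all function names are illustrative choices, not taken from the paper. Under Dollo parsimony a gene is gained once, at the last common ancestor of every species that has it, and is then lost in each maximal clade below that ancestor where it is absent.

```python
# Assumed coelomate topology: (plant, ((microsporidian, (yeasts)), (worm, (fly, human)))).
TREE = ("Ath", (("Ecu", ("Sce", "Spo")), ("Cel", ("Dme", "Hsa"))))
ORDER = ("Ath", "Ecu", "Sce", "Spo", "Cel", "Dme", "Hsa")  # hypothetical bit order


def pattern_to_species(bits, order=ORDER):
    """Convert a phyletic pattern such as '1000111' into the set of species present."""
    return {species for species, bit in zip(order, bits) if bit == "1"}


def leaves(node):
    """Return the set of species below a tree node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[0]) | leaves(node[1])


def dollo(tree, present):
    """Return (clade where the gene emerged, list of clades where it was lost)."""
    present = set(present)
    if not present:
        raise ValueError("a phyletic pattern must contain at least one species")
    # Single gain: descend to the smallest clade that contains every species with the gene.
    node = tree
    while not isinstance(node, str):
        left, right = node
        if present <= leaves(left):
            node = left
        elif present <= leaves(right):
            node = right
        else:
            break
    gain = frozenset(leaves(node))
    # Losses: maximal subtrees below the gain point in which the gene is absent.
    losses = []

    def walk(sub):
        members = leaves(sub)
        if not members & present:
            losses.append(frozenset(members))
        elif not isinstance(sub, str):
            walk(sub[0])
            walk(sub[1])

    walk(node)
    return gain, losses


# A gene shared by the plant and all three animals but absent from fungi and the microsporidian.
gain, losses = dollo(TREE, pattern_to_species("1000111"))
print(sorted(gain), [sorted(clade) for clade in losses])
```

In this sketch a KOG belongs to the reconstructed crown-group ancestor exactly when its gain clade is the whole tree, which happens whenever Arabidopsis co-occurs with at least one other species, matching the rule stated in the text. Swapping `TREE` for an ecdysozoan arrangement would let one compare how the losses redistribute.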
On many occasions, the switch from the coelomate to the ecdysozoan topology replaces two independent, parallel losses in the insect and nematode clades with a single loss at the base of the ecdysozoan branch, although, on the whole, trees based on gene content support the coelomate topology [74]. In particular, the ecdysozoan topology, unlike the coelomate topology, implies early loss of several genes involved in translation, transcription and repair (Table 6). Notably, a large fraction of genes lost in each lineage has only a general functional prediction or no prediction at all (Table 5). This emphasizes the paucity of our current understanding of lineage-specific gene sets. As noticed previously during the analysis of the genes lost in S. cerevisiae after its divergence from the common ancestor with S. pombe, functionally connected genes tend to be co-eliminated during evolution [57]. The present study generalizes this conclusion as many functionally coherent groups of co-eliminated KOGs become apparent (Table 5). Importantly, different branches of the same complex systems tend to be eliminated in parallel in different lineages, for example, largely non-overlapping sets of genes for proteins of the ubiquitin-proteasome-signalosome systems are lost in the fungal-microsporidial lineage and in the nematodes (Table 6). It seems likely that elimination of these genes reflects independent trends for simplification of regulatory processes in these lineages. An interesting trend seen in these data is the deterioration of the mitochondrial ribosome, which occurred in several eukaryotic lineages and appears to have been partly parallel (as it occurred independently in fungi-microsporidia and in animals) and partly consecutive: early loss in the ancestral animal line was followed by elimination of additional genes for ribosomal proteins in individual lineages (Table 6). C. 
elegans has one of the shortest mitochondrial rRNAs and might have a 'minimal' mitochondrial ribosome [80]; the present analysis details the stages leading to this ultimate degradation of the mitochondrial ribosome. An exhaustive analysis of the patterns of gene loss is beyond the scope of this work. It seems clear, however, that such an analysis has the potential to improve both our understanding of eukaryotic evolution and functional predictions through examination of co-eliminated gene groups.

Evolutionary relationships between eukaryotic and prokaryotic orthologous gene sets

The prokaryotic COGs and eukaryotic KOGs were identified in separate genome comparisons, although an overlap existed because both sets included the unicellular eukaryotes, namely two yeasts and the microsporidian. To identify the prokaryotic counterparts of the KOGs, the sequences of the eukaryotic proteins included in the KOGs were compared using the RPS-BLAST program to the position-specific scoring matrices (PSSMs) constructed for all prokaryotic COGs ([81]; see Materials and methods for details). The results were checked manually and also by comparing the assignment of proteins from unicellular eukaryotes to each of the orthologous gene sets. Altogether, probable orthologous relationships were established between 2,456 eukaryotic KOGs and TWOGs (44% of the total) and 1,516 prokaryotic COGs. A more detailed breakdown of the relationships between eukaryotic and prokaryotic orthologous gene clusters could reveal important evolutionary trends. Figure 6a compares the occurrence of prokaryotic counterparts for the entire set of eukaryotic KOGs and its subsets conserved at different levels. Clearly, the reconstructed gene set of the common ancestor of the crown group and, particularly, the pan-eukaryotic KOGs are enriched in ancient KOGs (those with prokaryotic counterparts) as compared to the full KOG collection. 
In contrast, among KOGs that are inferred to have evolved in individual lineages within the crown group, a significantly lower fraction has detectable prokaryotic counterparts (Figure 6a). Early evolution of eukaryotes is known to have involved duplication of ancient genes inherited from prokaryotes [82], and this was apparent in the KOGs against COGs comparison. Although one-to-one relationships were predominant, in around 30% of cases, two or more eukaryotic KOGs corresponded to the same prokaryotic COG (Figure 6b). This indicates extensive duplication of ancestral genes at early stages of eukaryotic evolution; moreover, a substantial fraction of these genes have undergone repeated duplications, resulting in a one-to-many relationship between prokaryotic and eukaryotic orthologs (Figure 6b). An in-depth analysis of the relationships between eukaryotic and prokaryotic orthologous gene clusters should include an attempt to decipher their evolutionary history, that is, classification of the C/KOGs represented both in eukaryotes and prokaryotes into: those that have been inherited from the last universal common ancestor; the archaeo-eukaryotic subset; and those that are shared because of HGT between bacteria and eukaryotes at various stages of eukaryotic evolution. This analysis is beyond the scope of the present work. Perhaps the principal message to stress here is that, using a fairly sensitive sequence comparison method, prokaryotic homologs could be detected for only some 44% of the eukaryotic KOGs, and this fraction increased to around 54% for those genes that could be traced to the last common ancestor of the crown group (Figure 6a). 
This observation emphasizes the major amount of innovation that accompanied the emergence and early evolution of eukaryotes; even those KOGs for which prokaryotic counterparts will eventually be identified through more sensitive sequence and structure comparison apparently experienced rapid evolution during the prokaryote-eukaryote transition.

Phyletic patterns of KOGs and dispensability of yeast and worm genes

There are 860 KOGs with at least one representative from each of the seven analyzed genomes. In accord with the 'knockout rate' hypothesis [83], which has been largely supported by recent genome-wide analyses of gene conservation [38,84], it could be expected that these highly conserved genes were essential for the survival of eukaryotic organisms. This appears particularly plausible given the near-minimal eukaryotic gene complement of the microsporidian. The prediction was put to the test using the recently published functional profile of the yeast S. cerevisiae genome, which includes the data on the growth rates of homozygous deletion strains for 96% of the open reading frames (ORFs) in the yeast genome [85]. Growth rates have been previously interpreted as a measure of fitness [84]. When the phyletic patterns of the KOGs were superimposed on the data on gene dispensability (with essential genes operationally defined as those whose deletion had a lethal effect in a rich medium) [85], it was found that 45% of the essential genes were conserved in all seven species and 25% were represented in six species (typically with the exception of E. cuniculi); 15% of the essential yeast genes had no orthologs in the other analyzed genomes (Figure 7a). In striking contrast, among non-essential genes, only 16.5% were represented in all compared genomes and 28.5% had no detectable orthologs (Figure 7a). 
The reciprocal comparison is equally illustrative: essential genes composed 18.5% of the entire set of yeast genes but 35% of the genes (KOGs) represented in all seven species. This translates into a statistically highly significant dependence between a gene's (in)dispensability and conservation over long evolutionary distances. The probability of the set of highly conserved genes being so enriched for essential genes as a result of chance was estimated at 0.5) were discarded. As the divergence times for all KOGs are presumed to be the same (and equal to the time elapsed since the last common ancestor for the eukaryotic crown group), the mean evolutionary distance in a KOG is a measure of the KOG's evolutionary rate. The parsimonious evolutionary scenario, which included gene losses and emergence of KOGs mapped to the branches of the eukaryotic phylogenetic tree, was constructed by using the DOLLOP program of the PHYLIP package [97]; this program is based on the Dollo parsimony method, which assumes irreversibility of character loss [79]. For the analysis of domain accretion, conserved domains from the NCBI CDD database were detected in the eukaryotic proteins that belonged to the KOGs by using the RPS-BLAST program [81] with an E-value cut-off of 0.001. Domains with biased amino acid sequence composition, which tend to produce a high false-positive rate in RPS-BLAST searches, were excluded from the analysis. The eukaryotic KOG set is accessible at [98] and via ftp at [99]. The reconstructed ancestral gene sets are available at [100].

              Genomic Islands in the Pathogenic Filamentous Fungus Aspergillus fumigatus

              Introduction Aspergillus fumigatus is exceptional amongst the aspergilli in being both a primary and opportunistic pathogen as well as a major allergen associated with severe asthma and sinusitis [1]–[3]. It was first reported to cause opportunistic invasive infection about 50 years ago [4]. In immunocompromised patients, mycelial growth can proliferate throughout pulmonary or other tissues causing invasive aspergillosis. For these patients, the incidence of invasive aspergillosis can be as high as 50% and the mortality rate is often 50%, even with antifungal treatment. Since the late 1800's [2], A. fumigatus has been demonstrated to be a primary pathogen of the airways, sinuses, lungs, damaged skin and subcutaneous tissues. For example, it can cause post-operative infection in all human organs [5]. In most cases diagnosis remains problematic and can compromise effective medical treatment. A. fumigatus is thought to possess particular metabolic capabilities and genetic determinants that allow it to initiate and establish an in vivo infection. This conclusion is supported by the observation that the majority of invasive aspergillosis disease is caused by A. fumigatus, even though its conidia comprise only a small percentage of the total conidia found in air-sampling studies [6]. While the interaction of A. fumigatus spores with the human respiratory mucosa is understood to an extent, the basic biology of the organism has until recently received little attention. Recently we presented the genomic sequence of A. fumigatus strain Af293 (FGSC A1100) [7] isolated from a neutropenic patient, who died from invasive aspergillosis [8]. Its comparison with the genomes of two distantly related species, Aspergillus nidulans and Aspergillus oryzae, has led to many unexpected discoveries, including the possibility of a hidden sexual cycle in A. fumigatus and A. oryzae, and the detection of remarkable genetic variability of this genus [9],[10]. 
Although members of the same genus, these three species are approximately as evolutionarily distant from each other at the molecular level as humans and fish (Figures 1 and 2) [11]. This significant phylogenetic distance has hindered some aspects of comparative genomic analysis of the aspergilli such as identification of the genetic traits responsible for differences in virulence as well as in sexual and physiological properties. 10.1371/journal.pgen.1000046.g001 Figure 1 Molecular Divergence in Molds and Yeasts. A. fumigatus proteins are compared to their orthologs in N. fischeri, A. clavatus, A. terreus, A. oryzae, A. nidulans, and A. niger (mean values: 95%, 84%, 71%, 71%, 68%, and 69%, respectively). Saccharomyces paradoxus, Saccharomyces uvarum, Candida glabrata, and Kluyveromyces lactis are compared to Saccharomyces cerevisiae (adapted from [74,75]). Mean values for these species are 90%, 82%, 64%, and 60%, respectively. Median percent identity between pairs of orthologs from A. fumigatus and each successive genome in the tree is shown. Relative divergence of humans, mice, birds and fish are shown for reference. 10.1371/journal.pgen.1000046.g002 Figure 2 Three Closely Related Aspergilli. The three most closely related aspergilli, which constitute the Affc-core group (A. fumigatus, N. fischeri, and A. clavatus), are in bold black. The maximum-likelihood tree was constructed from an alignment of 90 proteins chosen on the basis of similar lengths and identical number of intron/exon structures in order to minimize the number of inconvenient or incongruent gene models (see Materials and Methods). To maximize the resolving power of whole-genome comparative analysis, we selected the environmental type strains of a very closely related sexual species, Neosartorya fischeri NRRL181 (A. fischerianus), and a more distantly related asexual species, A. clavatus NRRL1, for complete sequencing. These three species are referred to here as the Affc lineage for A. fumigatus, N. 
fischeri, and A. clavatus (Figure 2). In contrast to A. fumigatus, N. fischeri is only rarely identified as a human pathogen [12]–[15]; while A. clavatus is probably an important allergen and the causative agent of extrinsic allergic alveolitis known as malt worker's lung [16]. A. clavatus also produces a number of mycotoxins and has been associated with neurotoxicosis in sheep and cattle fed infected grain worldwide (e.g. [17]). Our phenotypic characterization (Table S1) has shown that both A. fumigatus and N. fischeri can grow at 42°C, which indicates that A. fumigatus may possess other genetic determinants besides thermotolerance that allow it to establish a successful in vivo infection. As determined by multilocus sequence comparison, most A. fumigatus isolates, including Af293 and A1163, lie within the main A. fumigatus clade and persist as a single, global phylogenetic population, presumably due to its small spore size [18]. Natural A. fumigatus isolates were described previously as having low genetic diversity in comparison to N. fischeri isolates [19]. However recent studies identified a number of strain-specific [7] and polymorphic [20],[21] genes. To further explore the extent of genetic variation within the A. fumigatus species, we included in this analysis the genome sequence of a second strain, A1163, made available through Merck & Co., Inc., Whitehouse Station, NJ. Our preliminary analysis has shown that Af293 and A1163 isolates vary greatly in their resistance to antifungals (Table S2). Results/Discussion A. fumigatus Af293 vs. A. fumigatus A1163 The genome of A. fumigatus strain A1163 was sequenced by the whole genome random sequencing method [22]. Its genome (29.2 Mb) is 1.4% larger than the genome of the first sequenced strain Af293 (28.8 Mb) (Table 1). About 98% of each genome can be aligned with high confidence. 
Alignment of the A1163 genome against the eight Af293 chromosomes has revealed 17 large syntenic blocks, which correspond roughly to the 16 Af293 chromosomal arms (Figure 3). The syntenic blocks were defined as regions containing at least five syntenic orthologs separated by no more than 20 genes without orthologs.

10.1371/journal.pgen.1000046.g003 Figure 3 Alignment of the A1163, N. fischeri, and A. clavatus Assemblies against the Eight Af293 Chromosomes. The first three tracks from the top for each reference chromosome show syntenic blocks (horizontal bars) identified in the target genomes, A. fumigatus A1163, N. fischeri, and A. clavatus. Each assembly from the target genomes is represented by a single color. Syntenic blocks are numbered based on the target genome assembly ID and the position of the block in the target genome assembly. Tracks 4 and 5 show Asp-core gene density and blocks (horizontal bars), respectively, in the Af293 genome. Tracks 6 and 7 show Afum-specific gene density and blocks (horizontal bars), respectively. Tracks 8 and 9 show the density of clustered secondary metabolite biosynthesis genes and transposable elements, respectively, found in Af293. Pink vertical bars represent putative centromeres, the purple vertical bar in chromosome 4 represents a region of ribosomal DNA, and horizontal black bars beneath each chromosome designate sequencing gaps.

10.1371/journal.pgen.1000046.t001 Table 1 Genome Statistics

Sequenced organisms       Af293    A1163    N. fischeri   A. clavatus
Length (Mb)               28.810   29.205   32.552        27.859
Assemblies >100 Kb        18       11       13            16
GC content                50%      49%      49%           49%
No. of genes              9631     9906     10407         9125
No. of LS genes           818               1408          1151
Mean gene length (bp)     1478     1455     1466          1483
% Genes with introns      79%      80%      80%           81%
% Coding                  49%      49%      47%           49%

Most translocation events involving A. fumigatus chromosomes appear to have taken place within 300 Kb from the telomeres. 
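The syntenic-block definition given above (at least five syntenic orthologs, separated by no more than 20 genes without orthologs) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure actually used for Figure 3: genes are assumed to be listed in chromosomal order, each gene is reduced to a boolean flag saying whether it has a syntenic ortholog in the target genome, and the function name `find_blocks` is hypothetical.

```python
def find_blocks(flags, min_orthologs=5, max_gap=20):
    """Return (first, last) gene-index pairs of blocks along one chromosome.

    flags[i] is True when gene i has a syntenic ortholog in the target genome.
    A block is closed when more than max_gap consecutive genes lack orthologs,
    and it is reported only if it holds at least min_orthologs orthologs.
    """
    blocks = []
    start = last = None  # first and last ortholog index of the open block
    count = 0            # orthologs seen in the open block
    for i, has_ortholog in enumerate(flags):
        if not has_ortholog:
            continue
        if start is not None and i - last - 1 > max_gap:
            # The run of genes without orthologs is too long: close the block.
            if count >= min_orthologs:
                blocks.append((start, last))
            start, count = None, 0
        if start is None:
            start = i
        last = i
        count += 1
    if start is not None and count >= min_orthologs:
        blocks.append((start, last))
    return blocks


# Hypothetical chromosome: 6 orthologs, a 21-gene gap, then 3 + 5 orthologs
# separated by a short 2-gene gap.
example = [True] * 6 + [False] * 21 + [True] * 3 + [False] * 2 + [True] * 5
example_blocks = find_blocks(example)
```

A real implementation would also have to check that the orthologs are collinear in the target assembly; the flag-based input assumes that check has already been done.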
The largest exchange involved a ∼500 Kb segment: Af293 chromosomes 1 and 6 both contain regions that align with A1163 assembly 1 (syntenic blocks 1.1 and 1.2 in Figure 3). This appears to be a recent event that occurred in Af293. In addition, Af293 chromosome 1 harbours a 400 Kb subtelomeric region that does not align well with A1163 assemblies. There is evidence of gene conversion between distal subtelomeric sequences encoding RecQ family helicases in A. fumigatus chromosomes 2, 4, and 7. Consistent with previous reports [19], the identity over the shared regions is very high (99.8% at the nucleotide level). This is higher than the 99.3% and 99.5% identity between the two sequenced A. niger isolates (ATCC 1015 and CBS 513.88) [23] and between A. oryzae [10] and A. flavus [8], respectively. Unique regions represent 1.2% and 2.3% (and harbour 143 and 218 genes) in the Af293 and A1163 genomes, respectively. More than half of the Af293-specific genes are also absent in A. fumigatus isolates Af294 and Af71, according to the array-based comparative genome hybridization (aCGH) data [7]. The vast majority of Af293- and A1163-specific genes are clustered together in blocks ranging in size from 10 to 400 Kb, which seem to be the most variable segments of the species genome. A manual examination of these isolate-specific islands revealed that they contain numerous pseudogenes and repeat elements. One of the regions contains a putative secondary metabolism cluster (AFUA_3G02530-AFUA_3G02670). The origin of 20% of Af293-specific genes can be attributed to two segmental duplication events. One of the duplicated regions (AFUA_1G16010-AFUA_1G16170) contains an arsenic detoxification cluster. The other (AFUA_1G00420-AFUA_1G00580) contains genes that may be involved in metabolism of betaine, which is often synthesized under osmotic and heavy metal stress. 
Interestingly the duplicated regions are also absent in Af294 and Af71 isolates, which suggests that the duplication event took place very recently. Segmental duplication events are thought to contribute to rapid adaptation of the species by increasing their expression. Since Af293 is a clinical isolate it is possible that these chromosomal aberrations were created due to selective pressures in the host. Highly Variable Loci in A. fumigatus Although most Af293 proteins are 100% identical to their A1163 orthologs, we have identified 41 orthologous pairs that share only 37% to 95% identity. To find out if these genes are also divergent in other A. fumigatus isolates, we identified Af293 genes that do not hybridize with DNA extracted from the Af294 and Af71 strains in aCGH experiments [7]. The comparison revealed that 27 out of 41 genes were possibly polymorphic (marked as absent or divergent) with respect to at least one other isolate (Table S3). Further analysis of three polymorphic loci in other A. fumigatus isolates has demonstrated that each of them harbours two or three alleles (Table S4). A PCR survey followed by Southern blot analysis and partial DNA sequencing has shown the presence of at least two alleles at each locus containing nearly identical sequences within each group of alleles (data not shown). In filamentous fungi, this high level of variability has been previously associated with heterokaryon incompatibility (het) genes involved in a programmed cell death (PCD) pathway triggered by hyphal fusion between two genetically incompatible individuals [24],[25]. So far several het loci have been described in A. nidulans [26], although none have been characterized at the molecular level. Incidentally, our results are consistent with previously identified vegetative incompatibility groups suggesting that some of these polymorphic genes may function in heterokaryon incompatibility in A. fumigatus. 
Thus, four clinical isolates from the same multi-member incompatibility group (WSA-270, WSA-1195, WSA-449, and WSA-172) contained the same alleles of the polymorphic genes (Table S4). Furthermore, at least five putative A. fumigatus het genes exhibit a pattern of trans-species (or trans-specific) polymorphism (Table S5), which has been previously associated with somatic and sexual incompatibility in fungi, self-incompatibility in plants, and the major histocompatibility complex (MHC) in vertebrates. These genes are more similar to their orthologs from other Aspergillus species than to those from A1163. We chose one putative het gene, rosA (AFUA_1G15910), and its close relative, nosA (AFUA_4G09710), whose orthologs encode two Zn2C6 transcriptional regulators of sexual development in A. nidulans [27],[28] for phylogenetic analysis (Figure 4). Unexpectedly, Af293 RosA clusters with its A. clavatus ortholog, while A1163 RosA clusters with N. fischeri. This is in contrast with the NosA tree, which perfectly mirrors the species tree (Figure 2), suggesting that these allelic classes may transcend species boundaries in the aspergilli. 10.1371/journal.pgen.1000046.g004 Figure 4 The Af293 RosA and NosA Proteins. Shown in bold red are RosA, NosA and Pro1 proteins that have been experimentally characterized are shown in bold black. Branches with a bootstrap of 75% or more are indicated in bold black. The trees are maximum-likelihood trees (see Materials and Methods). This is the first study that shows the diversity of het genes in aspergilli at the molecular level as well as patterns of trans-species polymorphism. These putative het genes are distinct from those identified in Neurospora crassa or Podospora anserina [24],[25], although many of them share the same domains such as the NACHT and NB-ARC domains of the STAND superfamily [29]. Coincidentally four of the A. 
fumigatus variable genes encoding STAND domain proteins have previously been predicted to function in heterokaryon incompatibility [30]. The discovery of putative het loci in the aspergilli may facilitate identification of downstream components of fungal PCD pathways or other drug targets. These loci may also be used as a basis for classification of natural and clinical isolates into different compatibility groups.

A. fumigatus vs. N. fischeri vs. A. clavatus

The genomes of N. fischeri and A. clavatus were sequenced by the whole genome sequencing method [22]. The N. fischeri genome (32.6 Mb) is 10–15% larger than the A. clavatus and A. fumigatus genomes (Table 1). It contains 10,407 protein-coding genes and a large number of transposable elements, which may have contributed to its genome size expansion. The A. clavatus genome (27.9 Mb) is the smallest seen to date among the sequenced aspergilli (Table 1). There are currently 9,125 predicted protein-coding genes. This is consistent with past comparative studies that identified notable (up to 30%) genome size differences between distantly related aspergilli [7],[9],[10]. Despite this significant genome size variability, gene-level comparisons confirmed phylogenetic proximity of A. fumigatus, N. fischeri and A. clavatus (Figures 1 and 2). The three genomes also appear to be largely syntenic. Alignment of the N. fischeri and A. clavatus genomes against the eight Af293 chromosomes has revealed 20 and 55 syntenic blocks, respectively (Table 2). There is only one large-scale reciprocal translocation between chromosomes 2 and 5 in N. fischeri (blocks 8927.1, 8927.2, 9292.1 and 9292.2, in Figure 3). The A. clavatus supercontigs align with A. fumigatus chromosomes 2 and 5, suggesting that this was the ancestral topology.

10.1371/journal.pgen.1000046.t002 Table 2 Syntenic and Afum-specific Chromosomal Blocks in Af293

Af293 blocks             Syntenic to A1163   Syntenic to N. fischeri   Syntenic to A. clavatus   Afum specific
No. of original blocks   29                  24                        62                        13
No. of merged blocks     17                  20                        55                        13
Merged blocks length     28.4 Mb             27.6 Mb                   26.0 Mb                   1.7 Mb
% Coding                 50%                 51%                       52%                       31%
Repeat (a) density       0.51%               0.50%                     0.47%                     1.83%
TE (b) density           1.07%               0.96%                     0.80%                     4.17%

Syntenic blocks for each pair of genomes were defined as areas containing a minimum of five orthologous genes in the Af293 and target genomes with a maximum of 20 adjacent non-matching genes. Afum-specific blocks were defined as Af293 areas containing at least ten Afum-specific genes and separated by no more than 5 other genes. Since most syntenic regions slightly overlap, the original blocks were merged to calculate repeat and TE density. Abbreviations: (a) repeat elements; (b) transposable elements. Repeat and TE densities were estimated as described in Materials and Methods.

Core and Lineage-Specific Genes

Features of Core and Lineage-Specific Genes

Comparative genomic analysis has shown that the three Aspergillus genomes contain a large number of species-specific genes, which is consistent with previous comparative studies [7]. We have identified 7514 orthologous core and 818, 1402 and 1151 species-specific genes in the Af293, N. fischeri and A. clavatus genomes, respectively (Figure 5). Numbers of core and species-specific genes, however, depend on the selection of genomes from which they were derived. Thus, adding new genomes to this comparison resulted in fewer core and specific genes as shown for Af293 in Table S6. The availability of additional sequenced Aspergillus genomes allowed us to explore these patterns in a more systematic manner by comparing A. fumigatus Af293 genes with different lineage specificity (i.e. number of orthologs in other species).

10.1371/journal.pgen.1000046.g005 Figure 5 Proteins with Orthologs in the Three Most Closely Related Aspergilli (A. fumigatus, N. fischeri and A. clavatus). These proteins constitute the Affc-core group, and proteins with no orthologs in N. fischeri and A. clavatus constitute the A. 
fumigatus-specific group (Afum). The proteins in the Affc-core can be further divided into two groups, Aspergillus-core (Asp-core), which has orthologs in all of the other aspergilli, and the Affc-specific group, which is comprised of the rest of the Affc-core. To this end, we have selected four sets of genes based on the presence of orthologs in the six other sequenced aspergilli: N. fischeri, A. clavatus, A. terreus (CH476594), A. oryzae [10], A. nidulans [9] and A. niger CBS 513.88 [23] (Table S6; Figure 5). Genes with orthologs in the three most closely related aspergilli ( A. fumigatus, N. fischeri and A. clavatus) constitute the Affc-core group. The genes in the Affc-core can be further divided into two groups, the Aspergillus-core (Asp-core) with orthologs in all six other aspergilli and the Affc-specific group, which is comprised of the remaining Affc-core genes. Finally, the A. fumigatus-specific (Afum-specific) group contains Af293 genes that have orthologs in neither N. fischeri nor A. clavatus. One of the most striking observations to arise from this comparison was the marked differences in size and number of exons among genes from different lineage-specificity groups (Table 3). For example, Asp-core genes on average are almost twice as large as Afum-specific genes. The latter have on average only 1.35 introns and almost 31% lack introns completely. In contrast, Asp-core genes contain on average 2.16 introns, only 16% of them without introns. Consistent with previous reports of increased evolutionary rates in LS genes (e.g. [31]), Affc- and Afum-specific genes in A. fumigatus exhibit low sequence identity to their orthologs from more distantly related fungi (Table 3). 
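The four lineage-specificity groups defined above can be expressed as a small Python sketch. The function name `lineage_groups` and the idea of passing in a set of species names are assumptions made for illustration; how orthologs were actually called is described in the paper's Materials and Methods and is not reproduced here. Genes with an ortholog in only one of N. fischeri and A. clavatus fall outside the four named groups in the text, so the sketch returns an empty set for them.

```python
# The two closest relatives of A. fumigatus and the four more distant aspergilli.
AFFC_PARTNERS = {"N. fischeri", "A. clavatus"}
OTHER_ASPERGILLI = {"A. terreus", "A. oryzae", "A. nidulans", "A. niger"}


def lineage_groups(ortholog_species):
    """Return the lineage-specificity groups for one Af293 gene.

    ortholog_species is the set of species in which the gene has an ortholog.
    """
    present = set(ortholog_species)
    groups = set()
    if AFFC_PARTNERS <= present:
        # Orthologs in both N. fischeri and A. clavatus.
        groups.add("Affc-core")
        if OTHER_ASPERGILLI <= present:
            groups.add("Asp-core")       # orthologs in all six other aspergilli
        else:
            groups.add("Affc-specific")  # remainder of the Affc-core
    elif not (AFFC_PARTNERS & present):
        # Orthologs in neither N. fischeri nor A. clavatus.
        groups.add("Afum-specific")
    return groups
```

Because Asp-core and Affc-specific are both subsets of the Affc-core, a gene can carry two labels, which matches the gene counts in Table 3 (5424 + 2090 = 7514).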
10.1371/journal.pgen.1000046.t003 Table 3 Comparison of Four Af293 Gene Sets with Different Lineage Specificity

Lineage specificity group               Asp-core   Affc-core   Affc-specific   Afum-specific
No. of genes                            5424       7514        2090            818
No. of orthologs in 6 aspergilli        6          2–6         2–5             0–1
Mean gene length                        1722       1579        1209            802
Mean No. of introns                     2.16       2.02        1.66            1.35
% Genes without introns                 15.9%      19.4%       28.5%           31.4%
% Affc syntenic                         98.3%      96.0%       89.8%           n/a
% Telomere-proximal                     5.6%       9.1%        38.0%           36.5%
% Expressed                             42.5%      42.7%       43.3%           32.4%
% Orthologs in A. clavatus              100%       100%        100%            n/a
% Orthologs in N. crassa                81.5%      70.7%       42.6%           4.5%
% Orthologs in S. cerevisiae            49.9%      41.5%       19.9%           1.2%
% Identity to A. clavatus orthologs     81.3%      78.6%       71.4%           n/a
% Identity to N. crassa orthologs       52.3%      51.6%       47.9%           43.3%
% Identity to S. cerevisiae orthologs   43.1%      42.7%       40.4%           38.0%

The numbers of Af293 genes in different categories are shown for Aspergillus-core (Asp-core), Affc-core, Affc-specific, and A. fumigatus-specific (Afum-specific) groups (see main text for definitions). Telomere-proximal genes are defined as genes located within 300 Kb from the chromosome end. Affc syntenic genes are defined as Af293 genes syntenic with respect to N. fischeri and A. clavatus (see the legend to Table 2). The ‘expressed’ genes are defined as Af293 genes that showed differential expression in at least one microarray study (W. Nierman, unpublished).

These vast differences in gene features between core and specific genes are more likely to be explained by relaxed selective constraints (as discussed below) than by poor annotation quality of LS genes (due to misannotated gene models, gene fragments or random ORFs). We made significant improvements to Af293 gene models by leveraging the comparative genomic data (see Materials and Methods). In addition, all Affc-specific genes have orthologs in N. fischeri and A. clavatus and 43% of them are differentially expressed in various expression studies, which is similar to the A. fumigatus genome average (Table 3). On the other hand, many Afum-specific genes may be non-functional, since only 32% of them are differentially expressed in microarray studies (vs. the 43% genome average) and only 60% of them show sequence similarity to other fungal proteins (Table S7; Figure 6). Nonetheless, at least 20% of Afum-specific genes are supported by combined evidence (homology and expression data) and therefore are likely to be functional. Even these genes, however, are smaller than average Affc- and Asp-core genes.

10.1371/journal.pgen.1000046.g006 Figure 6 A. fumigatus Species-Specific Genes Supported by Homology and Expression Data. Genes with no orthologs in N. fischeri and A. clavatus constitute the A. fumigatus-specific group (Afum). Genes that have homologs in other fungal genomes constitute the Homology group. Genes differentially expressed in microarray studies represent the Expressed group.

Biological Roles and Chromosomal Location of LS Genes

Analysis of Gene Ontology (GO) terms [32] associated with core and lineage-specific groups has demonstrated that certain biological functions are unequally distributed among these groups (Table S8). The Afum-specific group is enriched for genes involved in carbohydrate transport and catabolism, secondary metabolite biosynthesis, and detoxification. In contrast, the invariable Asp-core genome encodes many functions associated with information processing and other cellular processes that contribute to the organism's fitness in most environments. Thus, a significant number of Asp-core genes (15%) are orthologous to yeast essential genes, which represents a two-fold enrichment in comparison to the rest of the proteome. Although most Af293 genes involved in carbohydrate transport and catabolism are found in the Asp-core group, only 10% of secondary metabolism genes have orthologs in all sequenced aspergilli; these include the siderophore, pigment and Pes1-related clusters. 
These three conserved clusters are also found in Penicillium species and some more distantly related fungi. Similarly, only 30% of secondary metabolism Af293 genes are shared by N. fischeri and A. clavatus. The three species also vary considerably in the numbers of enzymes that control the first step in secondary metabolite biosynthesis such as nonribosomal peptide synthases (NRPS), polyketide synthases (PKS), and dimethylallyltryptophan synthases (DMATS) (Table S9). Interestingly, the N. fischeri genome contains 46 such enzymes, which is roughly a third more than the A. clavatus (35) and A. fumigatus (34) genomes.

Likewise, PFAM domains overrepresented among Affc- and Afum-specific genes have been shown to function in efflux or detoxification, secondary metabolite biosynthesis, resistance to antifungals, and other accessory metabolic pathways. They include MFS and ABC transporters, various oxidoreductases, cytochrome P450, glycosyl and alpha/beta fold hydrolases, polyketide synthases, glutathione transferases and methyltransferases (Table S10). On the other hand, core genes often contain AAA-superfamily ATPase, helicase, WD40, and SH3 domains associated with such important functions as cell organization and macromolecule biosynthesis.

Lineage Specific Genomic Islands

In addition to differences in size and function, lineage specific genes display a significant subtelomeric bias. As opposed to telomere-distal Asp- and Affc-core genes, Affc- and Afum-specific genes tend to be located within 300 Kb from chromosome ends (P value<0.01) (Table S11). About 38% of Affc-specific genes are telomere-proximal in comparison to 6% of Asp-core and 9% of Affc-core genes (Table 3). Interestingly, 46% of Afum-specific genes with paralogs are telomere-proximal (Table S7), suggesting that they may have been recently duplicated and translocated to these regions. Our findings concur with previous reports of subtelomeric bias in LS genes in A. fumigatus [7], S. cerevisiae [33] and Pichia stipitis [34].
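The telomere-proximal classification and the enrichment comparison described above can be illustrated with a minimal Python sketch. It assumes 1-based gene coordinates, treats any gene overlapping the 300 Kb window at either chromosome end as telomere-proximal, and uses a one-sided Fisher's exact test. The function names, the overlap rule and the toy numbers are illustrative assumptions, not the procedure or data used in this study.

```python
from math import comb

TELOMERE_WINDOW = 300_000  # 300 Kb window, as in the definition used in the text


def is_telomere_proximal(start, end, chrom_length, window=TELOMERE_WINDOW):
    """Return True if a gene overlaps the window at either chromosome end.

    Assumption: coordinates are 1-based and start <= end; partial overlap
    with the window counts as telomere-proximal.
    """
    return start <= window or end > chrom_length - window


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` proximal genes in the first group by chance.
    """
    row1 = a + b          # genes in the group of interest
    col1 = a + c          # all telomere-proximal genes
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


if __name__ == "__main__":
    # Toy example (hypothetical genes on a hypothetical 4 Mb chromosome).
    print(is_telomere_proximal(100_000, 101_000, 4_000_000))      # near the left end
    print(is_telomere_proximal(2_000_000, 2_001_000, 4_000_000))  # chromosome centre
    # Toy 2x2 table: rows = specific/core genes, columns = proximal/distal.
    print(fisher_exact_greater(3, 1, 1, 3))
```

A small P value from `fisher_exact_greater` would indicate that the group in the first row is enriched for telomere-proximal genes relative to the second row; a two-sided test or a different window size could be substituted without changing the overall structure.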
With the exception of one Af293 locus containing four P450 genes, the Aspergillus species do not have large variable subtelomeric arrays arising by a series of tandem duplications found in some protozoan parasites [35]. Almost 50% of the Afum-specific genes can be clustered together in 13 blocks containing more than 10 Afum-specific genes separated by no more than 5 genes outside this category (Table 2). Together these regions, referred to here as Afum-specific genomic islands, show an even more significant telomeric bias (68% of the clustered genes lie within 300 Kb from telomere ends) with larger blocks found almost exclusively at chromosome ends (Figure 3). In addition to non-syntenic genes, species-specific islands harbour a disproportionate number of transposons and other repeat elements in comparison with the syntenic areas of the Af293 genome (Table 2). Notably, two A. fumigatus-specific blocks (2.2 and 3.1) contain gene clusters involved in biosynthesis of the mycotoxin fumigaclavine and another unknown secondary metabolite [36]. Similar genomic islands have been described in the rice blast fungus Magnaporthe oryzae [37],[38] and in A. oryzae [10], suggesting that they may be shared across all filamentous ascomycete fungi. Unlike variable subtelomeric regions found in other eukaryotes [39],[40], these areas are often quite large (up to 400 Kb) and not always located near chromosome ends.

Evolutionary Origins of Lineage-Specific Genes

Most Affc- and Afum-specific genes have no orthologs in non-Aspergillus fungal species, which suggests that they were created de novo in the Affc lineage. To gain insight into the origin of the LS genes in aspergilli, we have performed phylogenetic analysis of two sets of A. fumigatus- and N. fischeri-specific genes. In Af293 and N.
fischeri, Set 1 contains 790 and 1230 genes, respectively, that have an Aspergillus homolog as the best BLASTp hit; Set 2 contains 28 and 178 genes, respectively, that have a non-Aspergillus homolog as the closest relative. There is a significant difference in the numbers of trees including a non-Aspergillus species as the closest relative in N. fischeri and A. fumigatus (P value = 2.6e-08). This is indicative of major differences in retention and/or uptake of new genetic material in these two species, consistent with differences in their reproductive modes.

The four recurrent scenarios identified by phylogenetic analysis are displayed in Figure 7. In both A. fumigatus and N. fischeri, most of the Set 1 genes exhibit topologies that do not strictly follow the Aspergillus species tree (Figure 2), although they are nested within the Aspergillus clade. Similarly, all 28 A. fumigatus Set 2 genes are nested within the Aspergillus genus. In contrast to the A. fumigatus genes, N. fischeri Set 2 genes sometimes cluster with a non-Aspergillus species with high bootstrap support. As shown in Figure 7B and 7C, both N. fischeri and non-Aspergillus species genes can be nested either in this non-Aspergillus clade or in the Aspergillus clade. At first sight, these recurrent topologies can be interpreted as supportive of a horizontal gene transfer (HGT) from a non-Aspergillus species into N. fischeri or vice versa. Further analysis, however, reveals that most of the conflicts involve sparsely populated trees, long branch attraction artifacts, and other situations where phylogenetic methods tend to mislead (e.g. [41]). The last recurrent scenario includes genes that are only present in one other distant fungal genome (Figure 7D). The evolutionary origin of genes in this category cannot be resolved at this time.

10.1371/journal.pgen.1000046.g007 Figure 7 Four Common Topologies Detected by Phylogenetic Analysis of N. fischeri-Specific Proteins. The N.
fischeri proteins under consideration are in bold red. The bootstrap supporting the clade containing the N. fischeri protein is also in bold red. Other N. fischeri proteins are shown in bold black. Blue species names correspond to the recipient genome when different from N. fischeri. Systematic gene names are indicated. Branches with a bootstrap of 75% or more are indicated in bold black. The trees are maximum-likelihood trees (see Materials and Methods).
A. Set 1 protein evolved by probable duplication, differentiation and differential loss in other Aspergillus species (DDL).
B. Set 2 protein evolved by probable HGT from Sordariomycetes into the N. fischeri lineage.
C. Set 2 protein evolved by probable DDL and a Fusarium solani protein (in blue) evolved by probable HGT from the N. fischeri lineage into Sordariomycetes.
D. Set 2 protein showing similarity to a protein from the Sordariomycete Chaetomium globosum.

Our results are consistent with the well established role of gene duplication and divergence as the principal source of new genes [42]–[45]. They are, however, in conflict with previous studies that attributed the origin of LS genes in the aspergilli to gene acquisition through HGT from other fungal species [9],[10],[46]. This assumption was based on circumstantial evidence such as mosaic phyletic distribution, phylogenetic anomalies, and differences in gene content among A. fumigatus, A. nidulans and A. oryzae. Besides the absence of readily apparent HGT examples, the fact that LS genes tend to be smaller in size and have fewer exons is difficult to explain by HGT. These gene features are quite consistent across Aspergillus species, and it is therefore unclear what could be the donor organism for LS genes. The DDL scenario does not have this weakness, since these size differences can be a direct consequence of relaxed selective constraints operating on duplicate genes.
According to the DDL hypothesis, the initial redundancy in gene function allows duplicate genes to quickly accumulate nonsynonymous mutations and even premature stop codons. Notably, over 20% of all Afum-specific genes can be linked to the two very recent segmental duplication events that occurred in Af293 but not in A1163. Both translocated segments are telomere-proximal and contain genes that appear to be pseudogenized, indicating that translocated gene copies may have evolved under relaxed selective constraints. Similarly, in other species accelerated evolution has often been associated with subtelomeric areas, suggesting that the process is dependent on the local chromatin environment (e.g. [47]).

The prevailing role of duplication in the origin of LS genes in the aspergilli is further underlined by their tendency to cluster in genomic islands. These regions may function as designated “gene dumps” and simultaneously as “gene factories”, since some LS genes appear to maintain their functional integrity or at least are differentially expressed in microarray studies as shown above. As shown above, 46% of Afum-specific genes with paralogs are telomere-proximal (Table S7), suggesting that they may have been recently duplicated and translocated to these regions. Evidence for gene duplication and/or transfer to evolutionarily labile regions is found in some protozoan parasites that have large variable subtelomeric arrays arising by a series of tandem duplications [35].

Conservation of Virulence-, Allergy-, and Sex-Associated Genes

Previous studies, however, have shown a high level of evolutionary conservation and phyletic retention among known A. fumigatus virulence-associated genes [7]. Our analysis confirmed the low rate of protein evolution among these genes in four Aspergillus species (Table S12).
Interestingly, four of the virulence-associated genes, pabaA (AFUA_6G04820), fos-1 (AFUA_6G10240), pes1 (AFUA_1G10380) and pksP (AFUA_2G17600), reveal evidence of accelerated evolution in the branch leading to the two A. fumigatus isolates. This pattern can affect only a few amino acid residues (e.g. PksP) or a significant proportion of the protein (e.g. Pes1). Such a pattern can be due to either relaxation of selection or selection for rapid diversification (positive selection). In the latter case, specific amino acid substitutions may decrease susceptibility to specific environmental challenges and thus enhance A. fumigatus virulence. These four genes are involved in oxidative stress or nutrient availability, which is consistent with the positive selection scenario. Indeed, PabaA is involved in biosynthesis of folate, an essential co-factor for DNA synthesis. Since PABA is apparently limited in the mammalian lung, a functional pabaA gene is required for virulence [48]. Fos1, a putative two-component histidine kinase, may play a role in the regulation of cell-wall assembly [49]. Finally, PksP and Pes1, enzymes that catalyze the first steps in biosynthesis of the spore pigment and of an unknown non-ribosomal peptide, respectively, have been shown to mediate resistance to oxidative stress in addition to their role in A. fumigatus virulence [50],[51]. The inclusion of additional taxa in the analyses might clarify the significance of the observed differences.

This overall lack of variability among known virulence-associated factors suggests that yet unknown A. fumigatus-specific genes may contribute to its ability to survive in the human host. A recent microarray study demonstrated that the Affc-specific genes are over-represented among genes that are up-regulated in the neutropenic murine lung (Elaine Bignell, submitted for publication). Many of them are found in chromosomal gene clusters associated with macromolecule catabolism and secondary metabolite biosynthesis.
Similarly, clustered lineage-specific genes simultaneously induced in infected tissue have been observed in the ubiquitous maize pathogen Ustilago maydis [52] and some other species (for a recent review see [53]). Alternatively, A. fumigatus virulence may be a combinatorial process, dependent on a pool of genes that interact in various combinations in different genetic backgrounds, as suggested previously [7]. Similar ‘ready-made’ virulence features have been described in other environmental pathogens such as Pseudomonas aeruginosa [54] and Cryptococcus neoformans [55],[56].

In addition to virulence factors, the A. fumigatus genome encodes 20 allergens (Table S13) and 25 proteins displaying significant sequence similarity to known fungal allergens (Table S14), some of which appear to contribute to its pathogenicity [57]. For example, A. fumigatus Asp f6 (AFUA_1G14550), also known as Mn2+-dependent superoxide dismutase (MnSOD), is specifically recognized by IgE from patients with allergic bronchopulmonary aspergillosis (ABPA) and is differentially expressed during germination [58]. The broad distribution of allergens among fungal taxa (Text S1) suggests that A. fumigatus possesses the same allergen complement as most other aspergilli and that its effect on hypersensitive individuals can be explained mostly by its ubiquity in the environment.

Our analysis has demonstrated that, similar to known virulence-associated genes, most sexual development genes appear to be under negative (purifying) selection in both sexual and asexual Aspergillus species (Text S1 and Table S15). More detailed analysis has revealed four genes in the N. fischeri lineage that may be under positive selection. This suggests that a few amino acid changes may enable sexuality in N. fischeri.
The conservation of sex genes in asexual species may be due to a latent sexuality, a recent loss of sexuality, pleiotropy, or parasexual recombination following heterokaryon formation, as suggested previously [59],[60].

Conclusions

Lineage-specific (LS) genes (i.e. genes with limited phylogenetic distribution of orthologs in related species) have been the focal point of many comparative genomic studies, because of the assumption that they may be responsible for phenotypic differences among species and niche adaptation. Our analyses of the genomes of A. fumigatus and the two closely related species, N. fischeri and A. clavatus, demonstrate that A. fumigatus may possess genetic determinants that allow it to establish a successful in vivo infection. LS genes that have no orthologs in the other two species comprise 8.5% of the A. fumigatus genome and often have accessory functions such as carbohydrate and amino acid metabolism, transport, detoxification, or secondary metabolite biosynthesis. Further analysis showed that these genes have distinct features (e.g. small gene length and a low number of introns) and tend to cluster in subtelomeric genomic islands, which may function as “gene dumps/factories”. The phylogenies of LS genes, their subtelomeric bias and size differences are consistent with the DDL hypothesis, which states that duplication is the primary genetic mechanism responsible for the origin of species-specific genes. The presence of genomic islands indicates that A. fumigatus may possess sophisticated genetic mechanisms that facilitate its adaptation to heterogeneous environments such as soil or a living host.

Materials and Methods

Fungal Isolates

A. fumigatus Af293 (FGSC A1100) was isolated from a patient with invasive aspergillosis [61]. A. fumigatus A1163 (FGSC A1163) is a derivative of A. fumigatus CEA17 converted to pyrG+ via the ectopic insertion of the A. niger pyrG gene [62],[63]. CEA17 is a uracil auxotroph of A.
fumigatus clinical isolate CEA10 (CBS144.89). The type strains of A. clavatus (NRRL 1) and N. fischeri (NRRL 181) were used for sequencing and phenotypic characterization.

Accession Numbers

The genome sequences of A. clavatus, N. fischeri and A. fumigatus A1163 were deposited in GenBank under the following accession numbers: AAKD00000000, AAKE00000000 and ABDB00000000, respectively.

Whole Genome Sequencing

A1163, A. clavatus and N. fischeri were sequenced using the whole genome shotgun method as previously described [22]. Random shotgun libraries of 2–3 Kb, 8–12 Kb and 50 Kb were constructed from genomic DNA from each strain, and DNA template was prepared for high-throughput sequencing using Big Dye Terminator chemistry (Applied Biosystems). Sequence data were assembled using Celera Assembler. For A. fumigatus A1163, scaffolds were compared to those of the first sequenced isolate, Af293 [7].

Sequence Identity at the Nucleotide Level

A1163 assemblies larger than 5 Kb were aligned to the Af293 chromosomes using the MUMmer package (http://mummer.sourceforge.net/) [64]. To avoid highly repetitive and duplicated regions, only alignments longer than 100 Kb were used to determine average sequence identity. The same approach was used to estimate sequence identity between A. flavus and A. oryzae and between the two sequenced A. niger strains.

Gene Structure Annotation

The JCVI eukaryotic annotation pipeline was applied to the A1163, A. clavatus and N. fischeri assemblies (supercontigs) larger than 2 Kb as described earlier [7]. We used PASA [65] and EvidenceModeler [66] to generate consensus gene models based on predictions from several types of gene finders including GlimmerHMM, Genezilla, SNAP, Genewise and Twinscan. Putative pseudogenes, small species-specific genes (less than 50 amino acids), and gene models overlapping with transposable elements (TE) shown in Table S16 were excluded from the final gene lists.
Repetitive Elements

Identification of repeat elements was performed using RepeatMasker (http://www.repeatmasker.org/), RepeatScout (http://repeatscout.bioprojects.org/), and Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html). Putative TEs (Table S16) were identified by Transposon-PSI (http://transposonpsi.sourceforge.net), a program that performs tBLASTn searches using a set of position specific scoring matrices (PSSMs) specific for different TE families. TE and repeat densities were calculated as the percentage of nucleotide bases in the regions of interest (i.e., syntenic or non-syntenic blocks) that overlap with a feature of the appropriate type (repeat or TE).

A. fumigatus Annotation Improvements

We leveraged the comparative genomic data to significantly improve annotation quality of the Af293 genome, which was previously annotated with relatively little supporting evidence [7]. The refinement of initial annotation was performed using the Sybil software package (http://sybil.sourceforge.net/), which allows for rapid identification of discrepancies in gene structure among orthologs. The comparison with orthologous N. fischeri and A. clavatus genes resulted in significant changes to the Af293 gene catalogue. Over 1100 gene models were updated and 130 new genes were identified. Initial A. fumigatus A1163 gene models were also improved using the PASA pipeline, initially developed to align expressed sequence tag (EST) data onto genomic sequences [65]. The pipeline was adapted to automatically update A1163 gene models by aligning them against Af293 coding sequences (CDSs).

Functional Annotation

We have performed transitive functional annotation from Af293 proteins to their A1163, N. fischeri and A. clavatus orthologs. Previously GO terms [32] were assigned to Af293 proteins based on sequence similarity to PFAM domains or experimentally characterized S. cerevisiae proteins [7].
Secondary metabolism gene clusters were identified using Secondary Metabolism Region Finder (SMURF) available at http://www.jcvi.org/smurf (Nora Khaldi, unpublished). The complete list of gene clusters can be downloaded at ftp://ftp.jcvi.org/pub/software/smurf/. Gene Ontology (GO) terms [32] were assigned as described in [7].

Ortholog Identification

After extensive computational and manual refinement, the improved protein datasets were used to generate the final set of orthologs. Orthologous groups in Aspergillus genomes were identified using a reciprocal-best-BLAST-hit (RBH) approach with a cut-off of 1e-05. In addition to the A1163, A. clavatus and N. fischeri genomes, the previously sequenced genomes of Af293 [7], A. terreus NIH2624 (http://www.broad.mit.edu), A. oryzae RIB40 [10], A. nidulans FGSC A4 [9] and A. niger CBS 513.88 [23] were included in the comparative analysis. The results of this analysis, as well as synteny visualisation and comparative analysis tools, can also be found in the Aspergillus Comparative database at http://www.tigr.org/sybil/asp. Orthologous, unique and divergent genes in Af293 were identified based on alignments of Af293 CDSs against A1163 assemblies using gmap as implemented in PASA [65] with default parameters.

Synteny Analysis

Syntenic blocks for each pair of genomes (Af293 vs. A. clavatus and Af293 vs. N. fischeri) were defined as areas containing a minimum of five matching (orthologous) genes with a maximum of 20 adjacent non-matching genes (having no orthologs) in the reference and target genomes. Since most syntenic regions slightly overlapped, the original blocks were merged to calculate repeat and TE density. Af293 non-syntenic blocks were defined as areas excluded from the syntenic blocks and containing at least ten Af293 non-matching genes.
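The block definition above can be sketched in Python as follows. This is a simplified illustration, not the pipeline used in this study: it works on a single reference chromosome, takes a hypothetical list of booleans marking whether each gene (in chromosomal order) has an ortholog in the target genome, and ignores gene order and orientation in the target. The function name and parameter names are assumptions made for the example.

```python
from typing import List, Tuple


def syntenic_blocks(
    has_ortholog: List[bool],
    min_matching: int = 5,   # minimum number of matching genes per block
    max_gap: int = 20,       # maximum run of adjacent non-matching genes
) -> List[Tuple[int, int]]:
    """Return (first, last) gene indices (inclusive) of candidate blocks.

    A block starts and ends on a matching gene, tolerates runs of up to
    `max_gap` non-matching genes, and is kept only if it contains at least
    `min_matching` matching genes.
    """
    blocks: List[Tuple[int, int]] = []
    start = None        # index of the first matching gene in the open block
    last_match = None   # index of the most recent matching gene
    n_match = 0         # matching genes counted in the open block

    for i, matched in enumerate(has_ortholog):
        if not matched:
            continue
        # Close the open block if the gap since the last match is too long.
        if start is not None and i - last_match - 1 > max_gap:
            if n_match >= min_matching:
                blocks.append((start, last_match))
            start, n_match = None, 0
        if start is None:
            start = i
        last_match = i
        n_match += 1

    # Close the final block, if any.
    if start is not None and n_match >= min_matching:
        blocks.append((start, last_match))
    return blocks


if __name__ == "__main__":
    # Toy chromosome: 6 matches, a 21-gene gap, then 7 matches with a short gap.
    genes = [True] * 6 + [False] * 21 + [True] * 3 + [False] * 2 + [True] * 4
    print(syntenic_blocks(genes))
```

Under these assumptions, genes falling outside every returned interval would be candidates for the non-syntenic blocks; a fuller implementation would also check that matching genes are neighbours in the target genome.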
Statistical Analysis

Genes in four lineage-specificity groups were analyzed by the EASE module [67] in MEV within TM4 (http://TM4.org) [68] to identify overrepresented Gene Ontology (GO) terms, Pfam domains and Chromosomal Regions (telomere-proximal and central). Only categories with Fisher's exact test probabilities of P<0.05 from the EASE analyses were reported for each gene set.

Selective Constraints

Selective constraints were estimated for sets of orthologous genes from the Af293, A1163, A. clavatus, N. fischeri and A. terreus genomes. The rates of substitution in synonymous (dS) and in non-synonymous (dN) sites, and their ratio (dN/dS), were calculated using the PAML package [69]. If a gene is very well conserved, dN/dS < 1. The results are reported only for orthologous gene sets having unsaturated dS values, the same number of exons, and sequence alignment coverage >95%. For each gene, the average dN/dS ratio for five pairwise species comparisons was calculated.

Phylogenetic Analyses

We assembled a local database of protein sequences from the 28 publicly available fungal genome projects (Table S17). All phylogenetic analyses in this paper were carried out on protein sequences. The A. niger ATCC 1015, Nectria haematococca, Phanerochaete chrysosporium and Trichoderma reesei genome projects were completed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program and by the University of California, Lawrence Livermore National Laboratory (Contract No. W-7405-Eng-48), Lawrence Berkeley National Laboratory (contract No. DE-AC03-76SF00098) and Los Alamos National Laboratory (contract No. W-7405-ENG-36).

To produce a reference tree of species phylogeny we used the protein sequences of 90 likely orthologs from A. niger, A. nidulans, A. terreus, A. oryzae, A. clavatus, N. fischeri, A. fumigatus and Fusarium graminearum (teleomorph of Gibberella zeae) as an outgroup.
To minimize the effect of incorrect or incongruent gene models, these proteins were chosen on the basis of having identical numbers of introns in each species and similar lengths. Sequences were aligned using MUSCLE [70] and columns of low conservation were removed manually. Maximum-likelihood trees were constructed using the PHYLIP package, applying the JTT substitution model with a gamma distribution (alpha = 0.5) of rates over four categories of variable sites.

Phylogenetic analyses of individual Af293, A1163, and N. fischeri proteins were carried out on sets of homologs identified in BLASTP searches against our fungal database. The top 20 hits with E < 10^-4 were retained for analysis. Sequences were aligned using ClustalW [71]. Poorly aligned regions were removed using Gblocks [72]. Finally, a maximum likelihood tree was drawn using PHYML [73].

Southern Blot Analysis

To detect polymorphisms in the rosA (AFUA_6G07010) gene, several hybridizations were performed using the rosA gene as the probe and genomic DNA cleaved with EcoRI, ClaI, BamHI or EcoRV. For comparison, an invariable gene for all species (apg5; AFUA_6G07040) was used as the hybridization probe on genomic DNA digested with HpaI.

Colony Radial Growth Rate Measurement

Colony radial growth rate measurements were performed as described [74]. For each isolate, four (90 mm diameter) Petri dishes containing 25 ml agar medium were inoculated centrally with 2.5 µl of a 1×10^6 spores/ml suspension in PBS/Tween 80. Plates were then incubated at temperatures ranging from 25°C to 50°C and colony edges were marked using a plate microscope. Colonies were marked twice daily for 4–5 days. For each colony, two diameters perpendicular to each other were measured. Eight replicates were measured for each isolate. The results reported here are the mean of two experiments. At least five time points during the log phase were used to calculate growth rate.
The radius of the colonies was plotted against time using least-square regression analysis, and the slope of the regression line, which represents the growth rate, was calculated. Each replicate was analysed separately and the mean of the growth rate was then calculated.

Supporting Information

Text S1 Allergens and sexual development genes. (0.05 MB DOC)
Table S1 Growth rates of Af293, A1163, N. fischeri, and A. clavatus isolates at various temperatures. (0.02 MB XLS)
Table S2 Resistance to antifungals among A. fumigatus clinical isolates. (0.02 MB XLS)
Table S3 Divergent A. fumigatus Af293 genes with respect to Af294, Af71, and A1163. (0.03 MB XLS)
Table S4 Distribution of polymorphic alleles among A. fumigatus isolates. (0.02 MB XLS)
Table S5 Five A. fumigatus loci exhibiting trans-species polymorphism. (0.02 MB XLS)
Table S6 A. fumigatus core and species-specific genes. (0.02 MB XLS)
Table S7 Features of A. fumigatus-specific genes. (0.02 MB XLS)
Table S8 Top biological processes overrepresented among four lineage specificity groups. (0.02 MB XLS)
Table S9 Enzymes that control the first step in secondary metabolite biosynthesis. (0.02 MB XLS)
Table S10 Top PFAM domains overrepresented among four lineage specificity groups. (0.02 MB XLS)
Table S11 Lineage specificity and chromosomal location. (0.02 MB XLS)
Table S12 Selective constraints operating on virulence-associated genes. (0.02 MB XLS)
Table S13 Known A. fumigatus Af293 allergens. (0.02 MB XLS)
Table S14 Predicted A. fumigatus Af293 allergens. (0.02 MB XLS)
Table S15 Selective constraints operating on sex genes. (0.02 MB XLS)
Table S16 Families of transposable elements identified in the Affc genomes. (0.01 MB XLS)
Table S17 Fungal genomes used in phylogenetic analyses. (0.02 MB XLS)

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genet
                plos
                plosgen
                PLoS Genetics
                Public Library of Science (San Francisco, USA )
                1553-7390
                1553-7404
                June 2011
                June 2011
                9 June 2011
                : 7
                : 6
                : e1002070
                Affiliations
                [1 ]USDA–Agricultural Research Service, Purdue University, West Lafayette, Indiana, United States of America
                [2 ]Plant Research International B.V., Wageningen, The Netherlands
                [3 ]Department of Botany and Plant Pathology, Purdue University, West Lafayette, Indiana, United States of America
                [4 ]School of Veterinary and Biomedical Sciences, Murdoch University, Perth, Australia
                [5 ]IBWF e.V., Institute for Biotechnology and Drug Research, Kaiserslautern, Germany
                [6 ]HudsonAlpha Institute of Biotechnology, Huntsville, Alabama, United States of America
                [7 ]DOE Joint Genome Institute, Walnut Creek, California, United States of America
                [8 ]Rothamsted Research, Department of Plant Pathology and Microbiology, Harpenden, United Kingdom
                [9 ]School of Biological Sciences, University of Bristol, Bristol, United Kingdom
                [10 ]University of Arkansas, Fayetteville, Arkansas, United States of America
                [11 ]Syngenta, Jealott's Hill Research Centre, Bracknell, United Kingdom
                [12 ]Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, A.C., (CICY), Mérida, México
                [13 ]Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Ithaca, New York, United States of America
                [14 ]Architecture et Fonction des Macromolecules Biologiques, CNRS, Marseille, France
                [15 ]Wageningen University and Research Centre, Wageningen, The Netherlands
                [16 ]USDA–Agricultural Research Service, Ithaca, New York, United States of America
                [17 ]Diversity Arrays Technology Pty Ltd, Yarralumla, Australia
                [18 ]Embrapa Meio-Norte, Teresina, Piauí, Brazil
                [19 ]Bayer CropScience AG, Monheim, Germany
                [20 ]Embrapa-Cenargen, Brasilia, Brazil
                [21 ]Department of Genetics, Seed and Plant Improvement Institute, Karaj, Iran
                [22 ]Plant Pathology, Institute of Integrative Biology, Swiss Federal Institute of Technology (ETH), Zürich, Switzerland
                [23 ]CBS–KNAW Fungal Biodiversity Centre, Utrecht, The Netherlands
                [24 ]Environment and Agriculture, Curtin University, Bentley, Australia
                Fred Hutchinson Cancer Research Center, United States of America
                Author notes

                Conceived and designed the experiments: SB Goodwin, IV Grigoriev, GHJ Kema. Performed the experiments: S Ben M'Barek, AHJ Wittenberg, TAJ Van der Lee, PM Coutinho, B Henrissat, V Lombard, SB Ware, C Waalwijk. Analyzed the data: SB Goodwin, S Ben M'Barek, B Dhillon, AHJ Wittenberg, CF Crane, JK Hane, AJ Foster, J Grimwood, J Antoniw, A Bailey, B Bluhm, J Bowler, A Burgt, B Canto-Canché, ACL Churchill, L Conde-Ferràez, HJ Cools, M Csukai, P Dehal, P De Wit, B Donzelli, HC van de Geest, KE Hammond-Kosack, RCHJ van Ham, B Henrissat, A Kilian, AK Kobayashi, E Koopmann, Y Kourmpetis, A Kuzniar, E Lindquist, C Maliepaard, N Martins, R Mehrabi, JPH Nap, A Ponomarenko, JJ Rudd, A Salamov, J Schmutz, HJ Schouten, I Stergiopoulos, SFF Torriani, RP de Vries, A Wiebenga, L-H Zwiers, RP Oliver, IV Grigoriev, GHJ Kema. Contributed reagents/materials/analysis tools: CF Crane, JK Hane, A Aerts, E Lindquist, H Shapiro, H Tu. Wrote the paper: SB Goodwin, GHJ Kema. Managed the Community Sequencing Program: J Bristow.

                Article
                PGENETICS-D-10-00112
                10.1371/journal.pgen.1002070
                3111534
                21695235
                135e723d-2b5f-47b6-a85a-6673f3dcb2ae
                This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
                History
                : 17 October 2010
                : 24 March 2011
                Page count
                Pages: 17
                Categories
                Research Article
                Agriculture
                Pest Control
                Biology
                Genetics
                Genomics

