Introduction Paracoccidioides, a dimorphic fungal pathogen, has infected approximately 10 million people in Latin America [1]. Each year, thousands of these infections develop into a systemic mycosis termed paracoccidioidomycosis, which requires prolonged treatment and has a high rate of relapse and complications [2], [3]. Despite the prevalence of Paracoccidioides infection, there is no estimate of the disease burden measured in disability-adjusted life years [4]. Other dimorphic fungi from the genera Coccidioides, Blastomyces, and Histoplasma cause over one million new infections each year in the United States alone [5], making dimorphic fungi the most common etiological agents of fungal pulmonary infection of otherwise healthy hosts [1], [6], [7]. All of these dimorphic fungi are pathogenic and during infection undergo a thermally regulated shift characterized by a morphological change between mycelial and yeast phases of growth [8]. These fungi are phylogenetically related, belonging to phylum Ascomycota, order Onygenales [9]. Onygenales also contains a number of non-dimorphic fungi including pathogens such as some Microsporum species and non-pathogens such as Uncinocarpus. A common attribute of all dimorphic pathogens is the distinct growth conditions associated with temperature dependent alterations in morphological state. Specifically, a non-virulent filamentous form consisting of long chained cells producing asexual spores is observed in soil or in culture at 23°C, and a budding yeast form (or in Coccidioides, a related spherule/endospore form) in the host pulmonary system or in culture at 37°C. In filamentous form, dimorphic fungi are thought to be saprophytic, although whether they primarily decay plant or animal matter has been debated [10]. Infection of a mammalian host typically occurs following disruption of the fungus residing in soil and subsequent inhalation of airborne microconidia, which transform into a parasitic yeast form in the pulmonary alveolar epithelium [6]. Recent studies have begun to identify genes required for pathogenicity in dimorphic fungi [11]. These include a hybrid histidine kinase (DRK1), which controls the temperature dependent mold to yeast transition [12]. The dimorphic transition is concurrent with relocalization and organization of membrane lipids [13], [14], global changes in cell wall composition including alterations in carbohydrate polymers, such as glucan structure and chitin content [15], [16], [17], and temperature and other growth condition dependent shifts in gene expression [18], [19]. Some genes induced during the dimorphic shift are required for virulence, including two adhesins (BAD1 and SOWgp), a calcium-binding protein (CBP1), and a catalase peroxidase [20]. The Paracoccidioides genus is composed of four distinct phylogenetic lineages (PS2, PS3, S1, and Pb01-like), which vary in virulence, culture adaptation, and induce different host immune responses [21], [22]. The three strains we selected for sequencing, Pb01, Pb03, and Pb18, represent three of the Paracoccidioides phylogenetic lineages. Strain Pb18 is a member of Species 1 (S1), which is composed of 38 isolates distributed across Latin America [21]. The Pb03 strain belongs to phylogenetic species 2 (PS2), which is composed of one Venezuelan and five Brazilian isolates. The extensively studied clinical isolate Pb01 is phylogenetically distinct from other strains and has been recently defined as a new species, Paracoccidioides lutzii [23]. In addition to interspecific variation, Paracoccidioides strains have been shown to contain extensive intraspecific genetic variability [24], [25], [26]. Comparing the genomes of the three isolates selected therefore allows identification of the features common across a wide range of diversity. In addition, the differences between the three strains provide a genome-wide pattern of the variation between strains in this genus. To further elucidate the genomic basis of growth and virulence in dimorphic fungi, here we describe high quality reference genomes for all three Paracoccidioides strains. Using a combination of computational approaches, we characterize gene family content and metabolic pathways across dimorphic and related fungi, particularly with respect to carbohydrate and protein metabolism. We then test experimentally if predicted metabolic pathways correlate with carbon utilization using phenotype microarrays [27]. Only a few studies have characterized carbon utilization phenotypes in filamentous fungi (e.g., [28], [29]), and none of these studies have been conducted on Onygenales. We therefore screen the non-pathogenic Onygenale Uncinocarpus reesii for growth on 190 compounds as the sole carbon source, to better understand the metabolic capabilities of dimorphic fungi in mycelial form. Our analysis comparing the Paracoccidioides genomes with diverse relatives from the order Onygenales has also provided key insights into genomic attributes that have contributed to the divergence of the Paracoccidioides lineage from other dimorphic fungal species, as well as the genetic diversity which differentiates P. brasiliensis strains from the P. lutzii (Pb01-like) species. These insights include a catalogue of unique genes and metabolic pathways that are conserved with close dimorphic relatives. Our study provides a basis from which to identify the underlying molecular differences that determine the infectious potency of Paracoccidioides strains and give rise to the clinical profiles attributable to paracoccidioidomycosis. Results Genome characteristics We produced 8-10X sequence coverage using Sanger technology for two strains of P. brasiliensis (Pb03 and Pb18), and one strain of the recently classified related species P. lutzii (Pb01) [23] (Table 1, Materials and Methods). The sequence was assembled using Arachne [30], and scaffolds representing the mitochondrial genome were separated out. To assess assembly accuracy and completeness, we generated an optical map of the Pb18 strain consisting of five linkage groups which likely correspond to complete chromosomes (Figure 1, Table S1). A total of 94% of the Pb18 assembly could be anchored to the optical map; the unanchored sequence was repetitive and similar in size to the unaligned regions of the optical map. 10.1371/journal.pgen.1002345.g001 Figure 1 Genome organization of P. brasiliensis. Alignment of Pb18 supercontigs to the five chromosomes from the optical map are shown. Tracks (left to right) show gene frequency (blue), transposon frequency (red), synteny with P. lutzii (yellow), and synteny with Histoplasma (purple). 10.1371/journal.pgen.1002345.t001 Table 1 Assembly and gene statistics. P. lutzii P. brasiliensis Pb03 Pb18 Coverage 8.0X 8.9X 9.8X Assembly size (Mb) 32.9 29.1 30.0 Total contig length (Mb) 32.6 28.8 29.4 Scaffolds 111 65 57 Scaffold N50 (Mb) 1.02 1.97 2.13 Contigs 885 552 669 Contig N50 (kb) 84.3 114.9 109.7 Quality ≥Q40 (%) 98.9 98.9 98.9 GC (%) 42.8 44.5 44.4 Predicted protein-coding genes 9,132 7,875 8,741 Dubious genes 1,002 265 699 High-confidence genes 8,130 7,610 8,042 Mean gene length (nt) 1,814 1,833 1,802 Mean coding sequence length (nt) 1,330 1,433 1,346 Mean intron length (nt) 126 140 132 Mean intron number per gene 3.1 2.5 2.8 Mean exon number per gene 4.1 3.5 3.8 GC exonic (%) 49.8 50.4 50.1 GC intronic (%) 41.7 42.4 42.3 Mean intergenic length (nt) 1,799 1,848 1,628 tRNAs 118 103 103 Transmembrane proteins 1121 1057 1084 Secreted proteins 297 291 322 GPI-anchored proteins 61 63 62 The two P. brasiliensis genomes, Pb18 and Pb03, are similar in size (30.0 Mb and 29.1 Mb respectively) (Table 1). The P. lutzii genome is nearly 3 Mb larger at 32.9 Mb. The total number of initial predicted genes varies between 7,875 in Pb03 to 9,132 in P. lutzii. We assessed the evidence supporting each gene prediction in all three genomes, and flagged as dubious those with low levels of support in each of the three genomes. Once accounting for dubious genes, the set of high confidence genes is more similar across the three genomes, varying between 7,610 in Pb03 to 8,130 in P. lutzii (Table 1). While all three Paracoccidioides genomes are highly syntenic, Pb18 and Pb03 share a significantly higher percentage of sequence similarity (∼96%) to each other in comparison to P. lutzii (∼90%) (Figure 1, Figure S1A, Table S2). Substantial synteny also exists between Paracoccidioides and Histoplasma, totaling 2.4 Mb; about half this amount is syntenic between Paracoccidioides and Coccidioides (Figure S1B, Table S2). Additionally, we used DAGchainer [31] to identify syntenic blocks of 6 or more genes between the species of Paracoccidioides. For P. brasiliensis strains Pb03 and Pb18, 95% and 93% of the genome was represented by syntenic blocks. For P. lutzii, 85% of the genome was represented by syntenic blocks. We then identified gene ontology (GO) terms enriched in these unique regions, by comparing all unique regions of all genomes to all non-unique regions of all genomes. While we mostly found transposable element-related PFAM and GO terms significantly over-represented (Table S3), some other PFAM domains were enriched in unique regions, such as protein kinases (PF00069, Fisher's exact test, p 80% length using the CODEML program of the PAML package (version 3.15) [104] using the relevant topology from the phylogenetic analysis. To identify GO terms representing categories of rapidly evolving genes, we resampled dN/dS without replacement onto genes to generate 1000 pseudoreplicates. For each GO term, we then compared the observed distribution of dN/dS to the resampled distribution to calculate a p-value. P-values were then corrected for multiple comparisons using the false discovery rate of Storey [105]. Ancestral character state reconstruction was performed using GeneTRACE [41]. Catabolic phenotype screen Metabolic analysis was conducted on Uncinocarpus reesii, a non-pathogenic soil dwelling fungus in the Onygenales. U. reesii is the closest non-pathogenic relative of Paracoccidioides for which a complete genome sequence is available. U. reesii strain UAMH 3881 was purified to single colonies on agar plates containing ½ Yedex at room temperature. To confirm the species of the culture, genomic DNA was extracted by bead beating in phenol and the ITS region was amplified using ITS1F [106] and ITS4 [107] primers and sequenced by Sanger technology (GenBank accession JF451137). By comparing to other sequences in the GenBank nt nucleotide database using MEGABLAST, the closest matches were identified as other Uncinocarpus reesii ITS sequences from other strains, at 97–99% identity and 99–100% coverage. After growth for 1 week, spores were gathered using sterile cotton swabs by scraping from the surface of growth on the plate. Cells were innoculated into 12 ml FF innoculating fluid (Biolog, Hayward CA). The resuspension of cells was gently vortexed to disaggregate the cell mass, then spun at 5000 x G for four minutes to separate clumps from well-dispersed innoculum. Supernatants were diluted with another 12 ml and mixed by inversion. 100 µl of resuspension was innoculated into PM1 and PM2a plates in triplicate (Biolog, Hayward CA). Growth was monitored at days 1, 3, 5, and 7 by measuring optical density at 590 nm of U. reesii growing on a compound by subtracting average OD590 values of the values for compounds without cells, as well as subtracting the average absorbance values for negative controls that contain cells without carbon sources. The Biolog system measures absorbance at 590 nm, and does not measure colors typical of secondary metabolites, such as red, orange, or brown. Furthermore, in the fungal-specific system, the experiment only measures turbidity rather than in bacterial experiments where 590 nm is measured in the presence of growth sensing purple dyes. Values above 0.15 after background subtraction at day 7 were taken as evidence of growth that was further supported by kinetic increases in the light scattering measurements during the time course. The growth substrates that were identified from the 190 compounds screened were separated into categories containing nitrogenous compounds, carbohydrates and oligosaccharides. FunK1 analysis Orthologous groups of protein kinases identified as unique to the dimorphic fungi were identified as FunK1 genes and then aligned using MUSCLE [101]. This alignment was subsequently used with Hmmer [108] to identify additional FunK1 genes. The final set of FunK1 genes was realigned with MUSCLE and visualized with WebLogo [109]. Secondary metabolism genes We predicted secondary metabolite gene clusters (SMGC) using the Secondary Metabolite Unique Regions Finder (SMURF; http://www.jcvi.org/smurf/index.php). Ten backbone enzymes were found in each of the Paracoccidioides genomes, and all are conserved as single copy orthologs, found in syntenic regions between the genomes. Within each genome, two classes of enzymes were found; a total of six nonribosomal peptide synthases (NRPSs) and four polyketide synthases (PKSs) (Table S15). No dimethylallyl tryptophan synthases (DMATSs) or and terpene cyclases (TCs) were identified. Two clusters are closely linked in the P. brasiliensis genomes; one is a cluster around a NRPS (PABG_00426/PADG_02836) and the other around a PKS (PABG_00438/PADG_02849). Drug target search To filter genes conserved in humans, Paracoccidioides protein sequences from P. lutzii, Pb03 and Pb18 were queried using BLAST against a human genome protein sequence database (HG17, Build 35) (Filter query sequence = False, expect = 0.001). P. brasiliensis sequences that did not hit a human sequence were mapped to conserved orthoMCL orthologs in Paracoccidioides. This initial analysis identified 1,977 genes, of which 1,628 (82.4%) genes were named as conserved hypothetical protein, hypothetical protein or predicted proteins indicating a high fraction of poorly conserved or dubious genes. From those genes, 38 genes were selected (Table S18), which were present in the three Paracoccidioides isolates, based on their potential as drug targets based on localization or conservation in other pathogens. These function of these genes and support for their potential as targets is described in the supplement (Text S1 part 6). Supporting Information Figure S1 Dotplot views of mummer genome alignments. A. Nucmer alignments of Paracoccidioides genomes. B. Promer alignments of Paracoccidioides genomes to H. capsulatum and C. immitis. (EPS) Click here for additional data file. Figure S2 Distribution of SNPs for P. lutzii (red) and P. brasiliensis (Pb03) (blue) across P. brasiliensis (Pb18) chromosomes. Pb18 supercontigs are shown as grey shaded rectangles at the top of each chromosome plot. (EPS) Click here for additional data file. Figure S3 Transposable element composition of the Paracoccidioides genomes. (EPS) Click here for additional data file. Figure S4 CLANS clustering of Paracoccidioides Transposase_5 domain Tc1 elements. (EPS) Click here for additional data file. Figure S5 CLANS clustering of Transposase_5 domain Tc1 elements. (EPS) Click here for additional data file. Table S1 Pb18 optical map and assembly alignment statistics. (DOC) Click here for additional data file. Table S2 Nucmer genome alignment statistics. (DOC) Click here for additional data file. Table S3 Gene Ontology (GO) and PFAM terms enriched in unique regions of the Paracoccidioides genomes. (DOC) Click here for additional data file. Table S4 Simple and low complexity repeats content of genomes. (DOC) Click here for additional data file. Table S5 Mitochondrial genome statistics. (DOC) Click here for additional data file. Table S6 Gene Ontology (GO) terms significantly over- or under-represented. (XLS) Click here for additional data file. Table S7 CAZY carbohydrate active enzyme classification counts for 15 fungal genomes. (XLS) Click here for additional data file. Table S8 MEROPS peptidase classification counts for 15 fungal genomes. (XLS) Click here for additional data file. Table S9 Degradation pathways for amino acids and carbohydrates in Paracoccidioides and Uncinocarpus. (DOC) Click here for additional data file. Table S10 Carbon source utilization by Uncinocarpus reesii. (XLS) Click here for additional data file. Table S11 Predicted Histidine kinases. (DOC) Click here for additional data file. Table S12 Rapidly evolving Gene Ontology (GO) terms in selected groups. (DOC) Click here for additional data file. Table S13 Conservation of mating genes. (XLS) Click here for additional data file. Table S14 Conservation of meiosis genes. (XLS) Click here for additional data file. Table S15 Predicted secondary metabolite backbone enzymes. (DOC) Click here for additional data file. Table S16 Homologs of VelvetA complex, LaeA, and Ryp1. (DOC) Click here for additional data file. Table S17 Conservation of genes involved in sterol biosynthesis. (DOC) Click here for additional data file. Table S18 Chitin synthases. (DOC) Click here for additional data file. Table S19 Potential drug targets conserved in Paracoccidioides. (DOC) Click here for additional data file. Table S20 Sequence statistics for trimmed reads. (DOC) Click here for additional data file. Text S1 Supplementary methods and text. (DOC) Click here for additional data file.