Introduction

HIV infection was first detected in the United Kingdom (as AIDS) in 1981–2 [1] among MSM. Early outbreaks with UK sources include Scottish IDUs dated to 1983 [2] and haemophiliacs to 1984 [3]. All strains isolated initially were of the B subtype, both in MSM and IDUs [4] and also in the small number of individuals infected through heterosexual contact during that decade [5]. However, within 10 years, multiple subtypes had been detected within the UK [6]. From the mid-1990s, increasing numbers of HIV infections in the UK were found in heterosexuals, to the point that this risk group now comprises the majority of new HIV diagnoses [7]. This increase coincided with increasing immigration from southern and eastern Africa, particularly from South Africa, Uganda and Zimbabwe [8]. Genetic characterisation of viruses from infected heterosexuals revealed that while subtype B was still observed in the majority of samples obtained during 1996/7 [9], by the year 2000 subtype C was most common (35%), with subtype A at 15%, reflecting the main subtypes in those countries. Subtype B was present in only 25% of individuals [10]. Thus, the heterosexual risk group in the UK has become strongly associated with non-B HIV subtypes. Recently there has been some evidence of limited crossover among risk groups, with a study of over 5000 patients from London reporting 2 small clusters of subtype A (n = 21) among MSM, of whom approximately 50% were white [11].

We have applied recently developed methods of molecular phylodynamics to the analysis of partial HIV pol gene sequences obtained during routine clinical care from over 2000 MSM attending a single large clinic in London [12]. We showed that 25% of individuals whose virus showed a link to at least one other individual in the study were in fact linked to 10 or more others.
Using relaxed clock approaches [13], we found that 25% of transmissions within these clusters took place within a maximum of 6 months after infection. This suggested that the elevated risk of transmission associated with acute HIV infection could be important in driving a significant component of the HIV epidemic among MSM.

In this study we have analysed the entire dataset of individuals infected with non-B subtypes of HIV and receiving clinical care within the UK who are represented in the UK HIV Drug Resistance Database. The overwhelming majority (95%) of non-B subtype HIV in this dataset is associated with heterosexual transmission and 83% with Black-African ethnicity [14]. Since 2003 in the UK, a baseline HIV genotyping assay has been recommended when antiretroviral therapy is initiated, and accordingly a large proportion of sequences within the database have been obtained prior to therapy. Non-B subtype HIV pol sequences were available from over 11,000 individuals for this study; for comparison, the estimated number of HIV-infected Black African and Caribbean individuals in the UK was 24,000 in 2007 [7]. We therefore estimate we have analysed almost 40% of the UK heterosexual HIV-infected population.

Results

Detection of transmission clusters

From the sequence dataset representing over 25,000 subjects, non-B subtypes were identified mainly using the REGA method [15], with additional information from ad hoc phylogenetic analysis (see Methods). Because few sequences belonged to subtypes other than A and C, these other non-B subtypes were grouped for analysis. This gave datasets of the following sizes: for subtype A, N = 1581; for C, N = 6096; and for other non-B subtypes, N = 3394. Within these groups, the initial subset of sequences linked to at least one other was selected from all pairwise comparisons using a threshold of 4.5% nucleotide distance at third codon positions [12].
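As an illustration of the pairwise screen described above, the following Python sketch computes the proportion of differing nucleotides at third codon positions for each pair of aligned sequences and keeps every sequence that has at least one partner at or below the threshold. It is a minimal sketch under stated assumptions, not the pipeline used in this study: it assumes the alignment is in frame from the first base, uses an uncorrected proportion of differences (the text does not specify the distance model), and skips sites with gaps or ambiguity codes. The function names and the dictionary input format are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def third_position_distance(seq_a, seq_b):
    """Uncorrected proportion of differences at third codon positions.

    Assumes both sequences are aligned and in frame from index 0.
    Sites where either sequence has a gap or ambiguity code are skipped.
    Returns None when no comparable site exists.
    """
    compared = 0
    differing = 0
    # Third codon positions are indices 2, 5, 8, ... in a 0-based string.
    for i in range(2, min(len(seq_a), len(seq_b)), 3):
        a, b = seq_a[i].upper(), seq_b[i].upper()
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        return None
    return differing / compared


def linked_sequences(alignment, threshold=0.045):
    """Return the IDs of sequences with at least one partner whose
    third-position distance is at or below the threshold.

    `alignment` is a hypothetical dict mapping patient ID -> aligned sequence.
    """
    linked = set()
    for (id_a, seq_a), (id_b, seq_b) in combinations(alignment.items(), 2):
        distance = third_position_distance(seq_a, seq_b)
        if distance is not None and distance <= threshold:
            linked.update((id_a, id_b))
    return linked
```

All pairwise comparisons scale quadratically with the number of sequences, so a dataset of several thousand sequences per subtype would in practice call for a vectorised or compiled implementation; the loop above is kept simple for clarity.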
This screen identified sequences from 367 patients infected with subtype A, 1372 infected with subtype C and 1035 infected with other non-B subtypes, a total of 2774 individuals. The datasets were then modified by removal of codons associated with drug resistance (see Methods), and Bayesian MCMC phylogenetic analysis was performed on subtype A and subtype C separately. In the resulting trees, 4 subtype A and 14 subtype C phylogenetic clades of ≥10 individuals were identified with a posterior probability of 1 (Figures S1 & 2). This corresponds to 25% of the subtype A closely-related sequences and 21% of the subtype C closely-related sequences. A similar analysis was performed on the 1035 sequences from other non-B subtypes. In this last case, the main fully supported clades reflected subtype divisions and were unrelated to transmission patterns. However, from within the main subtype splits we were able to identify 7 fully supported subtrees of ≥10 individuals for further analysis (Figure S3).

Unlike the subtype B sequences previously studied [12], the clustering of non-subtype B sequences includes patient linkage outside of the UK. We therefore performed further analyses in which the nearest sequences to each cluster from the global HIV database were included. This led to the breakdown of a number of clades through the inclusion of sequences from outside the UK within what were previously monophyletic groups (Figure 1A, 1B & S4). The resulting distribution of cluster size is shown in Figure 2. Including the closest sequences from the global HIV database left 296 individuals in UK-based groups of 3 or more individuals. Large clusters still comprise a significant proportion of patients with a link to at least one other. The largest cluster for subtype A contained 24 individuals and the largest for subtype C contained 33 individuals.
The percentage of sequences found in clusters of ≥10 individuals was 14% (subtype A), 6% (subtype C) and 1% (others). A total of 143 of the original 2774 individuals (5%) were found in large clusters, although these comprised 48% of individuals within UK-based groups of 3 or more.

Figure 1 (doi:10.1371/journal.ppat.1000590.g001). Time-scaled Bayesian MCMC phylogenies of clusters of ≥10 patients. Red dots indicate the most recent common ancestor (MRCA) of UK transmission clusters as defined against analysis with global diversity. The scale bar is in calendar years. Grey lines indicate non-UK-based segments of the phylogeny, black lines indicate UK-based lineages. A) Subtype A. Scale bar indicates calendar years. B) Subtype C. Scale bar indicates 2 calendar years.

Figure 2 (doi:10.1371/journal.ppat.1000590.g002). Distribution of cluster size. Frequency of UK-based clusters, as defined in the text, of size 2 or higher, identified by subtype. A) Non-B subtypes (this study). B) Subtype B [12].

In this and our previous study of subtype B sequences, the distribution of individuals in clusters strongly suggested a power law relationship indicative of a scale-free network. With the additional data available, we have examined the fit of a power law to the non-B subtype data. The goodness of fit to a power law varies with the maximum time depth allowed for clusters. We used the date of sampling to limit the time depth and, having considered a range of values (Figure S5), find that restricting the analysis to subclusters with a maximum depth of 5 years reveals a very good fit (Figure 3; R² = 0.95; p […]

[…] 8 years result in too many connections for a good fit to a power law. (0.04 MB PDF)

Figure S6 Time-scaled phylogenies of subtype A clusters of size ≥10 with terminal branches removed. Red nodes indicate UK transmission clusters as defined against analysis with global diversity. Black/grey nodes indicate where terminal branches have been removed.
The scale bar is in calendar years. (0.04 MB PDF)

Figure S7 Time-scaled phylogenies of subtype C clusters of size ≥10 with terminal branches removed. Red nodes indicate UK transmission clusters as defined against analysis with global diversity. Black/grey nodes indicate where terminal branches have been removed. The scale bar is in calendar years. (0.05 MB PDF)

Figure S8 Time-scaled phylogenies of other non-B subtype clusters of size ≥10 with terminal branches removed. Red nodes indicate UK transmission clusters as defined against analysis with global diversity. Black/grey nodes indicate where terminal branches have been removed. The scale bar is in calendar years. (0.05 MB PDF)

Figure S9 Histograms of CD4 counts by HIV subtype. (A–D) Distribution of first available CD4 count (“Diagnosis”) by subtype. (E–H) Distribution of CD4 count at first treatment (“Treatment”) by subtype. (I–L) Distribution of CD4 count at treatment after correction by subtype. (M–N) Combined distributions at diagnosis (M) and first treatment (N). (0.11 MB PDF)

Figure S10 Recruitment to the UK HIV Drug Resistance Database. Number of individuals recruited to the UK HIV Drug Resistance Database by year, according to treatment status at recruitment. Naïve: recruited with HIV genotype assay taken before initiation of therapy (to identify transmitted drug resistance). Experienced: recruited with genotype assay performed due to failure of existing antiretroviral therapy. (0.05 MB PDF)

Text S1 (0.10 MB PDF)
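The power-law fit to the cluster size distribution reported in the Results can be illustrated with a short Python sketch. The text does not state how the fit was computed; the version below assumes an ordinary least-squares regression of log10(frequency) on log10(cluster size) and reports the slope, intercept and R² of that line. The function name and the input format (a flat list with one entry per cluster, giving its size) are hypothetical, and this is an illustrative sketch rather than the analysis performed in this study.

```python
import math
from collections import Counter


def fit_power_law(cluster_sizes):
    """Least-squares fit of log10(frequency) against log10(cluster size).

    `cluster_sizes` holds one integer per cluster (its number of members).
    Returns (slope, intercept, r_squared) of the log-log regression line;
    a power law frequency ~ size**slope appears as a straight line.
    """
    counts = Counter(cluster_sizes)
    sizes = sorted(counts)
    if len(sizes) < 2:
        raise ValueError("need at least two distinct cluster sizes")

    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(counts[s]) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared
```

Regression on log-transformed frequencies is simple and matches the reporting of an R² value, but it weights rare large clusters as heavily as common small ones; maximum-likelihood estimation is a common alternative when the exponent itself is the quantity of interest.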