      Recapitulating phylogenies using k-mers: from trees to networks

          Abstract

          Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. the idea that successive embryonic stages during the development of an organism retrace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred from the alignment. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using 143 bacterial and archaeal genome sequences, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences of fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not tree-like. The method is highly scalable, allowing genome evolution to be investigated across a large number of genomes. Rather than relying on specific regions or sequences within genomes, or indeed on Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
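          The idea of relating genomes through shared k-mers can be illustrated with a minimal sketch. This is not the authors' pipeline: the Jaccard normalisation, the edge threshold, the function names (`kmer_set`, `shared_kmer_network`) and the toy sequences are all assumptions made purely for illustration. The abstract states only that relatedness is based on the number of shared k-mers.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(genomes, k, threshold):
    """Build an undirected network over genomes.

    Nodes are genome names. An edge joins two genomes when the Jaccard
    similarity of their k-mer sets (shared k-mers / all k-mers seen in
    either genome) is at least `threshold`. The Jaccard normalisation is
    an illustrative assumption, not the statistic used in the article.
    """
    kmers = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(kmers), 2):
        union = kmers[a] | kmers[b]
        if not union:
            continue  # both sequences shorter than k: nothing to compare
        similarity = len(kmers[a] & kmers[b]) / len(union)
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges


# Hypothetical toy "genomes"; real inputs would be whole-genome sequences.
toy = {
    "g1": "ACGTACGTGA",
    "g2": "ACGTACGTCC",
    "g3": "TTTTGGGGCC",
}
network = shared_kmer_network(toy, k=4, threshold=0.3)
print(network)
```

          In this toy case g1 and g2 share four of eight distinct 4-mers and are connected, while g3 shares none with either and stays isolated. Raising or lowering the threshold removes or adds edges, which is one simple way to explore such a network at different levels of relatedness.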


                Author and article information

                Journal: F1000Research (F1000Res), London, UK
                ISSN: 2046-1402
                Published: 23 December 2016
                Volume: 5
                Article number: 2789
                Affiliations
                Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
                Department of Biological Sciences, Wayne State University, Detroit, MI, USA
                Department Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany
                Author notes

                GB, MAR and CXC conceived the study and designed the experiments. GB carried out the experiments. GB and CXC prepared the first draft of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content.

                Competing interests: No competing interests were disclosed.


                Author information
                http://orcid.org/0000-0002-3729-8176
                Article
                DOI: 10.12688/f1000research.10225.2
                PMCID: PMC5224691
                PMID: 28105314
                Copyright: © 2016 Bernard G et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 20 December 2016
                Funding
                Funded by: Australian Research Council
                Award ID: DP150101875
                We acknowledge funding support from the Australian Research Council (DP150101875) awarded to MAR and CXC, and a James S. McDonnell Foundation grant awarded to MAR.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Note
                Articles
                Developmental Evolution
                Evolutionary/Comparative Genetics

                phylogenies,phylogenetic trees,phylogenetic networks,k-mers
