      Complete mitochondrial genome of the summer heath fritillary butterfly, Mellicta ambigua (Lepidoptera: Nymphalidae)

      research-article
      a , b , a , a , c , a
      Mitochondrial DNA. Part B, Resources
      Taylor & Francis
      Mellicta ambigua, mitochondrial genome, Nymphalinae, phylogeny


          Abstract

          We sequenced the mitochondrial genome (mitogenome) of the summer heath fritillary butterfly, Mellicta ambigua Ménétriès, 1859 (Lepidoptera: Nymphalidae), which is listed as an endangered insect in South Korea. The complete 15,205-bp genome contained 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and 1 A + T-rich region, arranged identically to most insect mitogenomes. Unlike the other PCGs, COI had the atypical CGA start codon frequently found in lepidopteran COI. The A/T content of the whole mitogenome was 80.57% but varied among regions and genes: A + T-rich region, 93.39%; srRNA, 85.37%; lrRNA, 84.92%; tRNAs, 81.13%; and PCGs, 79.22%. Phylogenetic analyses of concatenated sequences of the 13 PCGs and 2 rRNAs placed M. ambigua as the sister of Melitaea cinxia, a species of the same tribe, with the highest nodal support in both the maximum-likelihood (ML) and Bayesian inference (BI) analyses.
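          The region-by-region A/T percentages and the COI start codon reported above are simple sequence statistics. The sketch below shows one way to compute them, assuming a forward-strand sequence, 0-based end-exclusive coordinates, and a denominator of A+C+G+T only (ambiguous bases excluded); these conventions are assumptions, not necessarily those used in the study. The 20-bp toy sequence and the region coordinates are invented for illustration and are not the real M. ambigua mitogenome.

```python
from collections import Counter


def at_content(seq: str) -> float:
    """A+T content (%) of a DNA sequence.

    Assumption: ambiguous bases (N, R, Y, ...) are excluded from the
    denominator, so the percentage is relative to A+C+G+T only.
    """
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / acgt


def region_at_content(seq: str, regions: dict) -> dict:
    """A+T content per region; regions maps name -> (start, end), 0-based, end-exclusive."""
    return {name: round(at_content(seq[start:end]), 2)
            for name, (start, end) in regions.items()}


def start_codon(seq: str, start: int) -> str:
    """First codon of a forward-strand gene beginning at 0-based position `start`.

    Genes on the minus strand would need the reverse complement instead.
    """
    return seq[start:start + 3].upper()


# Hypothetical toy data: 20 bp with made-up coordinates, not the real mitogenome.
toy = "CGAATTTAAG" + "ATATATATAT"
regions = {"COI": (0, 10), "A+T-rich": (10, 20)}

if __name__ == "__main__":
    print(f"whole sequence: {at_content(toy):.2f}%")
    print(region_at_content(toy, regions))
    print("COI start codon:", start_codon(toy, regions["COI"][0]))
```

          On the toy input the whole sequence is 85% A+T, the "COI" slice is 70%, the "A+T-rich" slice is 100%, and the first codon of the "COI" slice is CGA.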


                Author and article information

                Journal
                Mitochondrial DNA. Part B, Resources (Mitochondrial DNA B Resour)
                Taylor & Francis
                ISSN: 2380-2359
                Published: 7 May 2021
                Volume 6, Issue 5, pp. 1603–1605
                Affiliations
                [a] Department of Applied Biology, College of Agriculture & Life Sciences, Chonnam National University, Gwangju, Republic of Korea
                [b] Experiment and Analysis Division, Honam Regional Office, Animal and Plant Quarantine Agency, Gunsan, Republic of Korea
                [c] Research Institute for East Asian Environment and Biology, Seoul, Republic of Korea
                Author notes
                CONTACT Iksoo Kim ikkim81@chonnam.ac.kr Department of Applied Biology, College of Agriculture & Life Sciences, Chonnam National University, Gwangju 61186, Republic of Korea
                Article number: 1917318
                DOI: 10.1080/23802359.2021.1917318
                PMCID: PMC8118395
                PMID: 34027067
                688470f3-e7d9-4676-9fff-3e21f8d2a55a
                © 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Page count
                Figures: 1, Tables: 0, Pages: 3, Words: 2124
                Categories
                Research Article
                Mitogenome Announcement

