Analysis of Quality Changes in Mulberry Fruit at Three Developmental Stages Using Targeted Metabolomics Based on LC‒MS

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Related collections

Most cited references 70

Record: found
Abstract: found
Article: not found

A novel integrated method for large-scale detection, identification, and quantification of widely targeted metabolites: application in the study of rice metabolomics.

Wei-Wei Chen, Mei-Liang Gong, Zilong Guo … (2013)

Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics has been facilitated by the construction of MS(2) spectral tag (MS2T) library from the total scan ESI MS/MS data, and the development of widely targeted metabolomics method using MS/MS data gathered from authentic standards. In this report, a novel strategy called stepwise multiple ion monitoring-enhanced product ions (stepwise MIM-EPI) was developed to construct the MS2T library, in which stepwise MIM was used as survey scans to trigger the acquisition of EPI. A total number of 698 (almost) non-redundant metabolites with MS(2) spectra were obtained, of which 135 metabolites were identified/annotated. Integrating the data gathered from our MS2T library and other available multiple reaction monitoring (MRM) information, a widely targeted metabolomics method was developed to quantify 277 metabolites, including some phytohormones. Evaluation of the dehydration responses and natural variations of these metabolites in rice leaf not only suggested the coordinated regulation of abscisic acid (ABA) with metabolites such as serotonin derivative(s), polyamine conjugates under drought stress, but also revealed some C-glycosylated flavones as the potential markers for the discrimination of indica and japonica rice subspecies. The new MS2T library construction and widely targeted metabolomics strategy could be used as a tool for rice functional genomics.

0 comments Cited 473 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: not found

Rewiring of the Fruit Metabolome in Tomato Breeding

Guangtao Zhu, Shouchuang Wang, Zejun Huang … (2018)

Humans heavily rely on dozens of domesticated plant species that have been further improved through intensive breeding. To evaluate how breeding changed the tomato fruit metabolome, we have generated and analyzed a dataset encompassing genomes, transcriptomes, and metabolomes from hundreds of tomato genotypes. The combined results illustrate how breeding globally altered fruit metabolite content. Selection for alleles of genes associated with larger fruits altered metabolite profiles as a consequence of linkage with nearby genes. Selection of five major loci reduced the accumulation of anti-nutritional steroidal glycoalkaloids in ripened fruits, rendering the fruit more edible. Breeding for pink tomatoes modified the content of over 100 metabolites. The introgression of resistance genes from wild relatives in cultivars also resulted in major and unexpected metabolic changes. The study reveals a multi-omics view of the metabolic breeding history of tomato, as well as provides insights into metabolome-assisted breeding and plant biology.

0 comments Cited 255 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

Draft genome sequence of the mulberry tree Morus notabilis

Ningjia He, Chi Zhang, Xiwu Qi … (2013)

Mulberry is a deciduous tree and is an economically important food crop for the domesticated silkworm, Bombyx mori. The practice of producing valuable silk from silkworms nourished by mulberry leaves started at least 5,000 years ago1 and helped to shape world history through the Silk Road. The family Moraceae comprises 37 genera with ~1,100 species, including well-known plants such as mulberry, breadfruit, fig, banyan and upas2. Mulberry belongs to the genus Morus with 10–13 recognized species and over a 1,000 cultivated varieties3, which are widely planted in the Eurasian continent, Africa and the United States. Mulberry leaf production for silkworm uses ~626,000 and 280,000 hectares of land in China and India, respectively4. Mulberry also attracts farmers for its delicious fruit, bark for paper production and multiple usages in traditional oriental medicine5 6. B. mori, a lepidopteran model system and a specialist, feeds on mulberry leaves. The majority of known Lepidoptera species are herbivorous and are, therefore, economically important as major pests of agriculture and forestry. The adoption of silkworm rearing has led to intensive studies on feeding stimulants that are critical to the understanding of plant–insect interactions. The genome sequencing of silkworm was completed in 2008 (refs 7, 8). However, very little genomic information is available for species in the genus Morus. Although the genomic sequence of mulberry will facilitate the improvement of mulberry plants, the mulberry–silkworm genome pair will deepen our understanding of the fundamentals in plant–herbivore adaptation. Here we report the draft genome sequence of a mulberry species (M. notabilis). The estimated 357-Mb genome of M. notabilis, composed of 7 chromosome pairs, is sequenced using Illumina technology to a 236-fold depth coverage. On the basis of the 330-Mb assembly genome, we identify 128 Mb repetitive sequences and 29,338 protein-coding genes. Comparative genomic analyses reveal that mulberry evolved more rapidly than other sequenced Rosales. The identification and analyses of mulberry genes involved in resistance will accelerate the improvement of mulberry plants. The presence of predicted mulberry micro RNAs (miRNAs) in two tissues of the silkworm suggest probable interactions at molecular levels between the plant–herbivore pair. Results Genome sequencing and assembly We applied a whole-genome shotgun sequencing strategy to the mulberry species M. notabilis, which contains seven distinct pairs of chromosomes in their somatic cells (Fig. 1). A total of 78.34 billion high-quality bases (236-fold genome coverage) were assembled into a 330.79-Mb mulberry genome with a scaffold N50 length of 390,115 bp and contig N50 length of 34,476 bp (Table 1 and Supplementary Tables S1 and S2). There were 16,281 kb (4.9%) gaps and 314,510 kb (95.1%) non-gapped continuous sequences in the final assembly. We selected 10.46 Gb high-quality sequenced short reads from the library with an average insert size of 500 bp to calculate the distribution of K-mer depth, defined as 17 bp here. A total of 8,577,674,309 17-mer were obtained and the genome size of M. notabilis was determined to be 357.4 Mb (Supplementary Methods, Supplementary Fig. S1 and Supplementary Table S3). Over 80% of the assembly was represented by 681 scaffolds and the largest scaffold was 3,477,367 bp, with 93.96% of bases covered by more than 20 reads (Supplementary Fig. S2) and 97% of 10,000 random expressed sequence tags (ESTs) more than 90% covered by a scaffold (Supplementary Table S4). The 35.02% GC content of the mulberry genome is similar to that of other eudicots (Supplementary Methods and Supplementary Fig. S3). Repetitive sequences A combination of both de novo repeat prediction and homology-based search against the Repbase library (v15.02) resulted in 127.98 Mb repetitive sequences in the non-gapped mulberry genome (Supplementary Table S5). The transposable element (TE) content in the mulberry genome was probably underestimated because of the inherent limitations of de novo sequencing in dealing with repetitive sequences. After the exclusion of ‘N’s, according to the average coverage depth and the total reads mapped to the repetitive-sequence (~127.7 MB) and non-repetitive-sequence regions (~166.0 Mb) in the mulberry genome, we estimated that there are about 18.48 Mb repetitive sequences in the unassembled sequences. Hence, up to ~47% of the mulberry genome is composed of repetitive sequences. The proportion of repetitive sequences in the mulberry genome is comparable with that in apple (42%), whereas it is slightly higher than that in poplar (35%). More than 50% of mulberry repetitive sequences could be clearly classified into known categories, such as Gypsy-like (6.58%) and Copia-like (6.84%) long-terminal repeat retrotransposons. About 99.11% of TEs had a >10% divergence rate, indicating that most mulberry TEs are relatively ancient (Supplementary Fig. S4). Gene prediction and functional annotation We identified 27,085 high-confidence protein-coding loci with complete gene structures in the mulberry genome, using 21 Gb RNA-seq data from five tissues and 5,833 unique ESTs for gene model prediction and validation (Supplementary Method and Supplementary Table S6). Of the 27,085 predicted genes, 99.93% were supported by de novo gene prediction, 58.38% (15,811 genes) by RNA-seq/EST and 69.94% (18,943 genes) by homology-based approaches. More than half (52.19%) of the genes were supported by all three methods. Including 2,253 partial genes annotated by RNA-seq data and ESTs (Supplementary Table S7), we predicted 29,338 genes with an average mRNA length of 2,849 bp, an average coding gene length of 1,156 bp and a mean number of 4.6 exons per gene (Supplementary Table S8). Of these genes, 60.8% were supported by RNA-seq data and 76.92% (22,566/29,338) had homologous targets in functional databases, such as the NCBI non-redundant protein, Swissprot, InterPro, KEGG (Kyoto Encyclopedia of Genes and Genomes) and COG (Clusters of Orthologous Groups; Supplementary Table S9). On the basis of the RNA-seq data, we calculated the tissue specificity index τ, to screen for tissue-specific genes and housekeeping genes. We found that 241, 213, 285, 360 and 404 genes specifically expressed in the root, bark, winter bud, male flower and leaf, respectively. In comparison, 1,805 genes were expressed constitutively in the 5 tissue/organs, including 116 encoding ribosomal proteins and 26 encoding translation initiation factors (Supplementary Fig. S5). Genome evolution Comparison of the mulberry genome to a rich collection of Rosales genome sequences, including Cannabis sativa 9, Malus domestica 10 and Fragaria vesca 11, offers insights into levels and patterns of DNA-level diversity in this important clade. A phylogenetic tree based on single-copy mulberry genes and other 12 sequenced plants (Fig. 2) supports Moraceae as one of the closest relatives of Rosaceae12 13. The results suggest the speciation times of 63.5 million years ago (mya) for mulberry and C. sativa (Cannabaceae), 88.2 mya for mulberry and apple/strawberry (Rosaceae), and 101.6 mya for mulberry and Medicago truncatula (Fabales)14. Ks plots suggest that mulberry (Moraceae) and C. sativa diverged later than the divergence of apple and strawberry in the Rosaceae family (Fig. 3). Different gene groups of several plants were then used to construct three phylogenetic trees (Fig. 4). First, we used single-copy genes in the predicted mulberry gene data sets and their best-matched ones in other species to reconstruct phylogeny (Fig. 4a). Second, we used single-copy genes of genewise-predicted mulberry genes to reconstruct phylogeny (Fig. 4b). Third, we used best-matched genes in collinear positions across different genomes to reconstruct phylogeny (Fig. 4c). In all of the reconstructed phylogenetic trees, the branch of mulberry is longer than those of the other species, suggesting that mulberry evolved much (~3 times) faster than other Rosales. To investigate the syntenic and evolutionary relationship of the mulberry genome, without any available genetic map, in-silico gene staining or genome zipper approach was performed against the strawberry (F. vesca) genome sequences15. The gene density distribution of the conserved syntenic regions against strawberry was computed and visualized as a heatmap using a sliding window approach (Fig. 5 and Supplementary Data 1). Alignment of mulberry scaffolds to their best-matched grape chromosomal regions (Fig. 6a) often revealed two additional but less pronounced homologous regions, indicating that mulberry shares the eudicot-common triplication revealed initially by the analysis of the grape genome16. Similarly, a region of the strawberry or cannabis genomes often has one primary and two secondary homologous grape genome regions (Fig. 6b,c), contrary to an earlier report of no paleopolypoidization in strawberry11. The fact that mulberry, strawberry and cannabis have the pan-eudicot hexaploidization as their most recent polyploidy is further supported by the distribution of synonymous nucleotide substitution rates of homologous genes in collinear blocks within and between these genomes (Fig. 6d,e). Diversifying selection The divergent morphologies and phytochemistries for which various Rosales are cultivated may reflect diversifying selections on orthologous genes. By regression analysis between the ω, the non-synonymous (Ka) versus synonymous (Ks) nucleotide substitution rate ratio (Ka/Ks) and the Ks values, we estimated that 307, 338, 353 and 197 gene pairs have significantly higher-than-average non-synonymous (Ka) versus synonymous (Ks) nucleotide substitution rate ratios (ω), indicating diversifying selection for M. notabilis–C. sativa, M. notabilis–F. vesca, M. notabilis–M. domestica and M. notabilis–M. truncatula (Supplementary Data 2). Interestingly, for the subset of genes that meet the more stringent Fisher’s exact test, diversifying selection between 222 pairs of M. notabilis–C. sativa genes (Supplementary Fig. S6 and Supplementary Table S10) is enriched in aging and stress response-related genes, perhaps linked to the difference in life expectancy of the plants. In M. notabilis–F. vesca and M. notabilis–M. domestica comparisons, 228 and 258 diversifying selected orthologous pairs (Supplementary Data 2) may be associated with functional differences, for example, Morus000754 (mulberry)–MDP0000252168 (apple) and Morus009486 (mulberry)–MDP0000290357 (apple) involved in cutin biosynthetic processes may be related to the apple’s thick cuticle (although mechanisms of cuticle biogenesis are not clear17). Particularly prominent in the mulberry–Rosaceae (apple, strawberry) diversification are the gene pairs related to plastid components (Supplementary Data 3 and 4), suggesting that Rubisco18 and many plastid genes were under positive diversifying selection. Resistance genes The mulberry genome has 142 nucleotide-binding site (NBS)-containing resistance (R) genes that constitute about 0.53% of all Morus genes, comparable to that of Arabidopsis (0.52%) and strawberry (0.58%), and lower than that of poplar (0.86%) and apple (1.49%) (Supplementary Data 5 and Supplementary Table S11). All of these R genes were classified into six groups, TIR-NBS-LRR, CC-NBS-LRR, NBS-LRR, NBS, CC-NBS and TIR-NBS, with the maximum number of 46 belonging to the CC-NBS-LRR group. The Morus genome contains 127 cysteine protease (CP; 0.47%) and 129 aspartic protease (AP; 0.48%) encoding genes, which is comparable to that of apple (0.59%, 0.37%) and of strawberry (0.49%, 0.53%; Supplementary Data 6 and 7, and Supplementary Table S12). Prominent among these are 13 CP and 4 AP genes expressed in the laticifers of mulberry (Supplementary Table S13). Interestingly, one of the four AP genes (Morus008067) is under diversifying selection with an apple gene (MDP0000201076; Supplementary Data 2). Protease inhibitor genes To alleviate insect infestation, plants have evolved a defence mechanism to interfere with the digestive systems of insects by expressing a number of plant protease inhibitors (PIs). On the basis of the known PI sequences and their conserved domains, we identified 79 PIs in the mulberry genome (Supplementary Table S14). Twenty-two family C1 cysteine peptidase inhibitor genes and 19 family A1/C1 serine peptidase inhibitor genes were annotated in the Morus genome, accounting for half of the identified inhibitor genes. Mulberry miRNAs identified in silkworm tissues Adaptation of silkworm to the seasonal growth of mulberry leaves may involve cross-kingdom molecular signalling. By aligning the Morus genome to various plant small RNA databases, we predicted 311 small nuclear RNAs and 223 miRNAs (Supplementary Table S15). Five of the mulberry miRNAs, absent in the silkworm genome, were found in the miRNA database derived from silkworm larval haemolymph (two), anterior-middle silk glands (two), and posterior silk glands (one) (Supplementary Table S16). The sequencing of small RNAs was repeated using a different batch of silkworm haemolymph. The presence of the mulberry miRNAs in silkworm haemolymph identified in an earlier database was confirmed in the repeat experiment. Discussion Early studies proposed a basic chromosome number of 14 for mulberry species19. This number is widely cited in the literature even though later cytological studies on two M. indica species proposed a basic chromosome number of 7 for Morus species20. The diverse levels of polyploidization in the genus are reflected in the wide range of chromosome numbers: 14 in M. notabilis 21, 28 in M. indica or M. alba, 42 in M. bombycis and even 308 in M. nigra 22. Because of the high complexity of polyploid genomes, the species (M. notabilis) with 14 chromosomes is chosen for whole-genome sequencing. To verify the number of chromosomes of the M. notabilis, somatic cells at metaphase stage in the apical bud was used for cytological analyses. We confirmed that the cells of M. notabilis contained 14 chromosomes. Chromosome karyotyping clearly grouped the 14 chromosomes of M. notabilis into seven distinct pairs, supporting the basic chromosome number of seven proposed in the studies on M. indica 20. Phylogenetic analysis indicated that the mulberry genes form a clade with those of other Rosales. Moraceae, conventionally considered as belonging to Urticales, is thought to be one of the closest relatives of Rosaceae. However, a recent report suggested that the families Ulmaceae, Cannabaceae, Moraceae and Urticaceae belong to a single clade23, named as the urticalean rosids24. Moraceae was later classified into Rosales by the Angiosperm Phylogeny Group III13. Our results support this reclassification. Mulberry is rapidly evolving at the nucleotide level. It’s fast evolving genes may have contributed to the flexibility of mulberry to adapt to environments outside of its native range, facilitating its spread to Europe, Africa and the United States. In contrast to its rapid nucleotide changes, Rosales ploidies have evolved conservatively. Mulberry, strawberry, cannabis, papaya and grape underwent the most recent pan-eudicot hexaploidization. Widespread neopolyploidy in mulberry with up to 308 (44 × ) chromosomes22 and strawberry with up to 70, suggest an intriguing scenario that these lineages may be receptive to the benefits of a new wave of polyploidization. Mulberry is a woody perennial tree and constant pruning is a common practice not only to collect leaves for silkworms but also to boost leaf production. Pruning increases risk of pest infestation and pathogen infection; therefore, a robust defence system helps to fend off these biotic stresses. Proteins encoded by plant R genes allow the recognition of pathogen effectors, such as their cognate avirulence gene products25. Most of the extensively studied plant R genes are NBS-containing R genes26. In the mulberry genome, we identified a total of 142 NBS-containing R genes. Mulberry is a lactiferous plant and protein components, such as the chitinase-like protein, in mulberry latexes are believed to be involved in the defence system against microbes or herbivores27 28 29. Cysteine proteases in the laticifers of papaya and aspartic proteases secreted into the pitcher of Nepenthes alata 30 31 have also been shown to be toxic to herbivorous insects. Sequencing of mulberry genome revealed 127 CP genes and 129 aspartic protease genes. The functional studies of these genes will expand our knowledge on mulberry defence mechanisms. It remains unclear how the oligophagous silkworm bypasses plant defence mechanisms that interfere with insect digestive systems. In particular, plant PIs reduce the activity of the digestive enzymes in the guts of herbivorous insects, resulting in serious developmental malformations, lethality and reduced procreation32 33. Previous studies reported that plants produce more PIs with multidomains and multimeric structures, which have antinutritional effects on Spodoptera frugiperda 34. The insect circumvents plant PIs via inducible PI-insensitive proteases and the degradation of plant PIs by specific proteases35 36. The diamondback moth, Plutella xylostella, a notorious Lepidopteran pest of cruciferous crops, inactivates mustard trypsin inhibitor 2 to break through host plant defence37. Parallel transcriptome analysis of the silkworm–mulberry oligophagy, benefiting from the respective genome sequences may accelerate our understanding of the fundamentals in plant–herbivore adaptation. A total of five mulberry miRNAs were found in the silkworm sequencing data. None of them seem to come from the silkworm genome. One of them, MIR156, is abundantly expressed in the old leaves at the vegetative growth stage of rice and has a major role in the juvenile-to-adult transition in plants38 39 40. Noting that rice MIR168a can be transferred to human and regulate the low-density lipoprotein receptor adaptor protein 1 (ref. 41), it remains unclear whether mulberry MIR156 in silkgland signals leaf aging and stimulates cocoon spinning, or whether tissue-specific presence of other mulberry MIRs has a role in coordinating development of silkworm. In summary, genomic information is an important resource for modern genetic research of mulberry. The genomic features of mulberry, such as gene families, segmental duplication, and syntenic blocks not only enrich the data available for plant comparative genomics but also accelerate future identification of target genes from closely related species of the family Moraceae. Genetic markers can be developed based on these genome sequences for studies involving genetic map construction, positional cloning, strain identification and marker-assisted selection. These molecular tools and genomic techniques will accelerate agricultural improvement. As a model system for studies of plant–herbivore relationships, the availability of the mulberry and silkworm genome sequences offers a unique opportunity to gain insights into such biological partnerships prevalent in most terrestrial habitats. Methods Karyotype analysis of M. notabilis C.K. Schn Young leaves were treated with 2 mM 8-hydroxy-quinoline for 3 h at room temperature, and then fixed in 3:1 methanol/glacial acetic acid for 2 h at 4 °C. Fixed leaves were incubated with 1/15 M KCl solution for 30 min and digested by 2.5% (W/V) cellulose (YaKult Co., Japan) and 2.5% (W/V) pectolyase (YaKult Co.) for 1.5 h at 37 °C. Digested leaves were treated with ddH2O for 10 min and post-fixed in 3:1 methanol/glacial acetic acid for 30 min at room temperature. Post-fixed leaves were smashed and two drops of cell suspension were added on a glass slide for Giemsa staining at room temperature for 6 h. Slides were analysed under a microscope (Olympus Corporation, Japan). DNA and RNA preparation A wild mulberry species, M. notabilis, with a chromosome number of 14 was used for genome sequencing. Genomic DNA used as a template for the library construction was extracted from the winter buds by a CTAB method. Total RNA was isolated from five tissues (root; 1-year-old branch bark; winter bud; male flower; leaf) according to the methods of Wan and Wilkins42, and was treated with RNase-free DNase I for 30 min at 37 °C (New England BioLabs) to remove residual DNA. Beads with oligo(dT) were used to isolate poly(A) mRNA. First-strand complementary DNA was synthesized using random hexamer primers and reverse transcriptase (Invitrogen). The second-strand cDNA was synthesized using DNA polymerase I (New England BioLabs) in the presence of RNase H (Invitrogen). Genome sequencing A whole-genome shotgun approach was used to sequence the mulberry genome. Sequencing libraries were prepared following the manufacturer’s instructions (Illumina, San Diego, CA). For short-insert DNA libraries, 5 μg of genomic DNA was fragmented by nebulization with compressed nitrogen gas. The DNA ends were blunted with an ‘A’ base to the ends of the DNA fragments. Next, the DNA adaptors (Illumina) with a single ‘T’ base overhang at the 3′-end were ligated to the DNA fragment. We then purified the ligation products on a 2% agarose gel, and excised and purified gel slices for each insert size (Qiagen Gel Extraction Kit). For long (≥2 Kb), mate-paired libraries, 10–30 μg genomic DNA was fragmented by nebulization with compressed nitrogen gas. We then used biotin-labelled dNTPs for polishing and gel selection for the main bands of 2, 5 and 10 Kb. The DNA fragments were then circularized for self-ligation. The two ends of the DNA fragment were merged together and the linear DNA fragments were digested by DNA exonuclease. The circularized DNA was fragmented again, followed by enrichment of the ‘merged ends’ with magnetic beads using biotin and streptavidin interaction, then the ends were blunted, and ‘A’ base and adaptors were added. We followed the manufacturer’s instructions (Illumina) for paired-end (PE) sequencing runs by the following workflow: cluster generation, template hybridization, isothermal amplification, linearization, blocking, denaturing and hybridization of sequencing primers. The base-calling pipeline (SolexaPipeline-0.3) was used to obtain sequences from the raw fluorescent images. Genome assembly Before de novo assembly, we filtered the low-quality data by the following five steps: (1) trim the low-quality bases on both 5′- and 3′-end of each read according to quality reports from Hiseq2000 pipeline; (2) discard those reads with Ns>10% of the read length; (3) remove those reads when the total low-quality bases (Q 50% of the read length; (4) discard the reads contaminated by adapters; and (5) remove duplicated reads caused by PCR during library construction. SOAPdenovo is a genome assembler developed in BGI-Shenzhen and this software preformed de Bruijn graph algorithm assemblies in a stepwise strategy43. We first assembled short reads from fragmented small insert-size ( 85% and heat shock protein length longer than 50 bp were recorded. The M. notabilis genome was aligned to the Rfam database (v 9.1) with BLASTN (e-value, 1). The raw output was further analysed by the INFERNAL software, which was used to predict miRNA and small nuclear RNA by searching DNA sequence databases owing to the RNA structure and sequence similarities. In-silico gene staining We used BLASTP (e-value, 1e−5) to identify reciprocal best-hit orthologous gene pairs between mulberry and strawberry. This reciprocal best-hit matrix and the orthologous gene pairs were used to further define the syntenic blocks between two species in the MCscan pipeline. The scaffolds of mulberry with syntenic blocks were aligned together according to the syntenic order in the strawberry linkage groups using Genome Zipper15. The distributions of gene density and orthologous gene density were calculated using a 500-kb sliding window approach. Identification of mulberry miRNAs in silkworm tissues The small RNA was extracted from 12 ml of silkworm haemolymph (collected from the fifth instar day-5 larvae) using mirVana PARIS kit (Ambion, USA). The sequencing of small RNA in haemolymph was conducted following the procedure describe by Liu51. The sequences of small RNA in the anterior-middle and posterior silk glands were downloaded from http://www.ncbi.nlm.nih.gov/gds?term=GSE17965. The small RNA sequenced data of three silkworm tissues were used as queries to search against mulberry-predicted miRNAs by BLASTN without mismatch. The sequences aligned to silkworm genome, rRNAs and tRNAs were filtered out. Phylogenetic tree and determine the speciation time Single-copy genes from 13 plant species were used to reconstruct a phylogenetic tree based on the maximum likelihood method. Orthologous gene pairs had been determined by top-ranked BLAST hits in each other with an e-value 1e−10. The Ks value52 between the orthologous pairs were calculated by the yn00 programme in PAML package53. The speciation time base on Ks value was dated by the equation T=Ks/2λ with λ=6.1 × 10−9 (ref. 54). Orthologous gene pairs likely to be under positive (diversifying) selection between mulberry and each of the other four plants were determined by regression analysis between Ka and Ks values based on a 95% prediction interval range55. Gene pairs with ω-values greater than the prediction interval upper limit were considered to show evidence of positive selection. Gene Ontology groups in which the high omega pairs were significantly included were determined by BLAST2GO56 with a cut-off P-value<0.05 using Fisher’s exact test. Inference of gene collinearity We inferred gene collinearity with MCSCAN57, a multiple-chromosome alignment tool, complemented by analyses using COLINEARSCAN58, a pairwise-chromosome alignment tool. The inferred collinear genes were used to perform phylogenetic and evolutionary analyses. Dating evolutionary event We used collinear genes between plants, and within-each-plant homologues with high confidence, to infer evolutionary events. For example, collinear genes between mulberry scaffolds are likely to have resulted from ancient polyploidization event(s) if present; and collinear genes between mulberry and grape are likely to have resulted from a divergence of the two species. The synonymous nucleotide substitution rates (Ks) were calculated by using Nei–Gojobori approach52 implemented in PAML53. The distributions of Ks values were drawn to infer the relative time of evolutionary events. Homologous dotplotting We used predicted gene sets that are described above and a gene data set predicted by Genewise59 in the analysis. Genome sequences and annotations of grape, apple, strawberry and cannabis were downloaded from online databases, and the most up-to-date versions till October 2012 were used in the analyses. In comparison with genomes with available pseudochromosomes, we used protein–protein searches using BLASTP to reveal putative homologous genes, and the output was used to make dotplot; genes were placed along with their chromosomal order as coordinates. When a comparison was done involving genomes (for example, cannabis and mulberry) without available pseudochromosomes, that is, those with unanchored scaffolds, gene coding DNA sequences from a genome sequences with pseudochromosomes (for example, grape) were searched against the cannabis and mulberry genomes using BLASTN, and hits on the pseudochromosomes were located. The BLASTN output was used to produce dotplots. To detect the genome duplication events, the unanchored scaffolds were linked to their best-matched grape genomic regions on the putative pseudochromosomes. The putative pseudochromosomal regions of mulberry and cannabis were identified this way. A corresponding grape region would have two matched regions clustered together in the dotplot. Data used in this study The genome data were downloaded from the following websites and are associated with the accession codes provided. Arabidopsis thaliana (TAIR9), ftp://ftp.arabidopsis.org/Genes/TAIR9_genome_release/, GCA_000001735.1. C. sativa, http://genome.ccbr.utoronto.ca/downloads.html, GCA_000230575.1. Carica papaya (version 1th), ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v5.0/Cpapaya/, GCA_000150535.1. Cucumis sativus (version 1th) http://cucumber.genomics.org.cn/page/cucumber/download.jsp, GCA_000004075.1. F. vesca (version 1.1), http://www.rosaceae.org/species/fragaria/fragaria_vesca/genome_v1.1, GCA_000184155.1. Glycine max (version 1.0), ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v5.0/Gmax/, GCA_000004515.1. M. domestica (version 1.0), http://genomics.research.iasma.it/index.html, GCA_000148765.2. M. truncatula, ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v8.0/Mtruncatula/, GCA_000219495.1. Populus trichocarpa (version 5.0), ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v5.0/Ptrichocarpa/, GCA_000002775.1. Prunus persica, ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v8.0/Ppersica/, GCA_000346465.1. Pyrus bretschneideri, http://peargenome.njau.edu.cn:8004/default.asp?d=1&m=1, GCA_000315295.1. Theobroma cacao (version 1.0), http://cocoagendb.cirad.fr/gbrowse/download.html, GCA_000403535.1. Vitis vinifera, http://www.genoscope.cns.fr/externe/Download/Projets/Projet_ML/data/12X/, GCA_000003745.2. Author contributions N.H., C.Z., X.Q., Shancen Z. and Y.T. have contributed equally to this paper. N.H., D.J., S.L., M.L., Q.X. and Z.X. coordinated the project; C.Z., Y.T., Shancen Z., C.G. and D.L. performed genome and transcriptome sequencing; Tae-Ho L., Xiyin W., Q.C., R.H., X.T., G.Y., D.L., Jinpeng W. and Tao L. performed evolution analyses; Y.X., N.H. and Xiling W. contributed to the cytological analyses; X.Q., Q.F., Tian L., A.Z., Q.Z., B.M., L.J., J.L., P.S., L.F., J.S., J.Z., C.W., Q.S., Q.W., K.Z. and H.W. analysed the genomic data; M.Y., C.T., Z.W., F.D., J.C., Y.L., Shutang Z., Tianbao L., Shougong Z., Jian W., Junyi W., H.Y., G.Y. and Jun W. made the characteristic analyses of the Morus genome; N.H. and Shancen Z. wrote the paper; N.H., A.P. and G.Y. revised the manuscript. Additional information Accession codes: The Morus genome data has been deposited in the Genbank short-read archive (Bioproject: PRJNA202089; short reads: SRA075563). The version described in this paper is ATGF01000000. The miRNA data has been deposited in the Gene Expression Omnibus (GEO) under the accession code GSE48168. How to cite this article: He, N. et al. Draft genome sequence of the mulberry tree Morus notabilis. Nat. Commun. 4:2445 doi: 10.1038/ncomms3445 (2013). Supplementary Material Supplementary Data 1 The distribution of mulberry gene according to the syntenic order in the strawberry linkage groups using Genome Zipper. Supplementary Data 2 Gene ontology information of orthologous gene pairs determined that were under diversifying selection, between M. notabilis and each of C. sativa, M. domestica, F. vesca and M. truncatula. Supplementary Data 3 Fisher exact test results of the 228 orthologous pairs, which were under diversifying selection, between M. notabilis and F. vesca. Supplementary Data 4 Fisher exact test results of the 258 orthologous pairs, which were under diversifying selection, between M. notabilis and M. domestica. Supplementary Data 5 NBS-containing resistance genes in the M. notabilis genome. Supplementary Data 6 Predicted cysteine proteases in M. notabilis genome. Supplementary Data 7 Predicted aspartic proteases in M. notabilis genome. Supplementary Figures, Tables and Methods Supplementary Figures S1-S6, Supplementary Tables S1-S16 and Supplementary Methods