[Objective] To analyze the chloroplast genome characteristics and codon preference of Yucca treculeana, providing a reference for studies of chloroplast-related gene expression, modification and species evolution.
[Method] The chloroplast genome of Y. treculeana was sequenced, assembled and annotated, and its codon preference and the factors influencing it were analyzed. Optimal codons were screened by establishing high- and low-expression gene libraries. A phylogenetic tree was constructed from 20 published chloroplast genomes of Agavaceae plants.
[Result] The total length of the Y. treculeana chloroplast genome was 157579 bp; the large single copy region (LSC), the small single copy region (SSC) and each of the two inverted repeat regions (IRa and IRb) were 85940, 18279 and 26680 bp long, respectively. The total GC content was 37.8%. Among the 135 genes (85 protein-coding genes, 38 tRNA genes, 8 rRNA genes and 4 genes of unknown function), 51 coding DNA sequences (CDS) longer than 300 bp were screened out, all with an effective number of codons (ENC) greater than 41.0. The GC1, GC2, GC3 and GC3s contents were 46.75%, 39.61%, 29.19% and 26.06%, respectively, indicating that the third codon position mostly ended in A/T. GCall was significantly correlated with GC1, GC2 and GC3 (P<0.01, the same below), but GC3 was not significantly correlated with GC1 or GC2 (P>0.05, the same below), suggesting that base composition was similar at the first and second codon positions but differed from that at the third position. Selection and mutation were the main causes of codon preference in the chloroplast genome. Fourteen optimal codons, most ending in A/U, were screened out. Y. treculeana was sister to Y. queretaroensis and Y. schidigera, with a bootstrap value of 100%.
[Conclusion] The chloroplast genome of Y. treculeana has a conserved quadripartite structure, and its codon bias is weak and shaped by multiple factors, mainly selection and mutation. Constructing phylogenetic trees from chloroplast genomes is an accurate and reliable method for taxonomic identification and for determining phylogenetic relationships among plant species.
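Abstract: [Objective] To analyze the chloroplast genome characteristics and codon preference of Yucca treculeana, providing a reference for research on the expression and modification of Y. treculeana chloroplast-related genes and on species evolution. [Method] The chloroplast genome of Y. treculeana was sequenced, assembled and annotated; codon preference and its influencing factors were analyzed; and optimal codons were screened by establishing high- and low-expression gene libraries. A phylogenetic tree was constructed from 20 published chloroplast genomes of Agavaceae plants. [Result] The Y. treculeana chloroplast genome was 157579 bp long; the large single copy region (LSC), the small single copy region (SSC) and each of the two inverted repeat regions (IRa and IRb) were 85940, 18279 and 26680 bp long, respectively. The GC content was 37.8%. The genome contained 135 genes (85 protein-coding genes, 38 tRNA genes, 8 rRNA genes and 4 genes of unknown function), from which 51 coding sequences (CDS) longer than 300 bp were screened out, all with an effective number of codons (ENC) greater than 41.0. The GC1, GC2, GC3 and GC3s contents were 46.75%, 39.61%, 29.19% and 26.06%, respectively, indicating that the third codon position mostly ended in A/T. GCall was highly significantly correlated with GC1, GC2 and GC3 (P<0.01, the same below), but GC3 was not significantly correlated with GC1 or GC2 (P>0.05, the same below), indicating that base composition was similar at the first and second codon positions but differed from that at the third position. Selection and mutation were the main factors driving codon preference in the chloroplast genome. Fourteen optimal codons, most ending in A/U, were screened out. Y. treculeana was sister to Y. queretaroensis and Y. schidigera, with a bootstrap value of 100%. [Conclusion] The chloroplast genome of Y. treculeana has a conserved quadripartite structure, and its codon preference is weak and shaped by multiple factors, mainly selection and mutation. Constructing phylogenetic trees from plant chloroplast genomes is an accurate and reliable method for taxonomic identification and for determining phylogenetic relationships among species.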
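The positional GC values reported in the Results (GC1, GC2, GC3 and GCall) can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe; the function name `positional_gc` and the toy sequence are hypothetical, and GC3s (which considers only synonymous third positions) is not computed here.

```python
def positional_gc(cds: str) -> dict:
    """Return GC1, GC2, GC3 and overall GC content (percent) for one in-frame CDS.

    Assumes the sequence starts at codon position 1 and contains only A/C/G/T.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")

    result = {}
    for pos in range(3):
        # Every third base starting at offset pos belongs to codon position pos + 1.
        bases = seq[pos::3]
        gc_count = sum(base in "GC" for base in bases)
        result[f"GC{pos + 1}"] = 100.0 * gc_count / len(bases)

    # Overall GC content across all three codon positions.
    result["GCall"] = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return result


if __name__ == "__main__":
    # Hypothetical toy CDS (codons ATG GCT GGA TAA), not taken from the Y. treculeana genome.
    toy_cds = "ATGGCTGGATAA"
    for name, value in positional_gc(toy_cds).items():
        print(f"{name}: {value:.2f}%")
```

Applied to each of the 51 screened CDS and averaged, values computed this way would correspond to the GC1, GC2 and GC3 figures in the Results; a GC3 well below GC1 and GC2 is what indicates a preference for A/T at the third codon position.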