Chromosome-level genome assembly and characterization of Sophora japonica


Abstract

Sophora japonica is a medium-sized deciduous tree of the Leguminosae family, renowned for its high ecological, economic and medicinal value. Here, we present a draft genome of S. japonica, ∼511.49 Mb long (contig N50 of 17.34 Mb), assembled from Illumina, Nanopore and Hi-C data. Using the Hi-C data, we reliably anchored 110 contigs onto 14 chromosomes, representing 91.62% of the total genome, with an improved scaffold N50 of 31.32 Mb. We further identified 271.76 Mb (53.13%) of repetitive sequences and 31,000 protein-coding genes, of which 30,721 (99.1%) were functionally annotated. Phylogenetic analysis indicates that S. japonica diverged from Arabidopsis thaliana and Glycine max ∼107.53 and ∼61.24 million years ago, respectively. We detected evidence of both species-specific and legume-shared whole-genome duplication events in S. japonica. We also found that multiple gene families (e.g. BBX and PAL) have expanded in S. japonica, which may contribute to its enhanced tolerance to abiotic stress. In addition, S. japonica harbours more genes involved in the lignin and cellulose biosynthesis pathways than A. thaliana and G. max. Finally, population genomic analyses revealed no obvious differentiation among geographical groups, and the effective population size has declined continuously since ∼2 Ma. Our genomic data provide a powerful comparative framework for studying adaptation, evolution and the biosynthesis of active ingredients in S. japonica. In particular, this high-quality genome will help elucidate the biosynthesis of its main bioactive components and improve its production and/or processing.
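The abstract reports two N50 values: 17.34 Mb for the contigs and 31.32 Mb after Hi-C scaffolding. N50 is the largest length L such that sequences of length ≥ L together cover at least half of the total assembly length. The minimal Python sketch below illustrates that standard definition only; it is not the authors' pipeline, and the function name `n50` and the toy lengths are hypothetical, not S. japonica data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid float rounding at exactly 50%.
        if 2 * running >= total:
            return length
    # Only reachable with negative lengths, which are not valid input.
    raise ValueError("sequence lengths must be non-negative")


if __name__ == "__main__":
    # Hypothetical toy contig lengths (arbitrary units), not S. japonica data.
    contigs = [10, 8, 5, 3, 2]
    # Hypothetical scaffolds: contigs 10+8 and 5+3 joined; gap runs are ignored.
    scaffolds = [18, 8, 2]
    print("contig N50:", n50(contigs))      # contig N50: 8
    print("scaffold N50:", n50(scaffolds))  # scaffold N50: 18
```

Joining the toy contigs into longer scaffolds raises N50 from 8 to 18, which mirrors in miniature why Hi-C scaffolding increases the reported N50. Real scaffolds also contain runs of N between joined contigs, which this toy example ignores.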


Author and article information

Journal

Journal ID (nlm-ta): DNA Res

Journal ID (iso-abbrev): DNA Res

Journal ID (publisher-id): dnares

Title: DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes

Publisher: Oxford University Press

ISSN (Print): 1340-2838

ISSN (Electronic): 1756-1663

Publication date Collection: June 2022

Publication date (Electronic): 25 April 2022

Publication date PMC-release: 25 April 2022

Volume: 29

Issue: 3

Electronic Location Identifier: dsac009

Affiliations

[1] State Key Laboratory of Grassland Agro-Ecosystems, and College of Ecology, Lanzhou University, Lanzhou 730000, China

[2] Key Laboratory of Bio-resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610000, China

[3] Institute of Loess Plateau, Shanxi University, Taiyuan 030006, China

Author notes

To whom correspondence should be addressed. Tel. 13880788291. Email: rudf@lzu.edu.cn (D.R.); Tel. 13880788291. Email: lbb2015@sxu.edu.cn (B.L.)

Weixiao Lei, Zefu Wang and Man Cao contributed equally to this work.

Article

Publisher ID: dsac009

DOI: 10.1093/dnares/dsac009

PMC ID: 9154292

PubMed ID: 35466378

SO-VID: e1fe0472-24ee-4b5a-a9f9-de0460e4f989

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License ( https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

History

Date received: 14 December 2021

Date: 03 April 2022

Date accepted: 07 April 2022

Date: 27 May 2022

Page count

Pages: 10

Funding

Funded by: National Natural Science Foundation of China, DOI 10.13039/501100001809

Award ID: 32001085

Funded by: Fundamental Research Funds for Central Universities

Award ID: lzujbky-2020-34

Award ID: lzujbky-2020-ct02
