Bioinformatics Analysis of MSH1 Genes of Green Plants: Multiple Parallel Length Expansions, Intron Gains and Losses, Partial Gene Duplications, and Alternative Splicing

Abstract

MutS homolog 1 (MSH1) is involved in the recombination and repair of organelle genomes and is essential for maintaining their stability. Previous studies indicated that the length of the gene varies greatly among species and detected species-specific partial gene duplications in Physcomitrella patens. However, the expansion in gene size is poorly understood, and the extent of partial gene duplication of MSH1 remains unclear. Here, we screened MSH1 genes in 85 selected species with genome sequences representing the main clades of green plants (Viridiplantae). We identified the MSH1 gene in all lineages of green plants and used all but nine species, whose sequences were incomplete, for bioinformatics analysis. MSH1 is a singleton gene in most of the selected species, with conserved amino acids and protein domains. Gene length varies greatly among the species, ranging from 3234 bp in Ostreococcus tauri to 805,861 bp in Cycas panzhihuaensis. Expansion of MSH1 occurred repeatedly in multiple clades, especially in gymnosperms, Orchidaceae, and Chloranthus spicatus. Because of this expansion, MSH1 has exceptionally long introns in certain species, the longest reaching 101,025 bp. Gene length is positively correlated with the proportion of transposable elements (TEs) in the introns. In addition, gene structure analysis indicated that the MSH1 of green plants has undergone parallel intron gains and losses in all major lineages. However, the intron number of seed plants (gymnosperms and angiosperms) is relatively stable. All the selected gymnosperms contain 22 introns except for Gnetum montanum and Welwitschia mirabilis, while all the selected angiosperms retain 21 introns except for the ANA grade. Notably, the coding region of MSH1 in algae has an exceptionally high GC content (47.7% to 75.5%). Moreover, over one-third of the selected species contain species-specific partial gene duplications of MSH1, apart from the conserved moss-specific partial gene duplication. Additionally, we found conserved alternatively spliced MSH1 transcripts in five species. The study of MSH1 sheds light on the evolution of the long genes of green plants.
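
As a rough illustration of two quantities reported in the abstract (coding-region GC content and intron length), the following is a minimal sketch rather than the authors' pipeline. The function names, the toy sequence, and the exon coordinates are hypothetical, and coordinates are assumed to be 1-based and inclusive, as in GFF-style annotation.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Only A, C, G, and T are counted, so ambiguous bases such as N
    do not affect the result.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


def intron_lengths(exons):
    """Return intron lengths for one transcript.

    `exons` is a list of (start, end) pairs, 1-based and inclusive
    (an assumption of this sketch). Introns are the gaps between
    consecutive exons after sorting by start position.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


# Toy inputs, not data from the study.
toy_cds = "GGCCAT"
toy_exons = [(1, 100), (201, 300), (1001, 1100)]

print(round(gc_content(toy_cds), 1))
print(intron_lengths(toy_exons))
```

Applied to real annotations, the same two functions would give per-species values that could then be compared across clades; the correlation with TE proportion would additionally require TE annotation, which this sketch does not cover.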

Author and article information

Journal

Journal ID (publisher-id): IJMCFK

Title: International Journal of Molecular Sciences

Abbreviated Title: IJMS

Publisher: MDPI AG

ISSN (Electronic): 1422-0067

Publication date Created: September 2023

Publication date (Electronic): September 03 2023

Volume: 24

Issue: 17

Page: 13620

Article

DOI: 10.3390/ijms241713620

SO-VID: 8ba4b143-c020-4e5d-b9a4-556885cf6667

License:

https://creativecommons.org/licenses/by/4.0/
