The intronic branch point sequence is under strong evolutionary constraint in the bovine and human genome


Abstract

The branch point sequence is a cis-acting intronic motif required for mRNA splicing. Despite their functional importance, branch point sequences are not routinely annotated. Here we predict branch point sequences in 179,476 bovine introns and investigate their variability using a catalogue of 29.4 million variants detected in 266 cattle genomes. We localize the bovine branch point within a degenerate heptamer “nnyTrAy”. An adenine residue at position 6, which acts as the branch point, and a thymine residue at position 4 of the heptamer are more strongly depleted for mutations than coding sequences, suggesting extreme purifying selection. We provide evidence that mutations affecting these evolutionarily constrained residues lead to alternative splicing. We confirm evolutionary constraints on branch point sequences using a catalogue of 115 million SNPs established from 3,942 human genomes of the gnomAD database.
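The heptamer notation uses IUPAC nucleotide codes (n = any base, y = pyrimidine C/T, r = purine A/G; the upper-case T and A are fixed). As a minimal sketch, assuming one intron given as a plain string and variant coordinates given as 0-based offsets into that string, the snippet below scans the 3′ end of an intron for “nnyTrAy” matches, takes heptamer position 6 as the candidate branch-point adenine, and tallies how often each heptamer position carries a variant. The 50-nt search window and all function names are hypothetical choices for illustration. The comparison against a coding-sequence baseline via a simple ratio is likewise an assumption. This is not the prediction or depletion method used in the study.

```python
import re
from collections import Counter

# IUPAC codes appearing in "nnyTrAy": n = any base, y = C/T (pyrimidine),
# r = A/G (purine); fixed bases map to themselves.
IUPAC = {"n": "[ACGT]", "y": "[CT]", "r": "[AG]",
         "a": "A", "c": "C", "g": "G", "t": "T"}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[ch.lower()] for ch in motif)
    return re.compile("(?=(" + body + "))")


def find_heptamers(intron_seq, motif="nnyTrAy", window=50):
    """0-based start positions of motif hits within the last `window` bases.

    The window size is a hypothetical choice: branch points typically lie
    a few dozen nucleotides upstream of the 3' splice site.
    """
    seq = intron_seq.upper()
    offset = max(0, len(seq) - window)
    pattern = motif_to_regex(motif)
    return [offset + m.start() for m in pattern.finditer(seq[offset:])]


def branch_point_positions(heptamer_starts):
    """Heptamer position 6 (1-based) is the candidate branch-point adenine."""
    return [start + 5 for start in heptamer_starts]


def variant_fraction_by_position(heptamer_starts, variant_positions, length=7):
    """Fraction of heptamers carrying a variant at each 1-based motif position."""
    variants = set(variant_positions)
    hits = Counter()
    for start in heptamer_starts:
        for i in range(length):
            if start + i in variants:
                hits[i + 1] += 1
    n = len(heptamer_starts)
    if n == 0:
        return {}
    return {pos: hits[pos] / n for pos in range(1, length + 1)}


def depletion_ratio(observed_fraction, baseline_fraction):
    """Values below 1 mean fewer variants than the baseline (e.g. coding sites)."""
    return observed_fraction / baseline_fraction


# Toy intron: 5' GT..., candidate heptamer GGCTAAC, pyrimidine-rich tract, ...AG 3'
intron = "GTAAGT" + "C" * 10 + "GGCTAAC" + "T" * 8 + "CAG"
starts = find_heptamers(intron)
print(starts)                          # [16]
print(branch_point_positions(starts))  # [21]
print(variant_fraction_by_position(starts, [16, 18, 31]))
# {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0}
```

Applied genome-wide, each position's fraction would be pooled across all introns and divided by the fraction of variant sites in coding sequence. Ratios well below 1 at positions 4 and 6 would correspond to the depletion described above.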

Abstract

Kadri and colleagues present the evolutionary constraints on branch point motifs in the bovine and human genome. The functional role of these predicted branch point sequences in the bovine genome is inferred using splicing quantitative trait loci analyses.


Author and article information

Contributors

Naveen Kumar Kadri:

ORCID: http://orcid.org/0000-0002-2799-3896

naveen.kadri@usys.ethz.ch

Journal

Journal ID (nlm-ta): Commun Biol

Journal ID (iso-abbrev): Commun Biol

Title: Communications Biology

Publisher: Nature Publishing Group UK (London)

ISSN (Electronic): 2399-3642

Publication date (Electronic): 21 October 2021

Publication date PMC-release: 21 October 2021

Publication date Collection: 2021

Volume: 4

Electronic Location Identifier: 1206

Affiliations

Animal Genomics, ETH Zürich, Universitätstrasse 2, 8092 Zürich, Switzerland (GRID grid.5801.c, ISNI 0000 0001 2156 2780)

Author information

Naveen Kumar Kadri http://orcid.org/0000-0002-2799-3896

Hubert Pausch http://orcid.org/0000-0002-0501-6760

Article

Publisher ID: 2725

DOI: 10.1038/s42003-021-02725-7

PMC ID: 8531310

PubMed ID: 33398033

SO-VID: db9da8b4-e0c2-40c1-87d2-0de1f8c26c1f

License:

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

History

Date received: 11 April 2021

Date accepted: 29 September 2021

Funding

Funded by: FundRef https://doi.org/10.13039/501100001711, Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Swiss National Science Foundation)

Award ID: 310030185229

Award Recipient: Hubert Pausch

Funded by: FundRef https://doi.org/10.13039/100010661, EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

Award ID: 815668

Award Recipient: Hubert Pausch

Funded by: Eidgenössische Technische Hochschule Zürich Research Grant; Grant from the Swiss Federal Office for Agriculture

Custom metadata

Keywords: genetics, genomics

