Manipulation of FASTQ data with Galaxy


Abstract

Summary: We describe a tool suite that works with all commonly used FASTQ format variants and provides a pipeline for manipulating next-generation sequencing data, from the raw output of a sequencing machine through the quality filtering steps.
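The abstract names two jobs, handling every common FASTQ variant and quality filtering, but shows no code. The Python below is a minimal sketch of what those jobs involve, not the Galaxy tools' code. It parses four-line records, re-encodes quality strings to Sanger (Phred+33), and applies a simple filter. The offsets (Sanger Phred+33, Illumina 1.3+ Phred+64, Solexa Solexa+64) and the Solexa-to-Phred conversion, Q_phred = 10·log10(10^(Q_solexa/10) + 1), are the conventional ones. The function names, the "fraction of bases at or above a threshold" filter rule, and the assumption of unwrapped records are hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Iterator, NamedTuple

# Conventional ASCII offsets for common FASTQ variants (assumption for this sketch):
# Sanger = Phred+33, Illumina 1.3+ = Phred+64, Solexa = Solexa log-odds+64.
OFFSETS = {"sanger": 33, "illumina": 64, "solexa": 64}


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse unwrapped four-line FASTQ records (line-wrapped records are not handled)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record starting {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa log-odds score to the nearest Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))


def to_sanger(record: FastqRecord, variant: str) -> FastqRecord:
    """Re-encode a record's quality string as Sanger (Phred+33)."""
    offset = OFFSETS[variant]
    scores = [ord(c) - offset for c in record.quality]
    if variant == "solexa":
        scores = [solexa_to_phred(q) for q in scores]
    if any(q < 0 for q in scores):
        raise ValueError(f"negative Phred score in {record.identifier!r}; wrong variant?")
    return record._replace(quality="".join(chr(q + 33) for q in scores))


def passes_filter(record: FastqRecord, min_score: int, min_fraction: float) -> bool:
    """Hypothetical rule: keep a Sanger-encoded read if at least `min_fraction`
    of its bases have Phred quality >= `min_score`."""
    scores = [ord(c) - 33 for c in record.quality]
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_score)
    return good / len(scores) >= min_fraction


if __name__ == "__main__":
    raw = ["@read1\n", "ACGT\n", "+\n", "hhhB\n"]  # Illumina 1.3+ encoded qualities
    for rec in read_fastq(raw):
        sanger = to_sanger(rec, "illumina")
        print(sanger.quality, passes_filter(sanger, min_score=20, min_fraction=0.75))
        # prints: III# True
```

In this sketch, every record is converted to a single encoding before filtering, so the filter only ever has to interpret Phred+33.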

Availability and Implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies highlighting the tools described in this manuscript, together with results from testing components of the suite against a set of previously published files, are available at http://usegalaxy.org/u/dan/p/fastq

Contact: james.taylor@emory.edu; anton@bx.psu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


Author and article information

Journal

Journal ID (nlm-ta): Bioinformatics

Journal ID (publisher-id): bioinformatics

Journal ID (hwp): bioinfo

Title: Bioinformatics

Publisher: Oxford University Press

ISSN (Print): 1367-4803

ISSN (Electronic): 1367-4811

Publication date (Print): 15 July 2010

Publication date (Electronic): 18 June 2010

Publication date PMC-release: 18 June 2010

Volume: 26

Issue: 14

Pages: 1783-1785

Affiliations

¹ Huck Institute for the Life Sciences, Penn State University, University Park, PA 16803, ² Cold Spring Harbor Laboratory, Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor, NY 11724 and ³ Departments of Biology and Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA

Author notes

* To whom correspondence should be addressed.

† The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

‡ http://galaxyproject.org

Associate Editor: John Quackenbush

Article

Publisher ID: btq281

DOI: 10.1093/bioinformatics/btq281

PMC ID: 2894519

PubMed ID: 20562416

SO-VID: 49e218cf-67c8-4b68-bc12-3d2dbedf3914

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 1 April 2010

Date revision received: 20 May 2010

Date accepted: 24 May 2010
