
      Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms

      research-article


          Abstract

          Background

          Tools for high-throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e., the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families.

          Results

          We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes onto our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository (http://bitbucket.org/osiris_phylogenetics/pia/) and we demonstrate PIA on a publicly-accessible web server (http://galaxy-dev.cnsi.ucsb.edu/pia/).
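
The placement step described above can be illustrated with a minimal sketch. It is not the authors' Galaxy workflow: it only assembles a command line for the Evolutionary Placement Algorithm in standalone RAxML, where `-f v` selects placement of query sequences onto a fixed reference tree. The file names, the run name, the executable name `raxmlHPC`, and the substitution model `PROTGAMMAWAG` are all assumptions made for illustration and do not come from the article.

```python
from pathlib import Path
from typing import List


def build_epa_command(
    alignment: Path,
    reference_tree: Path,
    run_name: str,
    model: str = "PROTGAMMAWAG",  # assumed substitution model, not from the article
    binary: str = "raxmlHPC",     # assumed name of the RAxML executable
) -> List[str]:
    """Return the argument list for one EPA placement run.

    `alignment` is expected to hold the reference sequences and the query
    sequences aligned together; `reference_tree` contains only the references.
    """
    return [
        binary,
        "-f", "v",                 # RAxML algorithm switch for EPA placement
        "-s", str(alignment),      # combined reference + query alignment
        "-t", str(reference_tree), # pre-calculated gene tree
        "-m", model,
        "-n", run_name,            # suffix RAxML uses for its output files
    ]


# Hypothetical file names for a single gene family.
cmd = build_epa_command(
    Path("opsin_with_queries.phy"),
    Path("opsin_reference.tre"),
    "opsin_epa",
)
print(" ".join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the placement, provided RAxML is installed and the input files exist. Because the reference tree is fixed, only this step has to be repeated for each new transcriptome.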

          Conclusions

          Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high-throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-014-0350-x) contains supplementary material, which is available to authorized users.

          Most cited references (77)


          Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

          Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.

            The genome of the model beetle and pest Tribolium castaneum.

            Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.

              Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood

              We present an evolutionary placement algorithm (EPA) and a Web server for the rapid assignment of sequence fragments (short reads) to edges of a given phylogenetic tree under the maximum-likelihood model. The accuracy of the algorithm is evaluated on several real-world data sets and compared with placement by pair-wise sequence comparison, using edit distances and BLAST. We introduce a slow and accurate as well as a fast and less accurate placement algorithm. For the slow algorithm, we develop additional heuristic techniques that yield almost the same run times as the fast version with only a small loss of accuracy. When those additional heuristics are employed, the run time of the more accurate algorithm is comparable with that of a simple BLAST search for data sets with a high number of short query sequences. Moreover, the accuracy of the EPA is significantly higher, in particular when the sample of taxa in the reference topology is sparse or inadequate. Our algorithm, which has been integrated into RAxML, therefore provides an equally fast but more accurate alternative to BLAST for tree-based inference of the evolutionary origin and composition of short sequence reads. We are also actively developing a Web server that offers a freely available service for computing read placements on trees using the EPA.

                Author and article information

                Contributors
                speiser@mailbox.sc.edu
                abrinas@gmail.com
                sashazaharoff@gmail.com
                Battelle@whitney.ufl.edu
                hbracken@fiu.edu
                jessebreinholt@gmail.com
                seth.bybee@gmail.com
                cronin@umbc.edu
                algarm@bio.ku.dk
                arl3@pdx.edu
                nipam@uclink.berkeley.edu
                Megan.Porter@usd.edu
                meredith.protas@dominican.edu
                arivera@pacific.edu
                serb@iastate.edu
                kzigler@sewanee.edu
                kcrandall@gwu.edu
                todd.oakley@lifesci.ucsb.edu
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN 1471-2105
                19 November 2014
                Volume 15, Issue 1, Article 350
                Affiliations
                Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA USA
                Department of Biological Sciences, University of South Carolina, Columbia, SC USA
                The Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, FL USA
                Department of Biological Sciences, Florida International University-Biscayne Bay Campus, North Miami, FL USA
                Florida Museum of Natural History, University of Florida, Gainesville, FL USA
                Department of Biology, Brigham Young University, Provo, UT USA
                Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD USA
                Department of Biology, Marine Biological Section, University of Copenhagen, Copenhagen, Denmark
                Department of Biology, Portland State University, Portland, OR USA
                Department of Molecular and Cell Biology & Department of Integrative Biology, University of California, Berkeley, CA USA
                Department of Biology, University of South Dakota, Vermillion, SD USA
                Department of Natural Sciences and Mathematics, Dominican University of California, San Rafael, CA USA
                Department of Biology, University of the Pacific, Stockton, CA USA
                Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA USA
                Department of Biology, Sewanee: The University of the South, Sewanee, TN USA
                Computational Biology Institute, George Washington University, Ashburn, VA USA
                Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC USA
                Article number: 350
                DOI: 10.1186/s12859-014-0350-x
                PMCID: PMC4255452
                PMID: 25407802
                dfc3b478-a8a7-46e4-bb97-b04f3252f97f
                © Speiser et al.; licensee BioMed Central Ltd. 2014

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 7 May 2014
                Accepted: 9 October 2014
                Categories
                Software

                Bioinformatics & Computational biology
                Keywords: bioinformatics, eyes, evolution, galaxy, next-generation sequence analysis, orthology, phototransduction, transcriptomes, vision
