      Annotation of the Drosophila melanogaster euchromatic genome: a systematic review

      research-article

          Abstract

          Background

          The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.

          Results

          Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.

          Conclusions

          Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.

                Author and article information

                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                1465-6906
                1465-6914
                2002
                31 December 2002
                Volume: 3
                Issue: 12
                Pages: research0083.1-83.22
                Affiliations
                [1] Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA
                [2] FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA
                [3] FlyBase-Harvard, Department of Molecular and Cell Biology, Harvard University, Biological Laboratories, 16 Divinity Avenue, Cambridge, MA 02138-2020, USA
                [4] Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
                [5] FlyBase-Cambridge, Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
                [6] EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                [7] Department of Genome Sciences, Lawrence Berkeley National Laboratory, One Cyclotron Road Mailstop 64-121, Berkeley, CA 94720, USA
                Correspondence: Sima Misra. E-mail: sima@fruitfly.org
                Article
                Publisher ID: gb-2002-3-12-research0083
                DOI: 10.1186/gb-2002-3-12-research0083
                PMC ID: 151185
                PMID: 12537572
                Copyright © 2002 Misra et al., licensee BioMed Central Ltd
                History
                : 16 October 2002
                : 28 November 2002
                : 28 November 2002
                Categories
                Research

                Genetics