      GENCODE reference annotation for the human and mouse genomes

      research-article
      Nucleic Acids Research
      Oxford University Press

          Abstract

          The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, and the Ensembl Perl and REST APIs, as well as https://www.gencodegenes.org.
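As an illustration of the REST access route mentioned in the abstract, the following is a minimal sketch, not taken from the article, of looking up a gene through the public Ensembl REST service at https://rest.ensembl.org using its `lookup/id` endpoint. The helper names `lookup_url` and `fetch_gene` are hypothetical, and dropping the GENCODE-style version suffix before the lookup is an assumption made here for convenience.

```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server; GENCODE annotation is served through Ensembl.
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(stable_id, expand=False):
    """Build the URL for the Ensembl REST `lookup/id` endpoint.

    Assumption: a GENCODE-style version suffix (e.g. ".15") is dropped so that
    the unversioned stable ID is sent to the server.
    """
    base_id = stable_id.split(".", 1)[0]
    url = f"{ENSEMBL_REST}/lookup/id/{urllib.parse.quote(base_id)}"
    if expand:
        # expand=1 asks the server to include child features such as transcripts.
        url += "?expand=1"
    return url


def fetch_gene(stable_id, expand=False, timeout=30):
    """Fetch the JSON record for one stable ID (requires network access)."""
    request = urllib.request.Request(
        lookup_url(stable_id, expand),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_gene("ENSG00000139618")` would return a dictionary describing that gene if the service is reachable; the fields it contains depend on the Ensembl release being served.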

                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res, nar)
                Oxford University Press
                ISSN: 0305-1048, 1362-4962
                08 January 2019
                24 October 2018
                Volume: 47
                Issue: Database issue
                Pages: D766-D773
                Affiliations
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                [2] UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
                [3] Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
                [4] Department of Medical Oncology, Inselspital, University Hospital, University of Bern, Bern, Switzerland
                [5] Department of Biomedical Research (DBMR), University of Bern, Bern, Switzerland
                [6] MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
                [7] Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
                [8] Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
                [9] Department of Bioscience, Brunel University London, Uxbridge UB8 3PH, UK
                [10] Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 123 Old Brompton Road, London SW7 3RP, UK
                [11] Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, E-08003 Catalonia, Spain
                [12] Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
                [13] Department of Molecular, Cellular & Developmental Biology, Yale University, New Haven, CT 06520, USA
                [14] Systems Biology Institute, Yale University, West Haven, CT 06516, USA
                [15] Centre of New Technologies, University of Warsaw, Warsaw, Poland
                [16] Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
                [17] Program in Computational Biology & Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
                [18] Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
                [19] Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
                [20] Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
                Author notes
                To whom correspondence should be addressed. Tel: +44 1223 492581; Fax: +44 1223 494494; Email: flicek@ebi.ac.uk
                Author information
                http://orcid.org/0000-0002-3197-5367
                http://orcid.org/0000-0002-7669-2934
                http://orcid.org/0000-0002-7445-2419
                http://orcid.org/0000-0003-2887-815X
                http://orcid.org/0000-0002-0935-7271
                http://orcid.org/0000-0003-4894-7773
                http://orcid.org/0000-0002-1672-050X
                http://orcid.org/0000-0002-8386-1580
                http://orcid.org/0000-0002-0380-7171
                http://orcid.org/0000-0001-5350-3056
                http://orcid.org/0000-0002-3897-7955
                Article
                Publisher ID: gky955
                DOI: 10.1093/nar/gky955
                PMCID: PMC6323946
                PMID: 30357393
                cb67a12e-bb31-4cd5-81c1-2fbd5502b17c
                © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                08 October 2018
                20 September 2018
                15 August 2018
                Page count
                Pages: 8
                Funding
                Funded by: National Human Genome Research Institute 10.13039/100000051
                Award ID: U41HG007234
                Funded by: Wellcome Trust 10.13039/100004440
                Award ID: WT108749/Z/15/Z
                Award ID: WT200990/Z/16/Z
                Categories
                Database Issue

                Genetics
