
      Opportunities and challenges in long-read sequencing data analysis

      review-article


          Abstract

          Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

          Most cited references (89)

          Fast and accurate long-read assembly with wtdbg2

          Existing long-read assemblers require thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a long-read assembler wtdbg2 (https://github.com/ruanjue/wtdbg2) that is 2–17 times as fast as published tools while achieving comparable contiguity and accuracy. It paves the way for population-scale long-read assembly in future.

            An integrated semiconductor device enabling non-optical genome sequencing.

            The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

              Assessment of transcript reconstruction methods for RNA-seq

              RNA sequencing (RNA-seq) is transforming genome biology, enabling comprehensive transcriptome profiling with unprecedented accuracy and detail. Due to technical limitations of current high-throughput sequencing platforms, transcript identity, structure and expression level must be inferred programmatically from partial sequence reads of fragmented gene products. We evaluated 24 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates, but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations in transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.

                Author and article information

                Contributors
                amarasinghe.s@wehi.edu.au
                mritchie@wehi.edu.au
                gouil.q@wehi.edu.au
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 7 February 2020
                Volume 21, Article 30
                Affiliations
                [1] GRID grid.1042.7, Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
                [2] GRID grid.1008.9, ISNI 0000 0001 2179 088X, Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
                [3] GRID grid.1058.c, ISNI 0000 0000 9442 535X, Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
                [4] GRID grid.1008.9, ISNI 0000 0001 2179 088X, School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
                [5] GRID grid.1008.9, ISNI 0000 0001 2179 088X, School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
                Author information
                http://orcid.org/0000-0002-5142-7886
                Article
                Article ID: 1935
                DOI: 10.1186/s13059-020-1935-5
                PMCID: PMC7006217
                PMID: 32033565
                89bac05e-9bbf-4b01-9ec5-e881b3c42498
                © The Author(s) 2020

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 28 August 2019
                Accepted: 15 January 2020
                Categories
                Review
                Genetics
                long-read sequencing, data analysis, PacBio, Oxford Nanopore
