      Is Open Access

      webGQT: A Shiny Server for Genotype Query Tools for Model-Based Variant Filtering

      research-article


          Abstract

          Summary

          Genotype Query Tools (GQT) was developed to discover disease-causing variations among billions of genotypes and millions of genomes, and it processes data substantially faster than other existing methods. While GQT has been available to a wide audience as command-line software, the difficulty of constructing queries has limited its use among researchers without an IT or bioinformatics background. To overcome this limitation, we developed webGQT, an easy-to-use tool with a graphical user interface. With pre-built queries across three modules, webGQT supports pedigree analysis, case-control studies, and population frequency studies. As a package, webGQT allows researchers with little or no applied bioinformatics/IT experience to mine potential disease-causing variants from billions of genotypes.
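To make the idea of a pre-built, model-based query concrete, the following is a minimal Python sketch of a recessive-model pedigree filter on a single trio. It is an illustration only and not webGQT's or GQT's code: the function name, the sample names, the variant identifiers, and the allele-count genotype encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, `None` = missing) are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: number of alternate alleles, or None if the call is missing.
Genotype = Optional[int]


def recessive_trio_candidates(
    variants: Dict[str, Dict[str, Genotype]],
    child: str,
    father: str,
    mother: str,
) -> List[str]:
    """Return variant IDs where the affected child is homozygous alternate
    and both unaffected parents are heterozygous carriers."""
    hits = []
    for variant_id, calls in variants.items():
        if (
            calls.get(child) == 2
            and calls.get(father) == 1
            and calls.get(mother) == 1
        ):
            hits.append(variant_id)
    return hits


# Toy data (invented for illustration): one dict of sample -> genotype per variant.
variants = {
    "chr1:1000:A>G": {"child": 2, "father": 1, "mother": 1},
    "chr1:2000:C>T": {"child": 1, "father": 1, "mother": 0},
    "chr2:3000:G>A": {"child": 2, "father": 2, "mother": 1},
    "chr3:4000:T>C": {"child": 2, "father": 1, "mother": None},
}

candidates = recessive_trio_candidates(variants, "child", "father", "mother")
print(candidates)
```

In this strict form, a variant with a missing parental genotype is rejected, which is the kind of case that tolerance parameters are meant to address.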

          Results

          webGQT offers a flexible and easy-to-use interface for model-based candidate variant filtering for Mendelian diseases, scaling from thousands to millions of genomes with reduced computation time. Additionally, webGQT provides adjustable parameters to reduce false positives and rescue missing genotypes across all modules. Using a case study, we demonstrate the applicability of webGQT to query non-human genomes. In addition, we demonstrate the scalability of webGQT on large data sets by implementing complex population-specific queries on the 1000 Genomes Project Phase 3 data set, which includes 8.4 billion variants from 2504 individuals across 26 different populations. Furthermore, webGQT supports filtering of single-nucleotide variants, short insertions/deletions, copy number variants, or any other variant genotypes supported by the VCF specification. Our results show that webGQT can be used as an online web service, or deployed on personal computers or local servers within research groups.
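The role of adjustable parameters in a case-control query can be sketched as follows. This is a hedged Python illustration under stated assumptions, not webGQT's implementation: the parameter names `max_missing_cases` and `max_homalt_controls`, the sample names, and the allele-count genotype encoding (0, 1, 2, or `None` for a missing call) are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # alternate-allele count, or None for a missing call


def tolerant_recessive_filter(
    genotypes: Dict[str, Dict[str, Genotype]],
    cases: Sequence[str],
    controls: Sequence[str],
    max_missing_cases: int = 0,    # hypothetical: cases allowed to lack a call
    max_homalt_controls: int = 0,  # hypothetical: controls allowed to be hom-alt
) -> List[str]:
    """Keep variants where every called case is homozygous alternate,
    tolerating a bounded number of missing cases and hom-alt controls."""
    hits = []
    for variant_id, calls in genotypes.items():
        case_calls = [calls.get(sample) for sample in cases]
        # Any case with a called genotype other than hom-alt breaks the model.
        if any(g is not None and g != 2 for g in case_calls):
            continue
        missing = sum(g is None for g in case_calls)
        # Require at least one called case, and respect the missing-call budget.
        if missing > max_missing_cases or missing == len(case_calls):
            continue
        homalt_controls = sum(calls.get(sample) == 2 for sample in controls)
        if homalt_controls > max_homalt_controls:
            continue
        hits.append(variant_id)
    return hits


# Toy data (invented for illustration).
genotypes = {
    "v1": {"A": 2, "B": 2, "C": None, "X": 1, "Y": 0},  # one case missing
    "v2": {"A": 2, "B": 2, "C": 2, "X": 2, "Y": 0},     # one control hom-alt
    "v3": {"A": 2, "B": 1, "C": 2, "X": 0, "Y": 0},     # one case heterozygous
    "v4": {"A": 2, "B": 2, "C": 2, "X": 0, "Y": 1},     # fits the strict model
}
cases, controls = ["A", "B", "C"], ["X", "Y"]

strict = tolerant_recessive_filter(genotypes, cases, controls)
rescued = tolerant_recessive_filter(genotypes, cases, controls, max_missing_cases=1)
print(strict, rescued)
```

Raising the missing-call budget recovers `v1`, whose only deviation from the model is an uncalled case, while keeping both budgets at zero returns only variants that fit the model exactly.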

          Availability

          webGQT is available in three forms: 1) as a web server at https://vm1138.kaj.pouta.csc.fi/webgqt/, 2) as an R package to install on personal computers, and 3) as part of the same R package to configure on the user's own servers. The application is available for installation at https://github.com/arumds/webgqt.

          Most cited references (7)

          vcfR: a package to manipulate and visualize variant call format data in R

          Software to call single-nucleotide polymorphisms or related genetic variants has converged on the variant call format (VCF) as the output format of choice. This has created a need for tools to work with VCF files. While an increasing number of software exists to read VCF data, many only extract the genotypes without including the data associated with each genotype that describes its quality. We created the R package vcfR to address this issue. We developed a VCF file exploration tool implemented in the R language because R provides an interactive experience and an environment that is commonly used for genetic data analysis. Functions to read and write VCF files into R as well as functions to extract portions of the data and to plot summary statistics of the data are implemented. vcfR further provides the ability to visualize how various parameterizations of the data affect the results. Additional tools are included to integrate sequence (FASTA) and annotation data (GFF) for visualization of genomic regions such as chromosomes. Conversion functions translate data from the vcfR data structure to formats used by other R genetics packages. Computationally intensive functions are implemented in C++ to improve performance. Use of these tools is intended to facilitate VCF data exploration, including intuitive methods for data quality control and easy export to other R packages for further analysis. vcfR thus provides essential, novel tools currently not available in R.

            Efficient genotype compression and analysis of large genetic variation datasets

            Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443 fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes.

              The UK’s 100,000 Genomes Project: manifesting policymakers’ expectations

              The UK’s 100,000 Genomes Project has the aim of sequencing 100,000 genomes from UK National Health Service (NHS) patients while concomitantly transforming clinical care such that whole genome sequencing becomes routine clinical practice in the UK. Policymakers claim that the project will revolutionize NHS care. We wished to explore the 100,000 Genomes Project, and in particular, the extent to which policymaker claims have helped or hindered the work of those associated with Genomics England – the company established by the Department of Health to deliver the project. We interviewed 20 individuals linked to, or working for Genomics England. Interviewees had double-edged views about the context within which they were working. On the one hand, policymakers’ expectations attached to the venture were considered vacuous “genohype”; on the other hand, they were considered the impetus needed for those trying to advance genomic research into clinical practice. Findings should be considered for future genomes projects.

                Author and article information

                Contributors
                Journal
                Frontiers in Genetics (Front. Genet.)
                Publisher: Frontiers Media S.A.
                ISSN: 1664-8021
                Published: 03 March 2020
                Volume: 11
                Article: 152
                Affiliations
                [1] Department of Veterinary Biosciences, Department of Medical and Clinical Genetics, University of Helsinki, Helsinki, Finland
                [2] Genetics Research Program, The Folkhälsan Research Center, Helsinki, Finland
                [3] Department of Computer Science, University of Colorado, Boulder, CO, United States
                [4] The BioFrontiers Institute, University of Colorado, Boulder, CO, United States
                Author notes

                Edited by: Harinder Singh, J. Craig Venter Institute, United States

                Reviewed by: Christopher M. Watson, University of Leeds, United Kingdom; Francesco Musacchia, Telethon Institute of Genetics and Medicine, Italy; Nitish Kumar Mishra, University of Nebraska Medical Center, United States

                *Correspondence: Meharji Arumilli, meharji.arumilli@helsinki.fi

                This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

                Article
                DOI: 10.3389/fgene.2020.00152
                PMCID: PMC7063093
                PMID: 32194629
                Copyright © 2020 Arumilli, Layer, Hytönen and Lohi

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 06 September 2019
                Accepted: 10 February 2020
                Page count
                Figures: 2, Tables: 1, Equations: 0, References: 14, Pages: 9, Words: 6046
                Funding
                Funded by: National Institutes of Health 10.13039/100000002
                Award ID: RML R00HG009532
                Funded by: Jane ja Aatos Erkon Säätiö 10.13039/501100004012
                Funded by: Academy of Finland 10.13039/501100002341
                Categories
                Genetics
                Technology and Code

                Keywords
                variant, filtering, R package, Shiny server, GQT, webGQT, bigdata
