      Is Open Access

      nQuack: An R package for predicting ploidal level from sequence data using site‐based heterozygosity

      research-article


          Abstract

          Premise

          Traditional methods of ploidal‐level estimation are tedious; using DNA sequence data for cytotype estimation is an ideal alternative. Multiple statistical approaches to leverage sequence data for ploidy inference based on site‐based heterozygosity have been developed. However, these approaches may require high‐coverage sequence data, use inappropriate probability distributions, or have additional statistical shortcomings that limit inference abilities. We introduce nQuack, an open‐source R package that addresses the main shortcomings of current methods.

          Methods and Results

          nQuack performs model selection for improved ploidy predictions. Here, we implement expectation maximization algorithms with normal, beta, and beta‐binomial distributions. Using extensive computer simulations that account for variability in sequencing depth, as well as real data sets, we demonstrate the utility and limitations of nQuack.
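
The model-selection idea described above can be illustrated with a minimal sketch. This is not nQuack's code or API, and it is written in Python rather than R. It assumes that heterozygous-site allele frequencies for ploidy n cluster around k/n (k = 1..n-1), fits a normal mixture with those means fixed by expectation maximization, and picks the candidate ploidy with the lowest BIC. The function names, the shared standard deviation, the BIC criterion, and the simple binomial read simulator are all illustrative assumptions; only the normal distribution is covered, not the beta or beta-binomial variants.

```python
import math
import random


def expected_means(ploidy):
    # Assumed allele-frequency peaks at heterozygous sites: k/ploidy.
    return [k / ploidy for k in range(1, ploidy)]


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sigma):
    return sum(w * normal_pdf(x, m, sigma) for w, m in zip(weights, means))


def fit_mixture(freqs, ploidy, n_iter=50, sigma0=0.05):
    """EM for a normal mixture with means fixed at k/ploidy.

    Only the mixing weights and one shared standard deviation are estimated.
    """
    means = expected_means(ploidy)
    k, n = len(means), len(freqs)
    weights = [1.0 / k] * k
    sigma = sigma0
    for _ in range(n_iter):
        # E-step: responsibility of each component for each site.
        resp = []
        for x in freqs:
            dens = [w * normal_pdf(x, m, sigma) for w, m in zip(weights, means)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: update the weights and the shared standard deviation.
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        var = sum(
            r[j] * (x - means[j]) ** 2
            for x, r in zip(freqs, resp)
            for j in range(k)
        ) / n
        sigma = max(math.sqrt(var), 1e-3)
    loglik = sum(
        math.log(max(mixture_density(x, weights, means, sigma), 1e-300))
        for x in freqs
    )
    n_params = (k - 1) + 1  # free weights plus the shared sigma
    bic = n_params * math.log(n) - 2.0 * loglik
    return {"ploidy": ploidy, "weights": weights, "sigma": sigma, "bic": bic}


def predict_ploidy(freqs, candidates=(2, 3, 4)):
    # Model selection: keep the candidate ploidy with the lowest BIC.
    fits = [fit_mixture(freqs, p) for p in candidates]
    return min(fits, key=lambda fit: fit["bic"])


def simulate(ploidy, n_sites=600, depth=60, seed=1):
    # Toy data: binomial read sampling at a fixed depth (an assumption;
    # real sequencing depth varies from site to site).
    rng = random.Random(seed)
    means = expected_means(ploidy)
    freqs = []
    for _ in range(n_sites):
        p = rng.choice(means)
        alt_reads = sum(rng.random() < p for _ in range(depth))
        freqs.append(alt_reads / depth)
    return freqs


if __name__ == "__main__":
    best = predict_ploidy(simulate(3))
    print(best["ploidy"], round(best["sigma"], 3))
```

Fixing the means keeps each candidate model small, so the comparison across ploidal levels rests mainly on how well the expected peaks match the observed frequencies. On low-coverage or noisy data the peaks overlap and the comparison becomes unreliable, which is consistent with the caution expressed in the Conclusions.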

          Conclusions

          Inferring ploidy based on site‐based heterozygosity alone is difficult. Even though nQuack is more accurate than similar methods, we suggest caution when relying on any site‐based heterozygosity method to infer ploidy.



                Author and article information

                Contributors
                shellyleegaynor@gmail.com
                Journal
                Appl Plant Sci
                10.1002/(ISSN)2168-0450
                APS3
                Applications in Plant Sciences
                John Wiley and Sons Inc. (Hoboken)
                2168-0450
                14 July 2024
                Jul-Aug 2024
                Volume: 12
                Issue: 4, Special Issue: Twice as Nice: New Techniques and Discoveries in Polyploid Biology (doiID: 10.1002/aps3.v12.4)
                Article: e11606
                Affiliations
                [ 1 ] Florida Museum of Natural History University of Florida Gainesville 32611 Florida USA
                [ 2 ] Department of Biology University of Florida Gainesville 32611 Florida USA
                [ 3 ] School of Integrative Plant Science Cornell University Ithaca 14850 New York USA
                [ 4 ] Department of Ecology and Evolution University of Chicago Chicago 60637 Illinois USA
                [ 5 ] Department of Biology The College of Idaho Caldwell 83605 Idaho USA
                Author notes
                [*] Correspondence: Michelle L. Gaynor, Florida Museum of Natural History, University of Florida, Dickinson Hall, 1659 Museum Rd., Gainesville, Florida 32611, USA.

                Email: shellyleegaynor@gmail.com

                Author information
                http://orcid.org/0000-0002-3912-6079
                http://orcid.org/0000-0002-5631-5365
                http://orcid.org/0000-0001-8204-6552
                http://orcid.org/0000-0001-5672-0929
                http://orcid.org/0000-0003-1579-9380
                http://orcid.org/0000-0001-8638-4137
                http://orcid.org/0000-0001-8457-7840
                http://orcid.org/0000-0001-9310-8659
                Article
                APS311606
                10.1002/aps3.11606
                11342224
                034b3565-6175-40eb-ac30-8ba22fab5bca
                © 2024 The Author(s). Applications in Plant Sciences published by Wiley Periodicals LLC on behalf of Botanical Society of America.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

                History
                : 01 May 2024
                : 12 February 2024
                : 28 May 2024
                Page count
                Figures: 4, Tables: 2, Pages: 12, Words: 8326
                Categories
                Software Note

                Keywords: copy number variation, expectation maximization, ploidal inference, ploidy, polyploidy
