      Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons

      research-article


          Abstract

          Long-read sequencing offers a great improvement in the assembly of complex genomic regions, such as the major histocompatibility complex (MHC) region, which can contain both tandemly duplicated MHC genes (paralogs) and high repeat content. The MHC genes have expanded in passerine birds, resulting in numerous MHC paralogs, with relatively high sequence similarity, making the assembly of the MHC region challenging even with long-read sequencing. In addition, MHC genes show rather high sequence divergence between alleles, making diploid-aware assemblers incorrectly classify haplotypes from the same locus as sequences originating from different genomic regions. Consequently, the number of MHC paralogs can easily be over- or underestimated in long-read assemblies. We therefore set out to verify the MHC diversity in an original and a haplotype-purged long-read assembly of one great reed warbler Acrocephalus arundinaceus individual (the focal individual) by using Illumina MiSeq amplicon sequencing. Single exons, representing MHC class I (MHC-I) and class IIB (MHC-IIB) alleles, were sequenced in the focal individual and mapped to the annotated MHC alleles in the original long-read genome assembly. Eighty-four percent of the annotated MHC-I alleles in the original long-read genome assembly were detected using 55% of the amplicon alleles and likewise, 78% of the annotated MHC-IIB alleles were detected using 61% of the amplicon alleles, indicating an incomplete annotation of MHC genes. In the haploid genome assembly, each MHC-IIB gene should be represented by one allele. The parental origin of the MHC-IIB amplicon alleles in the focal individual was determined by sequencing MHC-IIB in its parents. 
Two of five larger scaffolds, containing 6–19 MHC-IIB paralogs, had a maternal and paternal origin, respectively, as well as a high nucleotide similarity, which suggests that these scaffolds had been incorrectly assigned as belonging to different loci in the genome rather than as alternate haplotypes of the same locus. Therefore, the number of MHC-IIB paralogs was overestimated in the haploid genome assembly. Based on our findings we propose amplicon sequencing as a suitable complement to long-read sequencing for independent validation of the number of paralogs in general and for haplotype inference in multigene families in particular.
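
The validation logic summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes alleles are compared as exact sequence strings (the study mapped amplicon alleles to annotated alleles), and every function name and toy allele label below is hypothetical.

```python
from collections import Counter


def detection_summary(annotated, amplicon):
    """Share of annotated alleles confirmed by amplicons, and share of
    amplicon alleles that match an annotated allele (exact-match assumption)."""
    shared = annotated & amplicon
    return {
        "annotated_detected": len(shared) / len(annotated),
        "amplicon_matched": len(shared) / len(amplicon),
    }


def parental_origin(focal, mother, father):
    """Label each allele of the focal individual by the parent that carries it."""
    origin = {}
    for allele in focal:
        in_mother, in_father = allele in mother, allele in father
        if in_mother and not in_father:
            origin[allele] = "maternal"
        elif in_father and not in_mother:
            origin[allele] = "paternal"
        elif in_mother and in_father:
            origin[allele] = "ambiguous"  # shared by both parents
        else:
            origin[allele] = "unassigned"  # seen in neither parent
    return origin


def scaffold_origin(scaffold_alleles, origin):
    """Count origin labels for the alleles annotated on one scaffold."""
    return Counter(origin.get(a, "not_in_amplicons") for a in scaffold_alleles)


# Toy data only: placeholder labels stand in for exon sequences.
annotated = {"a1", "a2", "a3", "a4"}
amplicon = {"a1", "a2", "a3", "a5", "a6", "a7"}
print(detection_summary(annotated, amplicon))

origin = parental_origin(amplicon, mother={"a1", "a5"}, father={"a2", "a3"})
print(scaffold_origin(["a1", "a5"], origin))  # alleles on one toy scaffold
print(scaffold_origin(["a2", "a3"], origin))  # alleles on another toy scaffold
```

In this toy setting, one scaffold carrying only maternal alleles and a highly similar scaffold carrying only paternal alleles would be candidates for alternate haplotypes of the same locus rather than separate loci, which is the pattern the abstract describes for two of the five larger scaffolds.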


                Author and article information

                Journal: PeerJ
                Publisher: PeerJ Inc. (San Diego, USA)
                ISSN: 2167-8359
                Published: 12 July 2023
                Volume: 11
                Article number: e15480
                Affiliations
                [1] Department of Biology, Molecular Ecology and Evolution Lab, Lund University, Lund, Sweden
                [2] Department of Biology and Environmental Science, Faculty of Health and Life Sciences, Linnaeus University, Kalmar, Sweden
                [3] Bird Group, Natural History Museum, Tring, Hertfordshire, United Kingdom
                Article
                Article ID: 15480
                DOI: 10.7717/peerj.15480
                PMC ID: 10349553
                Record ID: 413b9008-482f-4b86-ba87-eec0a6d074e6
                © 2023 Mellinger et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

                History
                Received: 13 January 2023
                Accepted: 8 May 2023
                Funding
                Funded by: European Research Council
                Award ID: 679799
                Funded by: Swedish Research Council
                Award ID: 2015-05149
                Award ID: 2020-04285
                Funded by: Jörgen Lindström’s Foundation
                Award ID: 137301
                This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant number 679799 to Helena Westerdahl), the Swedish Research Council (grant numbers 2015-05149 and 2020-04285 to Helena Westerdahl) and Jörgen Lindström’s Foundation (grant number 137301 to Samantha Mellinger). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Computational Biology
                Genetics
                Genomics
                Molecular Biology
                Zoology

                Keywords: haploid genome assembly, amplicon sequencing, major histocompatibility complex, MHC diversity, family, linkage analysis, copy number variation
