      Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies

      research-article
      BMC Biology
      BioMed Central
      Genome assembly, Gene synteny, Comparative genomics, Mosquito genomes, Orthology, Bioinformatics, Computational evolutionary biology, Chromosomes, Physical mapping


          Abstract

          Background

          New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.
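The idea of using conserved gene order to predict scaffold neighbours can be illustrated with a minimal Python sketch. It is not one of the methods evaluated in the article: the function name `predict_adjacencies`, the input layout, and the assumption that every gene is a single-copy ortholog shared with one reference chromosome are all hypothetical simplifications chosen for illustration.

```python
def predict_adjacencies(scaffolds, reference_order):
    """Predict scaffold neighbours from conserved gene order (toy sketch).

    scaffolds: dict mapping scaffold name -> ordered list of ortholog IDs
    reference_order: ortholog IDs in order along one reference chromosome
    Returns a set of frozensets, each holding two scaffold names.
    Assumes single-copy orthologs (each ID occurs on at most one scaffold).
    """
    # Record which scaffold each scaffold-terminal gene belongs to.
    end_gene_to_scaffold = {}
    for name, genes in scaffolds.items():
        if genes:
            end_gene_to_scaffold[genes[0]] = name
            end_gene_to_scaffold[genes[-1]] = name

    # Ignore reference genes that have no ortholog in the draft assembly.
    present = {g for genes in scaffolds.values() for g in genes}
    ref = [g for g in reference_order if g in present]

    # Two scaffolds are predicted neighbours when their terminal genes
    # are consecutive in the (filtered) reference gene order.
    adjacencies = set()
    for left, right in zip(ref, ref[1:]):
        a = end_gene_to_scaffold.get(left)
        b = end_gene_to_scaffold.get(right)
        if a is not None and b is not None and a != b:
            adjacencies.add(frozenset((a, b)))
    return adjacencies


# Hypothetical example data: three scaffolds and one reference chromosome.
scaffolds = {
    "s1": ["g1", "g2"],
    "s2": ["g3", "g4"],
    "s3": ["g6", "g5"],  # inverted relative to the reference
}
reference = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
result = predict_adjacencies(scaffolds, reference)
```

With this toy input, `s1` and `s2` are predicted to be neighbours, as are `s2` and `s3`. Real synteny-based tools additionally handle scaffold orientation, multiple reference species, and rearrangements, none of which this sketch attempts.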

          Results

          We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.
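One simple way to think about a consensus set of adjacencies is to keep only those scaffold pairs predicted by several methods. The sketch below counts how many methods support each adjacency and applies a threshold. The threshold of two, the method labels, and the function name `consensus_adjacencies` are assumptions made for illustration; the abstract does not specify how the authors built their consensus sets.

```python
from collections import Counter


def consensus_adjacencies(predictions, min_support=2):
    """Keep adjacencies supported by at least `min_support` methods.

    predictions: dict mapping method name -> iterable of (scaffold_a, scaffold_b)
    Returns a dict mapping frozenset({a, b}) -> number of supporting methods.
    """
    support = Counter()
    for method_adjacencies in predictions.values():
        # Normalise each pair to an unordered frozenset and count each
        # adjacency at most once per method.
        unique = {frozenset(pair) for pair in method_adjacencies}
        for adjacency in unique:
            support[adjacency] += 1
    return {adj: n for adj, n in support.items() if n >= min_support}


# Hypothetical predictions from three unnamed methods.
predictions = {
    "method_A": [("s1", "s2"), ("s2", "s3")],
    "method_B": [("s2", "s1"), ("s3", "s4")],
    "method_C": [("s1", "s2"), ("s2", "s3"), ("s2", "s3")],
}
consensus = consensus_adjacencies(predictions)
```

Here the pair `s1`/`s2` is supported by all three methods and `s2`/`s3` by two, while `s3`/`s4` has a single supporter and is dropped. A stricter or looser consensus follows from changing `min_support`; resolving conflicting adjacencies (one scaffold end paired with several partners) would need extra logic not shown here.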

          Conclusions

          Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.

                Author and article information

                Contributors
                robert.waterhouse@unil.ch
                igor@vt.edu
                Journal
                BMC Biol
                BMC Biology
                BioMed Central (London)
                ISSN: 1741-7007
                Published: 2 January 2020
                Volume: 18
                Issue: 1
                Affiliations
                [1] ISNI 0000 0001 2165 4204, GRID grid.9851.5, Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
                [2] ISNI 0000 0001 2097 5006, GRID grid.16750.35, Department of Computer Science, Princeton University, Princeton, NJ 08450 USA
                [3] ISNI 0000 0001 2171 9311, GRID grid.21107.35, Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218 USA
                [4] ISNI 0000 0001 2188 7059, GRID grid.462058.d, ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
                [5] ISNI 0000 0001 0694 4940, GRID grid.438526.e, The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
                [6] ISNI 0000 0001 0694 4940, GRID grid.438526.e, Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061 USA
                [7] ISNI 0000 0001 0790 959X, GRID grid.411377.7, Departments of Biology and Computer Science, Indiana University, Bloomington, IN 47405 USA
                [8] ISNI 0000 0001 2163 0069, GRID grid.416738.f, Centers for Disease Control and Prevention, Atlanta, GA 30329 USA
                [9] ISNI 0000 0001 1781 3962, GRID grid.412266.5, Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
                [10] ISNI 0000 0001 2297 5165, GRID grid.94365.3d, Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892 USA
                [11] ISNI 0000 0000 9709 7726, GRID grid.225360.0, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD UK
                [12] ISNI 0000 0001 1088 3909, GRID grid.77602.34, Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia 634050
                [13] ISNI 0000 0001 2150 7757, GRID grid.7849.2, Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622 Villeurbanne, France
                [14] Institut national de recherche en informatique et en automatique, Montbonnot, 38334 Grenoble, Rhône-Alpes, France
                [15] ISNI 0000 0001 2168 0066, GRID grid.131063.6, Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN 46556 USA
                [16] ISNI 0000 0004 1936 9510, GRID grid.253615.6, Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA 20147 USA
                [17] ISNI 0000 0004 1936 7494, GRID grid.61971.38, Department of Mathematics, Simon Fraser University, Burnaby, British Columbia V5A 1S6 Canada
                [18] ISNI 0000 0001 2315 1184, GRID grid.411461.7, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996 USA
                Author information
                http://orcid.org/0000-0003-4199-9052
                http://orcid.org/0000-0003-2458-8323
                http://orcid.org/0000-0002-6689-1163
                http://orcid.org/0000-0003-1702-874X
                http://orcid.org/0000-0002-8693-8678
                http://orcid.org/0000-0002-5657-4762
                http://orcid.org/0000-0001-5893-6184
                http://orcid.org/0000-0002-3029-0964
                http://orcid.org/0000-0002-5731-8808
                http://orcid.org/0000-0002-3834-4621
                http://orcid.org/0000-0001-9955-0683
                http://orcid.org/0000-0002-1472-8962
                http://orcid.org/0000-0001-7765-983X
                http://orcid.org/0000-0001-7318-3678
                http://orcid.org/0000-0003-2983-8934
                http://orcid.org/0000-0002-5790-3548
                http://orcid.org/0000-0002-3681-7536
                http://orcid.org/0000-0003-2154-2549
                http://orcid.org/0000-0002-5140-8095
                http://orcid.org/0000-0003-0646-0721
                http://orcid.org/0000-0001-9837-1878
                http://orcid.org/0000-0002-4804-7436
                http://orcid.org/0000-0003-0752-3747
                Article
                728
                DOI: 10.1186/s12915-019-0728-3
                PMC ID: 6939337
                PubMed ID: 31898513
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 13 November 2019
                Accepted: 26 November 2019
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000060, National Institute of Allergy and Infectious Diseases
                Award ID: AI112734
                Award ID: AI099528
                Award ID: AI135298
                Funded by: FundRef http://dx.doi.org/10.13039/100007917, Agricultural Research Service
                Award ID: 223822
                Funded by: FundRef http://dx.doi.org/10.13039/100000001, National Science Foundation
                Award ID: IIS-1462107
                Award ID: CCF-1053753
                Award ID: DBI-1350041
                Award ID: DEB-1249633
                Funded by: FundRef http://dx.doi.org/10.13039/100000002, National Institutes of Health
                Award ID: U24CA211000
                Award ID: HG006677
                Funded by: FundRef http://dx.doi.org/10.13039/501100001665, Agence Nationale de la Recherche
                Award ID: ANR-10-BINF-01-01
                Funded by: FundRef http://dx.doi.org/10.13039/100006492, Division of Intramural Research, National Institute of Allergy and Infectious Diseases
                Award ID: 1ZIAHG200398
                Funded by: FundRef http://dx.doi.org/10.13039/501100000038, Natural Sciences and Engineering Research Council of Canada
                Award ID: RGPIN-249834
                Funded by: FundRef http://dx.doi.org/10.13039/501100004784, Novartis Stiftung für Medizinisch-Biologische Forschung
                Award ID: #18B116
                Funded by: FundRef http://dx.doi.org/10.13039/501100001711, Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
                Award ID: PP00P3_170664
                Categories
                Research Article
                Custom metadata
                © The Author(s) 2020

                Life sciences
