
      The landscape of tolerated genetic variation in humans and primates

      Science
      American Association for the Advancement of Science (AAAS)

          Abstract

          Personalized genome sequencing has revealed millions of genetic differences between individuals, but our understanding of their clinical relevance remains largely incomplete. To systematically decipher the effects of human genetic variants, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and identified 4.3 million common protein-altering variants with orthologs in humans. We show that these variants can be inferred to have nondeleterious effects in humans based on their presence at high allele frequencies in other primate populations. We use this resource to classify 6% of all possible human protein-altering variants as likely benign and impute the pathogenicity of the remaining 94% of variants with deep learning, achieving state-of-the-art accuracy for diagnosing pathogenic variants in patients with genetic diseases.

          INTRODUCTION

          Millions of people have received genome and exome sequencing to date, a collective effort that has illuminated for the first time the vast catalog of small genetic differences that distinguish us as individuals within our species. However, the effects of most of these genetic variants remain unknown, limiting their clinical utility and actionability. New approaches that can accurately discern disease-causing from benign mutations and interpret genetic variants on a genome-wide scale would constitute a meaningful initial step towards realizing the potential of personalized genomic medicine.

          RATIONALE

          As a result of the short evolutionary distance between humans and nonhuman primates, our proteins share near-perfect amino acid sequence identity. Hence, the effects of a protein-altering mutation found in one species are likely to be concordant in the other species. By systematically cataloging common variants of nonhuman primates, we aimed to annotate these variants as being unlikely to cause human disease as they are tolerated by natural selection in a closely related species. Once collected, the resulting resource may be applied to infer the effects of unobserved variants across the genome using machine learning.
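          The filtering idea in this rationale can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `PrimateVariant` record, the variant labels, the species names, and the `min_allele_freq` cutoff are all hypothetical, since the text above says only that variants must be common in a primate population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimateVariant:
    """One protein-altering variant seen in a nonhuman primate (hypothetical record)."""
    human_ortholog: str    # label of the orthologous human variant (made up here)
    species: str
    alt_allele_count: int  # copies of the variant allele observed
    total_alleles: int     # total alleles genotyped in that species

    @property
    def allele_frequency(self) -> float:
        if self.total_alleles == 0:
            return 0.0
        return self.alt_allele_count / self.total_alleles


def likely_benign_orthologs(variants, min_allele_freq=0.05):
    """Return human orthologs that are common in at least one primate species.

    min_allele_freq is an illustrative cutoff, not a value taken from the study.
    """
    return {
        v.human_ortholog
        for v in variants
        if v.allele_frequency >= min_allele_freq
    }


# Made-up observations for illustration only.
observations = [
    PrimateVariant("GENE1:p.A10V", "species_a", 6, 20),
    PrimateVariant("GENE1:p.R55W", "species_a", 0, 20),
    PrimateVariant("GENE2:p.G7S", "species_b", 1, 40),
    PrimateVariant("GENE2:p.G7S", "species_c", 4, 10),
]
benign = likely_benign_orthologs(observations)
```

          A variant counts as tolerated if it is common in any one species, so `GENE2:p.G7S` passes on the strength of `species_c` even though it is rare in `species_b`.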

          RESULTS

          Following the strategy outlined above, we obtained whole-genome sequencing data for 809 individuals from 233 primate species and cataloged 4.3 million common missense variants. We confirmed that human missense variants seen in at least one nonhuman primate species were annotated as benign in the ClinVar clinical variant database in 99% of cases. By contrast, common variants from mammals and vertebrates outside the primate lineage were substantially less likely to be benign in the ClinVar database (71 to 87% benign), restricting this strategy to nonhuman primates. Overall, we reclassified more than 4 million human missense variants of previously unknown consequence as likely benign, resulting in a greater than 50-fold increase in the number of annotated missense variants compared to existing clinical databases.
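          The ClinVar concordance check reported here is, at its core, a proportion: of the candidate variants that already carry a clinical label, how many are labeled benign. The sketch below shows that calculation in Python. The function name, the label strings, and the toy data are assumptions made for illustration; they do not reproduce the ClinVar schema or the study's numbers.

```python
def benign_fraction(candidate_variants, clinical_labels):
    """Fraction of labeled candidates whose label is "benign".

    Candidates with no entry in clinical_labels are skipped.
    Returns None when no candidate has a label.
    """
    labels = [clinical_labels[v] for v in candidate_variants if v in clinical_labels]
    if not labels:
        return None
    return sum(1 for label in labels if label == "benign") / len(labels)


# Toy labels and candidates, invented for this example.
toy_labels = {
    "GENE1:p.A10V": "benign",
    "GENE2:p.G7S": "benign",
    "GENE3:p.L3P": "pathogenic",
    "GENE4:p.T9M": "benign",
}
toy_candidates = ["GENE1:p.A10V", "GENE2:p.G7S", "GENE3:p.L3P", "GENE4:p.T9M", "GENE5:p.K2E"]
fraction = benign_fraction(toy_candidates, toy_labels)
```

          Running the same function on candidate sets drawn from primates and from more distant species would give the kind of comparison summarized above.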

          To infer the pathogenicity of the remaining missense variants in the human genome, we constructed PrimateAI-3D, a semisupervised 3D-convolutional neural network that operates on voxelized protein structures. We trained PrimateAI-3D to separate common primate variants from matched control variants in 3D space as a semisupervised learning task. We evaluated the trained PrimateAI-3D model alongside 15 other published machine learning methods on their ability to distinguish between benign and pathogenic variants in six different clinical benchmarks and demonstrated that PrimateAI-3D outperformed all other classifiers in each of the tasks.
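          To make "voxelized protein structures" concrete, the sketch below bins atom coordinates into a small cubic grid centered on a residue of interest. It illustrates voxelization in general, not the PrimateAI-3D input pipeline: the grid size, voxel width, and the use of plain atom counts per voxel are assumptions, because this summary does not describe what each voxel stores.

```python
import math


def voxelize(atoms, center, grid_size=5, voxel_width=2.0):
    """Bin (x, y, z) atom coordinates into a cubic grid centered on `center`.

    Returns a dict mapping voxel index (i, j, k) to the number of atoms inside.
    Atoms that fall outside the grid are ignored.
    grid_size and voxel_width are illustrative defaults, not published values.
    """
    half_extent = grid_size * voxel_width / 2.0
    grid = {}
    for atom in atoms:
        # Shift so the grid's corner sits at the origin, then divide by voxel width.
        index = tuple(
            math.floor((coord - c + half_extent) / voxel_width)
            for coord, c in zip(atom, center)
        )
        if all(0 <= i < grid_size for i in index):
            grid[index] = grid.get(index, 0) + 1
    return grid


# Four made-up atom positions; the last one lies outside the 10-unit cube.
atoms = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
```

          A 3D convolutional network would consume a dense tensor built from such a grid, typically with several feature channels per voxel rather than a single count.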

          CONCLUSION

          Our study addresses one of the key challenges in the variant interpretation field, namely, the lack of sufficient labeled data to effectively train large machine learning models. By generating the most comprehensive primate sequencing dataset to date and pairing this resource with a deep learning architecture that leverages 3D protein structures, we were able to achieve meaningful improvements in variant effect prediction across multiple clinical benchmarks.


                Author and article information

                Journal: Science
                Publisher: American Association for the Advancement of Science (AAAS)
                ISSN: 0036-8075, 1095-9203
                Published: June 02 2023
                Volume: 380
                Issue: 6648
                Affiliations
                [1 ]Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA, 94404, USA.
                [2 ]Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, MA, 02142, USA.
                [3 ]Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader 88, 08003 Barcelona, Spain.
                [4 ]Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, 02115, USA.
                [5 ]Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.
                [6 ]Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 02115, USA.
                [7 ]School of Science, Engineering & Environment, University of Salford, Salford M5 4WT, UK.
                [8 ]Department of Evolutionary Anthropology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria.
                [9 ]Human Evolution and Archaeological Sciences (HEAS), University of Vienna, 1030 Vienna, Austria.
                [10 ]Département d'anthropologie, Université de Montréal, 3150 Jean-Brillant, Montréal, QC H3T 1N8, Canada.
                [11 ]Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India.
                [12 ]Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India.
                [13 ]Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark.
                [14 ]Section for Ecoinformatics & Biodiversity, Department of Biology, Aarhus University, 8000 Aarhus, Denmark.
                [15 ]Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Estrada da Bexiga 2584, Tefé, Amazonas, CEP 69553-225, Brazil.
                [16 ]Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Av. Franklin D. Roosevelt 50, CP 160/12, B-1050 Brussels, Belgium.
                [17 ]CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain.
                [18 ]Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
                [19 ]Department of Ecology and Genetics, Animal Ecology, Uppsala University, SE-75236 Uppsala, Sweden.
                [20 ]Tanzania National Parks, Arusha, Tanzania.
                [21 ]North Carolina Museum of Natural Sciences, Raleigh, NC 27601, USA.
                [22 ]Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC 27707, USA.
                [23 ]Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA.
                [24 ]Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA.
                [25 ]Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
                [26 ]Copenhagen Zoo, 2000 Frederiksberg, Denmark.
                [27 ]Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil.
                [28 ]Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Amazonas, 69080-900, Brazil.
                [29 ]Department of Anthropology, University of Utah, Salt Lake City, UT 84102, USA.
                [30 ]Universidade Federal do Para, Guamá, Belém - PA, 66075-110, Brazil.
                [31 ]Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Amazonas, 69553-225, Brazil.
                [32 ]Rede de Pesquisa para Estudos sobre Diversidade, Conservação e Uso da Fauna na Amazônia – RedeFauna, Manaus, Amazonas, 69080-900, Brazil.
                [33 ]Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica – ComFauna, Iquitos, Loreto, 16001, Peru.
                [34 ]Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil.
                [35 ]PPGREN - Programa de Pós-Graduação “Conservação e Uso dos Recursos Naturais” and BIONORTE - Programa de Pós-Graduação em Biodiversidade e Biotecnologia da Rede BIONORTE, Universidade Federal de Rondonia, Porto Velho, Rondônia, 78900-000, Brazil.
                [36 ]Instituto Nacional de Pesquisas da Amazonia, Petrópolis, Manaus - AM, 69067-375, Brazil.
                [37 ]Universidade Federal do Mato Grosso, Boa Esperança, Cuiabá - MT, 78060-900, Brazil.
                [38 ]Department of Biology, Trinity University, San Antonio, TX 78212, USA.
                [39 ]Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, 401, Madagascar.
                [40 ]New York University, New York City, NY 10012, USA.
                [41 ]Washington University in St. Louis, St. Louis, MO 63130, USA.
                [42 ]Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Houston, TX 77030, USA.
                [43 ]Yale University, New Haven, CT 06520, USA.
                [44 ]Universidad Nacional de Formosa, Argentina Fundacion ECO, Formosa, Argentina.
                [45 ]Arizona State University, Tempe, AZ 85281, USA.
                [46 ]Guinea Worm Eradication Program, The Carter Center Ethiopia, PoB 16316, Addis Ababa 1000, Ethiopia.
                [47 ]State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China.
                [48 ]Center for Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China.
                [49 ]Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark.
                [50 ]Liangzhu Laboratory, Zhejiang University Medical Center, 1369 West Wenyi Road, Hangzhou 311121, China.
                [51 ]Women’s Hospital, School of Medicine, Zhejiang University, 1 Xueshi Road, Shangcheng District, Hangzhou 310006, China.
                [52 ]Tanzania Wildlife Research Institute (TAWIRI), Head Office, P.O. Box 661, Arusha, Tanzania.
                [53 ]Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, 17493 Greifswald - Insel Riems, Germany.
                [54 ]Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi 100000, Vietnam.
                [55 ]Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010 Barcelona, Spain.
                [56 ]Department of Zoology, State Museum of Natural History Stuttgart, 70191 Stuttgart, Germany.
                [57 ]Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, c/ Columnes s/n, 08193 Cerdanyola del Vallès, Barcelona, Spain.
                [58 ]Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Av. Doctor Aiguader, N88, 08003 Barcelona, Spain.
                [59 ]BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, C. Wellington 30, 08005 Barcelona, Spain.
                [60 ]Cuc Phuong Commune, Nho Quan District, Ninh Binh Province 430000, Vietnam.
                [61 ]Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore 138672, Republic of Singapore.
                [62 ]Mandai Nature, 80 Mandai Lake Road, Singapore 729826, Republic of Singapore.
                [63 ]SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore 168582, Republic of Singapore.
                [64 ]Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore 168582, Republic of Singapore.
                [65 ]SingHealth Duke-NUS Genomic Medicine Centre, Singapore 168582, Republic of Singapore.
                [66 ]Department of Natural Sciences, National Museums Scotland, Chambers Street, Edinburgh EH1 1JF, UK.
                [67 ]School of Geosciences, University of Edinburgh, Drummond Street, Edinburgh EH8 9XP, UK.
                [68 ]Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, 37077 Göttingen, Germany.
                [69 ]Department of Primate Cognition, Georg-August-Universität Göttingen, 37077 Göttingen, Germany.
                [70 ]Leibniz Science Campus Primate Cognition, 37077 Göttingen, Germany.
                [71 ]Universitat Pompeu Fabra, Pg. Lluís Companys 23, 08010 Barcelona, Spain.
                [72 ]Department of Anthropology & Archaeology, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada.
                [73 ]Department of Medical Genetics, 3330 Hospital Drive NW, HMRB 202, Calgary, AB T2N 4N1, Canada.
                [74 ]Alberta Children’s Hospital Research Institute, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada.
                [75 ]Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh EH8 9XP, UK.
                [76 ]Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany.
                [77 ]Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA.
                [78 ]Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02115, USA.
                [79 ]Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA.
                [80 ]Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
                Article
                DOI: 10.1126/science.abn8197
                PMID: 37262156
                © 2023
