The Use of Orthologous Sequences to Predict the Impact of Amino Acid Substitutions on Protein Function

Abstract

Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well-conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that “resurrects” the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Quite surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages such that inclusion of distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and infer amino acid conservation among only direct lineal ancestors of a particular protein. We show that such an “ancestral site preservation” measure outperforms other prediction methods, not only in our selected set for MTHFR, but also in an exhaustive set of E. coli LacI mutants.

Author Summary

The rapid pace of technological advances in DNA sequencing methods is leading to the discovery of genetic variants at a remarkable rate. Indeed, it is conceivable that entire individual genomes will be sequenced routinely in the near future. While these platforms greatly increase our ability to catalog variation, they are also creating a downstream need to efficiently process and filter this information to ultimately identify genetic causes underlying human disease. Since empirical evaluation of the biological effects of mutation is not practical at such a scale, computational methods that predict such effects are needed. In this paper, we describe a novel methodology to predict whether mutations that lead to amino acid substitutions in proteins will impact protein function and, therefore, may be more likely to have physiological consequences. Specifically, we use orthologous proteins to reconstruct the likely sequences of ancestral proteins in the human lineage. We found that the longer a position has been preserved from direct ancestors in the lineage leading to the human enzyme, the more likely that mutation at that site will have a deleterious effect. We demonstrated that the method should be generally applicable to all proteins.
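The idea of measuring how long a position has been preserved along the direct ancestral lineage can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ancestral sequences have already been reconstructed and aligned to the human sequence, and that they are ordered from the most recent ancestor to the most ancient. It also assumes, as a simplification, that preservation is counted in number of ancestral nodes rather than in evolutionary time. The function name `ancestral_site_preservation` and the toy sequences are hypothetical.

```python
from typing import List, Sequence


def ancestral_site_preservation(human: str, ancestors: Sequence[str]) -> List[int]:
    """Per-site preservation depth along a direct ancestral lineage.

    `ancestors` is assumed to hold reconstructed ancestral sequences,
    aligned to `human` and ordered from most recent to most ancient.
    For each site, the score is the number of consecutive ancestors
    (starting from the most recent) that carry the human residue.
    """
    for anc in ancestors:
        if len(anc) != len(human):
            raise ValueError("all sequences must be aligned to the same length")

    depths = []
    for i, residue in enumerate(human):
        depth = 0
        for anc in ancestors:
            if anc[i] != residue:
                break  # the residue changed at this ancestor; stop counting
            depth += 1
        depths.append(depth)
    return depths


# Toy, made-up aligned sequences for illustration only (not real MTHFR data).
human = "MKVLA"
ancestors = ["MKVLA", "MKILA", "MRILS"]  # most recent first
scores = ancestral_site_preservation(human, ancestors)
print(scores)
```

Under these assumptions, a higher score marks a site that has been preserved through more of the lineage, so a substitution there would be ranked as more likely to be deleterious. A fuller treatment would weight each step by branch length and handle alignment gaps.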

Author and article information

Contributors

Role: Editor

Journal

Journal ID (nlm-ta): PLoS Genet

Journal ID (publisher-id): plos

Journal ID (pmc): plosgen

Title: PLoS Genetics

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-7390

ISSN (Electronic): 1553-7404

Publication date Collection: May 2010

Publication date (Print): May 2010

Publication date (Electronic): 27 May 2010

Volume: 6

Issue: 5

Electronic Location Identifier: e1000968

Affiliations

[1] California Institute for Quantitative Biosciences, Department of Molecular and Cellular Biology, University of California Berkeley, Berkeley, California, United States of America

[2] Evolutionary Systems Biology Group, SRI International, Menlo Park, California, United States of America

University of Michigan, United States of America

Author notes

* E-mail: nmarini@berkeley.edu (NJM); pdthomas@usc.edu (PDT)

[¤] Current address: Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America

Conceived and designed the experiments: NJM PDT JR. Performed the experiments: NJM. Analyzed the data: NJM PDT. Contributed reagents/materials/analysis tools: PDT. Wrote the paper: NJM PDT JR.

Article

Publisher ID: 09-PLGE-RA-2059R2

DOI: 10.1371/journal.pgen.1000968

PMC ID: 2877731

PubMed ID: 20523748

SO-VID: c9196f4a-163b-409c-8739-50b8a4caf424

Copyright © Marini et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 24 November 2009

Date accepted: 22 April 2010

Page count

Pages: 11
