Abstract
The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health
System couples high-throughput sequencing to an integrated health care system using
longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult
participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide
variants and insertion/deletion events, of which ~176,000 are predicted to result
in a loss of gene function. Linking these data to EHR-derived clinical phenotypes,
we find clinical associations supporting therapeutic targets, including genes encoding
drug targets for lipid lowering, and identify previously unreported rare alleles
associated with lipid levels and other blood level traits. About 3.5% of individuals
harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data
set provides a blueprint for large-scale precision medicine initiatives and genomics-guided
therapeutic discovery.
Lay Summary
Acute myeloid leukemia is a highly malignant hematopoietic tumor that affects about 13,000 adults yearly in the United States. The treatment of this disease has changed little in the past two decades, since most of the genetic events that initiate the disease remain undiscovered. Whole genome sequencing is now possible at a reasonable cost and timeframe to utilize this approach for unbiased discovery of tumor-specific somatic mutations that alter the protein-coding genes. Here we show the results obtained by sequencing a typical acute myeloid leukemia genome and its matched normal counterpart, obtained from the patient’s skin. We discovered 10 genes with acquired mutations; two were previously described mutations thought to contribute to tumor progression, and 8 were novel mutations present in virtually all tumor cells at presentation and relapse, whose function is not yet known. Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies.
We used massively parallel sequencing technology to sequence the genomic DNA of tumor and normal skin cells obtained from a patient with a typical presentation of FAB M1 Acute Myeloid Leukemia (AML) with normal cytogenetics. 32.7-fold ‘haploid’ coverage (98 billion bases) was obtained for the tumor genome, and 13.9-fold coverage (41.8 billion bases) was obtained for the normal sample. Of 2,647,695 well-supported Single Nucleotide Variants (SNVs) found in the tumor genome, 2,588,486 (97.7%) also were detected in the patient’s skin genome, limiting the number of variants that required further study. For the purposes of this initial study, we restricted our downstream analysis to the coding sequences of annotated genes: we found only eight heterozygous, non-synonymous somatic SNVs in the entire genome. All were novel, including mutations in protocadherin/cadherin family members (CDH24 and PCLKC), G-protein coupled receptors (GPR123 and EBI2), a protein phosphatase (PTPRT), a potential guanine nucleotide exchange factor (KNDC1), a peptide/drug transporter (SLC15A1), and a glutamate receptor gene (GRINL1B). We also detected previously described, recurrent somatic insertions in the FLT3 and NPM1 genes. Based on deep readcount data, we determined that all of these mutations (except FLT3) were present in nearly all tumor cells at presentation, and again at relapse 11 months later, suggesting that the patient had a single dominant clone containing all of the mutations. These results demonstrate the power of whole genome sequencing to discover novel cancer-associated mutations.
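The coverage and variant counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not taken from the study: it assumes that fold coverage is total sequenced bases divided by haploid genome size, and the function and constant names are hypothetical. The numeric inputs are copied from the abstract.

```python
# Illustrative arithmetic only; all figures are copied from the abstract above.
# Assumption: fold coverage = total sequenced bases / haploid genome size.

def implied_haploid_size(total_bases: float, fold_coverage: float) -> float:
    """Return the haploid genome size implied by a bases/coverage pair."""
    return total_bases / fold_coverage

# Figures quoted in the abstract (names are hypothetical).
TUMOR_BASES = 98e9        # 98 billion bases
TUMOR_COVERAGE = 32.7     # 32.7-fold haploid coverage
NORMAL_BASES = 41.8e9     # 41.8 billion bases
NORMAL_COVERAGE = 13.9    # 13.9-fold coverage
TUMOR_SNVS = 2_647_695    # well-supported SNVs in the tumor genome
SHARED_SNVS = 2_588_486   # of those, also detected in the skin genome

# SNVs seen in the tumor but not in the matched skin sample.
tumor_only_snvs = TUMOR_SNVS - SHARED_SNVS

if __name__ == "__main__":
    tumor_size = implied_haploid_size(TUMOR_BASES, TUMOR_COVERAGE)
    normal_size = implied_haploid_size(NORMAL_BASES, NORMAL_COVERAGE)
    print(f"Implied haploid size (tumor):  {tumor_size / 1e9:.2f} Gb")
    print(f"Implied haploid size (normal): {normal_size / 1e9:.2f} Gb")
    print(f"Tumor-only SNVs: {tumor_only_snvs:,}")
```

Both bases/coverage pairs imply a haploid genome of roughly 3 billion bases, which is consistent with the size of the human genome, and the subtraction shows how matching against the skin genome narrows the set of variants that needed further study.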
Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.
Genes in which germline mutations confer highly or moderately increased risks of cancer are called cancer predisposition genes. More than 100 of these genes have been identified, providing important scientific insights in many areas, particularly the mechanisms of cancer causation. Moreover, clinical utilization of cancer predisposition genes has had a substantial impact on diagnosis, optimized management and prevention of cancer. The recent transformative advances in DNA sequencing hold the promise of many more cancer predisposition gene discoveries, and greater and broader clinical applications. However, there is also considerable potential for incorrect inferences and inappropriate clinical applications. Realizing the promise of cancer predisposition genes for science and medicine will thus require careful navigation.