A remaining hurdle to whole-genome sequencing (WGS) becoming a first-tier genetic
test has been accurate detection of copy-number variations (CNVs). Here, we used several
datasets to empirically develop a detailed workflow for identifying germline CNVs
>1 kb from short-read WGS data using read depth-based algorithms. Our workflow is
comprehensive in that it addresses all stages of the CNV-detection process, including
DNA library preparation, sequencing, quality control, reference mapping, and computational
CNV identification. We used our workflow to detect rare, genic CNVs in individuals
with autism spectrum disorder (ASD), and 120/120 such CNVs tested using orthogonal
methods were successfully confirmed. We also identified 71 putative genic de novo
CNVs in this cohort, which had a confirmation rate of 70%; the remainder were incorrectly
identified as de novo due to false positives in the proband (7%) or parental false
negatives (23%). In individuals with an ASD diagnosis for whom both microarray and
WGS experiments were performed, our workflow detected all clinically relevant CNVs
identified by microarrays, as well as additional potentially pathogenic CNVs <20
kb. Thus, CNVs of clinical relevance can be discovered from WGS with a detection rate
exceeding that of microarrays, positioning WGS as a single assay for genetic variation detection.
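
To illustrate the general idea behind read depth-based CNV identification, the following is a minimal sketch and not the workflow described here. It assumes that per-bin mean read depths have already been computed from mapped reads, that the sample-wide median depth represents the diploid state, and that a bin size of 1 kb is appropriate. The function name `call_cnv_segments` and all of its parameters are hypothetical. Real callers additionally correct for GC content and mappability and use statistical segmentation rather than simple rounding.

```python
from statistics import median


def call_cnv_segments(bin_depths, bin_size=1000, ploidy=2, min_bins=1):
    """Return (start, end, copy_number) tuples for runs of consecutive bins
    whose estimated copy number differs from the expected ploidy.

    bin_depths: mean read depth per fixed-size bin along one chromosome
                (assumed to be precomputed; hypothetical input format).
    bin_size:   bin width in base pairs (assumed value, not from the source).
    min_bins:   minimum run length, in bins, for a segment to be reported.
    """
    baseline = median(bin_depths)  # assumed to reflect the diploid depth
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    # Depth scales with copy number, so depth / baseline * ploidy estimates it.
    copy_numbers = [round(ploidy * depth / baseline) for depth in bin_depths]

    segments = []
    run_start = None
    run_cn = None
    # A trailing sentinel at normal ploidy closes any run that reaches the end.
    for i, cn in enumerate(copy_numbers + [ploidy]):
        if run_start is not None and cn != run_cn:
            if i - run_start >= min_bins:
                segments.append((run_start * bin_size, i * bin_size, run_cn))
            run_start = None
        if cn != ploidy and run_start is None:
            run_start, run_cn = i, cn
    return segments
```

With 1 kb bins, a run of bins at roughly half the median depth is reported as a single-copy deletion, and a run at roughly 1.5 times the median depth as a three-copy duplication. Raising `min_bins` trades sensitivity to short events for fewer spurious single-bin calls.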