Sebastian M. Waszak1,2,3,@, Yehudit Hasin1, Thomas Zichner3, Tsviya Olender1, Ifat Keydar1, Miriam Khen1, Adrian M. Stütz3,
Andreas Schlattl3, Doron Lancet1, and Jan O. Korbel3,4
1Department of Molecular Genetics, Crown Human Genome Center, Weizmann Institute of Science, Rehovot 76100, Israel
2Department of Biotechnology and Bioinformatics, Weihenstephan-Triesdorf University of Applied Sciences, Freising 85350, Germany
3Genome Biology Research Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
4European Bioinformatics Institute, EMBL-EBI, Hinxton, Cambridge CB10 1SD, UK
@Present address: Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne 1015, Switzerland
PLoS Computational Biology 2010 Nov 11; 6(11):e1000988. doi:10.1371/journal.pcbi.1000988
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignment of integer locus copy-numbers (i.e., copy-number genotypes), which for example enables discrimination of homozygous from heterozygous CNVs, has remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can integrate CNV-analysis approaches based on paired-end and breakpoint-junction analysis, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low coverage. The inferred copy-number genotypes were highly concordant with our qPCR experiments (Pearson correlation coefficient 0.94) and with the published results of two microarray platforms (95-99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the more than 800 CNV-enriched human olfactory receptor (OR) gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci and two to nine copies in others. Among the genetic variants affecting OR loci, we identified deleterious variants, including CNVs and SNPs affecting ~15% and ~20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying that the OR repertoire needs to be revised for future functional studies.
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
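As a rough illustration of the depth-of-coverage idea described above, the following minimal Python sketch converts a read-depth ratio into an integer copy-number genotype. It is not CopySeq's statistical framework: the function name `estimate_copy_number`, its parameters, the median-based baseline, and the simple rounding rule are all assumptions made for this example only. It assumes that read depth scales linearly with copy number and that the control regions are diploid.

```python
from statistics import median


def estimate_copy_number(region_depth, control_depths, ploidy=2, max_copies=12):
    """Toy depth-of-coverage genotyper (hypothetical, not the CopySeq model).

    region_depth   -- mean read depth over the CNV region of interest
    control_depths -- mean read depths of regions assumed to be diploid
    ploidy         -- copy number assumed for the control regions
    max_copies     -- arbitrary upper bound on the reported genotype

    Returns (integer_genotype, raw_estimate).
    """
    baseline = median(control_depths)  # depth expected for `ploidy` copies
    if baseline <= 0:
        raise ValueError("control regions must have positive read depth")

    # Assumption: depth is proportional to the number of copies present.
    raw = ploidy * region_depth / baseline

    # Snap to the nearest integer and clamp to the range [0, max_copies].
    genotype = min(max_copies, max(0, round(raw)))
    return genotype, raw


if __name__ == "__main__":
    controls = [29.0, 30.0, 31.0]  # made-up depths for illustration
    for depth in (0.0, 15.0, 30.0, 61.0):
        genotype, raw = estimate_copy_number(depth, controls)
        print(f"depth={depth:5.1f}  raw={raw:4.2f}  genotype={genotype}")
```

In this sketch a region with half the baseline depth is called as one copy (a heterozygous deletion) and a region with no reads as zero copies (a homozygous deletion). At low coverage the raw estimate is noisy, which is why a statistical framework rather than plain rounding is needed in practice.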
Rozowsky et al. (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotechnology 27:66.
McCarroll et al. (2008) Integrated detection and population-genetic analysis of SNPs and copy number variation. Nature Genetics 40:1166.
Please visit the CopySeq Google group for news on updates, and feel free to post bug reports and questions that may be of general interest to other users. For further information, please contact Sebastian Waszak or Jan Korbel.
Last update: 06.04.2011