Calling variants in sequencing data

This is an extremely simplified tutorial on how to call variants in NGS data. The only assumption is that you have paired-end sequencing data (file_1.fastq and file_2.fastq) of some species with a reference genome, where I assume yeast (sacCer3) below.

# All commands below are based on Linux (

# Download the genome
wget ''
tar -xzf chromFa.tar.gz
cat *.fa > sacCer3.fa

# Install BWA
# Create a genome index
bwa index -a bwtsw sacCer3.fa

# Map sequences to the genome
bwa aln sacCer3.fa file_1.fastq > aln_1.sai
bwa aln sacCer3.fa file_2.fastq > aln_2.sai
bwa sampe sacCer3.fa aln_1.sai aln_2.sai file_1.fastq file_2.fastq > aln.sam

# Install samtools
# Convert alignments to bam
samtools view -S -b aln.sam > aln.bam

# Sort BAM file (will create aln_sort.bam)
samtools sort aln.bam aln_sort

# Index BAM file
samtools index aln_sort.bam

# Call variants and convert results to VCF
# See:
# See:
samtools mpileup -uf sacCer3.fa aln_sort.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | perl varFilter -D10000 > var.flt.vcf
head -n 50 var.flt.vcf

# Annotate variants
# Please see Annovar:
# or snpEff:
# Here is a short example showing Ensembl gene annotations with Annovar:

# Convert vcf to annovar format
perl -format vcf4 var.flt.vcf > var.annovar

# Download the Ensembl gene annotation for sacCer3
perl -buildver sacCer3 -downdb ensGene sacCer3db
perl -buildver sacCer3 -downdb seq sacCer3db/sacCer3_seq
perl sacCer3db/sacCer3_ensGene.txt -seqdir sacCer3db/sacCer3_seq -format ensGene -outfile sacCer3db/sacCer3_ensGeneMrna.fa

# Annotate the SNVs
perl -geneanno -buildver sacCer3 -dbtype ensGene var.annovar <path_to_annovar>/sacCer3db/

# Merging vcf files
# Please see VCFtools
# and tabix for details.
# Here is a short example of how to merge two vcf files called var1.vcf and var2.vcf:

vcf-sort var1.vcf > var1.sort.vcf
bgzip var1.sort.vcf
tabix -p vcf var1.sort.vcf.gz
vcf-sort var2.vcf > var2.sort.vcf
bgzip var2.sort.vcf
tabix -p vcf var2.sort.vcf.gz
export PATH=${PATH}:<path_to_tabix>
perl -I<path_to_vcftools_perl_directory> vcf-merge var1.sort.vcf.gz var2.sort.vcf.gz > merged.vcf

# Further information:
# SEQanswers:
# Samtools:
# VCFtools:
# Picard:
# Annovar:
# snpEff:

Valid HTML 4.01 Transitional

CSS ist valide!

*EMBL Disclaimer
Author: Tobias Rausch
Last change: 20th November 2012
Main Page of the EMBL-Heidelberg