This lesson is still being designed and assembled (Pre-Alpha version)

Nanopore - Variant Calling

Overview

Teaching: 30 min
Exercises: 30 min
Questions
  • How can I use nanopore data for variant calling?

Objectives
  • Align reads from long-read data.

  • Perform variant calling using nanopore data.

Sequence alignment

Tools for generating alignments

Alignment with dorado and minimap2 on NeSI

Check that we’ve got a reference genome:

ls genome
e-coli-k12-MG1655.fasta

Load the SAMtools module:

module load SAMtools

Load the Dorado module:

module load Dorado

Make a folder for our aligned data:

mkdir bam-aligned

We can use dorado again to align the reads to our reference genome (note, this can also be done as part of the basecalling process, but that means you’re performing alignemnt prior to having done any QC checking).

dorado uses the minimap2 aligner (another software tool from ONT). The syntax for performing alignemnt on basecalled data is (-t 4 specifies four threads: faster):

dorado aligner -t 4 \
  genome/e-coli-k12-MG1655.fasta \
  bam-unaligned/ecoli-pod5-pass-basecalls.bam | \
  samtools sort -o bam-aligned/ecoli-pod5-pass-aligned-sort.bam 

Index the bam file:


samtools index bam-aligned/ecoli-pod5-pass-aligned-sort.bam

Generate basic mapping information using samtools:

samtools coverage bam-aligned/ecoli-pod5-pass-aligned-sort.bam
#rname      startpos     endpos   numreads   covbases    coverage    meandepth   meanbaseq   meanmapq
U00096.3           1    4641652      12139    4641652         100      24.8486        15.4       59.5

Variant calling

NB - code below is taken from the “Variant Calling Workflow” section of the Genomic Data Carpentry course:

https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html

Load the BCFtools module:

module load BCFtools

Make a folder for the output data:

mkdir bcf-files

Generate a “pileup” file to enable variant calling (might take a little bit of time):

bcftools mpileup --threads 4 \
   -B -Q5 --max-BQ 30 -I \
   -o bcf-files/ecoli-pass.bcf \
   -f genome/e-coli-k12-MG1655.fasta  \
   bam-aligned/ecoli-pod5-pass-aligned-sort.bam

Make a folder to hold vcf (variant call format) files:

mkdir vcf-files

Use bcftools to call variants in the e. coli data:

bcftools call --ploidy 1 -m -v \
   -o vcf-files/ecoli-variants.vcf \
   bcf-files/ecoli-pass.bcf 

Use vcfutils.pl to filter the variants (in this case, everything gets kept):

vcfutils.pl varFilter vcf-files/ecoli-variants.vcf  > vcf-files/ecoli-final-variants.vcf
more vcf-files/ecoli-final-variants.vcf

Key Points

  • Long-read based algorithms (e.g., minimap2) need to be used when aligning long read data.

  • Post-alignment, standard software tools (e.g., samtools, bcftools, etc) can be used to perform variant calling.