This lesson is still being designed and assembled (Pre-Alpha version)

Nanopore - Variant Calling

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I use nanopore data for variant calling?

Objectives
  • Align reads from long-read data.

  • Perform variant calling using nanopore data.

Sequence alignment

Tools for generating alignments

Alignment with minimap2 on NeSI

Load the minimap2 module:

module load minimap2

Check that we’ve got a reference genome:

ls genome
e-coli-k12-MG1655.fasta

Load the SAMtools module:

module load SAMtools

Make a folder for our aligned data:

mkdir bam-files

Use minimap2 followed by samtools to generate aligned data in .bam format:

minimap2 -ax map-ont genome/e-coli-k12-MG1655.fasta \
   fastq_fastmodel/pass/*.gz | \
   samtools sort -o bam-files/ecoli-pass-aligned-sort.bam

Index the bam file:

samtools index bam-files/ecoli-pass-aligned-sort.bam

Generate basic mapping information using samtools:

samtools coverage bam-files/ecoli-pass-aligned-sort.bam
#rname      startpos     endpos   numreads   covbases    coverage    meandepth   meanbaseq   meanmapq
U00096.3           1    4641652      14273    4641652         100        28.93        17.3       59.8

Variant calling

NB - code below is taken from the “Variant Calling Workflow” section of the Genomic Data Carpentry course:

https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html

Load the BCFtools module:

module load BCFtools

Make a folder for the output data:

mkdir bcf-files

Generate a “pileup” file to enable variant calling (might take a little bit of time):

bcftools mpileup -B -Q5 --max-BQ 30 -I \
   -o bcf-files/ecoli-pass.bcf \
   -f genome/e-coli-k12-MG1655.fasta  \
   bam-files/ecoli-pass-aligned-sort.bam

Make a folder to hold vcf (variant call format) files:

mkdir vcf-files

Use bcftools to call variants in the e. coli data:

bcftools call --ploidy 1 -m -v \
   -o vcf-files/ecoli-variants.vcf \
   bcf-files/ecoli-pass.bcf 

Use vcfutils.pl to filter the variants (in this case, everything gets kept):

vcfutils.pl varFilter vcf-files/ecoli-variants.vcf  > vcf-files/ecoli-final-variants.vcf
more vcf-files/ecoli-final-variants.vcf

Key Points

  • Long-read based algorithms (e.g., minimap2) need to be used when aligning long read data.

  • Post-alignment, standard software tools (e.g., samtools, bcftools, etc) can be used to perform variant calling.