Nanopore - Variant Calling

Overview

Teaching: 10 min
Exercises: 10 min

Questions

How can I use nanopore data for variant calling?

Objectives

Align reads from long-read data.

Perform variant calling using nanopore data.

Sequence alignment

If a “reference” genome exists for the organism you are sequencing, reads can be “aligned” to the reference.
This involves finding the place in the reference genome that each read matches to.
Due to high sequence similarity within members of the same species, most reads should map to the reference.

Tools for generating alignments

There are MANY software packages available for aligning data from next generation sequencing experiments.
For short-read data e.g., Illumina sequencing), the two “original” short read aligners were:
- BWA: http://bio-bwa.sourceforge.net
- Bowtie: http://bowtie-bio.sourceforge.net
Over the years, MANY more “aligners” have been developed.
For long-read data, ONT provides the Minimap2 aligner (but there are also a number of other options).

Alignment with `minimap2` on NeSI

Load the minimap2 module:

module load minimap2

Check that we’ve got a reference genome:

ls genome

e-coli-k12-MG1655.fasta

Load the SAMtools module:

module load SAMtools

Make a folder for our aligned data:

mkdir bam-files

Use minimap2 followed by samtools to generate aligned data in .bam format:

minimap2 -ax map-ont genome/e-coli-k12-MG1655.fasta \
   fastq_fastmodel/pass/*.gz | \
   samtools sort -o bam-files/ecoli-pass-aligned-sort.bam

Index the bam file:

samtools index bam-files/ecoli-pass-aligned-sort.bam

Generate basic mapping information using samtools:

samtools coverage bam-files/ecoli-pass-aligned-sort.bam

#rname      startpos     endpos   numreads   covbases    coverage    meandepth   meanbaseq   meanmapq
U00096.3           1    4641652      14273    4641652         100        28.93        17.3       59.8

Variant calling

NB - code below is taken from the “Variant Calling Workflow” section of the Genomic Data Carpentry course:

https://datacarpentry.org/wrangling-genomics/04-variant_calling/index.html

Load the BCFtools module:

module load BCFtools

Make a folder for the output data:

mkdir bcf-files

Generate a “pileup” file to enable variant calling (might take a little bit of time):

bcftools mpileup -B -Q5 --max-BQ 30 -I \
   -o bcf-files/ecoli-pass.bcf \
   -f genome/e-coli-k12-MG1655.fasta  \
   bam-files/ecoli-pass-aligned-sort.bam

Make a folder to hold vcf (variant call format) files:

mkdir vcf-files

Use bcftools to call variants in the e. coli data:

bcftools call --ploidy 1 -m -v \
   -o vcf-files/ecoli-variants.vcf \
   bcf-files/ecoli-pass.bcf 

Use vcfutils.pl to filter the variants (in this case, everything gets kept):

vcfutils.pl varFilter vcf-files/ecoli-variants.vcf  > vcf-files/ecoli-final-variants.vcf

more vcf-files/ecoli-final-variants.vcf

Key Points

Long-read based algorithms (e.g., minimap2) need to be used when aligning long read data.

Post-alignment, standard software tools (e.g., samtools, bcftools, etc) can be used to perform variant calling.

previous episode

Nanopore Sequencing

next episode

Nanopore - Variant Calling

Overview

Sequence alignment

Tools for generating alignments

Alignment with `minimap2` on NeSI

Variant calling

Key Points

previous episode

next episode

previous episode

Nanopore Sequencing

next episode

Nanopore - Variant Calling

Overview

Sequence alignment

Tools for generating alignments

Alignment with minimap2 on NeSI

Variant calling

Key Points

previous episode

next episode

Alignment with `minimap2` on NeSI