This lesson is still being designed and assembled (Pre-Alpha version)

Nanopore Sequencing: Setup

Compute platform

This workshop is designed to be conducted on the NeSI compute infrastructure. All software and data are already set up for you to use during the workshop.

Software used

If you are attempting to work through this material on a non-NeSI compute system, the following software will need to be installed:

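| Software | Version | Manual | Description |
|----------|---------|--------|-------------|
| BCFtools | 1.16 | link | Variant calling |
| bioawk | 20110810 | link | awk for biological data |
| FastQC | 0.12.1 | link | Read QA/QC |
| dorado | 6.2.1 | link | GPU basecaller for ONT POD5/FAST5 data |
| NanoPlot | 1.41.0 | link | QA/QC for ONT reads |
| SAMtools | 1.16.1 | link | Processing read alignments (SAM/BAM) |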
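
On a non-NeSI system, a quick way to see which of these tools are already installed is to look for their executables on the PATH. The following is a minimal sketch; the executable names are assumptions based on how each tool is usually installed, and the function name check_tools is hypothetical.

```shell
# Report, for each name given, whether an executable of that name is on the PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Executable names are assumptions; adjust them to match your installation.
check_tools bcftools bioawk fastqc dorado NanoPlot samtools || true
```

This only confirms that each tool is present, not that its version matches the table above.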

Data availability

The data utilised in this workshop are publicly available but date from 2016, so they are not a particularly good demonstration of the quality or volume of data that current ONT sequencing devices are able to produce.

See the following link for a description of the data (and a rather nostalgic look at the excitement associated with ONT’s “new” version 9 chemistry):

http://lab.loman.net/2016/07/30/nanopore-r9-data-release/

To reduce file sizes, the workshop uses only a subset of the reads from this release. The full dataset can be downloaded from the following URL, which is also given in the post linked above:

https://s3.climb.ac.uk/nanopore/E_coli_K12_1D_R9.2_SpotON_2.tgz
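
Fetching and unpacking the archive could be done along the following lines. This is a sketch only: it assumes curl and tar are available, and the function names and the destination directory are hypothetical.

```shell
# URL copied from the text above; DEST_DIR is a hypothetical output directory.
DATA_URL="https://s3.climb.ac.uk/nanopore/E_coli_K12_1D_R9.2_SpotON_2.tgz"
DEST_DIR="ecoli_r9_raw"

# Download the file at URL $1 into the current directory, keeping its remote
# name and resuming a partial download if one exists.
download_archive() {
    curl --location --fail --continue-at - --remote-name "$1"
}

# Unpack the gzipped tar archive $1 into directory $2, creating it if needed.
extract_archive() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}
```

For example, running `download_archive "$DATA_URL"` followed by `extract_archive "$(basename "$DATA_URL")" "$DEST_DIR"` would fetch and unpack the full dataset. The full archive is likely to be large, so check the available disk space first.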

Note that the original data were stored as one fast5 file per read, whereas the fast5 format currently used by ONT stores multiple reads per file. The command single_to_multi_fast5 from the ont_fast5_api package:

https://github.com/nanoporetech/ont_fast5_api

was used here to convert the single-read fast5 data to multi-read fast5.
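
The exact invocation used for the workshop data is not recorded here, but a conversion of this kind might look like the sketch below. The flags are those documented in the ont_fast5_api README; the function name, the filename base, and the batch size (reads per output file) are illustrative assumptions.

```shell
# Merge the single-read fast5 files found under directory $1 (searched
# recursively) into multi-read fast5 files written to directory $2.
# "batch" and 4000 are illustrative values, not the workshop's settings.
merge_single_fast5() {
    single_to_multi_fast5 \
        --input_path "$1" \
        --save_path "$2" \
        --filename_base batch \
        --batch_size 4000 \
        --recursive
}
```

Calling `merge_single_fast5 fast5_single fast5_multi` (both directory names hypothetical) would write the merged files into fast5_multi.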

The fast5 files were then converted to pod5 format via ONT’s pod5 tool:

https://github.com/nanoporetech/pod5-file-format
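
This final step could be sketched as follows, assuming a recent release of the pod5 Python package (installable with `pip install pod5`), which provides the `pod5 convert fast5` subcommand. The function name and the file names are hypothetical, and the settings used for the workshop data may have differed.

```shell
# Convert every .fast5 file in directory $1 into a single pod5 file $2.
# Stops early with a message if the directory holds no .fast5 files.
fast5_dir_to_pod5() {
    in_dir=$1
    out_file=$2
    # ls exits non-zero when the glob matches nothing.
    if ! ls "$in_dir"/*.fast5 >/dev/null 2>&1; then
        echo "no .fast5 files found in $in_dir" >&2
        return 1
    fi
    pod5 convert fast5 "$in_dir"/*.fast5 --output "$out_file"
}
```

For instance, `fast5_dir_to_pod5 fast5_multi reads.pod5` would write all reads from the multi-read fast5 files into reads.pod5.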