02 - Setup

Running on NeSI vs on your computer

During this workshop we will run the material on the NeSI platform using the Jupyter interface. It is also possible to run the material locally on your own machine.

One difference between running on NeSI and running on your own machine is that on NeSI we preinstall popular software and make it available to our users, whereas on your own machine you need to install the software yourself (e.g. using a package manager such as conda).

We have provided a guide for setting up your own machine using conda. Note that we will not be able to provide assistance during the workshop if you decide to take this approach.

Connect to Jupyter on NeSI

Connect to https://jupyter.nesi.org.nz
Enter your NeSI username, HPC password and 6-digit second factor token (as set on MyNeSI)
Choose the server options as follows:
make sure to choose the correct project code (nesi02659), 4 CPUs and 4 GB of memory before pressing the button.

Start a terminal session from the JupyterLab launcher

Create a working directory

When you connect to NeSI JupyterLab, you always start in a new hidden directory. To make sure you can find your work next time, you should change to another location. Here we will switch to our project directory, since home directories can run out of space quickly. If you are using your own project, use that instead of "nesi02659".

cd ~/obss_2023/intro_snakemake

You can also navigate to the above directory in the JupyterLab file browser, which can be useful for editing files and viewing images and html documents.
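If the working directory does not exist yet, `cd` will fail. The following is a minimal sketch that creates the directory first and then changes into it. The `WORKDIR` variable is an assumption introduced here for illustration; it defaults to the workshop path used above and can be pointed at your own project location instead.

```shell
# WORKDIR is a hypothetical variable, not part of the workshop material.
# It falls back to the workshop path if it is not already set.
WORKDIR="${WORKDIR:-$HOME/obss_2023/intro_snakemake}"

# -p creates missing parent directories and does nothing if the path exists
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Confirm where we ended up
pwd
```

Using `mkdir -p` means the same commands can be rerun safely at the start of each session.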

Load the Snakemake module

We use "environment modules" on NeSI to manage installed software. This allows you to pick and choose which software is available in your environment. More details about environment modules can be found on the NeSI support page.

The JupyterLab terminal comes with some modules preloaded and it can often be nicer to start with a clean environment:

module purge

We can search for available Snakemake modules using the module spider command:

module spider snakemake

This shows that many versions of Snakemake are installed. Now load a specific version into your environment:

module load snakemake/7.32.3-gimkl-2022a-Python-3.11.3

Test that the snakemake command is now available by running the following command:

snakemake --version

It should print the version of Snakemake that you loaded, e.g. "7.32.3".

You can also run module list to see the list of modules that are currently loaded.
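The check above can also be wrapped in a small guard, which may be handy at the top of a script so that it reports a clear message when the module has not been loaded. This is only a sketch: `have_cmd` is a hypothetical helper name, built on the standard `command -v` shell builtin.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd snakemake; then
    echo "snakemake found, version: $(snakemake --version)"
else
    # Not fatal here; we only report the problem on standard error
    echo "snakemake not found; try loading the module again" >&2
fi
```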

Get the data

We'll use the data from yesterday's DNA variant calling workshop:

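# Work from the workshop directory
cd ~/obss_2023/intro_snakemake
# -p avoids an error if data/ already exists
mkdir -p data
# Download (-L follows redirects) and unpack the workshop data
curl -L -o sub.tar.gz https://ndownloader.figshare.com/files/14418248
tar -xzf sub.tar.gz
mv sub/ data/trimmed_fastq_small
# Copy the reference genome from the variant calling workshop
cp -r ~/obss_2023/genomic_dna/data/ref_genome data/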
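After running these commands it can be worth confirming that both directories ended up where the later steps expect them. A minimal sketch, assuming you are still in the working directory; `check_dir` is a hypothetical helper name introduced for this example.

```shell
# check_dir is a hypothetical helper: reports whether a directory exists
check_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# '|| true' keeps going so that both directories are always reported
check_dir data/trimmed_fastq_small || true
check_dir data/ref_genome || true
```

If either line reports "missing", rerun the corresponding download or copy step before continuing.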

Initialise Git (optional but recommended)

As we're going to be incrementally developing our scripts, we can also take the opportunity to place them under version control from the start. Remember to ignore the data directory:

git init
echo "data/" >> .gitignore

Once you create your scripts and get them to work, remember to add and commit them so that you have snapshots to fall back to if needed.
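The add-and-commit cycle can be sketched as follows. To avoid touching real work, the example runs in a scratch repository created with `mktemp`. The file name `Snakefile`, the commit message, and the user name and email are all placeholders chosen for illustration.

```shell
# Scratch repository so the sketch does not affect your real working directory
demo=$(mktemp -d)
cd "$demo"
git init -q

# Placeholder identity, set for this repository only
git config user.name "Workshop User"
git config user.email "user@example.com"

# Ignore the data directory, as above
echo "data/" >> .gitignore
mkdir data
touch data/example.fastq

# Placeholder script standing in for your own work
echo "# placeholder workflow" > Snakefile

# Stage and commit a snapshot
git add .gitignore Snakefile
git commit -q -m "Add initial Snakefile"

# Ignored files under data/ are not listed here
git status --short
git log --oneline
```

In your own working directory only the `git add` and `git commit` lines are needed each time you reach a working state.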
