This lesson is still being designed and assembled (Pre-Alpha version)

Genome Assembly: Glossary

Key Points

Introduction
  • Understand the genome assembly workflow.

Using the slurm scheduler
  • The SLURM scheduler facilitates the fair distribution of compute resources among users.

Exploring input data and genome characteristics
  • While long-read data are beneficial for producing high-quality genome assemblies, short-read data can help us understand the underlying properties of the genome of the focal species.

  • It is important to consider characteristics such as genome size, ploidy, and heterozygosity before assembling a genome, as this may influence program choice, parameter-setting, and overall assembly quality.
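One common way to get a first estimate of genome size from short reads is to count k-mers and divide the total number of k-mer occurrences by the coverage peak of the k-mer spectrum. The sketch below is a minimal illustration of that arithmetic, not the lesson's own method. It assumes a histogram that maps multiplicity to the number of distinct k-mers seen that many times, and a hand-chosen `error_cutoff` below which k-mers are treated as sequencing errors. It also assumes a single (homozygous) coverage peak. Heterozygous diploid genomes usually show two peaks, which dedicated model-fitting tools handle far better than this simple ratio. The function name and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram: dict[int, int], error_cutoff: int) -> float:
    """Rough haploid genome size estimate from a k-mer count histogram.

    histogram: multiplicity -> number of distinct k-mers with that multiplicity.
    error_cutoff: hypothetical threshold; multiplicities below it are assumed
    to be sequencing errors and are discarded.
    Assumes one (homozygous) coverage peak.
    """
    # Drop low-multiplicity k-mers, which are mostly caused by sequencing errors
    kept = {m: n for m, n in histogram.items() if m >= error_cutoff}
    if not kept:
        raise ValueError("no k-mers at or above the error cutoff")
    # The multiplicity with the most distinct k-mers approximates k-mer coverage
    peak = max(kept, key=lambda m: kept[m])
    # Total k-mer occurrences = sum of multiplicity * number of distinct k-mers
    total_kmers = sum(m * n for m, n in kept.items())
    return total_kmers / peak


# Toy histogram (made-up numbers) with an error spike at multiplicity 1
toy = {1: 1000, 2: 100, 3: 50, 4: 100, 5: 200, 6: 100, 7: 50}
print(estimate_genome_size(toy, error_cutoff=3))  # 500.0
```

Here the peak is at multiplicity 5 and 2,500 k-mer occurrences remain after filtering, so the estimate is 2,500 / 5 = 500 bases. On real data the cutoff is usually read off the first trough of the plotted spectrum.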

Draft genome assembly
  • There is a wide variety of genome assembly programs available, each with its own strengths, weaknesses, and requirements. Becoming familiar with a range of assemblers by reading their manuals and associated publications can be helpful.

Assessing assembly quality
  • All genome assemblies are imperfect, but some are more imperfect than others. The quality of an assembly must be assessed using multiple lines of evidence.
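Contiguity is one such line of evidence, and it is often summarised by the N50: the length L such that contigs of length L or longer together contain at least half of the total assembly length. The sketch below shows the calculation on a hypothetical list of contig lengths. N50 says nothing about whether contigs are correct or whether the genome is complete. That is why it should be read alongside other evidence, such as gene-completeness checks (for example BUSCO) and read-mapping statistics.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly given its contig (or scaffold) lengths."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating assembled length
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that takes us to at least half the total
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


# Hypothetical assembly with seven contigs totalling 300 bp
print(n50([100, 80, 50, 30, 20, 10, 10]))  # 80
```

The two longest contigs (100 + 80 = 180 bp) already cover more than half of the 300 bp total, so the N50 is 80.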

Assembly polishing and post-processing
  • Multiple rounds of polishing with different data sets can improve the accuracy of a genome assembly. Polishing is particularly important when assemblies are produced from Nanopore data.

  • We have done a lot of work to get our genome assembly to this stage, but several downstream processes can further improve the assembly and make it more useful for biological research.

Glossary

FIXME