1. Quick Start
This chapter guides you through setting up the environment and executing the automated pipeline.
Throughout the following sections, whenever a parameter in curly braces {} is shown, replace it with your own filename or value. Each parameter is explained in detail in its section.
Notice the small “Copy to Clipboard” button on the right-hand side of each code chunk; it can be used to copy the code.
1.1 Singularity Setup
The workflow is distributed as a self-contained Singularity container image, which includes all necessary software dependencies and helper scripts.
Prerequisites: Singularity/Apptainer version 3.x or later must be installed on your system. If you are working on a High Performance Computing (HPC) system, it is likely already installed. Try running singularity --help in a terminal connected to the HPC system to see whether the command is recognized.
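As a quick check, a small shell snippet like the following reports which container runtime is available. This is a sketch: newer HPC installations often ship Apptainer, which may answer to either name.

```shell
# Report which container runtime is available, if any.
if command -v singularity >/dev/null 2>&1; then
  singularity --version
elif command -v apptainer >/dev/null 2>&1; then
  apptainer --version
else
  echo "Neither singularity nor apptainer found in PATH" >&2
fi
```

If neither command is found, ask your HPC administrators whether the module needs to be loaded first (e.g., via a module system).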
1.2 Download the Image
Download the required workflow image file (naam_workflow.sif) directly through the terminal:
wget https://github.com/EMC-Viroscience/nanopore-amplicon-analysis-manual/releases/latest/download/naam_workflow.sif

Alternatively, go to the GitHub page, download the file manually, and transfer it to your HPC system.
1.3 Verify Container
You can verify the download by checking the container version or starting an interactive shell.
# Check version
singularity run naam_workflow.sif --version
# Start interactive shell
singularity shell naam_workflow.sif

Running singularity shell naam_workflow.sif drops you into a shell inside the container. The conda environment needed for this workflow is activated automatically when the interactive shell starts, so all of its tools are ready to use.
Please note that you do not need to run conda activate {environment} to activate the environment; everything is bundled inside naam_workflow.sif. If you are curious about the conda environment we are using, you can check it out here.
1.4 Project Setup
We use a tool called Snakemake to automate the analysis. To simplify the creation of the required configuration files, the container includes a helper script called amplicon_project.py.
Required Configuration Files
Before running the setup script, ensure you have the following two files ready:
- virus_config.yaml: Defines parameters for the viruses you are analyzing.
- sample_map.csv: Links your barcode directories to the specific virus ID defined in the config.
Example virus_config.yaml:
sars-cov-2:
  # Paths to reference and primer files
  reference_genome: /abs/path/to/reference.fasta
  primer: /abs/path/to/primer.fasta
  primer_reference: /abs/path/to/primer_reference.fasta
  # Required analysis parameters
  min_length: 250
  coverage: 30
  primer_allowed_mismatch: 2
  # Optional workflow steps
  run_nextclade: true
  nextclade_dataset: 'nextstrain/sars-cov-2/wuhan-hu-1/orfs' # Official Nextclade-maintained dataset
measles:
  reference_genome: /path/to/reference.fasta
  primer: /path/to/primer.fasta
  primer_reference: /path/to/primer_reference.fasta
  min_length: 100
  coverage: 30
  primer_allowed_mismatch: 2
  run_nextclade: true
  nextclade_dataset: '/abs/path/to/custom/dataset' # Custom user-created dataset
mpox:
  reference_genome: /path/to/reference.fasta
  primer: /path/to/primer.fasta
  primer_reference: /path/to/primer_reference.fasta
  min_length: 1000
  coverage: 30
  primer_allowed_mismatch: 2
  run_nextclade: false # Nextclade will not run for this virus
  nextclade_dataset: null
# ... add other viruses if needed.

Key parameters within each virus entry include:
- reference_genome, primer, primer_reference: Absolute paths to the respective FASTA files.
- min_length: The minimum read length to keep after QC. Must be below the expected amplicon size.
- coverage: The minimum read depth required for consensus calling; 30x is a common minimum.
- primer_allowed_mismatch: The number of mismatches allowed when matching primers and determining their positions.
- run_nextclade: Set to true to enable the Nextclade analysis for this virus, false otherwise.
- nextclade_dataset: Path to a Nextclade dataset. This can be an official dataset name (the workflow will download it) or an absolute path to a custom dataset you have locally.
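A misspelled or omitted key is a common cause of setup failures. A quick grep-based sanity check can catch this before you run the setup script; this is a coarse sketch (it assumes the file is named virus_config.yaml as in the example above, and it looks for each key anywhere in the file rather than per virus entry).

```shell
# List any required key that never appears in the config file.
for key in reference_genome primer primer_reference min_length \
           coverage primer_allowed_mismatch run_nextclade nextclade_dataset; do
  grep -q "^[[:space:]]*${key}:" virus_config.yaml || echo "missing key: ${key}"
done
```

No output means all required keys were found at least once.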
Example sample_map.csv:
barcode_dir,virus_id
barcode01,sars-cov-2
barcode02,sars-cov-2
barcode03,measles
barcode04,measles
barcode05,mpox

- barcode_dir: Name of the barcode directory containing the raw fastq.gz files.
- virus_id: Virus name; must match one of the names in the virus config.
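Because the virus_id column must match the top-level names in virus_config.yaml exactly, a short cross-check can flag typos before the pipeline runs. This is a sketch, assuming the two file names from the examples above:

```shell
# Flag every virus_id in the sample map that has no matching
# top-level entry in the virus config.
tail -n +2 sample_map.csv | cut -d, -f2 | sort -u | while read -r vid; do
  grep -q "^${vid}:" virus_config.yaml || echo "no config entry for: ${vid}"
done
```

No output means every virus_id in the CSV has a corresponding config entry.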
Initializing the Project Directory
Use singularity exec to run the setup script. This will create your project folder and generate the sample.tsv and Snakefile required for the pipeline.
singularity exec \
--bind /mnt/viro0002-data:/mnt/viro0002-data \
--bind $HOME:$HOME \
--bind $PWD:$PWD \
naam_workflow.sif \
python /amplicon_project.py \
-p {project.folder} \
-n {name} \
-d {reads} \
--virus-config {virus_config.yaml} \
--sample-map {sample_map.csv}

Please use absolute paths for the reads, virus_config.yaml, and sample_map.csv so that they can always be located.
- {project.folder}: The new directory where results will be stored.
- {name}: Name of your study (no spaces).
- {reads}: Folder containing your barcode subdirectories (e.g., barcode01).
- {virus_config.yaml}: Path to your YAML config.
- {sample_map.csv}: Path to your CSV map.
The --bind arguments explicitly tell Singularity to mount the necessary host directories into the container. The part before the colon is the path on the host machine that you want to make available; the path after the colon is where that directory appears inside the container.
By default, Singularity often binds your home directory ($HOME) and the current directory ($PWD) automatically. We also explicitly bind /mnt/viro0002-data in this example. If your input files (reads, reference, databases) or output project directory reside outside these locations, you MUST add a specific --bind /host/path:/container/path option for each of them; otherwise the container will not be able to find them.
1.5 Executing the Pipeline
Once the project directory is initialized, navigate into it and run the workflow.
- Navigate to the project:
cd {project.folder}

This folder should contain your Snakefile and sample.tsv files, which were generated during step 1.4.
- Dry Run (Optional but Recommended): Check for errors without executing commands.
singularity exec \
--bind /mnt/viro0002-data:/mnt/viro0002-data \
--bind $HOME:$HOME \
--bind $PWD:$PWD \
naam_workflow.sif \
snakemake --snakefile Snakefile --cores 1 --dryrun

- Run the workflow: Remove --dryrun and set the number of threads.
singularity exec \
--bind /mnt/viro0002-data:/mnt/viro0002-data \
--bind $HOME:$HOME \
--bind $PWD:$PWD \
naam_workflow.sif \
snakemake --snakefile Snakefile --cores {threads}

Directory Structure: Upon completion, your project folder will contain a results/ directory with subfolders for QC, consensus, variants, and nextclade (if enabled).
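As a quick completeness check after the run, you can confirm that the expected subfolders exist. This is a sketch based on the description above; the exact folder names in your version of the workflow may differ, and nextclade/ is only present when Nextclade was enabled.

```shell
# Report any expected results subfolder that is missing.
for d in QC consensus variants; do
  [ -d "results/$d" ] || echo "missing: results/$d"
done
```

No output means all three subfolders are in place.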