▶️ Running the Pipeline

Execute MosaiCatcher on your data using different compute environments.

Note

Verify your conda/mamba environment is properly configured
Check Snakemake is installed: which snakemake
Ensure your preferred profile is available in workflow/snakemake_profiles/

Local execution

Run the pipeline on your local machine or server:

Conda environments (recommended)

snakemake \
    --cores <N> \
    --config data_location=<DATA_FOLDER> \
    --sdm conda

Containerized execution (Apptainer)

snakemake \
    --cores <N> \
    --config \
        data_location=<DATA_FOLDER> \
        reference=hg38 \
    --sdm conda apptainer \
    --apptainer-args "-B /<disk>:/<disk>"

The container image is automatically selected based on the reference config parameter (e.g., hg38-v2.4.0). Images are pulled from GHCR on first run.

Replace: - <N> with the number of cores available - <DATA_FOLDER> with your data location - <disk> with your disk path (e.g., /g or /scratch at EMBL)

HPC execution (SLURM)

Run on HPC clusters using SLURM scheduling:

snakemake \
    --config \
        data_location=<DATA_FOLDER> \
    --profile workflow/snakemake_profiles/mosaicatcher-pipeline/v9/HPC/slurm_generic/ \
    --apptainer-args "-B /<disk>:/<disk>"

Configuration

Before first execution, modify the SLURM profile configuration for your cluster:

workflow/snakemake_profiles/mosaicatcher-pipeline/HPC/slurm_generic/config.yaml

Adjust memory, job time limits, and other cluster-specific parameters.

Out-of-Memory (OOM) handling

MosaiCatcher uses Snakemake's restart feature to automatically handle OOM errors. Memory allocation scales on each retry, with starting values and step sizes tuned per rule type:

Lightweight rules (indexing, QC): start at 1 GB, double on each retry
Medium rules (alignment, sorting): start at 4 GB, scale up to 32 GB
Heavy rules (segmentation, StrandPhaseR): start at 8 GB, scale up to 64 GB

Jobs are retried up to 5 times before failing. No manual intervention is needed in most cases.

EMBL HPC

EMBL users have a dedicated pre-configured profile. See the EMBL HPC guide for details on shared references, container cache, and job grouping.

--profile workflow/snakemake_profiles/mosaicatcher-pipeline/v9/HPC/slurm_EMBL_apptainer/

Next steps

See Quick Start for minimal examples
See Input Data Formats to prepare your data
See Advanced Modes for specialized analyses
See Parameters for configuration options