Parameters
MosaiCatcher arguments
How to pass configuration arguments?
All of these arguments can be specified in two ways:

- In the config/config.yaml file, by replacing the existing values
- Using the --config Snakemake argument (--config must be passed only once, with all arguments listed after it, e.g. --config input_bam_location=<INPUT> output_location=<OUTPUT> email=<EMAIL>), as shown in the sketch below
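For illustration, the sketch below passes several configuration overrides in a single --config call. The paths and the email address are hypothetical placeholders; replace them with your own values.

```bash
# A minimal sketch: all overrides follow a single --config flag.
# /path/to/input, /path/to/output and user@example.com are hypothetical
# placeholders, not values defined by the pipeline.
snakemake \
    --cores 4 \
    --use-conda \
    --config \
        input_bam_location=/path/to/input \
        output_location=/path/to/output \
        email=user@example.com
```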
General parameters
Parameter | Comment | Default | Example |
---|---|---|---|
email | Email address for completion summary | None | None |
Data location & Input/output options
Parameter | Comment | Parameter type | Default |
---|---|---|---|
data_location | Path to parent folder containing samples | String | .tests/data_CHR17/ |
samples_to_process | If multiple plates are present in the data_location parent folder, specify one sample or a comma-separated list of samples | None | "[SampleA,SampleB]" |
publishdir | Path to backup location where important data is copied | String | |
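As a hedged example, the command below points the pipeline at a parent folder containing two plates and restricts processing to those two samples; the folder paths are hypothetical, and publishdir is optional.

```bash
# Hypothetical layout: /data/strandseq contains the plates SampleA and SampleB.
# publishdir only defines a backup location to which important data is copied.
snakemake \
    --cores 8 \
    --use-conda \
    --config \
        data_location=/data/strandseq \
        samples_to_process="[SampleA,SampleB]" \
        publishdir=/backup/strandseq
```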
Ashleys-QC upstream pipeline
Parameter | Comment | Parameter type | Default |
---|---|---|---|
input_bam_legacy | Mutually exclusive with ashleys_pipeline. Uses the selected folder to identify high-quality libraries to process | Boolean | False |
ashleys_pipeline | Loads and uses the ashleys-qc-pipeline Snakemake preprocessing module and starts from FASTQ inputs | Boolean | False |
ashleys_threshold | Threshold for Ashleys-qc binary classification | Float | 0.5 |
bypass_ashleys | Sets all cells as high-quality (labels set to 1) | Boolean | False |
MultiQC | Enables or disables MultiQC analysis (includes FastQC, samtools flagstat & idxstats) | Boolean | False |
hand_selection | Enables or disables hand selection through the Jupyter Notebook | Boolean | False |
split_qc_plot | Enables or disables splitting the QC plot into individual per-page plots | Boolean | False |
paired_end | Enables or disables the use of paired-end data | Boolean | False |
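For instance, a run starting from FASTQ inputs with the ashleys-qc preprocessing module enabled might look like the sketch below; the data path is hypothetical and the threshold simply makes the default explicit.

```bash
# Start from FASTQ files and let the ashleys-qc-pipeline module perform QC;
# cells scoring below ashleys_threshold are classified as low quality.
snakemake \
    --cores 8 \
    --use-conda \
    --config \
        ashleys_pipeline=True \
        ashleys_threshold=0.5 \
        data_location=/data/strandseq_fastq
```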
Reference data & Chromosomes
Parameter | Comment | Default (options) |
---|---|---|
reference | Reference genome | hg38 (hg19, T2T, mm10) |
chromosomes | List of chromosomes to be processed | Human: chr[1..22,X,Y]; Mouse: chr[1..19,X,Y] |
chromosomes_to_exclude | List of chromosomes to exclude | [] |
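As an illustration, individual chromosomes can be excluded without rewriting the whole chromosome list. The sketch below assumes a human hg38 run and simply drops chrY; the list syntax follows the same convention as the tables above.

```bash
# Exclude chrY from an hg38 run; the remaining chromosomes keep the default list.
snakemake \
    --cores 8 \
    --use-conda \
    --config \
        reference=hg38 \
        chromosomes_to_exclude="[chrY]"
```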
Counts configuration
Parameter | Comment | Default |
---|---|---|
window | Window size used for binning by mosaic count (can be of high importance depending on library coverage) | 100000 |
blacklist_regions | Enables/disables blacklisting | True |
Optional modules
Parameter | Comment | Default |
---|---|---|
multistep_normalisation | Performs multistep normalisation, including GC correction, for visualization (Marco Cosenza). | False |
breakpointR | Enables the breakpointR module to compute breakpoints on Strand-Seq data (David Porubsky). | False |
Targeted execution
Parameter | Comment | Default |
---|---|---|
ashleys_pipeline_only | Stops the execution after the ashleys-qc-pipeline submodule. Requires ashleys_pipeline to be True | False |
breakpointR_only | Stops the execution after the breakpointR submodule. Requires breakpointR to be True | False |
whatshap_only | Stops the execution after the whatshap submodule (haplotagging of BAM files). | False |
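For example, to run only the preprocessing part and stop before SV calling, the two flags below can be combined; the data path is a hypothetical placeholder.

```bash
# Stop after the ashleys-qc-pipeline submodule; requires ashleys_pipeline=True.
snakemake \
    --cores 8 \
    --use-conda \
    --config \
        ashleys_pipeline=True \
        ashleys_pipeline_only=True \
        data_location=/data/strandseq_fastq
```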
SV calling parameters
Parameter | Comment | Default |
---|---|---|
multistep_normalisation_for_SV_calling | Uses the multistep normalisation count file during SV calling (Marco Cosenza). | False |
hgsvc_based_normalized_counts | Uses HGSVC-based normalisation. | True |
SV calling algorithm processing options
Parameter | Comment | Default |
---|---|---|
min_diff_jointseg | Minimum difference in error term to include another breakpoint in the joint segmentation (default=0.5) | 0.1 |
min_diff_singleseg | Minimum difference in error term to include another breakpoint in the single-cell segmentation (default=1) | 0.5 |
additional_sce_cutoff | Minimum gain in mismatch distance needed to add an additional SCE | 20000000 |
sce_min_distance | Minimum distance of an SCE to a break in the joint segmentation | 500000 |
llr | Likelihood ratio used to detect SV calls | 4 |
Downstream analysis
Parameter | Comment | Default |
---|---|---|
arbigent | Enables ArbiGent mode of execution to genotype SVs based on arbitrary segments | False |
arbigent_bed_file | Allows specifying a custom ArbiGent BED file | "" |
scNOVA | Enables scNOVA mode of execution to compute the Nucleosome Occupancy (NO) of detected SVs | False |
scTRIP_multiplot | Enables scTRIP multiplot generation for all chromosomes of all cells | False |
EMBL specific options
Parameter | Comment | Default |
---|---|---|
genecore | Enables/disables genecore mode, which uses the genecore shared folder in /g/korbel/shared/genecore as input | False |
genecore_date_folder | Specifies the folder to be processed | |
genecore_prefix | Specifies the genecore prefix folder | /g/korbel/STOCKS/Data/Assay/sequencing/2023 |
genecore_regex_elements | Specifies the genecore regex element used to distinguish the sample from the well number | PE20 |
If genecore and genecore_date_folder are correctly specified, each plate will be processed independently by creating a specific folder in the data_location folder.
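As a hedged sketch for EMBL users, a genecore run could look like the command below; the date folder name and the data_location path are hypothetical placeholders, and the prefix is simply the documented default.

```bash
# Process a plate directly from the genecore shared folder (EMBL only).
# 2023-01-01-XXXXXX and /path/to/data_location are hypothetical placeholders.
snakemake \
    --cores 8 \
    --use-conda \
    --config \
        genecore=True \
        genecore_date_folder=2023-01-01-XXXXXX \
        genecore_prefix=/g/korbel/STOCKS/Data/Assay/sequencing/2023 \
        data_location=/path/to/data_location
```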
Execution profile
Location: workflow/snakemake_profiles/
Parameter | Comment | Conda | Singularity | HPC | Local |
---|---|---|---|---|---|
local/conda | / | X | | | X |
local/conda_singularity | / | X | X | | X |
HPC/slurm_generic (to modify) | / | X | | X | |
HPC/slurm_EMBL (optimised for EMBL HPC) | / | X | | X | |
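Assuming each of these directories is a standard Snakemake profile passed via --profile (a built-in Snakemake mechanism), selecting one could look like the sketch below; the data path reuses the test dataset shown earlier.

```bash
# Run locally with conda environments, using the corresponding execution profile.
snakemake \
    --profile workflow/snakemake_profiles/local/conda \
    --config data_location=.tests/data_CHR17/
```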
Snakemake arguments
Here are some essential Snakemake options that may help you.
--cores, -c
Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores. In case of cluster/cloud execution, this argument sets the number of total cores used over all jobs (made available to rules via workflow.cores).
--printshellcmds, -p
Recommended to print out the shell commands that will be executed.
--use-conda
If defined in the rule, run the job in a conda environment. If this flag is not set, the conda directive is ignored and the current environment (and system path) is used to execute the command.
--conda-frontend [mamba|conda]
Choose the conda frontend for installing environments. Mamba is much faster and highly recommended, but it may not be installed by default on your system. Default: "conda"
--use-singularity
If defined in the rule, run job within a singularity container. If this flag is not set, the singularity directive is ignored.
--singularity-args "-B /mounting_point:/mounting_point"
Pass additional arguments to singularity. -B stands for the bind path between the host and the container.
--dryrun, -n
Do not execute anything, and display what would be done. If you have a very large workflow, use --dry-run --quiet to just print a summary of the DAG of jobs.
--rerun-incomplete, --ri
Re-run all jobs the output of which is recognized as incomplete.
--keep-going, -k
Go on with independent jobs if a job fails.
-T, --retries, --restart-times
Number of times to restart failing jobs (defaults to 0).
--forceall, -F
Force the execution of the selected (or the first) rule and all rules it is dependent on regardless of already created output.
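Putting several of these options together, a typical robust invocation might resemble the sketch below; the core count is arbitrary.

```bash
# Print shell commands, keep independent jobs running on failure,
# re-run incomplete outputs, and build software environments with conda/mamba.
snakemake \
    --cores 8 \
    --printshellcmds \
    --keep-going \
    --rerun-incomplete \
    --use-conda \
    --conda-frontend mamba
```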
ℹ️ Note
Currently, the binding command needs to correspond to a mount point on your system (e.g. "/tmp:/tmp"). On seneca (EMBL) for example, use "/g:/g" if you are working on /g/korbel[2], or "/scratch:/scratch" if you plan to work on scratch.
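For example, on a system where the data lives under /g, the bind path could be passed as follows; this is a sketch, and /g/korbel/my_project is a hypothetical path to adapt to your own filesystem.

```bash
# Bind /g from the host into the container so the pipeline can reach data under /g.
# /g/korbel/my_project is a hypothetical placeholder.
snakemake \
    --cores 8 \
    --use-singularity \
    --singularity-args "-B /g:/g" \
    --config data_location=/g/korbel/my_project
```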
All other Snakemake CLI options can of course also be used.