Development Roadmap
Technical-related features
- Zenodo automatic download of external files + indexes (1.2.1)
- Multiple samples in the parent folder (1.2.2)
- Automatic check of the BAM SM tag against the sample folder name (1.2.3)
- On-error/success e-mail (1.3)
- HPC execution (slurm profile for the moment) (1.3)
- Full Singularity image with preinstalled conda environments (1.5.1)
- Single BAM folder with side config file (1.6.1)
- (EMBL) GeneCore mode of execution: allow selection and execution directly by specifying the GeneCore run folder (e.g. 2022-11-02-H372MAFX5) (1.8.2)
- Version synchronisation between ashleys-qc-pipeline and mosaicatcher-pipeline (1.8.3)
- Report captions update (1.8.5)
- Clustering plot (heatmap) & SV calls plot update (1.8.6)
- ashleys_pipeline_only parameter: using mosaicatcher-pipeline, trigger ashleys-qc-pipeline only and stop after the generation of the counts, ashleys predictions and plots, allowing the user to manually review/select the cells to be processed (2.2.0)
- Alternative execution endpoints: breakpointr_only parameter to stop the execution after breakpointR; whatshap_only parameter to stop the execution after WhatsHap (2.3.3)
- Snakemake v9 + Pixi migration: unified package management and reproducible environments (2.4.0)
- Assembly-specific containers on GHCR: one image per reference genome (hg38, hg19, T2T, mm10, mm39) embedding the matching BSgenome R package (2.4.0)
- Centralized version management with automated bumping (pixi run bump-patch|bump-minor|bump-major|bump-beta) and changelog generation (2.4.0)
- HPC storage optimization: configurable reference_base_dir for multi-user reference genome sharing (2.4.0)
- New EMBL HPC Apptainer profile: workflow/snakemake_profiles/mosaicatcher-pipeline/v9/HPC/slurm_EMBL_apptainer/ (2.4.0)
- Plotting options (enable/disable segmentation background colors)
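The automatic SM tag check can be illustrated with a small Python sketch. It assumes the BAM header has already been read into a dict (the layout pysam's AlignmentFile(...).header.to_dict() returns, with @RG lines under the "RG" key) and that BAM files sit at <parent>/<sample>/bam/<cell>.bam, so the sample name is the grandparent folder of each BAM. The function names sm_tags_from_header and check_sm_tag are hypothetical and are not the pipeline's actual script.

```python
from pathlib import Path


def sm_tags_from_header(header: dict) -> set[str]:
    """Collect the SM (sample) values of all @RG lines in a BAM header dict.

    Assumed layout (as produced by pysam's header.to_dict()):
    {"RG": [{"ID": ..., "SM": ...}, ...], ...}
    """
    return {rg["SM"] for rg in header.get("RG", []) if "SM" in rg}


def check_sm_tag(header: dict, bam_path: str) -> None:
    """Raise ValueError unless every SM tag equals the sample folder name.

    Assumption: <parent>/<sample>/bam/<cell>.bam, i.e. the sample name is
    the grandparent directory of the BAM file.
    """
    sample = Path(bam_path).parent.parent.name
    tags = sm_tags_from_header(header)
    if tags != {sample}:
        raise ValueError(
            f"{bam_path}: SM tag(s) {sorted(tags)} do not match "
            f"sample folder '{sample}'"
        )


if __name__ == "__main__":
    # Hypothetical header and path, for illustration only.
    header = {"HD": {"VN": "1.6"}, "RG": [{"ID": "cell01", "SM": "sampleA"}]}
    check_sm_tag(header, "data/sampleA/bam/cell01.bam")  # passes silently
```

Comparing the full set of SM values against a one-element set also catches BAMs whose read groups mix several samples, not only BAMs with a single wrong tag.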
Bioinformatic-related features
- Self-handling of low-coverage cells (1.6.1)
- Upstream ashleys-qc-pipeline and FASTQ handling (1.6.1)
- Change of reference genome (currently only GRCh38) (1.7.0)
- Ploidy detection at the segment and the chromosome level: used to bypass StrandPhaseR if more than half of a chromosome is haploid (1.7.0)
- input_bam_legacy mode (bam/selected folders) (1.8.4)
- Blacklist regions files for T2T & hg19 (1.8.5)
- ArbiGent integration: Strand-Seq based genotyper to study SVs containing at least 500 bp of uniquely mappable sequence (1.9.0)
- scNOVA integration: Strand-Seq Single-Cell Nucleosome Occupancy and genetic Variation Analysis (1.9.2)
- multistep_normalisation and multistep_normalisation_for_SV_calling parameters to replace the GC analysis module (library size normalisation, GC correction, Variance Stabilising Transformation) (2.1.1)
- Strand-Seq processing based on the mm10 assembly (2.1.2)
- UCSC ready-to-use file generation, including counts & SV calls (2.1.2)
- blacklist_regions parameter (2.2.0)
- IGV ready-to-use XML session generation (2.2.2)
- BreakpointR integration through the breakpointr parameter (2.3.3)
- ashleys-qc-pipeline fully integrated (no longer a git submodule): preprocessing lives directly in mosaicatcher-pipeline (2.4.0)
- mm39 full support: normalization files, blacklist regions, and BSgenome package for mouse GRCm39 assembly (2.4.0)
- CanFam (canfam3/canfam4) framework-ready: reference infrastructure in place, normalization files pending (2.4.0)
- Pre-built iGenomes index download (download_prebuilt_indexes parameter): skip local BWA index building by downloading from AWS iGenomes (2.4.0)
- Ploidy estimation module (ploidy parameter): optional detection of haploid chromosomes/segments to guide StrandPhaseR (2.4.0)
- keep_ashleys_predictions parameter: control retention of ashleys ML prediction files (2.4.0)
- Pooled samples
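The StrandPhaseR bypass rule from the ploidy detection item (bypass when more than half of a chromosome is haploid) can be expressed as a short decision function. This is a minimal sketch, assuming segment-level ploidy calls are available as (chromosome, start, end, ploidy) records and that "more than half" is measured in base pairs over the called segments; the Segment class and the haploid_fraction and bypass_strandphaser functions are hypothetical, not the pipeline's implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    ploidy: int  # estimated copy number for this segment


def haploid_fraction(segments: list[Segment], chrom: str) -> float:
    """Fraction of the called length of `chrom` whose estimated ploidy is 1."""
    on_chrom = [s for s in segments if s.chrom == chrom]
    total = sum(s.end - s.start for s in on_chrom)
    if total == 0:
        return 0.0  # no calls for this chromosome: treat as not haploid
    haploid = sum(s.end - s.start for s in on_chrom if s.ploidy == 1)
    return haploid / total


def bypass_strandphaser(segments: list[Segment], chrom: str) -> bool:
    """True when strictly more than half of the chromosome is haploid."""
    return haploid_fraction(segments, chrom) > 0.5


if __name__ == "__main__":
    calls = [
        Segment("chrX", 0, 60_000_000, 1),
        Segment("chrX", 60_000_000, 100_000_000, 2),
        Segment("chr1", 0, 100_000_000, 2),
    ]
    for c in ("chrX", "chr1"):
        print(c, bypass_strandphaser(calls, c))
```

Weighting by base pairs rather than by segment count keeps many short haploid segments from outvoting a long diploid stretch.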
Small issues to fix
- Replace input_bam_location by data_location (harmonization with ashleys-qc-pipeline)
- List of commands available through the list_commands parameter (1.8.6)
- Move pysam / SM tag comparison script to snakemake rule (2.2.0)
- Properly reference the reference genome in the IGV session script generation (2.3.5)