danaSeq
Metagenomic analysis pipelines for Oxford Nanopore and Illumina sequencing data. Four independent Nextflow DSL2 pipelines cover real-time read classification, long-read assembly, short-read assembly, and downstream analysis of metagenome-assembled genomes (MAGs), including binning, annotation, taxonomy, and metabolic profiling. Each assembly pipeline produces an assembly FASTA and a depth table that feed directly into mag_analysis.
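Before handing an assembly to mag_analysis, it can be worth confirming that the FASTA and the depth table describe the same contigs. The sketch below is a hypothetical helper, not part of danaSeq. It assumes the depth table is tab-separated with one header row and the contig name in the first column; check your actual depths.txt before relying on it.

```shell
# check_handoff FASTA DEPTHS  (hypothetical helper, not shipped with danaSeq)
# Assumption: DEPTHS is tab-separated, has one header line, and column 1 is the contig name.
# Returns 0 if both files list the same contig IDs, 1 otherwise.
check_handoff() {
  fasta="$1"
  depths="$2"
  tmpdir=$(mktemp -d) || return 2

  # Contig IDs from the FASTA: header text up to the first whitespace.
  grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmpdir/fasta.ids"

  # Contig IDs from the depth table: first column, header row skipped.
  tail -n +2 "$depths" | cut -f1 | sort -u > "$tmpdir/depth.ids"

  # comm -3 prints only the lines unique to one of the two sorted files.
  mismatches=$(comm -3 "$tmpdir/fasta.ids" "$tmpdir/depth.ids")
  rm -rf "$tmpdir"

  if [ -n "$mismatches" ]; then
    printf 'Contigs present in only one file:\n%s\n' "$mismatches" >&2
    return 1
  fi
  return 0
}
```

Calling check_handoff assembly.fasta depths.txt exits silently with status 0 when the two files agree, and lists the unmatched contig names on stderr otherwise.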
Architecture
danaSeq/
├── nanopore_live/        Real-time analysis during sequencing
│   │                       9 modules, 14 processes -> DuckDB
│
├── nanopore_assembly/    Long-read assembly + mapping + depth
│   │                       3 modules: preprocess, assembly, mapping
│   │                       Flye/metaMDBG/myloasm -> minimap2 -> CoverM
│   │                       Output: assembly.fasta + depths.txt + BAMs
│
├── illumina_assembly/    Multi-assembler consensus + mapping + depth
│   │                       7 modules: preprocess, error_correct, normalize,
│   │                       merge_reads, assembly, dedupe, mapping
│   │                       Output: assembly.fasta + depths.txt + BAMs
│
├── mag_analysis/         Technology-agnostic downstream analysis
│   │                       10 modules: binning, annotation, taxonomy, rrna,
│   │                       metabolism, mge, eukaryotic, gene_depths,
│   │                       phylogeny, viz
│   ├── viz/                Interactive Svelte dashboard
│   └── bin/                Pipeline scripts
│
└── tests/                Pipeline tests
Pipelines
| Pipeline | Purpose | Key tools |
|---|---|---|
| nanopore_live | Real-time analysis during sequencing | Kraken2, Prokka/Bakta, HMMER3, DuckDB |
| nanopore_assembly | Long-read assembly + mapping + depth | Flye, metaMDBG, myloasm, minimap2, CoverM |
| illumina_assembly | Multi-assembler consensus assembly | Tadpole, Megahit, SPAdes, metaSPAdes, BBMap |
| mag_analysis | Technology-agnostic downstream analysis | 7-binner consensus, DAS Tool, Binette, 50+ processes |
Getting Started
1. Clone the repository
git clone https://github.com/rec3141/danaSeq.git
cd danaSeq
2. Choose a runtime
| Method | Best for | Setup |
|---|---|---|
| Conda | Local/laptop development | cd <pipeline> && ./install.sh |
| Docker | Reproducible runs, CI | docker pull ghcr.io/rec3141/danaseq-mag-analysis:latest |
| Apptainer | HPC clusters | ./run-*.sh --apptainer (auto-pulls SIF) |
3. Download databases
# Interactive menu (shows sizes and descriptions)
./download-databases.sh
# Human reference (~4 GB, required by both assembly pipelines)
./download-databases.sh --human
# MAG analysis databases
./download-databases.sh --genomad --checkv --checkm2 --kaiju
Quality Standards
MAGs are classified per MIMAG (Bowers et al. 2017):
| Quality | Completeness | Contamination | Additional |
|---|---|---|---|
| High | >90% | <5% | 23S, 16S, and 5S rRNA genes + at least 18 tRNAs |
| Medium | ≥50% | <10% | -- |
| Low | <50% | <10% | -- |
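The thresholds above can be applied mechanically to per-bin quality estimates. The following is a minimal sketch, not the pipeline's own classifier: the function name, argument order, and the "Unclassified" label for bins with 10% or more contamination are assumptions made for illustration. The tRNA cutoff of 18 and the treatment of exactly 50% completeness as Medium follow the MIMAG standard (Bowers et al. 2017).

```shell
# mimag_tier COMPLETENESS CONTAMINATION HAS_RRNA TRNA_COUNT  (hypothetical helper)
#   COMPLETENESS, CONTAMINATION: percentages, decimals allowed
#   HAS_RRNA: 1 if 23S, 16S, and 5S rRNA genes were all found, else 0
#   TRNA_COUNT: number of distinct tRNAs detected
# awk handles the comparisons because POSIX shell arithmetic is integer-only.
mimag_tier() {
  awk -v comp="$1" -v cont="$2" -v rrna="$3" -v trna="$4" 'BEGIN {
    if (comp > 90 && cont < 5 && rrna == 1 && trna >= 18)
      print "High"
    else if (comp >= 50 && cont < 10)
      print "Medium"
    else if (comp < 50 && cont < 10)
      print "Low"
    else
      print "Unclassified"   # assumed label: the table assigns no tier at >=10% contamination
  }'
}
```

A bin that passes the High completeness and contamination cutoffs but lacks the rRNA or tRNA evidence falls through to Medium, so mimag_tier 95.2 1.3 0 20 prints Medium.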
Resource Requirements
| Configuration | CPUs | RAM | Storage |
|---|---|---|---|
| Minimum (no Kraken) | 16 | 32 GB | 500 GB |
| Recommended | 32 | 128 GB | 1 TB |
| Full analysis | 32 | 256 GB | 2 TB |
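A quick preflight check against the minimum row can catch an undersized machine before a long run starts. This is an illustrative sketch only: check_minimum is a hypothetical helper, not a danaSeq script, and it expects whole-number inputs in the units shown.

```shell
# Thresholds copied from the "Minimum (no Kraken)" row of the table above.
MIN_CPUS=16
MIN_RAM_GB=32
MIN_DISK_GB=500

# check_minimum CPUS RAM_GB DISK_GB  (hypothetical helper, whole numbers only)
# Prints one line per shortfall and returns 1 if any value is below the minimum.
check_minimum() {
  status=0
  [ "$1" -ge "$MIN_CPUS" ]    || { echo "CPUs: $1 < $MIN_CPUS"; status=1; }
  [ "$2" -ge "$MIN_RAM_GB" ]  || { echo "RAM: $2 GB < $MIN_RAM_GB GB"; status=1; }
  [ "$3" -ge "$MIN_DISK_GB" ] || { echo "Storage: $3 GB < $MIN_DISK_GB GB"; status=1; }
  return "$status"
}
```

On a Linux host the three arguments could be taken from nproc, free -g, and df -BG on the working directory. Note that free -g rounds down, so a machine with 32 GB installed may report 31 and fail the RAM check.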
License
MIT. See LICENSE.