papa2¶
Python-first amplicon denoising derived from DADA2.
papa2 is a Python package that implements the DADA2 (Divisive Amplicon Denoising Algorithm 2) workflow entirely in Python, backed by a standalone C/C++ core (libpapa2.so). It produces byte-identical results to R's dada2 package and is designed for researchers who prefer working in Python ecosystems.
What is DADA2?¶
DADA2 is a high-resolution method for correcting and inferring the true sequences in amplicon sequencing data, resolving differences of as little as one nucleotide. Instead of clustering reads into OTUs, it produces Amplicon Sequence Variants (ASVs) — exact biological sequences representing true sample diversity.
papa2 brings this proven algorithm to Python with no R dependency.
Features¶
- Byte-identical results to R's
dada2package — 20/20 parity tests pass - Complete DADA2 port — all 37 R functions have Python equivalents
- Full workflow in Python: filtering, dereplication, error learning, denoising, paired-end merging, chimera removal, sequence tables, taxonomy, and visualization
- LOESS error modeling that faithfully mirrors R's
loessErrfun - Bayesian taxonomy classifier (
assign_taxonomy) matching R'sassignTaxonomy - Interactive Sankey diagrams for read tracking through the pipeline
- Parallel processing across samples via
ProcessPoolExecutor - Drop-in replacement for
dada2R scripts translated to Python
Status¶
The current project is CPU-only. GPU support is planned.
Provenance¶
papa2 was split out of development work in:
- Fork: rec3141/dada2
- Branch: gpu-python
That work branched from upstream DADA2 at commit 72da7700 (2026-02-13).
Upstream: benjjneb/dada2
Project Layout¶
papa2/
├── papa2/ Python package
│ ├── dada.py Core denoising: dada(), learn_errors()
│ ├── filter.py Quality filtering: filter_and_trim()
│ ├── io.py FASTQ I/O: derep_fastq()
│ ├── error.py Error models: loess_errfun(), pacbio_errfun()
│ ├── taxonomy.py Taxonomic classification: assign_taxonomy()
│ ├── paired.py Paired-end merging: merge_pairs()
│ ├── chimera.py Chimera removal: remove_bimera_denovo()
│ ├── utils.py Utilities: sequence tables, plotting, QC
│ └── _cdada.py ctypes bindings to libpapa2.so
├── src/ Standalone C/C++ core
├── tests/ Parity tests against R dada2
└── libpapa2.so Compiled shared library
Quick Smoke Test¶
Development Setup¶
License¶
See LICENSE.