chorus-epinformerseq-v2
Per-cell EPInformer-seq checkpoints for 11 Roadmap Epigenomics cell types.
Drop-in artifacts for the Chorus epinformerseq oracle.
Architecture
PerCellProfileNet — a 1024-bp LegNet-backboned profile model that predicts
per-bp DNase + H3K27ac coverage tracks plus a log10-count head per channel.
Each cell line gets its own main checkpoint (no FiLM, no cell embedding).
At inference, each main net is paired with a per-cell frozen BiasNet
to subtract a Tn5/bias prior, following the ChromBPNet recipe.
The retired joint CellCondProfileNet (one network conditioned on 11 cells
via FiLM) is preserved separately in the EPInformer research repo and is not
used by Chorus.
Layout
per_cell/
    K562/main.pt       # PerCellProfileNet state_dict (~136K params)
    GM12878/main.pt
    HepG2/main.pt
    A549/main.pt
    H1/main.pt
    HeLa/main.pt
    HMEC/main.pt
    HSMM/main.pt
    HUVEC/main.pt
    NHEK/main.pt
    NHLF/main.pt
bias/
    K562/bias.pt       # BiasNet state_dict (~37K params, frozen at inference)
    GM12878/bias.pt
    ... (11 cells)
Each main + bias pair is ~0.8 MB combined.
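The layout above can be checked with a small helper before loading anything. This is a minimal sketch, not part of Chorus: `checkpoint_paths` and `missing_checkpoints` are hypothetical names, and the code assumes `bias/` sits next to `per_cell/` under a common root directory.

```python
from pathlib import Path

# Cell types listed in this README.
CELLS = ["K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF"]

def checkpoint_paths(root, cell):
    """Return (main, bias) checkpoint paths for one cell type.

    Assumes <root>/per_cell/<cell>/main.pt and <root>/bias/<cell>/bias.pt.
    """
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell!r}")
    root = Path(root)
    return root / "per_cell" / cell / "main.pt", root / "bias" / cell / "bias.pt"

def missing_checkpoints(root):
    """List every expected checkpoint file that is absent under root."""
    missing = []
    for cell in CELLS:
        for path in checkpoint_paths(root, cell):
            if not path.is_file():
                missing.append(path)
    return missing
```

An empty list from `missing_checkpoints(root)` means all 22 files (11 main, 11 bias) are present under that root.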
Training
- Window: 1024 bp, centered on Roadmap DNase + H3K27ac summits.
- Data: per-replicate ENCODE BAMs (one replicate per cell), Roadmap consortium peak sets, fold-10 leave-chromosomes-out split (chr11+chr21 held out).
- Loss: multinomial NLL on the per-bp profile + MSE on log10 total count per channel.
- Optimizer: AdamW with OneCycleLR.
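The loss described above can be illustrated for a single channel in plain Python. This is a sketch under stated assumptions, not the training code: the function names, the `pseudocount` added before taking log10, and the `count_weight` balancing the two terms are all hypothetical, and the multinomial term here keeps the combinatorial coefficient, which some implementations drop.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of per-bp logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def multinomial_nll(counts, logits):
    """Negative log-likelihood of observed per-bp counts under softmax(logits)."""
    logp = log_softmax(logits)
    n = sum(counts)
    # log of the multinomial coefficient n! / prod(k_i!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return -(log_coef + sum(k * lp for k, lp in zip(counts, logp)))

def count_mse(counts, pred_log10_count, pseudocount=1.0):
    """Squared error between predicted and observed log10 total count.

    The pseudocount is an assumption; it keeps log10 finite for empty windows.
    """
    target = math.log10(sum(counts) + pseudocount)
    return (pred_log10_count - target) ** 2

def channel_loss(counts, logits, pred_log10_count, count_weight=1.0):
    """Profile NLL plus weighted count MSE for one channel (DNase or H3K27ac)."""
    return (multinomial_nll(counts, logits)
            + count_weight * count_mse(counts, pred_log10_count))
```

In a real training loop the same two terms would be computed on batched tensors and summed over both channels; the scalar version only makes the arithmetic explicit.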
Usage
Install Chorus (see the repo); the weights are downloaded automatically on the first call:
from chorus.oracles import EPInformerSeqOracle

oracle = EPInformerSeqOracle(cell_type="K562")
oracle.load_pretrained_model()  # downloads from this repo on first run
result = oracle.predict(
    sequence="A" * 1024,
    assay_ids=["Enhancer_H3K27ac_DNase:K562"],
)
Available cells: K562, GM12878, HepG2, A549, H1, HeLa, HMEC, HSMM, HUVEC, NHEK, NHLF.
Available assays: Enhancer_H3K27ac_DNase (composite, default),
Enhancer_DNase, Enhancer_H3K27ac.
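The usage example passes assay IDs of the form `<assay>:<cell>`. A small helper can build and validate such IDs from the lists above. This is a hypothetical convenience function, not part of the Chorus API, and it assumes every listed assay is available for every listed cell.

```python
CELLS = ("K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF")
ASSAYS = ("Enhancer_H3K27ac_DNase", "Enhancer_DNase", "Enhancer_H3K27ac")

def make_assay_id(cell, assay="Enhancer_H3K27ac_DNase"):
    """Build an '<assay>:<cell>' ID, rejecting names not listed in this README."""
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell!r}")
    if assay not in ASSAYS:
        raise ValueError(f"unknown assay: {assay!r}")
    return f"{assay}:{cell}"

# All three assay IDs for one cell type.
k562_ids = [make_assay_id("K562", assay) for assay in ASSAYS]
```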
Background CDFs
For variant scoring and percentile normalization, Chorus pulls per-track CDFs from the companion dataset lucapinello/chorus-backgrounds (epinformerseq_pertrack.npz, ~0.77 MB).
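Percentile normalization against a background distribution can be sketched as an empirical-CDF lookup. The internal layout of `epinformerseq_pertrack.npz` is not described here, so this example assumes a track's background is simply a sorted list of scores; `empirical_percentile` is a hypothetical helper, not the Chorus implementation.

```python
from bisect import bisect_right

def empirical_percentile(score, sorted_background):
    """Fraction of background scores less than or equal to `score`.

    `sorted_background` must be sorted in ascending order.
    """
    if not sorted_background:
        raise ValueError("background must not be empty")
    rank = bisect_right(sorted_background, score)
    return rank / len(sorted_background)

# Toy background for one track (made-up values, for illustration only).
background = [0.1, 0.2, 0.3, 0.4]
pct = empirical_percentile(0.25, background)  # two of four values are <= 0.25
```

Mapping raw predictions to percentiles this way puts tracks with different dynamic ranges on a common 0 to 1 scale, which is the usual motivation for per-track backgrounds.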
Citation
If you use these weights, please cite the EPInformer paper and the Chorus pipeline (citations in the Chorus README).