chorus-epinformerseq-v2

Per-cell EPInformer-seq checkpoints for 11 Roadmap Epigenomics cell types. Drop-in artifacts for the Chorus epinformerseq oracle.

Architecture

PerCellProfileNet is a profile model with a LegNet backbone. It takes a 1024-bp window and predicts per-bp DNase and H3K27ac coverage tracks, plus a log10 total count per channel from a separate count head. Each cell type gets its own main checkpoint (no FiLM, no cell embedding). At inference, each main net is paired with a frozen per-cell BiasNet that subtracts a Tn5/bias prior, following the ChromBPNet recipe.
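To make the bias-correction step concrete, here is a minimal sketch of one plausible reading of it: the bias prior is subtracted in logit space and the result is renormalized into a per-bp profile. This is an assumption for illustration only. The function names are hypothetical, the toy window is 4 bp rather than 1024 bp, and the actual Chorus implementation may combine the two networks differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def debias_profile(main_logits, bias_logits):
    """Subtract bias logits from main logits, then normalize to a profile.

    Assumption: the bias prior is removed in logit space. This is a sketch,
    not the Chorus implementation.
    """
    if len(main_logits) != len(bias_logits):
        raise ValueError("main and bias logits must cover the same window")
    corrected = [m - b for m, b in zip(main_logits, bias_logits)]
    return softmax(corrected)


# Toy 4-bp window standing in for the real 1024-bp window.
profile = debias_profile([2.0, 1.0, 0.5, 0.0], [1.0, 1.0, 0.0, 0.0])
```

In this toy case the bias net explains the signal at the second position entirely, so after correction it carries the same probability mass as the last position.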

The retired joint CellCondProfileNet (one network conditioned on 11 cells via FiLM) is preserved separately in the EPInformer research repo and is not used by Chorus.

Layout

per_cell/
  K562/main.pt        # PerCellProfileNet state_dict (~136K params)
  GM12878/main.pt
  HepG2/main.pt
  A549/main.pt
  H1/main.pt
  HeLa/main.pt
  HMEC/main.pt
  HSMM/main.pt
  HUVEC/main.pt
  NHEK/main.pt
  NHLF/main.pt
bias/
  K562/bias.pt        # BiasNet state_dict (~37K params, frozen at inference)
  GM12878/bias.pt
  ... (11 cells)

Each main + bias pair is ~0.8 MB combined.
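The layout above can be turned into a small path helper. The sketch below is illustrative only: `checkpoint_paths` is a hypothetical name, not part of Chorus, and the size estimate assumes float32 weights, which the listing does not state.

```python
from pathlib import PurePosixPath

# The 11 cell types listed in this repository.
CELLS = ["K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF"]


def checkpoint_paths(cell):
    """Return the (main, bias) repo-relative paths for one cell type."""
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell}")
    main = PurePosixPath("per_cell") / cell / "main.pt"
    bias = PurePosixPath("bias") / cell / "bias.pt"
    return main, bias


# Rough raw-weight size, assuming float32 (4 bytes per parameter):
# ~136K main parameters + ~37K bias parameters.
approx_mb = (136_000 + 37_000) * 4 / 1e6  # about 0.69 MB
```

Under that assumption the raw weights come to roughly 0.69 MB, which is in line with the ~0.8 MB figure if the remainder is serialization overhead.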

Training

  • Window: 1024 bp, centered on Roadmap DNase + H3K27ac summits.
  • Data: per-replicate ENCODE BAMs (one replicate per cell type), Roadmap consortium peak sets, fold-10 leave-chromosomes-out split (chr11 and chr21 held out).
  • Loss: multinomial NLL on the per-bp profile + MSE on log10 total count per channel.
  • Optimizer: AdamW with OneCycleLR.
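The loss described in the list above can be written out for a single channel and a single window. This is a minimal sketch in plain Python rather than the training code: the function names, the `pseudocount` added before taking log10, and the `count_weight` balancing the two terms are all assumptions not stated in this card.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - lse for x in logits]


def multinomial_nll(counts, logits):
    """Negative log-likelihood of per-bp counts under softmax(logits)."""
    log_p = log_softmax(logits)
    total = sum(counts)
    # log of the multinomial coefficient N! / (k_1! * ... * k_L!)
    log_coeff = math.lgamma(total + 1) - sum(math.lgamma(k + 1) for k in counts)
    return -(log_coeff + sum(k * lp for k, lp in zip(counts, log_p)))


def log_count_mse(counts, pred_log10_count, pseudocount=1.0):
    """Squared error on log10 total count (pseudocount is an assumption)."""
    target = math.log10(sum(counts) + pseudocount)
    return (pred_log10_count - target) ** 2


def channel_loss(counts, logits, pred_log10_count, count_weight=1.0):
    """Profile NLL plus weighted count MSE for one channel of one window."""
    return (multinomial_nll(counts, logits)
            + count_weight * log_count_mse(counts, pred_log10_count))
```

The profile term only constrains the shape of the track, since the softmax removes any overall scale; the count term is what ties the prediction to total signal, which is why the two are summed.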

Usage

Install Chorus (see the repo); the weights are fetched automatically on the first call:

from chorus.oracles import EPInformerSeqOracle

oracle = EPInformerSeqOracle(cell_type="K562")
oracle.load_pretrained_model()  # downloads from this repo on first run

result = oracle.predict(
    sequence="A" * 1024,  # placeholder input sized to the model's 1024-bp window
    assay_ids=["Enhancer_H3K27ac_DNase:K562"],  # format: "<assay>:<cell>"
)

Available cells: K562, GM12878, HepG2, A549, H1, HeLa, HMEC, HSMM, HUVEC, NHEK, NHLF.

Available assays: Enhancer_H3K27ac_DNase (composite, default), Enhancer_DNase, Enhancer_H3K27ac.
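The assay IDs passed to `predict` follow the `<assay>:<cell>` pattern seen in the usage example. A small helper can build and validate them from the two lists above. `make_assay_id` is a hypothetical convenience function for illustration, not part of the Chorus API.

```python
CELLS = ("K562", "GM12878", "HepG2", "A549", "H1", "HeLa",
         "HMEC", "HSMM", "HUVEC", "NHEK", "NHLF")
ASSAYS = ("Enhancer_H3K27ac_DNase", "Enhancer_DNase", "Enhancer_H3K27ac")


def make_assay_id(cell, assay="Enhancer_H3K27ac_DNase"):
    """Build an '<assay>:<cell>' ID, defaulting to the composite assay."""
    if cell not in CELLS:
        raise ValueError(f"unknown cell type: {cell}")
    if assay not in ASSAYS:
        raise ValueError(f"unknown assay: {assay}")
    return f"{assay}:{cell}"


# One ID per available assay for a single cell type.
hepg2_ids = [make_assay_id("HepG2", assay) for assay in ASSAYS]
```

Validating against the fixed lists catches typos such as a lowercase cell name before any model is loaded.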

Background CDFs

For variant scoring and percentile normalization, Chorus pulls per-track CDFs from the companion dataset lucapinello/chorus-backgrounds (epinformerseq_pertrack.npz, ~0.77 MB).
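As a rough illustration of percentile normalization against a background distribution, the sketch below ranks a score within a sorted list of background values. The internal layout of `epinformerseq_pertrack.npz` is not described here, so the toy list and the name `empirical_percentile` are stand-ins, and Chorus may compute percentiles differently.

```python
from bisect import bisect_right


def empirical_percentile(score, sorted_background):
    """Percentile in [0, 100]: share of background values <= score."""
    if not sorted_background:
        raise ValueError("background must not be empty")
    rank = bisect_right(sorted_background, score)
    return 100.0 * rank / len(sorted_background)


# Toy background standing in for one track's stored distribution.
background = sorted([0.1, 0.4, 0.2, 0.9, 0.5])
pct = empirical_percentile(0.45, background)  # 3 of 5 values are <= 0.45, so 60.0
```

Because the background is sorted once, each lookup is a binary search, which keeps scoring many variants against the same track cheap.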

Citation

If you use these weights, please cite the EPInformer paper and the Chorus pipeline (citations in the Chorus README).
