vcbench-geneformer-perturbation: Geneformer V2-316M fine-tuned on Norman 2019

Geneformer V2-316M fine-tuned for Norman 2019 K562 perturbation classification. The fine-tuning uses BertForSequenceClassification over 247 perturbation classes; PRR is recovered downstream by passing the predicted class probabilities through the canonical perturbation-mean predictor for the corresponding class.

This is a VCBench Dim A foundation-model checkpoint; see AppliedScientific/VCBench for the evaluation code.

Headline results (Dim A, Norman 2019 K562, GEARS test split)

Regime                         PRR      DES
FT+D (fine-tuned + decoder)    0.627    0.878

Geneformer V2-316M scores VC Level 1 under VCBench v1.0 (exceeds no-change baseline 0.000 on Dim A; fails to exceed mean-prediction baseline 0.579).

Files

model.safetensors                        # Fine-tuned classifier weights (1.27 GB)
config.json                              # HF model config (BertForSequenceClassification)
training_args.bin                        # Serialized Hugging Face Trainer arguments
norman_id_class_dict.pkl                 # Perturbation ID → class index mapping
norman_labeled_train.dataset/            # Tokenized training split (123 MB, Arrow)
norman_labeled_test.dataset/             # Tokenized held-out test split (14 MB, Arrow)
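
The class mapping listed above is what turns a predicted class index back into a perturbation ID. The following is a minimal sketch, assuming `norman_id_class_dict.pkl` is a pickled Python dict from perturbation ID to integer class index (the exact key format is not documented here). The helper names `load_class_dict` and `invert_class_dict` are hypothetical and are not part of VCBench.

```python
import pickle
from pathlib import Path


def load_class_dict(path):
    """Load the pickled {perturbation ID: class index} mapping.

    Assumption: the file holds a plain dict. Only unpickle files from
    sources you trust, since unpickling can execute arbitrary code.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def invert_class_dict(id_to_class):
    """Build {class index: perturbation ID}, rejecting duplicate indices."""
    class_to_id = {}
    for pert_id, class_idx in id_to_class.items():
        if class_idx in class_to_id:
            raise ValueError(f"class index {class_idx} is mapped twice")
        class_to_id[class_idx] = pert_id
    return class_to_id
```

With the inverted mapping, `class_to_id[int(logits.argmax())]` would give the predicted perturbation ID for a single cell, provided the class indices match the order of the classifier's output logits.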

Loading

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "appliedscientific/vcbench-geneformer-perturbation",
    revision="8be3f3681718b706d351b12b19b0a0b4d76420ca",  # pin to a specific revision
    num_labels=247,
)

For the full Dim A evaluation pipeline (predicted probabilities → predicted expression → PRR), see AppliedScientific/VCBench/src/models/run_geneformer_perturbation.py.
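
The three stages of that pipeline can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the code in `run_geneformer_perturbation.py`: it assumes a `class_means` array holding the training-set mean expression profile for each perturbation class, a `control_mean` vector for unperturbed cells, and a probability-weighted average as the decoding step (the actual script may instead look up the profile of the most probable class). All function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_expression(probs, class_means):
    """Probability-weighted mean of per-class expression profiles.

    probs:       (n_cells, n_classes) predicted class probabilities
    class_means: (n_classes, n_genes) mean expression per class (assumed input)
    """
    return probs @ class_means


def pearson_delta(pred_expr, true_expr, control_mean):
    """Pearson r between predicted and observed deltas relative to control."""
    pred_delta = pred_expr - control_mean
    true_delta = true_expr - control_mean
    return float(np.corrcoef(pred_delta, true_delta)[0, 1])
```

In this sketch, classifier logits would go through `softmax`, then `predict_expression`, and the resulting profile for each test perturbation would be scored with `pearson_delta` against the observed mean expression.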

Citation

@misc{vcbench_geneformer_norman,
  author       = {{VCBench contributors}},
  title        = {Geneformer V2-316M fine-tuned on Norman 2019 (VCBench)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/appliedscientific/vcbench-geneformer-perturbation}},
  note         = {Companion Dim A artefact to VCBench v1.0 (AppliedScientific/VCBench, release tag v1.0.0)}
}

License

MIT (Geneformer V2 base checkpoint also MIT, attribution: Ying Lab).

Access

Publicly available on Hugging Face.

Model size: 0.3B params (Safetensors, tensor type F32)

Evaluation results

  • PRR (Pearson r on perturbation deltas) on Norman 2019 K562 (107 GEARS test perturbations, seed=1 simulation split): 0.627 (self-reported)
  • DES (top-20 DEG sign agreement) on Norman 2019 K562 (107 GEARS test perturbations, seed=1 simulation split): 0.878 (self-reported)
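
To make the DES description concrete, here is a hedged sketch of a top-k sign-agreement score. It assumes the "top-20 DEGs" are the 20 genes with the largest absolute observed delta and that the score is the fraction of those genes whose predicted delta has the same sign. The VCBench implementation may select DEGs differently (for example, with a statistical test), so this is one plausible reading and not the reference definition. The function name is hypothetical.

```python
import numpy as np


def deg_sign_agreement(pred_delta, true_delta, k=20):
    """Fraction of the top-k genes (ranked by |observed delta|) whose
    predicted delta has the same sign as the observed delta."""
    pred_delta = np.asarray(pred_delta, dtype=float)
    true_delta = np.asarray(true_delta, dtype=float)
    k = min(k, true_delta.size)  # guard against fewer than k genes
    top = np.argsort(-np.abs(true_delta))[:k]
    agree = np.sign(pred_delta[top]) == np.sign(true_delta[top])
    return float(agree.mean())
```

A score of 1.0 means every selected gene moves in the predicted direction. Under this reading, 0.878 would correspond to roughly 17 to 18 of 20 genes agreeing, averaged over the test perturbations.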