vcbench-geneformer-perturbation: Geneformer V2-316M fine-tuned on Norman 2019
Geneformer V2-316M fine-tuned for perturbation classification on the
Norman 2019 K562 dataset. Fine-tuning uses
BertForSequenceClassification over 247 perturbation classes. Because the
model outputs class probabilities rather than expression values, PRR
(Pearson r on perturbation deltas) is recovered downstream by passing the
predicted class probabilities through the canonical perturbation-mean
predictor for the corresponding class.
This is a VCBench Dim A foundation-model checkpoint; see
AppliedScientific/VCBench
for the evaluation code.
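The step from class probabilities to predicted expression can be sketched as follows. This is a minimal illustration, not the VCBench implementation: it assumes the perturbation-mean predictor is a table of per-class mean expression deltas and that probabilities are combined as a weighted average (the actual code may instead look up the single most likely class). The function name `expected_delta` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def expected_delta(
    class_probs: Sequence[float],
    class_mean_deltas: Sequence[Sequence[float]],
) -> List[float]:
    """Probability-weighted average of per-class mean expression deltas.

    class_probs:       one probability per perturbation class.
    class_mean_deltas: one row per class, one column per gene
                       (assumed layout, for illustration only).
    """
    if len(class_probs) != len(class_mean_deltas):
        raise ValueError("need exactly one mean-delta row per class")
    n_genes = len(class_mean_deltas[0])
    out = [0.0] * n_genes
    for prob, row in zip(class_probs, class_mean_deltas):
        for gene in range(n_genes):
            out[gene] += prob * row[gene]
    return out


# Toy example with 3 classes and 4 genes (made-up numbers).
probs = [0.7, 0.2, 0.1]
means = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 2.0, 0.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
]
pred = expected_delta(probs, means)
```

With a one-hot probability vector this reduces to returning the mean delta of the predicted class, so the weighted form covers the argmax variant as a special case.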
Headline results (Dim A, Norman 2019 K562, GEARS test split)
| Regime | PRR | DES |
|---|---|---|
| FT+D (fine-tuned + decoder) | 0.627 | 0.878 |
Geneformer V2-316M scores VC Level 1 under VCBench v1.0 (exceeds no-change baseline 0.000 on Dim A; fails to exceed mean-prediction baseline 0.579).
Files
model.safetensors # Fine-tuned classifier weights (1.27 GB)
config.json # HF model config (BertForSequenceClassification)
training_args.bin
norman_id_class_dict.pkl # Perturbation ID → class index mapping
norman_labeled_train.dataset/ # Tokenized training split (123 MB, Arrow)
norman_labeled_test.dataset/ # Tokenized held-out test split (14 MB, Arrow)
Loading
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "appliedscientific/vcbench-geneformer-perturbation",
    revision="8be3f3681718b706d351b12b19b0a0b4d76420ca",  # pin to a specific revision
    num_labels=247,  # one output per perturbation class
)
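Once the model is loaded, its logits still have to be turned into probabilities and mapped back to perturbation IDs. The sketch below shows that post-processing in plain Python. It assumes `norman_id_class_dict.pkl` holds a plain dict from perturbation ID to integer class index, as the file listing suggests; the helper names and the placeholder IDs (`pert_A`, ...) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(
    logits: Sequence[float],
    id_to_class: Dict[str, int],
) -> Tuple[str, float]:
    """Return the most likely perturbation ID and its probability."""
    # Invert the ID -> index mapping so class indices can be looked up.
    class_to_id = {index: pert_id for pert_id, index in id_to_class.items()}
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_to_id[best], probs[best]


# Toy stand-in for the pickled mapping (the real one has 247 entries).
toy_dict = {"pert_A": 0, "pert_B": 1, "pert_C": 2}
label, prob = top_prediction([0.1, 2.0, -1.0], toy_dict)
```

In practice the logits would come from `model(**batch).logits` and the mapping from unpickling the shipped file; only unpickle files from sources you trust.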
For the full Dim A evaluation pipeline (predicted probabilities → predicted
expression → PRR), see
AppliedScientific/VCBench/src/models/run_geneformer_perturbation.py.
Citation
@misc{vcbench_geneformer_norman,
author = {{VCBench contributors}},
title = {Geneformer V2-316M fine-tuned on Norman 2019 (VCBench)},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/appliedscientific/vcbench-geneformer-perturbation}},
note = {Companion Dim A artefact to VCBench v1.0 (AppliedScientific/VCBench, release tag v1.0.0)}
}
License
MIT (Geneformer V2 base checkpoint also MIT, attribution: Ying Lab).
Access
Publicly available on Hugging Face.
Evaluation results
- PRR (Pearson r on perturbation deltas) on Norman 2019 K562 (107 GEARS test perturbations, seed=1 simulation split), self-reported: 0.627
- DES (top-20 DEG sign agreement) on Norman 2019 K562 (107 GEARS test perturbations, seed=1 simulation split), self-reported: 0.878
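For readers unfamiliar with the two metrics, here is a rough sketch of how they could be computed for a single perturbation. It is an illustration under stated assumptions, not the VCBench code: PRR is taken as the Pearson correlation between predicted and observed expression deltas, and DES as the fraction of the top-k genes whose predicted delta has the same sign as the observed one. Ranking genes by absolute observed delta is an assumption (the benchmark may select DEGs with a statistical test), and the function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def sign_agreement_top_k(
    pred_delta: Sequence[float],
    true_delta: Sequence[float],
    k: int = 20,
) -> float:
    """Fraction of the k largest observed changes whose sign is predicted correctly."""
    ranked = sorted(
        range(len(true_delta)),
        key=lambda gene: abs(true_delta[gene]),
        reverse=True,
    )[:k]
    hits = sum(1 for gene in ranked if pred_delta[gene] * true_delta[gene] > 0)
    return hits / len(ranked)


# Toy deltas for 5 genes (made-up numbers), scored on the top 3.
true_delta = [2.0, -1.5, 0.1, 1.0, -0.2]
pred_delta = [1.0, -0.5, -0.3, -0.4, 0.0]
prr = pearson_r(pred_delta, true_delta)
des = sign_agreement_top_k(pred_delta, true_delta, k=3)
```

In this toy case two of the three largest observed changes are predicted with the right sign. How per-perturbation scores are aggregated over the 107 test perturbations is not shown here.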