DeepSpotM is released for non-commercial academic research only, under CC-BY-NC-SA-4.0. Requests with vague or insufficient descriptions of intended use will be declined.
DeepSpot-M
DeepSpot-M: a multimodal foundation model for transcriptome-wide virtual spatial transcriptomics from histology.
DeepSpot-M is a multimodal foundation model that maps a histology image tile to spatial gene expression. It tokenises a 224x224 H&E tile with a LoRA-adapted pathology foundation backbone (Midnight) and lets each gene query attend to the patch tokens through a cross-attention gene decoder. A gene router hypernetwork generates gene-specific output projections from frozen biological embeddings drawn from DNA, RNA, protein, single-cell and text foundation models (Evo 2, Orthrus, ProtT5, scGPT, Apertus). Because genes are represented as queryable embeddings rather than fixed outputs, a single model predicts transcriptome-wide expression, including genes it never saw during training.
Code is available on GitHub.
Fig. DeepSpot-M predicts transcriptome-wide spatial gene expression from histology. A 224x224 H&E tile is tokenised into spatial patch embeddings by a LoRA-adapted pathology foundation model. A cross-attention gene decoder lets each gene query independently attend to patch tokens via multi-head attention, and a gene router hypernetwork generates gene-specific output projections from frozen biological embeddings drawn from DNA, RNA, protein, single-cell and text foundation models. This design enables zero-shot prediction of genes at inference time.
⚠️ Research use only. Not for clinical or diagnostic use.
Model description
DeepSpot-M adapts the Midnight pathology backbone with LoRA and feeds its patch
tokens to a cross-attention gene decoder conditioned on biological gene embeddings.
It takes 224x224 H&E tiles as input and outputs expression over the ~19k-gene panel
in tokens.csv. Five embedding sources are available (evo2, orthrus, prott5, scgpt
and apertus); one is selected with the source= argument when the model is loaded.
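The decoder idea described above can be illustrated with a small NumPy sketch. It is not the DeepSpot-M implementation: it uses a single attention head instead of multi-head attention, random weights, and arbitrary sizes, and every name in it (gene_decoder_sketch, W_q, W_hyper and so on) is hypothetical. It only shows why representing genes as queries makes the set of predicted genes flexible: each gene attends to the patch tokens independently, so any subset of genes gives the same values as the corresponding slice of the full prediction.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gene_decoder_sketch(patch_tokens, gene_embeddings, params):
    """Toy single-head cross-attention decoder with a hypernetwork output head.

    patch_tokens:    (P, D) patch embeddings from a vision backbone
    gene_embeddings: (G, E) frozen per-gene embeddings
    Returns a (G,) vector with one predicted value per gene.
    """
    # One query per gene, derived from its embedding.
    queries = gene_embeddings @ params["W_q"]            # (G, D)
    keys = patch_tokens @ params["W_k"]                  # (P, D)
    values = patch_tokens @ params["W_v"]                # (P, D)

    # Each gene query attends over all patch tokens independently.
    scores = queries @ keys.T / np.sqrt(keys.shape[1])   # (G, P)
    context = softmax(scores, axis=-1) @ values          # (G, D)

    # Hypernetwork: the gene embedding generates that gene's output projection.
    out_weights = gene_embeddings @ params["W_hyper"]    # (G, D)
    out_bias = gene_embeddings @ params["b_hyper"]       # (G,)
    return (context * out_weights).sum(axis=1) + out_bias


# Arbitrary illustrative sizes, not the real model dimensions.
rng = np.random.default_rng(0)
P, D, G, E = 256, 16, 5, 8
params = {
    "W_q": rng.normal(size=(E, D)),
    "W_k": rng.normal(size=(D, D)),
    "W_v": rng.normal(size=(D, D)),
    "W_hyper": rng.normal(size=(E, D)),
    "b_hyper": rng.normal(size=(E,)),
}
tokens = rng.normal(size=(P, D))
genes = rng.normal(size=(G, E))

full = gene_decoder_sketch(tokens, genes, params)
subset = gene_decoder_sketch(tokens, genes[[3, 1]], params)
```

In this toy setting, subset matches full[[3, 1]], and an embedding for a gene that was never part of training can be passed in the same way as any other.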
Usage
# pip install git+https://github.com/ratschlab/DeepSpotM.git
import torch
from deepspotm import DeepSpotM

model, image_processor = DeepSpotM.from_pretrained(
    "ratschlab/DeepSpotM",
    source="scgpt",  # one of evo2, orthrus, prott5, scgpt, apertus
)

# my_pil_tile: a PIL image of a 224x224 H&E tile that you supply
tile = image_processor(my_pil_tile).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    expression, _, _ = model(tile)  # (1, 19338)

# Output column i corresponds to model.gene_names[i].
preds = dict(zip(model.gene_names, expression.squeeze(0).tolist()))
print(preds["EPCAM"])
The predicted vector is ordered by model.gene_names, the genes in tokens.csv, so
model.gene_names[i] is the symbol for output column i.
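Because names and values share one ordering, ranking genes for a tile needs nothing beyond the standard library. The sketch below uses a short stand-in list in place of model.gene_names and made-up values in place of a real prediction; top_k_genes is a hypothetical helper, not part of the deepspotm package.

```python
def top_k_genes(gene_names, expression_row, k=5):
    """Return the k (symbol, value) pairs with the highest predicted value.

    gene_names and expression_row must be aligned, i.e. expression_row[i]
    is the prediction for gene_names[i].
    """
    if len(gene_names) != len(expression_row):
        raise ValueError("gene_names and expression_row differ in length")
    ranked = sorted(
        zip(gene_names, expression_row),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Stand-in data; with the real model, use model.gene_names and
# expression.squeeze(0).tolist() instead.
gene_names = ["EPCAM", "CD3D", "PTPRC", "COL1A1"]
row = [0.9, 0.1, 0.4, 0.7]
top2 = top_k_genes(gene_names, row, k=2)
```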
Predict only specific genes (faster)
You don't have to predict all ~19k genes. Pass a gene or a list and only those are computed, because the cross-attention runs over just the requested gene queries.
vals = model.predict_genes(tile, ["EPCAM", "CD3D", "PTPRC"]) # (1, 3)
vals = model.predict_genes(tile, "EPCAM") # (1, 1)
Output columns follow the requested order. Unknown symbols raise KeyError.
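Since unknown symbols raise KeyError, it can help to check a gene list against the panel before calling predict_genes. The helper below is a hypothetical sketch, not part of the package; it is shown with a stand-in panel, and with the real model the second argument would be model.gene_names.

```python
def split_known_genes(requested, gene_names):
    """Split requested symbols into (known, unknown), preserving order."""
    panel = set(gene_names)
    known = [g for g in requested if g in panel]
    unknown = [g for g in requested if g not in panel]
    return known, unknown


# Stand-in panel for illustration only.
panel = ["EPCAM", "CD3D", "PTPRC", "COL1A1"]
known, unknown = split_known_genes(["PTPRC", "NOT_A_GENE", "EPCAM"], panel)

# With a loaded model, the filtered list can then be passed on:
# vals = model.predict_genes(tile, known)  # columns follow the order of `known`
```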
The vision backbone is built offline from a bundled config and its weights are baked
into model.safetensors, so loading needs no network access to the upstream backbone
repo.
Tutorial
examples/predict_tcga_skcm.ipynb
runs DeepSpot-M end to end on a whole-slide TCGA-SKCM H&E image. It tiles the slide,
predicts BRAF, CD37 and COL1A1, and overlays the predictions on the tissue.
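The tiling step can be pictured as laying a grid of 224x224 windows over the slide. The sketch below only computes the top-left corner of each full tile for a given image size; it is an assumption-laden illustration and not the notebook's code, which may additionally handle magnification, tissue masking and partial tiles. The function name and its stride parameter are hypothetical.

```python
def tile_origins(width, height, tile=224, stride=224):
    """Top-left (x, y) of every full tile that fits inside the image.

    Tiles that would extend past the right or bottom edge are skipped.
    """
    xs = range(0, width - tile + 1, stride)
    ys = range(0, height - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


# A 1000x500 pixel region yields a 4x2 grid of non-overlapping tiles.
origins = tile_origins(1000, 500)
```

Each origin can then be used to crop a tile, run the model on it, and write the predicted value back at the same grid position to build an overlay.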
Resources
- Code, github.com/ratschlab/DeepSpotM
- TCGA virtual spatial transcriptomics atlas of 28,664 slides across 32 cancers, ratschlab/TCGA_virtual_spatial_transcriptomics_atlas
- HEST-1K virtual single-cell Xenium profiles for 59 samples, ratschlab/HEST_Xenium_virtual_spatial_transcriptomics
Limitations and biases
- Trained on a finite set of cancer indications. Performance on unseen tissue types, stains, scanners or resolutions may degrade.
- Predicts relative expression rather than absolute counts. Under-sequenced genes are predicted less reliably.
- Trained on oncology cohorts, so it is not representative of healthy tissue or non-oncology contexts. Not for clinical or diagnostic use.
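Because predictions are relative rather than absolute counts, one reasonable way to inspect them is to rescale each gene across the tiles of a slide before comparing or plotting. The sketch below is one such option (min-max scaling), offered as an assumption about sensible post-processing and not as something the authors prescribe; minmax_per_gene is a hypothetical helper.

```python
def minmax_per_gene(values):
    """Rescale one gene's predictions across tiles to the range [0, 1].

    Returns all zeros when every tile has the same predicted value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up predictions for one gene on three tiles.
scaled = minmax_per_gene([2.0, 4.0, 3.0])
```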
License
- Weights, CC-BY-NC-SA-4.0. Non-commercial, ShareAlike, with attribution.
- Code, github.com/ratschlab/DeepSpotM, under PolyForm Noncommercial 1.0.0.
See WEIGHTS_LICENSE.md and THIRD_PARTY_LICENSES.md.
Citation
Paper: DeepSpot-M: a multimodal foundation model for transcriptome-wide virtual spatial transcriptomics from histology (medRxiv, 2026).
@article{nonchev2026deepspotm,
title = {DeepSpot-M: a multimodal foundation model for transcriptome-wide virtual spatial transcriptomics from histology},
author = {Nonchev, Kalin and Dawo, Sebastian and Silina, Karina and Koelzer, Viktor H. and Raetsch, Gunnar},
journal = {medRxiv},
year = {2026},
doi = {10.64898/2026.06.19.26356060},
url = {https://www.medrxiv.org/content/10.64898/2026.06.19.26356060v1}
}
See also CITATION.cff.
