TERRA-96M
JEPA-based spatial-transcriptomics foundation model (TERRA). Code & docs: https://github.com/Lotfollahi-lab/terra
Training data
Trained on a 96M-cell subset of HST-Corpus-112M; the remaining cells are held out for benchmarking and downstream analyses. See the manuscript for details.
Files
- model_checkpoint.pt: target-encoder weights (inference)
- model_config.yaml: model / tokenization config
- token_dictionary.pkl: gene-token vocabulary
- ensembl_dictionary.pkl: gene-name to Ensembl-ID mapping (harmonization)
- gene_count_dictionary.pkl: gene occurrence counts (rare-gene filtering)
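To check that a local model folder is complete before running inference, a small helper along these lines may be useful. This is a minimal sketch, not part of the TERRA codebase: `missing_model_files` and `EXPECTED_FILES` are hypothetical names, and the code assumes the five files listed above sit directly in one flat folder.

```python
from pathlib import Path

# File names as listed in this README (assumed to live in one flat folder).
EXPECTED_FILES = (
    "model_checkpoint.pt",
    "model_config.yaml",
    "token_dictionary.pkl",
    "ensembl_dictionary.pkl",
    "gene_count_dictionary.pkl",
)


def missing_model_files(model_folder):
    """Return the expected file names that are absent from model_folder.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not open or validate their contents.
    """
    folder = Path(model_folder)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

If the downloaded folder follows that layout, calling `missing_model_files` on it should return an empty list; any names it returns point to files that still need to be fetched.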
Usage
from app.huggingface import download_pretrained
from app.inference import harmonize_tokenize_embed_pipeline

# Download the model files; the returned value is passed on as the model folder
d = download_pretrained("Lotfollahi-lab/TERRA-96M")

# `adata` must already be loaded before this call (e.g. an AnnData object)
adata = harmonize_tokenize_embed_pipeline(
    adata=adata,
    model_folder_path=d,  # gene-reference files auto-resolved from here
    # ... sample_key / batch_key / etc.
)
Citation
<add paper / bioRxiv reference>