TERRA-96M

JEPA-based spatial-transcriptomics foundation model (TERRA). Code & docs: https://github.com/Lotfollahi-lab/terra

Training data

Trained on a 96M-cell subset of HST-Corpus-112M; the remaining cells are held out for benchmarking and downstream analyses. See the manuscript for details.

Files

  • model_checkpoint.pt: target-encoder weights (inference)
  • model_config.yaml: model / tokenization config
  • token_dictionary.pkl: gene-token vocabulary
  • ensembl_dictionary.pkl: gene-name to Ensembl-ID mapping (harmonization)
  • gene_count_dictionary.pkl: gene occurrence counts (rare-gene filtering)
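
To make the roles of the three dictionaries concrete, here is a minimal sketch of how gene names could flow through harmonization, rare-gene filtering, and token lookup. It is not the TERRA implementation. The helper names (load_pickle, genes_to_tokens), the min_count threshold, and the assumption that each .pkl file unpickles to a plain dict keyed as shown are all hypothetical; the toy dictionaries only stand in for the real files. Unpickling can execute arbitrary code, so load .pkl files only from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a .pkl file. Assumption: it unpickles to a plain dict."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def genes_to_tokens(gene_names, ensembl_dict, count_dict, token_dict, min_count=1):
    """Map gene names to token IDs, skipping unknown and rare genes.

    Assumed (hypothetical) layouts:
      ensembl_dict: gene name  -> Ensembl ID
      count_dict:   Ensembl ID -> occurrence count
      token_dict:   Ensembl ID -> integer token
    """
    tokens = []
    for name in gene_names:
        ensembl_id = ensembl_dict.get(name)
        if ensembl_id is None:
            continue  # gene name could not be harmonized
        if count_dict.get(ensembl_id, 0) < min_count:
            continue  # rare gene, filtered out
        token = token_dict.get(ensembl_id)
        if token is not None:
            tokens.append(token)
    return tokens


# Toy stand-ins for the real dictionaries (illustrative values only).
toy_ensembl = {"TP53": "ENSG00000141510", "GAPDH": "ENSG00000111640"}
toy_counts = {"ENSG00000141510": 5, "ENSG00000111640": 9000}
toy_tokens = {"ENSG00000141510": 2, "ENSG00000111640": 3}

result = genes_to_tokens(
    ["TP53", "GAPDH", "UNKNOWN"],
    toy_ensembl,
    toy_counts,
    toy_tokens,
    min_count=10,
)
print(result)
```

With the toy values, TP53 falls below the threshold and UNKNOWN has no Ensembl mapping, so only the GAPDH token remains. In practice, harmonize_tokenize_embed_pipeline resolves the gene-reference files itself, so a step like this should only be needed for inspection or debugging.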

Usage

from app.huggingface import download_pretrained
from app.inference import harmonize_tokenize_embed_pipeline

d = download_pretrained("Lotfollahi-lab/TERRA-96M")
# `adata` is your AnnData object, loaded beforehand (e.g. with anndata.read_h5ad)
adata = harmonize_tokenize_embed_pipeline(
    adata=adata,
    model_folder_path=d,            # gene-reference files auto-resolved from here
    # ... sample_key / batch_key / etc.
)

Citation

<add paper / bioRxiv reference>
