Relational Transformer: PluRel Checkpoints
Relational Transformer (RT) model checkpoints pretrained on synthetic relational databases generated by PluRel.
Relational Transformer is a foundation model architecture for relational data that enables zero-shot transfer across heterogeneous schemas and tasks. It was introduced in:
Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec. arXiv:2510.06377 (ICLR 2026)
The checkpoints provided in this repository were trained using the methodology described in:
PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models
Kothapalli, Ranjan, Hudovernik, Dwivedi, Hoffart, Guestrin, Leskovec. arXiv:2602.04029 (2026)
Model Architecture
The Relational Transformer operates on multi-table relational databases: it samples a context of rows from linked tables in breadth-first (BFS) order and treats them as a single sequence. A Relational Attention mechanism attends over columns, rows, and primary-foreign key links.
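As a rough illustration of BFS-ordered context sampling, the sketch below walks a toy database breadth-first from a seed row, following primary-foreign key links until a row budget is exhausted. The `links` adjacency map, the `sample_context` function, and the reading of the width cap as a per-row neighbor limit are all assumptions made for this example; it is not the sampler from the RT codebase.

```python
from collections import deque


def sample_context(seed, links, max_rows=8, max_width=2):
    """Return row ids in BFS order starting from `seed`.

    links:     dict mapping a row id to the row ids it is connected to through
               primary-foreign key links (hypothetical toy format).
    max_rows:  total row budget (a stand-in for the context length).
    max_width: cap on neighbors expanded per row (a stand-in for max BFS width).
    """
    order, seen = [], {seed}
    queue = deque([seed])
    while queue and len(order) < max_rows:
        row = queue.popleft()
        order.append(row)
        # Expand at most `max_width` not-yet-seen neighbors of this row.
        fresh = [n for n in links.get(row, []) if n not in seen][:max_width]
        for neighbor in fresh:
            seen.add(neighbor)
            queue.append(neighbor)
    return order


# Toy database: one order linked to a customer and three items.
links = {
    ("orders", 1): [("customers", 7), ("items", 1), ("items", 2), ("items", 3)],
    ("customers", 7): [("orders", 1), ("orders", 2)],
    ("items", 1): [("orders", 1)],
    ("orders", 2): [("customers", 7)],
}
context = sample_context(("orders", 1), links, max_rows=4, max_width=2)
print(context)
```

With a width cap of 2, only the first two neighbors of the seed row are expanded, so `("items", 2)` and `("items", 3)` never enter the context even though the budget would otherwise allow them.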
| Hyperparameter | Value |
|---|---|
| Transformer blocks | 12 |
| Model dimension (d_model) | 256 |
| Attention heads | 8 |
| FFN dimension (d_ff) | 1,024 |
| Context length | 1,024 tokens |
| Text encoder | all-MiniLM-L12-v2 (d_text = 384) |
| Max BFS width | 128 |
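For quick reference in scripts, the table can be mirrored in a small configuration object. This is a minimal sketch: the class name `RTConfig` and its field names are illustrative and are not taken from the repository's config.json, and the per-head dimension assumes the standard even split of d_model across heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTConfig:
    # Values copied from the hyperparameter table; field names are illustrative.
    num_blocks: int = 12
    d_model: int = 256
    num_heads: int = 8
    d_ff: int = 1024
    context_length: int = 1024
    d_text: int = 384          # output size of all-MiniLM-L12-v2
    max_bfs_width: int = 128

    @property
    def head_dim(self) -> int:
        # Assumes d_model is split evenly across attention heads.
        return self.d_model // self.num_heads


cfg = RTConfig()
print(cfg.head_dim)             # 256 // 8
print(cfg.d_ff // cfg.d_model)  # FFN expansion ratio
```

Under that assumption each head works with 32 dimensions, and the feed-forward layer expands the model dimension by a factor of 4.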
The architecture and training loop build on the Relational Transformer codebase.
Download
Single checkpoint (Python). Fetch config.json alongside the weights; it carries
the model architecture and is what the Hub uses to count downloads:
from huggingface_hub import hf_hub_download
config = hf_hub_download("stanford-star/rt-plurel", "config.json")
ckpt = hf_hub_download("stanford-star/rt-plurel", "synthetic-pretrain_rdb_1024_size_4b.pt")
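Once both files are on disk, they can be inspected before building a model. The helpers below are a hedged sketch: `read_config`, `describe`, and `load_checkpoint` are hypothetical names, no particular config.json schema is assumed, and `load_checkpoint` assumes the .pt file is an ordinary `torch.save` artifact, which this card does not state.

```python
import json


def read_config(path):
    """Parse the downloaded config.json into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(config):
    """List top-level keys without assuming any particular schema."""
    return sorted(config)


def load_checkpoint(path):
    """Load a .pt file onto the CPU (assumes a standard torch.save artifact)."""
    import torch  # imported lazily so the JSON helpers work without PyTorch

    return torch.load(path, map_location="cpu")
```

With the paths returned by hf_hub_download, `describe(read_config(config))` lists the available configuration keys, and `load_checkpoint(ckpt)` returns whatever object was saved. Check its type before treating it as a state dict.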
Full repository (CLI):
hf download stanford-star/rt-plurel \
  --repo-type model \
  --local-dir ~/scratch/rt_hf_ckpts
RelBench leaderboard checkpoints (added 2026-06)
Continued pretraining follows the repo's continued-pretraining script (50k steps,
batch size 128, learning rate 5e-4 with a cosine schedule, initialized from
synthetic-pretrain_rdb_1024_size_4b.pt). Fine-tuning follows the RT
example_finetune protocol (learning rate 1e-4, batch size 32, 2^15+1 steps). For
regression tasks, the best checkpoint is selected by validation NMAE (MAE divided
by the train-split standard deviation, ddof=1), which is the leaderboard metric,
instead of R². Evaluation uses the full official test split.
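The NMAE selection rule can be written down in a few lines. The sketch below uses only the standard library; `statistics.stdev` computes the sample standard deviation, which corresponds to ddof=1. The function names and the checkpoint labels in the usage note are hypothetical, and this is not the evaluation code behind the reported numbers.

```python
from statistics import stdev  # sample standard deviation, i.e. ddof=1


def nmae(y_true, y_pred, y_train):
    """Mean absolute error divided by the train-split target std (ddof=1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / stdev(y_train)


def best_checkpoint(val_scores):
    """Return the checkpoint label with the lowest validation NMAE."""
    return min(val_scores, key=val_scores.get)


y_train = [0.0, 2.0, 4.0]  # sample std is 2.0
score = nmae([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], y_train)
print(score)  # MAE of 1/3 divided by 2.0
```

For example, `best_checkpoint({"step_1000": 0.42, "step_2000": 0.37})` returns `"step_2000"`, since lower NMAE is better.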
- cntd-pretrain_<db>_<task>.pt: synthetic+real continued pretraining, with one leave-one-DB-out run per database (including rel-event) and the best checkpoint kept per task. These produce the "PluRel | synthetic + real" zero-shot regression and rel-event cells.
- finetune_<db>_<task>.pt: fine-tuned from the matching cntd-pretrain checkpoint (chosen over the synthetic-only checkpoint by validation zero-shot performance, which the synthetic+real checkpoint won on every task). These produce the "PluRel | pretrained + fine-tuned" leaderboard row.
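The naming patterns above make it possible to fetch only one family of checkpoints rather than the full repository. The following sketch relies on the `allow_patterns` argument of `huggingface_hub.snapshot_download`; the helper names are hypothetical, and `example-task` is a placeholder rather than a real task name from this repository.

```python
def checkpoint_name(kind, db, task):
    """Build a file name following the <kind>_<db>_<task>.pt pattern."""
    if kind not in {"cntd-pretrain", "finetune"}:
        raise ValueError(f"unknown checkpoint kind: {kind}")
    return f"{kind}_{db}_{task}.pt"


def leaderboard_patterns(kind):
    """Glob patterns selecting one checkpoint family plus config.json."""
    return [f"{kind}_*.pt", "config.json"]


def download_leaderboard_checkpoints(local_dir, kind="finetune"):
    """Download only the files matching one checkpoint family."""
    from huggingface_hub import snapshot_download  # lazy import

    return snapshot_download(
        repo_id="stanford-star/rt-plurel",
        repo_type="model",
        local_dir=local_dir,
        allow_patterns=leaderboard_patterns(kind),
    )


# "example-task" is a placeholder task name used only for illustration.
name = checkpoint_name("finetune", "rel-event", "example-task")
print(name)
```

Keeping config.json in the pattern list follows the download advice given earlier in this card.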