Relational Transformer — PluRel Checkpoints

Relational Transformer (RT) model checkpoints pretrained on synthetic relational databases generated by PluRel.

Relational Transformer is a foundation model architecture for relational data that enables zero-shot transfer across heterogeneous schemas and tasks. It was introduced in:

Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Rishabh Ranjan, Valter Hudovernik, Mark Znidar, Charilaos Kanatsoulis, Roshan Upendra, Mahmoud Mohammadi, Joe Meyer, Tom Palczewski, Carlos Guestrin, Jure Leskovec — arXiv:2510.06377 (ICLR 2026)

The checkpoints provided in this repository were trained using the methodology described in:

PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models
Kothapalli, Ranjan, Hudovernik, Dwivedi, Hoffart, Guestrin, Leskovec — arXiv:2602.04029 (2026)

Model Architecture

The Relational Transformer operates on multi-table relational databases. It treats rows from linked tables as a single sequence, built by BFS-ordered context sampling, and uses a Relational Attention mechanism over columns, rows, and primary-foreign key links.
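The toy sketch below illustrates the BFS-ordered context sampling described above. It is not the RT codebase's sampler. It assumes a hypothetical in-memory graph `links` that maps each row, identified as a `(table, primary_key)` pair, to the rows reachable through primary-foreign key edges. `bfs_context`, `max_width` and `max_rows` are illustrative names. Reading "Max BFS width" as a cap on neighbours expanded per row is an assumption, and the row-count budget is a stand-in for the model's token budget.

```python
from collections import deque

def bfs_context(seed, links, max_width=128, max_rows=1024):
    """Return rows reachable from `seed` in breadth-first order.

    Hypothetical sketch: `links` maps a (table, pk) row to its PK-FK neighbours,
    `max_width` caps neighbours expanded per row (assumed meaning of
    "Max BFS width"), and `max_rows` is a row budget standing in for tokens.
    """
    order = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(order) < max_rows:
        row = queue.popleft()
        for nbr in links.get(row, [])[:max_width]:
            if nbr in seen:
                continue  # each row enters the context at most once
            seen.add(nbr)
            order.append(nbr)
            queue.append(nbr)
            if len(order) >= max_rows:
                break  # context budget exhausted
    return order

# Toy schema: one user, two orders, each order pointing at one product.
links = {
    ("users", 1): [("orders", 10), ("orders", 11)],
    ("orders", 10): [("users", 1), ("products", 5)],
    ("orders", 11): [("users", 1), ("products", 6)],
    ("products", 5): [("orders", 10)],
    ("products", 6): [("orders", 11)],
}
print(bfs_context(("users", 1), links))
```

Breadth-first order places rows directly linked to the seed (its orders) ahead of rows two hops away (their products). As a result, a truncated context keeps the closest relatives first.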

| Hyperparameter | Value |
|---|---|
| Transformer blocks | 12 |
| Model dimension (`d_model`) | 256 |
| Attention heads | 8 |
| FFN dimension (`d_ff`) | 1,024 |
| Context length | 1,024 tokens |
| Text encoder | `all-MiniLM-L12-v2` (`d_text` = 384) |
| Max BFS width | 128 |
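As a quick consistency check on the table, the hypothetical dataclass below mirrors the listed values and derives two quantities from them: the per-head dimension (256 / 8 = 32) and the FFN expansion ratio (1,024 / 256 = 4). The class name and field names are assumptions and need not match the keys in the shipped `config.json`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RTConfig:
    # Hypothetical field names; values copied from the hyperparameter table.
    n_blocks: int = 12
    d_model: int = 256
    n_heads: int = 8
    d_ff: int = 1024
    context_length: int = 1024
    d_text: int = 384
    max_bfs_width: int = 128

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def ffn_ratio(self) -> int:
        # Width of the feed-forward layer relative to d_model.
        return self.d_ff // self.d_model

cfg = RTConfig()
print(cfg.head_dim, cfg.ffn_ratio)  # 32 4
```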

The architecture and training loop build on the Relational Transformer codebase.

Download

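Single checkpoint (Python): fetch `config.json` alongside the weights. It holds the model architecture and is the file the Hub uses to count downloads.

```python
from huggingface_hub import hf_hub_download

# Architecture config; requests for this file are what the Hub counts as downloads.
config = hf_hub_download("stanford-star/rt-plurel", "config.json")
# Synthetic-pretrained weights; hf_hub_download returns the local cached file path.
ckpt = hf_hub_download("stanford-star/rt-plurel", "synthetic-pretrain_rdb_1024_size_4b.pt")
```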

Full repository (CLI):

```shell
hf download stanford-star/rt-plurel \
    --repo-type model \
    --local-dir ~/scratch/rt_hf_ckpts
```

RelBench leaderboard checkpoints (added 2026-06)

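Training follows two protocols. The first is the repository's continued-pretraining script: 50k steps, batch size 128, learning rate 5e-4 with a cosine schedule, initialized from `synthetic-pretrain_rdb_1024_size_4b.pt`. The second is the RT `example_finetune` protocol: learning rate 1e-4, batch size 32, 2^15 + 1 (32,769) steps. For regression tasks, the best checkpoint is selected by validation NMAE rather than by R², because NMAE is the leaderboard metric. NMAE is the MAE divided by the standard deviation of the train-split targets, computed with ddof=1. All evaluation uses the full official test split.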
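The selection metric described above can be sketched in standard-library Python. `nmae` and `best_step` are hypothetical helpers and do not come from the RT or PluRel code. `statistics.stdev` uses the n - 1 denominator, which matches ddof=1. Lower NMAE is better, so selection takes the minimum. Selection by R² would take the maximum instead.

```python
import statistics

def nmae(y_true, y_pred, y_train):
    """MAE of predictions divided by the train-split target std (ddof=1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    train_std = statistics.stdev(y_train)  # sample std, denominator n - 1
    return mae / train_std

def best_step(val_nmae_by_step):
    """Return the checkpoint step with the lowest validation NMAE."""
    return min(val_nmae_by_step, key=val_nmae_by_step.get)

# Train targets [1, 3, 5] have std 2.0 (ddof=1); MAE on the pair below is 1.5.
print(nmae([10, 12], [11, 10], y_train=[1, 3, 5]))  # 0.75
print(best_step({1000: 0.82, 2000: 0.71, 3000: 0.74}))  # 2000
```

The step numbers and scores in the last line are made-up placeholders used only to show the selection rule.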

`cntd-pretrain_<db>_<task>.pt`: continued pretraining on synthetic plus real data, with one leave-one-DB-out run per database (including rel-event) and the best checkpoint kept for each task. These produce the "PluRel | synthetic + real" cells for zero-shot regression and for rel-event.
`finetune_<db>_<task>.pt`: fine-tuned from the matching `cntd-pretrain` checkpoint. This starting point was chosen over the synthetic-only checkpoint by validation zero-shot performance, where the synthetic+real checkpoint was better on every task. These produce the "PluRel | pretrained + fine-tuned" leaderboard row.
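The naming scheme above can be handled with two small helpers that build and parse checkpoint filenames. Both `checkpoint_name` and `parse_checkpoint_name` are hypothetical. The parser assumes that `<db>` and `<task>` contain no underscores, since RelBench names use hyphens such as `rel-event`, but this is an assumption rather than a guarantee. The database and task values in the example are illustrative. Check the repository file list for the names that actually exist.

```python
import re

# Assumed pattern <kind>_<db>_<task>.pt, with no underscores inside db or task.
_CKPT_RE = re.compile(r"^(cntd-pretrain|finetune)_([^_]+)_([^_]+)\.pt$")
_KINDS = ("cntd-pretrain", "finetune")

def checkpoint_name(kind, db, task):
    """Build a leaderboard checkpoint filename from its parts."""
    if kind not in _KINDS:
        raise ValueError(f"unknown checkpoint kind: {kind!r}")
    return f"{kind}_{db}_{task}.pt"

def parse_checkpoint_name(filename):
    """Split a leaderboard checkpoint filename, or return None if it does not match."""
    m = _CKPT_RE.match(filename)
    if m is None:
        return None
    kind, db, task = m.groups()
    return {"kind": kind, "db": db, "task": task}

name = checkpoint_name("finetune", "rel-event", "user-repeat")
print(name)  # finetune_rel-event_user-repeat.pt
print(parse_checkpoint_name(name))
```

The built name can then be passed as the filename argument to `hf_hub_download`, as in the Download section. The synthetic-only checkpoint `synthetic-pretrain_rdb_1024_size_4b.pt` does not follow this pattern, so the parser returns `None` for it.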

