Tabular-to-Text Clinical Transformer (Schema-Shift Reliability Study)
Custom GPT-style Transformer trained from scratch on serialized clinical records, released as part of the study "Specificity Collapse and Calibration Drift under External Schema Shift."
- Paper / preprint (Zenodo DOI): https://doi.org/10.5281/zenodo.20611423
- Source code & full experiments: see the project repository
- License: MIT
⚠️ Not for clinical use. These weights are released for research reproducibility only. They were trained on a small public dataset (n=303), show measurable calibration drift and specificity collapse under distribution shift, and must not be used for diagnosis or any real medical decision.
What this is
This repo contains 5 cross-validation fold checkpoints of a tabular-to-text Transformer that consumes a Chinese serialized template of clinical records and predicts heart-disease presence.
| File | Description |
|---|---|
| saved_models/model_fold{0..4}.pt | PyTorch state_dict for each of the 5 CV folds |
| modeling_gpt.py | Model architecture + GPTConfig (copied from train.py) |
| tokenizer.pkl | Pickled tiktoken.Encoding (custom BPE, Chinese clinical text) |
| tokenizer_en.pkl | Pickled tiktoken.Encoding (custom BPE, English clinical text) |
The checkpoints are plain state_dicts (no embedded config), so you must build
the model with the architecture in modeling_gpt.py to load them.
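The `model_fold{0..4}.pt` notation expands to five files. As a minimal sketch, the paths can be enumerated and checked before any loading is attempted. The helper names `fold_checkpoint_paths` and `missing_checkpoints` are hypothetical, and the `./ckpt` root assumes the files were downloaded into that directory.

```python
from pathlib import Path

N_FOLDS = 5  # the file table lists model_fold0.pt through model_fold4.pt


def fold_checkpoint_paths(root="."):
    """Return the expected checkpoint path for every CV fold under `root`."""
    base = Path(root) / "saved_models"
    return [base / f"model_fold{i}.pt" for i in range(N_FOLDS)]


def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that are not present on disk."""
    return [p for p in fold_checkpoint_paths(root) if not p.is_file()]


# "./ckpt" is an assumed download location, not a requirement of the repo.
paths = fold_checkpoint_paths("./ckpt")
print([p.name for p in paths])
```

Calling `missing_checkpoints("./ckpt")` before loading gives a clearer error than a failed `torch.load` on a missing file.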
Model architecture
GPT-style decoder with grouped-query attention and per-layer value embeddings.
Default GPTConfig:
| Param | Value |
|---|---|
| vocab_size | 32768 |
| n_layer | 12 |
| n_head | 6 |
| n_kv_head | 6 |
| n_embd | 768 |
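The defaults imply a per-head width and a query-to-KV grouping that are easy to check by hand. The sketch below assumes the conventional definitions (`head_dim = n_embd / n_head`, group size `n_head / n_kv_head`); `ConfigSketch` is a hypothetical stand-in that mirrors the table and is not the `GPTConfig` class from modeling_gpt.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical mirror of the default values in the table above."""
    vocab_size: int = 32768
    n_layer: int = 12
    n_head: int = 6
    n_kv_head: int = 6
    n_embd: int = 768


def head_dim(cfg):
    """Width of one attention head, assuming n_embd is split evenly across heads."""
    if cfg.n_embd % cfg.n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return cfg.n_embd // cfg.n_head


def queries_per_kv_head(cfg):
    """Number of query heads that share each key/value head."""
    if cfg.n_head % cfg.n_kv_head != 0:
        raise ValueError("n_head must be divisible by n_kv_head")
    return cfg.n_head // cfg.n_kv_head


cfg = ConfigSketch()
print(head_dim(cfg))             # 768 // 6 = 128
print(queries_per_kv_head(cfg))  # 6 // 6 = 1
```

Under these assumptions the group size is 1, so with the default values every query head has its own key/value head and grouped-query attention coincides with standard multi-head attention.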
How to load
import torch
from modeling_gpt import GPT, GPTConfig # from this repo
config = GPTConfig() # match training-time config
model = GPT(config)
sd = torch.load("saved_models/model_fold0.pt", map_location="cpu", weights_only=True)
model.load_state_dict(sd)
model.eval()
Load the tokenizer (requires tiktoken):
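import pickle
with open("tokenizer.pkl", "rb") as f:  # close the file handle after loading
    enc = pickle.load(f)  # tiktoken.Encoding
# The sample input reads: "age 63, chest pain type: typical angina"
ids = enc.encode_ordinary("年龄63岁,胸痛类型典型心绞痛")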
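Download the files before running the snippets above. The example below places them under ./ckpt, so paths such as saved_models/model_fold0.pt and tokenizer.pkl must be resolved relative to that directory: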
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Zhanbingli/heart-schema-shift-transformer", local_dir="./ckpt")
Results (from the paper)
| Model / Paradigm | Internal CV (AUC) | External Validation (AUC) | External behavior |
|---|---|---|---|
| Random Forest (baseline) | 0.911 ± 0.024 | 0.891 ± 0.042 | Stable |
| Tabular-to-Text Transformer (this model) | 0.762 ± 0.070 | 0.624 ± 0.033 | Specificity drop |
| LLM 5-shot (Qwen3.5-2B) | 0.755 ± 0.030 | 0.739 | Best external neural profile |
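The internal-to-external gap in the table can be made explicit with simple arithmetic. The sketch below subtracts the reported mean AUCs only; it ignores the ± spreads, so it is a rough reading of the table and not a statistical comparison. The function name `auc_drop` is hypothetical.

```python
# Mean AUCs copied from the results table: (internal CV, external validation).
results = {
    "Random Forest (baseline)": (0.911, 0.891),
    "Tabular-to-Text Transformer (this model)": (0.762, 0.624),
    "LLM 5-shot (Qwen3.5-2B)": (0.755, 0.739),
}


def auc_drop(internal, external):
    """Absolute drop in mean AUC from internal CV to external validation."""
    return round(internal - external, 3)


drops = {name: auc_drop(*pair) for name, pair in results.items()}
for name, drop in drops.items():
    print(f"{name}: {drop:.3f}")
```

On these point estimates the Transformer loses 0.138 AUC externally, compared with 0.020 for the Random Forest and 0.016 for the 5-shot LLM.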
- Internal data: UCI Heart Disease (n=303)
- External data: Kaggle Heart Failure Prediction (n=918), schema-aligned by setting ca=0, thal=0 for records lacking those variables.
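The alignment step for the external data can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: it assumes the external record has already been renamed and recoded to the UCI column names (that mapping is not described here), the function `align_to_uci_schema` is hypothetical, and the example values are made up for demonstration.

```python
# The 13 feature names of the UCI Heart Disease schema.
UCI_FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Fill values stated in this README for variables the external data lacks.
FILL_DEFAULTS = {"ca": 0, "thal": 0}


def align_to_uci_schema(record):
    """Return a 13-feature record, filling ca and thal with 0 when absent."""
    aligned = {}
    for name in UCI_FEATURES:
        if name in record:
            aligned[name] = record[name]
        elif name in FILL_DEFAULTS:
            aligned[name] = FILL_DEFAULTS[name]
        else:
            # Any other missing feature is an error in this sketch.
            raise KeyError(f"missing feature with no fill rule: {name}")
    return aligned


# Illustrative external record (values invented), lacking ca and thal.
external = {
    "age": 54, "sex": 1, "cp": 2, "trestbps": 130, "chol": 240, "fbs": 0,
    "restecg": 0, "thalach": 150, "exang": 0, "oldpeak": 1.0, "slope": 1,
}
row = align_to_uci_schema(external)
print(row["ca"], row["thal"])
```

A constant fill of this kind makes the two columns uninformative on the external cohort, which is one reason the external cohort is not a native replication of the 13-feature schema.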
Limitations
- Small, hypothesis-generating study — not a leaderboard claim.
- External cohort is not a native replication of the UCI 13-feature schema.
- Missingness is encoded differently across pipelines.
- Trained on a Chinese serialized template; inputs are not natural language.
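To make the serialized-template point concrete, the sketch below rebuilds the sample string used in the tokenizer snippet from two fields. The full training template is not given in this README, so this covers only the two fields visible in that sample; `serialize_record` is a hypothetical helper, and the real template may order or word its fields differently.

```python
def serialize_record(age: int, chest_pain_label: str) -> str:
    """Render two fields in the style of the README's sample input.

    The literal text means "age {age} years, chest pain type {label}".
    """
    return f"年龄{age}岁,胸痛类型{chest_pain_label}"


# "典型心绞痛" means "typical angina"; it is the label shown in the sample input.
text = serialize_record(63, "典型心绞痛")
print(text)
```

Because the model saw only strings of this fixed shape during training, free-form sentences in Chinese or English fall outside its input distribution.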
Citation
@misc{li2026schemashift,
title = {Specificity Collapse and Calibration Drift under External Schema Shift},
author = {Li, Zhanbing},
year = {2026},
doi = {10.5281/zenodo.20611423},
url = {https://doi.org/10.5281/zenodo.20611423}
}