Tabular-to-Text Clinical Transformer (Schema-Shift Reliability Study)

Custom GPT-style Transformer trained from scratch on serialized clinical records, released as part of the study "Specificity Collapse and Calibration Drift under External Schema Shift."

Paper / preprint (Zenodo DOI): https://doi.org/10.5281/zenodo.20611423
Source code & full experiments: see the project repository
License: MIT

⚠️ Not for clinical use. These weights are released for research reproducibility only. They were trained on a small public dataset (n=303), show measurable calibration drift and specificity collapse under distribution shift, and must not be used for diagnosis or any real medical decision.

What this is

This repo contains 5 cross-validation fold checkpoints of a tabular-to-text Transformer that reads clinical records serialized into a Chinese text template and predicts the presence of heart disease.

| File | Description |
|---|---|
| `saved_models/model_fold{0..4}.pt` | PyTorch `state_dict` for each of the 5 CV folds |
| `modeling_gpt.py` | Model architecture + `GPTConfig` (copied from `train.py`) |
| `tokenizer.pkl` | Pickled `tiktoken.Encoding` (custom BPE, Chinese clinical text) |
| `tokenizer_en.pkl` | Pickled `tiktoken.Encoding` (custom BPE, English clinical text) |

The checkpoints are plain `state_dict`s with no embedded config, so you must first instantiate the model from `modeling_gpt.py` with a matching `GPTConfig` and then load the weights into it.
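Because the weights carry no config, the usual failure is a shape mismatch between the checkpoint and the model you built. The hypothetical helper below compares the two as plain `{key: shape}` dictionaries, so you can see which keys are missing, unexpected, or differently shaped before calling `load_state_dict`. In practice you would build those dictionaries with `{k: tuple(v.shape) for k, v in model.state_dict().items()}` and the same expression over the loaded checkpoint; the parameter names in the toy example are made up.

```python
def diff_state_dict_shapes(expected, loaded):
    """Compare two {parameter_name: shape} dicts.

    Returns (missing, unexpected, mismatched):
      missing    - keys the model defines but the checkpoint lacks
      unexpected - keys in the checkpoint that the model does not define
      mismatched - {key: (model_shape, checkpoint_shape)} for shared keys whose shapes differ
    """
    missing = sorted(set(expected) - set(loaded))
    unexpected = sorted(set(loaded) - set(expected))
    mismatched = {
        key: (tuple(expected[key]), tuple(loaded[key]))
        for key in sorted(set(expected) & set(loaded))
        if tuple(expected[key]) != tuple(loaded[key])
    }
    return missing, unexpected, mismatched


# Toy example with made-up parameter names: a vocab-size mismatch shows up
# as a differing first dimension of the embedding matrix.
model_shapes = {"embed.weight": (32768, 768), "head.weight": (32768, 768)}
ckpt_shapes = {"embed.weight": (50304, 768), "extra.bias": (768,)}
print(diff_state_dict_shapes(model_shapes, ckpt_shapes))
# (['head.weight'], ['extra.bias'], {'embed.weight': ((32768, 768), (50304, 768))})
```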

Model architecture

GPT-style decoder with grouped-query attention and per-layer value embeddings. Default GPTConfig:

| Param | Value |
|---|---|
| `vocab_size` | 32768 |
| `n_layer` | 12 |
| `n_head` | 6 |
| `n_kv_head` | 6 |
| `n_embd` | 768 |
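With `n_kv_head` equal to `n_head`, the grouped-query attention in this default config reduces to ordinary multi-head attention: every query head has its own key/value head. The bookkeeping below shows this for the defaults, and for a hypothetical `n_kv_head=2` for contrast, using the common convention that query head `q` reads K/V head `q // (n_head // n_kv_head)`. The function name is illustrative, and whether `modeling_gpt.py` indexes heads exactly this way is an assumption.

```python
def gqa_layout(n_embd, n_head, n_kv_head):
    """Head bookkeeping for grouped-query attention (illustrative sketch)."""
    if n_embd % n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    if n_head % n_kv_head != 0:
        raise ValueError("n_head must be divisible by n_kv_head")
    head_dim = n_embd // n_head             # width of each attention head
    group_size = n_head // n_kv_head        # query heads sharing one K/V head
    # Query head q reads keys/values from K/V head q // group_size.
    kv_for_query = [q // group_size for q in range(n_head)]
    return head_dim, group_size, kv_for_query


print(gqa_layout(n_embd=768, n_head=6, n_kv_head=6))  # (128, 1, [0, 1, 2, 3, 4, 5])
print(gqa_layout(n_embd=768, n_head=6, n_kv_head=2))  # (128, 3, [0, 0, 0, 1, 1, 1])
```

A group size of 1 means no K/V sharing; GQA only saves key/value parameters and cache memory when `n_kv_head < n_head`.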

How to load

```python
import torch
from modeling_gpt import GPT, GPTConfig  # from this repo

config = GPTConfig()  # defaults must match the training-time config (table above)
model = GPT(config)
sd = torch.load("saved_models/model_fold0.pt", map_location="cpu", weights_only=True)
model.load_state_dict(sd)  # strict=True by default: raises on any key or shape mismatch
model.eval()
```
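If `load_state_dict` instead reports missing and unexpected keys that differ only by a leading `_orig_mod.` (added when a model wrapped by `torch.compile` is saved) or `module.` (added by `DataParallel`/`DistributedDataParallel`), stripping the prefix usually fixes it. We have not confirmed that any of these checkpoints carry such a prefix; the helper below is a hypothetical safeguard, not part of this repo.

```python
def strip_wrapper_prefixes(state_dict, prefixes=("_orig_mod.", "module.")):
    """Return a new dict whose keys have leading wrapper prefixes removed.

    Prefixes are stripped repeatedly, so nested wrappers such as
    "module._orig_mod.x" also reduce to "x". Values are passed through untouched.
    """
    cleaned = {}
    for key, value in state_dict.items():
        stripped = True
        while stripped:
            stripped = False
            for prefix in prefixes:
                if key.startswith(prefix):
                    key = key[len(prefix):]
                    stripped = True
        cleaned[key] = value
    return cleaned


# Usage with the snippet above (only if the prefixed-key error actually occurs):
# model.load_state_dict(strip_wrapper_prefixes(sd))
```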

Load the tokenizer (requires `tiktoken`; `tokenizer_en.pkl` loads the same way for English clinical text):

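```python
import pickle

# Only unpickle files from a source you trust: pickle can execute arbitrary code.
with open("tokenizer.pkl", "rb") as f:
    enc = pickle.load(f)  # tiktoken.Encoding; tiktoken must be installed
ids = enc.encode_ordinary("年龄63岁，胸痛类型典型心绞痛")  # "Age 63, chest pain type: typical angina"
```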
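A quick way to confirm that the pickled tokenizer loaded correctly is an encode/decode round trip on a few serialized inputs. The helper name is hypothetical; it relies only on `encode_ordinary` and `decode`, which `tiktoken.Encoding` provides. Since tiktoken uses byte-level BPE, a non-empty result most likely points to a broken load rather than to an unsupported character.

```python
def roundtrip_failures(enc, texts):
    """Return the texts that do not survive encode_ordinary -> decode unchanged.

    `enc` is anything exposing tiktoken.Encoding's encode_ordinary/decode methods.
    """
    failures = []
    for text in texts:
        ids = enc.encode_ordinary(text)
        if enc.decode(ids) != text:
            failures.append(text)
    return failures


# Example with the pickled tokenizer (requires tiktoken and tokenizer.pkl):
# import pickle
# with open("tokenizer.pkl", "rb") as f:
#     enc = pickle.load(f)
# print(roundtrip_failures(enc, ["年龄63岁，胸痛类型典型心绞痛"]))
# print(len(enc.encode_ordinary("年龄63岁，胸痛类型典型心绞痛")))  # token count
```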

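The paths in these snippets are relative to the downloaded repository. If you have not downloaded it yet, fetch everything with `huggingface_hub` first: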

```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Zhanbingli/heart-schema-shift-transformer", local_dir="./ckpt")
```
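With `local_dir="./ckpt"`, the files land under `./ckpt`, so the relative paths in the loading snippets (and the `modeling_gpt` import) only resolve if you run from that directory or point them there. The sketch below, with a hypothetical helper name, checks that every file listed in the table above was downloaded and shows one way to wire up the paths.

```python
import os

# Files this model card lists; the download directory should contain all of them.
EXPECTED_FILES = ["modeling_gpt.py", "tokenizer.pkl", "tokenizer_en.pkl"] + [
    os.path.join("saved_models", f"model_fold{k}.pt") for k in range(5)
]


def missing_files(local_dir):
    """Return the expected relative paths that are not present under local_dir."""
    return [rel for rel in EXPECTED_FILES if not os.path.isfile(os.path.join(local_dir, rel))]


# Typical use after snapshot_download(..., local_dir="./ckpt"):
# import sys
# local_dir = "./ckpt"
# assert not missing_files(local_dir), missing_files(local_dir)
# sys.path.insert(0, local_dir)  # so `from modeling_gpt import GPT, GPTConfig` resolves
# ckpt_path = os.path.join(local_dir, "saved_models", "model_fold0.pt")
# tok_path = os.path.join(local_dir, "tokenizer.pkl")
```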

Results (from the paper)

| Model / Paradigm | Internal CV (AUC) | External validation (AUC) | External behavior |
|---|---|---|---|
| Random Forest (baseline) | 0.911 ± 0.024 | 0.891 ± 0.042 | Stable |
| Tabular-to-Text Transformer (this model) | 0.762 ± 0.070 | 0.624 ± 0.033 | Specificity drop |
| LLM 5-shot (Qwen3.5-2B) | 0.755 ± 0.030 | 0.739 | Best external neural profile |
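To make the table's contrast concrete, the sketch below computes the internal-to-external AUC drop from the reported means and defines the specificity metric behind the "Specificity drop" label. It uses only the point estimates (ignoring the ± spreads; the LLM external figure has none), so it is descriptive, not a significance test. The fixed 0.5 decision threshold in `sensitivity_specificity` is an assumption; the study's operating point is not stated here.

```python
# Mean AUCs copied from the results table (internal CV, external validation).
RESULTS = {
    "Random Forest (baseline)": (0.911, 0.891),
    "Tabular-to-Text Transformer": (0.762, 0.624),
    "LLM 5-shot (Qwen3.5-2B)": (0.755, 0.739),
}


def auc_drop(internal, external):
    """Absolute and relative AUC loss when moving to the external cohort."""
    drop = internal - external
    return drop, drop / internal


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) at a fixed threshold.

    The 0.5 default is an illustrative assumption, not the study's setting.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


for name, (internal, external) in RESULTS.items():
    drop, rel = auc_drop(internal, external)
    print(f"{name}: AUC {internal:.3f} -> {external:.3f}, drop {drop:.3f} ({rel:.1%})")
```

On these means the Transformer loses about 0.14 AUC (roughly 18% relative), against about 0.02 for both the Random Forest and the 5-shot LLM. A model whose scores shift upward on the external cohort will push healthy patients over a fixed threshold, which lowers specificity even before AUC moves much.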

Internal data: UCI Heart Disease (n=303)
External data: Kaggle Heart Failure Prediction (n=918), aligned to the UCI schema by setting `ca=0` and `thal=0` for records that lack those variables.
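The alignment step can be sketched as forcing every record onto the 13 UCI feature names, with only `ca` and `thal` allowed to be filled with 0 when absent. This is an illustrative assumption about how such a step could look, not the study's pipeline; it also assumes the Kaggle columns have already been renamed and recoded to UCI names, which is not shown.

```python
# The 13 input features of the UCI Heart Disease (Cleveland) schema.
UCI_FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def align_to_uci(record, fill=None):
    """Return a copy of `record` with every UCI feature present, in schema order.

    Features named in `fill` get the given constant when absent; any other
    absent feature raises KeyError, so no silent gaps are introduced.
    """
    fill = {"ca": 0, "thal": 0} if fill is None else fill
    aligned = {}
    for name in UCI_FEATURES:
        if name in record:
            aligned[name] = record[name]
        elif name in fill:
            aligned[name] = fill[name]
        else:
            raise KeyError(f"record is missing {name!r} and no fill value is defined")
    return aligned
```

Raising on any other missing field keeps the imputation limited to the two variables named here. Note that `ca` (number of major vessels colored by fluoroscopy, 0 to 3) already uses 0 as a real value, so a filled 0 is indistinguishable from a measured one; this is one concrete form of the missingness-encoding limitation listed below.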

Limitations

- Small, hypothesis-generating study; not a leaderboard claim.
- The external cohort is not a native replication of the UCI 13-feature schema.
- Missingness is encoded differently across pipelines.
- Trained on a Chinese serialized template; inputs are not free-form natural language.

Citation

```bibtex
@misc{li2026schemashift,
  title  = {Specificity Collapse and Calibration Drift under External Schema Shift},
  author = {Li, Zhanbing},
  year   = {2026},
  doi    = {10.5281/zenodo.20611423},
  url    = {https://doi.org/10.5281/zenodo.20611423}
}
```
