Recurrent Transformer GPT-2

A small Transformer-XL-style recurrent language model trained with the GPT-2 tokenizer.
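As a rough illustration of what "Transformer-XL style" recurrence usually means, the sketch below carries a fixed-size memory of earlier positions from one segment to the next. It is not taken from this model's code: `update_memory`, `iter_segments_with_memory`, `segment_len`, and `mem_len` are hypothetical names, and plain Python values stand in for hidden-state tensors.

```python
from typing import Iterator, List, Sequence, Tuple


def update_memory(memory: Sequence, hidden: Sequence, mem_len: int) -> List:
    """Append the newest states to the memory and keep only the last mem_len."""
    if mem_len <= 0:
        return []
    return (list(memory) + list(hidden))[-mem_len:]


def iter_segments_with_memory(
    states: Sequence, segment_len: int, mem_len: int
) -> Iterator[Tuple[List, List]]:
    """Yield (memory, segment) pairs in order.

    The memory seen by each segment holds the most recent mem_len states
    from the segments before it (hypothetical stand-in for cached
    hidden states that a recurrent transformer would attend to).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    memory: List = []
    for start in range(0, len(states), segment_len):
        segment = list(states[start:start + segment_len])
        yield memory, segment
        # update_memory returns a new list, so the yielded memory is not mutated.
        memory = update_memory(memory, segment, mem_len)
```

In a real model the memory would typically be cached without gradient, so each segment attends to earlier context without backpropagating through it.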

Model

Pretraining used local parquet text shards copied from the neighboring project, then tokenized with the GPT-2 tokenizer.
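A minimal sketch of how tokenized documents might be packed into fixed-length training segments. The actual pipeline is not described here, so the function name `pack_documents`, the `segment_len` parameter, and the choice to separate documents with the end-of-text token are assumptions. The `encode` argument is any callable that maps text to token ids.

```python
from typing import Callable, Iterable, Iterator, List

GPT2_EOS_ID = 50256  # id of <|endoftext|> in the GPT-2 vocabulary


def pack_documents(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    segment_len: int,
    eos_id: int = GPT2_EOS_ID,
) -> Iterator[List[int]]:
    """Yield fixed-length token segments from a stream of documents.

    Each document is encoded, followed by one EOS token, and appended to a
    running buffer. Full segments are emitted in order; a trailing partial
    segment is dropped (an assumption, padding would also be reasonable).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    buffer: List[int] = []
    for text in texts:
        buffer.extend(encode(text))
        buffer.append(eos_id)
        while len(buffer) >= segment_len:
            yield buffer[:segment_len]
            buffer = buffer[segment_len:]
```

With the `transformers` library installed, `encode` could be `GPT2TokenizerFast.from_pretrained("gpt2").encode`, and `texts` could come from a text column of each parquet shard.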

The preserved pretrained weights are:

The full local checkpoint history is intentionally not included in this model upload because it is large. It is backed up locally under:

backups/pretrain_2026-06-17/

The selected supervised fine-tuning (SFT) dataset is databricks/databricks-dolly-15k, downloaded locally as:

data/sft/raw/databricks-dolly-15k.jsonl

It is suitable for this small model because it is compact, human-written, and single-turn instruction/QA oriented.
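Because each Dolly record is a single-turn example with `instruction`, `context`, and `response` fields, converting the JSONL file into prompt/response pairs can be sketched as follows. The prompt template and the helper names `format_dolly_record` and `iter_sft_examples` are hypothetical; the template actually used for this model is not stated.

```python
import json
from typing import Dict, Iterable, Iterator


def format_dolly_record(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one Dolly record into a prompt/response pair.

    The "### Instruction / Context / Response" layout is an assumed
    template, not necessarily the one used to train this model.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    response = record["response"].strip()
    if context:
        prompt = (
            f"### Instruction:\n{instruction}\n\n"
            f"### Context:\n{context}\n\n### Response:\n"
        )
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return {"prompt": prompt, "response": response}


def iter_sft_examples(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse JSONL lines, skipping blank ones, and format each record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield format_dolly_record(json.loads(line))
```

For example, passing an open file handle for `data/sft/raw/databricks-dolly-15k.jsonl` to `iter_sft_examples` would yield one pair per record. During SFT, the loss is commonly computed on the response tokens only.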

Intended use: research and experimentation with small recurrent transformer pretraining and SFT.
