llada-8b-dllm-registers-countdown-rl-r4

This checkpoint was trained with chunked GRPO on Countdown, starting from the mix60k R4 SFT main-triad checkpoint. The released weights are from step 200, the step reported in the paper's RL results table (tab:rl_results).

This checkpoint is part of the dLLM Registers project, which studies register tokens as a bounded, trained, continuous channel for carrying decoding state across denoising windows in diffusion language models.

How to load

from transformers import AutoModel, AutoTokenizer

repo_id = "albertge/llada-8b-dllm-registers-countdown-rl-r4"
# trust_remote_code=True lets transformers run the custom model code shipped with the repository.
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
tok = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

To use the carry channel correctly at inference, see the evaluator at eval/eval.py and the wrapper eval/run_mix60k_full_eval.sh. Key flags for this checkpoint: --num_registers 4 --channel_mode registers --tail_length 0 --use_mask_token_for_registers
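
As a rough sketch of how the flags above could be collected in one place, the snippet below assembles them into an argument list. It assumes the evaluator is launched as `python eval/eval.py` from the repository root, which this card does not confirm. The helper `build_eval_command` and its `extra_args` parameter are hypothetical. Any other arguments the evaluator needs (model path, dataset, output location) are not listed here and should be taken from eval/eval.py or the wrapper script.

```python
import shlex

# Flags copied verbatim from this model card.
CHECKPOINT_FLAGS = [
    "--num_registers", "4",
    "--channel_mode", "registers",
    "--tail_length", "0",
    "--use_mask_token_for_registers",
]


def build_eval_command(extra_args=()):
    """Return an argv list for eval/eval.py with this checkpoint's flags.

    Hypothetical helper: extra_args stands in for whatever other arguments
    the evaluator requires, which this card does not document.
    """
    # Assumption: the evaluator is invoked directly with `python`.
    return ["python", "eval/eval.py", *CHECKPOINT_FLAGS, *extra_args]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_eval_command()))
```

Keeping the checkpoint-specific flags in a single list makes it harder to drop one by accident when switching between checkpoints that need different register settings.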

Repository

Training and eval code: https://github.com/lbertge/d1-registers

Citation

If you use this checkpoint, please cite the dLLM Registers paper:

@misc{dllm-registers-2026,
  title  = {Register Tokens for Unbounded Reasoning in Diffusion Language Models},
  author = {Albert Ge and collaborators},
  year   = {2026},
  note   = {Preprint},
  url    = {https://github.com/lbertge/d1-registers}
}