CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

Paper: https://arxiv.org/abs/2603.05433

CRISP teaches a reasoning model to think concisely by distilling its own concise behavior back into itself. The teacher is the same model conditioned on a conciseness instruction; the student is the model with no instruction. Training minimizes per-token reverse KL from student to teacher on the student's own rollouts, with the teacher periodically refreshed (interval M=50). No ground-truth answers, no token budgets, and no difficulty estimators enter the loss.
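The per-token objective described above can be sketched in plain Python. This is a minimal illustration, not the training code: the function names (`log_softmax`, `reverse_kl`, `sequence_loss`) are hypothetical, and averaging over tokens is an assumption. Both sets of logits are taken at the same positions of a rollout sampled from the student; the student sees the bare problem and the teacher sees the problem plus the conciseness instruction.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for a single next-token distribution."""
    log_p = log_softmax(student_logits)   # student: no instruction
    log_q = log_softmax(teacher_logits)   # teacher: with conciseness instruction
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def sequence_loss(student_logits_per_token, teacher_logits_per_token):
    """Mean per-token reverse KL over one student rollout (averaging is an assumption)."""
    kls = [
        reverse_kl(s, t)
        for s, t in zip(student_logits_per_token, teacher_logits_per_token)
    ]
    return sum(kls) / len(kls)
```

The expectation is taken under the student distribution, so the loss only penalizes tokens the student actually places mass on. That property is what makes reverse KL a natural fit for training on the student's own rollouts.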

This repository hosts the step-99 checkpoints for three base models, each trained with two conciseness-instruction variants:

v1 (uniform): "Solve concisely and correctly. Be direct — avoid unnecessary elaboration, redundant steps, or restating the problem." Compresses most aggressively.
v2 (difficulty-aware, default): adds a caveat to not over-compress hard/multi-step problems (keep case analysis, edge cases, a final check). Best accuracy preservation.
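As a rough sketch of how the student and teacher inputs differ, the snippet below builds both prompts for one problem. The v1 instruction text is copied from the description above. The helper name `build_prompts` and the choice to prepend the instruction to the problem are assumptions for illustration, since the exact prompt layout is not specified here. The v2 text is not quoted in full above, so it is left as a caller-supplied argument.

```python
# v1 instruction as quoted in this card ("\u2014" is the dash in the original text).
V1_INSTRUCTION = (
    "Solve concisely and correctly. Be direct \u2014 avoid unnecessary "
    "elaboration, redundant steps, or restating the problem."
)

def build_prompts(problem, instruction=V1_INSTRUCTION):
    """Return (student_prompt, teacher_prompt) for one problem.

    Assumption: the teacher prompt is the instruction followed by the problem;
    the real template may place the instruction elsewhere.
    """
    student_prompt = problem                        # student: no instruction
    teacher_prompt = f"{instruction}\n\n{problem}"  # teacher: same model, conditioned
    return student_prompt, teacher_prompt

student, teacher = build_prompts("What is the remainder of 2^10 divided by 7?")
```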

Checkpoints

| Folder | Base model | Teacher prompt |
| --- | --- | --- |
| `checkpoints/8b_v1` | Qwen3-8B | v1 uniform |
| `checkpoints/8b_v2` | Qwen3-8B | v2 difficulty-aware |
| `checkpoints/14b_v1` | Qwen3-14B | v1 uniform |
| `checkpoints/14b_v2` | Qwen3-14B | v2 difficulty-aware |
| `checkpoints/ds_v1` | DeepSeek-R1-Distill-Llama-8B | v1 uniform |
| `checkpoints/ds_v2` | DeepSeek-R1-Distill-Llama-8B | v2 difficulty-aware |

Training data: pb09204048/CRISP (DAPO-Math-17k with the v1/v2 conciseness instruction column).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "pb09204048/CRISP"
sub = "checkpoints/14b_v1"  # any folder from the Checkpoints table

# All checkpoints live in subfolders of one repo, so both loaders need `subfolder`.
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, device_map="auto")
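To avoid typing folder names by hand, a small lookup can mirror the Checkpoints table. The helper `checkpoint_subfolder` and the short family keys are hypothetical names introduced for this sketch; the folder strings themselves come from the table, and the default variant follows the card's description of v2 as the default.

```python
# Folder names copied from the Checkpoints table; keys are illustrative shorthand.
CHECKPOINTS = {
    ("8b", "v1"): "checkpoints/8b_v1",
    ("8b", "v2"): "checkpoints/8b_v2",
    ("14b", "v1"): "checkpoints/14b_v1",
    ("14b", "v2"): "checkpoints/14b_v2",
    ("ds", "v1"): "checkpoints/ds_v1",
    ("ds", "v2"): "checkpoints/ds_v2",
}

def checkpoint_subfolder(family, variant="v2"):
    """Map (family, variant) to the subfolder to pass as `subfolder=`."""
    try:
        return CHECKPOINTS[(family, variant)]
    except KeyError:
        valid = ", ".join(f"{f}/{v}" for f, v in sorted(CHECKPOINTS))
        raise ValueError(f"unknown checkpoint {family}/{variant}; choose from: {valid}")

sub = checkpoint_subfolder("14b", "v1")
```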

Benchmark results

All numbers use a 30K-token budget and report accuracy (mean@8, %) and token reduction (Red., % vs. the base model); a negative reduction means responses were longer than the base model's. Math (MATH-500, AIME 2024/2025) is scored with a dual-path grader that accepts a final answer given as Answer: or \boxed{}; GPQA-Diamond and MMLU use exact letter match. Each table lists the base model, the conciseness prompt applied at inference only (no training) for v2 and v1, and the CRISP-trained checkpoint for v2 and v1.
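The metrics above can be made concrete with a short sketch. None of this is the evaluation code used for the tables. The function names are hypothetical, the order in which the two answer paths are tried is an assumption, and token reduction is assumed to be the relative drop in mean response length versus the base model.

```python
import re

def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:          # matching close of \boxed{
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None                     # unbalanced braces

def extract_answer(text):
    """Dual-path extraction: an 'Answer:' line first (assumed order), else \\boxed{}."""
    matches = re.findall(r"Answer:\s*(.+)", text)
    if matches:
        return matches[-1].strip()
    return extract_boxed(text)

def mean_at_k(correct_per_problem):
    """Accuracy in %, averaging k sampled attempts per problem, then over problems."""
    per_problem = [sum(c) / len(c) for c in correct_per_problem]
    return 100.0 * sum(per_problem) / len(per_problem)

def token_reduction(base_mean_tokens, model_mean_tokens):
    """Reduction in % relative to the base model; negative means longer outputs."""
    return 100.0 * (1.0 - model_mean_tokens / base_mean_tokens)
```

With k=8, `mean_at_k` takes one list of eight booleans per problem. A model that averages 684 tokens where the base model averages 1000 would be reported as a 31.6% reduction under this definition.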

Qwen3-8B

| Setting | MATH-500 Acc / Red | AIME24 Acc / Red | AIME25 Acc / Red | GPQA-D Acc / Red | MMLU Acc / Red |
| --- | --- | --- | --- | --- | --- |
| Base | 95.7 / — | 76.2 / — | 70.4 / — | 61.5 / — | 81.9 / — |
| Concise prompt (v2) | 94.2 / 21.5% | 74.6 / 9.8% | 63.7 / 5.3% | 59.5 / 30.7% | 82.8 / 26.8% |
| Concise prompt (v1) | 95.6 / 38.9% | 74.2 / 20.2% | 62.1 / 13.9% | 56.8 / 29.5% | 83.0 / 26.8% |
| CRISP (v2) | 95.7 / 31.6% | 75.0 / 17.1% | 65.8 / 17.5% | 58.3 / 17.2% | 81.2 / 22.4% |
| CRISP (v1) | 95.7 / 56.9% | 72.9 / 32.9% | 58.8 / 28.4% | 58.5 / 36.2% | 80.9 / 44.7% |

Qwen3-14B

| Setting | MATH-500 Acc / Red | AIME24 Acc / Red | AIME25 Acc / Red | GPQA-D Acc / Red | MMLU Acc / Red |
| --- | --- | --- | --- | --- | --- |
| Base | 93.0 / — | 75.0 / — | 69.2 / — | 62.2 / — | 85.1 / — |
| Concise prompt (v2) | 94.3 / 25.7% | 73.3 / 13.0% | 71.7 / 10.1% | 60.5 / 26.0% | 84.9 / 22.2% |
| Concise prompt (v1) | 95.9 / 43.1% | 76.7 / 23.5% | 66.2 / 20.1% | 60.6 / 26.1% | 84.9 / 21.8% |
| CRISP (v2) | 95.2 / 34.7% | 75.0 / 19.7% | 67.1 / 16.8% | 62.0 / 20.7% | 83.9 / 22.4% |
| CRISP (v1) | 96.3 / 56.3% | 73.8 / 37.5% | 62.9 / 32.1% | 61.9 / 39.7% | 84.2 / 43.1% |

DeepSeek-R1-Distill-Llama-8B

| Setting | MATH-500 Acc / Red | AIME24 Acc / Red | AIME25 Acc / Red | GPQA-D Acc / Red | MMLU Acc / Red |
| --- | --- | --- | --- | --- | --- |
| Base | 71.3 / — | 33.3 / — | 25.0 / — | 47.0 / — | 71.5 / — |
| Concise prompt (v2) | 79.7 / 20.5% | 42.1 / 2.5% | 28.8 / 3.8% | 46.0 / 9.4% | 73.9 / 9.2% |
| Concise prompt (v1) | 80.8 / 25.1% | 45.0 / 10.2% | 29.2 / 9.8% | 46.5 / 10.2% | 74.1 / 9.2% |
| CRISP (v2) | 79.8 / 23.2% | 42.1 / −2.5% | 26.2 / 0.1% | 46.7 / 7.0% | 71.4 / 11.4% |
| CRISP (v1) | 82.1 / 31.6% | 39.2 / 6.3% | 27.1 / 7.1% | 48.3 / 10.2% | 71.7 / 17.6% |

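Takeaways. CRISP shortens reasoning traces substantially while largely preserving accuracy, and on some benchmarks improving it. On the Qwen3 models, the v1 (uniform) teacher yields roughly 1.6–2.1× the token reduction of v2. Its accuracy stays within about 3 points of base on MATH-500, GPQA-D, and MMLU but drops further on AIME (by up to 11.6 points, on AIME25 for Qwen3-8B); v2 gives up less accuracy on AIME. On DeepSeek-R1-Distill-Llama-8B, CRISP (v1) raises accuracy on every benchmark, while CRISP (v2) raises it on the three math benchmarks and stays within 0.3 points of base on GPQA-D and MMLU.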
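The v1-versus-v2 compression ratio can be recomputed from the CRISP rows of the two Qwen3 tables. The snippet below is only arithmetic on the reported reductions; the names `REDUCTION` and `v1_over_v2` are introduced here for illustration. DeepSeek is left out because its v2 reductions are near zero or negative on AIME, which makes a ratio uninformative.

```python
# Token reductions (%) of the CRISP rows, in column order:
# MATH-500, AIME24, AIME25, GPQA-D, MMLU.
REDUCTION = {
    "Qwen3-8B": {
        "v1": [56.9, 32.9, 28.4, 36.2, 44.7],
        "v2": [31.6, 17.1, 17.5, 17.2, 22.4],
    },
    "Qwen3-14B": {
        "v1": [56.3, 37.5, 32.1, 39.7, 43.1],
        "v2": [34.7, 19.7, 16.8, 20.7, 22.4],
    },
}

def v1_over_v2(model):
    """Per-benchmark ratio of v1 reduction to v2 reduction for one model."""
    rows = REDUCTION[model]
    return [a / b for a, b in zip(rows["v1"], rows["v2"])]

for name in REDUCTION:
    print(name, [round(r, 2) for r in v1_over_v2(name)])
```

Across both models the ratios fall between about 1.6 and 2.1.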

Citation

@article{sang2026crisp,
  title={{CRISP}: Compressed reasoning via iterative self-policy distillation},
  author={Sang, Hejian and Xu, Yuanda and Zhou, Zhengze and He, Ran and Wang, Zhipeng and Sun, Jiachen},
  journal={arXiv preprint arXiv:2603.05433},
  year={2026}
}
