CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

Paper: https://arxiv.org/abs/2603.05433

CRISP teaches a reasoning model to think concisely by distilling its own concise behavior back into itself. The teacher is the same model conditioned on a conciseness instruction; the student is the model with no instruction. Training minimizes per-token reverse KL from student to teacher on the student's own rollouts, with the teacher periodically refreshed (interval M=50). No ground-truth answers, no token budgets, and no difficulty estimators enter the loss.

This repository hosts the step-99 checkpoints for three model families, each trained with two conciseness-instruction variants:

  • v1 (uniform): "Solve concisely and correctly. Be direct β€” avoid unnecessary elaboration, redundant steps, or restating the problem." Compresses most aggressively.
  • v2 (difficulty-aware, default): adds a caveat to not over-compress hard/multi-step problems (keep case analysis, edge cases, a final check). Best accuracy preservation.

Checkpoints

Folder Base model Teacher prompt
checkpoints/8b_v1 Qwen3-8B v1 uniform
checkpoints/8b_v2 Qwen3-8B v2 difficulty-aware
checkpoints/14b_v1 Qwen3-14B v1 uniform
checkpoints/14b_v2 Qwen3-14B v2 difficulty-aware
checkpoints/ds_v1 DeepSeek-R1-Distill-Llama-8B v1 uniform
checkpoints/ds_v2 DeepSeek-R1-Distill-Llama-8B v2 difficulty-aware

Training data: pb09204048/CRISP (DAPO-Math-17k with the v1/v2 conciseness instruction column).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "pb09204048/CRISP"
sub = "checkpoints/14b_v1"           # pick any checkpoint folder
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, device_map="auto")

Benchmark results

All numbers use a 30K-token budget and report accuracy (mean@8, %) and token reduction (Red., % vs. the base model). Math (MATH-500, AIME 2024/2025) is scored with a dual-path grader (Answer: or \boxed{}); GPQA-Diamond and MMLU use exact letter-match. Rows: base model, the conciseness prompt at inference only (no training) for v1/v2, and CRISP (trained) for v1/v2.

Qwen3-8B

Setting MATH-500 Acc / Red AIME24 Acc / Red AIME25 Acc / Red GPQA-D Acc / Red MMLU Acc / Red
Base 95.7 / β€” 76.2 / β€” 70.4 / β€” 61.5 / β€” 81.9 / β€”
Concise prompt (v2) 94.2 / 21.5% 74.6 / 9.8% 63.7 / 5.3% 59.5 / 30.7% 82.8 / 26.8%
Concise prompt (v1) 95.6 / 38.9% 74.2 / 20.2% 62.1 / 13.9% 56.8 / 29.5% 83.0 / 26.8%
CRISP (v2) 95.7 / 31.6% 75.0 / 17.1% 65.8 / 17.5% 58.3 / 17.2% 81.2 / 22.4%
CRISP (v1) 95.7 / 56.9% 72.9 / 32.9% 58.8 / 28.4% 58.5 / 36.2% 80.9 / 44.7%

Qwen3-14B

Setting MATH-500 Acc / Red AIME24 Acc / Red AIME25 Acc / Red GPQA-D Acc / Red MMLU Acc / Red
Base 93.0 / β€” 75.0 / β€” 69.2 / β€” 62.2 / β€” 85.1 / β€”
Concise prompt (v2) 94.3 / 25.7% 73.3 / 13.0% 71.7 / 10.1% 60.5 / 26.0% 84.9 / 22.2%
Concise prompt (v1) 95.9 / 43.1% 76.7 / 23.5% 66.2 / 20.1% 60.6 / 26.1% 84.9 / 21.8%
CRISP (v2) 95.2 / 34.7% 75.0 / 19.7% 67.1 / 16.8% 62.0 / 20.7% 83.9 / 22.4%
CRISP (v1) 96.3 / 56.3% 73.8 / 37.5% 62.9 / 32.1% 61.9 / 39.7% 84.2 / 43.1%

DeepSeek-R1-Distill-Llama-8B

Setting MATH-500 Acc / Red AIME24 Acc / Red AIME25 Acc / Red GPQA-D Acc / Red MMLU Acc / Red
Base 71.3 / β€” 33.3 / β€” 25.0 / β€” 47.0 / β€” 71.5 / β€”
Concise prompt (v2) 79.7 / 20.5% 42.1 / 2.5% 28.8 / 3.8% 46.0 / 9.4% 73.9 / 9.2%
Concise prompt (v1) 80.8 / 25.1% 45.0 / 10.2% 29.2 / 9.8% 46.5 / 10.2% 74.1 / 9.2%
CRISP (v2) 79.8 / 23.2% 42.1 / βˆ’2.5% 26.2 / 0.1% 46.7 / 7.0% 71.4 / 11.4%
CRISP (v1) 82.1 / 31.6% 39.2 / 6.3% 27.1 / 7.1% 48.3 / 10.2% 71.7 / 17.6%

Takeaways. CRISP compresses reasoning traces substantially while preserving β€” and often improving β€” accuracy. The v1 (uniform) teacher compresses ~1.7–2Γ— harder than v2 at a small accuracy cost; v2 best preserves accuracy. On DeepSeek, CRISP raises accuracy on every benchmark.

Citation

@article{sang2026crisp,
  title={Crisp: Compressed reasoning via iterative self-policy distillation},
  author={Sang, Hejian and Xu, Yuanda and Zhou, Zhengze and He, Ran and Wang, Zhipeng and Sun, Jiachen},
  journal={arXiv preprint arXiv:2603.05433},
  year={2026}
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for pb09204048/CRISP

Finetuned
Qwen/Qwen3-14B
Finetuned
(292)
this model

Dataset used to train pb09204048/CRISP

Paper for pb09204048/CRISP