CRISP-DeepSeek-R1-Distill-Llama-8B-v2

DeepSeek-R1-Distill-Llama-8B trained with CRISP (Compressed Reasoning via Iterative Self-Policy Distillation) using the v2 conciseness teacher. Step-99 checkpoint.

Paper: https://arxiv.org/abs/2603.05433

CRISP teaches a reasoning model to think concisely by distilling its own concise behavior back into itself: the teacher is the same model conditioned on a conciseness instruction, the student has no instruction, and training minimizes per-token reverse KL from student to teacher on the student's own rollouts (teacher refreshed every M=50 steps). No ground-truth answers, token budgets, or difficulty estimators enter the loss.

This checkpoint uses the v2 teacher prompt: v2 (difficulty-aware, default): adds a caveat to not over-compress hard/multi-step problems (keep case analysis, edge cases, a final check).

Other CRISP checkpoints: Qwen3-8B (v2), Qwen3-14B (v2), DeepSeek-R1-Distill-Llama-8B (v2). Training data: pb09204048/CRISP.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2")
model = AutoModelForCausalLM.from_pretrained("pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2", device_map="auto")

Benchmark results (DeepSeek-R1-Distill-Llama-8B)

Accuracy (mean@8, %) and token reduction (Red., % vs. base) at a 30K-token budget. Math is scored with a dual-path grader (Answer: or \boxed{}); GPQA-Diamond and MMLU use exact letter-match. This model is the CRISP (v2) row.

Setting MATH-500 AIME 2024 AIME 2025 GPQA-D MMLU
Base 71.3 / — 33.3 / — 25.0 / — 47.0 / — 71.5 / —
Concise prompt (v2) 79.7 / 20.5% 42.1 / 2.5% 28.8 / 3.8% 46.0 / 9.4% 73.9 / 9.2%
Concise prompt (v1) 80.8 / 25.1% 45.0 / 10.2% 29.2 / 9.8% 46.5 / 10.2% 74.1 / 9.2%
CRISP (v2) 79.8 / 23.2% 42.1 / −2.5% 26.2 / 0.1% 46.7 / 7.0% 71.4 / 11.4%
CRISP (v1) 82.1 / 31.6% 39.2 / 6.3% 27.1 / 7.1% 48.3 / 10.2% 71.7 / 17.6%

Citation

@article{sang2026crisp,
  title={Crisp: Compressed reasoning via iterative self-policy distillation},
  author={Sang, Hejian and Xu, Yuanda and Zhou, Zhengze and He, Ran and Wang, Zhipeng and Sun, Jiachen},
  journal={arXiv preprint arXiv:2603.05433},
  year={2026}
}
Downloads last month
127
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2

Finetuned
(176)
this model

Dataset used to train pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2

Paper for pb09204048/CRISP-DeepSeek-R1-Distill-Llama-8B-v2