qwen25-7b-ot-q3_14b-original-code

Distilled checkpoints from full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-original-code, a Qwen3-14B-teacher dump of OpenThoughts-114k code-prompt reasoning traces extracted via a V3-style prompt-injection attack. 6 epoch ckpts, 4×B200, eff_batch 16, lr 1e-5 cosine warmup 0.05.

Variant: original-code — uncleaned raw teacher output including the V3 attack bash-fence inside the r2 field. The training pipeline strips the wrapper at dataset-prep time. 8404/10000 rows pass the structural=True + missing-boxed filter.

Training recipe

field value
Student Qwen/Qwen2.5-7B-Instruct
Teacher Qwen3-14B (via OpenThoughts code-prompt attack)
Dataset Chia-Mu-Lab/ot-q3_14b-original-code (8404 usable rows after filter)
Hardware 4×B200 (Modal)
Epochs 6 (one ckpt per epoch)
Block size 32768
Micro / Grad-accum / Effective batch 1 / 4 / 16
Learning rate 1e-5 (cosine, warmup 0.05)
Optimizer AdamW (β=0.9/0.95, wd=1e-4)
Sharding plain DDP (no FSDP) — sidesteps a torch-2.7.1+FSDP+AdamW device-mismatch bug that crashed at the first optimizer step after end-of-epoch ckpt save on this dataset
Attention flash_attention_2
Precision bf16

Evaluation

Evaluated on AIME24+AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), JEEbench subject=='math' subset (n=6, T=0.5), and LiveCodeBench-v5 release window 2024-08-01→2025-02-01 (n=3, T=0.5). All numbers are % accuracy; (±N.N) is the delta vs base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.

ckpt epoch AIME24 AIME25 MATH500 JEE-math LCB-v5
base 8.89 2.22 70.93 32.49 15.77
step-00525 ep1 3.33 (-5.6) 6.67 (+4.4) 60.87 (-10.1) 25.21 (-7.3) 11.47 (-4.3)
step-01050 ep2 5.56 (-3.3) 15.56 (+13.3) 61.60 (-9.3) 30.16 (-2.3) 18.28 (+2.5)
step-01575 ep3 4.44 (-4.4) 6.67 (+4.4) 60.53 (-10.4) 29.24 (-3.2) 18.64 (+2.9)
step-02101 ep4 3.33 (-5.6) 8.89 (+6.7) 65.00 (-5.9) 29.24 (-3.2) 17.56 (+1.8)
step-02626 ep5 5.56 (-3.3) 11.11 (+8.9) 65.93 (-5.0) 30.16 (-2.3) 16.49 (+0.7)
step-03150 ep6 4.44 (-4.4) 11.11 (+8.9) 66.60 (-4.3) 30.65 (-1.8) 17.56 (+1.8)

Checkpoints layout

Each epoch ckpt lives in its own subdirectory inside this repo. To load a specific epoch with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"
sub  = "checkpoint-2101"  # one of: checkpoint-525, checkpoint-1050, checkpoint-1575, checkpoint-2101, checkpoint-2626, checkpoint-3150
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained(repo, subfolder=sub)

Caveats

  • Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
  • Training data is Qwen3-14B's response to OpenThoughts-114k code prompts elicited via a known prompt-injection attack; quality / safety properties of the teacher's response are not curated.
  • Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-ckpt variance is ±1-2 pp.
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code

Base model

Qwen/Qwen2.5-7B
Finetuned
(3342)
this model

Dataset used to train Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code