qwen25-7b-ot-q3_14b-original-code
Distilled checkpoints from full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-original-code, a Qwen3-14B-teacher dump of OpenThoughts-114k code-prompt reasoning traces extracted via a V3-style prompt-injection attack. The repo holds six checkpoints, one per epoch, trained on 4×B200 with an effective batch size of 16 and a learning rate of 1e-5 (cosine schedule, warmup 0.05).
Variant: original-code — uncleaned raw teacher output including the V3 attack bash-fence inside the r2 field. The training pipeline strips the wrapper at dataset-prep time. 8404/10000 rows pass the structural=True + missing-boxed filter.
Training recipe
| field | value |
|---|---|
| Student | Qwen/Qwen2.5-7B-Instruct |
| Teacher | Qwen3-14B (via OpenThoughts code-prompt attack) |
| Dataset | Chia-Mu-Lab/ot-q3_14b-original-code (8404 usable rows after filter) |
| Hardware | 4×B200 (Modal) |
| Epochs | 6 (one ckpt per epoch) |
| Block size | 32768 |
| Micro / Grad-accum / Effective batch | 1 / 4 / 16 |
| Learning rate | 1e-5 (cosine, warmup 0.05) |
| Optimizer | AdamW (β=0.9/0.95, wd=1e-4) |
| Sharding | plain DDP (no FSDP) — sidesteps a torch-2.7.1+FSDP+AdamW device-mismatch bug that crashed at the first optimizer step after end-of-epoch ckpt save on this dataset |
| Attention | flash_attention_2 |
| Precision | bf16 |
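The batch-size row can be checked with a little arithmetic. The sketch below uses only values from the table above; the variable names are illustrative, and the trainer's exact rounding of steps per epoch is not stated in this card, so the step count is approximate.

```python
# Values taken from the training recipe table.
micro_batch_per_gpu = 1
grad_accum_steps = 4
num_gpus = 4          # 4×B200
usable_rows = 8404    # rows left after the dataset filter

# Under DDP, one optimizer step consumes micro_batch * grad_accum samples on each GPU.
effective_batch = micro_batch_per_gpu * grad_accum_steps * num_gpus

# Approximate optimizer steps per epoch (exact rounding depends on the trainer).
steps_per_epoch = usable_rows / effective_batch

print(effective_batch)   # 16
print(steps_per_epoch)   # 525.25
```

Roughly 525 optimizer steps per epoch is consistent with the spacing of the checkpoint step numbers listed in this card.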
Evaluation
Evaluated on AIME24+AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), JEEbench subject=='math' subset (n=6, T=0.5), and LiveCodeBench-v5 release window 2024-08-01→2025-02-01 (n=3, T=0.5). All numbers are % accuracy; (±N.N) is the delta vs base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.
| ckpt | epoch | AIME24 | AIME25 | MATH500 | JEE-math | LCB-v5 |
|---|---|---|---|---|---|---|
| base | — | 8.89 | 2.22 | 70.93 | 32.49 | 15.77 |
| step-00525 | ep1 | 3.33 (-5.6) | 6.67 (+4.4) | 60.87 (-10.1) | 25.21 (-7.3) | 11.47 (-4.3) |
| step-01050 | ep2 | 5.56 (-3.3) | 15.56 (+13.3) | 61.60 (-9.3) | 30.16 (-2.3) | 18.28 (+2.5) |
| step-01575 | ep3 | 4.44 (-4.4) | 6.67 (+4.4) | 60.53 (-10.4) | 29.24 (-3.2) | 18.64 (+2.9) |
| step-02101 | ep4 | 3.33 (-5.6) | 8.89 (+6.7) | 65.00 (-5.9) | 29.24 (-3.2) | 17.56 (+1.8) |
| step-02626 | ep5 | 5.56 (-3.3) | 11.11 (+8.9) | 65.93 (-5.0) | 30.16 (-2.3) | 16.49 (+0.7) |
| step-03150 | ep6 | 4.44 (-4.4) | 11.11 (+8.9) | 66.60 (-4.3) | 30.65 (-1.8) | 17.56 (+1.8) |
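The parenthesised deltas are plain differences against the base row, rounded to one decimal. As a minimal sketch, the ep2 row can be reproduced from the table values; the dictionary layout and the `deltas` helper are illustrative, not part of the evaluation code.

```python
# Accuracy values (%) copied from the table: base model and the ep2 checkpoint.
base = {"AIME24": 8.89, "AIME25": 2.22, "MATH500": 70.93, "JEE-math": 32.49, "LCB-v5": 15.77}
ep2 = {"AIME24": 5.56, "AIME25": 15.56, "MATH500": 61.60, "JEE-math": 30.16, "LCB-v5": 18.28}

def deltas(ckpt, baseline):
    """Return per-benchmark change in percentage points, rounded to one decimal."""
    return {name: round(ckpt[name] - baseline[name], 1) for name in baseline}

ep2_deltas = deltas(ep2, base)
for name, value in ep2_deltas.items():
    print(f"{name}: {value:+.1f}")
```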
Checkpoints layout
Each epoch ckpt lives in its own subdirectory inside this repo. To load a specific epoch with 🤗 Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"
# One of: checkpoint-525, checkpoint-1050, checkpoint-1575,
# checkpoint-2101, checkpoint-2626, checkpoint-3150
sub = "checkpoint-2101"

model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained(repo, subfolder=sub)
```
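When comparing epochs it can be handy to select the subfolder by epoch number. The helper below is a hypothetical convenience wrapper, not something shipped in this repo. It assumes the six subfolders map to epochs 1 through 6 in ascending step order, matching the evaluation table.

```python
REPO = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"

# Assumed mapping: subfolders in ascending step order correspond to epochs 1..6.
EPOCH_TO_SUBFOLDER = {
    1: "checkpoint-525",
    2: "checkpoint-1050",
    3: "checkpoint-1575",
    4: "checkpoint-2101",
    5: "checkpoint-2626",
    6: "checkpoint-3150",
}

def subfolder_for_epoch(epoch: int) -> str:
    """Return the repo subfolder holding the checkpoint for the given epoch."""
    if epoch not in EPOCH_TO_SUBFOLDER:
        raise ValueError(f"epoch must be in 1..6, got {epoch}")
    return EPOCH_TO_SUBFOLDER[epoch]

def load_epoch(epoch: int, dtype: str = "bfloat16"):
    """Load model and tokenizer for one epoch (downloads weights on first use)."""
    # Imported lazily so the mapping helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    sub = subfolder_for_epoch(epoch)
    model = AutoModelForCausalLM.from_pretrained(REPO, subfolder=sub, torch_dtype=dtype)
    tok = AutoTokenizer.from_pretrained(REPO, subfolder=sub)
    return model, tok
```

For example, `load_epoch(4)` would resolve to the `checkpoint-2101` subfolder used in the snippet above.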
Caveats
- Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
- Training data is Qwen3-14B's response to OpenThoughts-114k code prompts elicited via a known prompt-injection attack; quality / safety properties of the teacher's response are not curated.
- Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-ckpt variance is ±1-2 pp.
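To put the AIME columns in context, the sketch below estimates how coarse those scores are. It assumes each AIME year contributes 30 problems (the card does not state the problem count) and that n=3 means three samples per problem; both the constants and the `implied_correct` helper are illustrative.

```python
AIME_PROBLEMS = 30        # assumption: problems in one AIME year
SAMPLES_PER_PROBLEM = 3   # n=3 from the evaluation protocol

attempts = AIME_PROBLEMS * SAMPLES_PER_PROBLEM

# Smallest possible change in reported accuracy, in percentage points.
resolution_pp = 100 / attempts

def implied_correct(accuracy_pct: float) -> int:
    """Number of correct attempts implied by a reported accuracy."""
    return round(accuracy_pct / 100 * attempts)

print(round(resolution_pp, 2))   # 1.11
print(implied_correct(8.89))     # 8
print(implied_correct(2.22))     # 2
```

Under these assumptions a single extra correct attempt moves an AIME score by about 1.1 pp, so small AIME differences between checkpoints fall within the stated variance.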