Libraries

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code")

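# Load model directly (AutoModelForCausalLM keeps the language-modeling head needed for generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code", dtype="auto")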

Local Apps

vLLM

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
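
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the payload simply mirrors the curl call above. It assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt):
    """Send the request and return the first completion's text (needs a running server)."""
    req = build_completion_request(base_url, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Building the request does not touch the network.
req = build_completion_request("http://localhost:8000", "Once upon a time,")
print(req.full_url)
```

Calling `complete("http://localhost:8000", "Once upon a time,")` would send the request once the server is up; for the SGLang server below, the base URL would be `http://localhost:30000`.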

Use Docker

docker model run hf.co/Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code

SGLang

How to use Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

qwen25-7b-ot-q3_14b-original-code

Distilled checkpoints from full-parameter SFT of Qwen/Qwen2.5-7B-Instruct on Chia-Mu-Lab/ot-q3_14b-original-code, a dump of Qwen3-14B (teacher) reasoning traces on OpenThoughts-114k code prompts, extracted via a V3-style prompt-injection attack. The repo holds six per-epoch checkpoints, trained on 4×B200 with an effective batch size of 16 and a learning rate of 1e-5 (cosine schedule, warmup 0.05).

Variant: original-code. This is the uncleaned raw teacher output, including the V3 attack bash fence inside the r2 field. The training pipeline strips the wrapper at dataset-prep time. 8404 of 10000 rows pass the structural=True + missing-boxed filter.

Training recipe

| field | value |
| --- | --- |
| Student | `Qwen/Qwen2.5-7B-Instruct` |
| Teacher | Qwen3-14B (via OpenThoughts code-prompt attack) |
| Dataset | `Chia-Mu-Lab/ot-q3_14b-original-code` (8404 usable rows after filter) |
| Hardware | 4×B200 (Modal) |
| Epochs | 6 (one ckpt per epoch) |
| Block size | 32768 |
| Micro / Grad-accum / Effective batch | 1 / 4 / 16 |
| Learning rate | 1e-5 (cosine, warmup 0.05) |
| Optimizer | AdamW (β=0.9/0.95, wd=1e-4) |
| Sharding | plain DDP (no FSDP); sidesteps a torch-2.7.1+FSDP+AdamW device-mismatch bug that crashed at the first optimizer step after end-of-epoch ckpt save on this dataset |
| Attention | flash_attention_2 |
| Precision | bf16 |
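
The batch arithmetic and learning-rate schedule in the recipe can be sketched as follows. This is an illustration under stated assumptions, not the actual training code: it assumes linear warmup over 5% of the total optimizer steps, cosine decay to zero, one dataset row per training sequence, and a total of 3150 steps (taken from the last checkpoint name). The function name `lr_at` is hypothetical.

```python
import math

MICRO_BATCH = 1
GRAD_ACCUM = 4
NUM_GPUS = 4
USABLE_ROWS = 8404
PEAK_LR = 1e-5
WARMUP_FRACTION = 0.05  # assumption: fraction of total optimizer steps
TOTAL_STEPS = 3150      # assumption: step count of the final checkpoint

# Effective batch = per-GPU micro batch x gradient accumulation x data-parallel GPUs.
effective_batch = MICRO_BATCH * GRAD_ACCUM * NUM_GPUS

# Approximate optimizer steps per epoch, assuming one row per sequence.
steps_per_epoch = USABLE_ROWS // effective_batch


def lr_at(step, total=TOTAL_STEPS, peak=PEAK_LR, warmup_fraction=WARMUP_FRACTION):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(total * warmup_fraction))
    if step < warmup_steps:
        return peak * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total - warmup_steps)
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch)
for step in (0, 157, 1575, 3150):
    print(step, lr_at(step))
```

Under these assumptions the effective batch comes out to 16 and one epoch to roughly 525 steps, which lines up with the first checkpoint step.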

Evaluation

Evaluated on AIME24+AIME25 (n=3, T=0.5), MATH-500 (n=3, T=0.5), the JEEbench subject=='math' subset (n=6, T=0.5), and LiveCodeBench-v5 with release window 2024-08-01→2025-02-01 (n=3, T=0.5). All numbers are % accuracy; the parenthesized value (±N.N) is the difference in percentage points from the base Qwen/Qwen2.5-7B-Instruct evaluated under the same protocol.
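
As a rough illustration of how a score with n samples per problem might be aggregated, the sketch below averages correctness over every sampled completion. This is an assumption: the card does not state the aggregation rule, and `mean_accuracy` is a hypothetical helper. It is at least consistent with the AIME numbers if each AIME set has 30 problems, since 8 correct out of 90 samples gives 8.89.

```python
def mean_accuracy(results):
    """Percent of correct samples; results[i][j] is True if sample j of problem i is correct."""
    total = sum(len(samples) for samples in results)
    correct = sum(sum(samples) for samples in results)
    return 100.0 * correct / total


# Hypothetical grading outcome: 30 problems, n=3 samples each, 8 correct samples overall.
results = [[False, False, False] for _ in range(30)]
for i in range(8):
    results[i][0] = True

score = mean_accuracy(results)
print(round(score, 2))  # 8.89
```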

| ckpt | epoch | AIME24 | AIME25 | MATH500 | JEE-math | LCB-v5 |
| --- | --- | --- | --- | --- | --- | --- |
| base | - | 8.89 | 2.22 | 70.93 | 32.49 | 15.77 |
| `step-00525` | ep1 | 3.33 (-5.6) | 6.67 (+4.4) | 60.87 (-10.1) | 25.21 (-7.3) | 11.47 (-4.3) |
| `step-01050` | ep2 | 5.56 (-3.3) | 15.56 (+13.3) | 61.60 (-9.3) | 30.16 (-2.3) | 18.28 (+2.5) |
| `step-01575` | ep3 | 4.44 (-4.4) | 6.67 (+4.4) | 60.53 (-10.4) | 29.24 (-3.2) | 18.64 (+2.9) |
| `step-02101` | ep4 | 3.33 (-5.6) | 8.89 (+6.7) | 65.00 (-5.9) | 29.24 (-3.2) | 17.56 (+1.8) |
| `step-02626` | ep5 | 5.56 (-3.3) | 11.11 (+8.9) | 65.93 (-5.0) | 30.16 (-2.3) | 16.49 (+0.7) |
| `step-03150` | ep6 | 4.44 (-4.4) | 11.11 (+8.9) | 66.60 (-4.3) | 30.65 (-1.8) | 17.56 (+1.8) |
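
To pick a checkpoint for a given benchmark, the table can be loaded into a small lookup. The sketch below copies the scores from the table above; the helper name `best_checkpoint` is hypothetical, and ties go to the earlier epoch. Given the ±1-2 pp variance noted under Caveats, small gaps between checkpoints should not be over-read.

```python
BENCHMARKS = ["AIME24", "AIME25", "MATH500", "JEE-math", "LCB-v5"]

# Scores (% accuracy) copied from the evaluation table, in epoch order.
SCORES = {
    "step-00525": [3.33, 6.67, 60.87, 25.21, 11.47],
    "step-01050": [5.56, 15.56, 61.60, 30.16, 18.28],
    "step-01575": [4.44, 6.67, 60.53, 29.24, 18.64],
    "step-02101": [3.33, 8.89, 65.00, 29.24, 17.56],
    "step-02626": [5.56, 11.11, 65.93, 30.16, 16.49],
    "step-03150": [4.44, 11.11, 66.60, 30.65, 17.56],
}


def best_checkpoint(benchmark):
    """Return (checkpoint, score) with the highest score; ties go to the earlier epoch."""
    col = BENCHMARKS.index(benchmark)
    name = max(SCORES, key=lambda ckpt: SCORES[ckpt][col])
    return name, SCORES[name][col]


for bench in BENCHMARKS:
    print(bench, best_checkpoint(bench))
```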

Checkpoints layout

Each epoch ckpt lives in its own subdirectory inside this repo. To load a specific epoch with 🤗 Transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer
repo = "Chia-Mu-Lab/qwen25-7b-ot-q3_14b-original-code"
sub  = "checkpoint-2101"  # one of: checkpoint-525, checkpoint-1050, checkpoint-1575, checkpoint-2101, checkpoint-2626, checkpoint-3150
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=sub, torch_dtype="bfloat16")
tok   = AutoTokenizer.from_pretrained(repo, subfolder=sub)
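
When iterating over epochs, a small helper can map an epoch number to its subfolder. This is a sketch: `subfolder_for_epoch` is a hypothetical name, and it assumes the six subfolders listed above correspond to epochs 1 through 6 in order, as the evaluation table suggests.

```python
# Checkpoint steps in epoch order, taken from the subfolder names listed above.
CHECKPOINT_STEPS = [525, 1050, 1575, 2101, 2626, 3150]


def subfolder_for_epoch(epoch):
    """Return the subfolder name for a 1-based epoch number."""
    if not 1 <= epoch <= len(CHECKPOINT_STEPS):
        raise ValueError(f"epoch must be in 1..{len(CHECKPOINT_STEPS)}, got {epoch}")
    return f"checkpoint-{CHECKPOINT_STEPS[epoch - 1]}"


for epoch in range(1, 7):
    print(epoch, subfolder_for_epoch(epoch))
```

The returned string can be passed as the `subfolder` argument in the loading snippet above.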

Caveats

- Research artifact for studying LLM reasoning-trace exfiltration via prompt injection. Not intended for production use.
- Training data consists of Qwen3-14B's responses to OpenThoughts-114k code prompts, elicited via a known prompt-injection attack; the quality and safety properties of the teacher's responses are not curated.
- Evaluation uses a single seed (T=0.5, seed=7 for vLLM); per-checkpoint variance is ±1-2 pp.