Qwen3.5-9B — Blog-Provider-ID — OPCD (E2, on-policy context distillation)

Method (OPSD E2): OPCD, sampled context distillation. A cheatsheet-conditioned teacher (live student pool + spliced train-derived cheatsheet) supplies per-token targets, which are distilled onto the student conditioned on the plain prompt. The objective is reverse-KL-flavoured and estimated on sampled tokens, not a full-vocab forward KL. Initialised from base. This is the final checkpoint (step_40).
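
For readers unfamiliar with the sampled reverse-KL idea, here is a minimal sketch in plain Python. It is not the prime-rl implementation; the function name `sampled_reverse_kl` and its inputs are hypothetical. It assumes the tokens were sampled from the plain-prompt student, and that for each sampled token we have its log-probability under the student and under the cheatsheet-conditioned teacher.

```python
def sampled_reverse_kl(student_logprobs, teacher_logprobs):
    """Single-sample estimate of KL(student || teacher) on sampled tokens.

    Hypothetical helper for illustration only. Each list holds, per
    position, the log-probability of the token that the student actually
    sampled: once under the plain-prompt student, once under the
    cheatsheet-conditioned teacher.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not student_logprobs:
        raise ValueError("need at least one sampled token")
    # log pi_student(y_t) - log pi_teacher(y_t): positive where the student
    # is more confident than the teacher in its own sampled token.
    per_token = [s - t for s, t in zip(student_logprobs, teacher_logprobs)]
    mean = sum(per_token) / len(per_token)
    return per_token, mean
```

Because only the sampled token's log-probabilities are needed, no full-vocabulary distribution has to be materialised, which is the contrast with a full-vocab forward KL drawn above.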

Result: val peaked at ~0.29, then the run collapsed into runaway truncation because it had no length/trust-region anchor. This is the same cold-start fragility seen in the other base-init reasoning methods (which the cold-start ablation later fixes).

Base model: Qwen/Qwen3.5-9B (thinking OFF)
Task: 3-way AI-provider classification — given a blog/essay, identify whether it was written by CLAUDE, CHATGPT, or GEMINI. Output format: <reason_why>...</reason_why><answer>LABEL\nConfidence: ...</answer>.
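
A minimal sketch of how a completion in this format could be parsed, assuming the tag layout shown above. The helper name `parse_output`, the case-insensitive label match, and the treatment of the confidence field as free text are assumptions, not part of the released code.

```python
import re

LABELS = {"CLAUDE", "CHATGPT", "GEMINI"}

# Hypothetical pattern for the documented output format. The content of the
# confidence field is not specified in the card, so it is kept as raw text.
_OUTPUT_RE = re.compile(
    r"<reason_why>(?P<reason>.*?)</reason_why>\s*"
    r"<answer>\s*(?P<label>[A-Za-z]+)\s*\n\s*"
    r"Confidence:\s*(?P<confidence>.*?)\s*</answer>",
    re.DOTALL,
)

def parse_output(text):
    """Return a dict with reason, label and confidence, or None if malformed."""
    match = _OUTPUT_RE.search(text)
    if match is None:
        # Covers truncated completions that never close their tags.
        return None
    label = match.group("label").upper()
    if label not in LABELS:
        return None
    return {
        "reason": match.group("reason").strip(),
        "label": label,
        "confidence": match.group("confidence").strip(),
    }
```

A completion that is cut off before `</answer>` parses to `None`, so truncated generations can be counted separately from wrong labels.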
Eval: val (in-distribution topics, n=414) and val_ood (held-out topics, n=471), zero eval leakage.
Provenance: prime-rl; code at https://github.com/ChinmayK0607/prime-rl/tree/blog-author-id-experiments

Weights: Safetensors, 9B params, BF16

Model tree for CK0607/qwen3.5-9b-blogprovider-opcd-e2: Qwen/Qwen3.5-9B-Base (base model) -> Qwen/Qwen3.5-9B (finetuned) -> this model (finetuned)