Qwen3.5-9B — Blog-Provider-ID — OPCD (E2, on-policy context distillation)

Method (OPSD E2): OPCD, a sampled context-distillation method. A cheatsheet-conditioned teacher (the live student pool plus a spliced, train-derived cheatsheet) supplies per-token targets, which are distilled onto the student that sees only the plain prompt. The objective is reverse-KL-flavoured and sampled, not a full-vocabulary forward KL. Training is initialised from the base model; this is the final checkpoint (step_40).
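The sampled, reverse-KL-flavoured signal described above can be illustrated with a minimal sketch. The card does not state the exact loss used in the run, so this is an assumption: a plain Monte Carlo estimate of KL(student || teacher) over tokens the student itself sampled. The function name `sampled_reverse_kl` and its arguments are hypothetical and are not taken from the prime-rl code.

```python
from typing import Sequence


def sampled_reverse_kl(
    student_logprobs: Sequence[float],
    teacher_logprobs: Sequence[float],
) -> float:
    """Monte Carlo estimate of KL(student || teacher) on student-sampled tokens.

    student_logprobs[t]: log-prob the plain-prompt student gave to sampled token t.
    teacher_logprobs[t]: log-prob the cheatsheet-conditioned teacher gives to
                         the same token (assumed to be scored on the same rollout).
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("student and teacher must score the same tokens")
    if not student_logprobs:
        return 0.0
    # Per-token log-ratio; positive where the student is more confident
    # in its sampled token than the teacher is.
    gaps = [s - t for s, t in zip(student_logprobs, teacher_logprobs)]
    # Averaging over sampled tokens avoids a full-vocabulary sum.
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    student = [-1.0, -2.0]
    teacher = [-1.5, -2.5]
    print(sampled_reverse_kl(student, teacher))
```

Because only the sampled token is scored at each position, the estimate is cheap but noisy, which is one reason such objectives are often paired with a length or trust-region anchor.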

Result: val peaked at ~0.29, then the run collapsed into runaway truncation because it had no length or trust-region anchor. This is the same cold-start fragility seen in the other base-init reasoning methods, which the cold-start ablation later fixes.

  • Base model: Qwen/Qwen3.5-9B (thinking OFF)
  • Task: 3-way AI-provider classification — given a blog/essay, identify whether it was written by CLAUDE, CHATGPT, or GEMINI. Output format: <reason_why>...</reason_why><answer>LABEL\nConfidence: ...</answer>.
  • Eval: val (in-distribution topics, n=414) and val_ood (held-out topics, n=471), zero eval leakage.
  • Provenance: prime-rl; code at https://github.com/ChinmayK0607/prime-rl/tree/blog-author-id-experiments
  • Weights: Safetensors, 9B params, BF16 tensors
  • Model tree: CK0607/qwen3.5-9b-blogprovider-opcd-e2, fine-tuned from Qwen/Qwen3.5-9B
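
The output format given in the Task bullet can be checked with a small parser. This is a minimal sketch, not the evaluation code from the repository: the names `parse_output`, `LABELS`, and `PATTERN` are hypothetical, and since the card elides the exact confidence format, the confidence is returned as a raw string. A truncated completion that never closes `</answer>` is treated as unparseable.

```python
import re
from typing import Optional, Tuple

# The three provider labels named in the task description.
LABELS = {"CLAUDE", "CHATGPT", "GEMINI"}

# <reason_why>...</reason_why><answer>LABEL\nConfidence: ...</answer>
PATTERN = re.compile(
    r"<reason_why>(.*?)</reason_why>\s*"
    r"<answer>\s*(\w+)\s*\n\s*Confidence:\s*(.*?)\s*</answer>",
    re.DOTALL,
)


def parse_output(text: str) -> Optional[Tuple[str, str]]:
    """Return (label, confidence) or None if the completion is malformed."""
    match = PATTERN.search(text)
    if match is None:
        # Covers truncated outputs that never reach the closing tags.
        return None
    label = match.group(2).upper()
    if label not in LABELS:
        return None
    return label, match.group(3)


if __name__ == "__main__":
    sample = (
        "<reason_why>Uses hedged lists.</reason_why>"
        "<answer>CLAUDE\nConfidence: high</answer>"
    )
    print(parse_output(sample))
```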