Qwen3.6-27B-JudgeOPSD-0604

A rubric-based Judge Model fine-tuned from Qwen/Qwen3.6-27B using Online Policy Self-Distillation (OPSD) with LoRA.

Overview

This model is trained to serve as a general-purpose evaluation judge that scores responses against user-specified rubrics. It accepts arbitrary input formats; you only need to specify the desired output format in your prompt.

Key Features:

  • Multi-dimensional rubric-based scoring
  • Flexible input: any QA pair + custom rubric
  • Structured JSON output with per-criterion scores and reasoning
  • Trained on diverse evaluation datasets via online self-distillation

Usage

With Transformers + PEFT

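from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA judge adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-27B",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")

# In English: "You are a professional scoring judge. Score the QA pair on multiple
# dimensions according to the rubric; output strict JSON with nothing extra.
# Question: a rectangle has perimeter 48; what is its maximum area?
# Answer to evaluate: perimeter 48, length + width = 24, max area 144, reached for a square.
# Rubric: 1. answer correctness (weight 0.8) 2. use of formulas (weight 0.15)
# 3. logical completeness (weight 0.05). Output format: <JSON schema>"
prompt = """你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:长方形周长48,最大面积是多少?
待评回答:周长48,长+宽=24,最大面积144,正方形时最大。
评分维度:1.答案正确性(权重0.8) 2.公式使用(权重0.15) 3.逻辑完整性(权重0.05)
输出格式:{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled, so pass do_sample=True explicitly.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)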
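The model is asked for strict JSON, but the decoded response may still carry extra text around the object (for example a reasoning block, depending on the chat template, or a short preamble). A minimal standard-library sketch of a tolerant parser: it tries `json.JSONDecoder.raw_decode` at each `{` and keeps the last complete top-level object. The name `extract_json` is our own illustration, not part of any library; with the example above you would call `extract_json(response)`.

```python
import json


def extract_json(text: str) -> dict:
    """Return the last complete top-level JSON object embedded in `text`.

    Tries json.JSONDecoder.raw_decode at each "{" so that surrounding prose
    (e.g. a reasoning block or a short preamble) is ignored.
    """
    decoder = json.JSONDecoder()
    found = None
    idx = text.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            # Not a valid object starting here: try the next "{".
            idx = text.find("{", idx + 1)
            continue
        found = obj
        # Jump past the decoded object so its nested objects are not re-parsed.
        idx = text.find("{", end)
    if found is None:
        raise ValueError("no JSON object found in model response")
    return found


raw = '<think>check the weights</think>\nResult:\n{"score": 0.85, "item_detail": [], "total_reason": "ok"}'
result = extract_json(raw)
print(result["score"])  # 0.85
```

Keeping the last object rather than the first is a judgment call: if the response drafts JSON inside its reasoning, the final answer usually comes last.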

With vLLM (Recommended for Production)

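from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Serve the base model with LoRA enabled; max_lora_rank must be >= the adapter rank (64).
llm = LLM(
    model="Qwen/Qwen3.6-27B",
    enable_lora=True,
    max_lora_rank=64,
    max_model_len=4096,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)

# LoRARequest(adapter name, unique integer ID, adapter path or Hugging Face repo ID)
lora_request = LoRARequest("judge", 1, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")

# In English: the same judge instruction and output schema as the Transformers example, with
# Question: What is photosynthesis?
# Answer to evaluate: Photosynthesis is the process by which plants use sunlight to convert
# carbon dioxide and water into glucose and oxygen.
# Rubric: 1. accuracy (weight 0.6) 2. completeness (weight 0.3) 3. clarity of expression (weight 0.1)
prompt = """你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:什么是光合作用?
待评回答:光合作用是植物利用阳光将二氧化碳和水转化为葡萄糖和氧气的过程。
评分维度:1.准确性(权重0.6) 2.完整性(权重0.3) 3.表达清晰度(权重0.1)
输出格式:{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

# llm.chat applies the model's chat template (as the Transformers example does);
# llm.generate would send the raw string without it.
messages = [{"role": "user", "content": prompt}]
outputs = llm.chat(messages, sampling_params, lora_request=lora_request)
print(outputs[0].outputs[0].text)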
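Both examples sample with `temperature=0.7`, so the judge can return different scores for the same input across runs. If you need a steadier number, one option (our suggestion, not a procedure evaluated for this model) is to draw several samples, e.g. `SamplingParams(n=5, temperature=0.7, max_tokens=2048)` in vLLM, and aggregate them. The hypothetical helper `aggregate_scores` below parses the top-level `"score"` from each sample's text, skips samples it cannot parse, and reports the mean and sample standard deviation.

```python
import json
import statistics


def aggregate_scores(sample_texts: list[str]) -> dict:
    """Parse the top-level "score" of each sampled judgment and summarize them."""
    scores = []
    for text in sample_texts:
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            continue  # no JSON-looking span in this sample
        try:
            scores.append(float(json.loads(text[start : end + 1])["score"]))
        except (ValueError, KeyError, TypeError):
            continue  # malformed JSON (JSONDecodeError is a ValueError) or bad "score"
    if not scores:
        raise ValueError("no sample contained a parseable score")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "n_valid": len(scores),
    }


# With vLLM and SamplingParams(n=5, ...), the texts could be taken as
# [o.text for o in outputs[0].outputs]; literal strings are used here instead.
samples = ['{"score": 0.8}', 'Result: {"score": 0.9}', "not json", '{"score": 0.85}']
print(aggregate_scores(samples))  # n_valid is 3: the "not json" sample is skipped
```

A large standard deviation is itself a useful signal that the rubric or the answer is ambiguous for the judge.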

Prompt Format

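The model is flexible about input format. In the typical structure below, the first line tells the model, in Chinese, "You are a professional scoring judge. Score the QA pair on multiple dimensions according to the rubric and output strict JSON only, with no extra content." The field labels 问题, 待评回答, 评分维度 and 输出格式 mean question, answer to be evaluated, scoring dimensions (the rubric), and output format.

A typical prompt structure: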

你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:{question}
待评回答:{answer}
评分维度:{rubric_dimensions}
输出格式:{desired_json_schema}
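The template can also be filled programmatically. A minimal sketch, where `build_judge_prompt` and its list of `(criterion, weight)` pairs are illustrative assumptions rather than an official API; it reproduces the rubric line style of the Usage examples (e.g. `1.答案正确性(权重0.8)`). The check that weights sum to 1 is our own convention, matching both examples, not a documented requirement of the model.

```python
# Hypothetical helper; the template strings are copied from the Usage examples.
INSTRUCTION = "你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。"
OUTPUT_SCHEMA = (
    '{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,'
    '"weight":0~1,"reason":""}],"total_reason":""}'
)


def build_judge_prompt(question: str, answer: str, rubric: list[tuple[str, float]]) -> str:
    """Fill the typical prompt template; `rubric` is a list of (criterion, weight)."""
    total = sum(weight for _, weight in rubric)
    if abs(total - 1.0) > 1e-6:  # our convention: both examples use weights summing to 1
        raise ValueError(f"rubric weights should sum to 1, got {total}")
    # Render e.g. "1.答案正确性(权重0.8) 2.公式使用(权重0.15)"
    dims = " ".join(
        f"{i}.{name}(权重{weight:g})" for i, (name, weight) in enumerate(rubric, start=1)
    )
    return "\n".join([
        INSTRUCTION,
        f"问题:{question}",
        f"待评回答:{answer}",
        f"评分维度:{dims}",
        f"输出格式:{OUTPUT_SCHEMA}",
    ])


prompt = build_judge_prompt(
    "什么是光合作用?",
    "光合作用是植物利用阳光将二氧化碳和水转化为葡萄糖和氧气的过程。",
    [("准确性", 0.6), ("完整性", 0.3), ("表达清晰度", 0.1)],
)
print(prompt)
```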

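Expected Output Example, for the rectangle prompt in the Transformers example. The criteria are 答案正确性 (answer correctness), 公式使用 (use of formulas) and 逻辑完整性 (logical completeness). In English, the reasons say that the answer is correct (a square gives the maximum area, 144), that the perimeter formula is used but not written out, and that the reasoning steps are brief; total_reason concludes that the answer is correct and the core reasoning complete, but the formula and reasoning steps could be shown in more detail: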

{
  "score": 0.85,
  "item_detail": [
    {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8, "reason": "答案正确,正方形时面积最大为144"},
    {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15, "reason": "使用了周长公式但未明确写出"},
    {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05, "reason": "推理步骤较简略"}
  ],
  "total_reason": "回答正确且核心推理完整,但公式展示和推理步骤可更详细"
}
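In the example above, the per-criterion results give a weighted sum of 0.9 × 0.8 + 0.8 × 0.15 + 0.7 × 0.05 = 0.875, while the reported top-level score is 0.85. The card does not say exactly how `score` is meant to relate to `item_detail`, so if you need a reproducible total, one option is to recompute it yourself. A minimal sketch, assuming the overall score should be the weight-normalized sum of the `single_score` values (`weighted_total` is a hypothetical helper):

```python
import json


def weighted_total(result: dict) -> float:
    """Recompute an overall score as the weight-normalized sum over item_detail."""
    items = result["item_detail"]
    total_weight = sum(float(item["weight"]) for item in items)
    if total_weight <= 0:
        raise ValueError("criterion weights must sum to a positive number")
    weighted = sum(float(item["single_score"]) * float(item["weight"]) for item in items)
    return weighted / total_weight


# The example output above, with the reason strings left empty for brevity.
example = json.loads("""{
  "score": 0.85,
  "item_detail": [
    {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8, "reason": ""},
    {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15, "reason": ""},
    {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05, "reason": ""}
  ],
  "total_reason": ""
}""")
recomputed = weighted_total(example)
print(f"reported={example['score']}, recomputed={recomputed:.3f}")  # reported=0.85, recomputed=0.875
```

Dividing by the total weight keeps the result on the 0~1 scale even when a rubric's weights do not sum exactly to 1.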

Training Details

Hyperparameter            Value
Method                    LoRA + OPSD (Online Policy Self-Distillation)
LoRA rank                 64
LoRA alpha                128
Learning rate             1e-5
Epochs                    1
Batch size                1 per device × 8 gradient-accumulation steps × 8 GPUs = 64 effective
Max sequence length       4096 tokens
Max completion length     2048 tokens
Temperature               1.0
OPSD beta                 0.5
Hardware                  8 × NVIDIA H20-98G
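The derived values in the table follow from simple arithmetic, sketched below with the table's numbers. The LoRA scaling factor assumes the standard alpha / r scaling that PEFT applies by default (not rsLoRA). `completion_budget` is a hypothetical helper showing that, within the 4096-token context (also used as `max_model_len` in the vLLM example), a prompt longer than 2048 tokens cannot use the full 2048-token completion length.

```python
# Numbers copied from the table above.
per_device_batch, grad_accum_steps, num_gpus = 1, 8, 8
effective_batch = per_device_batch * grad_accum_steps * num_gpus  # 64

# Assumes standard LoRA scaling (alpha / r), PEFT's default when rsLoRA is off.
lora_rank, lora_alpha = 64, 128
lora_scaling = lora_alpha / lora_rank  # 2.0

max_seq_len, max_completion_len = 4096, 2048


def completion_budget(prompt_tokens: int) -> int:
    """Hypothetical helper: tokens left for the judgment within max_seq_len."""
    return max(0, min(max_completion_len, max_seq_len - prompt_tokens))


for n in (1500, 3000, 5000):
    print(n, completion_budget(n))  # prints "1500 2048", "3000 1096", "5000 0"
```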

Training Data

A mixture of 4 evaluation/feedback datasets:

Limitations

  • Optimized for rubric-based scoring tasks; may not generalize well to open-ended generation
  • Best performance with structured output prompts specifying JSON format
  • Score calibration may vary across different rubric scales
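Since score calibration may vary across rubric scales, one simple mitigation (an assumption, not something evaluated for this model) is to request scores on a fixed 0~1 scale, as the Usage prompts do, or to map scores from other scales linearly onto [0, 1] before comparing them. The hypothetical helper below does the mapping; linear rescaling only aligns ranges and does not correct a judge that is systematically harsher on one scale than another.

```python
def to_unit_interval(score: float, low: float, high: float) -> float:
    """Linearly map a score on [low, high] to [0, 1], clamping out-of-range values."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (score - low) / (high - low)))


print(to_unit_interval(4, 1, 5))   # 0.75 (a 4 on a 1-5 scale)
print(to_unit_interval(7, 0, 10))  # 0.7 (a 7 on a 0-10 scale)
```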

Citation

If you find this model useful, please cite:

@misc{qwen36-judgeopsd-0604,
  title={Qwen3.6-27B-JudgeOPSD-0604},
  author={Uranus},
  year={2026},
  url={https://huggingface.co/Uranus/Qwen3.6-27B-JudgeOPSD-0604}
}

License

This model inherits the Apache 2.0 License from the base model.
