A rubric-based judge model fine-tuned from Qwen/Qwen3.6-27B using Online Policy Self-Distillation (OPSD) with LoRA.
This model is trained to serve as a general-purpose evaluation judge that scores responses against user-specified rubrics. It accepts arbitrary input formats; you only need to specify the desired output format in your prompt.
Key Features:
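```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA judge adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.6-27B",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.6-27B")

# The prompt is kept verbatim in Chinese. English gloss:
#   You are a professional scoring judge. Score the QA pair on multiple dimensions
#   according to the rubric and output strict JSON with no extra content.
#   Question: A rectangle has perimeter 48. What is its maximum area?
#   Response to evaluate: Perimeter 48, length + width = 24, maximum area 144, reached for a square.
#   Dimensions: 1. answer correctness (0.8) 2. formula usage (0.15) 3. logical completeness (0.05)
prompt = """你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:长方形周长48,最大面积是多少?
待评回答:周长48,长+宽=24,最大面积144,正方形时最大。
评分维度:1.答案正确性(权重0.8) 2.公式使用(权重0.15) 3.逻辑完整性(权重0.05)
输出格式:{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Enable sampling explicitly; temperature only takes effect when do_sample=True.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```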
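Since the prompt asks for strict JSON, the decoded response can usually be turned into a Python dict. The following is a minimal sketch, not part of the original model card: `parse_judge_response` is a hypothetical helper that assumes the response contains exactly one top-level JSON object, possibly surrounded by extra text.

```python
import json


def parse_judge_response(response: str) -> dict:
    """Extract and decode the JSON object in a judge response (hypothetical helper).

    Assumes a single top-level JSON object; any text before the first '{'
    or after the last '}' is discarded.
    """
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in judge response")
    return json.loads(response[start : end + 1])


# Illustrative input only; a real response comes from the model.
sample = 'Result:\n{"score": 0.85, "item_detail": [], "total_reason": "ok"}\n'
parsed = parse_judge_response(sample)
print(parsed["score"])
```

If the model emits additional braces outside the JSON object, this simple slicing will fail with a `json.JSONDecodeError`, so callers may want to catch that and retry.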
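```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Serve the base model with LoRA enabled; max_lora_rank matches the adapter rank (64).
llm = LLM(
    model="Qwen/Qwen3.6-27B",
    enable_lora=True,
    max_lora_rank=64,
    max_model_len=4096,
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)
# Arguments: adapter name, unique integer ID, adapter path or Hub repo.
lora_request = LoRARequest("judge", 1, "Uranus/Qwen3.6-27B-JudgeOPSD-0604")

# The prompt is kept verbatim in Chinese. English gloss:
#   Same judge instruction as in the Transformers example.
#   Question: What is photosynthesis?
#   Response to evaluate: Photosynthesis is the process by which plants use sunlight
#   to convert carbon dioxide and water into glucose and oxygen.
#   Dimensions: 1. accuracy (0.6) 2. completeness (0.3) 3. clarity of expression (0.1)
prompt = """你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:什么是光合作用?
待评回答:光合作用是植物利用阳光将二氧化碳和水转化为葡萄糖和氧气的过程。
评分维度:1.准确性(权重0.6) 2.完整性(权重0.3) 3.表达清晰度(权重0.1)
输出格式:{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,"weight":0~1,"reason":""}],"total_reason":""}"""

# llm.chat applies the tokenizer's chat template, as the Transformers example does.
messages = [{"role": "user", "content": prompt}]
outputs = llm.chat(messages, sampling_params=sampling_params, lora_request=lora_request)
print(outputs[0].outputs[0].text)
```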
The model is flexible with input format. A typical prompt structure:
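```
你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。
问题:{question}
待评回答:{answer}
评分维度:{rubric_dimensions}
输出格式:{desired_json_schema}
```

In English, the first line instructs the model: "You are a professional scoring judge. Score the QA pair on multiple dimensions according to the rubric and output strict JSON with no extra content." The remaining fields are Question (问题), Response to evaluate (待评回答), Scoring dimensions (评分维度), and Output format (输出格式).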
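The template can be filled programmatically. The sketch below is an illustration, not part of the original model card: `build_judge_prompt` and `DEFAULT_SCHEMA` are hypothetical names, the rubric formatting mirrors the examples in this card, and the check that weights sum to 1 is an assumption drawn from those examples, not a documented requirement.

```python
import math

# Instruction line and schema copied verbatim from the examples in this card.
INSTRUCTION = "你是专业评分法官,按rubric对QA多维度打分,输出严格JSON格式,不要多余内容。"
DEFAULT_SCHEMA = (
    '{"score":0~1,"item_detail":[{"criterion":"","single_score":0~1,'
    '"weight":0~1,"reason":""}],"total_reason":""}'
)


def build_judge_prompt(question, answer, rubric, schema=DEFAULT_SCHEMA):
    """Fill the prompt template (hypothetical helper).

    rubric is a list of (criterion, weight) pairs.
    """
    total_weight = sum(weight for _, weight in rubric)
    # Assumption: weights sum to 1, as in the card's examples.
    if not math.isclose(total_weight, 1.0, abs_tol=1e-6):
        raise ValueError(f"rubric weights sum to {total_weight}, expected 1.0")
    dims = " ".join(
        f"{i}.{name}(权重{weight})" for i, (name, weight) in enumerate(rubric, start=1)
    )
    return (
        f"{INSTRUCTION}\n"
        f"问题:{question}\n"
        f"待评回答:{answer}\n"
        f"评分维度:{dims}\n"
        f"输出格式:{schema}"
    )


prompt = build_judge_prompt(
    "什么是光合作用?",
    "光合作用是植物利用阳光将二氧化碳和水转化为葡萄糖和氧气的过程。",
    [("准确性", 0.6), ("完整性", 0.3), ("表达清晰度", 0.1)],
)
print(prompt)
```

The resulting string can be passed as the `content` of a single user message.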
Expected Output Example:
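```json
{
  "score": 0.85,
  "item_detail": [
    {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8, "reason": "答案正确,正方形时面积最大为144"},
    {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15, "reason": "使用了周长公式但未明确写出"},
    {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05, "reason": "推理步骤较简略"}
  ],
  "total_reason": "回答正确且核心推理完整,但公式展示和推理步骤可更详细"
}
```

In English, the criteria are answer correctness, formula usage, and logical completeness. The reasons state, in order, that the answer is correct (the area peaks at 144 for a square), that the perimeter formula is used but not written out explicitly, and that the reasoning steps are rather brief. The overall reason notes that the answer is correct and the core reasoning is complete, but the formulas and reasoning steps could be shown in more detail.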
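It can be useful to cross-check the reported `score` against the per-criterion scores. The sketch below assumes the intended aggregate is the weight-averaged `single_score`, which the card does not state; `weighted_score` is a hypothetical helper. For the example above the weighted sum is 0.9 × 0.8 + 0.8 × 0.15 + 0.7 × 0.05 = 0.875, slightly above the reported 0.85, so the model's `score` field should not be assumed to equal the weighted sum exactly.

```python
def weighted_score(judgement: dict) -> float:
    """Recompute the aggregate as sum(single_score * weight) (assumed aggregation)."""
    return sum(
        item["single_score"] * item["weight"] for item in judgement["item_detail"]
    )


# Numeric fields taken from the example output above; reasons omitted for brevity.
example = {
    "score": 0.85,
    "item_detail": [
        {"criterion": "答案正确性", "single_score": 0.9, "weight": 0.8},
        {"criterion": "公式使用", "single_score": 0.8, "weight": 0.15},
        {"criterion": "逻辑完整性", "single_score": 0.7, "weight": 0.05},
    ],
}

recomputed = weighted_score(example)
gap = abs(recomputed - example["score"])
print(round(recomputed, 3), round(gap, 3))
```

A downstream pipeline could log the gap and fall back to the recomputed value when it exceeds a chosen tolerance.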
| Hyperparameter | Value |
|---|---|
| Method | LoRA + OPSD (Online Policy Self-Distillation) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| Learning Rate | 1e-5 |
| Epochs | 1 |
| Batch Size | 1 × 8 (grad accum) × 8 GPUs = effective 64 |
| Max Sequence Length | 4096 |
| Max Completion Length | 2048 |
| Temperature | 1.0 |
| OPSD Beta | 0.5 |
| Hardware | 8 × NVIDIA H20-98G |
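The derived quantities in the table can be checked with a few lines of arithmetic. This is a sketch only: the variable names are illustrative, and the scaling factor assumes the standard LoRA convention of `alpha / rank` (not a rank-stabilized variant), which the card does not specify.

```python
# Values taken from the hyperparameter table.
per_device_batch_size = 1
grad_accum_steps = 8
num_gpus = 8
lora_rank = 64
lora_alpha = 128

# Effective batch size: samples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_gpus

# Assumption: standard LoRA scaling, where the update is multiplied by alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 64
print(lora_scaling)  # 2.0
```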
A mixture of 4 evaluation/feedback datasets:
If you find this model useful, please cite:
```bibtex
@misc{qwen36-judgeopsd-0604,
  title={Qwen3.6-27B-JudgeOPSD-0604},
  author={Uranus},
  year={2026},
  url={https://huggingface.co/Uranus/Qwen3.6-27B-JudgeOPSD-0604}
}
```
This model inherits the Apache 2.0 License from the base model.