BusinessGPT Reward Model (rubert-base)

Trained on 648 preference pairs from BusinessGPT v16 multi-candidate labeling.

Eval

  • Held-out pairwise accuracy (50 pairs): 0.880
  • Margin (chosen - rejected): mean=2.938, median=3.134
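The card does not say how these numbers were computed. A minimal sketch of one common way, assuming a pair counts as correct when the chosen response scores strictly higher than the rejected one (under that reading, 0.880 on 50 pairs is 44 correct). The function name `pairwise_metrics` and its inputs are hypothetical, not part of this repository.

```python
from statistics import mean, median
from typing import Dict, Sequence


def pairwise_metrics(chosen_scores: Sequence[float],
                     rejected_scores: Sequence[float]) -> Dict[str, float]:
    """Pairwise accuracy and margin statistics for scored preference pairs.

    Assumption: a pair is 'correct' when chosen > rejected (ties count as wrong).
    """
    if not chosen_scores or len(chosen_scores) != len(rejected_scores):
        raise ValueError("need two non-empty score lists of equal length")
    # Margin per pair: reward of the chosen response minus the rejected one.
    margins = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    accuracy = sum(m > 0 for m in margins) / len(margins)
    return {
        "accuracy": accuracy,
        "mean_margin": mean(margins),
        "median_margin": median(margins),
    }


# Toy scores for illustration only, not taken from the real evaluation.
metrics = pairwise_metrics([2.0, 1.0, 0.5, 3.0], [1.0, 1.5, 0.0, 1.0])
print(metrics)
```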

Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("vXofi/businessgpt-reward-rubert")
tok.truncation_side = "left"  # over max_length, drop the earliest tokens so the response is kept
mdl = AutoModelForSequenceClassification.from_pretrained("vXofi/businessgpt-reward-rubert")

# Score a single (prompt, response) pair.
# prompt_messages: list of {"role": ..., "content": ...} dicts; response: str
parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in prompt_messages]
parts.append(f"<|im_start|>assistant\n{response}<|im_end|>")
text = "\n".join(parts)
enc = tok(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    score = mdl(**enc).logits.squeeze().item()

Intended for best-of-N re-ranking at inference time: score each candidate response for the same prompt and keep the highest-scoring one.
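Best-of-N re-ranking can be sketched as follows. This is an illustration under assumptions, not the author's pipeline: `best_of_n` and `score_fn` are hypothetical names, and `score_fn` stands in for a wrapper around the scoring snippet above that takes a candidate response and returns its reward. Higher scores are assumed to be better, consistent with the positive chosen-minus-rejected margin reported in the evaluation.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(candidates: Sequence[str],
              score_fn: Callable[[str], float]) -> Tuple[str, float]:
    """Return the candidate with the highest reward, plus that reward.

    score_fn is a hypothetical callable mapping a response to a scalar score,
    e.g. a function that formats (prompt, response) and runs the reward model.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    scored = [(score_fn(c), c) for c in candidates]
    # max() keeps the first candidate when several share the top score.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Stand-in scorer for demonstration only: longer text scores higher.
best, score = best_of_n(["ok", "a fuller answer", "short"], lambda r: float(len(r)))
print(best, score)
```

In practice the N candidates would be sampled from the generator for one prompt, and `score_fn` would close over that prompt's messages.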

Model size: 0.2B params, F32 tensors, Safetensors format.