Multi-Domain Reward Model Mistral-7B-Instruct

This is a multi-domain reward model built from weqweasdas/RM-Mistral-7B. It combines 23 fine-grained regression objectives across coherence, commonsense, empathy, and multicultural response quality with a prompt-conditioned gating network that produces a single preference score.

The checkpoint was packaged with the custom RewardModelWithGating architecture used in the Multi-Domain Reward Model project.
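To illustrate how a prompt-conditioned gate can turn several per-objective rewards into one preference score, here is a minimal sketch in plain Python. It assumes an ArmoRM-style combination in which the gate emits one logit per objective and the final score is the softmax-weighted sum of the objective rewards. The function names `softmax` and `gated_score` are hypothetical, and the actual RewardModelWithGating class may apply additional steps that are not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_score(objective_rewards, gating_logits):
    """Combine per-objective rewards into one scalar (assumed scheme).

    objective_rewards: one regression output per objective.
    gating_logits: one logit per objective, computed from the prompt.
    """
    if len(objective_rewards) != len(gating_logits):
        raise ValueError("need exactly one gating logit per objective")
    weights = softmax(gating_logits)
    return sum(w * r for w, r in zip(weights, objective_rewards))


# Toy example with 3 objectives instead of 23; all values are made up.
rewards = [0.2, 0.9, 0.5]
uniform_gate = [0.0, 0.0, 0.0]   # equal weights: plain average of the rewards
focused_gate = [0.0, 50.0, 0.0]  # nearly all weight on the second objective

print(gated_score(rewards, uniform_gate))
print(gated_score(rewards, focused_gate))
```

Because the weights depend on the prompt, the same response-level rewards can yield different final scores for different prompts, which is the point of conditioning the gate.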

Intended Use

Use this model to score and compare assistant responses when the evaluation should account for multiple quality dimensions rather than a single generic helpfulness score. The primary use case is reward modeling or offline response ranking for chat-style data.
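Offline response ranking can be sketched as follows. The helper `rank_responses` and the stand-in scorer `toy_score` are hypothetical names introduced for illustration. In practice `score_fn` would wrap a call to the reward model, as in the usage example in this card.

```python
def rank_responses(prompt, candidates, score_fn):
    """Return (response, score) pairs sorted from highest to lowest score.

    score_fn is any callable taking (prompt, response) and returning a float.
    """
    scored = [(response, score_fn(prompt, response)) for response in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt, response):
    # Stand-in scorer for demonstration only: longer responses score higher.
    # Replace this with a function that queries the reward model.
    return float(len(response))


candidates = [
    "Sorry to hear that.",
    "I'm sorry. Let's look at what went wrong and plan the next attempt.",
    "Ok.",
]
for response, score in rank_responses("I failed my exam.", candidates, toy_score):
    print(f"{score:6.1f}  {response}")
```

Only the ordering of the scores is used here, which matches the intended relative-comparison use of the model.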

Training Data

The model uses multi-objective scoring and preference data from:

multidomain_data_scoring
RLHFlow/UltraFeedback-preference-standard
allenai/reward-bench for evaluation

Evaluation

Preference accuracy by domain:

Domain	Accuracy (%)
Coherence	85.2052
Commonsense	97.8402
Empathy	95.1549
Multicultural	84.6998
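For reference, preference accuracy on pairwise data is commonly computed as the share of pairs in which the chosen response receives a higher score than the rejected one. The sketch below assumes that definition and counts ties as incorrect. The evaluation script used for the numbers above is not shown in this card, so treat `preference_accuracy` as a hypothetical helper rather than the exact procedure.

```python
def preference_accuracy(score_pairs):
    """Percentage of pairs where the chosen response outscores the rejected one.

    score_pairs: iterable of (chosen_score, rejected_score) tuples.
    Ties are counted as incorrect (an assumption of this sketch).
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("need at least one scored pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * correct / len(pairs)


# Made-up scores for four preference pairs.
example_pairs = [(1.0, 0.0), (0.5, 0.7), (2.0, 1.0), (0.0, 0.0)]
print(preference_accuracy(example_pairs))
```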

Usage Example

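This checkpoint uses the project's custom RewardModelWithGating class. Run the example from an environment where the modeling_custom module in multidomain_model/modeling_custom.py is importable, for example with the multidomain_model directory on the Python path or as the working directory.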

import torch
from transformers import AutoTokenizer
from modeling_custom import RewardModelWithGating

model_id = "mario-rc/multi-domain-rm-mistral-7b-it"
# Use bfloat16 on GPU 0 when CUDA is available; otherwise fall back to float32 on CPU.
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
device_map = {"": 0} if torch.cuda.is_available() else None

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = RewardModelWithGating.from_pretrained(
    model_id,
    device_map=device_map,
    dtype=dtype,
).eval()
device = next(model.parameters()).device

messages = [
    {"role": "user", "content": "I failed an important exam and feel awful."},
    {"role": "assistant", "content": "I'm sorry. That is a hard setback, but it does not define your ability. Take a little time to recover, then we can make a concrete study plan for the next attempt."},
]

encoded = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=4096,
)
# apply_chat_template may return a bare tensor or a dict-like encoding; handle both.
if isinstance(encoded, torch.Tensor):
    inputs = {"input_ids": encoded.to(device)}
else:
    inputs = {key: value.to(device) for key, value in encoded.items()}

with torch.no_grad():
    score = model(**inputs).score.float().item()

print(score)

Limitations

This is a reward model, not a standalone chat assistant. Scores are intended for relative comparison and should be calibrated for each downstream use case. The model inherits limitations from its base model and from the annotation coverage of the multi-domain datasets, especially for cultural contexts not represented in the evaluation data.
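One way to turn raw score differences into a calibrated quantity is a Bradley-Terry style preference probability with a temperature fitted on held-out data from the target use case. The sketch below is an assumption about how such calibration could look, not a procedure shipped with this model. `preference_probability` and its `temperature` parameter are hypothetical.

```python
import math


def preference_probability(score_a, score_b, temperature=1.0):
    """Probability that response A is preferred over response B.

    Uses a logistic function of the score margin. The temperature is a
    calibration knob that would need to be fitted per downstream use case.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    margin = (score_a - score_b) / temperature
    # Numerically stable logistic function.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


# Made-up scores: a higher temperature pulls the probability toward 0.5.
print(preference_probability(2.0, 1.0))
print(preference_probability(2.0, 1.0, temperature=4.0))
```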

Credits

This model is based on the ArmoRM/RLHFlow reward-modeling approach and adapts it to custom multi-domain attributes for coherence, commonsense, empathy, and multicultural response quality.