Model Card for Llama-3-OffsetBias-RM-8B

Llama-3-OffsetBias-RM-8B is a reward model trained on the OffsetBias dataset. It is trained to be more robust to the evaluation biases commonly found in evaluation models. The model is introduced in the paper OffsetBias: Leveraging Debiased Data for Tuning Evaluators.

Model Details

Model Description

Llama-3-OffsetBias-RM-8B uses sfairXC/FsfairX-LLaMA3-RM-v0.1 as its base model, which is built with Meta Llama 3. An intermediate reward model is first trained from Llama-3-8B-Instruct on a subset of the data used to train FsfairX-LLaMA3-RM, combined with the NCSOFT/offsetbias dataset. The intermediate model is then merged with FsfairX-LLaMA3-RM to create Llama-3-OffsetBias-RM-8B.

  • Developed by: NC Research
  • Language(s) (NLP): English
  • License: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
  • Finetuned from model: sfairXC/FsfairX-LLaMA3-RM-v0.1
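
The card does not say which merging method was used to combine the intermediate model with FsfairX-LLaMA3-RM. As a rough illustration only, the sketch below shows the simplest form of weight merging: linear interpolation between two checkpoints that share the same parameter names. The function name `merge_state_dicts`, the `alpha` weight, and the toy parameter names and values are assumptions made for this example, not the authors' recipe.

```python
def merge_state_dicts(base, other, alpha=0.5):
    """Linearly interpolate two state dicts: (1 - alpha) * base + alpha * other.

    Works with any values that support * and + (plain floats here,
    tensors in a real checkpoint).
    """
    if base.keys() != other.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    return {name: (1 - alpha) * base[name] + alpha * other[name] for name in base}


# Toy stand-ins for the two checkpoints (hypothetical names and values).
fsfairx_weights = {"score.weight": 1.0, "layers.0.weight": 2.0}
intermediate_weights = {"score.weight": 3.0, "layers.0.weight": 4.0}

merged = merge_state_dicts(fsfairx_weights, intermediate_weights, alpha=0.5)
print(merged)
```

With `alpha=0` the result equals the first checkpoint and with `alpha=1` the second; values in between blend the two.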

Model Sources

  • Paper: OffsetBias: Leveraging Debiased Data for Tuning Evaluators (arXiv:2407.06551)

Uses

Direct Use

  import torch
  from transformers import AutoTokenizer, pipeline

  model_name = "NCSOFT/Llama-3-OffsetBias-RM-8B"
  rm_tokenizer = AutoTokenizer.from_pretrained(model_name)
  rm_pipe = pipeline(
      "sentiment-analysis",
      model=model_name,
      device_map="auto",  # needs the accelerate package; alternatively pass device=0
      tokenizer=rm_tokenizer,
      model_kwargs={"torch_dtype": torch.bfloat16}
  )

  # Return the raw score (no sigmoid/softmax) so it can be used as a reward.
  pipe_kwargs = {
      "return_all_scores": True,
      "function_to_apply": "none",
      "batch_size": 1
  }

  chat = [
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
  ]

  # Render the chat as text and strip the BOS token so it is not added twice
  # when the pipeline tokenizes the text.
  test_texts = [rm_tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False).replace(rm_tokenizer.bos_token, "")]
  pipe_outputs = rm_pipe(test_texts, **pipe_kwargs)
  rewards = [output[0]["score"] for output in pipe_outputs]
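
A common way to use such rewards is to score two candidate responses to the same prompt and compare them. The sketch below assumes the rewards behave like Bradley-Terry logits, in which case the sigmoid of their difference can be read as a preference probability; the card itself does not state this. The helper `preference_probability` and the example reward values are hypothetical.

```python
import math


def preference_probability(reward_a, reward_b):
    """Probability that response A is preferred over B under a Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


# Hypothetical rewards, e.g. scores obtained for two candidate responses.
reward_a, reward_b = 1.5, -0.5
p = preference_probability(reward_a, reward_b)
preferred = "A" if p > 0.5 else "B"
print(f"P(A preferred) = {p:.3f}, pick {preferred}")
```

If only a ranking is needed, comparing the two rewards directly gives the same choice without the sigmoid.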

Evaluation

RewardBench Result

Metric      Score
Chat        97.21
Chat Hard   80.70
Safety      89.01
Reasoning   90.60
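
RewardBench reports, per category, how often a reward model scores the preferred response above the rejected one. A minimal sketch of that pairwise accuracy follows; `pairwise_accuracy` and the reward pairs are made up for illustration and are not taken from the benchmark or its official tooling.

```python
def pairwise_accuracy(reward_pairs):
    """Percentage of (chosen, rejected) reward pairs where chosen scores higher."""
    if not reward_pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for chosen, rejected in reward_pairs if chosen > rejected)
    return 100.0 * wins / len(reward_pairs)


# Hypothetical (chosen_reward, rejected_reward) pairs for one category.
pairs = [(2.1, 0.3), (0.5, 1.2), (1.8, -0.4), (0.9, 0.1)]
accuracy = pairwise_accuracy(pairs)
print(f"{accuracy:.2f}")
```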

EvalBiasBench Result

Metric                 Score
Length                 82.4
Concreteness           92.9
Empty Reference        46.2
Content Continuation   100.0
Nested Instruction     83.3
Familiar Knowledge     58.3

Citation

@misc{park2024offsetbias,
      title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
      author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
      year={2024},
      eprint={2407.06551},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Model size: 7.5B params (Safetensors, tensor type BF16)