---
license: apache-2.0
---

# Better Implementation for PairRM

## Introduction

This version of PairRM applies some fixes to the training process, which improve the model's performance significantly.

## Minor Fixes

- Longer Context Length (2048 -> 3370)

Thanks to DeBERTa's tokenizer, the original PairRM model already had enough context length.

But the longer, the better :>


## Major Fixes

- Change Prompt Format

Why use something like `<Response i + 1> {response}`? So, I changed it to a format based on Vicuna 1.1.
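For reference, a Vicuna-1.1-style template looks roughly like the sketch below. The exact system prompt and field layout used in training aren't stated in this card, so treat the details as assumptions:

```python
# A sketch of a Vicuna-1.1-style prompt (field names are illustrative,
# not necessarily the exact ones used to train Better-PairRM).
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions."
)

def format_example(instruction: str, response: str) -> str:
    # Vicuna 1.1 separates turns with "USER:" / "ASSISTANT:" markers.
    return f"{VICUNA_SYSTEM}\n\nUSER: {instruction}\nASSISTANT: {response}"
```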


- Change Truncation Side

The original process used right-side truncation even on the input. This can cause a serious problem when the input exceeds the model's context length, since the most recent turns get cut off.
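As a minimal sketch, left-side truncation can be enabled on the tokenizer like this (the `max_length` value comes from the source-length table below; the actual training code isn't shown in this card):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maywell/Better-PairRM")
tokenizer.truncation_side = "left"  # keep the newest tokens, drop the oldest

# An over-long input: right-side truncation would cut off the latest turns.
long_input = "USER: " + "very long conversation history ... " * 500
enc = tokenizer(long_input, truncation=True, max_length=2030)
print(len(enc["input_ids"]))  # -> 2030
```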


- Dataset Filter

There was a decent amount of empty assistant responses in the original dataset, so I dropped them.
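A minimal sketch of that filtering step with the `datasets` library; the dataset name and column name below are placeholders, since the card doesn't state the actual ones:

```python
from datasets import load_dataset

# "some/pairrm-train-data" and the "response" column are hypothetical --
# the card doesn't name the real dataset or its schema.
ds = load_dataset("some/pairrm-train-data", split="train")

# Drop rows whose assistant response is empty or whitespace-only.
ds = ds.filter(lambda ex: ex["response"].strip() != "")
```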


## Statistics

### Context length

| PairRanker type | Source max length | Candidate max length | Total max length |
| --- | --- | --- | --- |
| pair-ranker | 128 | 128 | 384 |
| PairRM | 1224 | 412 | 2048 |
| Better-PairRM (this model) | 2030 | 670 | 3370 |

### Performance

#### Reward-Bench by AllenAI

| Metric | llm-blender/PairRM-hf | maywell/Better-PairRM |
| --- | --- | --- |
| model_type | Custom Classifier | Custom Classifier |
| alpacaeval-length | 0.758 | 0.863 |
| alpacaeval-hard | 0.979 | 1.000 |
| alpacaeval-easy | 0.970 | 0.990 |
| donotanswer | 0.360 | 0.522 |
| hep-cpp | 0.628 | 0.646 |
| hep-go | 0.689 | 0.713 |
| hep-java | 0.628 | 0.713 |
| hep-js | 0.604 | 0.707 |
| hep-python | 0.646 | 0.713 |
| hep-rust | 0.652 | 0.726 |
| llmbar-adver-GPTInst | 0.304 | 0.141 |
| llmbar-adver-GPTOut | 0.596 | 0.447 |
| llmbar-adver-manual | 0.500 | 0.261 |
| llmbar-adver-neighbor | 0.433 | 0.276 |
| llmbar-natural | 0.800 | 0.720 |
| math-prm | 0.333 | 0.295 |
| mt-bench-hard | 0.649 | 0.703 |
| mt-bench-med | 0.900 | 1.000 |
| mt-bench-easy | 0.964 | 0.929 |
| refusals-dangerous | 0.080 | 0.730 |
| refusals-offensive | 0.010 | 0.940 |
| xstest-should-refuse | 0.370 | 0.968 |
| xstest-should-respond | 0.952 | 0.876 |
| **average** | **0.600** | **0.690** |

Note: the llmbar test scores are a bit weird across all models on Reward-Bench.

## Thanks to

## Contact