---
license: apache-2.0
---

# Better Implementation for PairRM

## Introduction

This version of PairRM applies some fixes to the training process, which improve the model's performance significantly.

## Minor Fixes

- Longer Context Length (2048 -> 3370)

Thanks to DeBERTa's tokenizer, the original PairRM model already had enough context length.

But the longer, the better :>


## Major Fixes

- Change Prompt Format

Why use something like `<Response i + 1> {response}`? So, I changed it to a format based on Vicuna 1.1.
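For reference, a Vicuna-1.1-style template looks roughly like the sketch below. The exact system prompt and field layout used in training aren't stated in this card, so treat the details as assumptions:

```python
# A sketch of a Vicuna-1.1-style prompt (field names are illustrative,
# not necessarily the exact ones used to train Better-PairRM).
VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions."
)

def format_example(instruction: str, response: str) -> str:
    # Vicuna 1.1 separates turns with "USER:" / "ASSISTANT:" markers.
    return f"{VICUNA_SYSTEM}\n\nUSER: {instruction}\nASSISTANT: {response}"
```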


- Change Truncation Side

The original process used right-side truncation even on the input. This can cause a serious problem when the input exceeds the model's context length, since the most recent turns get cut off.
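As a minimal sketch, left-side truncation can be enabled on the tokenizer like this (the `max_length` value comes from the source-length table below; the actual training code isn't shown in this card):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("maywell/Better-PairRM")
tokenizer.truncation_side = "left"  # keep the newest tokens, drop the oldest

# An over-long input: right-side truncation would cut off the latest turns.
long_input = "USER: " + "very long conversation history ... " * 500
enc = tokenizer(long_input, truncation=True, max_length=2030)
print(len(enc["input_ids"]))  # -> 2030
```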


- Dataset Filter

There was a decent amount of empty assistant responses in the original dataset, so I dropped them.
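A minimal sketch of that filtering step with the `datasets` library; the dataset name and column name below are placeholders, since the card doesn't state the actual ones:

```python
from datasets import load_dataset

# "some/pairrm-train-data" and the "response" column are hypothetical --
# the card doesn't name the real dataset or its schema.
ds = load_dataset("some/pairrm-train-data", split="train")

# Drop rows whose assistant response is empty or whitespace-only.
ds = ds.filter(lambda ex: ex["response"].strip() != "")
```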


## Statistics

### Context length

| PairRanker type | Source max length | Candidate max length | Total max length |
| --- | --- | --- | --- |
| pair-ranker | 128 | 128 | 384 |
| PairRM | 1224 | 412 | 2048 |
| Better-PairRM (this model) | 2030 | 670 | 3370 |

### Performance

#### Reward-Bench by AllenAI

| Metric | llm-blender/PairRM-hf | maywell/Better-PairRM |
| --- | --- | --- |
| model_type | Custom Classifier | Custom Classifier |
| alpacaeval-length | 0.758 | 0.863 |
| alpacaeval-hard | 0.979 | 1.000 |
| alpacaeval-easy | 0.970 | 0.990 |
| donotanswer | 0.360 | 0.522 |
| hep-cpp | 0.628 | 0.646 |
| hep-go | 0.689 | 0.713 |
| hep-java | 0.628 | 0.713 |
| hep-js | 0.604 | 0.707 |
| hep-python | 0.646 | 0.713 |
| hep-rust | 0.652 | 0.726 |
| llmbar-adver-GPTInst | 0.304 | 0.141 |
| llmbar-adver-GPTOut | 0.596 | 0.447 |
| llmbar-adver-manual | 0.500 | 0.261 |
| llmbar-adver-neighbor | 0.433 | 0.276 |
| llmbar-natural | 0.800 | 0.720 |
| math-prm | 0.333 | 0.295 |
| mt-bench-hard | 0.649 | 0.703 |
| mt-bench-med | 0.900 | 1.000 |
| mt-bench-easy | 0.964 | 0.929 |
| refusals-dangerous | 0.080 | 0.730 |
| refusals-offensive | 0.010 | 0.940 |
| xstest-should-refuse | 0.370 | 0.968 |
| xstest-should-respond | 0.952 | 0.876 |
| **average** | **0.600** | **0.690** |

Note: the llmbar test scores are a bit weird across all models on Reward-Bench.

## Thanks to

## Contact