---
license: apache-2.0
---
# Better Implementation for [*PairRM*](https://huggingface.co/llm-blender/PairRM)
## Introduction
This version of PairRM includes several fixes to the training process, which significantly improve the model's performance.
### Minor Fixes
- Longer Context Length (2048 -> 3370)

Thanks to deberta's tokenizer, the original PairRM model already had enough context length.
But the longer, the better :>
---
### Major Fixes
- Change Prompt Format
Why use something like the following?
```
<Response i + 1> {response}
```
So, I changed it to a format based on Vicuna 1.1 (see the sketch below).
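Here is a minimal sketch of what a Vicuna 1.1-style pairwise prompt could look like. The exact template used for this model isn't documented in this card, so the `build_pair_prompt` helper and the system preamble are illustrative assumptions, not the actual training code.

```python
# Illustrative only: the exact template used by Better-PairRM is not documented here.
# This shows a Vicuna 1.1-style rendering of an (instruction, response) pair,
# instead of the original "<Response i + 1> {response}" tags.

VICUNA_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_pair_prompt(instruction: str, response: str) -> str:
    """Hypothetical Vicuna 1.1-style prompt for one candidate response."""
    return f"{VICUNA_SYSTEM} USER: {instruction} ASSISTANT: {response}"

print(build_pair_prompt("What is 2 + 2?", "2 + 2 equals 4."))
```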
---
- Change Truncate Side

The original process used right-side truncation even on the input. This can cause serious problems when the input exceeds the model's context length.
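As a rough illustration (not the actual training code), left-side truncation can be enabled on a Hugging Face tokenizer so that an over-long input keeps its most recent part. The `microsoft/deberta-v3-large` checkpoint is an assumption based on the deberta base mentioned above, and the 2030-token limit comes from the context-length table further down.

```python
# Sketch only: left-side truncation with a Hugging Face tokenizer keeps the tail
# (most recent turns) of an over-long input instead of the head.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")  # assumed base tokenizer
tok.truncation_side = "left"  # default is "right", which would drop the end of the input

ids = tok(
    "a very long conversation history ...",
    truncation=True,
    max_length=2030,  # source max length from the table below
)["input_ids"]
```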
---
- Dataset Filter

There was a decent amount of empty assistant responses in the original dataset, so I dropped them.
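The filtering step might look something like the sketch below, using the 🤗 `datasets` `.filter()` API. The dataset id and the `response` column name are hypothetical placeholders, since the card doesn't name the training data.

```python
# Illustrative sketch: drop rows whose assistant response is empty.
# The dataset id and the "response" column name are hypothetical.
from datasets import load_dataset

ds = load_dataset("user/pairwise-feedback-data", split="train")  # placeholder id
ds = ds.filter(lambda ex: ex["response"] is not None and ex["response"].strip() != "")
```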
---
## Statistics
### Context length
| PairRanker type | Source max length | Candidate max length | Total max length |
|:-----------------:|:-----------------:|:--------------------:|:----------------:|
| [pair-ranker](https://huggingface.co/llm-blender/pair-ranker) | 128 | 128 | 384 |
| [PairRM](https://huggingface.co/llm-blender/pair-reward-model/) | 1224 | 412 | 2048 |
| [Better-PairRM](https://huggingface.co/maywell/Better-PairRM/) (This model) | 2030 | 670 | 3370 |
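The totals above appear to follow `total = source + 2 × candidate`, which would be consistent with encoding the source together with both candidates of a pair; a quick check of that assumption:

```python
# Quick consistency check, assuming total = source_max + 2 * candidate_max
# (one source encoded together with two candidate responses).
budgets = {
    "pair-ranker":   (128, 128),
    "PairRM":        (1224, 412),
    "Better-PairRM": (2030, 670),
}
for name, (src, cand) in budgets.items():
    print(f"{name}: {src + 2 * cand}")  # 384, 2048, 3370
```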
### Performance
#### Reward-Bench by AllenAI
| Metric | llm-blender/PairRM-hf | maywell/Better-PairRM |
|----------------------------|------------------------|------------------------|
| model | llm-blender/PairRM-hf | maywell/Better-PairRM |
| model_type | Custom Classifier | Custom Classifier |
| alpacaeval-length | 0.758 | **0.863** |
| alpacaeval-hard | 0.979 | **1.000** |
| alpacaeval-easy | 0.970 | **0.990** |
| donotanswer | 0.360 | **0.522** |
| hep-cpp | 0.628 | **0.646** |
| hep-go | 0.689 | **0.713** |
| hep-java | 0.628 | **0.713** |
| hep-js | 0.604 | **0.707** |
| hep-python | 0.646 | **0.713** |
| hep-rust | 0.652 | **0.726** |
| llmbar-adver-GPTInst | **0.304** | 0.141 |
| llmbar-adver-GPTOut | **0.596** | 0.447 |
| llmbar-adver-manual | **0.500** | 0.261 |
| llmbar-adver-neighbor | **0.433** | 0.276 |
| llmbar-natural | **0.800** | 0.720 |
| math-prm | **0.333** | 0.295 |
| mt-bench-hard | 0.649 | **0.703** |
| mt-bench-med | 0.900 | **1.000** |
| mt-bench-easy | **0.964** | 0.929 |
| refusals-dangerous | 0.080 | **0.730** |
| refusals-offensive | 0.010 | **0.940** |
| xstest-should-refuse | 0.370 | **0.968** |
| xstest-should-respond | **0.952** | 0.876 |
| average | 0.600 | **0.690** |
> *Note: the llmbar test scores look a bit odd across all models on [Reward-Bench](https://huggingface.co/spaces/allenai/reward-bench).*
## Thanks to
- [Sionic AI](https://sionic.ai/) for providing the A100 cluster.
## Contact
- [Discord Server Link](https://discord.gg/MrBt3PXdXc)