Mistral7B-PairRM-SPPO-ExPO

The extrapolated (ExPO) model based on UCLA-AGI/Mistral7B-PairRM-SPPO and mistralai/Mistral-7B-Instruct-v0.2, as in the "Weak-to-Strong Extrapolation Expedites Alignment" paper.

Specifically, we obtain this model by extrapolating (alpha = 0.3) from the weights of the SFT and DPO/RLHF checkpoints, achieving superior alignment with human preference.
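
The extrapolation step can be sketched as follows. This is a minimal illustration, not the authors' released script. It assumes the update rule is `theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)`, applied independently to every parameter. The function name `expo_extrapolate` and the toy parameter names are hypothetical. The function works on any mapping from parameter names to values that support arithmetic, such as floats, NumPy arrays, or PyTorch tensors taken from a `state_dict()`.

```python
def expo_extrapolate(sft_state, aligned_state, alpha=0.3):
    """Extrapolate past the aligned checkpoint, away from the SFT checkpoint.

    Assumed rule (hypothetical helper, not the authors' code):
        theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft)
    """
    if sft_state.keys() != aligned_state.keys():
        raise ValueError("checkpoints must have identical parameter names")
    return {
        name: aligned_state[name] + alpha * (aligned_state[name] - sft_state[name])
        for name in aligned_state
    }


# Toy example with scalar "weights" standing in for tensors.
sft = {"layer.weight": 1.0, "layer.bias": 0.0}
aligned = {"layer.weight": 2.0, "layer.bias": -1.0}
extrapolated = expo_extrapolate(sft, aligned, alpha=0.3)
```

With alpha = 0 the function returns the aligned checkpoint unchanged; larger values move further along the direction from the SFT weights to the aligned weights. For this model, the SFT checkpoint would correspond to mistralai/Mistral-7B-Instruct-v0.2 and the aligned checkpoint to UCLA-AGI/Mistral7B-PairRM-SPPO.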

This extrapolated model achieves a 35.4% win rate and a 31.8% LC (length-controlled) win rate on AlpacaEval 2.0, outperforming the original Mistral7B-PairRM-SPPO's 32.2% and 30.5%, respectively.

Evaluation Results

Evaluation results on the AlpacaEval 2.0 benchmark (you can find the evaluation outputs on the official GitHub repo):

| Model | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
| --- | --- | --- | --- | --- |
| HuggingFaceH4/zephyr-7b-alpha | 6.7% | 10.0% | 10.6% | 13.6% |
| HuggingFaceH4/zephyr-7b-beta | 10.2% | 13.2% | 11.1% | 14.0% |
| berkeley-nest/Starling-LM-7B-alpha | 15.0% | 18.3% | 18.2% | 19.5% |
| Nexusflow/Starling-LM-7B-beta | 26.6% | 25.8% | 29.6% | 26.4% |
| snorkelai/Snorkel-Mistral-PairRM | 24.7% | 24.0% | 28.8% | 26.4% |
| RLHFlow/LLaMA3-iterative-DPO-final | 29.2% | 36.0% | 32.7% | 37.8% |
| internlm/internlm2-chat-1.8b | 3.8% | 4.0% | 5.2% | 4.3% |
| internlm/internlm2-chat-7b | 20.5% | 18.3% | 28.1% | 22.7% |
| internlm/internlm2-chat-20b | 36.1% | 24.9% | 46.2% | 27.2% |
| allenai/tulu-2-dpo-7b | 8.5% | 10.2% | 11.5% | 11.7% |
| allenai/tulu-2-dpo-13b | 11.2% | 15.5% | 15.6% | 17.6% |
| allenai/tulu-2-dpo-70b | 15.4% | 21.2% | 23.0% | 25.7% |

Evaluation results on the MT-Bench benchmark (you can find the evaluation outputs on the official GitHub repo):

| Model | Original | + ExPO |
| --- | --- | --- |
| HuggingFaceH4/zephyr-7b-alpha | 6.85 | 6.87 |
| HuggingFaceH4/zephyr-7b-beta | 7.02 | 7.06 |
| berkeley-nest/Starling-LM-7B-alpha | 7.82 | 7.91 |
| Nexusflow/Starling-LM-7B-beta | 8.10 | 8.18 |
| snorkelai/Snorkel-Mistral-PairRM | 7.63 | 7.69 |
| RLHFlow/LLaMA3-iterative-DPO-final | 8.08 | 8.45 |
| internlm/internlm2-chat-1.8b | 5.17 | 5.26 |
| internlm/internlm2-chat-7b | 7.72 | 7.80 |
| internlm/internlm2-chat-20b | 8.13 | 8.26 |
| allenai/tulu-2-dpo-7b | 6.35 | 6.38 |
| allenai/tulu-2-dpo-13b | 7.00 | 7.26 |
| allenai/tulu-2-dpo-70b | 7.79 | 8.03 |

Model size: 7.24B params (Safetensors)
Tensor type: BF16