---
license: apache-2.0
datasets:
  - openbmb/UltraFeedback
language:
  - en
---

This is a model released for our paper: [REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767).

# REBEL-Llama-3

This model was trained with REBEL from Meta-Llama-3-8B-Instruct, using FsfairX-LLaMA3-RM-v0.1 as the reward model and UltraFeedback as the preference dataset. The training code is available at https://github.com/ZhaolinGao/REBEL.
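
REBEL trains the policy by regressing the difference in log-probability ratios between a pair of responses to the same prompt onto the difference in their rewards. The following PyTorch sketch of that per-pair least-squares objective is illustrative only (the function name, tensor names, and the `eta` default are assumptions); see the training repository above for the actual implementation.

```python
import torch

def rebel_loss(
    logp_new_a: torch.Tensor,  # log pi_theta(y_a | x) under the policy being updated
    logp_new_b: torch.Tensor,  # log pi_theta(y_b | x)
    logp_old_a: torch.Tensor,  # log pi_t(y_a | x) under the previous iterate
    logp_old_b: torch.Tensor,  # log pi_t(y_b | x)
    reward_a: torch.Tensor,    # r(x, y_a) from the reward model
    reward_b: torch.Tensor,    # r(x, y_b)
    eta: float = 1.0,          # illustrative value for the learning-rate parameter
) -> torch.Tensor:
    # Difference of log-probability ratios between the two responses.
    ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    # Regress (1/eta) * ratio_diff onto the relative reward r_a - r_b.
    return ((ratio_diff / eta - (reward_a - reward_b)) ** 2).mean()
```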

## Links to Other Models

- REBEL-OpenChat-3.5
- REBEL-Llama-3-epoch_2
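
To try REBEL-Llama-3 locally, here is a minimal inference sketch using Hugging Face Transformers. The Hub repo id below is an assumption; substitute the actual id of this model.

```python
# Minimal inference sketch; the repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Cornell-AGI/REBEL-Llama-3"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The model follows the Llama-3-Instruct chat format, so the tokenizer's
# chat template can build the prompt.
messages = [{"role": "user", "content": "Explain RLHF in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```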

## AlpacaEval 2.0 Evaluations

| Model | LC Win Rate (%) | Win Rate (%) |
| :--- | :---: | :---: |
| REBEL-OpenChat-3.5 | 17.3 | 12.8 |
| REBEL-Llama-3 | 30.1 | 32.6 |

## MT-Bench Evaluations

| Model | 1st Turn | 2nd Turn | Average |
| :--- | :---: | :---: | :---: |
| REBEL-OpenChat-3.5 | 8.54 | 7.58 | 8.06 |
| REBEL-Llama-3 | 8.63 | 7.69 | 8.16 |

## Open LLM Leaderboard Evaluations

| Model | MMLU (5-shot) | GSM8K (5-shot) | ARC (25-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | HellaSwag (10-shot) | Average |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| REBEL-OpenChat-3.5 | 63.7 | 68.8 | 64.3 | 80.4 | 48.2 | 85.0 | 68.4 |
| REBEL-Llama-3 | 65.8 | 75.6 | 61.7 | 75.8 | 51.7 | 78.8 | 68.2 |
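
The scores above follow the standard Open LLM Leaderboard few-shot settings listed in the column headers. As a rough sketch, a single entry could be reproduced with EleutherAI's lm-evaluation-harness along these lines (v0.4-style Python API; the Hub id, task name, and result-dict keys are assumptions and vary across harness versions):

```python
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Cornell-AGI/REBEL-Llama-3,dtype=bfloat16",  # assumed Hub id
    tasks=["hellaswag"],
    num_fewshot=10,  # 10-shot, matching the HellaSwag column in the table
    batch_size=8,
)
# Result-dict layout varies across harness versions; inspect before indexing.
print(results["results"]["hellaswag"])
```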

## Citation

Please cite our paper if you use this model in your own work:

```
@misc{gao2024rebel,
      title={REBEL: Reinforcement Learning via Regressing Relative Rewards},
      author={Zhaolin Gao and Jonathan D. Chang and Wenhao Zhan and Owen Oertell and Gokul Swamy and Kianté Brantley and Thorsten Joachims and J. Andrew Bagnell and Jason D. Lee and Wen Sun},
      year={2024},
      eprint={2404.16767},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```