This is a model released for our paper: REBEL: Reinforcement Learning via Regressing Relative Rewards.

REBEL-OpenChat-3.5

This model was trained with REBEL, starting from OpenChat-3.5 and using Starling-RM-7B-alpha as the reward model and the Nectar dataset. The training code is available at https://github.com/ZhaolinGao/REBEL.
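For readers unfamiliar with REBEL, the sketch below illustrates the idea in the paper title: for a prompt with two sampled responses, the scaled difference of policy log-probability ratios is regressed onto the difference of their rewards with a squared error. This is a minimal scalar illustration, not the training code linked above; the function name, argument names, and the `eta` value are hypothetical and chosen only for this example.

```python
def rebel_pair_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                    reward_a, reward_b, eta=1.0):
    """Squared-error REBEL-style loss for one prompt with two responses, a and b.

    logp_new_* : log-probability of the response under the policy being trained
    logp_old_* : log-probability of the response under the previous policy
    reward_*   : scalar reward-model score of the response
    eta        : scaling hyperparameter (illustrative value, an assumption)
    """
    # Difference of log-probability ratios between the two responses.
    log_ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    # Regression prediction: the scaled log-ratio difference.
    predicted = log_ratio_diff / eta
    # Regression target: the relative reward of a over b.
    target = reward_a - reward_b
    return (predicted - target) ** 2


# Toy numbers (not from any experiment): the policy raised the log-probability
# of response a by 1.0 and left b unchanged, while a scores 1.0 higher than b.
loss = rebel_pair_loss(-1.0, -2.0, -3.0, -3.0, reward_a=2.0, reward_b=1.0)
print(loss)
```

In practice the log-probabilities would come from the language model and the rewards from the reward model, averaged over a batch of prompts; see the repository above for the authors' implementation.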

Links to Other Models

REBEL-Llama-3

REBEL-Llama-3-epoch_2

AlpacaEval 2.0 Evaluations

Model	AlpacaEval 2.0 LC Win Rate	AlpacaEval 2.0 Win Rate
REBEL-OpenChat-3.5	17.3	12.8
REBEL-Llama-3	30.1	32.6
REBEL-Llama-3-epoch_2	31.33	34.22

MT-Bench Evaluations

Model	MT-Bench 1st Turn	MT-Bench 2nd Turn	MT-Bench Average
REBEL-OpenChat-3.5	8.54	7.58	8.06
REBEL-Llama-3	8.63	7.69	8.16

Open LLM Leaderboard Evaluations

Model	MMLU (5-shot)	GSM8K (5-shot)	ARC (25-shot)	Winogrande (5-shot)	TruthfulQA (0-shot)	HellaSwag (10-shot)	Average
REBEL-OpenChat-3.5	63.7	68.8	64.3	80.4	48.2	85.0	68.4
REBEL-Llama-3	65.8	75.6	61.7	75.8	51.7	78.8	68.2
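The Average column appears to be the unweighted mean of the six benchmark scores, rounded to one decimal place. A quick check, with the numbers copied from the table above and a hypothetical helper name, reproduces the reported values:

```python
# Scores in table order: MMLU, GSM8K, ARC, Winogrande, TruthfulQA, HellaSwag.
scores = {
    "REBEL-OpenChat-3.5": [63.7, 68.8, 64.3, 80.4, 48.2, 85.0],
    "REBEL-Llama-3": [65.8, 75.6, 61.7, 75.8, 51.7, 78.8],
}


def mean_score(values):
    """Unweighted mean of the benchmark scores, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


for model, values in scores.items():
    print(model, mean_score(values))
```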

Citation

Please cite our paper if you use this model in your own work:

@misc{gao2024rebel,
      title={REBEL: Reinforcement Learning via Regressing Relative Rewards}, 
      author={Zhaolin Gao and Jonathan D. Chang and Wenhao Zhan and Owen Oertell and Gokul Swamy and Kianté Brantley and Thorsten Joachims and J. Andrew Bagnell and Jason D. Lee and Wen Sun},
      year={2024},
      eprint={2404.16767},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
