GitBag committed on
Commit
d23848f
1 Parent(s): e4a5b85

Update README.md

Files changed (1)
  1. README.md +53 -1
README.md CHANGED
@@ -5,4 +5,56 @@ datasets:
  language:
  - en
  ---
- This is a model released for our paper: [REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767). Please refer to our [repository](https://github.com/ZhaolinGao/REBEL) for more details.
+ This is a model released for our paper: [REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767).
+
+ Please refer to our [repository](https://github.com/ZhaolinGao/REBEL) for more details.
+
+ # REBEL-Llama-3
+
+ This model was trained with [REBEL](https://arxiv.org/abs/2404.16767) from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), using [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as the reward model and [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) as the training dataset.
+ The training code is available at https://github.com/ZhaolinGao/REBEL.
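Since this is a fine-tune of Llama-3-8B-Instruct, it should load like any other Llama-3-based checkpoint. Below is a minimal usage sketch, not from the original card: it assumes the checkpoint is hosted at `Cornell-AGI/REBEL-Llama-3` (inferred from the sibling REBEL-OpenChat-3.5 link below) and that it keeps the standard Llama-3 chat template.

```python
# Hedged sketch: load and query the model via Hugging Face transformers.
# The repo id "Cornell-AGI/REBEL-Llama-3" is an assumption, inferred from
# the REBEL-OpenChat-3.5 link in this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Cornell-AGI/REBEL-Llama-3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the Llama-3-8B-Instruct base ships in bf16
    device_map="auto",
)

# Format a single-turn prompt with the chat template inherited from the base model.
messages = [{"role": "user", "content": "Explain RLHF in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```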
+
+ ### Links to Other Models
+
+ [REBEL-OpenChat-3.5](https://huggingface.co/Cornell-AGI/REBEL-OpenChat-3.5)
+
+ ### AlpacaEval 2.0 Evaluations
+
+ | Model | AlpacaEval 2.0<br>LC Win Rate | AlpacaEval 2.0<br>Win Rate |
+ | :--------: | :--------: | :--------: |
+ | REBEL-OpenChat-3.5 | 17.3 | 12.8 |
+ | REBEL-Llama-3 | 30.1 | 32.6 |
+
+ ### MT-Bench Evaluations
+
+ | Model | MT-Bench<br>1st Turn | MT-Bench<br>2nd Turn | MT-Bench<br>Average |
+ | :--------: | :--------: | :--------: | :--------: |
+ | REBEL-OpenChat-3.5 | 8.54 | 7.58 | 8.06 |
+ | REBEL-Llama-3 | 8.63 | 7.69 | 8.16 |
+
+ ### Open LLM Leaderboard Evaluations
+
+ | Model | MMLU<br>(5-shot) | GSM8K<br>(5-shot) | ARC<br>(25-shot) | Winogrande<br>(5-shot) | TruthfulQA<br>(0-shot) | HellaSwag<br>(10-shot) | Average |
+ | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
+ | REBEL-OpenChat-3.5 | 63.7 | 68.8 | 64.3 | 80.4 | 48.2 | 85.0 | 68.4 |
+ | REBEL-Llama-3 | 65.8 | 75.6 | 61.7 | 75.8 | 51.7 | 78.8 | 68.2 |
+
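Numbers in this style are commonly produced with EleutherAI's lm-evaluation-harness, but the card does not say which harness or version was used, so the following is only an assumed reproduction sketch for a single row (10-shot HellaSwag), again using the assumed `Cornell-AGI/REBEL-Llama-3` repo id.

```python
# Hedged sketch: score one benchmark with lm-evaluation-harness (pip install lm-eval).
# The repo id, dtype, and batch size are assumptions; the card does not
# document the exact evaluation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=Cornell-AGI/REBEL-Llama-3,dtype=bfloat16",
    tasks=["hellaswag"],  # 10-shot per the table header
    num_fewshot=10,
    batch_size=8,
)
print(results["results"]["hellaswag"])
```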
+ ## Citation
+ Please cite our paper if you use this model in your own work:
+ ```
+ @misc{gao2024rebel,
+       title={REBEL: Reinforcement Learning via Regressing Relative Rewards},
+       author={Zhaolin Gao and Jonathan D. Chang and Wenhao Zhan and Owen Oertell and Gokul Swamy and Kianté Brantley and Thorsten Joachims and J. Andrew Bagnell and Jason D. Lee and Wen Sun},
+       year={2024},
+       eprint={2404.16767},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG}
+ }
+ ```