Update README.md
---
license: apache-2.0
library_name: transformers
tags:
- storm
- mistral
- openchat
- RLAIF
- reward model
---

# Storm-7B

- **Developed by**: [Jie Liu](https://jieliu.site/) \\(^{*1,2}\\), [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN) \\(^{*2}\\), [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN) \\(^{2}\\), [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN) \\(^{2}\\), and [Wanli Ouyang](https://wlouyang.github.io/) \\(^{1,2}\\).
- \\(^{1}\\)MMLab, The Chinese University of Hong Kong \\(^{2}\\)Shanghai AI Laboratory

## Introduction

We released Storm-7B, the first open-source language model comparable to the GPT-4 series on the [AlpacaEval 2.0](https://tatsu-lab.github.io/alpaca_eval/) leaderboard, ranking 3rd in length-controlled win rate.

We also conducted preliminary evaluations on other benchmarks and observed no significant performance degradation:

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | Average |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 65.59 |
| Qwen-7b | 51.37 | 78.47 | 59.84 | 47.79 | 72.69 | 62.03 |
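These per-benchmark columns appear to follow the Hugging Face Open LLM Leaderboard metrics (the Average column is the mean of the five scores). Scores like these can be reproduced with EleutherAI's lm-evaluation-harness; the sketch below is illustrative, and the `jieliu/Storm-7B` repository id, task names, and batch size are assumptions rather than the exact configuration behind the table.

```
# Hedged sketch: scoring a model with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names follow harness v0.4 conventions (assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jieliu/Storm-7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task metric dictionaries
```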

## Uses

Our model uses the same chat template as [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106). A sample code snippet for inference using our model is provided below.
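The snippet is a minimal sketch: it assumes the standard `transformers` generation API and the OpenChat-3.5-0106 single-turn prompt format (`GPT4 Correct User: ...<|end_of_turn|>GPT4 Correct Assistant:`); the `jieliu/Storm-7B` repository id and the generation parameters are illustrative and may need adjusting.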

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (bfloat16 keeps a 7B model within a single modern GPU).
model_path = "jieliu/Storm-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_response(prompt):
    # Tokenize the already-templated prompt and generate a completion.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, dropping the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Build the OpenChat-3.5-0106 prompt for a single user turn.
question = "How does a telescope work?"
input_prompt = f"GPT4 Correct User: {question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(input_prompt)
print("Response:", response_text)
```
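If the repository's tokenizer config ships a chat template (an assumption worth verifying), the same prompt can also be built programmatically instead of by hand:

```
# Continues from the snippet above; requires the tokenizer to define a chat template.
messages = [{"role": "user", "content": "How does a telescope work?"}]
input_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```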

## Limitations

Storm-7B is a quick demonstration that a language model fine-tuned with AI feedback can surpass or match state-of-the-art models, as assessed by that same AI feedback. However, this improvement on the automatic leaderboard does not necessarily indicate better alignment with human intentions. Our model therefore represents a critical, preliminary reevaluation of the RLAIF paradigm, questioning how much learning from and being evaluated by AI feedback aligns with actual human preferences.

## Citation

```
@misc{liu2024storm,
    title = {Storm-7B},
    author = {Liu, Jie and Zhou, Zhanhui and Yang, Chao and Zhong, Han-Sen and Ouyang, Wanli},
    month = {April},
    year = {2024}
}
```