Update README.md
---
license: apache-2.0
library_name: transformers
tags:
- storm
- mistral
- openchat
- RLAIF
- reward model
---

# Storm-7B

- **Developed by**: [Jie Liu](https://jieliu.site/) \\(^{*1,2}\\), [Zhanhui Zhou](https://scholar.google.com/citations?user=SbACfYQAAAAJ&hl=zh-CN) \\(^{*2}\\), [Chao Yang](https://scholar.google.com/citations?user=5KRbHPMAAAAJ&hl=zh-CN) \\(^{2}\\), [Han-Sen Zhong](https://scholar.google.com.hk/citations?user=X_ZfX8sAAAAJ&hl=zh-CN) \\(^{2}\\), and [Wanli Ouyang](https://wlouyang.github.io/) \\(^{1,2}\\).
- \\(^{1}\\)MMLab, The Chinese University of Hong Kong \\(^{2}\\)Shanghai AI Laboratory

## Introduction

We released Storm-7B, the first open-source language model comparable to the GPT-4 series on the [AlpacaEval 2.0](https://tatsu-lab.github.io/alpaca_eval/) leaderboard, ranking 3rd in length-controlled win rate.

We also conducted preliminary evaluations on other benchmarks and observed no significant performance degradation:

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | Average |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 65.59 |
| Qwen-7b | 51.37 | 78.47 | 59.84 | 47.79 | 72.69 | 62.03 |
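These per-benchmark columns appear to follow the Hugging Face Open LLM Leaderboard metrics (the Average column is the mean of the five scores). Scores like these can be reproduced with EleutherAI's lm-evaluation-harness; the sketch below is illustrative, and the `jieliu/Storm-7B` repository id, task names, and batch size are assumptions rather than the exact configuration behind the table.

```
# Hedged sketch: scoring a model with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names follow harness v0.4 conventions (assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jieliu/Storm-7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task metric dictionaries
```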

## Uses

Our model uses the same chat template as [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106). A sample code snippet for inference using our model is provided below.
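The snippet is a minimal sketch: it assumes the standard `transformers` generation API and the OpenChat-3.5-0106 single-turn prompt format (`GPT4 Correct User: ...<|end_of_turn|>GPT4 Correct Assistant:`); the `jieliu/Storm-7B` repository id and the generation parameters are illustrative and may need adjusting.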

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (bfloat16 keeps a 7B model within a single modern GPU).
model_path = "jieliu/Storm-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_response(prompt):
    # Tokenize the already-templated prompt and generate a completion.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, dropping the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Build the OpenChat-3.5-0106 prompt for a single user turn.
question = "How does a telescope work?"
input_prompt = f"GPT4 Correct User: {question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(input_prompt)
print("Response:", response_text)
```
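If the repository's tokenizer config ships a chat template (an assumption worth verifying), the same prompt can also be built programmatically instead of by hand:

```
# Continues from the snippet above; requires the tokenizer to define a chat template.
messages = [{"role": "user", "content": "How does a telescope work?"}]
input_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```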

## Limitations

Storm-7B is a quick demonstration that a language model fine-tuned with AI feedback can surpass or match state-of-the-art models, as assessed by that same AI feedback. However, this improvement on the automatic leaderboard does not necessarily indicate better alignment with human intentions. Our model therefore represents a critical, preliminary reevaluation of the RLAIF paradigm, questioning how much learning from and being evaluated by AI feedback aligns with actual human preferences.

## Citation

```
@misc{liu2024storm,
    title = {Storm-7B},
    author = {Liu, Jie and Zhou, Zhanhui and Yang, Chao and Zhong, Han-Sen and Ouyang, Wanli},
    month = {April},
    year = {2024}
}
```