nbl97 committed on
Commit
314cbab
1 Parent(s): aca3a4d

Update README.md

Files changed (1)
  1. README.md +24 -13
README.md CHANGED
@@ -10,19 +10,23 @@ Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment
10
  <a href="https://huggingface.co/Xwin-LM">
11
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
12
  </a>
 
 
 
13
  </p>
14
 
15
 
16
 
17
  **Step up your LLM alignment with Xwin-LM!**
18
 
19
- Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models, rejection sampling, reinforcement learning, etc. Our first release, built upon the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
20
 
21
  ## News
22
 
23
- - :boom: [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which achieved a win-rate of **95.57%** against Davinci-003 on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** on AlpacaEval. **It was the FIRST model to surpass GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Its win-rate vs. GPT-4 is **60.61%**.
24
- - :boom: [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models.
25
- - :boom: [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models.
 
26
 
27
 
28
  ## Model Card
@@ -60,19 +64,22 @@ The table below displays the performance of Xwin-LM on [AlpacaEval](https://tats
60
 
61
  ### Xwin-LM performance on NLP foundation tasks.
62
 
63
- The following table provides a comparison of Xwin-LMs with other LLMs on NLP foundation tasks.
64
 
65
  | Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average |
66
  |------------------|-------------|-------------|-------------------|-------------------|------------|
67
- | Text-davinci-003 | <u>56.9<u/> | **85.2** | **59.3** | <u>82.2<u/> | **70.9** |
68
  |Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
69
- |Guanaco 30B | 57.6 | 63.7 | 50.7 | **85.1** | 64.3 |
70
  | WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
71
  | WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
72
- | WizardLM-30B 1.0 | **58.8** | <u>62.5<u/> | <u>52.4<u/> | 83.3 | <u>64.2<u/>|
73
- | **Xwin-LM-7B-V0.1** | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
74
- | **Xwin-LM-13B-V0.1** | - | - | - | - | - |
75
- | **Xwin-LM-70B-V0.1** | - | - | - | - | - |
 
 
 
76
 
77
 
78
  ## Inference
@@ -85,7 +92,7 @@ A chat between a curious user and an artificial intelligence assistant. The assi
85
 
86
  ### HuggingFace Example
87
 
88
- ```
89
  from transformers import AutoTokenizer, AutoModelForCausalLM
90
 
91
  model = AutoModelForCausalLM.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
@@ -106,7 +113,7 @@ print(output)
106
 
107
  ### vllm Example
108
  Because Xwin-LM is based on Llama2, it also offers support for rapid inference using [vllm](https://github.com/vllm-project/vllm). Please refer to [vllm](https://github.com/vllm-project/vllm) for detailed installation instructions.
109
- ```
110
  from vllm import LLM, SamplingParams
111
  (
112
  prompt := "A chat between a curious user and an artificial intelligence assistant. "
@@ -124,6 +131,10 @@ for output in outputs:
124
  print(generated_text)
125
  ```
126
 
 
 
 
 
127
 
128
  ## Citation
129
  Please consider citing our work if you use the data or code in this repo.
 
10
  <a href="https://huggingface.co/Xwin-LM">
11
  <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue">
12
  </a>
13
+ <a href="https://github.com/Xwin-LM/Xwin-LM">
14
+ <img src="https://img.shields.io/badge/GitHub-yellow.svg?style=social&logo=github">
15
+ </a>
16
  </p>
17
 
18
 
19
 
20
  **Step up your LLM alignment with Xwin-LM!**
21
 
22
+ Xwin-LM aims to develop and open-source alignment technologies for large language models, including supervised fine-tuning (SFT), reward models (RM), rejection sampling, reinforcement learning from human feedback (RLHF), etc. Our first release, built upon the Llama2 base models, ranked **TOP-1** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Notably, it's **the first to surpass GPT-4** on this benchmark. The project will be continuously updated.
23
 
24
  ## News
25
 
26
+ - 💥 [Sep, 2023] We released [Xwin-LM-70B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1), which achieved a win-rate of **95.57%** against Davinci-003 on the [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/) benchmark, ranking **TOP-1** on AlpacaEval. **It was the FIRST model to surpass GPT-4** on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/). Its win-rate vs. GPT-4 is **60.61%**.
27
+ - 🔍 [Sep, 2023] RLHF plays a crucial role in the strong performance of the Xwin-LM-V0.1 release!
28
+ - 💥 [Sep, 2023] We released [Xwin-LM-13B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-13B-V0.1), which has achieved **91.76%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 13B models.
29
+ - 💥 [Sep, 2023] We released [Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-7B-V0.1), which has achieved **87.82%** win-rate on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval/), ranking as **top-1** among all 7B models.
30
 
31
 
32
  ## Model Card
 
64
 
65
  ### Xwin-LM performance on NLP foundation tasks.
66
 
67
+ The following table compares Xwin-LM with other LLMs on the NLP foundation tasks of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
68
 
69
  | Model | MMLU 5-shot | ARC 25-shot | TruthfulQA 0-shot | HellaSwag 10-shot | Average |
70
  |------------------|-------------|-------------|-------------------|-------------------|------------|
71
+ | Text-davinci-003 | 56.9 | **85.2** | 59.3 | 82.2 | 70.9 |
72
  |Vicuna-13b 1.1 | 51.3 | 53.0 | 51.8 | 80.1 | 59.1 |
73
+ |Guanaco 30B | 57.6 | 63.7 | 50.7 | 85.1 | 64.3 |
74
  | WizardLM-7B 1.0 | 42.7 | 51.6 | 44.7 | 77.7 | 54.2 |
75
  | WizardLM-13B 1.0 | 52.3 | 57.2 | 50.5 | 81.0 | 60.2 |
76
+ | WizardLM-30B 1.0 | 58.8 | 62.5 | 52.4 | 83.3 | 64.2|
77
+ | Llama-2-7B-Chat | 48.3 | 52.9 | 45.6 | 78.6 | 56.4 |
78
+ | Llama-2-13B-Chat | 54.6 | 59.0 | 44.1 | 81.9 | 59.9 |
79
+ | Llama-2-70B-Chat | 63.9 | 64.6 | 52.8 | 85.9 | 66.8 |
80
+ | **Xwin-LM-7B-V0.1** | 49.7 | 56.2 | 48.1 | 79.5 | 58.4 |
81
+ | **Xwin-LM-13B-V0.1** | 56.6 | 62.4 | 45.5 | 83.0 | 61.9 |
82
+ | **Xwin-LM-70B-V0.1** | **69.6** | 70.5 | **60.1** | **87.1** | **71.8** |
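
The Average column in the table above appears to be the unweighted mean of the four benchmark scores. That is an inference from the numbers, not something the README states. A minimal sketch to check it, with the scores copied from the new-side table; the names `ROWS`, `recompute_averages`, and `max_deviation` are hypothetical and only used for illustration:

```python
# Scores copied from the table above, in the order
# (MMLU 5-shot, ARC 25-shot, TruthfulQA 0-shot, HellaSwag 10-shot, reported Average).
ROWS = {
    "Text-davinci-003": (56.9, 85.2, 59.3, 82.2, 70.9),
    "Vicuna-13b 1.1": (51.3, 53.0, 51.8, 80.1, 59.1),
    "Guanaco 30B": (57.6, 63.7, 50.7, 85.1, 64.3),
    "WizardLM-7B 1.0": (42.7, 51.6, 44.7, 77.7, 54.2),
    "WizardLM-13B 1.0": (52.3, 57.2, 50.5, 81.0, 60.2),
    "WizardLM-30B 1.0": (58.8, 62.5, 52.4, 83.3, 64.2),
    "Llama-2-7B-Chat": (48.3, 52.9, 45.6, 78.6, 56.4),
    "Llama-2-13B-Chat": (54.6, 59.0, 44.1, 81.9, 59.9),
    "Llama-2-70B-Chat": (63.9, 64.6, 52.8, 85.9, 66.8),
    "Xwin-LM-7B-V0.1": (49.7, 56.2, 48.1, 79.5, 58.4),
    "Xwin-LM-13B-V0.1": (56.6, 62.4, 45.5, 83.0, 61.9),
    "Xwin-LM-70B-V0.1": (69.6, 70.5, 60.1, 87.1, 71.8),
}


def recompute_averages(rows):
    """Return {model: unweighted mean of the four benchmark scores}."""
    return {model: sum(vals[:4]) / 4 for model, vals in rows.items()}


def max_deviation(rows):
    """Largest absolute gap between a recomputed mean and the reported Average."""
    means = recompute_averages(rows)
    return max(abs(means[model] - rows[model][4]) for model in rows)


if __name__ == "__main__":
    for model, mean in recompute_averages(ROWS).items():
        print(f"{model}: recomputed {mean:.3f}, reported {ROWS[model][4]}")
    print(f"largest gap: {max_deviation(ROWS):.3f}")
```

Under this assumption every reported Average agrees with the recomputed mean to within one-decimal rounding.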
83
 
84
 
85
  ## Inference
 
92
 
93
  ### HuggingFace Example
94
 
95
+ ```python
96
  from transformers import AutoTokenizer, AutoModelForCausalLM
97
 
98
  model = AutoModelForCausalLM.from_pretrained("Xwin-LM/Xwin-LM-7B-V0.1")
 
113
 
114
  ### vllm Example
115
  Because Xwin-LM is based on Llama2, it also offers support for rapid inference using [vllm](https://github.com/vllm-project/vllm). Please refer to [vllm](https://github.com/vllm-project/vllm) for detailed installation instructions.
116
+ ```python
117
  from vllm import LLM, SamplingParams
118
  (
119
  prompt := "A chat between a curious user and an artificial intelligence assistant. "
 
131
  print(generated_text)
132
  ```
133
 
134
+ ## TODO
135
+
136
+ - [ ] Release the source code
137
+ - [ ] Release more capabilities, such as math, reasoning, etc.
138
 
139
  ## Citation
140
  Please consider citing our work if you use the data or code in this repo.