natolambert committed • Commit 22c150c • Parent(s): 1c19b68
Update README.md

README.md CHANGED
@@ -40,24 +40,24 @@ At the time of release, the Tulu-v2-dpo-70b model is approximately equal to GPT4
All of the smaller DPO'd models show strong performance for their size in this category, and with lower verbosity (shorter average completion length).

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
|-------------|------|-----------|------------------|-------------------------|
| **Tulu-v2-7b** 🪁 | **7B** | **dDPO** | **6.30** | **73.9** |
| **Tulu-v2-dpo-7b** 🪁 | **7B** | **dDPO** | **6.27** | **85.1** |
| StableLM-Tuned-α | 7B | dSFT | 2.75 | - |
| MPT-Chat | 7B | dSFT | 5.42 | - |
| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
| Zephyr-7b-α | 7B | dDPO | 6.88 | - |
| Zephyr-7b-β 🪁 | 7B | dDPO | 7.34 | 90.60 |
| **Tulu-v2-13b** 🪁 | **13B** | **dDPO** | **6.70** | **78.9** |
| **Tulu-v2-dpo-13b** 🪁 | **13B** | **dDPO** | **7.00** | **89.5** |
| Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
| Guanaco | 65B | SFT | 6.41 | 71.80 |
| Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
| Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
| WizardLM v1.0 | 70B | dSFT | 7.71 | - |
| Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
| **Tulu-v2-70b** 🪁 | **70B** | **dDPO** | **7.49** | **86.6** |
| **Tulu-v2-dpo-70b** 🪁 | **70B** | **dDPO** | **7.89** | **95.1** |
| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
| Claude 2 | - | RLHF | 8.06 | 91.36 |
| GPT-4 | - | RLHF | 8.99 | 95.28 |
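To try one of the DPO'd models from the table, the snippet below is a minimal sketch of building a prompt in the chat format the Tulu 2 model cards describe (`<|user|>`/`<|assistant|>` tags), with the actual `transformers` generation call left commented out since it needs the model weights; the generation settings shown are illustrative defaults, not the card's official recipe.

```python
# Sketch: render a conversation into the Tulu 2 chat format, then
# (optionally) generate with Hugging Face transformers.

def format_tulu_prompt(messages):
    """Render a list of {"role", "content"} turns into the Tulu 2 chat format."""
    prompt = ""
    for turn in messages:
        prompt += f"<|{turn['role']}|>\n{turn['content']}\n"
    # End with the assistant tag so the model continues as the assistant.
    prompt += "<|assistant|>\n"
    return prompt

prompt = format_tulu_prompt([{"role": "user", "content": "What is DPO?"}])
print(prompt)
# <|user|>
# What is DPO?
# <|assistant|>

# To actually generate (requires the weights and substantial GPU memory):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("allenai/tulu-2-dpo-7b")
# model = AutoModelForCausalLM.from_pretrained("allenai/tulu-2-dpo-7b", device_map="auto")
# out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Omitting the trailing newline after `<|assistant|>` or adding a BOS token twice are common sources of degraded outputs with these templates, so keep the rendered string exactly in this shape.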