natolambert committed • Commit 22c150c • Parent(s): 1c19b68
Update README.md

README.md CHANGED
@@ -40,24 +40,24 @@ At the time of release, the Tulu-v2-dpo-70b model is approximately equal to GPT4
All of the smaller DPO'd models show strong performance for their size in this category, and with lower verbosity (shorter average completion length).

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
|-------------|------|-----------|------------------|-------------------------|
| **Tulu-v2-7b** 🪁 | **7B** | **dDPO** | **6.30** | **73.9** |
| **Tulu-v2-dpo-7b** 🪁 | **7B** | **dDPO** | **6.27** | **85.1** |
| StableLM-Tuned-α | 7B | dSFT | 2.75 | - |
| MPT-Chat | 7B | dSFT | 5.42 | - |
| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |
| Mistral-Instruct v0.1 | 7B | - | 6.84 | - |
| Zephyr-7b-α | 7B | dDPO | 6.88 | - |
| Zephyr-7b-β 🪁 | 7B | dDPO | 7.34 | 90.60 |
| **Tulu-v2-13b** 🪁 | **13B** | **dDPO** | **6.70** | **78.9** |
| **Tulu-v2-dpo-13b** 🪁 | **13B** | **dDPO** | **7.00** | **89.5** |
| Falcon-Instruct | 40B | dSFT | 5.17 | 45.71 |
| Guanaco | 65B | SFT | 6.41 | 71.80 |
| Llama2-Chat | 70B | RLHF | 6.86 | 92.66 |
| Vicuna v1.3 | 33B | dSFT | 7.12 | 88.99 |
| WizardLM v1.0 | 70B | dSFT | 7.71 | - |
| Xwin-LM v0.1 | 70B | dPPO | - | 95.57 |
| **Tulu-v2-70b** 🪁 | **70B** | **dDPO** | **7.49** | **86.6** |
| **Tulu-v2-dpo-70b** 🪁 | **70B** | **dDPO** | **7.89** | **95.1** |
| GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
| Claude 2 | - | RLHF | 8.06 | 91.36 |
| GPT-4 | - | RLHF | 8.99 | 95.28 |
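To try one of the DPO'd models from the table, the snippet below is a minimal sketch of building a prompt in the chat format the Tulu 2 model cards describe (`<|user|>`/`<|assistant|>` tags), with the actual `transformers` generation call left commented out since it needs the model weights; the generation settings shown are illustrative defaults, not the card's official recipe.

```python
# Sketch: render a conversation into the Tulu 2 chat format, then
# (optionally) generate with Hugging Face transformers.

def format_tulu_prompt(messages):
    """Render a list of {"role", "content"} turns into the Tulu 2 chat format."""
    prompt = ""
    for turn in messages:
        prompt += f"<|{turn['role']}|>\n{turn['content']}\n"
    # End with the assistant tag so the model continues as the assistant.
    prompt += "<|assistant|>\n"
    return prompt

prompt = format_tulu_prompt([{"role": "user", "content": "What is DPO?"}])
print(prompt)
# <|user|>
# What is DPO?
# <|assistant|>

# To actually generate (requires the weights and substantial GPU memory):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("allenai/tulu-2-dpo-7b")
# model = AutoModelForCausalLM.from_pretrained("allenai/tulu-2-dpo-7b", device_map="auto")
# out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=256)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Omitting the trailing newline after `<|assistant|>` or adding a BOS token twice are common sources of degraded outputs with these templates, so keep the rendered string exactly in this shape.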