Tags: Text Generation · Transformers · PyTorch · English · llama · conversational · text-generation-inference · Inference Endpoints
hamishivi committed
Commit a71745b (parent: b754806)

Update README.md

Files changed (1): README.md (+12 -9)
README.md CHANGED
@@ -40,15 +40,18 @@ For more details on the training mixture, read the paper: [Camels in a Changing
 
 | Model | MMLU 5-shot | GSM8k 8-shot cot | BBH 3-shot cot | TydiQA 1-shot Gold Passage | Codex HumanEval Pass@10 | AlpacaEval 1 | AlpacaEval 2 LC | TruthfulQA %Info+True | IFEval loose acc | XSTest safe but ref. | XSTest unsafe but follow | Average |
 |-|-|-|-|-|-|-|-|-|-|-|-|-|
-| Llama 3 8b base | 0.649 | 0.565 | 0.653 | 66.80 | 0.664 | - | - | 0.299 | 0.146 | 0.200 | 0.390 | 54.36 |
-| Llama 3 8b instruct | 0.626 | 0.770 | 0.606 | 59.04 | 0.799 | 94.65 | 23.12 | 0.682 | 0.741 | 0.028 | 0.115 | 70.36 |
-| Llama 3 Tulu 2 8b | 0.606 | 0.610 | 0.592 | 56.24 | 0.685 | 79.40 | 10.16 | 0.503 | 0.468 | 0.092 | 0.165 | 59.39 |
-| **Llama 3 Tulu 2+DPO 8b (this model)** | 0.609 | 0.650 | 0.584 | 21.18 | 0.688 | 93.02 | 13.94 | 0.698 | 0.518 | 0.092 | 0.165 | 59.61 |
-| Llama 3 70b base | 0.790 | 0.840 | 0.801 | 73.35 | 0.745 | - | - | 0.469 | 0.163 | 0.256 | 0.330 | 65.60 |
-| Llama 3 70b instruct | 0.786 | 0.930 | 0.801 | 59.21 | 0.908 | 96.71 | 39.99 | 0.701 | 0.828 | 0.060 | 0.140 | 79.22 |
-| Llama 3 Tulu 2 70b | 0.752 | 0.845 | 0.779 | 69.798 | 0.861 | 86.007 | 17.51 | 0.646 | 0.591 | 0.108 | 0.130 | 73.01 |
-| Llama 3 Tulu 2+DPO 70b | 0.754 | 0.860 | 0.785 | 23.443 | 0.878 | 96.65 | 27.34 | 0.780 | 0.643 | 0.080 | 0.140 | 71.60 |
-
+| [Llama 3 8b base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) | 0.649 | 0.565 | 0.653 | 66.80 | 0.664 | - | - | 0.299 | 0.146 | 0.200 | 0.390 | 54.36 |
+| [Llama 3 8b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | 0.626 | 0.770 | 0.606 | 59.04 | 0.799 | 94.65 | 23.12 | 0.682 | 0.741 | 0.028 | 0.115 | 70.36 |
+| [Llama 3 Tulu 2 8b](https://huggingface.co/allenai/llama-3-tulu-2-8b) | 0.606 | 0.610 | 0.592 | 56.24 | 0.685 | 79.40 | 10.16 | 0.503 | 0.468 | 0.092 | 0.165 | 59.39 |
+| **[Llama 3 Tulu 2+DPO 8b](https://huggingface.co/allenai/llama-3-tulu-2-dpo-8b) (this model)** | 0.609 | 0.650 | 0.584 | 21.18 | 0.688 | 93.02 | 13.94 | 0.698 | 0.518 | 0.092 | 0.165 | 59.61 |
+| [Llama 3 70b base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | 0.790 | 0.840 | 0.801 | 73.35 | 0.745 | - | - | 0.469 | 0.163 | 0.256 | 0.330 | 65.60 |
+| [Llama 3 70b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 0.786 | 0.930 | 0.801 | 59.21 | 0.908 | 96.71 | 39.99 | 0.701 | 0.828 | 0.060 | 0.140 | 79.22 |
+| [Llama 3 Tulu 2 70b](https://huggingface.co/allenai/llama-3-tulu-2-70b) | 0.752 | 0.845 | 0.779 | 69.798 | 0.861 | 86.007 | 17.51 | 0.646 | 0.591 | 0.108 | 0.130 | 73.01 |
+| [Llama 3 Tulu 2+DPO 70b](https://huggingface.co/allenai/llama-3-tulu-2-dpo-70b) | 0.754 | 0.860 | 0.785 | 23.443 | 0.878 | 96.65 | 27.34 | 0.780 | 0.643 | 0.080 | 0.140 | 71.60 |
+
+We also release reward models based on Llama 3 8b and 70b respectively:
+- [Llama 3 Tulu 2 8b UltraFeedback RM](https://huggingface.co/allenai/llama-3-tulu-2-8b-uf-mean-rm)
+- [Llama 3 Tulu 2 70b UltraFeedback RM](https://huggingface.co/allenai/llama-3-tulu-2-70b-uf-mean-rm)
 
 ## Input Format
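For quick reference while reviewing this change (the section itself is truncated in the diff): the Tulu model cards describe a plain-text chat template with `<|user|>` and `<|assistant|>` markers. The sketch below shows one way to generate with this checkpoint under that assumption, using the standard `transformers` API; the prompt text and decoding settings are illustrative, not taken from the commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint linked in the table above.
model_id = "allenai/llama-3-tulu-2-dpo-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed Tulu chat format; the Tulu cards note that the newline after
# <|assistant|> can noticeably affect generation quality.
prompt = "<|user|>\nExplain DPO in one sentence.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```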
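The commit also links two UltraFeedback reward models but does not show how to call them. The following is a hypothetical scoring sketch only: it assumes the checkpoints load as scalar-output sequence-classification models, which should be verified against the RMs' own model cards.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_id = "allenai/llama-3-tulu-2-8b-uf-mean-rm"
tokenizer = AutoTokenizer.from_pretrained(rm_id)
# Assumption: the RM exposes a single-logit classification head whose
# output is the scalar reward (not confirmed by this commit).
reward_model = AutoModelForSequenceClassification.from_pretrained(
    rm_id, num_labels=1, torch_dtype=torch.bfloat16, device_map="auto"
)

# Score a full prompt + response pair, formatted with the Tulu chat template.
text = (
    "<|user|>\nWhat is a reward model for?\n"
    "<|assistant|>\nRanking candidate responses during preference tuning."
)
inputs = tokenizer(text, return_tensors="pt").to(reward_model.device)
with torch.no_grad():
    reward = reward_model(**inputs).logits[0, 0].item()  # higher = preferred
print(f"reward: {reward:.3f}")
```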