Update README.md
README.md
CHANGED
@@ -41,20 +41,19 @@ Note that Llama 3.1 is released under the Meta Llama 3 community license, includ
 
 ## Performance
 
-| Model | MMLU 5-shot | GSM8k 8-shot cot | BBH 3-shot cot |
-
-| [Llama 3 8b base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) |
-| [Llama 3 8b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
-| [Llama 3
-| **[Llama 3
-| [Llama 3 70b base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) |
-| [Llama 3 70b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) |
-| [Llama 3
-| [Llama 3
-
-
-
-- [Llama 3 Tulu 2 70b UltraFeedback RM](https://huggingface.co/allenai/llama-3-tulu-2-70b-uf-mean-rm)
+| Model | MMLU 5-shot | GSM8k 8-shot cot | BBH 3-shot cot | Codex HumanEval Pass@10 | AlpacaEval 1 | AlpacaEval 2 LC | TruthfulQA %Info+True | IFEval loose acc | XSTest safe but ref. | XSTest unsafe but follow | Average |
+|-|-|-|-|-|-|-|-|-|-|-|-|
+| [Llama 3.1 8b base](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 65.5 | 57.0 | 65.6 | 61.6 | - | - | 32.7 | 11.1 | 17.2 | 44.0 | - |
+| [Llama 3.1 8b instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 65.6 | 84.5 | 68.5 | 84.5 | 94.8 | 26.0 | 31.1 | 75.6 | 8.8 | 5.5 | 71.8 |
+| [Tulu 2 Llama 3.1 8b](https://huggingface.co/allenai/llama-3.1-tulu-2-8b) | 61.4 | 68.0 | 59.2 | 67.9 | 80.6 | 9.0 | 56.2 | 46.4 | 11.2 | 13.0 | 63.9 |
+| **[Tulu 2 Llama 3.1 8b DPO](https://huggingface.co/allenai/llama-3.1-tulu-2-dpo-8b) (this model)** | 62.0 | 66.5 | 60.6 | 69.1 | 93.5 | 14.7 | 70.3 | 52.3 | 8.4 | 15.5 | 67.0 |
+| [Llama 3.1 70b base](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 78.8 | 85.5 | 82.9 | 94.5 | - | - | - | 10.9 | 12.4 | 41.0 | - |
+| [Llama 3.1 70b instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 81.4 | 96.0 | 83.1 | 94.5 | 96.0 | 35.8 | 69.0 | 87.1 | 5.6 | 11.5 | 86.1 |
+| [Tulu 2 Llama 3.1 70b](https://huggingface.co/allenai/llama-3.1-tulu-2-70b) | 76.0 | 83.5 | 78.5 | 84.1 | 85.9 | 13.2 | 59.7 | 59.1 | 13.6 | 15.5 | 75.2 |
+| [Tulu 2 Llama 3.1 70b DPO](https://huggingface.co/allenai/llama-3.1-tulu-2-dpo-70b) | 76.0 | 88.5 | 79.9 | 89.0 | 96.8 | 24.8 | 78.3 | 63.6 | 9.2 | 14.0 | 80.5 |
+
+You can find all models Ai2 trained as part of this family [here](https://huggingface.co/collections/hamishivi/tulu-2-llama-3-update-6674a1cbd1bb4d33b5dec246), alongside our prior Llama 3.0 versions.
+
 
 ## Input Format
 
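For context on how the checkpoints listed in the new table are typically run, here is a minimal, hypothetical loading sketch that is not part of the commit above. It assumes the standard Hugging Face `transformers` API and the Tulu 2 style `<|user|>`/`<|assistant|>` prompt (assumed here; see the Input Format section of the card); the prompt text and generation settings are illustrative only.

```python
# Minimal, hypothetical usage sketch; assumes the standard transformers API
# and a Tulu 2 style prompt. device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/llama-3.1-tulu-2-dpo-8b"  # the 8b DPO checkpoint from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Assumed Tulu-style format; the trailing newline after <|assistant|> is deliberate.
prompt = "<|user|>\nSummarize what DPO training changes about a model.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```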