Update README.md
README.md
CHANGED
@@ -41,20 +41,19 @@ Note that Llama 3.1 is released under the Meta Llama 3 community license, includ
 
 ## Performance
 
-| Model | MMLU 5-shot | GSM8k 8-shot cot | BBH 3-shot cot |
-
-| [Llama 3 8b base](https://huggingface.co/meta-llama/Meta-Llama-3-8B) |
-| [Llama 3 8b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
-| [Llama 3
-| **[Llama 3
-| [Llama 3 70b base](https://huggingface.co/meta-llama/Meta-Llama-3-70B) |
-| [Llama 3 70b instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) |
-| [Llama 3
-| [Llama 3
-
-
-
-- [Llama 3 Tulu 2 70b UltraFeedback RM](https://huggingface.co/allenai/llama-3-tulu-2-70b-uf-mean-rm)
+| Model | MMLU 5-shot | GSM8k 8-shot cot | BBH 3-shot cot | Codex HumanEval Pass@10 | AlpacaEval 1 | AlpacaEval 2 LC | TruthfulQA %Info+True | IFEval loose acc | XSTest safe but ref. | XSTest unsafe but follow | Average |
+|-|-|-|-|-|-|-|-|-|-|-|-|
+| [Llama 3.1 8b base](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 65.5 | 57.0 | 65.6 | 61.6 | - | - | 32.7 | 11.1 | 17.2 | 44.0 | - |
+| [Llama 3.1 8b instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 65.6 | 84.5 | 68.5 | 84.5 | 94.8 | 26.0 | 31.1 | 75.6 | 8.8 | 5.5 | 71.8 |
+| [Tulu 2 Llama 3.1 8b](https://huggingface.co/allenai/llama-3.1-tulu-2-8b) | 61.4 | 68.0 | 59.2 | 67.9 | 80.6 | 9.0 | 56.2 | 46.4 | 11.2 | 13.0 | 63.9 |
+| **[Tulu 2 Llama 3.1 8b DPO](https://huggingface.co/allenai/llama-3.1-tulu-2-dpo-8b) (this model)** | 62.0 | 66.5 | 60.6 | 69.1 | 93.5 | 14.7 | 70.3 | 52.3 | 8.4 | 15.5 | 67.0 |
+| [Llama 3.1 70b base](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 78.8 | 85.5 | 82.9 | 94.5 | - | - | - | 10.9 | 12.4 | 41.0 | - |
+| [Llama 3.1 70b instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 81.4 | 96.0 | 83.1 | 94.5 | 96.0 | 35.8 | 69.0 | 87.1 | 5.6 | 11.5 | 86.1 |
+| [Tulu 2 Llama 3.1 70b](https://huggingface.co/allenai/llama-3.1-tulu-2-70b) | 76.0 | 83.5 | 78.5 | 84.1 | 85.9 | 13.2 | 59.7 | 59.1 | 13.6 | 15.5 | 75.2 |
+| [Tulu 2 Llama 3.1 70b DPO](https://huggingface.co/allenai/llama-3.1-tulu-2-dpo-70b) | 76.0 | 88.5 | 79.9 | 89.0 | 96.8 | 24.8 | 78.3 | 63.6 | 9.2 | 14.0 | 80.5 |
+
+You can find all models Ai2 trained as part of this family [here](https://huggingface.co/collections/hamishivi/tulu-2-llama-3-update-6674a1cbd1bb4d33b5dec246), alongside our prior Llama 3.0 versions.
+
 
 ## Input Format
 
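For context on how the checkpoints listed in the new table are typically run, here is a minimal, hypothetical loading sketch that is not part of the commit above. It assumes the standard Hugging Face `transformers` API and the Tulu 2 style `<|user|>`/`<|assistant|>` prompt (assumed here; see the Input Format section of the card); the prompt text and generation settings are illustrative only.

```python
# Minimal, hypothetical usage sketch; assumes the standard transformers API
# and a Tulu 2 style prompt. device_map="auto" requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/llama-3.1-tulu-2-dpo-8b"  # the 8b DPO checkpoint from the table above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Assumed Tulu-style format; the trailing newline after <|assistant|> is deliberate.
prompt = "<|user|>\nSummarize what DPO training changes about a model.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```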