Commit b06385e
Parent: 51888a7

Adding Evaluation Results (#7)

- Adding Evaluation Results (611794fd1a76d62292bb5953dd347fc78c413da0)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +125 -8
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
- base_model: meta-llama/Meta-Llama-3-8B-Instruct
  library_name: transformers
  tags:
  - axolotl
@@ -9,19 +11,120 @@ tags:
  - pytorch
  - llama
  - llama-3
- language:
- - en
  pipeline_tag: text-generation
- license: other
  license_name: llama3
  license_link: LICENSE
  inference: false
  model_creator: MaziyarPanahi
- model_name: Llama-3-8B-Instruct-DPO-v0.4
  quantized_by: MaziyarPanahi
- datasets:
- - argilla/ultrafeedback-binarized-preferences
- - Intel/orca_dpo_pairs
  ---

  <img src="./llama-3-merges.webp" alt="Goku 8x22B v0.4 Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -114,3 +217,17 @@ outputs = pipeline(
  )
  print(outputs[0]["generated_text"][len(prompt):])
  ```
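The context lines of this last hunk belong to the README's usage example, whose final statement prints `outputs[0]["generated_text"][len(prompt):]`. In the transformers text-generation pipeline, `return_full_text` defaults to `True`, so the returned text starts with the prompt and slicing by `len(prompt)` keeps only the newly generated continuation. The stdlib-only sketch below illustrates that idiom on a hand-written `outputs` list; `strip_prompt` and the prompt/completion strings are hypothetical placeholders, and the `startswith` guard is an addition, not something the README does.

```python
# Illustrates the README's final line: print(outputs[0]["generated_text"][len(prompt):])
# Assumption: as with the transformers text-generation pipeline's default
# return_full_text=True, each output's "generated_text" starts with the prompt.
# The prompt and completion strings are made-up placeholders.


def strip_prompt(outputs: list[dict], prompt: str) -> str:
    """Return only the continuation generated after the prompt."""
    text = outputs[0]["generated_text"]
    # Added guard (not in the README): if the text does not start with the
    # prompt, e.g. with return_full_text=False, return it unchanged instead of
    # cutting off len(prompt) characters of real output.
    if not text.startswith(prompt):
        return text
    return text[len(prompt):]


prompt = "Placeholder prompt: "
outputs = [{"generated_text": prompt + "placeholder completion"}]
print(strip_prompt(outputs, prompt))
```

Slicing alone is what the README relies on; the guard only matters if the pipeline is configured to return the completion without the prompt.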
 
  ---
+ language:
+ - en
+ license: other
  library_name: transformers
  tags:
  - axolotl

  - pytorch
  - llama
  - llama-3
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ datasets:
+ - argilla/ultrafeedback-binarized-preferences
+ - Intel/orca_dpo_pairs
+ model_name: Llama-3-8B-Instruct-DPO-v0.4
  pipeline_tag: text-generation
  license_name: llama3
  license_link: LICENSE
  inference: false
  model_creator: MaziyarPanahi
  quantized_by: MaziyarPanahi
+ model-index:
+ - name: Llama-3-8B-Instruct-DPO-v0.4
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 62.54
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 79.73
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 68.08
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 53.94
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 75.61
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 71.04
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.4
+       name: Open LLM Leaderboard
  ---

  <img src="./llama-3-merges.webp" alt="Goku 8x22B v0.4 Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
  )
  print(outputs[0]["generated_text"][len(prompt):])
  ```
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-8B-Instruct-DPO-v0.4)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |68.49|
+ |AI2 Reasoning Challenge (25-Shot)|62.54|
+ |HellaSwag (10-Shot) |79.73|
+ |MMLU (5-Shot) |68.08|
+ |TruthfulQA (0-shot) |53.94|
+ |Winogrande (5-shot) |75.61|
+ |GSM8k (5-shot) |71.04|
+
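This commit records the same six scores twice: in the `model-index` metadata of the front matter and in the Markdown results table. The stdlib-only sketch below cross-checks the two and recomputes `Avg.`. It assumes the YAML has already been parsed into plain dicts and lists (for example with PyYAML) and trimmed to the fields used here, and that `Avg.` is the unweighted mean rounded to two decimals, which agrees with the values in this commit (410.94 / 6 = 68.49). The helper names `scores_from_model_index` and `rows_from_table` are hypothetical.

```python
# Cross-check the model-index metadata against the README results table.
# Assumptions (not from the commit): the YAML front matter was already parsed
# into plain dicts/lists and trimmed to the fields used here, and "Avg." is the
# unweighted mean of the six benchmark scores rounded to two decimals.

MODEL_INDEX = [
    {
        "name": "Llama-3-8B-Instruct-DPO-v0.4",
        "results": [
            {"dataset": {"name": "AI2 Reasoning Challenge (25-Shot)"},
             "metrics": [{"type": "acc_norm", "value": 62.54}]},
            {"dataset": {"name": "HellaSwag (10-Shot)"},
             "metrics": [{"type": "acc_norm", "value": 79.73}]},
            {"dataset": {"name": "MMLU (5-Shot)"},
             "metrics": [{"type": "acc", "value": 68.08}]},
            {"dataset": {"name": "TruthfulQA (0-shot)"},
             "metrics": [{"type": "mc2", "value": 53.94}]},
            {"dataset": {"name": "Winogrande (5-shot)"},
             "metrics": [{"type": "acc", "value": 75.61}]},
            {"dataset": {"name": "GSM8k (5-shot)"},
             "metrics": [{"type": "acc", "value": 71.04}]},
        ],
    }
]

TABLE = """\
| Metric |Value|
|---------------------------------|----:|
|Avg. |68.49|
|AI2 Reasoning Challenge (25-Shot)|62.54|
|HellaSwag (10-Shot) |79.73|
|MMLU (5-Shot) |68.08|
|TruthfulQA (0-shot) |53.94|
|Winogrande (5-shot) |75.61|
|GSM8k (5-shot) |71.04|
"""


def scores_from_model_index(model_index: list[dict]) -> dict[str, float]:
    """Map each dataset name to the value of its first metric."""
    scores = {}
    for model in model_index:
        for result in model["results"]:
            scores[result["dataset"]["name"]] = result["metrics"][0]["value"]
    return scores


def rows_from_table(markdown: str) -> dict[str, float]:
    """Parse the body rows of a two-column Markdown table into name -> value."""
    rows = {}
    for line in markdown.strip().splitlines()[2:]:  # skip header and separator
        name, value = (cell.strip() for cell in line.strip().strip("|").split("|"))
        rows[name] = float(value)
    return rows


table = rows_from_table(TABLE)
reported_avg = table.pop("Avg.")
index_scores = scores_from_model_index(MODEL_INDEX)

# Benchmarks whose table value disagrees with (or is missing from) the metadata.
mismatches = sorted(
    name for name in table.keys() | index_scores.keys()
    if table.get(name) != index_scores.get(name)
)
recomputed_avg = round(sum(index_scores.values()) / len(index_scores), 2)
print("mismatches:", mismatches)
print("recomputed Avg.:", recomputed_avg, "reported:", reported_avg)
```

Keeping the table parser independent of the metadata means a hand edit to either copy of the scores shows up as a mismatch instead of silently drifting.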