agentlans committed on
Commit 99a8de9 · verified · 1 Parent(s): 0e8ce55

Adding Evaluation Results (#1)

- Adding Evaluation Results (ac91642f13a732c0d1e2130312089bc8ab382fe1)

Files changed (1)
  1. README.md +113 -1
README.md CHANGED
@@ -4,7 +4,105 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+model-index:
+- name: Llama3.1-8B-drill
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 76.52
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 28.79
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 16.54
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 2.35
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.7
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 30.84
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama3.1-8B-drill
+      name: Open LLM Leaderboard
 ---
 # Llama3.1-8B-drill
 
@@ -44,3 +142,17 @@ parameters:
 dtype: bfloat16
 
 ```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Llama3.1-8B-drill-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans/Llama3.1-8B-drill)!
+
+| Metric             |Value (%)|
+|--------------------|--------:|
+|**Average**         |    26.62|
+|IFEval (0-Shot)     |    76.52|
+|BBH (3-Shot)        |    28.79|
+|MATH Lvl 5 (4-Shot) |    16.54|
+|GPQA (0-shot)       |     2.35|
+|MuSR (0-shot)       |     4.70|
+|MMLU-PRO (5-shot)   |    30.84|
+
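
The `model-index` block added above is the machine-readable counterpart of the results table at the end of the README. As an illustration only (not part of this commit), the sketch below shows one way those entries could be read back with huggingface_hub's `ModelCard` helper; the repo id is taken from the leaderboard links above, and the `EvalResult` attribute names are assumptions rather than anything stated in this diff.

```python
# Hypothetical usage sketch, not part of the commit: fetch the model card and
# inspect the eval results parsed from the model-index YAML front matter.
from huggingface_hub import ModelCard

card = ModelCard.load("agentlans/Llama3.1-8B-drill")  # downloads README.md from the Hub

# Each entry under `results:` is exposed as an EvalResult object.
for r in card.data.eval_results or []:
    print(f"{r.dataset_name:<22} {r.metric_type:<50} {r.metric_value}")
```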