Adding Evaluation Results

#2
Files changed (1)
1. README.md +112 -3
README.md CHANGED
````diff
@@ -1,4 +1,7 @@
 ---
+language:
+- en
+license: llama3
 tags:
 - not-for-all-audiences
 - merge
@@ -51,9 +54,101 @@ base_model:
 - princeton-nlp/Llama-3-Instruct-8B-SimPO
 - chujiezheng/Llama-3-Instruct-8B-SimPO-ExPO
 - chargoddard/prometheus-2-llama-3-8b
-license: llama3
-language:
-- en
+model-index:
+- name: LLaMa-3-CursedStock-v2.0-8B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 63.31
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 32.56
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 8.61
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.24
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.04
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 28.4
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=PJMixers/LLaMa-3-CursedStock-v2.0-8B
+      name: Open LLM Leaderboard
 ---
 # Merging Compute Sponsored by KoboldAI
 # GGUF Quants by mradermacher
@@ -255,3 +350,17 @@ models:
 # - Removed ResplendentAI QLoRA models as they were trained on the base, but don't seem to train lm_head or embed_tokens.
 # - Removed BeaverAI/Llama-3SOME-8B-v2-rc2 as newer versions are out, and idk which is best yet. Also don't want doubledipping if I decide to Beavertrain this.
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_PJMixers__LLaMa-3-CursedStock-v2.0-8B)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |24.03|
+|IFEval (0-Shot)    |63.31|
+|BBH (3-Shot)       |32.56|
+|MATH Lvl 5 (4-Shot)| 8.61|
+|GPQA (0-shot)      | 3.24|
+|MuSR (0-shot)      | 8.04|
+|MMLU-PRO (5-shot)  |28.40|
+
````
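The Avg. row in the added table is consistent with a plain, unweighted mean of the six benchmark scores (144.16 / 6 ≈ 24.03). Below is a minimal sketch, not the leaderboard's own code, that recomputes that average from the per-benchmark values in the `model-index` metadata and renders the summary table in the same layout. The names `SCORES`, `leaderboard_average` and `render_table` are hypothetical, and the sketch assumes no per-benchmark weighting.

```python
# Hypothetical helper: per-benchmark scores copied from the model-index entries in this PR.
SCORES: dict[str, float] = {
    "IFEval (0-Shot)": 63.31,
    "BBH (3-Shot)": 32.56,
    "MATH Lvl 5 (4-Shot)": 8.61,
    "GPQA (0-shot)": 3.24,
    "MuSR (0-shot)": 8.04,
    "MMLU-PRO (5-shot)": 28.40,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the scores, rounded to two decimals (assumed, not confirmed)."""
    return round(sum(scores.values()) / len(scores), 2)


def render_table(scores: dict[str, float]) -> str:
    """Render the Metric/Value markdown table, with the average as the first row."""
    rows = [("Avg.", leaderboard_average(scores)), *scores.items()]
    # Pad the Metric column to the longest label so the raw markdown lines up.
    width = max(len("Metric"), *(len(name) for name, _ in rows))
    lines = [
        f"|{'Metric':^{width}}|Value|",
        f"|{'-' * width}|----:|",  # ----: right-aligns the Value column
    ]
    lines += [f"|{name:<{width}}|{value:5.2f}|" for name, value in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(SCORES))
```

Keeping the scores in one mapping means the table and its average are derived from the same numbers as the metadata, so a later score update cannot leave the Avg. row stale.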