LeroyDyer leaderboard-pr-bot committed on
Commit
871255b
1 Parent(s): 8affa38

Adding Evaluation Results (#1)


- Adding Evaluation Results (e5b274b026550fde3b12ac6bd3b3753e8a5dae47)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +139 -30
README.md CHANGED
@@ -1,22 +1,4 @@
  ---
- base_model:
- - LeroyDyer/LCARS_TOP_SCORE
- - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
- - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
- - LeroyDyer/LCARS_AI_StarTrek_Computer
- - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
- - LeroyDyer/SpyazWeb_AI_DeepMind_Project
- - LeroyDyer/SpydazWeb_AI_Swahili_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
- - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
- - LeroyDyer/QuietStar_Project
- - LeroyDyer/Mixtral_BioMedical_7b
- - LeroyDyer/Mixtral_AI_CyberTron_Coder
- - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
- - LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
  language:
  - en
  - sw
@@ -33,18 +15,6 @@ language:
  - bm
  - su
  license: apache-2.0
- datasets:
- - neoneye/base64-decode-v2
- - neoneye/base64-encode-v1
- - VuongQuoc/Chemistry_text_to_image
- - Kamizuru00/diagram_image_to_text
- - LeroyDyer/Chemistry_text_to_image_BASE64
- - LeroyDyer/AudioCaps-Spectrograms_to_Base64
- - LeroyDyer/winogroud_text_to_imaget_BASE64
- - LeroyDyer/chart_text_to_Base64
- - LeroyDyer/diagram_image_to_text_BASE64
- - mekaneeky/salt_m2e_15_3_instruction
- - mekaneeky/SALT-languages-bible
  tags:
  - mergekit
  - merge
@@ -88,6 +58,131 @@ tags:
  - Afro-Centric
  - African-Model
  - Ancient-One
  ---
@@ -823,3 +918,17 @@ LM_MODEL
  ---
  language:
  - en
  - sw

  - bm
  - su
  license: apache-2.0
  tags:
  - mergekit
  - merge

  - Afro-Centric
  - African-Model
  - Ancient-One
+ base_model:
62
+ - LeroyDyer/LCARS_TOP_SCORE
63
+ - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
64
+ - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
65
+ - LeroyDyer/LCARS_AI_StarTrek_Computer
66
+ - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
67
+ - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
68
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
69
+ - LeroyDyer/SpyazWeb_AI_DeepMind_Project
70
+ - LeroyDyer/SpydazWeb_AI_Swahili_Project
71
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
72
+ - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
73
+ - LeroyDyer/QuietStar_Project
74
+ - LeroyDyer/Mixtral_BioMedical_7b
75
+ - LeroyDyer/Mixtral_AI_CyberTron_Coder
76
+ - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
77
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
78
+ - LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
79
+ datasets:
80
+ - neoneye/base64-decode-v2
81
+ - neoneye/base64-encode-v1
82
+ - VuongQuoc/Chemistry_text_to_image
83
+ - Kamizuru00/diagram_image_to_text
84
+ - LeroyDyer/Chemistry_text_to_image_BASE64
85
+ - LeroyDyer/AudioCaps-Spectrograms_to_Base64
86
+ - LeroyDyer/winogroud_text_to_imaget_BASE64
87
+ - LeroyDyer/chart_text_to_Base64
88
+ - LeroyDyer/diagram_image_to_text_BASE64
89
+ - mekaneeky/salt_m2e_15_3_instruction
90
+ - mekaneeky/SALT-languages-bible
+ model-index:
+ - name: SpydazWebAI_Human_AGI
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 33.88
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 7.45
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 0.91
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.36
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 7.38
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 5.32
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
+       name: Open LLM Leaderboard
  ---

+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__SpydazWebAI_Human_AGI)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               | 9.88|
+ |IFEval (0-Shot)    |33.88|
+ |BBH (3-Shot)       | 7.45|
+ |MATH Lvl 5 (4-Shot)| 0.91|
+ |GPQA (0-shot)      | 4.36|
+ |MuSR (0-shot)      | 7.38|
+ |MMLU-PRO (5-shot)  | 5.32|
+
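
The `Avg.` row in the table above looks like the unweighted mean of the six benchmark scores. The following Python sketch checks that reading. It assumes a plain arithmetic mean rounded to two decimals, which the commit itself does not state, and the helper name `mean_score` is hypothetical.

```python
# Scores copied from the results table added to README.md in this commit.
scores = {
    "IFEval (0-Shot)": 33.88,
    "BBH (3-Shot)": 7.45,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 4.36,
    "MuSR (0-shot)": 7.38,
    "MMLU-PRO (5-shot)": 5.32,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: this mirrors how the Avg. row is derived; the commit
    does not document the actual aggregation or rounding rule.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the computed mean matches the 9.88 reported in the table.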