leaderboard-pr-bot committed on
Commit 659ac42
1 Parent(s): db30dbe

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
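
Once merged, the `model-index` block added below can be read back programmatically from the card's YAML front matter. A minimal sketch, assuming the `huggingface_hub` and `pyyaml` packages are installed (the repo id is taken from this model card):

```python
import yaml
from huggingface_hub import hf_hub_download

# Download the model card (README.md) for this repository.
readme_path = hf_hub_download(
    repo_id="sometimesanotion/Lamarck-14B-v0.3",
    filename="README.md",
)

with open(readme_path, encoding="utf-8") as f:
    text = f.read()

# The YAML front matter sits between the first two "---" markers.
front_matter = yaml.safe_load(text.split("---")[1])

# Print each benchmark added by this PR: dataset name, value, metric type.
for result in front_matter["model-index"][0]["results"]:
    dataset = result["dataset"]["name"]
    metric = result["metrics"][0]
    print(f"{dataset}: {metric['value']} ({metric['type']})")
```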

Files changed (1)
  1. README.md +112 -4
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
+language:
+- en
+license: apache-2.0
 library_name: transformers
 tags:
 - mergekit
 - merge
-license: apache-2.0
 base_model:
 - arcee-ai/Virtuoso-Small
 - CultriX/SeQwence-14B-EvolMerge
@@ -12,11 +14,104 @@ base_model:
 - underwoods/medius-erebus-magnum-14b
 - sometimesanotion/lamarck-14b-prose-model_stock
 - sometimesanotion/lamarck-14b-reason-model_stock
-language:
-- en
 metrics:
 - accuracy
 pipeline_tag: text-generation
+model-index:
+- name: Lamarck-14B-v0.3
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 50.32
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 51.27
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 32.4
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 18.46
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 18.0
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 49.01
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sometimesanotion/Lamarck-14B-v0.3
+      name: Open LLM Leaderboard
 ---
 ![Lamarck.webp](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3/resolve/main/Lamarck.webp)
 ---
@@ -227,4 +322,17 @@ dtype: bfloat16
 out_dtype: bfloat16
 ---
 
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sometimesanotion__Lamarck-14B-v0.3)
+
+| Metric            |Value|
+|-------------------|----:|
+|Avg.               |36.58|
+|IFEval (0-Shot)    |50.32|
+|BBH (3-Shot)       |51.27|
+|MATH Lvl 5 (4-Shot)|32.40|
+|GPQA (0-shot)      |18.46|
+|MuSR (0-shot)      |18.00|
+|MMLU-PRO (5-shot)  |49.01|
+
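
The `Avg.` row in the added table is simply the arithmetic mean of the six benchmark scores; a quick sketch to verify it:

```python
# Quick arithmetic check (not part of the PR): the "Avg." value in the table
# added above is the mean of the six benchmark scores.
scores = {
    "IFEval (0-Shot)": 50.32,
    "BBH (3-Shot)": 51.27,
    "MATH Lvl 5 (4-Shot)": 32.40,
    "GPQA (0-shot)": 18.46,
    "MuSR (0-shot)": 18.00,
    "MMLU-PRO (5-shot)": 49.01,
}
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # prints: Avg. = 36.58
```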