Safetensors · qwen2 · Eval Results

Adding Evaluation Results #1
by agentlans · opened

Files changed (1):
  1. README.md +114 -0
README.md CHANGED
@@ -5,6 +5,105 @@ datasets:
 - vicgalle/configurable-system-prompt-multitask
 base_model:
 - Qwen/Qwen2.5-0.5B-Instruct
+model-index:
+- name: Qwen2.5-0.5B-Instruct-CrashCourse-dropout
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 29.49
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 7.23
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.08
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.79
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.11
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 6.76
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout
+      name: Open LLM Leaderboard
 ---
 # Qwen2.5-0.5B-Instruct-CrashCourse-dropout
 
@@ -39,3 +138,18 @@ Users should be aware that this model, like all AI models, may reflect biases pr
 ## Additional Information
 
 For more details on the base model, please refer to the Qwen/Qwen2.5-0.5B-Instruct model card. For information about the datasets used in fine-tuning, check the respective dataset cards on the Hugging Face Hub.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Qwen2.5-0.5B-Instruct-CrashCourse-dropout-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans%2FQwen2.5-0.5B-Instruct-CrashCourse-dropout&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric |Value (%)|
+|-------------------|--------:|
+|**Average** | 7.74|
+|IFEval (0-Shot) | 29.49|
+|BBH (3-Shot) | 7.23|
+|MATH Lvl 5 (4-Shot)| 0.08|
+|GPQA (0-shot) | 1.79|
+|MuSR (0-shot) | 1.11|
+|MMLU-PRO (5-shot) | 6.76|
+
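For readers checking the numbers: the **Average** row appears to be the unweighted mean of the six benchmark scores above. A quick Python check (scores copied from the table; this snippet is not part of the PR itself):

```python
# Sanity check: the leaderboard "Average" matches the unweighted mean
# of the six benchmark scores reported in the table above.
scores = {
    "IFEval (0-Shot)": 29.49,
    "BBH (3-Shot)": 7.23,
    "MATH Lvl 5 (4-Shot)": 0.08,
    "GPQA (0-shot)": 1.79,
    "MuSR (0-shot)": 1.11,
    "MMLU-PRO (5-shot)": 6.76,
}
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # prints "Average: 7.74"
```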
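Once merged, the `model-index` block becomes machine-readable card metadata rather than plain text. A minimal sketch of reading it back with the `huggingface_hub` library (assuming the package is installed and the PR has been merged into the repo named above):

```python
# Minimal sketch: load the model card and print the eval results
# declared in its model-index metadata (the block added by this PR).
from huggingface_hub import ModelCard

card = ModelCard.load("agentlans/Qwen2.5-0.5B-Instruct-CrashCourse-dropout")
model_index = card.data.to_dict().get("model-index", [])

for entry in model_index:
    for result in entry.get("results", []):
        dataset = result["dataset"]["name"]
        for metric in result.get("metrics", []):
            print(f"{dataset}: {metric['name']} = {metric['value']}")
```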