agentlans committed · Commit 551bebf · verified · 1 Parent(s): 9b32ead

Adding Evaluation Results (#1)

- Adding Evaluation Results (8f7e9b52c77022ae9b8694e832740dc3bd9d6db1)

Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -1,5 +1,104 @@
  ---
  license: llama3.2
  ---
  # Model Card: agentlans/Llama-3.2-1B-Instruct-CrashCourse12K
 
@@ -27,4 +126,18 @@ license: llama3.2
  ## Recommended Use
  - General instruction-based tasks
  - Educational content generation
- - Simple reasoning and task completion
  ---
  license: llama3.2
+ model-index:
+ - name: Llama-3.2-1B-Instruct-CrashCourse12K
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 53.95
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 9.39
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 6.57
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 0.0
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 1.2
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 8.99
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K
+       name: Open LLM Leaderboard
  ---
  # Model Card: agentlans/Llama-3.2-1B-Instruct-CrashCourse12K
 
 
  ## Recommended Use
  - General instruction-based tasks
  - Educational content generation
+ - Simple reasoning and task completion
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/agentlans__Llama-3.2-1B-Instruct-CrashCourse12K-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=agentlans%2FLlama-3.2-1B-Instruct-CrashCourse12K&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ |      Metric       |Value (%)|
+ |-------------------|--------:|
+ |**Average**        |    13.35|
+ |IFEval (0-Shot)    |    53.95|
+ |BBH (3-Shot)       |     9.39|
+ |MATH Lvl 5 (4-Shot)|     6.57|
+ |GPQA (0-shot)      |     0.00|
+ |MuSR (0-shot)      |     1.20|
+ |MMLU-PRO (5-shot)  |     8.99|
+
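
The **Average** row in the added table can be checked against the six benchmark values. The sketch below is not part of the commit. It assumes the average is a plain unweighted mean of the listed scores, which reproduces the reported 13.35 but is not confirmed by the commit itself. The names `scores` and `average_score` are hypothetical and used only for illustration.

```python
# Benchmark scores (percent) copied from the table added in this commit.
scores = {
    "IFEval (0-Shot)": 53.95,
    "BBH (3-Shot)": 9.39,
    "MATH Lvl 5 (4-Shot)": 6.57,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 1.20,
    "MMLU-PRO (5-shot)": 8.99,
}


def average_score(values):
    """Return the unweighted mean of the values, rounded to two decimals.

    Assumption: every benchmark counts equally toward the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = average_score(scores.values())
print(f"Average: {average:.2f}")
```

Under that equal-weight assumption the result agrees with the **Average** row, so the table is at least internally consistent.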