Commit 57d013f
Parent: c6c7fdf

Adding Evaluation Results (#4)


- Adding Evaluation Results (f8253381459297be52e9608b9df6c9cde06bf291)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -3,8 +3,6 @@ language:
 - en
 license: other
 library_name: transformers
-license_name: tongyi-qianwen
-license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
 tags:
 - chat
 - qwen
@@ -16,9 +14,106 @@ base_model: Qwen/Qwen2.5-72B
 datasets:
 - argilla/ultrafeedback-binarized-preferences
 model_name: calme-2.2-qwen2.5-72b
+license_name: tongyi-qianwen
+license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
 pipeline_tag: text-generation
 inference: false
 model_creator: MaziyarPanahi
+model-index:
+- name: calme-2.2-qwen2.5-72b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 84.77
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 61.8
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 3.63
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 14.54
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.02
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 51.31
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/calme-2.2-qwen2.5-72b
+      name: Open LLM Leaderboard
 ---
 
  <img src="./calme-2.webp" alt="Calme-2 Models" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
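
The `model-index` block added in this hunk follows the card-metadata layout written by the Open LLM Leaderboard PR bot, so the scores can be read back programmatically once the updated card is checked out. A minimal sketch, assuming the edited `README.md` is available locally and PyYAML is installed (the file path and the split-on-`---` front-matter parsing are illustrative, not part of this commit):

```python
# Minimal sketch: pull the model-index scores back out of the card's YAML
# front matter. Assumes the README.md updated by this commit is in the
# current directory and that PyYAML is installed.
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The front matter sits between the first two '---' delimiters.
_, front_matter, _body = text.split("---", 2)
card = yaml.safe_load(front_matter)

for entry in card["model-index"]:
    print(entry["name"])
    for result in entry["results"]:
        dataset = result["dataset"]["name"]      # e.g. "IFEval (0-Shot)"
        for metric in result["metrics"]:
            print(f"  {dataset}: {metric['type']} = {metric['value']}")
```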
 
@@ -90,4 +185,17 @@ model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/calme-2.2-qwen2.5-72
 
 # Ethical Considerations
 
-As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
+As with any large language model, users should be aware of potential biases and limitations. We recommend implementing appropriate safeguards and human oversight when deploying this model in production environments.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__calme-2.2-qwen2.5-72b)
+
+| Metric            |Value|
+|-------------------|----:|
+|Avg.               |38.01|
+|IFEval (0-Shot)    |84.77|
+|BBH (3-Shot)       |61.80|
+|MATH Lvl 5 (4-Shot)| 3.63|
+|GPQA (0-shot)      |14.54|
+|MuSR (0-shot)      |12.02|
+|MMLU-PRO (5-shot)  |51.31|
+
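
The `Avg.` row in the added table is consistent with the unweighted mean of the six benchmark scores; a quick check:

```python
# Quick check that the reported Avg. equals the unweighted mean of the six
# benchmark scores listed in the table above.
scores = {
    "IFEval (0-Shot)": 84.77,
    "BBH (3-Shot)": 61.80,
    "MATH Lvl 5 (4-Shot)": 3.63,
    "GPQA (0-shot)": 14.54,
    "MuSR (0-shot)": 12.02,
    "MMLU-PRO (5-shot)": 51.31,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 38.01
```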