Adding the Open Portuguese LLM Leaderboard Evaluation Results

#3
Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,14 +1,161 @@
 ---
+license: gemma
+library_name: transformers
+tags:
+- gemma-2
 base_model:
 - anthracite-forge/magnum-v3-27b-kto-r3
 - anthracite-forge/magnum-v3-27b-KTO-e1-r2
 - anthracite-forge/magnum-v3-27b-KTO-e0.25-r1
 - IntervitensInc/gemma-2-27b-chatml
-library_name: transformers
-license: gemma
 pipeline_tag: text-generation
-tags:
-- gemma-2
+model-index:
+- name: magnum-v3-27b-kto
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 74.88
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 64.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 56.36
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 92.35
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 80.24
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 73.61
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 75.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.35
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 68.8
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=anthracite-org/magnum-v3-27b-kto
+      name: Open Portuguese LLM Leaderboard
 ---
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/658a46cbfb9c2bdfae75b3a6/GKpV5mwmnHFR6wIwTa91z.png)
@@ -154,3 +301,22 @@ The training was done for 2 epochs. We used 8x[H100s](https://www.nvidia.com/en
 
 ## Safety
 ...
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/anthracite-org/magnum-v3-27b-kto) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric                     | Value     |
+|----------------------------|-----------|
+| Average                    | **73.14** |
+| ENEM Challenge (No Images) | 74.88     |
+| BLUEX (No Images)          | 64.67     |
+| OAB Exams                  | 56.36     |
+| Assin2 RTE                 | 92.35     |
+| Assin2 STS                 | 80.24     |
+| FaQuAD NLI                 | 73.61     |
+| HateBR Binary              | 75.97     |
+| PT Hate Speech Binary      | 71.35     |
+| tweetSentBR                | 68.80     |
+
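As a quick sanity check on the diff above, the reported **Average** (73.14) can be reproduced from the nine per-task scores in the table. A minimal sketch, using only the values stated in this PR:

```python
from statistics import mean

# Per-task scores from the Open Portuguese LLM Leaderboard table in this PR.
scores = {
    "ENEM Challenge (No Images)": 74.88,
    "BLUEX (No Images)": 64.67,
    "OAB Exams": 56.36,
    "Assin2 RTE": 92.35,
    "Assin2 STS": 80.24,
    "FaQuAD NLI": 73.61,
    "HateBR Binary": 75.97,
    "PT Hate Speech Binary": 71.35,
    "tweetSentBR": 68.80,
}

# Unweighted mean across tasks, rounded to two decimals as on the leaderboard.
average = round(mean(scores.values()), 2)
print(average)  # 73.14
```

This matches the table's headline value, which suggests the leaderboard average is a plain unweighted mean over the nine benchmarks.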
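The `model-index` block added in this PR follows the Hugging Face model-card metadata schema: a list of `results`, each pairing a `task` and `dataset` with one or more `metrics` and a `source` link. A minimal sketch of how that structure flattens into the score table (a two-entry excerpt, assumed already parsed from the YAML front matter, e.g. with `yaml.safe_load`):

```python
# Excerpt of the model-index metadata above, as a parsed Python dict.
model_index = {
    "name": "magnum-v3-27b-kto",
    "results": [
        {
            "task": {"type": "text-generation", "name": "Text Generation"},
            "dataset": {"name": "ENEM Challenge (No Images)",
                        "type": "eduagarcia/enem_challenge"},
            "metrics": [{"type": "acc", "value": 74.88, "name": "accuracy"}],
        },
        {
            "task": {"type": "text-generation", "name": "Text Generation"},
            "dataset": {"name": "Assin2 RTE", "type": "assin2"},
            "metrics": [{"type": "f1_macro", "value": 92.35, "name": "f1-macro"}],
        },
    ],
}

# Flatten into (dataset, metric, value) rows, mirroring the leaderboard table.
rows = [
    (r["dataset"]["name"], m["type"], m["value"])
    for r in model_index["results"]
    for m in r["metrics"]
]
for name, metric, value in rows:
    print(f"{name}: {metric} = {value}")
```

Each entry is self-describing, so the leaderboard and the model-card widget can render the table without any logic beyond this flattening.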