Commit
a71be31
Parent: 94732ff

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)


- Adding the Open Portuguese LLM Leaderboard Evaluation Results (4c9f80877c2f3a4ff0e7bc4a11497cf173ab5bfa)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +166 -1
README.md CHANGED
@@ -14,7 +14,153 @@ tags:
 - portuguese
 base_model: Qwen/Qwen1.5-72B-Chat
 pipeline_tag: text-generation
-
+model-index:
+- name: Cabra-72b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 80.62
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 67.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 57.18
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 93.58
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 78.03
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 45.45
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.12
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 68.65
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.64
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/Cabra-72b
+      name: Open Portuguese LLM Leaderboard
 ---
 # Cabra 72b
 <img src="https://uploads-ssl.webflow.com/65f77c0240ae1c68f8192771/6611c4d5c4e2b5eaea0b979c_cabra72b.png" width="400" height="400">
 
@@ -130,3 +276,22 @@ O modelo é destinado, por agora, a fins de pesquisa. As áreas e tarefas de pes
 | hatebr_offensive_binary | 1.0 | all | 25 | f1_macro | 0.7212| ± | 0.0087 |
 | | | all | 25 | acc | 0.7393| ± | 0.0083 |
 | oab_exams | 1.5 | all | 3 | acc | 0.5718| ± | 0.0061 |
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/botbot-ai/Cabra-72b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**70.52**|
+|ENEM Challenge (No Images)| 80.62|
+|BLUEX (No Images) | 67.45|
+|OAB Exams | 57.18|
+|Assin2 RTE | 93.58|
+|Assin2 STS | 78.03|
+|FaQuAD NLI | 45.45|
+|HateBR Binary | 72.12|
+|PT Hate Speech Binary | 68.65|
+|tweetSentBR | 71.64|
+
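As a sanity check on the table this commit adds, the leaderboard "Average" is just the unweighted mean of the nine per-task scores. The sketch below is a minimal example, assuming the `huggingface_hub` library and that the `model-index:` metadata above has been merged into the public `botbot-ai/Cabra-72b` model card: it reads the eval results back from the card and recomputes that average.

```python
# Minimal sketch: read the model-index metadata back from the Hub and
# recompute the leaderboard average shown in the README table.
# Assumes `huggingface_hub` is installed and the merged card is public.
from huggingface_hub import ModelCard

card = ModelCard.load("botbot-ai/Cabra-72b")

# card.data.eval_results is parsed from the `model-index:` block above;
# each entry carries the dataset name, metric type, and metric value.
scores = {r.dataset_name: r.metric_value for r in card.data.eval_results}

for name, value in scores.items():
    print(f"{name:<30} {value:6.2f}")

average = sum(scores.values()) / len(scores)
print(f"{'Average':<30} {average:6.2f}")  # expected ~70.52
```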