Commit
777ceb2
1 Parent(s): a113c02

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (f155de50d32d0d5d305b6c31cb99401bd10b5f9c)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +171 -5
README.md CHANGED
@@ -1,13 +1,160 @@
  ---
- library_name: peft
- base_model: TheBloke/zephyr-7B-beta-GPTQ
- revision: gptq-8bit-32g-actorder_True
- license: mit
  language:
  - pt
  tags:
  - gptq
  - ptbr
  ---
  ## Training procedure
 
@@ -95,4 +242,23 @@ get_inference('Poderia indicar filmes de ação de até 2 horas?', model)
  ```
 
 
- - PEFT 0.5.0
  ---
  language:
  - pt
+ license: mit
+ library_name: peft
  tags:
  - gptq
  - ptbr
+ base_model: TheBloke/zephyr-7B-beta-GPTQ
+ revision: gptq-8bit-32g-actorder_True
+ model-index:
+ - name: cesar-ptbr
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 53.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 46.87
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 38.27
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 58.32
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 68.49
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 73.81
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 83.3
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 67.49
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 42.71
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=matheusrdgsf/cesar-ptbr
+       name: Open Portuguese LLM Leaderboard
  ---
  ## Training procedure
 
  ```
 
 
+ - PEFT 0.5.0
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/matheusrdgsf/cesar-ptbr) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**59.22**|
+ |ENEM Challenge (No Images)| 53.74|
+ |BLUEX (No Images) | 46.87|
+ |OAB Exams | 38.27|
+ |Assin2 RTE | 58.32|
+ |Assin2 STS | 68.49|
+ |FaQuAD NLI | 73.81|
+ |HateBR Binary | 83.30|
+ |PT Hate Speech Binary | 67.49|
+ |tweetSentBR | 42.71|
+
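
The "Average" row in the results table above can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the leaderboard average is the unweighted arithmetic mean of the nine benchmark scores, rounded to two decimals; the commit does not state how the figure is computed, and the names `scores` and `leaderboard_average` are hypothetical, introduced only for this illustration.

```python
# Scores copied from the results table in the diff above.
scores = {
    "ENEM Challenge (No Images)": 53.74,
    "BLUEX (No Images)": 46.87,
    "OAB Exams": 38.27,
    "Assin2 RTE": 58.32,
    "Assin2 STS": 68.49,
    "FaQuAD NLI": 73.81,
    "HateBR Binary": 83.30,
    "PT Hate Speech Binary": 67.49,
    "tweetSentBR": 42.71,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean rounded to two decimals.

    Assumption: the leaderboard weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 59.22"
```

Under that assumption the result matches the 59.22 reported in the table, which suggests the nine tasks are weighted equally.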