leaderboard-pt-pr-bot commited on
Commit
0fa5158
1 Parent(s): d32ed61

Adding the Open Portuguese LLM Leaderboard Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1) hide show
  1. README.md +166 -0
README.md CHANGED
@@ -1,5 +1,152 @@
1
  ---
2
  library_name: peft
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
4
  ## Training procedure
5
 
@@ -22,3 +169,22 @@ The following `bitsandbytes` quantization config was used during training:
22
 
23
 
24
  - PEFT 0.5.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  library_name: peft
3
+ model-index:
4
+ - name: mistralbode_7b_qlora_ultraalpaca
5
+ results:
6
+ - task:
7
+ type: text-generation
8
+ name: Text Generation
9
+ dataset:
10
+ name: ENEM Challenge (No Images)
11
+ type: eduagarcia/enem_challenge
12
+ split: train
13
+ args:
14
+ num_few_shot: 3
15
+ metrics:
16
+ - type: acc
17
+ value: 56.82
18
+ name: accuracy
19
+ source:
20
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
21
+ name: Open Portuguese LLM Leaderboard
22
+ - task:
23
+ type: text-generation
24
+ name: Text Generation
25
+ dataset:
26
+ name: BLUEX (No Images)
27
+ type: eduagarcia-temp/BLUEX_without_images
28
+ split: train
29
+ args:
30
+ num_few_shot: 3
31
+ metrics:
32
+ - type: acc
33
+ value: 47.15
34
+ name: accuracy
35
+ source:
36
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
37
+ name: Open Portuguese LLM Leaderboard
38
+ - task:
39
+ type: text-generation
40
+ name: Text Generation
41
+ dataset:
42
+ name: OAB Exams
43
+ type: eduagarcia/oab_exams
44
+ split: train
45
+ args:
46
+ num_few_shot: 3
47
+ metrics:
48
+ - type: acc
49
+ value: 36.31
50
+ name: accuracy
51
+ source:
52
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
53
+ name: Open Portuguese LLM Leaderboard
54
+ - task:
55
+ type: text-generation
56
+ name: Text Generation
57
+ dataset:
58
+ name: Assin2 RTE
59
+ type: assin2
60
+ split: test
61
+ args:
62
+ num_few_shot: 15
63
+ metrics:
64
+ - type: f1_macro
65
+ value: 88.92
66
+ name: f1-macro
67
+ source:
68
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
69
+ name: Open Portuguese LLM Leaderboard
70
+ - task:
71
+ type: text-generation
72
+ name: Text Generation
73
+ dataset:
74
+ name: Assin2 STS
75
+ type: eduagarcia/portuguese_benchmark
76
+ split: test
77
+ args:
78
+ num_few_shot: 15
79
+ metrics:
80
+ - type: pearson
81
+ value: 76.37
82
+ name: pearson
83
+ source:
84
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
85
+ name: Open Portuguese LLM Leaderboard
86
+ - task:
87
+ type: text-generation
88
+ name: Text Generation
89
+ dataset:
90
+ name: FaQuAD NLI
91
+ type: ruanchaves/faquad-nli
92
+ split: test
93
+ args:
94
+ num_few_shot: 15
95
+ metrics:
96
+ - type: f1_macro
97
+ value: 67.17
98
+ name: f1-macro
99
+ source:
100
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
101
+ name: Open Portuguese LLM Leaderboard
102
+ - task:
103
+ type: text-generation
104
+ name: Text Generation
105
+ dataset:
106
+ name: HateBR Binary
107
+ type: ruanchaves/hatebr
108
+ split: test
109
+ args:
110
+ num_few_shot: 25
111
+ metrics:
112
+ - type: f1_macro
113
+ value: 82.02
114
+ name: f1-macro
115
+ source:
116
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
117
+ name: Open Portuguese LLM Leaderboard
118
+ - task:
119
+ type: text-generation
120
+ name: Text Generation
121
+ dataset:
122
+ name: PT Hate Speech Binary
123
+ type: hate_speech_portuguese
124
+ split: test
125
+ args:
126
+ num_few_shot: 25
127
+ metrics:
128
+ - type: f1_macro
129
+ value: 69.24
130
+ name: f1-macro
131
+ source:
132
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
133
+ name: Open Portuguese LLM Leaderboard
134
+ - task:
135
+ type: text-generation
136
+ name: Text Generation
137
+ dataset:
138
+ name: tweetSentBR
139
+ type: eduagarcia/tweetsentbr_fewshot
140
+ split: test
141
+ args:
142
+ num_few_shot: 25
143
+ metrics:
144
+ - type: f1_macro
145
+ value: 48.14
146
+ name: f1-macro
147
+ source:
148
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistralbode_7b_qlora_ultraalpaca
149
+ name: Open Portuguese LLM Leaderboard
150
  ---
151
  ## Training procedure
152
 
 
169
 
170
 
171
  - PEFT 0.5.0
172
+
173
+
174
+ # Open Portuguese LLM Leaderboard Evaluation Results
175
+
176
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/mistralbode_7b_qlora_ultraalpaca) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
177
+
178
+ | Metric | Value |
179
+ |--------------------------|---------|
180
+ |Average |**63.57**|
181
+ |ENEM Challenge (No Images)| 56.82|
182
+ |BLUEX (No Images) | 47.15|
183
+ |OAB Exams | 36.31|
184
+ |Assin2 RTE | 88.92|
185
+ |Assin2 STS | 76.37|
186
+ |FaQuAD NLI | 67.17|
187
+ |HateBR Binary | 82.02|
188
+ |PT Hate Speech Binary | 69.24|
189
+ |tweetSentBR | 48.14|
190
+