leaderboard-pt-pr-bot committed on
Commit
492ee6a
1 Parent(s): d6d5b63

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
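
Once merged, the `model-index` entries added in this PR can also be read programmatically. A minimal sketch, assuming the `huggingface_hub` `ModelCard`/`EvalResult` interface (verify the attribute names against your installed version):

```python
# Minimal sketch: inspect the evaluation results stored in the model card metadata.
# Assumes this PR has been merged into lucianosb/boto-7B-v1.1.
from huggingface_hub import ModelCard

card = ModelCard.load("lucianosb/boto-7B-v1.1")
for result in card.data.eval_results or []:
    # Each entry mirrors one task block in the model-index YAML below.
    print(f"{result.dataset_name}: {result.metric_name} = {result.metric_value}")
```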

Files changed (1)
  1. README.md +167 -1
README.md CHANGED
@@ -11,6 +11,153 @@ tags:
  base_model: unsloth/mistral-7b-bnb-4bit
  datasets:
  - lucianosb/cetacean-ptbr
+ model-index:
+ - name: boto-7B-v1.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 60.81
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 49.37
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 42.28
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 88.12
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 63.71
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 48.83
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 75.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 64.66
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 55.45
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=lucianosb/boto-7B-v1.1
+       name: Open Portuguese LLM Leaderboard
  ---
  # Boto 7B v1.1
 
@@ -99,4 +246,23 @@ O uso do modelo é de inteira responsabilidade do usuário. O desenvolvedor do m
 
  This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/lucianosb/boto-7B-v1.1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**61.02**|
+ |ENEM Challenge (No Images)| 60.81|
+ |BLUEX (No Images) | 49.37|
+ |OAB Exams | 42.28|
+ |Assin2 RTE | 88.12|
+ |Assin2 STS | 63.71|
+ |FaQuAD NLI | 48.83|
+ |HateBR Binary | 75.97|
+ |PT Hate Speech Binary | 64.66|
+ |tweetSentBR | 55.45|
+
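
As a quick sanity check (not part of the diff above), the `Average` row matches the plain arithmetic mean of the nine task scores:

```python
# Sanity check: average of the nine Open PT LLM Leaderboard task scores reported above.
scores = [60.81, 49.37, 42.28, 88.12, 63.71, 48.83, 75.97, 64.66, 55.45]
print(round(sum(scores) / len(scores), 2))  # 61.02
```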