Commit f3e03b9
1 Parent(s): 45f1a3a

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)


- Adding the Open Portuguese LLM Leaderboard Evaluation Results (fa72378e6538f62ed634cec11b0d38a0a3407c5b)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +164 -0
README.md CHANGED
@@ -1,5 +1,152 @@
  ---
  license: mit
+ model-index:
+ - name: phibode_1_5_ultraalpaca
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 23.58
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 20.72
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 24.87
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 69.07
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 4.94
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 34.94
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 41.23
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 24.19
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/phibode_1_5_ultraalpaca
+       name: Open Portuguese LLM Leaderboard
  ---
  # Phi-Bode
  
@@ -19,3 +166,20 @@ who do not have computational resources available for the use of LLMs (Large
  - **Training:** Training was performed through full fine-tuning of phi-1.5.
  
  
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/phibode_1_5_ultraalpaca) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**31.95**|
+ |ENEM Challenge (No Images)| 23.58|
+ |BLUEX (No Images) | 20.72|
+ |OAB Exams | 24.87|
+ |Assin2 RTE | 69.07|
+ |Assin2 STS | 4.94|
+ |FaQuAD NLI | 43.97|
+ |HateBR Binary | 34.94|
+ |PT Hate Speech Binary | 41.23|
+ |tweetSentBR | 24.19|
+
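The commit does not say how the table's Average row is computed. The numbers are consistent with an unweighted mean of the nine task scores: they sum to 287.51, and 287.51 / 9 ≈ 31.95. Treat that formula as an assumption.

The Python sketch below does two things:

* It rebuilds the `model-index` results from this diff as plain dicts, mirroring the YAML nesting: `task`, `dataset` with `args.num_few_shot`, `metrics`, and `source`.
* It recomputes that average.

The names `RESULTS`, `METRIC_NAMES`, `to_model_index_result` and `leaderboard_average` are hypothetical illustration helpers. They are not part of any Hugging Face library API.

```python
import json
from statistics import fmean

LEADERBOARD_URL = (
    "https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard"
    "?query=recogna-nlp/phibode_1_5_ultraalpaca"
)

# Display name the model-index pairs with each metric type (taken from the diff).
METRIC_NAMES = {"acc": "accuracy", "f1_macro": "f1-macro", "pearson": "pearson"}

# Hypothetical tuple layout:
# (dataset name, dataset type, split, num_few_shot, metric type, value), values copied from the diff.
RESULTS = [
    ("ENEM Challenge (No Images)", "eduagarcia/enem_challenge", "train", 3, "acc", 23.58),
    ("BLUEX (No Images)", "eduagarcia-temp/BLUEX_without_images", "train", 3, "acc", 20.72),
    ("OAB Exams", "eduagarcia/oab_exams", "train", 3, "acc", 24.87),
    ("Assin2 RTE", "assin2", "test", 15, "f1_macro", 69.07),
    ("Assin2 STS", "eduagarcia/portuguese_benchmark", "test", 15, "pearson", 4.94),
    ("FaQuAD NLI", "ruanchaves/faquad-nli", "test", 15, "f1_macro", 43.97),
    ("HateBR Binary", "ruanchaves/hatebr", "test", 25, "f1_macro", 34.94),
    ("PT Hate Speech Binary", "hate_speech_portuguese", "test", 25, "f1_macro", 41.23),
    ("tweetSentBR", "eduagarcia/tweetsentbr_fewshot", "test", 25, "f1_macro", 24.19),
]


def to_model_index_result(name, dataset_type, split, shots, metric, value):
    """Return one `results` entry with the same nesting as the YAML front matter."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": name,
            "type": dataset_type,
            "split": split,
            "args": {"num_few_shot": shots},
        },
        "metrics": [{"type": metric, "value": value, "name": METRIC_NAMES[metric]}],
        "source": {"url": LEADERBOARD_URL, "name": "Open Portuguese LLM Leaderboard"},
    }


def leaderboard_average(entries):
    """Unweighted mean of all metric values, rounded to two decimals (assumed formula)."""
    return round(fmean(m["value"] for e in entries for m in e["metrics"]), 2)


model_index = [
    {
        "name": "phibode_1_5_ultraalpaca",
        "results": [to_model_index_result(*row) for row in RESULTS],
    }
]

if __name__ == "__main__":
    results = model_index[0]["results"]
    print(json.dumps(results[0], indent=2))  # first entry, same shape as the YAML
    print("Average:", leaderboard_average(results))  # Average: 31.95
```

The metrics differ across tasks: accuracy, macro-F1 and Pearson correlation. A plain mean therefore mixes scales, which is one reason to read the Average only as a rough summary. The Assin2 STS Pearson score of 4.94 pulls it down noticeably.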