Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +168 -3
README.md CHANGED
@@ -1,10 +1,156 @@
-
 ---
+language:
+- pt
 library_name: transformers
 datasets:
 - adalbertojunior/dolphin_pt_test
-language:
-- pt
+model-index:
+- name: Llama-3-8B-Dolphin-Portuguese-v0.2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 67.39
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 53.82
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 45.38
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 92.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 75.99
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 80.17
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 88.26
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 58.45
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 69.56
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # Model Card for Llama-3-8B-Dolphin-Portuguese-v0.2
@@ -52,3 +198,22 @@ outputs = pipeline(
 )
 print(outputs[0]["generated_text"][len(prompt):])
 ```
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2) and on the [πŸš€ Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**70.22**|
+|ENEM Challenge (No Images)|    67.39|
+|BLUEX (No Images)         |    53.82|
+|OAB Exams                 |    45.38|
+|Assin2 RTE                |    92.97|
+|Assin2 STS                |    75.99|
+|FaQuAD NLI                |    80.17|
+|HateBR Binary             |    88.26|
+|PT Hate Speech Binary     |    58.45|
+|tweetSentBR               |    69.56|
+
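For reference, the `model-index` block added in this diff follows the standard Hugging Face model-card metadata schema, so the leaderboard scores become machine-readable once the PR is merged. Below is a minimal sketch (not part of the PR itself) of reading them back with the `huggingface_hub` `ModelCard` API; the attribute names on the parsed results are assumptions based on that library, not something stated in this PR.

```python
# Minimal sketch: read the leaderboard scores back from the model-index metadata.
# Assumes this PR has been merged and that huggingface_hub is installed
# (pip install huggingface_hub).
from huggingface_hub import ModelCard

card = ModelCard.load("adalbertojunior/Llama-3-8B-Dolphin-Portuguese-v0.2")

# card.data.eval_results is populated from the `model-index:` block
# in the YAML front matter shown above.
for result in card.data.eval_results:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```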