Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +169 -3
README.md CHANGED
@@ -1,9 +1,156 @@
  ---
  library_name: transformers
  datasets:
  - adalbertojunior/dolphin_portuguese_legal
- language:
- - pt
  ---

  # Model Card for Model ID
@@ -199,4 +346,23 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
  ---
+ language:
+ - pt
  library_name: transformers
  datasets:
  - adalbertojunior/dolphin_portuguese_legal
+ model-index:
+ - name: NeuralDaredevil-Dolphin-Portuguese
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 7.42
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 6.12
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 0.96
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 93.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 74.65
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 77.75
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 86.17
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 66.06
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 71.71
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=adalbertojunior/NeuralDaredevil-Dolphin-Portuguese
+       name: Open Portuguese LLM Leaderboard
  ---

  # Model Card for Model ID
 

  ## Model Card Contact

+ [More Information Needed]
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/adalbertojunior/NeuralDaredevil-Dolphin-Portuguese) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**53.76**|
+ |ENEM Challenge (No Images)| 7.42|
+ |BLUEX (No Images) | 6.12|
+ |OAB Exams | 0.96|
+ |Assin2 RTE | 93.05|
+ |Assin2 STS | 74.65|
+ |FaQuAD NLI | 77.75|
+ |HateBR Binary | 86.17|
+ |PT Hate Speech Binary | 66.06|
+ |tweetSentBR | 71.71|
+
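
The Average row in the table above can be sanity-checked against the nine per-task scores. The sketch below assumes the average is an unweighted mean over the nine tasks, which this page does not state; the names `scores` and `mean_score` are illustrative only. Since the inputs are already rounded to two decimals, the recomputed mean (about 53.77) may differ from the reported 53.76 in the last digit.

```python
# Per-task scores copied from the results table in this diff.
scores = {
    "ENEM Challenge (No Images)": 7.42,
    "BLUEX (No Images)": 6.12,
    "OAB Exams": 0.96,
    "Assin2 RTE": 93.05,
    "Assin2 STS": 74.65,
    "FaQuAD NLI": 77.75,
    "HateBR Binary": 86.17,
    "PT Hate Speech Binary": 66.06,
    "tweetSentBR": 71.71,
}

# Assumption: the reported average is an unweighted mean over all tasks.
mean_score = sum(scores.values()) / len(scores)

reported_average = 53.76  # value shown in the "Average" row

print(f"recomputed mean: {mean_score:.2f}")
print(f"difference from reported: {abs(mean_score - reported_average):.4f}")
```

A difference below 0.01 is consistent with the leaderboard averaging unrounded scores, though that is a guess rather than something the page confirms.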