Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md +142 -5
README.md CHANGED
@@ -1,13 +1,133 @@
 ---
-
-datasets:
-- dominguesm/Canarim-Instruct-PTBR-Dataset
-library_name: adapter-transformers
-pipeline_tag: text-generation
 language:
 - pt
 - en
+library_name: adapter-transformers
+datasets:
+- dominguesm/Canarim-Instruct-PTBR-Dataset
+pipeline_tag: text-generation
 thumbnail: https://blog.cobasi.com.br/wp-content/uploads/2022/08/AdobeStock_461738919.webp
+model-index:
+- name: Caramelinho
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 21.48
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 22.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 25.15
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 48.97
+      name: f1-macro
+    - type: pearson
+      value: 19.38
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.92
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 33.97
+      name: f1-macro
+    - type: f1_macro
+      value: 46.57
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 56.31
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+      name: Open Portuguese LLM Leaderboard
 ---
 <!-- header start -->
 <div style="width: 100%;">
@@ -122,3 +242,20 @@ Os computadores quânticos são um tipo de computador cuja arquitetura é basead
 - Pytorch 2.0.1+cu118
 - Datasets 2.12.0
 - Tokenizers 0.13.3
+
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Bruno/Caramelinho)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**35.32**|
+|ENEM Challenge (No Images)|    21.48|
+|BLUEX (No Images)         |    22.11|
+|OAB Exams                 |    25.15|
+|Assin2 RTE                |    48.97|
+|Assin2 STS                |    19.38|
+|FaQuAD NLI                |    43.92|
+|HateBR Binary             |    33.97|
+|PT Hate Speech Binary     |    46.57|
+|tweetSentBR               |    56.31|
+
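The Average row in the added results table is consistent with a plain unweighted mean of the nine per-task scores. The model-index metadata, however, has only seven `results` entries: the Assin2 RTE entry also carries the Pearson score that the table reports as Assin2 STS, and the HateBR Binary entry carries a second f1-macro value that the table reports as PT Hate Speech Binary. The sketch below flattens those entries and recomputes the average. It assumes every score is weighted equally, which is inferred from the numbers rather than taken from documented leaderboard rules, and the helper names (`flatten_scores`, `leaderboard_average`) are hypothetical.

```python
from statistics import mean

# Scores transcribed from the model-index entries added in this PR.
# Keys are the model-index dataset names; two entries list two metrics,
# whose values match the Assin2 STS and PT Hate Speech Binary table rows.
MODEL_INDEX_SCORES: dict[str, list[float]] = {
    "ENEM Challenge (No Images)": [21.48],
    "BLUEX (No Images)": [22.11],
    "OAB Exams": [25.15],
    "Assin2 RTE": [48.97, 19.38],     # f1_macro (RTE), pearson (STS)
    "FaQuAD NLI": [43.92],
    "HateBR Binary": [33.97, 46.57],  # two f1_macro values
    "tweetSentBR": [56.31],
}


def flatten_scores(entries: dict[str, list[float]]) -> list[float]:
    """Flatten per-dataset metric lists into a single list of task scores."""
    return [value for values in entries.values() for value in values]


def leaderboard_average(entries: dict[str, list[float]]) -> float:
    """Unweighted mean of all task scores, rounded to two decimals.

    Assumption: each of the nine scores counts once with equal weight.
    """
    return round(mean(flatten_scores(entries)), 2)


if __name__ == "__main__":
    scores = flatten_scores(MODEL_INDEX_SCORES)
    print(len(scores), leaderboard_average(MODEL_INDEX_SCORES))
```

Running the script prints `9 35.32`, matching the reported Average; the nine scores sum to 317.86, and 317.86 / 9 ≈ 35.3178.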