Commit 486fcf5 by eduagarcia (1 parent: 9d153fa)

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +142 -5
README.md CHANGED
@@ -1,13 +1,133 @@
  ---
-
- datasets:
- - dominguesm/Canarim-Instruct-PTBR-Dataset
- library_name: adapter-transformers
- pipeline_tag: text-generation
  language:
  - pt
  - en
+ library_name: adapter-transformers
+ datasets:
+ - dominguesm/Canarim-Instruct-PTBR-Dataset
+ pipeline_tag: text-generation
  thumbnail: https://blog.cobasi.com.br/wp-content/uploads/2022/08/AdobeStock_461738919.webp
+ model-index:
+ - name: Caramelinho
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 21.48
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 22.11
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 25.15
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 48.97
+       name: f1-macro
+     - type: pearson
+       value: 19.38
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.92
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 33.97
+       name: f1-macro
+     - type: f1_macro
+       value: 46.57
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 56.31
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Bruno/Caramelinho
+       name: Open Portuguese LLM Leaderboard
  ---
  <!-- header start -->
  <div style="width: 100%;">
@@ -122,3 +242,20 @@ Os computadores quânticos são um tipo de computador cuja arquitetura é basead
  - Pytorch 2.0.1+cu118
  - Datasets 2.12.0
  - Tokenizers 0.13.3
+
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Bruno/Caramelinho)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**35.32**|
+ |ENEM Challenge (No Images)|    21.48|
+ |BLUEX (No Images)         |    22.11|
+ |OAB Exams                 |    25.15|
+ |Assin2 RTE                |    48.97|
+ |Assin2 STS                |    19.38|
+ |FaQuAD NLI                |    43.92|
+ |HateBR Binary             |    33.97|
+ |PT Hate Speech Binary     |    46.57|
+ |tweetSentBR               |    56.31|
+
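
The `Average` row in the added table can be sanity-checked against the nine per-task scores. The sketch below assumes the average is a plain unweighted mean rounded to two decimals; the PR does not say how the leaderboard computes it, and `leaderboard_average` is a hypothetical helper name, not part of any leaderboard tooling.

```python
# Per-task scores copied from the results table added to README.md in this PR.
scores = {
    "ENEM Challenge (No Images)": 21.48,
    "BLUEX (No Images)": 22.11,
    "OAB Exams": 25.15,
    "Assin2 RTE": 48.97,
    "Assin2 STS": 19.38,
    "FaQuAD NLI": 43.92,
    "HateBR Binary": 33.97,
    "PT Hate Speech Binary": 46.57,
    "tweetSentBR": 56.31,
}


def leaderboard_average(task_scores):
    """Unweighted mean of the per-task scores, rounded to two decimals.

    Assumption: every task carries equal weight in the average.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


if __name__ == "__main__":
    print(f"Average over {len(scores)} tasks: {leaderboard_average(scores)}")
```

Under that assumption the computed value agrees with the **35.32** reported in the table.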