Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
  1. README.md (+167 -1)
README.md CHANGED
@@ -1,6 +1,153 @@
  ---
  library_name: transformers
  tags: []
+ model-index:
+ - name: portuguese-Phi3-Tom-Cat-128k-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 51.15
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 42.56
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 39.86
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 88.86
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 68.0
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 45.16
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.92
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 65.76
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 53.32
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct
+       name: Open Portuguese LLM Leaderboard
  ---

  # Model Card for Model ID
@@ -196,4 +343,23 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  ## Model Card Contact

- [More Information Needed]
+ [More Information Needed]
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/portuguese-Phi3-Tom-Cat-128k-instruct) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
+
+ | Metric                     | Value     |
+ |----------------------------|-----------|
+ | Average                    | **60.06** |
+ | ENEM Challenge (No Images) | 51.15     |
+ | BLUEX (No Images)          | 42.56     |
+ | OAB Exams                  | 39.86     |
+ | Assin2 RTE                 | 88.86     |
+ | Assin2 STS                 | 68.00     |
+ | FaQuAD NLI                 | 45.16     |
+ | HateBR Binary              | 85.92     |
+ | PT Hate Speech Binary      | 65.76     |
+ | tweetSentBR                | 53.32     |
+
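As a quick sanity check on the results table, the headline Average can be recomputed from the nine per-task scores. This is a sketch that assumes the leaderboard's Average is the unweighted mean of all nine tasks; the leaderboard presumably averages the unrounded scores, so the mean of the rounded table values differs in the last decimal place.

```python
# Recompute the leaderboard "Average" from the per-task scores above.
# Assumption: the Average is an unweighted mean over all nine tasks.
scores = {
    "ENEM Challenge (No Images)": 51.15,
    "BLUEX (No Images)": 42.56,
    "OAB Exams": 39.86,
    "Assin2 RTE": 88.86,
    "Assin2 STS": 68.00,
    "FaQuAD NLI": 45.16,
    "HateBR Binary": 85.92,
    "PT Hate Speech Binary": 65.76,
    "tweetSentBR": 53.32,
}

average = sum(scores.values()) / len(scores)
# Mean of the rounded scores: 60.07; the leaderboard reports 60.06,
# consistent up to per-task rounding.
print(f"{average:.2f}")
```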