leaderboard-pt-pr-bot commited on
Commit
a7a9e15
1 Parent(s): afb5654

Adding the Open Portuguese LLM Leaderboard Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1) hide show
  1. README.md +164 -1
README.md CHANGED
@@ -1,5 +1,152 @@
1
  ---
2
  license: mit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
4
  # Model Card for Model ID
5
 
@@ -194,4 +341,20 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
194
 
195
  ## Model Card Contact
196
 
197
- [More Information Needed]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: mit
3
+ model-index:
4
+ - name: mistral-bode
5
+ results:
6
+ - task:
7
+ type: text-generation
8
+ name: Text Generation
9
+ dataset:
10
+ name: ENEM Challenge (No Images)
11
+ type: eduagarcia/enem_challenge
12
+ split: train
13
+ args:
14
+ num_few_shot: 3
15
+ metrics:
16
+ - type: acc
17
+ value: 47.03
18
+ name: accuracy
19
+ source:
20
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
21
+ name: Open Portuguese LLM Leaderboard
22
+ - task:
23
+ type: text-generation
24
+ name: Text Generation
25
+ dataset:
26
+ name: BLUEX (No Images)
27
+ type: eduagarcia-temp/BLUEX_without_images
28
+ split: train
29
+ args:
30
+ num_few_shot: 3
31
+ metrics:
32
+ - type: acc
33
+ value: 39.78
34
+ name: accuracy
35
+ source:
36
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
37
+ name: Open Portuguese LLM Leaderboard
38
+ - task:
39
+ type: text-generation
40
+ name: Text Generation
41
+ dataset:
42
+ name: OAB Exams
43
+ type: eduagarcia/oab_exams
44
+ split: train
45
+ args:
46
+ num_few_shot: 3
47
+ metrics:
48
+ - type: acc
49
+ value: 33.76
50
+ name: accuracy
51
+ source:
52
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
53
+ name: Open Portuguese LLM Leaderboard
54
+ - task:
55
+ type: text-generation
56
+ name: Text Generation
57
+ dataset:
58
+ name: Assin2 RTE
59
+ type: assin2
60
+ split: test
61
+ args:
62
+ num_few_shot: 15
63
+ metrics:
64
+ - type: f1_macro
65
+ value: 85.66
66
+ name: f1-macro
67
+ source:
68
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
69
+ name: Open Portuguese LLM Leaderboard
70
+ - task:
71
+ type: text-generation
72
+ name: Text Generation
73
+ dataset:
74
+ name: Assin2 STS
75
+ type: eduagarcia/portuguese_benchmark
76
+ split: test
77
+ args:
78
+ num_few_shot: 15
79
+ metrics:
80
+ - type: pearson
81
+ value: 62.15
82
+ name: pearson
83
+ source:
84
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
85
+ name: Open Portuguese LLM Leaderboard
86
+ - task:
87
+ type: text-generation
88
+ name: Text Generation
89
+ dataset:
90
+ name: FaQuAD NLI
91
+ type: ruanchaves/faquad-nli
92
+ split: test
93
+ args:
94
+ num_few_shot: 15
95
+ metrics:
96
+ - type: f1_macro
97
+ value: 56.45
98
+ name: f1-macro
99
+ source:
100
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
101
+ name: Open Portuguese LLM Leaderboard
102
+ - task:
103
+ type: text-generation
104
+ name: Text Generation
105
+ dataset:
106
+ name: HateBR Binary
107
+ type: ruanchaves/hatebr
108
+ split: test
109
+ args:
110
+ num_few_shot: 25
111
+ metrics:
112
+ - type: f1_macro
113
+ value: 73.25
114
+ name: f1-macro
115
+ source:
116
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
117
+ name: Open Portuguese LLM Leaderboard
118
+ - task:
119
+ type: text-generation
120
+ name: Text Generation
121
+ dataset:
122
+ name: PT Hate Speech Binary
123
+ type: hate_speech_portuguese
124
+ split: test
125
+ args:
126
+ num_few_shot: 25
127
+ metrics:
128
+ - type: f1_macro
129
+ value: 63.61
130
+ name: f1-macro
131
+ source:
132
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
133
+ name: Open Portuguese LLM Leaderboard
134
+ - task:
135
+ type: text-generation
136
+ name: Text Generation
137
+ dataset:
138
+ name: tweetSentBR
139
+ type: eduagarcia-temp/tweetsentbr
140
+ split: test
141
+ args:
142
+ num_few_shot: 25
143
+ metrics:
144
+ - type: f1_macro
145
+ value: 53.17
146
+ name: f1-macro
147
+ source:
148
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/mistral-bode
149
+ name: Open Portuguese LLM Leaderboard
150
  ---
151
  # Model Card for Model ID
152
 
 
341
 
342
  ## Model Card Contact
343
 
344
+ [More Information Needed]
345
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
346
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/mistral-bode)
347
+
348
+ | Metric | Value |
349
+ |--------------------------|---------|
350
+ |Average |**57.21**|
351
+ |ENEM Challenge (No Images)| 47.03|
352
+ |BLUEX (No Images) | 39.78|
353
+ |OAB Exams | 33.76|
354
+ |Assin2 RTE | 85.66|
355
+ |Assin2 STS | 62.15|
356
+ |FaQuAD NLI | 56.45|
357
+ |HateBR Binary | 73.25|
358
+ |PT Hate Speech Binary | 63.61|
359
+ |tweetSentBR | 53.17|
360
+