Commit 0938d09 by leaderboard-pt-pr-bot (1 parent: 54078ab)

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +140 -3
README.md CHANGED
@@ -1,9 +1,130 @@
  ---
- datasets:
- - cnmoro/WizardVicuna-PTBR-Instruct-Clean
  language:
  - en
  - pt
  ---

  This is a finetuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [unsloth](https://github.com/unslothai/unsloth) on a instruct portuguese dataset, as an attempt to improve the performance of the model on the language.
@@ -14,4 +135,20 @@ The original prompt format was used:

  ```plaintext
  <s>[INST] {Prompt goes here} [/INST]
- ```
@@ -1,9 +1,130 @@
  ---
  language:
  - en
  - pt
+ datasets:
+ - cnmoro/WizardVicuna-PTBR-Instruct-Clean
+ model-index:
+ - name: Mistral-7B-Portuguese
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 58.08
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 48.68
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 37.08
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 90.31
+       name: f1-macro
+     - type: pearson
+       value: 76.55
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 58.84
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 79.21
+       name: f1-macro
+     - type: f1_macro
+       value: 68.87
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 64.71
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
+       name: Open Portuguese LLM Leaderboard
  ---

  This is a finetuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [unsloth](https://github.com/unslothai/unsloth) on a instruct portuguese dataset, as an attempt to improve the performance of the model on the language.
@@ -14,4 +135,20 @@ The original prompt format was used:

  ```plaintext
  <s>[INST] {Prompt goes here} [/INST]
+ ```
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Mistral-7B-Portuguese)
+
+ |          Metric          | Value  |
+ |--------------------------|--------|
+ |Average                   |**64.7**|
+ |ENEM Challenge (No Images)|   58.08|
+ |BLUEX (No Images)         |   48.68|
+ |OAB Exams                 |   37.08|
+ |Assin2 RTE                |   90.31|
+ |Assin2 STS                |   76.55|
+ |FaQuAD NLI                |   58.84|
+ |HateBR Binary             |   79.21|
+ |PT Hate Speech Binary     |   68.87|
+ |tweetSentBR               |   64.71|
+
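
The Average row in the table above can be sanity-checked against the nine task scores. The sketch below assumes the leaderboard reports a plain unweighted mean, which the PR does not state explicitly; the `scores` dict and the `unweighted_mean` helper are illustrative names, not part of the bot or the leaderboard code.

```python
# Task scores copied from the results table in this PR (percentages).
scores = {
    "ENEM Challenge (No Images)": 58.08,
    "BLUEX (No Images)": 48.68,
    "OAB Exams": 37.08,
    "Assin2 RTE": 90.31,
    "Assin2 STS": 76.55,
    "FaQuAD NLI": 58.84,
    "HateBR Binary": 79.21,
    "PT Hate Speech Binary": 68.87,
    "tweetSentBR": 64.71,
}


def unweighted_mean(values):
    """Return the arithmetic mean; assumes every task counts equally."""
    values = list(values)
    return sum(values) / len(values)


average = unweighted_mean(scores.values())
print(f"Average: {average:.1f}")  # prints "Average: 64.7"
```

Under that assumption the nine scores sum to 582.33, and dividing by 9 gives roughly 64.70, which matches the reported **64.7**.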