Adding the Open Portuguese LLM Leaderboard Evaluation Results

#4
Files changed (1)
  1. README.md +138 -0
README.md CHANGED
@@ -1,3 +1,141 @@
  ---
  license: llama2
+ model-index:
+ - name: cabrita_7b_pt_850000
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 22.53
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 23.09
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 29.2
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 33.33
+       name: f1-macro
+     - type: pearson
+       value: 12.65
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 17.72
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 55.98
+       name: f1-macro
+     - type: f1_macro
+       value: 49.02
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 45.75
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/cabrita_7b_pt_850000
+       name: Open Portuguese LLM Leaderboard
  ---
+
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/22h/cabrita_7b_pt_850000)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**32.14**|
+ |ENEM Challenge (No Images)|    22.53|
+ |BLUEX (No Images)         |    23.09|
+ |OAB Exams                 |    29.20|
+ |Assin2 RTE                |    33.33|
+ |Assin2 STS                |    12.65|
+ |FaQuAD NLI                |    17.72|
+ |HateBR Binary             |    55.98|
+ |PT Hate Speech Binary     |    49.02|
+ |tweetSentBR               |    45.75|
+
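
The Average row in the table above appears to be the unweighted mean of the nine per-task scores. The sketch below checks that arithmetic under that assumption. The `scores` dictionary and the variable names are illustrative only and are not part of the leaderboard's own tooling.

```python
# Per-task scores copied from the results table in the README diff.
# Assumption: "Average" is the plain (unweighted) mean of these nine values.
scores = {
    "ENEM Challenge (No Images)": 22.53,
    "BLUEX (No Images)": 23.09,
    "OAB Exams": 29.20,
    "Assin2 RTE": 33.33,
    "Assin2 STS": 12.65,
    "FaQuAD NLI": 17.72,
    "HateBR Binary": 55.98,
    "PT Hate Speech Binary": 49.02,
    "tweetSentBR": 45.75,
}

# Unweighted mean over all tasks, formatted to two decimals like the table.
average = sum(scores.values()) / len(scores)
average_text = f"{average:.2f}"
print(average_text)
```

The formatted value matches the 32.14 reported in the table, which is consistent with a simple mean. Note that the table lists nine scores, while the `model-index` metadata groups the Assin2 STS and PT Hate Speech Binary values under the Assin2 RTE and HateBR Binary entries respectively.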