Commit
db885c8
1 Parent(s): 5679cd8

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#5)


- Adding the Open Portuguese LLM Leaderboard Evaluation Results (8c082fc653c6b8c99c848aa93663a8141b7ebca8)


Co-authored-by: Open PT LLM Leaderboard PR Bot <leaderboard-pt-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +138 -0
README.md CHANGED
@@ -1,6 +1,127 @@
1
  ---
2
  language:
3
  - pt
4
  ---
5
 
6
  Sabiá-7B is Portuguese language model developed by [Maritaca AI](https://www.maritaca.ai/).
@@ -117,3 +238,20 @@ Please use the following bibtex to cite our paper:
117
  isbn="978-3-031-45392-2"
118
  }
119
  ```
1
  ---
2
  language:
3
  - pt
4
+ model-index:
5
+ - name: sabia-7b
6
+   results:
7
+   - task:
8
+       type: text-generation
9
+       name: Text Generation
10
+     dataset:
11
+       name: ENEM Challenge (No Images)
12
+       type: eduagarcia/enem_challenge
13
+       split: train
14
+       args:
15
+         num_few_shot: 3
16
+     metrics:
17
+     - type: acc
18
+       value: 55.07
19
+       name: accuracy
20
+     source:
21
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
22
+       name: Open Portuguese LLM Leaderboard
23
+   - task:
24
+       type: text-generation
25
+       name: Text Generation
26
+     dataset:
27
+       name: BLUEX (No Images)
28
+       type: eduagarcia-temp/BLUEX_without_images
29
+       split: train
30
+       args:
31
+         num_few_shot: 3
32
+     metrics:
33
+     - type: acc
34
+       value: 47.71
35
+       name: accuracy
36
+     source:
37
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
38
+       name: Open Portuguese LLM Leaderboard
39
+   - task:
40
+       type: text-generation
41
+       name: Text Generation
42
+     dataset:
43
+       name: OAB Exams
44
+       type: eduagarcia/oab_exams
45
+       split: train
46
+       args:
47
+         num_few_shot: 3
48
+     metrics:
49
+     - type: acc
50
+       value: 41.41
51
+       name: accuracy
52
+     source:
53
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
54
+       name: Open Portuguese LLM Leaderboard
55
+   - task:
56
+       type: text-generation
57
+       name: Text Generation
58
+     dataset:
59
+       name: Assin2 RTE
60
+       type: assin2
61
+       split: test
62
+       args:
63
+         num_few_shot: 15
64
+     metrics:
65
+     - type: f1_macro
66
+       value: 46.68
67
+       name: f1-macro
68
+     - type: pearson
69
+       value: 1.89
70
+       name: pearson
71
+     source:
72
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
73
+       name: Open Portuguese LLM Leaderboard
74
+   - task:
75
+       type: text-generation
76
+       name: Text Generation
77
+     dataset:
78
+       name: FaQuAD NLI
79
+       type: ruanchaves/faquad-nli
80
+       split: test
81
+       args:
82
+         num_few_shot: 15
83
+     metrics:
84
+     - type: f1_macro
85
+       value: 58.34
86
+       name: f1-macro
87
+     source:
88
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
89
+       name: Open Portuguese LLM Leaderboard
90
+   - task:
91
+       type: text-generation
92
+       name: Text Generation
93
+     dataset:
94
+       name: HateBR Binary
95
+       type: eduagarcia/portuguese_benchmark
96
+       split: test
97
+       args:
98
+         num_few_shot: 25
99
+     metrics:
100
+     - type: f1_macro
101
+       value: 61.93
102
+       name: f1-macro
103
+     - type: f1_macro
104
+       value: 64.13
105
+       name: f1-macro
106
+     source:
107
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
108
+       name: Open Portuguese LLM Leaderboard
109
+   - task:
110
+       type: text-generation
111
+       name: Text Generation
112
+     dataset:
113
+       name: tweetSentBR
114
+       type: eduagarcia-temp/tweetsentbr
115
+       split: test
116
+       args:
117
+         num_few_shot: 25
118
+     metrics:
119
+     - type: f1_macro
120
+       value: 46.64
121
+       name: f1-macro
122
+     source:
123
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=maritaca-ai/sabia-7b
124
+       name: Open Portuguese LLM Leaderboard
  ---
126
 
127
  Sabiá-7B is Portuguese language model developed by [Maritaca AI](https://www.maritaca.ai/).
 
238
  isbn="978-3-031-45392-2"
239
  }
240
  ```
241
+
242
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
243
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/maritaca-ai/sabia-7b)
244
+
245
+ | Metric | Value |
246
+ |--------------------------|---------|
247
+ |Average |**47.09**|
248
+ |ENEM Challenge (No Images)| 55.07|
249
+ |BLUEX (No Images) | 47.71|
250
+ |OAB Exams | 41.41|
251
+ |Assin2 RTE | 46.68|
252
+ |Assin2 STS | 1.89|
253
+ |FaQuAD NLI | 58.34|
254
+ |HateBR Binary | 61.93|
255
+ |PT Hate Speech Binary | 64.13|
256
+ |tweetSentBR | 46.64|
257
+
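
Reviewer note (not part of the commit): the `Average` row in the added table is consistent with an unweighted mean of the nine task scores. The snippet below is a minimal sketch of that check, assuming the leaderboard weights every task equally; the commit itself does not state how the average is computed, and the variable names are illustrative only.

```python
# Scores copied from the results table added to README.md in this commit.
scores = {
    "ENEM Challenge (No Images)": 55.07,
    "BLUEX (No Images)": 47.71,
    "OAB Exams": 41.41,
    "Assin2 RTE": 46.68,
    "Assin2 STS": 1.89,
    "FaQuAD NLI": 58.34,
    "HateBR Binary": 61.93,
    "PT Hate Speech Binary": 64.13,
    "tweetSentBR": 46.64,
}

# Assumption: "Average" is the plain arithmetic mean over all nine tasks.
average = sum(scores.values()) / len(scores)

print(f"{average:.2f}")  # 47.09, matching the Average row in the table
```

The very low Assin2 STS score (1.89) pulls the mean down noticeably, so the average should be read alongside the per-task values.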