
Adding the Open Portuguese LLM Leaderboard Evaluation Results

#12
Files changed (1)
  1. README.md +168 -2
README.md CHANGED
@@ -1,10 +1,157 @@
 ---
-library_name: transformers
 license: llama3
+library_name: transformers
 datasets:
 - aqua_rat
 - microsoft/orca-math-word-problems-200k
 - m-a-p/CodeFeedback-Filtered-Instruction
+model-index:
+- name: Smaug-Llama-3-70B-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 77.89
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 69.54
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 63.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 93.62
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 78.52
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 80.01
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 91.78
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 68.36
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 70.29
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # Smaug-Llama-3-70B-Instruct
@@ -148,4 +295,23 @@ The score for both Llama-3 and this model are significantly different when evalu
 with the updated harness as the issue with stop words has been addressed.
 
 
-This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.
+This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/abacusai/Smaug-Llama-3-70B-Instruct) and on the [πŸš€ Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**77.07**|
+|ENEM Challenge (No Images)| 77.89|
+|BLUEX (No Images) | 69.54|
+|OAB Exams | 63.64|
+|Assin2 RTE | 93.62|
+|Assin2 STS | 78.52|
+|FaQuAD NLI | 80.01|
+|HateBR Binary | 91.78|
+|PT Hate Speech Binary | 68.36|
+|tweetSentBR | 70.29|
+
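Note for reviewers (not part of the diff itself): the `model-index` block added above is machine-readable card metadata, so the scores in the table can be pulled back out programmatically once the PR is merged. Below is a minimal sketch assuming the `huggingface_hub` `ModelCard` API; the repo id is the one this card describes, and the average re-derivation is only there to show how the leaderboard's 77.07 relates to the nine per-task scores.

```python
# Minimal sketch: read the model-index entries this PR adds and re-derive the
# leaderboard average. Assumes `huggingface_hub` is installed and the metadata
# has been merged into the abacusai/Smaug-Llama-3-70B-Instruct model card.
from huggingface_hub import ModelCard

card = ModelCard.load("abacusai/Smaug-Llama-3-70B-Instruct")

# `eval_results` is parsed from the `model-index` block in the YAML front matter.
results = card.data.eval_results or []
for r in results:
    print(f"{r.dataset_name:<28} {r.metric_type:<9} {r.metric_value}")

# The leaderboard "Average" is the mean of the nine task scores (77.07 here).
if results:
    print("Average:", round(sum(r.metric_value for r in results) / len(results), 2))
```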