Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1)
1. README.md +176 -13
README.md CHANGED
@@ -1,31 +1,31 @@
  ---
- license: apache-2.0
- datasets:
- - nicholasKluge/instruct-aira-dataset-v2
  language:
  - pt
- metrics:
- - accuracy
+ license: apache-2.0
  library_name: transformers
- pipeline_tag: text-generation
  tags:
  - alignment
  - instruction tuned
  - text generation
  - conversation
  - assistant
+ datasets:
+ - nicholasKluge/instruct-aira-dataset-v2
+ metrics:
+ - accuracy
+ pipeline_tag: text-generation
  widget:
- - text: "<s><instruction>Cite algumas bandas de rock famosas da década de 1960.</instruction>"
+ - text: <s><instruction>Cite algumas bandas de rock famosas da década de 1960.</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Quantos planetas existem no sistema solar?</instruction>"
+ - text: <s><instruction>Quantos planetas existem no sistema solar?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Qual é o futuro do ser humano?</instruction>"
+ - text: <s><instruction>Qual é o futuro do ser humano?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Qual o sentido da vida?</instruction>"
+ - text: <s><instruction>Qual o sentido da vida?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Como imprimir hello world em python?</instruction>"
+ - text: <s><instruction>Como imprimir hello world em python?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Invente uma história sobre um encanador com poderes mágicos.</instruction>"
+ - text: <s><instruction>Invente uma história sobre um encanador com poderes mágicos.</instruction>
    example_title: Exemplo
  inference:
    parameters:
@@ -42,6 +42,153 @@ co2_eq_emissions:
    training_type: fine-tuning
    geographical_location: United States of America
    hardware_used: NVIDIA A100-SXM4-40GB
+ model-index:
+ - name: TeenyTinyLlama-460m-Chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 20.29
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 25.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 26.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.77
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 4.52
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 34.0
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 33.49
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 22.99
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 18.13
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
  ---
  # TeenyTinyLlama-460m-Chat

@@ -233,4 +380,20 @@ This repository was built as part of the RAIES ([Rede de Inteligência Artificia

  ## License

- TeenyTinyLlama-460m-Chat is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ TeenyTinyLlama-460m-Chat is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicholasKluge/TeenyTinyLlama-460m-Chat)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**25.49**|
+ |ENEM Challenge (No Images)| 20.29|
+ |BLUEX (No Images) | 25.45|
+ |OAB Exams | 26.74|
+ |Assin2 RTE | 43.77|
+ |Assin2 STS | 4.52|
+ |FaQuAD NLI | 34|
+ |HateBR Binary | 33.49|
+ |PT Hate Speech Binary | 22.99|
+ |tweetSentBR | 18.13|
+
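A quick sanity check on the table above: assuming the leaderboard's "Average" is the plain unweighted mean of the nine task scores (an assumption, since the diff itself does not say how it is computed), the numbers line up. A minimal sketch:

```python
# Sanity check: recompute the "Average" from the nine task scores in the table above,
# assuming it is the unweighted mean (an assumption, not stated in this PR).
scores = {
    "ENEM Challenge (No Images)": 20.29,
    "BLUEX (No Images)": 25.45,
    "OAB Exams": 26.74,
    "Assin2 RTE": 43.77,
    "Assin2 STS": 4.52,
    "FaQuAD NLI": 34.0,
    "HateBR Binary": 33.49,
    "PT Hate Speech Binary": 22.99,
    "tweetSentBR": 18.13,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 25.49, matching the reported average
```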
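For anyone arriving at this PR from the leaderboard and wanting to try the model, here is a minimal, hypothetical usage sketch based only on the metadata in the diff above (`library_name: transformers`, `pipeline_tag: text-generation`, and the `<s><instruction>...</instruction>` prompt format shown in the widget examples). The model card's own usage section remains the authoritative reference; the sampling parameters below are placeholders.

```python
# Minimal sketch (not the model card's official example): query the chat model
# using the <s><instruction>...</instruction> prompt format from the widget examples.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="nicholasKluge/TeenyTinyLlama-460m-Chat",
)

prompt = "<s><instruction>Quantos planetas existem no sistema solar?</instruction>"
# Sampling settings are placeholders, not values recommended by the authors.
output = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.3)
print(output[0]["generated_text"])
```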