leaderboard-pt-pr-bot committed on
Commit
cd4ad4d
1 Parent(s): fa8e4eb

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +150 -13
README.md CHANGED
@@ -1,31 +1,31 @@
  ---
- license: apache-2.0
- datasets:
- - nicholasKluge/instruct-aira-dataset-v2
  language:
  - pt
- metrics:
- - accuracy
+ license: apache-2.0
  library_name: transformers
- pipeline_tag: text-generation
  tags:
  - alignment
  - instruction tuned
  - text generation
  - conversation
  - assistant
+ datasets:
+ - nicholasKluge/instruct-aira-dataset-v2
+ metrics:
+ - accuracy
+ pipeline_tag: text-generation
  widget:
- - text: "<s><instruction>Cite algumas bandas de rock famosas da década de 1960.</instruction>"
+ - text: <s><instruction>Cite algumas bandas de rock famosas da década de 1960.</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Quantos planetas existem no sistema solar?</instruction>"
+ - text: <s><instruction>Quantos planetas existem no sistema solar?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Qual é o futuro do ser humano?</instruction>"
+ - text: <s><instruction>Qual é o futuro do ser humano?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Qual o sentido da vida?</instruction>"
+ - text: <s><instruction>Qual o sentido da vida?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Como imprimir hello world em python?</instruction>"
+ - text: <s><instruction>Como imprimir hello world em python?</instruction>
    example_title: Exemplo
- - text: "<s><instruction>Invente uma história sobre um encanador com poderes mágicos.</instruction>"
+ - text: <s><instruction>Invente uma história sobre um encanador com poderes mágicos.</instruction>
    example_title: Exemplo
  inference:
    parameters:
@@ -42,6 +42,127 @@ co2_eq_emissions:
    training_type: fine-tuning
    geographical_location: United States of America
    hardware_used: NVIDIA A100-SXM4-40GB
+ model-index:
+ - name: TeenyTinyLlama-460m-Chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 20.29
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 25.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 26.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.77
+       name: f1-macro
+     - type: pearson
+       value: 4.52
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 34.0
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 33.49
+       name: f1-macro
+     - type: f1_macro
+       value: 22.99
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 18.13
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicholasKluge/TeenyTinyLlama-460m-Chat
+       name: Open Portuguese LLM Leaderboard
  ---
  # TeenyTinyLlama-460m-Chat
  
@@ -233,4 +354,20 @@ This repository was built as part of the RAIES ([Rede de Inteligência Artificia
  
  ## License
  
- TeenyTinyLlama-460m-Chat is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ TeenyTinyLlama-460m-Chat is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicholasKluge/TeenyTinyLlama-460m-Chat)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**25.49**|
+ |ENEM Challenge (No Images)|    20.29|
+ |BLUEX (No Images)         |    25.45|
+ |OAB Exams                 |    26.74|
+ |Assin2 RTE                |    43.77|
+ |Assin2 STS                |     4.52|
+ |FaQuAD NLI                |       34|
+ |HateBR Binary             |    33.49|
+ |PT Hate Speech Binary     |    22.99|
+ |tweetSentBR               |    18.13|
+
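The Average row in the table above appears to be the unweighted mean of the nine task scores, rounded to two decimals. That averaging rule is an assumption inferred from the numbers, not something the PR states. A minimal Python sketch to check it, with the scores copied from the table and a hypothetical helper named `leaderboard_average`:

```python
# Scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 20.29,
    "BLUEX (No Images)": 25.45,
    "OAB Exams": 26.74,
    "Assin2 RTE": 43.77,
    "Assin2 STS": 4.52,
    "FaQuAD NLI": 34.0,
    "HateBR Binary": 33.49,
    "PT Hate Speech Binary": 22.99,
    "tweetSentBR": 18.13,
}


def leaderboard_average(task_scores):
    """Unweighted arithmetic mean rounded to two decimals.

    Assumption: the leaderboard weights every task equally.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


average = leaderboard_average(scores)
print(average)  # 25.49, matching the Average row
```

The table lists nine tasks while the `model-index` block has seven dataset entries: the Assin2 STS and PT Hate Speech Binary scores are recorded there as extra metrics under the Assin2 RTE and HateBR Binary entries.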