leaderboard-pt-pr-bot committed
Commit 6c29478
1 Parent(s): a5390a4

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +142 -5
README.md CHANGED
@@ -1,4 +1,8 @@
  ---
+ language:
+ - pt
+ - en
+ license: mit
  library_name: peft
  tags:
  - Phi-2B
@@ -6,16 +10,133 @@ tags:
  - Bode
  - LLM
  - Alpaca
- license: mit
- language:
- - pt
- - en
  metrics:
  - accuracy
  - f1
  - precision
  - recall
  pipeline_tag: text-generation
+ model-index:
+ - name: Phi-Bode
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 33.94
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 25.31
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 28.56
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 68.1
+       name: f1-macro
+     - type: pearson
+       value: 30.57
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 60.51
+       name: f1-macro
+     - type: f1_macro
+       value: 54.6
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 46.78
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/Phi-Bode
+       name: Open Portuguese LLM Leaderboard
  ---

  # Phi-Bode
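The `model-index` block added above follows the standard Hugging Face model-card metadata schema, so the leaderboard scores become machine-readable once the PR is merged. As a minimal sketch, assuming the `huggingface_hub` Python client and that this PR has already been merged into `recogna-nlp/Phi-Bode`, the results could be read back like this:

```python
# Minimal sketch: read the evaluation results that this PR adds to the
# model card metadata. Assumes huggingface_hub is installed and the PR
# has been merged into the main branch of recogna-nlp/Phi-Bode.
from huggingface_hub import ModelCard

card = ModelCard.load("recogna-nlp/Phi-Bode")

# card.data.eval_results is parsed from the model-index block shown above;
# each entry exposes the dataset name, metric type, and metric value.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```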
 
@@ -110,4 +231,20 @@ Se você deseja utilizar o Phi-Bode em sua pesquisa, cite-o da seguinte maneira:
  doi = { 10.57967/hf/1880 },
  publisher = { Hugging Face }
  }
- ```
+ ```
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/Phi-Bode)
+
+ | Metric                   |  Value  |
+ |--------------------------|---------|
+ |Average                   |**43.59**|
+ |ENEM Challenge (No Images)|    33.94|
+ |BLUEX (No Images)         |    25.31|
+ |OAB Exams                 |    28.56|
+ |Assin2 RTE                |    68.10|
+ |Assin2 STS                |    30.57|
+ |FaQuAD NLI                |    43.97|
+ |HateBR Binary             |    60.51|
+ |PT Hate Speech Binary     |    54.60|
+ |tweetSentBR               |    46.78|
+
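For a quick sanity check, the Average row in the added table appears to be the simple arithmetic mean of the nine per-task scores; the plain-Python snippet below, with values copied from the table, reproduces the reported 43.59:

```python
# Sanity check: the reported Average should equal the plain mean of the
# nine per-task scores listed in the table above.
scores = {
    "ENEM Challenge (No Images)": 33.94,
    "BLUEX (No Images)": 25.31,
    "OAB Exams": 28.56,
    "Assin2 RTE": 68.10,
    "Assin2 STS": 30.57,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 60.51,
    "PT Hate Speech Binary": 54.60,
    "tweetSentBR": 46.78,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # -> 43.59
```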