leaderboard-pt-pr-bot committed on
Commit 24a9f8b
1 Parent(s): 1a6ccc1

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions
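Each leaderboard score is stored in the card's YAML `model-index` metadata as one entry per task, with `task`, `dataset`, `metrics` and `source` fields (see the README.md change). As an illustration only, here is a minimal Python sketch that builds such entries and flattens them back into (dataset, metric, value) rows. It assumes the metadata is handled as plain dicts, for example after parsing it with a YAML loader (not shown). `make_result` and `flatten` are hypothetical helpers, not the code the bot uses.

```python
from typing import Any


# Hypothetical helper: builds one model-index result entry with the same
# fields this PR writes (task, dataset, metrics, source). Not the bot's code.
def make_result(dataset_name: str, dataset_type: str, split: str,
                num_few_shot: int, metrics: list[tuple[str, float, str]],
                model_id: str) -> dict[str, Any]:
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "split": split,
            "args": {"num_few_shot": num_few_shot},
        },
        # One dict per (metric type, value, display name) triple.
        "metrics": [
            {"type": mtype, "value": value, "name": name}
            for mtype, value, name in metrics
        ],
        "source": {
            "url": "https://huggingface.co/spaces/eduagarcia/"
                   f"open_pt_llm_leaderboard?query={model_id}",
            "name": "Open Portuguese LLM Leaderboard",
        },
    }


# Hypothetical helper: turns nested entries into flat (dataset, metric, value) rows.
def flatten(results: list[dict[str, Any]]) -> list[tuple[str, str, float]]:
    rows = []
    for entry in results:
        for metric in entry["metrics"]:
            rows.append((entry["dataset"]["name"], metric["type"], metric["value"]))
    return rows


# Usage with two of the entries from this PR.
model_id = "Qwen/Qwen1.5-72B-Chat"
results = [
    make_result("ENEM Challenge (No Images)", "eduagarcia/enem_challenge",
                "train", 3, [("acc", 77.05, "accuracy")], model_id),
    make_result("Assin2 RTE", "assin2", "test", 15,
                [("f1_macro", 92.8, "f1-macro"), ("pearson", 78.2, "pearson")],
                model_id),
]
for row in flatten(results):
    print(row)
```

A single entry can carry several metrics, as the Assin2 RTE entry does. Flattening therefore yields one row per metric rather than one row per dataset.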

Files changed (1)
  1. README.md +142 -6
README.md CHANGED
@@ -1,13 +1,133 @@
  ---
- license: other
- license_name: tongyi-qianwen
- license_link: >-
-   https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
  language:
  - en
- pipeline_tag: text-generation
+ license: other
  tags:
  - chat
+ license_name: tongyi-qianwen
+ license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
+ pipeline_tag: text-generation
+ model-index:
+ - name: Qwen1.5-72B-Chat
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 77.05
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 67.59
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 55.31
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 92.8
+       name: f1-macro
+     - type: pearson
+       value: 78.2
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 79.48
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 86.84
+       name: f1-macro
+     - type: f1_macro
+       value: 62.59
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 68.98
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen1.5-72B-Chat
+       name: Open Portuguese LLM Leaderboard
  ---

  # Qwen1.5-72B-Chat
@@ -95,4 +215,20 @@ If you find our work helpful, feel free to give us a cite.
  journal={arXiv preprint arXiv:2309.16609},
  year={2023}
  }
- ```
+ ```
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Qwen/Qwen1.5-72B-Chat)
+
+ | Metric                   | Value   |
+ |--------------------------|---------|
+ |Average                   |**74.32**|
+ |ENEM Challenge (No Images)|    77.05|
+ |BLUEX (No Images)         |    67.59|
+ |OAB Exams                 |    55.31|
+ |Assin2 RTE                |    92.80|
+ |Assin2 STS                |    78.20|
+ |FaQuAD NLI                |    79.48|
+ |HateBR Binary             |    86.84|
+ |PT Hate Speech Binary     |    62.59|
+ |tweetSentBR               |    68.98|
+
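The **Average** row is consistent with the unweighted mean of the nine task scores in the table. Here is a quick check in Python, with the scores copied from the table above. This is only a sanity check. The page does not state how the leaderboard itself aggregates scores.

```python
# Task scores copied from the results table above.
scores = {
    "ENEM Challenge (No Images)": 77.05,
    "BLUEX (No Images)": 67.59,
    "OAB Exams": 55.31,
    "Assin2 RTE": 92.80,
    "Assin2 STS": 78.20,
    "FaQuAD NLI": 79.48,
    "HateBR Binary": 86.84,
    "PT Hate Speech Binary": 62.59,
    "tweetSentBR": 68.98,
}

# Unweighted mean over all nine tasks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # prints "Average: 74.32"
```

In the `model-index` metadata, the Assin2 STS score (pearson, 78.2) is listed as a second metric under the Assin2 RTE entry. Likewise, the PT Hate Speech Binary score (62.59) is the second `f1_macro` under the HateBR Binary entry. So the table has nine rows, but the metadata has only seven dataset entries.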