leaderboard-pt-pr-bot committed on
Commit d86b9ab • 1 Parent(s): 8c87091

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -1,14 +1,14 @@
  ---
- base_model: google/gemma-2-9b-it
+ license: mit
  tags:
  - alignment-handbook
  - generated_from_trainer
+ base_model: google/gemma-2-9b-it
  datasets:
  - princeton-nlp/gemma2-ultrafeedback-armorm
  model-index:
- - name: princeton-nlp/gemma-2-9b-it-SimPO
+ - name: princeton-nlp/gemma-2-9b-it-SimPO
  results: []
- license: mit
  ---

  # gemma-2-9b-it-SimPO Model Card
@@ -135,4 +135,23 @@ ArmoRM paper:
  journal={arXiv preprint arXiv:2406.12845},
  year={2024}
  }
- ```
+ ```
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/princeton-nlp/gemma-2-9b-it-SimPO) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**73.28**|
+ |ENEM Challenge (No Images)| 75.09|
+ |BLUEX (No Images) | 65.37|
+ |OAB Exams | 54.21|
+ |Assin2 RTE | 93.82|
+ |Assin2 STS | 77.82|
+ |FaQuAD NLI | 70.45|
+ |HateBR Binary | 89.76|
+ |PT Hate Speech Binary | 66.68|
+ |tweetSentBR | 66.28|
+
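
The Average row in the table above can be checked against the nine per-task scores. The sketch below is a minimal sanity check, assuming the leaderboard reports an unweighted arithmetic mean rounded to two decimals; the PR text does not state this. The names `scores` and `unweighted_average` are hypothetical and used only for illustration.

```python
# Per-task scores copied from the table above.
scores = {
    "ENEM Challenge (No Images)": 75.09,
    "BLUEX (No Images)": 65.37,
    "OAB Exams": 54.21,
    "Assin2 RTE": 93.82,
    "Assin2 STS": 77.82,
    "FaQuAD NLI": 70.45,
    "HateBR Binary": 89.76,
    "PT Hate Speech Binary": 66.68,
    "tweetSentBR": 66.28,
}


def unweighted_average(values):
    """Return the arithmetic mean of the values, rounded to two decimals.

    Assumption: every task carries equal weight in the reported average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = unweighted_average(scores.values())
print(average)
```

Under that equal-weight assumption the computed value agrees with the reported 73.28.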