leaderboard-pr-bot committed on
Commit
a8ba72d
1 Parent(s): ff7e304

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -1,18 +1,18 @@
  ---
+ language:
+ - en
  license: apache-2.0
- base_model: deepseek-ai/deepseek-math-7b-base
  tags:
  - alignment-handbook
  - generated_from_trainer
  - aimo
  - math
+ base_model: deepseek-ai/deepseek-math-7b-base
  datasets:
  - AI-MO/NuminaMath-CoT
  model-index:
  - name: AI-MO/NuminaMath-7B-CoT
    results: []
- language:
- - en
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -141,4 +141,17 @@ The following hyperparameters were used during training:
  - Transformers 4.42.3
  - Pytorch 2.3.0+cu121
  - Datasets 2.18.0
- - Tokenizers 0.19.1
+ - Tokenizers 0.19.1
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-MO__NuminaMath-7B-CoT)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |12.95|
+ |IFEval (0-Shot)    |26.89|
+ |BBH (3-Shot)       |19.15|
+ |MATH Lvl 5 (4-Shot)| 7.93|
+ |GPQA (0-shot)      | 2.13|
+ |MuSR (0-shot)      | 0.83|
+ |MMLU-PRO (5-shot)  |20.76|
+
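
As a quick sanity check on the added table, the sketch below recomputes the `Avg.` row from the six benchmark scores. It assumes that `Avg.` is the unweighted arithmetic mean of the other rows, rounded to two decimals. The PR does not say how the leaderboard derives this value, so treat that as an assumption. The helper name `mean_score` is hypothetical and introduced only for illustration.

```python
# Scores copied verbatim from the results table added by this PR.
scores = {
    "IFEval (0-Shot)": 26.89,
    "BBH (3-Shot)": 19.15,
    "MATH Lvl 5 (4-Shot)": 7.93,
    "GPQA (0-shot)": 2.13,
    "MuSR (0-shot)": 0.83,
    "MMLU-PRO (5-shot)": 20.76,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals like the table.

    Assumption: the leaderboard's 'Avg.' is a plain mean of the six rows.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")
```

Under that assumption the recomputed value agrees with the `Avg.` row reported in the table (12.95).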