leaderboard-pr-bot committed
Commit: 04f5aaf
1 Parent(s): de2981a

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1):
1. README.md (+18 -5)
README.md CHANGED

```diff
@@ -1,16 +1,16 @@
 ---
+language:
+- en
 license: llama3
 library_name: peft
 tags:
 - generated_from_trainer
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
+datasets:
+- Norquinal/claude_multi_instruct_30k
 model-index:
 - name: llama-3-8b-claudstruct-v3
   results: []
-datasets:
-- Norquinal/claude_multi_instruct_30k
-language:
-- en
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -163,4 +163,17 @@ The following hyperparameters were used during training:
 - Transformers 4.41.1
 - Pytorch 2.3.0
 - Datasets 2.19.1
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jrahn__llama-3-8b-claudstruct-v3)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |65.62|
+|AI2 Reasoning Challenge (25-Shot)|58.96|
+|HellaSwag (10-Shot)              |80.05|
+|MMLU (5-Shot)                    |64.55|
+|TruthfulQA (0-shot)              |51.76|
+|Winogrande (5-shot)              |74.19|
+|GSM8k (5-shot)                   |64.22|
+
```
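The `Avg.` row in the added table is simply the arithmetic mean of the six benchmark scores. A minimal sanity-check sketch (not part of this PR) that reproduces it:

```python
# Sanity check (not part of this PR): the leaderboard "Avg." column is the
# arithmetic mean of the six benchmark scores in the table above.
scores = {
    "ARC (25-shot)": 58.96,
    "HellaSwag (10-shot)": 80.05,
    "MMLU (5-shot)": 64.55,
    "TruthfulQA (0-shot)": 51.76,
    "Winogrande (5-shot)": 74.19,
    "GSM8k (5-shot)": 64.22,
}

avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # prints 65.62, matching the Avg. row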
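The "Detailed results" link in the added section points at a per-model details dataset on the Hub. A minimal sketch of one way to inspect it, assuming only the standard `datasets` client; the config and split layout of leaderboard details repos varies per run, so the sketch discovers names at runtime rather than hard-coding them:

```python
# Sketch (assumption: the details repo loads as a standard Hub dataset;
# config and split names differ per evaluation run, so discover them).
from datasets import get_dataset_config_names, load_dataset

repo = "open-llm-leaderboard/details_jrahn__llama-3-8b-claudstruct-v3"

configs = get_dataset_config_names(repo)  # roughly one config per benchmark run
print(configs)

ds = load_dataset(repo, configs[0])  # DatasetDict keyed by split name
print(ds)
```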