leaderboard-pr-bot committed
Commit 796f0cf
Parent: a102f98

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
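For context, PRs like this one can be opened programmatically against a model repo. The sketch below is a minimal illustration of that flow using the `huggingface_hub` client, not the bot's actual implementation; the repo id and the local `README.md` path are placeholder assumptions.

```python
# Minimal sketch of opening a model-card PR with huggingface_hub.
# Illustrative only: NOT the leaderboard bot's actual code; the repo id
# and README path below are placeholder assumptions.
from huggingface_hub import CommitOperationAdd, create_commit

repo_id = "argilla/notux-8x7b-v1"  # placeholder target repo

# Updated card: the existing README plus the new evaluation-results section.
with open("README.md", "rb") as f:
    new_readme = f.read()

info = create_commit(
    repo_id=repo_id,
    repo_type="model",
    operations=[
        CommitOperationAdd(path_in_repo="README.md", path_or_fileobj=new_readme)
    ],
    commit_message="Adding Evaluation Results",
    create_pr=True,  # open a pull request rather than pushing to main
)
print(info.pr_url)
```

With `create_pr=True`, the commit lands on a new PR branch and the returned `CommitInfo` carries the discussion URL, which is how a bot account can propose card changes without write access to `main`.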

Files changed (1): README.md (+19 −6)
README.md CHANGED

@@ -1,22 +1,22 @@
 ---
-datasets:
-- argilla/ultrafeedback-binarized-preferences-cleaned
 language:
 - en
 - de
 - es
 - fr
 - it
-base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
+license: apache-2.0
 library_name: transformers
-pipeline_tag: text-generation
 tags:
 - dpo
 - rlaif
 - preference
 - ultrafeedback
 - moe
-license: apache-2.0
+datasets:
+- argilla/ultrafeedback-binarized-preferences-cleaned
+base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
+pipeline_tag: text-generation
 model-index:
 - name: notux-8x7b-v1
   results: []
@@ -94,4 +94,17 @@ The following hyperparameters were used during training:
 - Transformers 4.36.0
 - Pytorch 2.1.0+cu118
 - Datasets 2.14.6
-- Tokenizers 0.15.0
+- Tokenizers 0.15.0
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_argilla__notus-8x7b-experiment)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |73.18|
+|AI2 Reasoning Challenge (25-Shot)|70.99|
+|HellaSwag (10-Shot)              |87.73|
+|MMLU (5-Shot)                    |71.33|
+|TruthfulQA (0-shot)              |65.79|
+|Winogrande (5-shot)              |81.61|
+|GSM8k (5-shot)                   |61.64|
+
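One quick sanity check on the added table: the `Avg.` row is the arithmetic mean of the six benchmark scores, and the numbers are consistent.

```python
# Check that Avg. in the table above is the mean of the six benchmark scores.
scores = [70.99, 87.73, 71.33, 65.79, 81.61, 61.64]  # ARC, HellaSwag, MMLU,
                                                     # TruthfulQA, Winogrande, GSM8k
avg = sum(scores) / len(scores)
print(f"{avg:.2f}")  # 73.18, matching the Avg. row
```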