orangetin committed
Commit cde21ae
1 Parent(s): a55b010

Update benchmark results

Files changed (1)
  1. README.md +13 -20
README.md CHANGED
@@ -18,7 +18,7 @@ language:
 
  # OpenHermes - Mixtral 8x7B
 
- ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6440872be44f30a723256163/3reRxAyfCRBtGxd16SK1q.jpeg)
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6440872be44f30a723256163/3Gvl__aGtP4AHxzx9NoXX.jpeg)
 
  ## Model Card
  OpenHermes Mixtral 8x7B - a state of the art Mixtral Fine-tune.
@@ -27,25 +27,18 @@ Huge thank you to [Teknium](https://huggingface.co/datasets/teknium) for open-so
 
  This model was trained on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/openhermes) for 3 epochs
 
- ## Benchmark Results
-
- ARC:
- ```
- | Task |Version| Metric |Value | |Stderr|
- |-------------|------:|--------|-----:|---|-----:|
- |arc_challenge| 0|acc |0.6075|± |0.0143|
- | | |acc_norm|0.6493|± |0.0139|
- ```
-
- TruthfulQA:
- ```
- | Task |Version|Metric|Value | |Stderr|
- |-------------|------:|------|-----:|---|-----:|
- |truthfulqa_mc| 1|mc1 |0.4272|± |0.0173|
- | | |mc2 |0.5865|± |0.0160|
- ```
-
- More benchmarks coming soon!
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_orangetin__OpenHermes-Mixtral-8x7B)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 65.27 |
+ | ARC (25-shot) | 63.91 |
+ | HellaSwag (10-shot) | 84.14 |
+ | MMLU (5-shot) | 64.29 |
+ | TruthfulQA (0-shot) | 59.53 |
+ | Winogrande (5-shot) | 74.03 |
+ | GSM8K (5-shot) | 45.72 |
 
  # Prompt Format
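As a quick sanity check on the new table (not part of the commit itself), the leaderboard "Avg." row should be the arithmetic mean of the six per-task scores added in this diff. A minimal Python sketch, using only the values from the table above:

```python
# Recompute the Open LLM Leaderboard average from the per-task scores
# added in this commit. Task names and values are copied from the new table.
scores = {
    "ARC (25-shot)": 63.91,
    "HellaSwag (10-shot)": 84.14,
    "MMLU (5-shot)": 64.29,
    "TruthfulQA (0-shot)": 59.53,
    "Winogrande (5-shot)": 74.03,
    "GSM8K (5-shot)": 45.72,
}

avg = sum(scores.values()) / len(scores)
print(f"Recomputed average: {avg:.2f}")  # 65.27, matching the Avg. row
```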