orangetin committed
Commit cde21ae
1 Parent(s): a55b010

Update benchmark results

Files changed (1)
  1. README.md +13 -20
README.md CHANGED
@@ -18,7 +18,7 @@ language:
 
  # OpenHermes - Mixtral 8x7B
 
- ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6440872be44f30a723256163/3reRxAyfCRBtGxd16SK1q.jpeg)
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6440872be44f30a723256163/3Gvl__aGtP4AHxzx9NoXX.jpeg)
 
  ## Model Card
  OpenHermes Mixtral 8x7B - a state of the art Mixtral Fine-tune.
@@ -27,25 +27,18 @@ Huge thank you to [Teknium](https://huggingface.co/datasets/teknium) for open-so
 
  This model was trained on the [OpenHermes dataset](https://huggingface.co/datasets/teknium/openhermes) for 3 epochs
 
- ## Benchmark Results
-
- ARC:
- ```
- | Task |Version| Metric |Value | |Stderr|
- |-------------|------:|--------|-----:|---|-----:|
- |arc_challenge| 0|acc |0.6075|± |0.0143|
- | | |acc_norm|0.6493|± |0.0139|
- ```
-
- TruthfulQA:
- ```
- | Task |Version|Metric|Value | |Stderr|
- |-------------|------:|------|-----:|---|-----:|
- |truthfulqa_mc| 1|mc1 |0.4272|± |0.0173|
- | | |mc2 |0.5865|± |0.0160|
- ```
-
- More benchmarks coming soon!
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_orangetin__OpenHermes-Mixtral-8x7B)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 65.27 |
+ | ARC (25-shot) | 63.91 |
+ | HellaSwag (10-shot) | 84.14 |
+ | MMLU (5-shot) | 64.29 |
+ | TruthfulQA (0-shot) | 59.53 |
+ | Winogrande (5-shot) | 74.03 |
+ | GSM8K (5-shot) | 45.72 |
 
  # Prompt Format
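As a quick sanity check on the new table (not part of the commit itself), the leaderboard "Avg." row should be the arithmetic mean of the six per-task scores added in this diff. A minimal Python sketch, using only the values from the table above:

```python
# Recompute the Open LLM Leaderboard average from the per-task scores
# added in this commit. Task names and values are copied from the new table.
scores = {
    "ARC (25-shot)": 63.91,
    "HellaSwag (10-shot)": 84.14,
    "MMLU (5-shot)": 64.29,
    "TruthfulQA (0-shot)": 59.53,
    "Winogrande (5-shot)": 74.03,
    "GSM8K (5-shot)": 45.72,
}

avg = sum(scores.values()) / len(scores)
print(f"Recomputed average: {avg:.2f}")  # 65.27, matching the Avg. row
```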