Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +14 -0

@@ -169,3 +169,17 @@ These benchmarks currently have us at #1 on ARC-c, ARC-e, Hellaswag, and OpenBoo
 The model is available for download on Hugging Face. It is suitable for a wide range of language tasks, from generating creative text to understanding and following complex instructions.
 Compute provided by our project sponsor Redmond AI, thank you!!

+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TheBloke__Nous-Hermes-13B-SuperHOT-8K-fp16)
+| Metric                | Value                     |
+|-----------------------|---------------------------|
+| Avg.                  | 49.3   |
+| ARC (25-shot)         | 55.29          |
+| HellaSwag (10-shot)   | 81.87    |
+| MMLU (5-shot)         | 48.23         |
+| TruthfulQA (0-shot)   | 51.19   |
+| Winogrande (5-shot)   | 75.3   |
+| GSM8K (5-shot)        | 1.21        |
+| DROP (3-shot)         | 32.03         |
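
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores in the table. That is an assumption drawn from the numbers themselves, not something the PR states. A minimal sketch to check it, using a hypothetical `scores` dictionary copied by hand from the table:

```python
# Benchmark scores copied from the table added by this PR.
# The dictionary name and layout are illustrative, not part of the leaderboard tooling.
scores = {
    "ARC (25-shot)": 55.29,
    "HellaSwag (10-shot)": 81.87,
    "MMLU (5-shot)": 48.23,
    "TruthfulQA (0-shot)": 51.19,
    "Winogrande (5-shot)": 75.3,
    "GSM8K (5-shot)": 1.21,
    "DROP (3-shot)": 32.03,
}

# Assumption: "Avg." is a plain arithmetic mean with equal weight per benchmark.
average = sum(scores.values()) / len(scores)

print(round(average, 1))  # prints 49.3
```

The result matches the reported `Avg.` of 49.3 after rounding to one decimal place, which is consistent with an equal-weight mean.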