leaderboard-pr-bot committed on
Commit df01d93 (1 parent: 9058130)

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1): README.md (+14, -0)
README.md CHANGED
@@ -54,3 +54,17 @@ These models are intended for research only, in adherence with the [CC BY-NC-4.0
 
 Although the aforementioned dataset helps to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use it responsibly.
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_quantumaikr__QuantumLM)
+
+| Metric              | Value |
+|---------------------|-------|
+| Avg.                | 46.73 |
+| ARC (25-shot)       | 55.8  |
+| HellaSwag (10-shot) | 79.74 |
+| MMLU (5-shot)       | 54.17 |
+| TruthfulQA (0-shot) | 46.71 |
+| Winogrande (5-shot) | 74.19 |
+| GSM8K (5-shot)      | 9.86  |
+| DROP (3-shot)       | 6.65  |
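As a sanity check on the table above, the reported Avg. appears to be the plain arithmetic mean of the seven individual benchmark scores; this is an assumption, since the PR itself does not state how the average is computed. A minimal sketch:

```python
# Benchmark scores as reported in the PR's results table.
# Assumption: Avg. is the unweighted arithmetic mean of all seven scores.
scores = {
    "ARC (25-shot)": 55.8,
    "HellaSwag (10-shot)": 79.74,
    "MMLU (5-shot)": 54.17,
    "TruthfulQA (0-shot)": 46.71,
    "Winogrande (5-shot)": 74.19,
    "GSM8K (5-shot)": 9.86,
    "DROP (3-shot)": 6.65,
}

# Mean over all seven benchmarks, rounded to two decimals like the table.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 46.73, matching the Avg. row
```

Under this assumption the mean works out to 46.73, which matches the Avg. row exactly.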