leaderboard-pr-bot committed on
Commit
41c6c18
1 Parent(s): f51d310

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -115,4 +115,17 @@ state of the art, but rather further show that chat-like behaviors in LLMs can b
 *DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration.
 Furthermore, the model can sometimes exhibit undesired behaviors. Some of these behaviors include, but are not limited to: factual
 inaccuracies, biases, offensive responses, toxicity, and hallucinations. Just as with any other LLM, we advise users of this technology
-to exercise good judgment when applying this technology.*
+to exercise good judgment when applying this technology.*
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_aisquared__dlite-v2-355m)
+
+| Metric               | Value |
+|----------------------|-------|
+| Avg.                 | 27.53 |
+| ARC (25-shot)        | 28.33 |
+| HellaSwag (10-shot)  | 40.54 |
+| MMLU (5-shot)        | 26.77 |
+| TruthfulQA (0-shot)  | 38.76 |
+| Winogrande (5-shot)  | 52.8  |
+| GSM8K (5-shot)       | 0.0   |
+| DROP (3-shot)        | 5.53  |
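As a quick sanity check (this sketch is not part of the PR itself), the reported `Avg.` should be the arithmetic mean of the seven per-benchmark scores added in the diff:

```python
# Verify that the leaderboard's reported average (27.53) matches the mean
# of the seven benchmark scores from the added table.
scores = {
    "ARC (25-shot)": 28.33,
    "HellaSwag (10-shot)": 40.54,
    "MMLU (5-shot)": 26.77,
    "TruthfulQA (0-shot)": 38.76,
    "Winogrande (5-shot)": 52.8,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 5.53,
}

avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 27.53
```

The mean of the seven scores rounds to 27.53, matching the `Avg.` row in the table.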