leaderboard-pr-bot committed on
Commit
41c6c18
1 Parent(s): f51d310

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -115,4 +115,17 @@ state of the art, but rather further show that chat-like behaviors in LLMs can b
 *DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration.
 Furthermore, the model can sometimes exhibit undesired behaviors. Some of these behaviors include, but are not limited to: factual
 inaccuracies, biases, offensive responses, toxicity, and hallucinations. Just as with any other LLM, we advise users of this technology
-to exercise good judgment when applying this technology.*
+to exercise good judgment when applying this technology.*
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_aisquared__dlite-v2-355m)
+
+| Metric               | Value |
+|----------------------|-------|
+| Avg.                 | 27.53 |
+| ARC (25-shot)        | 28.33 |
+| HellaSwag (10-shot)  | 40.54 |
+| MMLU (5-shot)        | 26.77 |
+| TruthfulQA (0-shot)  | 38.76 |
+| Winogrande (5-shot)  | 52.8  |
+| GSM8K (5-shot)       | 0.0   |
+| DROP (3-shot)        | 5.53  |
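As a quick sanity check (this sketch is not part of the PR itself), the reported `Avg.` should be the arithmetic mean of the seven per-benchmark scores added in the diff:

```python
# Verify that the leaderboard's reported average (27.53) matches the mean
# of the seven benchmark scores from the added table.
scores = {
    "ARC (25-shot)": 28.33,
    "HellaSwag (10-shot)": 40.54,
    "MMLU (5-shot)": 26.77,
    "TruthfulQA (0-shot)": 38.76,
    "Winogrande (5-shot)": 52.8,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 5.53,
}

avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 27.53
```

The mean of the seven scores rounds to 27.53, matching the `Avg.` row in the table.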