Adding Evaluation Results
c27030d

W&B training run: https://wandb.ai/open-assistant/supervised-finetuning/runs/i9gmn0dt

Trained with a residual dropout rate of 0.1.
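To make the hyperparameter concrete, here is a minimal sketch of residual dropout in its common formulation: dropout is applied to a sub-layer's output before that output is added back to the residual stream, i.e. `x + Dropout(sublayer(x))`. This is an illustrative assumption, not the Open-Assistant training code. The function name `residual_dropout` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def residual_dropout(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    p: float = 0.1,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Hypothetical helper: return x + Dropout(sublayer(x)) using inverted dropout."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    out = sublayer(x)
    if training and p > 0.0:
        rng = rng or random.Random()
        keep = 1.0 - p
        # Zero each element with probability p and scale survivors by 1/(1-p),
        # so the expected value matches the no-dropout (inference) output.
        out = [v / keep if rng.random() >= p else 0.0 for v in out]
    # With training=False, dropout is the identity and only the residual sum remains.
    return [xi + oi for xi, oi in zip(x, out)]


def double(v: Vector) -> Vector:
    # Toy stand-in for an attention or feed-forward sub-layer.
    return [2.0 * e for e in v]


print(residual_dropout([1.0, 2.0, 3.0], double, p=0.1, training=False))  # [3.0, 6.0, 9.0]
```

The residual input `x` itself is never dropped, so with `p=0.1` each position keeps at least its original value and only the sub-layer's contribution is randomly zeroed during training.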

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 49.28 |
| ARC (25-shot)       | 56.4  |
| HellaSwag (10-shot) | 79.34 |
| MMLU (5-shot)       | 46.59 |
| TruthfulQA (0-shot) | 48.6  |
| Winogrande (5-shot) | 75.22 |
| GSM8K (5-shot)      | 11.83 |
| DROP (3-shot)       | 27.03 |
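As a quick sanity check, the "Avg." row can be reproduced as the unweighted mean of the seven benchmark scores. That the leaderboard uses an unweighted mean is an assumption here, though the numbers are consistent with it. Averaging the rounded values shown in the table gives about 49.29. The 0.01 gap from the reported 49.28 is presumably due to the per-benchmark scores being rounded before display.

```python
# Per-benchmark scores copied from the evaluation table above.
scores = {
    "ARC (25-shot)": 56.4,
    "HellaSwag (10-shot)": 79.34,
    "MMLU (5-shot)": 46.59,
    "TruthfulQA (0-shot)": 48.6,
    "Winogrande (5-shot)": 75.22,
    "GSM8K (5-shot)": 11.83,
    "DROP (3-shot)": 27.03,
}

# Unweighted mean across benchmarks (assumed to match how "Avg." is computed).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 49.29
```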