
Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 47.51 |
| ARC (25-shot)       | 57.51 |
| HellaSwag (10-shot) | 82.49 |
| MMLU (5-shot)       | 54.83 |
| TruthfulQA (0-shot) | 43.81 |
| Winogrande (5-shot) | 77.27 |
| GSM8K (5-shot)      | 10.46 |
| DROP (3-shot)       | 6.18  |
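
As a quick sanity check on the table, the sketch below recomputes the `Avg.` row. It assumes the average is the unweighted arithmetic mean of the seven benchmark scores; the source does not state the formula, but the numbers are consistent with that reading. The `scores` dictionary and variable names are illustrative only.

```python
# Benchmark scores copied from the results table (percent).
scores = {
    "ARC (25-shot)": 57.51,
    "HellaSwag (10-shot)": 82.49,
    "MMLU (5-shot)": 54.83,
    "TruthfulQA (0-shot)": 43.81,
    "Winogrande (5-shot)": 77.27,
    "GSM8K (5-shot)": 10.46,
    "DROP (3-shot)": 6.18,
}

# Assumption: "Avg." is the plain (unweighted) mean of all seven scores.
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 47.51"
```

Under this assumption the low GSM8K and DROP scores pull the average well below the other five benchmarks, which all fall between 43 and 83.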