license: apache-2.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 27.03 |
| ARC (25-shot)       | 25.94 |
| HellaSwag (10-shot) | 38.55 |
| MMLU (5-shot)       | 25.76 |
| TruthfulQA (0-shot) | 45.25 |
| Winogrande (5-shot) | 50.2  |
| GSM8K (5-shot)      | 0.3   |
| DROP (3-shot)       | 3.24  |
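
The reported Avg. is consistent with an unweighted mean of the seven benchmark scores listed above. The sketch below checks that arithmetic; the equal weighting is an assumption inferred from the numbers, and the variable names are illustrative only.

```python
# Scores copied from the results listed above.
scores = {
    "ARC (25-shot)": 25.94,
    "HellaSwag (10-shot)": 38.55,
    "MMLU (5-shot)": 25.76,
    "TruthfulQA (0-shot)": 45.25,
    "Winogrande (5-shot)": 50.2,
    "GSM8K (5-shot)": 0.3,
    "DROP (3-shot)": 3.24,
}

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 27.03"
```

For context, ARC and MMLU are four-way multiple-choice tasks, where random guessing scores about 25, and Winogrande is a two-way choice, where it scores about 50.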