license: llama2

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|-------|
| Avg.                | 65.41 |
| ARC (25-shot)       | 72.1  |
| HellaSwag (10-shot) | 87.46 |
| MMLU (5-shot)       | 71.02 |
| TruthfulQA (0-shot) | 61.18 |
| Winogrande (5-shot) | 82.87 |
| GSM8K (5-shot)      | 30.78 |
| DROP (3-shot)       | 52.45 |
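
The Avg. row can be checked against the seven benchmark scores. The sketch below assumes the average is the unweighted arithmetic mean of the listed values. The source does not state how it was computed, and the `scores` dictionary is only an illustrative container for the numbers in the table.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 72.1,
    "HellaSwag (10-shot)": 87.46,
    "MMLU (5-shot)": 71.02,
    "TruthfulQA (0-shot)": 61.18,
    "Winogrande (5-shot)": 82.87,
    "GSM8K (5-shot)": 30.78,
    "DROP (3-shot)": 52.45,
}

# Assumption: "Avg." is the plain (unweighted) mean of the seven scores.
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 65.41"
```

Under that assumption the result matches the reported 65.41, so the table is internally consistent.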