MelangeB-70b / README.md

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 65.8 |
| ARC (25-shot) | 71.67 |
| HellaSwag (10-shot) | 87.5 |
| MMLU (5-shot) | 70.03 |
| TruthfulQA (0-shot) | 59.36 |
| Winogrande (5-shot) | 83.5 |
| GSM8K (5-shot) | 30.63 |
| DROP (3-shot) | 57.92 |
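
The Avg. value is consistent with an unweighted mean of the seven benchmark scores listed. A minimal sketch to check that arithmetic, assuming that is how the average was computed (the `scores` dictionary is just the values above copied by hand, not an official API or data source):

```python
# Benchmark scores copied by hand from the results listed above.
scores = {
    "ARC (25-shot)": 71.67,
    "HellaSwag (10-shot)": 87.5,
    "MMLU (5-shot)": 70.03,
    "TruthfulQA (0-shot)": 59.36,
    "Winogrande (5-shot)": 83.5,
    "GSM8K (5-shot)": 30.63,
    "DROP (3-shot)": 57.92,
}

# Assumption: "Avg." is the plain (unweighted) mean of all seven scores.
average = sum(scores.values()) / len(scores)

print(f"{average:.2f}")  # prints 65.80
```

GSM8K (30.63) sits well below the other scores, so it pulls the mean down noticeably.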