# MelangeA-70b

Experimental merge. Details to come if successful.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 55.92 |
| ARC (25-shot) | 71.25 |
| HellaSwag (10-shot) | 87.3 |
| MMLU (5-shot) | 70.56 |
| TruthfulQA (0-shot) | 60.61 |
| Winogrande (5-shot) | 81.53 |
| GSM8K (5-shot) | 5.69 |
| DROP (3-shot) | 14.53 |
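The reported `Avg.` appears to be the unweighted mean of the seven benchmark scores above; a quick sketch to verify (the dictionary below simply restates the table values):

```python
# Sanity-check the reported leaderboard average: the unweighted mean
# of the seven benchmark scores from the table above.
scores = {
    "ARC (25-shot)": 71.25,
    "HellaSwag (10-shot)": 87.3,
    "MMLU (5-shot)": 70.56,
    "TruthfulQA (0-shot)": 60.61,
    "Winogrande (5-shot)": 81.53,
    "GSM8K (5-shot)": 5.69,
    "DROP (3-shot)": 14.53,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # matches the reported Avg. of 55.92
```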