# MelangeA-70b

Experimental merge. Details to come if successful.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 55.92 |
| ARC (25-shot) | 71.25 |
| HellaSwag (10-shot) | 87.3 |
| MMLU (5-shot) | 70.56 |
| TruthfulQA (0-shot) | 60.61 |
| Winogrande (5-shot) | 81.53 |
| GSM8K (5-shot) | 5.69 |
| DROP (3-shot) | 14.53 |
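The reported `Avg.` appears to be the unweighted mean of the seven benchmark scores above; a quick sketch to verify (the dictionary below simply restates the table values):

```python
# Sanity-check the reported leaderboard average: the unweighted mean
# of the seven benchmark scores from the table above.
scores = {
    "ARC (25-shot)": 71.25,
    "HellaSwag (10-shot)": 87.3,
    "MMLU (5-shot)": 70.56,
    "TruthfulQA (0-shot)": 60.61,
    "Winogrande (5-shot)": 81.53,
    "GSM8K (5-shot)": 5.69,
    "DROP (3-shot)": 14.53,
}
avg = sum(scores.values()) / len(scores)
print(f"{avg:.2f}")  # matches the reported Avg. of 55.92
```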