Some traditional benchmarks?
#11
by
pj-ml
- opened
Could you add some well-known benchmarks?
Yeah, I agree. There are no common benchmarks.
Yes, @pj-ml and @PlanetDOGE, we ran the traditional benchmarks below, using the same methodology as the Open LLM Leaderboard:
| Average | hellaswag | arc_challenge | truthful_qa (mc2) | MMLU (acc) |
|---|---|---|---|---|
| 0.57221 | 0.81617 | 0.58874 | 0.38275 | 0.5012 |
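For what it's worth, the Average column looks like the plain arithmetic mean of the four task scores (an assumption about how it was computed; the dict keys just mirror the table headers):

```python
# Sanity check: does the reported Average match the mean of the four scores?
scores = {
    "hellaswag": 0.81617,
    "arc_challenge": 0.58874,
    "truthful_qa (mc2)": 0.38275,
    "MMLU (acc)": 0.5012,
}
average = sum(scores.values()) / len(scores)
print(round(average, 5))  # close to the reported 0.57221
```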
Cheers!
Thanks! I'd recommend adding these results to the model card for visibility; then I can close this discussion, since the thread would no longer be needed to surface the results you kindly shared.
Hi @pj-ml updated here https://huggingface.co/amazon/MistralLite/blob/main/README.md#mistrallite-lm-eval-results
Thank you!
pj-ml
changed discussion status to
closed