Some traditional benchmarks?
#11
by
pj-ml
- opened
Could you add some well-known benchmarks?
Yeah, I agree. There are no common benchmarks.
Yes, @pj-ml and @PlanetDOGE, we ran the traditional benchmarks below, using the same methodology as the Open LLM Leaderboard:
| Average | hellaswag | arc_challenge | truthful_qa (mc2) | MMLU (acc) |
|---|---|---|---|---|
| 0.57221 | 0.81617 | 0.58874 | 0.38275 | 0.5012 |
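For what it's worth, the Average column looks like the plain arithmetic mean of the four task scores (an assumption about how it was computed; the dict keys just mirror the table headers):

```python
# Sanity check: does the reported Average match the mean of the four scores?
scores = {
    "hellaswag": 0.81617,
    "arc_challenge": 0.58874,
    "truthful_qa (mc2)": 0.38275,
    "MMLU (acc)": 0.5012,
}
average = sum(scores.values()) / len(scores)
print(round(average, 5))  # close to the reported 0.57221
```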
Cheers!
Thanks! I'd recommend adding these results to the model card for visibility; then I can close this discussion, since the thread would no longer be needed to surface the results you kindly shared.
Hi @pj-ml updated here https://huggingface.co/amazon/MistralLite/blob/main/README.md#mistrallite-lm-eval-results
Thank you!
pj-ml
changed discussion status to
closed