Some traditional benchmarks?

#11
by pj-ml - opened

Could you add some well-known benchmarks?

Yeah, I agree. There are no common benchmarks.

Amazon Web Services org
edited Nov 14, 2023

Yes, @pj-ml and @PlanetDOGE, we ran the traditional benchmarks using the same methodology as the Open LLM Leaderboard. Results below:

Average   hellaswag   arc_challenge   truthful_qa (mc2)   MMLU (acc)
0.57221   0.81617     0.58874         0.38275             0.5012
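As a quick sanity check (not part of the original reply), the "Average" column is simply the unweighted mean of the four per-task scores; the snippet below recomputes it from the reported numbers:

```python
# Per-task scores as reported in the table above.
scores = {
    "hellaswag": 0.81617,
    "arc_challenge": 0.58874,
    "truthful_qa (mc2)": 0.38275,
    "MMLU (acc)": 0.5012,
}

# The reported Average is the unweighted mean over the four tasks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.5f}")
```

Running this reproduces the reported average to within rounding.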

Cheers!

Thanks! I would recommend adding these numbers to the model card for visibility; once they are there, I can close this discussion, since it would no longer be needed as the only place the results you kindly shared are visible.

pj-ml changed discussion status to closed
