I can't reproduce the kobest_hellaswag results

#2
by Woncheol - opened

When I tried to reproduce the Polyglot evaluation, the kobest_hellaswag score did not match, although the other datasets (kobest_copa, wic, boolq) matched well.
Is there a problem, or has the kobest_hellaswag data been changed?

I evaluated the skt/kogpt-trinity-1.2b-v0.5 model with lm-evaluation-harness and likewise got different scores on the HellaSwag and WiC tasks (5-shot, F1 score). The original reports 0.5272 for HellaSwag and 0.4313 for WiC, but my results were 0.3999 and 0.3953, a large difference. I wrote up the details in the following blog post.
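To put the discrepancy in numbers, here is a small sketch that uses only the 5-shot F1 scores quoted above for skt/kogpt-trinity-1.2b-v0.5. The `TOLERANCE` value and the `score_gaps` helper are hypothetical, chosen just for illustration; they are not part of lm-evaluation-harness or of the Polyglot evaluation.

```python
# 5-shot F1 scores quoted in this thread for skt/kogpt-trinity-1.2b-v0.5.
REPORTED = {"HellaSwag": 0.5272, "WiC": 0.4313}
REPRODUCED = {"HellaSwag": 0.3999, "WiC": 0.3953}

# Hypothetical tolerance: any absolute gap above this is treated as a
# failed reproduction. This threshold is an assumption, not an official one.
TOLERANCE = 0.01


def score_gaps(reported, reproduced, tolerance=TOLERANCE):
    """Return {task: (gap, mismatch)} where gap = reported - reproduced,
    rounded to 4 decimals, and mismatch is True when |gap| > tolerance."""
    result = {}
    for task, ref in reported.items():
        gap = round(ref - reproduced[task], 4)
        result[task] = (gap, abs(gap) > tolerance)
    return result


if __name__ == "__main__":
    for task, (gap, mismatch) in score_gaps(REPORTED, REPRODUCED).items():
        status = "MISMATCH" if mismatch else "ok"
        print(
            f"{task}: reported {REPORTED[task]:.4f}, "
            f"reproduced {REPRODUCED[task]:.4f}, gap {gap:+.4f} ({status})"
        )
```

Under this arbitrary threshold both tasks are flagged: HellaSwag is off by about 0.127 and WiC by about 0.036, so the HellaSwag gap is several times larger.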

https://tilnote.io/pages/65649f20f097f21e37a1c1ad
