EAGLE: ETRI's Advanced-lightweight Generative Language Engine

(This model was previously called eGPT; it was renamed on 2024-11-14. The prefix of future releases will change from egpt- to eagle-.)

๋ณธ ๋ชจ๋ธ์€ ์‚ฌ์ „ํ•™์Šต๋งŒ ์ˆ˜ํ–‰๋œ ๋ชจ๋ธ์ด๋ฉฐ, ๋ณ„๋„์˜ Instruction Tuning ๋“ฑ์ด ์ ์šฉ๋˜์ง€ ์•Š์€ ๊ธฐ์ดˆ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์ฑ—๋ด‡ ์Šคํƒ€์ผ์˜ ์ž…์ถœ๋ ฅ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ, ๋ณ„๋„์˜ ๋ฏธ์„ธ์กฐ์ •์„ ๋ฐ˜๋“œ์‹œ ์ˆ˜ํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ ์ •๋ณด

A 1.3B-parameter decoder-only, causal language model. It aims to be a small language model specialized for STEM fields, including mathematics and quantitative reasoning. Because it does not target the role of a general-purpose language model, it may show low performance on common language-understanding benchmarks (e.g. hellaswag, sentineg). Please note that this model may be updated irregularly as training data, training methods, and other improvements change.

The tokenizer is configured similarly to LLaMA's, using byte-fallback BPE with digit separation, but the BOS/EOS tokens (e.g. <s>, </s>) are unified into a single EOS (</s>). No PAD token is defined in the tokenizer configuration; however, since the byte-fallback BPE never produces the <unk> symbol, we recommend repurposing the <unk> token as the PAD token during fine-tuning. The model uses the EleutherAI GPT-NeoX architecture and was pretrained for about 12 weeks on 8x A100 80GB PCIe GPUs (4 weeks each of continued pretraining for v1, v2, and v3; roughly 500B tokens in total).
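
As an illustration, the following is a minimal sketch of the tokenizer setup recommended above before fine-tuning. The repository ID etri-lirs/egpt-1.3b-preview is taken from this page; access to the gated repository must already have been approved.

```python
from transformers import AutoTokenizer

# Load the released tokenizer (byte-fallback BPE; BOS/EOS unified to </s>).
# Requires prior access approval for this gated repository.
tokenizer = AutoTokenizer.from_pretrained("etri-lirs/egpt-1.3b-preview",
                                          use_fast=True, legacy=False)
print(tokenizer.bos_token, tokenizer.eos_token)  # expected: </s> </s>

# No PAD token is defined; since <unk> is never emitted by byte-fallback BPE,
# reuse it as the padding token for fine-tuning, as recommended above.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token
print(tokenizer.pad_token, tokenizer.pad_token_id)
```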

Notice/Acknowledgement

  • ์ด ๋ชจ๋ธ์€ 2023๋…„๋„ ์ •๋ถ€(๊ณผํ•™๊ธฐ์ˆ ์ •๋ณดํ†ต์‹ ๋ถ€)์˜ ์žฌ์›์œผ๋กœ ์ •๋ณดํ†ต์‹ ๊ธฐํšํ‰๊ฐ€์›์˜ ์ง€์›์„ ๋ฐ›์•„ ์ˆ˜ํ–‰๋œ ์—ฐ๊ตฌ์ž„ (RS-2023-00216011, ์‚ฌ๋žŒ์ฒ˜๋Ÿผ ๊ฐœ๋…์ ์œผ๋กœ ์ดํ•ด/์ถ”๋ก ์ด ๊ฐ€๋Šฅํ•œ ๋ณตํ•ฉ์ธ๊ณต์ง€๋Šฅ ์›์ฒœ๊ธฐ์ˆ  ์—ฐ๊ตฌ)
  • This work was supported by Institute of Information & Communications Technology Planning & Evaluation(IITP) grant funded by the Korea government(MSIT) (RS-2023-00216011, Development of artificial complex intelligence for conceptually understanding and inferring like human)

์ œํ•œ์  ๋ชจ๋ธ ์ ‘๊ทผ ๋ฐ, ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์™€ ๊ด€๋ จํ•œ ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘ ๋ฐ ์‚ฌ์šฉ ์•ˆ๋‚ด/Information on Collection and Use of Personal Information for Gated Model Access

๋ณธ ๋ชจ๋ธ์€ ์—ฐ๊ตฌ์™€ ๊ต์œก ๋ชฉ์ ์œผ๋กœ๋งŒ ์‚ฌ์šฉ ๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํ˜„์žฌ ์ œํ•œ์  ๊ณต๊ฐœ ์ƒํƒœ๋กœ, ๋ณธ ์–ธ์–ด ๋ชจ๋ธ์˜ ๋‹ค์šด๋กœ๋“œ์—๋Š” ๋‹ด๋‹น์ž ์‚ฌ์ „ ์Šน์ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์ „ ์Šน์ธ๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์˜์‚ฌํ•ญ์€ ๋ณ„๋„์˜ ๋ฉ”์ผ(jhshin82 at etri.re.kr)๋กœ ์š”์ฒญ ๋ถ€ํƒ๋“œ๋ฆฝ๋‹ˆ๋‹ค.

๋ณธ ๋ชจ๋ธ๊ณผ ๊ด€๋ จํ•ด ์‚ฌํšŒ์ , ๋ฒ•์  ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ์„ ์ œํ•œํ•˜๊ณ , ๋ฐฐํฌ๋ฅผ ์ฒ ํšŒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ๋ชจ๋ธ ์ ‘๊ทผ ํ—ˆ๊ฐ€์— ์‚ฌ์šฉ๋œ ์ด๋ฉ”์ผ ์ฃผ์†Œ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ˆ˜์ง‘, ๋ณด์œ , ์ด์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘๋™์˜/Concent to collection of Personal Information

๋ณธ ๋ชจ๋ธ์˜ ์‚ฌ์šฉ๊ณผ ๊ด€๋ จ, ๋ฐฐํฌ/์‚ฌ์šฉ ์ œํ•œ/์ฒ ํšŒ, ๊ทธ ์™ธ ์‚ฌ์šฉ์ž์˜ ์ด์ต์— ๊ด€๊ณ„๋œ ๋ผ์ด์„ ์Šค ๋ณ€๊ฒฝ ์‹œ ์ด๋ฅผ ํ†ต์ง€ํ•˜๊ธฐ ์œ„ํ•ด, ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฐœ์ธ์ •๋ณด๋ฅผ ์ˆ˜์ง‘, ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.

| Purpose of collection | Items collected | Retention and use period |
|---|---|---|
| Handling requests to restrict or withdraw use of the model | Email address, Hugging Face Hub ID | Until the model's public release period ends and the purpose of use is fulfilled |
| Notifying users of changes to the model's license and usage terms | Email address, Hugging Face Hub ID | Until the model's public release period ends and the purpose of use is fulfilled |

๋ณธ ๋ชจ๋ธ์— ๋Œ€ํ•œ ์ ‘๊ทผ ์š”์ฒญ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๋ชจ๋ธ์— ์ ‘๊ทผํ•˜์‹œ๋Š” ํ–‰์œ„๋Š” ์•„๋ž˜์— ์•ˆ๋‚ด๋œ ์•ˆ๋‚ด์‚ฌํ•ญ, ๋ณธ ๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ์— ๋Œ€ํ•œ ์ •๋ณด, ๊ฐœ์ธ์ •๋ณด ์ˆ˜์ง‘/์ด์šฉ์— ๋™์˜ํ•˜์‹  ๊ฒƒ์œผ๋กœ ๊ฐ„์ฃผํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์šฉ์ž๋Š” ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ถŒ๋ฆฌ๊ฐ€ ์žˆ์œผ๋ฉฐ, ๋™์˜๋ฅผ ๊ฑฐ๋ถ€ํ•˜์‹ค ๊ฒฝ์šฐ ๋ชจ๋ธ ์‚ฌ์šฉ์ด ์ œํ•œ๋˜๋ฉฐ, ์ด์— ๊ด€๋ จํ•œ ์‚ฌ์šฉ, ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ์ฑ…์ž„์€ ์‚ฌ์šฉ์ž์—๊ฒŒ ์žˆ์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค. ์‚ฌ์šฉ ํ›„ ๋™์˜ ์ฒ ํšŒ, ๊ฐœ์ธ์ •๋ณด ํ๊ธฐ์— ๋Œ€ํ•œ ์‚ฌํ•ญ์€ ์ƒ๊ธฐ ์•ˆ๋‚ด๋œ ๋ฉ”์ผ ์ฃผ์†Œ ๋˜๋Š” Community tab์„ ํ†ตํ•ด์„œ ์š”์ฒญํ•˜์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋ชจ๋ธ์˜ ํ•œ๊ณ„, ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์œ„ํ•œ ๊ด€๋ จ ์ •๋ณด ์•ˆ๋‚ด

๋ณธ ๋ชจ๋ธ์˜ ๊ฐœ๋ฐœ๊ณผ ๊ด€๋ จํ•œ ๊ฐœ๋ฐœ์ž ๋ฐ ์กฐ์ง์€ ์ฑ…์ž„์žˆ๋Š” AI ์—ฐ๊ตฌ๋ฅผ ์ค€์ˆ˜ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ์ด์™€ ๊ด€๋ จํ•ด AI ์—ฐ๊ตฌ์— ์‚ฌ์šฉ๋˜๋Š” ์ž…์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋‚ด ํฌํ•จ๋œ ์š•์„ค, ์Œ๋ž€, ์ •์น˜์  ๋‚ด์šฉ ๋ฐ ๊ธฐํƒ€ ๊ฑฐ์นœ ์–ธ์–ด์— ๋Œ€ํ•œ ์ฒ˜๋ฆฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ณ ์ž ๋…ธ๋ ฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿผ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ์›์‹œ ์›น ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์˜ ํŠน์„ฑ ์ƒ ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ด ํ•™์Šต๋œ ๋ณธ ์ƒ์„ฑ ์–ธ์–ด ๋ชจ๋ธ์€ ๊ฒฝ๋„๋œ ์‚ฌ์ƒ์„ ํฌํ•จํ•˜๊ฑฐ๋‚˜, ์‚ฌํšŒ์ ์œผ๋กœ ์šฉ์ธ๋  ์ˆ˜ ์—†๋Š” ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋‹ค๋ฅธ ์–ธ์–ด ๋ชจ๋ธ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ํŠน์ • ํ”„๋กฌํ”„ํŠธ์™€ ๊ณต๊ฒฉ์ ์ธ ์ฝ˜ํ…์ธ ๊ฐ€ ๋ฐ˜ํ™˜๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํฌํ•จ, ๋ณธ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ/์ƒ์„ฑ ๊ฒฐ๊ณผ์™€ ๊ด€๋ จํ•œ ๋‚ด์šฉ์€ ๊ฐœ๋ฐœ์ž ๋ฐ ๊ฐœ๋ฐœ์ž๊ฐ€ ์†ํ•œ ์กฐ์ง์˜ ์‚ฌ์ƒ, ์˜๋„์™€ ์ „ํ˜€ ๊ด€๋ จ์ด ์—†์Œ์„ ์•Œ๋ ค๋“œ๋ฆฝ๋‹ˆ๋‹ค.

If abnormal or socially unacceptable text is generated during testing, please send the following to jhshin82 at etri.re.kr (replacing " at " with "@"): the input (prompt) used to trigger the output, the sampling method and hyperparameters used (e.g. top-p=0.8, temperature, repetition penalty, etc.), and the generated output. We will work to suppress such outputs.

Evaluations

์‚ฌ์ „ํ•™์Šต ๋ชจ๋ธ์˜ KOBEST ํ‰๊ฐ€

Evaluation was performed with the polyglot branch of EleutherAI/lm-evaluation-harness on the KoBEST benchmark (Kim et al., 2022), using zero-shot and 5-shot tests without fine-tuning. (Because KoBEST scores from lm-evaluation-harness differ across versions, results obtained with a recent lm-evaluation-harness (version 0.4.2 or later) are reported separately below.)

| Zero-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|---|---|---|---|---|---|
| Polyglot-ko-1.3b | 0.3552±0.0087 | 0.7196±0.0142 | 0.4013±0.0217 | 0.6790±0.0239 | 0.3276±0.0064 |
| egpt-1.3b (23/07) | 0.4903±0.0134 | 0.6612±0.0149 | 0.3925±0.0217 | 0.3383±0.0112 | 0.3280±0.0063 |
| egpt-1.3b (23/11) | 0.3969±0.0112 | 0.6470±0.0151 | 0.3746±0.0214 | 0.3350±0.0111 | 0.3297±0.0066 |
| egpt-1.3b (24/03) | 0.4034±0.0118 | 0.6438±0.0152 | 0.4150±0.0218 | 0.5272±0.0255 | 0.3294±0.0066 |

| 5-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|---|---|---|---|---|---|
| Polyglot-ko-1.3b | 0.4751±0.0133 | 0.7193±0.0142 | 0.3984±0.0218 | 0.6257±0.0244 | 0.4559±0.0138 |
| egpt-1.3b (23/07) | 0.4829±0.0133 | 0.6558±0.0150 | 0.3846±0.0216 | 0.5715±0.0249 | 0.5108±0.0141 |
| egpt-1.3b (23/11) | 0.4762±0.0133 | 0.6499±0.0151 | 0.3689±0.0214 | 0.5607±0.0249 | 0.4776±0.0141 |
| egpt-1.3b (24/03) | 0.4944±0.0134 | 0.6643±0.0149 | 0.3862±0.0216 | 0.5232±0.0251 | 0.4947±0.0141 |

When evaluated with lm-evaluation-harness version 0.4.2 or later (hereafter LEH 0.4.2dev, commit id b1777c82), KB-SENTINEG scores are lower while the other four tasks score higher. This appears to be because an evaluation bug in the polyglot branch was fixed, so we recommend evaluating with the latest version; for consistency, however, the polyglot-branch scores above are kept separately.

| Zero-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|---|---|---|---|---|---|
| egpt-1.3b (23/11), LEH v0.4.2dev | 0.4926 | 0.6530 | 0.3933 | 0.3350 | 0.3280 |
| egpt-1.3b (24/03), LEH v0.4.2dev | 0.4391 | 0.6497 | 0.4222 | 0.3733 | 0.3412 |
| egpt-1.3b (24/03), LEH polyglot branch (for reference) | 0.4034 | 0.6438 | 0.4150 | 0.5272 | 0.3294 |
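
For reference, the sketch below shows one way such a KoBEST evaluation could be run through the lm-evaluation-harness Python API (version 0.4.2 or later). It is not the exact command used for the tables above; the kobest_* task names and the repository ID are assumptions based on the harness's registered tasks and this page, so adjust them to your environment.

```python
import lm_eval

# Zero-shot KoBEST evaluation of the released checkpoint (a sketch, not the
# official evaluation script). Set num_fewshot=5 for the 5-shot setting.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=etri-lirs/egpt-1.3b-preview,dtype=float16",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag",
           "kobest_sentineg", "kobest_wic"],
    num_fewshot=0,
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```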

Transfer Learning Evaluation

English GSM8k scores after fine-tuning on MetaMathQA are as follows. Training setup: LR 8e-5, FP16 with TF32 enabled, and an identical effective batch size of 128 for all models (e.g. 4 GPUs x batch size 8 per GPU x gradient accumulation 4). The LR scheduler was cosine decay with a warmup ratio of 0.03 and no weight decay.

๋ชจ๋ธ GSM8k test ๋น„๊ณ 
polyglot-ko-1.3b 0.2160
polyglot-ko-12.8b 0.3646 LR์„ 5e-5๋กœ ์„ธํŒ…, Beomi/polyglot-ko-alpaca-12.8b์˜ hparam์„ ๋ณด๊ณ  LR์„ ๊ฒฐ์ •ํ•จ
egpt-1.3b (23/11) 0.4443
egpt-1.3b (24/03) 0.4147
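
For reference, a minimal sketch of a Hugging Face TrainingArguments configuration mirroring the fine-tuning setup described above (LR, FP16/TF32, effective batch size 128, cosine decay, warmup ratio 0.03, no weight decay) is given below. The output directory and the epoch count are placeholders, not values from the actual script.

```python
from transformers import TrainingArguments

# Mirrors the reported setup: effective batch = 4 GPUs x 8 per GPU x 4 accumulation steps = 128.
# "./egpt-metamathqa-ft" and num_train_epochs are illustrative placeholders.
training_args = TrainingArguments(
    output_dir="./egpt-metamathqa-ft",
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.0,
    fp16=True,
    tf32=True,
    num_train_epochs=3,  # assumption: the epoch count is not stated above
)
```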

์—…๋ฐ์ดํŠธ ๊ธฐ๋ก/Update log

  • (23/7/27 model) Initial model. Better than Polyglot-ko 1.3b on BOOLQ/WIC, weaker on COPA/HELLASWAG/SENTINEG.
  • (23/11/22 model) Continued pretraining from the 23/7/27 model with a similar data composition plus some additional data. Updated because it showed better performance (39 vs 44) on a separate evaluation suite aligned with our designated goals.
  • (24/03/21 model) Additional pretraining using AIHUB datasets and a Korean Wikipedia dataset.

์‚ฌ์ „ํ•™์Šต์— ์ฐธ์—ฌํ•œ ๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด/Datasets

์•„๋ž˜์˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šตํ•˜์˜€์Šต๋‹ˆ๋‹ค:

How to use

The code below can be used for inference with transformers>=4.28.

```python
import sys

from transformers import (
        AutoTokenizer, AutoModelForCausalLM, GenerationConfig
        )


def load_model(mdl_path):
    tokenizer = AutoTokenizer.from_pretrained(mdl_path, use_fast=True, legacy=False,)
    # The accelerate package must be installed to use the device_map argument.
    model = AutoModelForCausalLM.from_pretrained(mdl_path, device_map="auto",
                                                 torch_dtype="auto")

    return tokenizer, model


if __name__ == '__main__':
    # FIXME: set the model path!
    tokenizer, model = load_model("../egpt-1.3b-test-230720/")
    # print(model.hf_device_map)
    # Adjust the generation options below as needed.
    gen_cfg = GenerationConfig(max_new_tokens=256, min_length=0,
                               max_time=10.0, do_sample=True,
                               top_p=0.9, epsilon_cutoff=3e-4,)

    print("** Now Ready to input from stdin.")
    for aline in sys.stdin:
        aline = aline.rstrip("\n\r\t")
        input_cond = tokenizer(aline, add_special_tokens=False, return_tensors="pt").to("cuda")
        outs = model.generate(**input_cond, generation_config=gen_cfg)
        out_str = tokenizer.batch_decode(outs, skip_special_tokens=True,
                                         clean_up_tokenization_spaces=True)
        print(">> " + ' '.join(out_str))