---
language:
- ko
library_name: transformers
---
# EAGLE: ETRI's Advanced-lightweight Generative Language Engine
(Formerly known as eGPT; the name was changed on 2024.11.14. Models released from now on will use the prefix eagle- instead of egpt-.)
__This model has only been pre-trained; it is a base model with no Instruction Tuning or similar applied. If you need chatbot-style input/output, you must fine-tune it separately.__
## Model information
A 1.3B decoder-only, causal language model. It aims to be a small language model specialized in STEM fields, including mathematics and quantitative reasoning.
Because it is not intended to serve as a general-purpose language model, it may score low on common general-purpose understanding benchmarks (e.g. hellaswag, sentineg).
Please note that this model may be updated irregularly as the training data and training methods are changed or improved.
The tokenizer, like LLaMa's, uses byte-fallback BPE with digit splitting, but the BOS/EOS tokens (e.g. ```<s>,</s>```) are both unified to EOS (```</s>```). The tokenizer configuration does not define a PAD token; since the ```<unk>``` symbol is never used under byte-level BPE, we recommend assigning the ```<unk>``` token as the PAD token during fine-tuning.
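The PAD recommendation above can be sketched as follows. This is a minimal sketch, not the authors' fine-tuning code: the helper name `use_unk_as_pad` is hypothetical, and it assumes only that the tokenizer object exposes `pad_token` and `unk_token` attributes, as Hugging Face tokenizers do.

```python
def use_unk_as_pad(tokenizer):
    """Assign the <unk> token as PAD when the tokenizer defines no PAD token.

    Hypothetical helper for illustration; it relies only on the
    `pad_token` / `unk_token` attributes of the tokenizer object.
    """
    if tokenizer.pad_token is None:
        # <unk> is never produced by byte-level BPE, so reusing it as PAD
        # does not collide with tokens that occur in real text.
        tokenizer.pad_token = tokenizer.unk_token
    return tokenizer
```

Calling `use_unk_as_pad(tokenizer)` right after loading the tokenizer, and before building padded batches, reuses an existing vocabulary entry, so the vocabulary size stays unchanged.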
The model uses the EleutherAI/gptneox architecture and was obtained by pre-training for about 12 weeks on 8 A100 80GB PCIE GPUs (continued pre-training in three 4-week stages, v1, v2, and v3; about 500B tokens in total).
## Notice/Acknowledgement
* This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2023 (RS-2023-00216011, Development of artificial complex intelligence for conceptually understanding and inferring like human)
## Gated Model Access and Information on Collection and Use of Personal Information
__This model may be used for research and educational purposes only__. It is currently under gated release, and downloading it requires prior approval from the maintainer.
For inquiries about prior approval, please send a separate email to jhshin82 __at__ etri.re.kr.
If social or legal problems arise in connection with this model, use of the model may be restricted and its distribution withdrawn. For that purpose, the email address used for model access approval may be collected, retained, and used as described below.
### Consent to collection of Personal Information
To notify users of distribution, usage restriction, or withdrawal of this model, and of any license changes affecting users' interests, we collect and use personal information as follows.
| Purpose of collection | Items collected | Retention and use period |
|----------------- | ------------------------------ | ---------------- |
| Requests to restrict or withdraw use of the model | Email address, huggingface hub ID | For the period the model is public, until the purpose of use is fulfilled |
| Notices of changes to the model's license, etc. | Email address, huggingface hub ID | For the period the model is public, until the purpose of use is fulfilled |
Requesting access to this model and accessing it are deemed consent to the notices given below, the limitations of this model, the information on responsible AI research, and the collection and use of personal information. Users have the right to refuse consent; if consent is refused, use of the model is restricted, and responsibility for any related use and its results lies with the user. After use, withdrawal of consent and deletion of personal information can be requested via the email address given above or the Community tab.
## Limitations of the model and information for responsible AI research
The developers and organization involved in developing this model strive to comply with responsible AI research, and in that regard strive to handle profanity, obscenity, political content, and other harsh language contained in the input and output data used for AI research.
Nevertheless, owing to the nature of raw web text data, this generative language model trained on it may contain biased views or generate socially unacceptable text, and, as with other language models, may return offensive content for certain prompts.
This included, the output and generated results of this model are entirely unrelated to the views or intentions of the developers and the organization they belong to.
If abnormal or socially unacceptable text is generated during testing, please send the input (prompt) used to elicit the output, the sampling method and hyperparameters used (e.g. top-p=0.8, temperature, repetition-penalty), and the resulting output to jhshin82 __at__ etri.re.kr (replace __at__ with @), and we will work to suppress such output.
## Evaluations
### KOBEST evaluation of the pre-trained model
Evaluation used the __polyglot branch__ of EleutherAI/lm-evaluation-harness to run zero-shot and 5-shot tests on the KoBEST (Kim et al., 2022) evaluation set without fine-tuning.
(Because KOBEST scores from lm-evaluation-harness differ between versions, results from a recent lm-evaluation-harness (after version 0.4.2) are given separately below.)
|Zero-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|--------------|---------------|--------------|-------------------|------------------|-------------|
|Polyglot-ko-1.3b | 0.3552±0.0087 | **0.7196±0.0142** | 0.4013±0.0217 | **0.6790±0.0239** | 0.3276±0.0064 |
|egpt-1.3b (23/07) | **0.4903±0.0134** | 0.6612±0.0149 | 0.3925±0.0217 | 0.3383±0.0112 | 0.3280±0.0063 |
|egpt-1.3b (23/11) | 0.3969±0.0112 | 0.6470±0.0151 | 0.3746±0.0214 | 0.3350±0.0111 | **0.3297±0.0066** |
|egpt-1.3b (24/03) | 0.4034±0.0118 | 0.6438±0.0152 | **0.4150±0.0218** | 0.5272±0.0255 | 0.3294±0.0066 |

| 5-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|------------|---------------|--------------|-------------------|------------------|-------------|
|Polyglot-ko-1.3b | 0.4751±0.0133 | **0.7193±0.0142** | **0.3984±0.0218** | **0.6257±0.0244** | 0.4559±0.0138 |
|egpt-1.3b (23/07) | 0.4829±0.0133 | 0.6558±0.0150 | 0.3846±0.0216 | 0.5715±0.0249 | **0.5108±0.0141** |
|egpt-1.3b (23/11) | 0.4762±0.0133 | 0.6499±0.0151 | 0.3689±0.0214 | 0.5607±0.0249 | 0.4776±0.0141 |
|egpt-1.3b (24/03) | **0.4944±0.0134** | 0.6643±0.0149 | 0.3862±0.0216 | 0.5232±0.0251 | 0.4947±0.0141 |
When evaluated with LM-Evaluation-Harness version 0.4.2 or later (hereafter LEH 0.4.2dev, commit id b1777c82), KB-SENTINEG scores lower and the other four tasks score higher.
The evaluation errors in the polyglot branch appear to have been fixed, so evaluating with the latest version is likely preferable; the polyglot branch scores are nevertheless kept separately for consistency.
|Zero-shot performance | KB-BOOLQ (F1) | KB-COPA (F1) | KB-HELLASWAG (F1) | KB-SENTINEG (F1) | KB-WIC (F1) |
|--------------|---------------|--------------|-------------------|------------------|-------------|
|egpt-1.3b (23/11) - LEH v0.4.2dev evaluation | **0.4926** | **0.6530** | 0.3933 | 0.3350 | 0.3280 |
|egpt-1.3b (24/03) - LEH v0.4.2dev evaluation | 0.4391 | 0.6497 | **0.4222** | **0.3733** | **0.3412** |
|egpt-1.3b (24/03) - LEH polyglot branch evaluation (for reference) | 0.4034 | 0.6438 | 0.4150 | 0.5272 | 0.3294 |
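For the 24/03 checkpoint, the difference between the two harness versions can be read off the last two rows of the table. The snippet below is a small illustrative sketch that only recomputes those per-task differences from the published zero-shot scores; the variable and function names are made up for this example, and it does not run either harness.

```python
# Zero-shot F1 scores for egpt-1.3b (24/03), copied from the table above.
TASKS = ["KB-BOOLQ", "KB-COPA", "KB-HELLASWAG", "KB-SENTINEG", "KB-WIC"]
POLYGLOT_BRANCH = [0.4034, 0.6438, 0.4150, 0.5272, 0.3294]
LEH_042DEV = [0.4391, 0.6497, 0.4222, 0.3733, 0.3412]


def score_deltas(old, new, names):
    """Return {task: new - old}, rounded to 4 decimals like the table."""
    return {n: round(b - a, 4) for n, a, b in zip(names, old, new)}


deltas = score_deltas(POLYGLOT_BRANCH, LEH_042DEV, TASKS)
# Tasks whose score drops when moving from the polyglot branch to LEH 0.4.2dev.
lower = [task for task, d in deltas.items() if d < 0]

for task, d in deltas.items():
    print(f"{task}: {d:+.4f}")
print("lower under LEH 0.4.2dev:", lower)
```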
### Transfer learning evaluation
English GSM8k scores after fine-tuning on MetaMathQA are as follows.
Training setup: LR 8e-5, FP16, TF32 enabled, effective batch size fixed at 128 for all models (e.g. 4 GPUs * batch size 8 per GPU * gradient accumulation 4). LR scheduler: cosine decay with warmup ratio 0.03. No weight decay.
| Model | GSM8k test | Notes |
| ---- | ---------- | ---- |
| polyglot-ko-1.3b | 0.2160 | |
| polyglot-ko-12.8b | 0.3646 | LR set to 5e-5, chosen by referring to the hparams of Beomi/polyglot-ko-alpaca-12.8b |
| egpt-1.3b (23/11) | 0.4443 | |
| egpt-1.3b (24/03) | 0.4147 | |
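The effective batch size in the training setup is the product of GPU count, per-GPU batch size, and gradient accumulation steps. The sketch below spells out that arithmetic; both function names are hypothetical, and the 8-GPU case is an assumed example, not a configuration reported here.

```python
def effective_batch_size(num_gpus, per_gpu_batch, grad_accum_steps):
    """Number of samples contributing to one optimizer step."""
    return num_gpus * per_gpu_batch * grad_accum_steps


def grad_accum_for(target, num_gpus, per_gpu_batch):
    """Gradient-accumulation steps needed to reach `target` exactly."""
    per_step = num_gpus * per_gpu_batch
    if target % per_step != 0:
        raise ValueError(f"{target} is not a multiple of {per_step}")
    return target // per_step


# The example from the training setup: 4 GPUs * 8 per GPU * 4 accumulation steps.
assert effective_batch_size(4, 8, 4) == 128
# Assumed example: an 8-GPU machine with the same per-GPU batch size.
print(grad_accum_for(128, num_gpus=8, per_gpu_batch=8))
```

Solving for the accumulation steps this way keeps the effective batch size at 128 when the number of available GPUs differs between runs.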
## Update log
* (23/7/27 model) Initial model. Compared with Polyglot 1.3b, better on BOOLQ/WIC and worse on COPA/HELLASWAG/SENTINEG.
* (23/11/22 model) Further pre-trained from the 23/7/27 model with a similar data composition plus some added data. Updated because it performed better (39 vs 44) under a different evaluation scheme for the designated goal.
* (24/03/21 model) Further pre-trained using AIHUB datasets and a Korean wiki dataset.
## Datasets used for pre-training
The model was trained on the following data:
* [AIHub datasets: MRC, RAW, dialogue, translation, summarization](https://aihub.or.kr)
* [KISTI domestic papers EN, KR dataset](https://aida.kisti.re.kr/)
* [KcBERT v2022.3q Naver news comments dataset](https://huggingface.co/beomi/kcbert-base)
* [National Institute of Korean Language Modu Corpus (written, spoken, newspaper, unpublished works, National Assembly minutes, everyday conversation, online conversation, messenger corpora)](https://kli.korean.go.kr/)
* [Korean Wikipedia dump, lovit/ko-wikitext dataset. Some of the pre-training corpora in the korpora datasets, such as 20200920.v3](https://ko-nlp.github.io/Korpora/)
* (EN) stack exchange dataset
* (EN) OpenWebText2
* ~~(EN) books3 corpus~~ (removed in v3 (2024/03) due to licensing issues)
* (EN) 2020-09-08-arXiv-extracts
* (EN) PUBMED title abstracts 2019
* THUDM/MathGLM Arithmetic Text Corpus (applied from 23/11/22, https://github.com/THUDM/MathGLM)
## How to use
With the code below, inference works on transformers>=4.28.
```
import sys

from transformers import (
    AutoTokenizer, AutoModelForCausalLM, GenerationConfig
)


def load_model(mdl_path):
    tokenizer = AutoTokenizer.from_pretrained(mdl_path, use_fast=True, legacy=False,)
    # The device_map argument requires the accelerate package to be installed.
    model = AutoModelForCausalLM.from_pretrained(mdl_path, device_map="auto",
                                                 torch_dtype="auto")
    return tokenizer, model


if __name__ == '__main__':
    # FIXME: change the model path!
    tokenizer, model = load_model("../egpt-1.3b-test-230720/")
    # print(model.hf_device_map)
    # Adjust the generation options below as needed.
    gen_cfg = GenerationConfig(max_new_tokens=256, min_length=0,
                               max_time=10.0, do_sample=True,
                               top_p=0.9, epsilon_cutoff=3e-4,)

    print("** Now Ready to input from stdin.")
    for aline in sys.stdin:
        aline = aline.rstrip("\n\r\t")
        # Send inputs to the model's device instead of a hard-coded "cuda".
        input_cond = tokenizer(aline, add_special_tokens=False, return_tensors="pt").to(model.device)
        outs = model.generate(**input_cond, generation_config=gen_cfg)
        out_str = tokenizer.batch_decode(outs, skip_special_tokens=True,
                                         clean_up_tokenization_spaces=True)
        print(">> " + ' '.join(out_str))
```