Overview
HyperCLOVAX-SEED-Text-Instruct-0.5B is a Text-to-Text model with instruction-following capabilities that excels in understanding Korean language and culture. Compared to external competitors of similar scale, it demonstrates improved mathematical performance and a substantial enhancement in Korean language capability. The HyperCLOVAX-SEED-Text-Instruct-0.5B is currently the smallest model released by the HyperCLOVAX, representing a lightweight solution suitable for deployment in resourceโconstrained environments such as edge devices. It supports a maximum context length of 4K and functions as a versatile small model applicable to a wide range of tasks. The total cost of a single training run for HyperCLOVAX-SEED-Text-Instruct-0.5B was 4.358K A100 GPU hours (approximately USD 6.537K), which is 39 times lower than the cost of training the QWEN2.5โ0.5Bโinstruct
model.
Basic Information
- Architecture: Transformerโbased (Dense Model)
- Parameters: 0.57 B (total); 0.45 B (excluding token embeddings, tied embeddings)
- Input/Output Format: Text / Text
- Maximum Context Length: 4 K tokens
- Knowledge Cutoff Date: Trained on data up to January 2025
Training and Data
The training dataset for HyperCLOVAX-SEED-Text-Instruct-0.5B consists of diverse sources, including the highโquality data accumulated during the development of HyperCLOVAX-SEED-Text-Instruct-0.5B. Training was conducted in three main stages:
- Pretraining: Knowledge acquisition using highโquality data and a highโperformance pretrained model.
- Rejection Sampling FineโTuning (RFT): Enhancement of multiโdomain knowledge and complex reasoning capabilities.
- Supervised FineโTuning (SFT): Improvement of instructionโfollowing proficiency.
Training Cost
HyperCLOVAX-SEED-Text-Instruct-0.5B leveraged HyperCLOVA Xโs lightweight training process and highโquality data to achieve significantly lower training costs compared to industryโleading competitors of similar scale. Excluding the SFT stage, a single pretraining run incurred:
Pretraining Cost Category | HyperCLOVAX-SEED-Text-Instruct-0.5B | QWEN2.5โ0.5Bโinstruct |
---|---|---|
A100 GPU Hours | 4.358 K | 169.257 K |
Cost (USD) | 6.537 K | 253.886 K |
This represents approximately a 39ร reduction in pretraining cost relative to QWEN2.5โ0.5B-instruct
.
Benchmarks
Model | KMMLU (5-shot, acc) | HAE-RAE (5-shot, acc) | CLiCK (5-shot, acc) | KoBEST (5-shot, acc) |
---|---|---|---|---|
HyperCLOVAX-SEED-Text-Base-0.5B | 0.4181 | 0.6370 | 0.5373 | 0.6963 |
HyperCLOVAX-SEED-Text-Instruct-0.5B | 0.3815 | 0.5619 | 0.4446 | 0.6299 |
QWEN2.5-0.5B-instruct | 0.2968 | 0.3428 | 0.3805 | 0.5025 |
HuggingFace Usage Example
Python Code
For better inference results with HyperCLOVAX-SEED-Text-Instruct-0.5B
, we recommend setting repetition_penalty
to 1.2
.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B").to(device="cuda")
tokenizer = AutoTokenizer.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B")
chat = [
{"role": "tool_list", "content": ""},
{"role": "system", "content": "- AI ์ธ์ด๋ชจ๋ธ์ ์ด๋ฆ์ \"CLOVA X\" ์ด๋ฉฐ ๋ค์ด๋ฒ์์ ๋ง๋ค์๋ค.\n- ์ค๋์ 2025๋
04์ 24์ผ(๋ชฉ)์ด๋ค."},
{"role": "user", "content": "์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์๊ณผ ์์์ญํ์ ๊ด๊ณ๋ฅผ ์ต๋ํ ์์ธํ ์๋ ค์ค."},
]
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = inputs.to(device="cuda")
output_ids = model.generate(**inputs,
max_length=1024,
stop_strings=["<|endofturn|>", "<|stop|>"],
repetition_penalty=1.2,
tokenizer=tokenizer)
print(tokenizer.batch_decode(output_ids))
Result
[
'<|im_start|>tool_list\n<|im_end|>\n' \
'<|im_start|>system\n- AI ์ธ์ด๋ชจ๋ธ์ ์ด๋ฆ์ "CLOVA X" ์ด๋ฉฐ ๋ค์ด๋ฒ์์ ๋ง๋ค์๋ค.\n- ์ค๋์ 2025๋
04์ 24์ผ(๋ชฉ)์ด๋ค.<|im_end|>\n' \
'<|im_start|>user\n์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์๊ณผ ์์์ญํ์ ๊ด๊ณ๋ฅผ ์ต๋ํ ์์ธํ ์๋ ค์ค.<|im_end|>\n' \
'<|im_start|>assistant\n์์์ญํ์ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ํตํด ๋ฌผ์ง๊ณผ ์๋์ง, ๊ณต๊ฐ ๋ฑ์ ํ์์ ์ค๋ช
ํฉ๋๋ค.\n\n**1. ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์**\n\n์๋ขฐ๋ฉ๊ฑฐ๋ ํ๋ํจ์๋ฅผ ์ด์ฉํ์ฌ ์
์์ ์์น์ ์ด๋๋์ ๊ณ์ฐํ ์ ์๋ค๊ณ ์ฃผ์ฅํ์ต๋๋ค. ์ด๋ฅผ ์ํด ๋ค์๊ณผ ๊ฐ์ ์์ผ๋ก ํํ๋ฉ๋๋ค:\n\n$$\\frac{\\partial \\psi}{\\partial t} = iH \\nabla^2 \\psi + V(x)\\psi $$\n\n์ฌ๊ธฐ์ $\\psi$๋ ํ๋ํจ์์ด๊ณ $i$๋ ํ์ ๋จ์์
๋๋ค. ์ฌ๊ธฐ์ $t$๋ ์๊ฐ, $x$๋ ๊ณต๊ฐ ์ขํ์ด๋ฉฐ, $H$๋ ํด๋ฐํด ์์๋ก ์์คํ
์ ์๋์ง๋ฅผ ๋ํ๋
๋๋ค. ๋ํ $V(x)$๋ ์ธ๋ถ ํ์ด๋ ์ฅ๋ฒฝ์ ์ํด ์ํฅ์ ๋ฐ๋ ๋ถ๋ถ์ ๋ํ๋ด๋ ํจ์๋ก, ์ผ๋ฐ์ ์ผ๋ก ์ ์์ฅ์ ์ฌ์ฉํฉ๋๋ค.\n\n**2. ์์์ญํ๊ณผ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ๊ด๊ณ**\n\n์์์ญํ์์๋ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ด ๋งค์ฐ ์ค์ํ ์ญํ ์ ํฉ๋๋ค. ์ด๋ ๋ชจ๋ ๋ฌผ๋ฆฌ์ ์์คํ
์ด ๋ถํ์ ์ฑ ์๋ฆฌ์ ๋ฐ๋ผ ํ๋์ ํ๋ฉฐ, ์ด๋ฌํ ์์คํ
๋ค์ ํ๋ฅ ์ ์ผ๋ก ์ํ๋ฅผ ๊ฐ์ง ์๋ฐ์ ์๊ธฐ ๋๋ฌธ์
๋๋ค. ๋ฐ๋ผ์ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ์์์ญํ์ ์ํ์ ์ผ๋ก ๋ชจ๋ธ๋งํ๋ ํต์ฌ์ ์ธ ๋๊ตฌ ์ค ํ๋์
๋๋ค.\n\n์๋ฅผ ๋ค์ด, ์์ํต ๋ด์ ์ ์๋ค์ ์ํ๋ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ์ํด ๊ฒฐ์ ๋๋ฉฐ, ์ด๋ ๋ฌผ๋ฆฌํ์ ๋ฒ์น์ ๋ฐ๋ฅด๋ ๊ฒ์ผ๋ก ๋ณด์
๋๋ค. ๋ํ, ๊ด์ ํจ๊ณผ์์๋ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ๋น์ด ๋ฌผ์ง ๋ด์์ ์ด๋ป๊ฒ ํก์๋๊ณ ๋ฐ์ฌ๋๋์ง๋ฅผ ์์ธกํ๋๋ฐ ์ฌ์ฉ๋ฉ๋๋ค.\n\n**3. ์์ฉ ๋ถ์ผ**\n\n์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ๋ค์ํ ๋ถ์ผ์์ ํ์ฉ๋๊ณ ์์ต๋๋ค. ์๋ฅผ ๋ค๋ฉด, ๋ฐ๋์ฒด ๊ธฐ์ ์์์ ํธ๋์ง์คํฐ ์ค๊ณ, ํต๋ฌผ๋ฆฌํ์์์ ๋ฐฉ์ฌ์ฑ ๋ถ๊ดด ์ฐ๊ตฌ ๋ฑ์ด ์์ผ๋ฉฐ, ์ด๋ ๋ชจ๋ ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ๊ธฐ๋ฐ์ผ๋ก ํ ์ด๋ก ์ ๊ธฐ๋ฐ ์์์ ์ด๋ฃจ์ด์ง๋๋ค.\n\n๋ํ, ํ๋ ๊ณผํ ๊ธฐ์ ์ ๋ฐ์ ์๋ ํฐ ๊ธฐ์ฌ๋ฅผ ํ๊ณ ์๋๋ฐ, ํนํ ์ธ๊ณต์ง๋ฅ(AI), ์ปดํจํฐ ์๋ฎฌ๋ ์ด์
๋ฑ์์ ๋ณต์กํ ๋ฌธ์ ๋ฅผ ํด๊ฒฐํ๊ณ ์๋ก์ด ์ง์์ ์ฐฝ์ถํ๊ธฐ ์ํ ๊ธฐ์ด๊ฐ ๋๊ณ ์์ต๋๋ค.\n\n๊ฒฐ๋ก ์ ์ผ๋ก, ์๋ขฐ๋ฉ๊ฑฐ ๋ฐฉ์ ์์ ์์์ญํ์ ๊ธฐ๋ณธ ๊ฐ๋
๋ค์ ์ดํดํ๊ณ ํด์ํ๋ฉฐ, ๊ทธ ๊ฒฐ๊ณผ๋ก์ ๋ง์ ํ์ ์ ์ด๊ณ ์ค์ฉ์ ์ธ ๊ธฐ์ ์ ๊ฐ๋ฅํ๊ฒ ํ์ต๋๋ค. ์ด๋ ์์์ญํ์ ์ค์์ฑ์ ๋ณด์ฌ์ฃผ๋ ๋ํ์ ์ธ ์์๋ผ๊ณ ํ ์ ์์ต๋๋ค.<|im_end|>' \
'<|endofturn|>'
]
- Downloads last month
- 9,355