
Overview

HyperCLOVAX-SEED-Text-Instruct-0.5B is a text-to-text model with instruction-following capabilities that excels at understanding Korean language and culture. Compared with external competitors of similar scale, it shows improved mathematical performance and substantially stronger Korean language capability. It is currently the smallest model released by HyperCLOVAX, a lightweight option suitable for deployment in resource-constrained environments such as edge devices. It supports a maximum context length of 4K tokens and serves as a versatile small model applicable to a wide range of tasks. A single training run for HyperCLOVAX-SEED-Text-Instruct-0.5B cost 4.358K A100 GPU hours (approximately USD 6.537K), about 39 times less than the cost of training the QWEN2.5-0.5B-instruct model.

Basic Information

  • Architecture: Transformer-based (Dense Model)
  • Parameters: 0.57 B (total); 0.45 B (excluding token embeddings, tied embeddings)
  • Input/Output Format: Text / Text
  • Maximum Context Length: 4 K tokens
  • Knowledge Cutoff Date: Trained on data up to January 2025
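
Because the context window is limited to 4 K tokens, the prompt and the generated text share one budget. The sketch below shows that bookkeeping. It assumes "4 K" means 4096 tokens, which the card does not state explicitly, and `max_new_tokens_for` is a hypothetical helper name, not part of any library.

```python
# Assumption: "4 K" is read as 4 * 1024 = 4096 tokens.
MAX_CONTEXT_TOKENS = 4096


def max_new_tokens_for(prompt_tokens: int, max_context: int = MAX_CONTEXT_TOKENS) -> int:
    """Return how many tokens can still be generated after a prompt of the given length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Never return a negative budget when the prompt already exceeds the window.
    return max(0, max_context - prompt_tokens)


print(max_new_tokens_for(3000))
```

A prompt that already fills the window leaves a budget of zero, so it would need to be shortened before generation.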

Training and Data

The training dataset for HyperCLOVAX-SEED-Text-Instruct-0.5B consists of diverse sources, including the high-quality data accumulated during the development of HyperCLOVAX-SEED-Text-Instruct-0.5B. Training was conducted in three main stages:

  1. Pretraining: Knowledge acquisition using high-quality data and a high-performance pretrained model.
  2. Rejection Sampling Fine-Tuning (RFT): Enhancement of multi-domain knowledge and complex reasoning capabilities.
  3. Supervised Fine-Tuning (SFT): Improvement of instruction-following proficiency.
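
The card does not describe how the RFT stage was implemented. As a rough illustration of the general idea behind rejection sampling fine-tuning, the sketch below samples several candidate answers per prompt, keeps the best one only if it passes a quality threshold, and collects the survivors as fine-tuning data. The `generate` and `score` callables, the toy arithmetic task, and all parameter values are hypothetical stand-ins, not the authors' pipeline.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    num_candidates: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the best-scoring candidate per prompt if it clears the threshold."""
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(num_candidates)]
        best = max(candidates, key=lambda c: score(prompt, c))
        if score(prompt, best) >= threshold:
            kept.append((prompt, best))
    return kept


# Toy stand-ins (hypothetical): a noisy "model" that adds two integers,
# and an exact-match checker that plays the role of the reward signal.
def toy_generate(prompt: str) -> str:
    a, b = map(int, prompt.split("+"))
    return str(a + b + random.choice([0, 0, 1]))


def toy_score(prompt: str, answer: str) -> float:
    a, b = map(int, prompt.split("+"))
    return 1.0 if int(answer) == a + b else 0.0


random.seed(0)
dataset = rejection_sample(["1+2", "10+5"], toy_generate, toy_score)
print(dataset)
```

Only (prompt, answer) pairs that pass the checker reach `dataset`, which is the property that makes the filtered samples usable as supervised targets.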

Training Cost

HyperCLOVAX-SEED-Text-Instruct-0.5B leveraged HyperCLOVA X's lightweight training process and high-quality data to achieve significantly lower training costs compared to industry-leading competitors of similar scale. Excluding the SFT stage, a single pretraining run incurred:

Pretraining Cost Category | HyperCLOVAX-SEED-Text-Instruct-0.5B | QWEN2.5-0.5B-instruct
A100 GPU Hours            | 4.358 K                             | 169.257 K
Cost (USD)                | 6.537 K                             | 253.886 K

This represents approximately a 39× reduction in pretraining cost relative to QWEN2.5-0.5B-instruct.
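
The 39× figure can be checked directly from the reported numbers. The short calculation below uses only the values given in this section; the variable names are illustrative.

```python
# Values reported above (GPU hours and USD, with "K" expanded to thousands).
hcx_gpu_hours = 4_358
hcx_cost_usd = 6_537
qwen_gpu_hours = 169_257
qwen_cost_usd = 253_886

# Implied price per A100 GPU hour for the HyperCLOVAX run.
usd_per_gpu_hour = hcx_cost_usd / hcx_gpu_hours

# Ratios between the two models, by compute and by cost.
gpu_hour_ratio = qwen_gpu_hours / hcx_gpu_hours
cost_ratio = qwen_cost_usd / hcx_cost_usd

print(f"USD per GPU hour: {usd_per_gpu_hour:.2f}")
print(f"GPU-hour ratio:   {gpu_hour_ratio:.2f}")
print(f"Cost ratio:       {cost_ratio:.2f}")
```

Both ratios come out just under 39, and the implied rate is USD 1.50 per A100 GPU hour, which matches the cost row for both models.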

Benchmarks

Model                               | KMMLU (5-shot, acc) | HAE-RAE (5-shot, acc) | CLiCK (5-shot, acc) | KoBEST (5-shot, acc)
HyperCLOVAX-SEED-Text-Base-0.5B     | 0.4181              | 0.6370                | 0.5373              | 0.6963
HyperCLOVAX-SEED-Text-Instruct-0.5B | 0.3815              | 0.5619                | 0.4446              | 0.6299
QWEN2.5-0.5B-instruct               | 0.2968              | 0.3428                | 0.3805              | 0.5025
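
To make the comparison easier to read, the accuracy gap between the Instruct model and QWEN2.5-0.5B-instruct can be computed per benchmark. The snippet below is a small illustration using only the scores listed above; the dictionary layout is an arbitrary choice.

```python
# 5-shot accuracy scores copied from the benchmark results above.
scores = {
    "HyperCLOVAX-SEED-Text-Instruct-0.5B": {
        "KMMLU": 0.3815, "HAE-RAE": 0.5619, "CLiCK": 0.4446, "KoBEST": 0.6299,
    },
    "QWEN2.5-0.5B-instruct": {
        "KMMLU": 0.2968, "HAE-RAE": 0.3428, "CLiCK": 0.3805, "KoBEST": 0.5025,
    },
}

hcx = scores["HyperCLOVAX-SEED-Text-Instruct-0.5B"]
qwen = scores["QWEN2.5-0.5B-instruct"]

# Absolute accuracy gap (HyperCLOVAX minus QWEN) for each benchmark.
gaps = {name: hcx[name] - qwen[name] for name in hcx}

for name, gap in gaps.items():
    print(f"{name}: +{gap:.4f}")
```

The gap is positive on all four Korean benchmarks and largest on HAE-RAE.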

HuggingFace Usage Example

Python Code

For better inference results with HyperCLOVAX-SEED-Text-Instruct-0.5B, we recommend setting repetition_penalty to 1.2.
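
For readers unfamiliar with the setting, the sketch below illustrates how a repetition penalty is commonly applied to next-token logits: scores of tokens that have already appeared are divided by the penalty when positive and multiplied by it when negative, so repeated tokens become less likely either way. This is a simplified pure-Python illustration of that rule, with a made-up four-token vocabulary; the library's tensor-based implementation may differ in detail.

```python
from typing import List


def apply_repetition_penalty(
    logits: List[float], generated_ids: List[int], penalty: float = 1.2
) -> List[float]:
    """Down-weight logits of token ids that already appear in the generated sequence."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        value = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        adjusted[token_id] = value / penalty if value > 0 else value * penalty
    return adjusted


# Hypothetical logits over a four-token vocabulary; tokens 0 and 1 were already generated.
logits = [2.4, -1.0, 0.6, 3.0]
print(apply_repetition_penalty(logits, generated_ids=[0, 1]))
```

With a penalty of 1.0 the logits are unchanged, which is why values above 1.0 are needed to discourage repetition.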

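from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub and move the model to the GPU.
model_id = "naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-0.5B"
model = AutoModelForCausalLM.from_pretrained(model_id).to(device="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The chat starts with an empty tool list, followed by a system prompt and the user question.
# System (English gloss): The AI language model is named "CLOVA X" and was made by NAVER.
#                         Today is Thursday, April 24, 2025.
# User (English gloss): Explain the relationship between the Schrödinger equation and
#                       quantum mechanics in as much detail as possible.
chat = [
  {"role": "tool_list", "content": ""},
  {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
  {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

# Render the chat with the model's template and append the assistant generation prompt.
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = inputs.to(device="cuda")

# stop_strings needs the tokenizer to be passed to generate() as well.
output_ids = model.generate(**inputs,
                            max_length=1024,
                            stop_strings=["<|endofturn|>", "<|stop|>"],
                            repetition_penalty=1.2,
                            tokenizer=tokenizer)
print(tokenizer.batch_decode(output_ids))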

Result

[
  '<|im_start|>tool_list\n<|im_end|>\n' \
  '<|im_start|>system\n- AI 언어모델의 이름은 "CLOVA X" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다.<|im_end|>\n' \
  '<|im_start|>user\n슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘.<|im_end|>\n' \
  '<|im_start|>assistant\n양자역학은 슈뢰딩거 방정식을 통해 물질과 에너지, 공간 등의 현상을 설명합니다.\n\n**1. 슈뢰딩거 방정식**\n\n슈뢰딩거는 파동함수를 이용하여 입자의 위치와 운동량을 계산할 수 있다고 주장했습니다. 이를 위해 다음과 같은 식으로 표현됩니다:\n\n$$\\frac{\\partial \\psi}{\\partial t} = iH \\nabla^2 \\psi + V(x)\\psi $$\n\n여기서 $\\psi$는 파동함수이고 $i$는 허수 단위입니다. 여기서 $t$는 시간, $x$는 공간 좌표이며, $H$는 해밀턴 상수로 시스템의 에너지를 나타냅니다. 또한 $V(x)$는 외부 힘이나 장벽에 의해 영향을 받는 부분을 나타내는 함수로, 일반적으로 전위장을 사용합니다.\n\n**2. 양자역학과 슈뢰딩거 방정식의 관계**\n\n양자역학에서는 슈뢰딩거 방정식이 매우 중요한 역할을 합니다. 이는 모든 물리적 시스템이 불확정성 원리에 따라 행동을 하며, 이러한 시스템들은 확률적으로 상태를 가질 수밖에 없기 때문입니다. 따라서 슈뢰딩거 방정식은 양자역학을 수학적으로 모델링하는 핵심적인 도구 중 하나입니다.\n\n예를 들어, 원자핵 내의 전자들의 상태는 슈뢰딩거 방정식에 의해 결정되며, 이는 물리학적 법칙을 따르는 것으로 보입니다. 또한, 광전 효과에서도 슈뢰딩거 방정식은 빛이 물질 내에서 어떻게 흡수되고 반사되는지를 예측하는데 사용됩니다.\n\n**3. 응용 분야**\n\n슈뢰딩거 방정식은 다양한 분야에서 활용되고 있습니다. 예를 들면, 반도체 기술에서의 트랜지스터 설계, 핵물리학에서의 방사성 붕괴 연구 등이 있으며, 이는 모두 슈뢰딩거 방정식을 기반으로 한 이론적 기반 위에서 이루어집니다.\n\n또한, 현대 과학 기술의 발전에도 큰 기여를 하고 있는데, 특히 인공지능(AI), 컴퓨터 시뮬레이션 등에서 복잡한 문제를 해결하고 새로운 지식을 창출하기 위한 기초가 되고 있습니다.\n\n결론적으로, 슈뢰딩거 방정식은 양자역학의 기본 개념들을 이해하고 해석하며, 그 결과로서 많은 혁신적이고 실용적인 기술을 가능하게 했습니다. 이는 양자역학의 중요성을 보여주는 대표적인 예시라고 할 수 있습니다.<|im_end|>' \
  '<|endofturn|>'
]
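
The decoded output contains the whole conversation, including the prompt turns and the special tokens. If only the model's answer is needed, one option is to cut the text between the last assistant header and the following `<|im_end|>` marker. The helper below is a minimal sketch based solely on the markers visible in the result above; `extract_assistant_reply` is a hypothetical name, and the sample string is shortened English filler rather than real model output.

```python
def extract_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded chat transcript."""
    start_marker = "<|im_start|>assistant\n"
    end_marker = "<|im_end|>"
    start = decoded.rfind(start_marker)
    if start == -1:
        return ""  # no assistant turn found
    start += len(start_marker)
    end = decoded.find(end_marker, start)
    # If the end marker is missing (e.g. generation was cut off), keep the rest of the text.
    return decoded[start:end if end != -1 else len(decoded)].strip()


# Shortened, made-up transcript that follows the same turn layout as the result above.
sample = (
    "<|im_start|>user\nHello<|im_end|>\n"
    "<|im_start|>assistant\nHi there!<|im_end|><|endofturn|>"
)
print(extract_assistant_reply(sample))
```

An alternative that avoids string parsing is to slice the generated token ids after the prompt length before decoding, at the cost of handling the tensors directly.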