Keural-SFT3-14.83B (SFT Epoch 3, 50,000 steps)
Keural is a bilingual Korean-English Mixture-of-Experts language model trained entirely from scratch; no base model was used. This is an intermediate SFT epoch 3 checkpoint at step 50,000 of 65,849 total steps (76.4% complete), trained on a merged bilingual dataset of 2.35M samples.
Model Details
| Property | Value |
|---|---|
| Architecture | Mixtral-style MoE (8 experts, top-2 routing) |
| Parameters | 14.83B total / ~7.42B active per token |
| Layers | 24 |
| Hidden size | 4096 |
| Attention heads | 32 (GQA, 8 KV heads) |
| Head dim | 128 |
| Expert intermediate size | 5,632 |
| Experts | 8 total, top-2 per token |
| Context length | 4,096 tokens |
| Vocabulary | 131,074 (131,072 SPM + `<\|im_start\|>` and `<\|im_end\|>`) |
| RoPE theta | 500,000 |
| Sliding window | 512 (alternating layers) |
| Norm | RMSNorm (eps=1e-5) |
| Activation | SiLU |
| Dtype | bfloat16 |
| Languages | Korean (primary), English |
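The total parameter count can be cross-checked from the table with a short calculation. This is a rough sketch, not the author's accounting: it assumes a standard Mixtral-style layout (three projection matrices per expert, bias-free linear layers, two RMSNorm weights per layer) and tied input/output embeddings, none of which the card states explicitly.

```python
# Back-of-the-envelope parameter count from the Model Details table.
# Assumptions (not confirmed by the card): Mixtral-style blocks, no biases,
# tied input/output embeddings.
hidden, layers = 4096, 24
n_heads, n_kv_heads, head_dim = 32, 8, 128
expert_inter, n_experts = 5632, 8
vocab = 131_074

# Attention: Q and O use all heads; K and V use the smaller KV head count (GQA).
attn = 2 * hidden * n_heads * head_dim + 2 * hidden * n_kv_heads * head_dim
# Each expert: gate, up and down projections.
expert = 3 * hidden * expert_inter
router = hidden * n_experts
norms = 2 * hidden  # two RMSNorm weight vectors per layer

per_layer = attn + n_experts * expert + router + norms
total = layers * per_layer + vocab * hidden + hidden  # + embeddings + final norm

print(f"{total / 1e9:.2f}B")  # 14.83B
```

Under these assumptions the result matches the 14.83B figure in the table; with untied embeddings the same calculation would come out higher.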
Full Training Pipeline
| Stage | Steps | Tokens | Data | Hardware |
|---|---|---|---|---|
| Pretraining Stage 1 | 100,000 | ~50B | Korean + English web corpus | 2× H200 SXM |
| Pretraining Stage 2 | 120,000 | ~13B | Korean + English web corpus (continued) | 2× H200 SXM |
| SFT Epoch 1 | 18,000 | 710M | keural-SFT 1.14M ChatML samples | 2× H200 SXM |
| DPO Round 1 | 6,927 | n/a | 440K Korean preference pairs | 2× H200 SXM |
| SFT Epoch 2 | 29,112 | 7.63B | keural-SFT 710K samples (2nd pass) | 2× H200 SXM |
| SFT Epoch 3 (this checkpoint) | 50,000 / 65,849 | ~18B | 2.35M merged ChatML dataset | 2× H200 SXM |
SFT Epoch 3 Training Details
| Hyperparameter | Value |
|---|---|
| Resumed from | checkpoint_29112 (SFT epoch 2 final) |
| Learning rate | 1e-5 → 1e-6 (cosine decay) |
| Min learning rate | 1e-6 |
| Current LR at 50K | 2.19e-06 |
| Effective batch size | 64 (4 per GPU × 8 grad accum × 2 GPUs) |
| Max sequence length | 4,096 tokens |
| Weight decay | 0.05 |
| Gradient clipping | 1.0 |
| Optimizer | AdamW |
| Checkpoint step | 50,000 (76.4% of epoch) |
| Total epoch steps | 65,849 |
| Training loss at 50K | ~2.01 |
| Parallelism | FSDP FULL_SHARD (ZeRO-3 equivalent) |
| Precision | bfloat16 + gradient checkpointing |
| Hardware | 2× NVIDIA H200 SXM (139 GiB each) |
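The effective batch size and the shape of the learning-rate schedule in the table can be illustrated as follows. This is a minimal sketch that assumes a plain cosine decay with no warmup over the 65,849 epoch steps; the card does not document warmup or step offsets, so the value at step 50,000 only lands in the same range as the reported 2.19e-06 rather than reproducing it exactly. The function name `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(step, total_steps, max_lr=1e-5, min_lr=1e-6):
    """Cosine decay from max_lr to min_lr (assumes no warmup phase)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Effective batch size = per-GPU batch x gradient accumulation x GPU count.
per_gpu_batch, grad_accum, num_gpus = 4, 8, 2
effective_batch = per_gpu_batch * grad_accum * num_gpus

lr_at_50k = cosine_lr(50_000, 65_849)
print(effective_batch, f"{lr_at_50k:.2e}")
```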
SFT Epoch 3 Dataset (2,351,212 samples)
| Source | Samples | Language |
|---|---|---|
| OpenHermes-2.5 | 1,001,551 | English |
| SlimOrca | 517,982 | English |
| UltraChat | 193,212 | English |
| OpenOrca | 138,639 | English |
| AIHub multisession sci | 127,868 | Korean |
| AIHub daily conversation | 120,867 | Korean |
| AIHub multisession social | 85,346 | Korean |
| Alpaca | 46,303 | English |
| KoInstruct QA | 45,299 | Korean |
| KoInstruct base | 42,276 | Korean |
| KoAlpaca | 21,091 | Korean |
| AIHub expert QA | 10,778 | Korean |
| Total | 2,351,212 | Korean ~19% / English ~81% |
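The totals and the language split in the last row follow directly from the per-source counts. A small sketch that recomputes them from the table (the dictionary name `sources` is only illustrative):

```python
# Per-source sample counts copied from the table above.
sources = {
    "OpenHermes-2.5": (1_001_551, "English"),
    "SlimOrca": (517_982, "English"),
    "UltraChat": (193_212, "English"),
    "OpenOrca": (138_639, "English"),
    "AIHub multisession sci": (127_868, "Korean"),
    "AIHub daily conversation": (120_867, "Korean"),
    "AIHub multisession social": (85_346, "Korean"),
    "Alpaca": (46_303, "English"),
    "KoInstruct QA": (45_299, "Korean"),
    "KoInstruct base": (42_276, "Korean"),
    "KoAlpaca": (21_091, "Korean"),
    "AIHub expert QA": (10_778, "Korean"),
}

total = sum(n for n, _ in sources.values())
korean = sum(n for n, lang in sources.values() if lang == "Korean")

print(total, f"{korean / total:.1%}")  # 2351212 19.3%
```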
Chat Format (ChatML)
<|im_start|>system
You are a helpful bilingual Korean-English assistant. Always respond in the same language as the user.<|im_end|>
<|im_start|>user
안녕하세요! 파이썬 리스트 정렬 방법을 알려주세요.<|im_end|>
<|im_start|>assistant
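For clients that cannot call `tokenizer.apply_chat_template`, the layout above can be assembled by hand. The helper below is a hypothetical sketch of generic ChatML formatting; the exact whitespace of this repository's chat template is an assumption, so prefer the tokenizer's own template when it is available.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful bilingual Korean-English assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```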
How to Use
With vLLM (recommended)
python -m vllm.entrypoints.openai.api_server \
    --model mkd-hossain/keural-sft3-50k \
    --dtype auto \
    --max-model-len 4096 \
    --gpu-memory-utilization 0.7

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="mkd-hossain/keural-sft3-50k",
    messages=[
        {"role": "system", "content": "You are a helpful bilingual Korean-English assistant. Always respond in the same language as the user."},
        # Korean: "What is artificial intelligence?"
        {"role": "user", "content": "인공지능이란 무엇인가요?"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
With transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mkd-hossain/keural-sft3-50k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful bilingual Korean-English assistant."},
    # Korean: "Please explain how to sort a Python list."
    {"role": "user", "content": "파이썬 리스트 정렬 방법을 알려주세요."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=131073,  # <|im_end|>, not the SPM <eos> (ID 2)
    )

# Decode only the newly generated tokens and cut at the end-of-turn marker.
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
response = response.split("<|im_end|>")[0].strip()
print(response)
Special Tokens
| Token | ID | Purpose |
|---|---|---|
| `<\|im_start\|>` | 131072 | Start of a chat turn |
| `<\|im_end\|>` | 131073 | End of a chat turn (EOS for chat) |
| `<bos>` | 1 | Beginning of sequence |
| `<eos>` | 2 | End of sequence (not used for chat) |
| `<pad>` | 0 | Padding |
Always set `eos_token_id=131073`; do not use ID 2.
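When working with raw token IDs (for example from a custom sampling loop), the same rule can be applied after generation. The helper below is a hypothetical sketch: it cuts a generated sequence at the first chat end-of-turn ID and deliberately ignores ID 2.

```python
CHAT_EOS_ID = 131073  # chat end-of-turn ID, per the note above


def trim_at_chat_eos(token_ids, eos_id=CHAT_EOS_ID):
    """Return the tokens that precede the first chat EOS token."""
    trimmed = []
    for token_id in token_ids:
        if token_id == eos_id:
            break
        trimmed.append(token_id)
    return trimmed


# ID 2 (the SPM <eos>) is kept; only 131073 ends the reply.
print(trim_at_chat_eos([17, 2, 42, 131073, 99]))  # [17, 2, 42]
```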
Checkpoint Comparison
| Checkpoint | Stage | Steps | Progress |
|---|---|---|---|
| mkd-hossain/keural-pretrained | Pretraining | 120,000 | Base model |
| mkd-hossain/keural-sft-18k | SFT Epoch 1 | 18,000 | Initial instruction tuning |
| mkd-hossain/keural-dpo-final | DPO Round 1 | 6,927 | Alignment |
| mkd-hossain/keural-sft2 | SFT Epoch 2 | 29,112 | 2nd SFT pass |
| mkd-hossain/keural-sft3-40k | SFT Epoch 3 | 40,000 | 60.7% of epoch 3 |
| mkd-hossain/keural-sft3-50k | SFT Epoch 3 | 50,000 | 76.4% of epoch 3 |
Limitations
- Maximum context is 4,096 tokens.
- This is an intermediate checkpoint; epoch 3 completes at step 65,849.
- Not safety-aligned: do not deploy in production without additional safety fine-tuning.
- A second DPO round (485,793 pairs) is planned after SFT epoch 3 completes.
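Because the prompt and the generated reply share the 4,096-token context, it can help to cap `max_new_tokens` by the space left after the prompt. A minimal sketch, with a hypothetical helper name:

```python
MAX_CONTEXT = 4096  # context length listed under Limitations


def clamp_new_tokens(prompt_tokens, requested, max_context=MAX_CONTEXT):
    """Limit the generation budget to what still fits in the context window."""
    remaining = max_context - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, remaining)


print(clamp_new_tokens(3900, 512))  # 196
```

The prompt length would typically come from the tokenized chat-template output, for example `inputs.input_ids.shape[1]` in a transformers workflow.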
License
Apache 2.0