Keural-SFT3-Final: 14.83B Bilingual MoE (SFT Epoch 3)
Keural is a 14.83B-parameter Mixture-of-Experts language model trained from scratch for bilingual Korean-English instruction following.
This checkpoint is the result of SFT Epoch 3 (65,849 steps, 2.35M samples) and serves as the base model for DPO Round 2. The final preference-optimised model is mkd-hossain/keural-dpo2-final.
Architecture
| Property | Value |
|---|---|
| Architecture | KeuralMoECausalLM |
| Parameters | 14.83B total / ~7.42B active per token |
| Layers | 24 |
| Hidden size | 4,096 |
| Attention heads | 32 Q / 8 KV (GQA) |
| Head dimension | 128 |
| Experts | 8 total, top-2 per token |
| Expert intermediate size | 5,632 (SwiGLU) |
| Context length | 4,096 tokens |
| Vocabulary | 131,074 (131,072 SPM + `<\|im_start\|>` and `<\|im_end\|>`) |
| RoPE theta | 500,000 |
| Sliding window | 512 tokens (even layers only) |
| Normalization | RMSNorm (eps=1e-5) |
| Dtype | bfloat16 |
KeuralMoECausalLM is a custom architecture, so the model must be loaded with trust_remote_code=True.
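As a rough sanity check, the total parameter count can be approximately reproduced from the table above. The sketch below assumes a conventional layout that the card does not spell out: bias-free projections, one linear router per layer, two RMSNorm weights per layer plus a final norm, and input and output embeddings that share weights. All function and constant names are illustrative.

```python
# Back-of-the-envelope parameter count from the architecture table.
# Assumptions (not stated in the card): no biases, tied input/output
# embeddings, one linear router per layer, two RMSNorms per layer.

HIDDEN = 4096
LAYERS = 24
Q_HEADS, KV_HEADS, HEAD_DIM = 32, 8, 128
EXPERTS = 8
EXPERT_INTERMEDIATE = 5632
VOCAB = 131_074


def attention_params() -> int:
    """Q, K, V and output projections for grouped-query attention."""
    q = HIDDEN * Q_HEADS * HEAD_DIM
    kv = 2 * HIDDEN * KV_HEADS * HEAD_DIM
    o = Q_HEADS * HEAD_DIM * HIDDEN
    return q + kv + o


def expert_params() -> int:
    """One SwiGLU expert: gate, up and down matrices."""
    return 3 * HIDDEN * EXPERT_INTERMEDIATE


def total_params() -> int:
    router = HIDDEN * EXPERTS
    norms = 2 * HIDDEN
    per_layer = attention_params() + EXPERTS * expert_params() + router + norms
    embeddings = VOCAB * HIDDEN
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


print(f"{total_params() / 1e9:.2f}B")
```

Under these assumptions the estimate comes out at about 14.83B, in line with the stated total.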
Special Tokens
| Token | ID | Purpose |
|---|---|---|
| `<\|im_start\|>` | 131072 | Start of each conversation turn |
| `<\|im_end\|>` | 131073 | End of turn; use as `eos_token_id` |
| `<bos>` | 1 | Beginning of sequence |
| `<eos>` | 2 | Not used for chat |
| `<pad>` | 0 | Padding |
Critical: Always set eos_token_id=131073. Do not use ID 2 for chat generation.
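To illustrate why the end-of-turn ID matters, here is a small sketch that trims a generated ID sequence at the first `<|im_end|>` token. The helper name `trim_at_end_of_turn` and the sample IDs are hypothetical. Only the token IDs 131072, 131073 and 2 come from the table above.

```python
from typing import List

IM_END_ID = 131073  # <|im_end|>, end of turn (from the table above)
EOS_ID = 2          # <eos>, not used for chat


def trim_at_end_of_turn(token_ids: List[int], stop_id: int = IM_END_ID) -> List[int]:
    """Return the tokens that precede the first stop_id (all tokens if absent)."""
    if stop_id in token_ids:
        return token_ids[: token_ids.index(stop_id)]
    return token_ids


# Made-up IDs standing in for a reply that runs on into another turn.
generated = [812, 4051, 977, IM_END_ID, 131072, 305, 12]

print(trim_at_end_of_turn(generated))          # stops at <|im_end|>
print(trim_at_end_of_turn(generated, EOS_ID))  # ID 2 never appears, nothing is trimmed
```

Stopping on ID 2 leaves the run-on turn in the output, which is the failure mode the note above warns about.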
Full Training Pipeline
| Stage | Steps | Tokens | Data | Hardware |
|---|---|---|---|---|
| Pretraining Stage 1 | 100,000 | ~50B | Korean + English web corpus | 2× H200 SXM |
| Pretraining Stage 2 | 120,000 | ~19B | Korean + English web corpus | 2× H200 SXM |
| SFT Epoch 1 | 18,000 | ~710M | 710K instruction samples (9 sources) | 2× H200 SXM |
| DPO Round 1 | 6,927 | n/a | 440K preference pairs (6 sources) | 2× H200 SXM |
| SFT Epoch 2 | 29,112 | ~7.6B | 710K filtered samples | 2× H200 SXM |
| SFT Epoch 3 | 65,849 | ~17.3B | 2.35M samples (12 sources) | 2× H200 SXM |
SFT Epoch 3 Dataset (2,351,212 samples)
| Source | Samples | Language |
|---|---|---|
| OpenHermes-2.5 | 1,001,551 | English |
| SlimOrca | 517,982 | English |
| UltraChat | 193,212 | English |
| OpenOrca | 138,639 | English |
| AIHub multisession sci | 127,868 | Korean |
| AIHub daily conversation | 120,867 | Korean |
| AIHub multisession social | 85,346 | Korean |
| Alpaca | 46,303 | English |
| KoInstruct QA | 45,299 | Korean |
| KoInstruct base | 42,276 | Korean |
| KoAlpaca | 21,091 | Korean |
| AIHub expert QA | 10,778 | Korean |
| Total | 2,351,212 | Korean + English |
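The language balance of this mix follows directly from the table. The short sketch below sums the per-source counts by language. The counts are copied from the table, while the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# Sample counts and languages copied from the dataset table above.
SOURCES: Dict[str, Tuple[int, str]] = {
    "OpenHermes-2.5": (1_001_551, "English"),
    "SlimOrca": (517_982, "English"),
    "UltraChat": (193_212, "English"),
    "OpenOrca": (138_639, "English"),
    "AIHub multisession sci": (127_868, "Korean"),
    "AIHub daily conversation": (120_867, "Korean"),
    "AIHub multisession social": (85_346, "Korean"),
    "Alpaca": (46_303, "English"),
    "KoInstruct QA": (45_299, "Korean"),
    "KoInstruct base": (42_276, "Korean"),
    "KoAlpaca": (21_091, "Korean"),
    "AIHub expert QA": (10_778, "Korean"),
}


def language_totals(sources: Dict[str, Tuple[int, str]]) -> Dict[str, int]:
    """Sum sample counts per language."""
    totals: Dict[str, int] = {}
    for count, language in sources.values():
        totals[language] = totals.get(language, 0) + count
    return totals


totals = language_totals(SOURCES)
grand_total = sum(totals.values())
for language, count in sorted(totals.items()):
    print(f"{language}: {count:,} ({count / grand_total:.1%})")
```

By sample count the mix is roughly 81% English and 19% Korean. Token-level proportions may differ, since sample lengths vary by source.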
SFT Epoch 3 Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 → 2e-6 (cosine decay) |
| Warmup steps | 100 |
| Effective batch size | 64 (2 × 16 accum × 2 GPUs) |
| Max sequence length | 2,048 tokens |
| Total steps | 65,849 |
| Optimizer | AdamW (β1=0.9, β2=0.95, ε=1e-8) |
| Gradient clipping | 1.0 |
| Hardware | 2× NVIDIA H200 SXM (143 GiB each) |
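The learning-rate schedule in the table can be sketched as a function of the step index. The card does not say how warmup is shaped or whether the decay spans exactly the post-warmup steps. The version below assumes linear warmup followed by a single cosine half-period that reaches the minimum at the final step, and it reads the batch breakdown as per-GPU micro-batch × accumulation steps × GPUs. The function name is illustrative.

```python
import math

PEAK_LR = 2e-5
MIN_LR = 2e-6
WARMUP_STEPS = 100
TOTAL_STEPS = 65_849

# Assumed reading of "2 x 16 accum x 2 GPUs".
EFFECTIVE_BATCH = 2 * 16 * 2


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR (assumed), then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


print("effective batch:", EFFECTIVE_BATCH)
for step in (0, 99, 100, 32_974, TOTAL_STEPS):
    print(step, f"{learning_rate(step):.2e}")
```

The rate peaks at 2e-5 once warmup ends and falls to 2e-6 at step 65,849, matching the endpoints listed in the table.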
Chat Format (ChatML)
<|im_start|>system
You are a helpful, accurate, and safe bilingual Korean-English AI assistant.<|im_end|>
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
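For illustration, the layout above can be assembled by hand as in the sketch below. The function name is hypothetical, and the exact whitespace (a newline after each role and after each `<|im_end|>`) is an assumption read off the example above. The tokenizer's own chat template remains the authoritative source.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Render role/content messages in the ChatML layout shown above."""
    parts = [
        f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [
        {"role": "system", "content": "You are a helpful bilingual Korean-English AI assistant."},
        {"role": "user", "content": "Your question here"},
    ]
)
print(prompt)
```

In practice, calling `tokenizer.apply_chat_template` is preferable to hand-built strings, because it tracks any change to the template shipped with the model.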
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Custom architecture: trust_remote_code=True is needed for both loads.
tokenizer = AutoTokenizer.from_pretrained(
    "mkd-hossain/keural-sft3-final",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "mkd-hossain/keural-sft3-final",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are a helpful bilingual Korean-English AI assistant."},
    # Korean for: "Hello! How is the weather today?"
    {"role": "user", "content": "안녕하세요! 오늘 날씨가 어때요?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    eos_token_id=131073,  # <|im_end|>, not <eos> (ID 2)
    do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Next Model
DPO Round 2 has been completed on top of this checkpoint using 485,793 preference pairs (8 sources). The final preference-optimised model is available at: mkd-hossain/keural-dpo2-final
License
Training data includes datasets from AI Hub (Korean government open data platform) and publicly available English instruction datasets. All sources are Apache 2.0 or CC-BY compatible.