Keural-DPO-14.83B (Final: 6,927 steps, 1 full epoch)
Keural is a bilingual Korean–English Mixture-of-Experts language model trained entirely from scratch; no base model was used. This is the final DPO (Direct Preference Optimization) checkpoint at step 6,927, completing 1 full epoch of preference alignment starting from the Keural SFT-18k checkpoint.
This is the most capable Keural checkpoint released to date. It was aligned with one full epoch of DPO on 440K Korean and English preference pairs, and its reward margins stayed positive throughout training.
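For readers new to DPO, the sketch below shows how a per-pair loss and reward margin are typically computed from summed sequence log-probabilities under the policy and a frozen reference model (usually the SFT starting point). It follows the standard DPO formulation with β = 0.1, the value listed under DPO Training Details. The function names are illustrative, not taken from the Keural training code, and it assumes the "margin" reported in this card is the β-scaled reward margin that DPO trainers commonly log.

```python
import math

BETA = 0.1  # KL-penalty coefficient from the DPO hyperparameter table


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_pair_loss(policy_chosen: float, policy_rejected: float,
                  ref_chosen: float, ref_rejected: float,
                  beta: float = BETA) -> tuple[float, float]:
    """Return (loss, reward_margin) for one preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) is the same quantity as softplus(-margin)
    return softplus(-margin), margin


# Policy identical to the reference: margin 0, loss = ln 2
print(dpo_pair_loss(-120.0, -135.0, -120.0, -135.0))
# Margins in the range reported at the end of training
for m in (0.0009, 0.0018):
    print(m, round(softplus(-m), 4))
```

Under that assumption, margins of +0.0009 to +0.0018 correspond to a loss of about 0.6922 to 0.6927, consistent with the reported final loss of ~0.6924 and just below ln 2 ≈ 0.6931, the loss of a policy that has not moved away from its reference. The margins are positive but small in absolute terms.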
Model Details
| Property | Value |
|---|---|
| Architecture | Mixtral-style MoE (8 experts, top-2 routing) |
| Parameters | 14.83B total / ~7.42B active per token |
| Layers | 24 |
| Hidden size | 4096 |
| Attention heads | 32 (GQA, 8 KV heads) |
| Head dim | 128 |
| Expert intermediate size | 5,632 |
| Experts | 8 total, top-2 per token |
| Context length | 4,096 tokens |
| Vocabulary | 131,074 (131,072 SPM + `<\|im_start\|>`, `<\|im_end\|>`) |
| RoPE theta | 500,000 |
| Sliding window | 512 (alternating every other layer) |
| Norm | RMSNorm (eps=1e-5) |
| Activation | SiLU |
| Dtype | bfloat16 |
| Languages | Korean (primary), English |
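As a rough cross-check of the parameter figures, the sketch below counts weights from the table above. It assumes a Mixtral-style layout: SwiGLU experts with three projection matrices each, bias-free linear layers, a linear router per layer, two RMSNorms per layer plus a final norm, and input embeddings tied to the output head. None of these details is stated in the card; the tied-embedding assumption is the one that reproduces the 14.83B total.

```python
# Hyperparameters from the Model Details table
LAYERS = 24
HIDDEN = 4096
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
EXPERT_FFN = 5632
N_EXPERTS = 8
TOP_K = 2
VOCAB = 131_074

# Attention: Q and O are full width; K and V are narrower because of GQA (8 KV heads)
attn = 2 * HIDDEN * N_HEADS * HEAD_DIM + 2 * HIDDEN * N_KV_HEADS * HEAD_DIM
# One SwiGLU expert: gate and up (HIDDEN -> FFN) plus down (FFN -> HIDDEN)
expert = 3 * HIDDEN * EXPERT_FFN
router = HIDDEN * N_EXPERTS
norms = 2 * HIDDEN  # pre-attention and pre-MoE RMSNorm weights
shared_per_layer = attn + router + norms

embed = VOCAB * HIDDEN  # assumed tied with the LM head, so counted once
final_norm = HIDDEN

total = LAYERS * (shared_per_layer + N_EXPERTS * expert) + embed + final_norm
active = LAYERS * (shared_per_layer + TOP_K * expert) + embed + final_norm

print(f"total:  {total / 1e9:.2f}B")
print(f"active: {active / 1e9:.2f}B")
```

Under these assumptions the total comes to about 14.83B, matching the table, while top-2 routing activates about 4.87B parameters per token (2 of 8 experts plus all shared weights). The table's ~7.42B figure equals half of the total, which does not match top-2-of-8 routing under this layout, so treat that value with caution.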
Full Training Pipeline
| Stage | Steps | Tokens | Data | Hardware |
|---|---|---|---|---|
| Pretraining Stage 1 | 100,000 | ~50B | Korean + English web corpus | 2× H200 SXM |
| Pretraining Stage 2 | 120,000 | ~13B | Korean + English web corpus (continued) | 2× H200 SXM |
| SFT | 18,000 | 710M | mkd-chanwoo/keural-SFT (1.14M ChatML samples) | 2× H200 SXM |
| DPO (this checkpoint) | 6,927 (1 full epoch) | not reported | keural-dpo-raw (440K preference pairs) | 2× H200 SXM |
DPO Training Details
| Hyperparameter | Value |
|---|---|
| Algorithm | Direct Preference Optimization (DPO) |
| Learning rate | 2e-6 → 2e-7 (cosine decay) |
| Min learning rate | 2e-7 |
| Warmup steps | 100 |
| Beta (KL penalty) | 0.1 |
| Batch size per GPU | 2 |
| Gradient accumulation | 16 steps |
| Effective batch size | 64 (2 per GPU × 16 grad accum × 2 GPUs) |
| Max sequence length | 1,024 tokens |
| Optimizer | AdamW (β1=0.9, β2=0.95, ε=1e-8) |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Total steps | 6,927 (1 full epoch) |
| Dataset size | 440,627 preference pairs |
| Parallelism | FSDP FULL_SHARD (ZeRO-3 equivalent) |
| Precision | bfloat16 + gradient checkpointing |
| Hardware | 2× NVIDIA H200 SXM (139 GiB each) |
| Speed | ~40 seconds/step |
| Final loss | ~0.6924 (stable) |
| Final margin | +0.0009 to +0.0018 (consistently positive) |
| Final GradNorm | 0.20 to 0.31 (clean) |
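The learning-rate rows above describe warmup followed by cosine decay to a floor. A minimal sketch of such a schedule, assuming linear warmup from 0 over the first 100 steps and a half-cosine from 2e-6 down to 2e-7 across the remaining steps. The exact warmup shape and step indexing of the actual training script are not stated in the card.

```python
import math

PEAK_LR = 2e-6
MIN_LR = 2e-7
WARMUP_STEPS = 100
TOTAL_STEPS = 6_927
SECONDS_PER_STEP = 40  # approximate, from the table


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS  # linear warmup from 0
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    # Half-cosine from PEAK_LR (progress 0) down to MIN_LR (progress 1)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


for s in (0, 50, 100, 3_500, 5_500, TOTAL_STEPS):
    print(s, f"{lr_at(s):.2e}")

print(f"approx. wall-clock for one epoch: {TOTAL_STEPS * SECONDS_PER_STEP / 3600:.0f} h")
```

At ~40 seconds per step, the full epoch works out to roughly 77 hours of training on the two H200s.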
DPO Dataset Sources
| Source | Samples | Language |
|---|---|---|
| hh_rlhf | 159,777 | English |
| aihub_71760 | 116,320 | Korean |
| multifaceted_collection_dpo | 63,399 | English |
| ultrafeedback_binarized | 59,122 | English |
| aihub_71748 | 29,676 | Korean |
| orca_dpo_pairs_ko | 12,714 | Korean |
| Total | 440,627 | Korean + English |
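To see the language balance of the preference data, the per-source counts can be tallied directly. This is plain arithmetic over the rows above:

```python
from collections import Counter

# Per-source pair counts and languages from the DPO dataset table
SOURCES = {
    "hh_rlhf": (159_777, "English"),
    "aihub_71760": (116_320, "Korean"),
    "multifaceted_collection_dpo": (63_399, "English"),
    "ultrafeedback_binarized": (59_122, "English"),
    "aihub_71748": (29_676, "Korean"),
    "orca_dpo_pairs_ko": (12_714, "Korean"),
}

by_language = Counter()
for count, language in SOURCES.values():
    by_language[language] += count

table_sum = sum(by_language.values())
for language, count in by_language.most_common():
    print(f"{language}: {count:,} ({count / table_sum:.1%})")
print(f"sum of rows: {table_sum:,} (stated total: 440,627)")
```

By these rows, roughly 64% of the pairs are English and 36% Korean, even though Korean is the model's primary language. The rows add up to 441,008, which is 381 more than the stated total of 440,627; the card does not explain the difference.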
SFT Hyperparameters (base checkpoint)
| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-5 → 1e-6 (cosine decay) |
| Effective batch size | 64 (4 per GPU ร 8 grad accum ร 2 GPUs) |
| Max sequence length | 4,096 tokens |
| Weight decay | 0.05 |
| Steps | 18,000 |
| Dataset | mkd-chanwoo/keural-SFT (1.14M samples) |
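A quick arithmetic check relating steps, effective batch size, and dataset size for both alignment stages. It assumes every optimizer step consumes exactly one effective batch of 64 samples or pairs; sequence packing or a dropped final batch would shift the numbers slightly.

```python
def samples_seen(steps: int, per_gpu: int, grad_accum: int, gpus: int) -> int:
    """Samples (or preference pairs) consumed, assuming one effective batch per step."""
    return steps * per_gpu * grad_accum * gpus


sft_seen = samples_seen(18_000, per_gpu=4, grad_accum=8, gpus=2)
dpo_seen = samples_seen(6_927, per_gpu=2, grad_accum=16, gpus=2)

print(f"SFT: {sft_seen:,} samples ~ {sft_seen / 1_140_000:.2f} epochs of ~1.14M")
print(f"SFT: ~{710e6 / sft_seen:.0f} tokens per sample on average")
print(f"DPO: {dpo_seen:,} pairs ~ {dpo_seen / 440_627:.2f} epochs of 440,627")
```

Both stages come out at roughly one pass over their data, consistent with the "1 full epoch" description, and the SFT token count implies an average sample length of about 616 tokens, well under the 4,096-token limit.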
Chat Format (ChatML)
This model uses ChatML format. Always include a system prompt for best results.
<|im_start|>system
You are a helpful bilingual Korean-English assistant. Always respond in the same language as the user.<|im_end|>
<|im_start|>user
Hello! How is the weather today?<|im_end|>
<|im_start|>assistant
The model generates until it produces <|im_end|> (token ID 131073).
The chat template in `tokenizer_config.json` automatically injects a default system prompt if you don't provide one, so bilingual behavior works out of the box with `apply_chat_template`.
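For backends that accept raw text rather than message lists, the format above can be produced by hand. The sketch below is a hypothetical helper (`to_chatml` is not part of the model repository) that mirrors what the chat template is described as doing: it prepends a default system prompt when none is given and appends the assistant header. The default text here is copied from the example above and is an assumption; the template's built-in default may be worded differently, so prefer `apply_chat_template` when it is available.

```python
# Assumed default; check tokenizer_config.json for the template's actual text.
DEFAULT_SYSTEM = (
    "You are a helpful bilingual Korean-English assistant. "
    "Always respond in the same language as the user."
)


def to_chatml(messages: list[dict], add_generation_prompt: bool = True,
              default_system: str = DEFAULT_SYSTEM) -> str:
    """Render [{"role": ..., "content": ...}, ...] as a ChatML prompt string."""
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": default_system}, *messages]
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # the model continues from here
    return text


print(to_chatml([{"role": "user", "content": "Tell me about Seoul."}]))
```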
How to Use
With transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mkd-hossain/keural-dpo-final"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful bilingual Korean-English assistant. "
            "Always respond in the same language as the user's message."
        ),
    },
    # Korean for "Please tell me how to sort a list in Python."
    {"role": "user", "content": "파이썬에서 리스트를 정렬하는 방법을 알려주세요."},
]

# Render the ChatML prompt and append the assistant header
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.1,
        no_repeat_ngram_size=8,
        do_sample=True,
        eos_token_id=131073,  # <|im_end|>, not <eos> (ID 2)
    )

# Decode only the newly generated tokens and cut at the end-of-turn marker
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
response = response.split("<|im_end|>")[0].strip()
print(response)
With vLLM (recommended for serving)
pip install vllm
python -m vllm.entrypoints.openai.api_server \
--model mkd-hossain/keural-dpo-final \
--tokenizer mkd-hossain/keural-dpo-final \
--dtype bfloat16 \
--max-model-len 4096 \
--tensor-parallel-size 1
Call the OpenAI-compatible endpoint:
from openai import OpenAI

# vLLM does not check the key unless the server was started with --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="mkd-hossain/keural-dpo-final",
    messages=[
        {"role": "system", "content": "You are a helpful bilingual assistant. Respond in the same language as the user."},
        {"role": "user", "content": "What is the capital of South Korea?"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
Multi-GPU serving
python -m vllm.entrypoints.openai.api_server \
--model mkd-hossain/keural-dpo-final \
--dtype bfloat16 \
--max-model-len 4096 \
--tensor-parallel-size 2
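When choosing `--tensor-parallel-size`, a back-of-the-envelope memory estimate helps. The sketch below assumes bf16 weights (2 bytes per parameter), the 14.83B total from Model Details (all experts must be resident even though only two run per token), and a KV cache kept for every layer at full context. It ignores activations, CUDA graphs, and any savings an engine may get from the sliding-window layers.

```python
BYTES_BF16 = 2
TOTAL_PARAMS = 14.83e9
LAYERS, KV_HEADS, HEAD_DIM = 24, 8, 128
MAX_LEN = 4_096
GIB = 2**30

weights_gib = TOTAL_PARAMS * BYTES_BF16 / GIB
# K and V tensors for every layer, per token
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_BF16
kv_gib_per_seq = kv_bytes_per_token * MAX_LEN / GIB

for tp in (1, 2):
    print(f"TP={tp}: ~{weights_gib / tp:.1f} GiB of weights per GPU")
print(f"KV cache: {kv_bytes_per_token // 1024} KiB/token, "
      f"~{kv_gib_per_seq:.3f} GiB per {MAX_LEN}-token sequence")
```

Under these assumptions the weights alone need roughly 28 GiB, so tensor parallelism mainly buys extra KV-cache room for concurrent sequences, each full 4,096-token sequence needing about 0.375 GiB.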
Manual ChatML prompt
prompt = (
    "<|im_start|>system\n"
    "You are a helpful bilingual Korean-English assistant. "
    "Always respond in the same language as the user.<|im_end|>\n"
    "<|im_start|>user\n"
    "Tell me about Seoul.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
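Going the other way, raw ChatML text (for example, a full transcript returned by a completions-style endpoint, or a training sample in this format) can be split back into messages. A minimal regex-based sketch; `parse_chatml` is a hypothetical helper, and it assumes roles are single words and that message bodies never contain the literal special-token strings.

```python
import re

# One complete turn: <|im_start|>{role}\n{content}<|im_end|>
TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text: str) -> list[dict]:
    """Return the completed turns in `text` as role/content dicts.

    A trailing turn without <|im_end|> (such as an open assistant header)
    is ignored.
    """
    return [{"role": role, "content": body.strip()} for role, body in TURN.findall(text)]


transcript = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nTell me about Seoul.<|im_end|>\n"
    "<|im_start|>assistant\nSeoul is the capital of South Korea.<|im_end|>\n"
)
for message in parse_chatml(transcript):
    print(message)
```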
Special Tokens
| Token | ID | Purpose |
|---|---|---|
| `<\|im_start\|>` | 131072 | Start of a chat turn |
| `<\|im_end\|>` | 131073 | End of a chat turn; generation stop token |
| `<bos>` | 1 | Beginning of sequence |
| `<eos>` | 2 | End of sequence (not used for chat) |
| `<pad>` | 0 | Padding token |
Critical: Always set `eos_token_id=131073` when generating. Do not use `eos_token_id=2`.
Recommended Generation Settings
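# Conversational / creative
conversational_kwargs = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 8,
    "do_sample": True,
    "eos_token_id": 131073,
}

# Factual / deterministic (greedy decoding; temperature is ignored when
# do_sample=False, so it is left out to avoid a transformers warning)
factual_kwargs = {
    "max_new_tokens": 512,
    "repetition_penalty": 1.1,
    "do_sample": False,
    "eos_token_id": 131073,
}

# Pass either dict to generate(), e.g. model.generate(**inputs, **factual_kwargs)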
Checkpoint Comparison
| Checkpoint | Stage | Steps | Notes |
|---|---|---|---|
| mkd-hossain/keural-pretrained | Pretraining | 120,000 | Raw base, no instruction tuning |
| mkd-hossain/keural-sft-18k | SFT | 18,000 | Instruction following, ChatML format |
| mkd-hossain/keural-dpo-3500 | DPO 50% | 3,500 | Early alignment |
| mkd-hossain/keural-dpo-5500 | DPO 79% | 5,500 | Late alignment |
| mkd-hossain/keural-dpo-final | DPO 100% | 6,927 | Full epoch; best checkpoint |
Limitations
- Maximum context is 4,096 tokens.
- The pretraining corpus is Korean-dominant, so always include a system prompt for correct bilingual behavior.
- Not safety-aligned: do not deploy in production without additional safety fine-tuning.
- This is an intermediate model in an ongoing training pipeline. Future releases will include SFT epoch 2 on filtered data and DPO round 2.
License
Apache 2.0