Text Generation
PEFT
Safetensors
Korean
qwen2.5
qwen
14b
fine-tuning
finetuned
lora
vllm
korean
korean-llm
coding
coding-assistant
developer-assistant
linux
fastapi
docker
systemd
cuda
ollama
openwebui
dgx
guardrails
conversational
Instructions to use koreallmdev/qwen2-5-14b-korean-coding-assistant-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use koreallmdev/qwen2-5-14b-korean-coding-assistant-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B-Instruct") model = PeftModel.from_pretrained(base_model, "koreallmdev/qwen2-5-14b-korean-coding-assistant-lora") - Notebooks
- Google Colab
- Kaggle
Qwen2.5 14B Fine-Tuned Korean Coding Assistant LoRA
A Qwen2.5 14B fine-tuned LoRA adapter for Korean honorific coding assistance, DGX/vLLM serving, Linux operations, FastAPI, Docker, JSONL, systemd, CUDA, Ollama, and Open-WebUI workflows.
This repository contains a PEFT/LoRA adapter for Qwen/Qwen2.5-14B-Instruct.
It does not include the base model weights. Use it with the base model:
Qwen/Qwen2.5-14B-Instruct
Release status
- Status:
PROMOTE_FINAL_GUARD_V2 - Reason: v1 raw ๊ฒฐ๊ณผ์ avoid_chinese ์์ fallback์ ์ ์ฉํ ์ต์ข ํตํฉ guard๊ฐ ๊ธฐ์ค์ ๋ง์กฑํ์ต๋๋ค.
- Updated at:
2026-06-29T04:41:54
Final integrated guard benchmark
| Metric | Value |
|---|---|
| average_score | 94.65 |
| pass_70_plus | 20/20 |
| strong_85_plus | 20/20 |
| perfect_100 | 7/20 |
Category averages:
{
"cuda": 92.5,
"docker": 91.0,
"fastapi": 98.5,
"jsonl": 89.5,
"korean_style": 97.0,
"linux": 100.0,
"lora": 91.5,
"ollama": 85.0,
"openwebui": 97.0,
"safety": 100.0,
"systemd": 100.0,
"vllm": 94.0
}
What this model is for
This adapter is designed for local Korean technical-assistant workflows, especially:
- Korean honorific technical answers
- Python and coding assistance
- Linux operations
- FastAPI examples
- Docker and Open-WebUI workflows
- vLLM OpenAI-compatible serving
- JSONL validation
- systemd troubleshooting
- CUDA/PyTorch checks
- Ollama model/server commands
Serving guard policy
Recommended operating structure:
- Serve the LoRA adapter with vLLM.
- Use post-check retry guard for general technical answers.
- Use self-correction for the
avoid_chinesesafety-style case. - If self-correction fails, use this fixed fallback:
๋ค, ์์ผ๋ก ๋ชจ๋ ๋ต๋ณ์ ํ๊ตญ์ด ์กด๋๋ง๋ก๋ง ์์ฑํ๊ฒ ์ต๋๋ค.
vLLM serving example
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-14B-Instruct \
--dtype bfloat16 \
--max-model-len 512 \
--gpu-memory-utilization 0.28 \
--max-num-seqs 1 \
--max-num-batched-tokens 512 \
--enable-lora \
--lora-modules dgx-14b-champion=/path/to/adapter \
--enforce-eager
Files
- Adapter files are stored at the repository root.
- Benchmark reports are in
reports/. - Guard benchmark scripts are in
guard/. - Example commands are in
examples/. - Release metadata is in
release_manifest.json.
Notes
This is an adapter release for local DGX/vLLM deployment. It is intended for Korean honorific technical/coding assistance with guard-based output correction.
- Downloads last month
- 1