Janus-Pro-7B Korean LoRA

LoRA adapter for deepseek-ai/Janus-Pro-7B, fine-tuned on Korean text to improve Korean chat capabilities while preserving the unified multimodal (text + image generation) abilities of the base model.

GitHub (inference + training code): https://github.com/ORI-Muchim/janus-pro-korean

Model Details

Base Model: DeepSeek Janus-Pro-7B (unified multimodal autoregressive (AR) transformer)
Adapter Type: LoRA (r=32, α=64, dropout=0.05)
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (language_model only)
Trainable Parameters: ~80M (1.05% of base)
Languages: Korean (primary) + English (preserved)
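As a rough sanity check on the trainable-parameter figure, the sketch below counts the LoRA weights added by r=32 on the seven target modules. The layer shapes (30 decoder layers, hidden size 4096, MLP size 11008, full multi-head attention) are assumptions not stated in this card; check them against the base model's config. Under those assumptions the count comes out near 75M, in the same range as the ~80M listed above.

```python
# Rough LoRA parameter count for the adapter configuration listed above.
# ASSUMPTION: layer shapes below are illustrative and not taken from this card.
RANK = 32
HIDDEN, INTERMEDIATE, LAYERS = 4096, 11008, 30

# (in_features, out_features) of each targeted linear layer in one decoder block
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) beside each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * LAYERS
print(f"{total:,} trainable LoRA parameters")
```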

Usage

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from janus.models import VLChatProcessor

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = VLChatProcessor.from_pretrained("deepseek-ai/Janus-Pro-7B")
tokenizer = processor.tokenizer

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
model.language_model = PeftModel.from_pretrained(
    model.language_model, "ORI-Muchim/Janus-Pro-7B-Korean-LoRA"
)
model.eval()

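# Korean chat: the prompt means "What is artificial intelligence?"
conversation = [
    {"role": "<|User|>", "content": "인공지능이 뭐야?"},
    {"role": "<|Assistant|>", "content": ""},  # empty turn for the model to fill
]
sft = processor.apply_sft_template_for_multi_turn_prompts(
    conversations=conversation,
    sft_format=processor.sft_format,
    system_prompt="",
)
# Follow the device chosen by device_map="auto" instead of hard-coding CUDA
input_ids = tokenizer.encode(sft, return_tensors="pt").to(model.device)
# Fall back to EOS only when no pad token is defined (a pad id of 0 is valid)
pad_id = tokenizer.pad_token_id
if pad_id is None:
    pad_id = tokenizer.eos_token_id
out = model.language_model.generate(
    input_ids=input_ids, max_new_tokens=200,
    do_sample=True, temperature=0.7, top_p=0.95,
    pad_token_id=pad_id,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))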
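The chat example above sends a single user turn. For multi-turn chat, one possible approach is to rebuild the conversation list from earlier turns before applying the SFT template. The helper below is a hypothetical sketch (build_conversation is not part of the janus package); it only reuses the role strings shown in the example.

```python
# Hypothetical helper: build the conversation list for a multi-turn chat.
# Only the role strings come from the usage example; the function is illustrative.
USER, ASSISTANT = "<|User|>", "<|Assistant|>"


def build_conversation(history, new_message):
    """history: list of (user_text, assistant_text) pairs from earlier turns."""
    conversation = []
    for user_text, assistant_text in history:
        conversation.append({"role": USER, "content": user_text})
        conversation.append({"role": ASSISTANT, "content": assistant_text})
    # The newest user message, then an empty assistant turn for the model to fill
    conversation.append({"role": USER, "content": new_message})
    conversation.append({"role": ASSISTANT, "content": ""})
    return conversation


history = [("인공지능이 뭐야?", "인공지능은 ...")]  # placeholder earlier reply
conversation = build_conversation(history, "예시를 들어줘.")  # "Give me an example."
print(len(conversation))
```

The returned list can be passed as conversations= to processor.apply_sft_template_for_multi_turn_prompts, exactly as in the single-turn example.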

For image generation, see the GitHub repository linked above.

Installation

Requires the janus package (for VLChatProcessor):

pip install "transformers==4.46.2" peft bitsandbytes accelerate timm einops attrdict sentencepiece
git clone https://github.com/deepseek-ai/Janus && cd Janus && pip install -e . && cd ..
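After installing, a quick way to confirm the environment is complete is to check that each package can be found without importing it. This is a minimal sketch using only the standard library; the list assumes each pip package name matches its import name, and it adds torch, which the usage example imports but the pip command above does not list.

```python
import importlib.util

# ASSUMPTION: import names match the pip names used in the install command.
REQUIRED = [
    "torch", "transformers", "peft", "bitsandbytes", "accelerate",
    "timm", "einops", "attrdict", "sentencepiece", "janus",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```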

Training

Method: QLoRA 4-bit (nf4 + double quant, bf16 compute)
Dataset mix:
- maywell/koVast: multi-turn conversations (40% weight)
- heegyu/open-korean-instructions (40% weight)
- AIHub general-knowledge sentence correction data (일반상식 문장 교정 데이터): 406K clean sentences (20% weight)
Hyperparameters:
- Steps: 40,000
- Effective batch size: 8 (batch=8, grad_accum=1)
- Sequence length: 1024
- Learning rate: 1e-4 (cosine schedule, 5% warmup)
- Max grad norm: 1.0
Hardware: 1× RTX 5090 (32GB), ~36.5 hours
Final training loss: 0.55 (initial: 1.51)
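To make the learning-rate settings concrete, the sketch below computes the rate at a given step for a 5% warmup followed by cosine decay over 40,000 steps. The exact shape is an assumption: linear warmup from zero and decay to zero, which is how the default cosine-with-warmup schedule in transformers behaves. The training script may differ in detail.

```python
import math

TOTAL_STEPS = 40_000
WARMUP_STEPS = TOTAL_STEPS * 5 // 100  # 5% warmup = 2,000 steps
PEAK_LR = 1e-4


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak rate
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak rate down to 0 over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 1_000, 2_000, 21_000, 40_000):
    print(step, f"{lr_at(step):.2e}")
```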

Strengths

Natural Korean sentence structure and polite/formal registers
Chat-style responses (asks back, uses lists, paragraphs)
General knowledge topics (AI, environment, science) stay coherent
English and code generation preserved from base
No repetition loops observed (an advantage of AR decoding over diffusion-based alternatives)

Limitations

Korea-specific facts: LoRA does not inject new knowledge. The base model was trained primarily on English and Chinese, so answers about Korean history, local places, Korean cuisine, and similar topics can be hallucinated (e.g., confusing Korea with Thailand, or suggesting pasta in 김치찌개, kimchi stew).
Small model effects: A 7B model tuned with QLoRA has limited capacity compared to larger instruction-tuned Korean models (e.g., EEVE-10.8B, SOLAR-10.7B). For Korean-only tasks that do not need image generation, those may be better choices.
Image generation: Unchanged from base Janus-Pro-7B (LoRA was applied only to the language_model component, not the image generation head).

For substantially better Korean knowledge, continued pretraining on billions of Korean tokens would be required.

License

MIT for the adapter weights. Use of the base model follows the Janus-Pro-7B license.

Citation

@misc{janus-pro-korean-lora,
  author = {ORI-Muchim},
  title = {Janus-Pro-7B Korean LoRA},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ORI-Muchim/Janus-Pro-7B-Korean-LoRA}
}

@article{chen2025janus,
  title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
  author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
  journal={arXiv preprint arXiv:2501.17811},
  year={2025}
}

Framework versions

PEFT 0.18.1
transformers 4.46.2
Python 3.12
