Janus-Pro-7B Korean LoRA

LoRA adapter for deepseek-ai/Janus-Pro-7B, fine-tuned on Korean text to improve Korean chat capabilities while preserving the unified multimodal (text + image generation) abilities of the base model.

GitHub (inference + training code): https://github.com/ORI-Muchim/janus-pro-korean

Model Details

  • Base Model: DeepSeek Janus-Pro-7B (unified multimodal autoregressive transformer)
  • Adapter Type: LoRA (r=32, α=64, dropout=0.05)
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (language_model only)
  • Trainable Parameters: ~80M (1.05% of base)
  • Languages: Korean (primary) + English (preserved)
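
As a rough cross-check of the trainable-parameter figure, the sketch below counts LoRA weights for the target modules listed above. The layer dimensions (hidden size 4096, MLP size 11008, 30 decoder layers, full multi-head attention) are assumptions about the Janus-Pro-7B language model and are not stated in this card. Under them the count comes to about 75M, the same order as the ~80M reported above.

```python
# Assumed dimensions of the Janus-Pro-7B language model (NOT stated in this card).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 30
RANK = 32  # LoRA r, from the card

# (in_features, out_features) of each targeted projection, per decoder layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to each frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 74,956,800
```

With α=64 and r=32, the LoRA update is scaled by α/r = 2.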

Usage

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from janus.models import VLChatProcessor

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = VLChatProcessor.from_pretrained("deepseek-ai/Janus-Pro-7B")
tokenizer = processor.tokenizer

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
model.language_model = PeftModel.from_pretrained(
    model.language_model, "ORI-Muchim/Janus-Pro-7B-Korean-LoRA"
)
model.eval()

# Korean chat ("인공지능이 뭐야?" means "What is artificial intelligence?")
conversation = [
    {"role": "<|User|>", "content": "인공지능이 뭐야?"},
    {"role": "<|Assistant|>", "content": ""},
]
sft = processor.apply_sft_template_for_multi_turn_prompts(
    conversations=conversation,
    sft_format=processor.sft_format,
    system_prompt="",
)
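# Move inputs to the device the language model was placed on by device_map="auto".
input_ids = tokenizer.encode(sft, return_tensors="pt").to(model.language_model.device)
# Fall back to EOS only when no pad token is defined (a pad id of 0 is still valid).
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
out = model.language_model.generate(
    input_ids=input_ids, max_new_tokens=200,
    do_sample=True, temperature=0.7, top_p=0.95,
    pad_token_id=pad_id,
)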
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))

For image generation, see the GitHub repository linked above.
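
The chat example above uses a single turn. For multi-turn chat, the same role tags can be alternated and the list closed with an empty assistant turn for the model to fill. The helper below is a minimal sketch; `build_conversation` is a hypothetical name and is not part of the janus package.

```python
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"


def build_conversation(history, user_message):
    """Build the role/content list expected by the SFT template.

    history: list of (user_text, assistant_text) pairs from earlier turns.
    user_message: the new user turn to answer.
    """
    conversation = []
    for user_text, assistant_text in history:
        conversation.append({"role": USER, "content": user_text})
        conversation.append({"role": ASSISTANT, "content": assistant_text})
    conversation.append({"role": USER, "content": user_message})
    # The trailing empty assistant turn marks where generation should start.
    conversation.append({"role": ASSISTANT, "content": ""})
    return conversation


conversation = build_conversation(
    history=[("안녕?", "안녕하세요! 무엇을 도와드릴까요?")],
    user_message="인공지능이 뭐야?",
)
```

The resulting list can be passed as `conversations=` to `processor.apply_sft_template_for_multi_turn_prompts`, as in the example above.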

Installation

Requires the janus package (for VLChatProcessor):

pip install "transformers==4.46.2" peft bitsandbytes accelerate timm einops attrdict sentencepiece
git clone https://github.com/deepseek-ai/Janus && cd Janus && pip install -e . && cd ..

Training

  • Method: QLoRA 4-bit (nf4 + double quant, bf16 compute)
  • Dataset mix:
  • Hyperparameters:
    • Steps: 40,000
    • Effective batch size: 8 (batch=8, grad_accum=1)
    • Sequence length: 1024
    • Learning rate: 1e-4 (cosine schedule, 5% warmup)
    • Max grad norm: 1.0
  • Hardware: 1× RTX 5090 (32GB), ~36.5 hours
  • Final training loss: 0.55 (initial: 1.51)
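
The schedule and run length listed above can be sketched in plain Python. This is an illustration, not the training script: it assumes linear warmup followed by cosine decay to zero, and the token count is an upper bound that assumes every sequence fills the full 1024 tokens. The name `lr_at` is hypothetical.

```python
import math

TOTAL_STEPS = 40_000
PEAK_LR = 1e-4
WARMUP_FRAC = 0.05
BATCH_SIZE = 8
SEQ_LEN = 1024
WALL_CLOCK_HOURS = 36.5

WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_FRAC)


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0 (assumed)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Upper bound on tokens seen, assuming fully packed sequences.
MAX_TOKENS = TOTAL_STEPS * BATCH_SIZE * SEQ_LEN
# Average wall-clock time per optimizer step.
SECONDS_PER_STEP = WALL_CLOCK_HOURS * 3600 / TOTAL_STEPS

print(WARMUP_STEPS, f"{MAX_TOKENS:,}", round(SECONDS_PER_STEP, 3))
```

Under these assumptions the warmup lasts 2,000 steps, the run sees at most about 328M tokens, and each step takes roughly 3.3 seconds.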

Strengths

  • Natural Korean sentence structure and polite/formal registers
  • Chat-style responses (asks back, uses lists, paragraphs)
  • General knowledge topics (AI, environment, science) stay coherent
  • English and code generation preserved from base
  • No repetition loops observed (an advantage of autoregressive decoding over diffusion-based alternatives)

Limitations

  • Korea-specific facts: LoRA fine-tuning adds little new knowledge. The base model was trained primarily on English and Chinese data, so answers to queries about Korean history, local places, Korean cuisine, etc. can be hallucinated (e.g., it confuses Korea with Thailand or suggests pasta in 김치찌개, kimchi stew).
  • Model size: a 7B model tuned with QLoRA has limited capacity compared to larger instruction-tuned Korean models (e.g., EEVE-10.8B, SOLAR-10.7B). For Korean-only tasks without image generation, those may be better choices.
  • Image generation: Unchanged from base Janus-Pro-7B (LoRA was applied only to the language_model component, not the image generation head).

For substantially better Korean knowledge, continued pretraining on billions of Korean tokens would be required.

License

MIT for the adapter weights. Use of the base model is subject to the Janus-Pro-7B license.

Citation

@misc{janus-pro-korean-lora,
  author = {ORI-Muchim},
  title = {Janus-Pro-7B Korean LoRA},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ORI-Muchim/Janus-Pro-7B-Korean-LoRA}
}

@article{chen2025janus,
  title={Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
  author={Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
  journal={arXiv preprint arXiv:2501.17811},
  year={2025}
}

Framework versions

  • PEFT 0.18.1
  • transformers 4.46.2
  • Python 3.12
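
To compare a local environment against the versions listed above, a small standard-library check along these lines may help. It is a sketch: `check_versions` is a hypothetical helper, and a mismatch does not necessarily mean the adapter will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this card.
EXPECTED = {"peft": "0.18.1", "transformers": "4.46.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for package, wanted in expected.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{package}: wanted {wanted}, found {found} ({status})")
```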