LoRA adapter for deepseek-ai/Janus-Pro-7B, fine-tuned on Korean text to improve Korean chat capabilities while preserving the unified multimodal (text + image generation) abilities of the base model.
GitHub (inference + training code): https://github.com/ORI-Muchim/janus-pro-korean
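As a rough illustration of what "language_model only" means for the target modules listed just below, the following sketch filters dotted module names by prefix and suffix. The module names in `names` and the helper `is_lora_target` are hypothetical and chosen for illustration; they are not taken from the Janus-Pro source or from the training code.

```python
# Suffixes copied from the adapter's listed target modules.
TARGET_SUFFIXES = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)


def is_lora_target(module_name: str, prefix: str = "language_model.") -> bool:
    """Return True if a dotted module name sits under `prefix` and ends in a target suffix."""
    if not module_name.startswith(prefix):
        return False
    return module_name.rsplit(".", 1)[-1] in TARGET_SUFFIXES


# Hypothetical module names, for illustration only.
names = [
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "vision_model.blocks.0.attn.qkv",
    "gen_head.output_mlp_projector",
]

selected = [n for n in names if is_lora_target(n)]
print(selected)
```

Under these assumptions only the two `language_model` projections are selected, which is the intuition behind leaving the vision and image-generation components untouched.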
LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj (language_model only)

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from janus.models import VLChatProcessor
# 4-bit NF4 quantization config for loading the base model
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
processor = VLChatProcessor.from_pretrained("deepseek-ai/Janus-Pro-7B")
tokenizer = processor.tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
# Attach the LoRA adapter to the language model only
model.language_model = PeftModel.from_pretrained(
    model.language_model, "ORI-Muchim/Janus-Pro-7B-Korean-LoRA"
)
model.eval()
# Korean chat
conversation = [
    {"role": "<|User|>", "content": "인공지능이 뭐야?"},
    {"role": "<|Assistant|>", "content": ""},
]
sft = processor.apply_sft_template_for_multi_turn_prompts(
    conversations=conversation,
    sft_format=processor.sft_format,
    system_prompt="",
)
input_ids = tokenizer.encode(sft, return_tensors="pt").cuda()
# Fall back to EOS only when no pad token is defined (a pad id of 0 is still valid)
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
out = model.language_model.generate(
    input_ids=input_ids, max_new_tokens=200,
    do_sample=True, temperature=0.7, top_p=0.95,
    pad_token_id=pad_id,
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
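The example above sends a single user turn. A minimal sketch of how earlier turns might be folded into the same `conversation` format follows; `build_conversation` is a hypothetical helper, not part of the janus package, and it only assumes the `<|User|>` / `<|Assistant|>` role strings shown above. Whether the adapter was trained on multi-turn data is not stated here, so quality on longer histories is untested.

```python
USER, ASSISTANT = "<|User|>", "<|Assistant|>"


def build_conversation(history, user_message):
    """Build a conversation list from earlier turns plus a new user message.

    history: list of (user_text, assistant_text) pairs from earlier turns.
    The final assistant turn is left empty so the model fills it in.
    """
    conversation = []
    for user_text, assistant_text in history:
        conversation.append({"role": USER, "content": user_text})
        conversation.append({"role": ASSISTANT, "content": assistant_text})
    conversation.append({"role": USER, "content": user_message})
    conversation.append({"role": ASSISTANT, "content": ""})
    return conversation


history = [("인공지능이 뭐야?", "인공지능은 사람처럼 학습하고 추론하는 컴퓨터 기술이에요.")]
conversation = build_conversation(history, "예시를 하나 들어줘.")
print(len(conversation))
```

The resulting list can be passed as `conversations=` to `processor.apply_sft_template_for_multi_turn_prompts` in place of the hand-written one.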
For image generation, see the GitHub repo.
Requires the janus package (for VLChatProcessor):
pip install "transformers==4.46.2" peft bitsandbytes accelerate timm einops attrdict sentencepiece
git clone https://github.com/deepseek-ai/Janus && cd Janus && pip install -e . && cd ..
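Because the install line pins transformers to 4.46.2, it can help to confirm the environment before loading the model. The sketch below uses only the standard library; `installed_version` and `check_environment` are hypothetical helper names, and the package list is copied from the pip command above (the janus package itself is not checked, since its distribution name is not given here).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment(pinned, required):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, wanted in pinned.items():
        found = installed_version(name)
        if found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    for name in required:
        if installed_version(name) is None:
            problems.append(f"{name}: not installed")
    return problems


problems = check_environment(
    pinned={"transformers": "4.46.2"},
    required=["peft", "bitsandbytes", "accelerate", "timm",
              "einops", "attrdict", "sentencepiece"],
)
for line in problems:
    print(line)
```

An empty `problems` list suggests the pinned and required packages are in place; it does not verify GPU or CUDA availability.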
Substantially better Korean knowledge would require continued pretraining on billions of Korean tokens.
License: MIT for the adapter weights; use of the base model follows the Janus-Pro-7B license.
@misc{janus-pro-korean-lora,
  author = {ORI-Muchim},
  title = {Janus-Pro-7B Korean LoRA},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ORI-Muchim/Janus-Pro-7B-Korean-LoRA}
}
@article{chen2025janus,
  title = {Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling},
  author = {Chen, Xiaokang and Wu, Zhiyu and Liu, Xingchao and Pan, Zizheng and Liu, Wen and Xie, Zhenda and Yu, Xingkai and Ruan, Chong},
  journal = {arXiv preprint arXiv:2501.17811},
  year = {2025}
}