Multi-domain LoRA — Text-to-Image
LoRA adapter (rank 32) fine-tuned on top of runwayml/stable-diffusion-v1-5.
Training domains
- ✅ Realistic (COCO, Flickr30K)
- ✅ Anime / Illustration (Pokemon captions)
- ✅ Art styles (ArtBench)
- ✅ Portrait / Human faces
- ✅ Vietnamese culture
Training config
```json
{
  "model_id": "runwayml/stable-diffusion-v1-5",
  "output_dir": "/kaggle/working/lora_output",
  "hf_repo_id": "huydev0000/text_to_image_finetune",
  "lora_rank": 32,
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "target_modules": [
    "to_k",
    "to_q",
    "to_v",
    "to_out.0",
    "ff.net.0.proj",
    "ff.net.2"
  ],
  "resolution": 512,
  "train_batch_size": 4,
  "gradient_accum": 2,
  "learning_rate": 0.0002,
  "max_train_steps": 4000,
  "save_steps": 2000,
  "lr_scheduler": "cosine",
  "warmup_steps": 500,
  "mixed_precision": "fp16",
  "seed": 42,
  "snr_gamma": 5.0,
  "cfg_drop_prob": 0.1,
  "resume_from": ""
}
```
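Several quantities follow from this config without being written out. The plain-Python sketch below computes them under assumptions this card does not state: the LoRA update is scaled by `lora_alpha / lora_rank` (PEFT's default, non-rsLoRA behaviour); `max_train_steps` and `warmup_steps` count optimizer updates on a single device; `"cosine"` means the linear-warmup-then-half-cosine schedule produced by diffusers' `get_scheduler`; and `snr_gamma` enables Min-SNR-γ loss weighting for epsilon prediction, as in diffusers' text-to-image training script. The SD v1.5 noise-schedule constants (`scaled_linear`, β from 0.00085 to 0.012, 1000 steps) come from the base model's scheduler config. `CONFIG` and all helper functions are hypothetical names for illustration, not the author's training code.

```python
import math

# Subset of the training config above (hypothetical helper dict).
CONFIG = {
    "lora_rank": 32,
    "lora_alpha": 64,
    "train_batch_size": 4,
    "gradient_accum": 2,
    "learning_rate": 2e-4,
    "max_train_steps": 4000,
    "warmup_steps": 500,
    "snr_gamma": 5.0,
}


def lora_scaling(cfg):
    # Assumed PEFT default (no rsLoRA): delta_W = (alpha / r) * B @ A.
    return cfg["lora_alpha"] / cfg["lora_rank"]


def effective_batch_size(cfg, num_devices=1):
    # Images contributing to one optimizer update.
    return cfg["train_batch_size"] * cfg["gradient_accum"] * num_devices


def cosine_lr_with_warmup(step, cfg):
    # Linear warmup to the base LR, then half a cosine wave down to 0
    # (the shape of the "cosine" schedule with num_cycles=0.5).
    base = cfg["learning_rate"]
    warmup, total = cfg["warmup_steps"], cfg["max_train_steps"]
    if step < warmup:
        return base * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


def sd15_alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    # SD v1.5 "scaled_linear" schedule: betas linear in sqrt-space, then squared.
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    prod, out = 1.0, []
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out


def min_snr_weight(alpha_bar, gamma):
    # Min-SNR-gamma weight for epsilon prediction: min(SNR, gamma) / SNR,
    # where SNR = alpha_bar / (1 - alpha_bar).
    snr = alpha_bar / (1.0 - alpha_bar)
    return min(snr, gamma) / snr


if __name__ == "__main__":
    print("LoRA scaling:", lora_scaling(CONFIG))
    print("effective batch size:", effective_batch_size(CONFIG))
    for s in (0, 250, 500, 2250, 4000):
        print("step", s, "lr", cosine_lr_with_warmup(s, CONFIG))
    weights = [min_snr_weight(a, CONFIG["snr_gamma"]) for a in sd15_alphas_cumprod()]
    print("Min-SNR weight at t=0 / t=999:", weights[0], weights[-1])
```

Under these assumptions the LoRA update is scaled by 2.0, each optimizer step sees 8 images per device (about 32,000 images over 4000 steps), and Min-SNR weighting shrinks the loss on low-noise, high-SNR timesteps while leaving a weight of exactly 1 once the SNR falls below γ = 5.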
Usage
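```python
import torch
from diffusers import StableDiffusionPipeline
from peft import PeftModel

# Base model the adapter was trained on. If the runwayml repo is unavailable,
# the stable-diffusion-v1-5/stable-diffusion-v1-5 mirror may be used instead.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Wrap the UNet with the LoRA adapter. Replace the placeholder with the adapter
# repo, e.g. "huydev0000/text_to_image_finetune" (hf_repo_id in the config).
pipe.unet = PeftModel.from_pretrained(pipe.unet, "your-username/your-lora")
pipe.to("cuda")

image = pipe("your prompt here").images[0]
image.save("output.png")
```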
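`cfg_drop_prob: 0.1` in the training config presumably means the caption was replaced by the empty prompt for about 10% of training samples (an assumption; the card does not say so). That caption dropout is what teaches the UNet an unconditional prediction, which `StableDiffusionPipeline` combines with the prompt-conditioned one whenever `guidance_scale > 1` (default 7.5), e.g. `pipe("your prompt here", guidance_scale=7.5)`. The sketch below shows both halves on plain Python lists; `drop_caption` and `classifier_free_guidance` are hypothetical helpers, not functions from diffusers or this repo.

```python
import random
from typing import List, Sequence


def drop_caption(caption: str, p: float, rng: random.Random) -> str:
    # Training side (assumed meaning of cfg_drop_prob): with probability p,
    # swap the caption for the empty prompt so the model also learns
    # an unconditional noise prediction.
    return "" if rng.random() < p else caption


def classifier_free_guidance(
    eps_uncond: Sequence[float], eps_cond: Sequence[float], scale: float
) -> List[float]:
    # Inference side: eps = eps_uncond + scale * (eps_cond - eps_uncond).
    # scale = 1 returns the conditional prediction; larger values push
    # further in the direction the prompt adds.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(42)
    print([drop_caption("a bowl of pho", 0.1, rng) for _ in range(10)])
    print(classifier_free_guidance([0.0, 1.0], [1.0, 3.0], 7.5))
```

Higher `guidance_scale` values follow the prompt more closely at some cost in diversity; the pipeline's default of 7.5 is a common starting point.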
Model tree for huydev0000/text_to_image_finetune
Base model: runwayml/stable-diffusion-v1-5