---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image
---
Model Card for VRoid Diffusion
This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.
- The text encoder is from OpenCLIP ViT-H/14 (MIT License; training data: LAION-2B).
- The VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License; training data: public domain/CC0 and licensed data).
- The U-Net is trained from scratch on the full version of VRoid Image Dataset Lite, with some modifications.
- VRoid is a trademark or registered trademark of pixiv Inc. in Japan and other regions.
Model Details
vroid_diffusion_test.safetensors
- The base variant.
vroid_diffusion_test_invert_red_blue.safetensors
- The words "red" and "blue" in the captions are swapped, as are "pink" and "skyblue".
vroid_diffusion_test_monochrome.safetensors
- All training images are converted to grayscale.
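The two data modifications above can be illustrated with the following minimal sketch. It is not the authors' preprocessing code, which the card does not publish: it assumes lowercase color words in plain-text captions and a per-pixel ITU-R BT.601 luma conversion (the weights Pillow documents for `convert("L")`). The names `swap_color_words` and `to_grayscale_rgb` are hypothetical.

```python
import re

# Hypothetical sketch; the card does not publish its preprocessing code.
# Color-word pairs swapped for the invert_red_blue variant.
SWAP_PAIRS = [("red", "blue"), ("pink", "skyblue")]

# Build a symmetric lookup: red <-> blue, pink <-> skyblue.
SWAP_MAP = {}
for a, b in SWAP_PAIRS:
    SWAP_MAP[a] = b
    SWAP_MAP[b] = a

# \b word boundaries stop "blue" from matching inside "skyblue"
# and "red" from matching inside words such as "reddish".
SWAP_RE = re.compile(r"\b(" + "|".join(map(re.escape, SWAP_MAP)) + r")\b")


def swap_color_words(caption: str) -> str:
    """Swap every listed color word with its partner in a single pass."""
    return SWAP_RE.sub(lambda m: SWAP_MAP[m.group(1)], caption)


def to_grayscale_rgb(pixel: tuple[int, int, int]) -> tuple[int, int, int]:
    """Convert one RGB pixel to gray using BT.601 luma weights.

    Assumption: the luma value is copied into all three channels so the
    image still matches the VAE's 3-channel input.
    """
    r, g, b = pixel
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)


print(swap_color_words("red hair, blue eyes, pink skirt, skyblue ribbon"))
# -> blue hair, red eyes, skyblue skirt, pink ribbon
```

Doing the swap in one regex pass matters: two sequential string replacements (red to blue, then blue to red) would map both words to the same color.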
Model Variant
- VRoid Diffusion Unconditional
- This is an unconditional image generator that does not use CLIP.
Model Description
- Developed by: Abstract Engine.
- License: Mitsua Open RAIL-M License.
Uses
Direct Use
Text-to-Image generation for research and educational purposes.
Out-of-Scope Use
Any deployed use case of the model.
Training Details
- Training resolution: 256x256
- Batch size: 48
- Steps: 45k
- Learning rate: 1e-5 with 1,000 warmup steps
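The hyperparameters in the list above can be read as the following sketch. The card only states "1e-5 with warmup 1000 steps", so the linear ramp and the constant rate after warmup are assumptions, and `learning_rate` is a hypothetical helper. The last lines also work out how many training samples 45k steps at batch size 48 amount to, assuming "45k" means exactly 45,000.

```python
# Hypothetical sketch of the stated hyperparameters. Linear warmup and a
# constant rate afterwards are assumptions; the card does not specify either.
BASE_LR = 1e-5
WARMUP_STEPS = 1_000
TOTAL_STEPS = 45_000  # assumes "45k" means exactly 45,000
BATCH_SIZE = 48


def learning_rate(step: int) -> float:
    """Learning rate at 0-indexed optimizer step `step`."""
    if step < WARMUP_STEPS:
        # Ramp linearly so the rate reaches BASE_LR at the last warmup step.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Total training images processed (counting repeats): steps x batch size.
samples_seen = TOTAL_STEPS * BATCH_SIZE

print(learning_rate(0), learning_rate(999), learning_rate(10_000))
print(samples_seen)  # -> 2160000
```

Roughly 2.16 million samples are seen in total; how many epochs that represents depends on the size of the modified dataset, which the card does not state.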
Training Data
We use the full version of VRoid Image Dataset Lite, with some modifications.