Model Card for VRoid Diffusion Unconditional
This is an unconditional latent diffusion model, intended to demonstrate how U-Net training affects the generated images.
- Pretrained Text Encoder (OpenCLIP) is removed, but an empty text encoder is included for compatibility with StableDiffusionPipeline.
- VAE is from Mitsua Diffusion One, Mitsua Open RAIL-M License, Training Data: Public Domain/CC0 + Licensed
- U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
- The architecture of the U-Net model was modified to conform to unconditional image generation. Cross-attention blocks are replaced by self-attention blocks.
- VRoid is a trademark or registered trademark of Pixiv inc. in Japan and other regions.
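The architectural change noted in the list above (cross-attention blocks replaced by self-attention blocks) can be illustrated with a minimal sketch. This is not the model's actual U-Net code: the function names are hypothetical, the learned query/key/value projections and multi-head splitting are omitted, and plain Python lists stand in for tensors. It only shows where the keys and values come from in each case.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def cross_attention(latent_tokens, text_tokens):
    # Conditional case: image latents query an external context (text embeddings).
    return attention(latent_tokens, text_tokens, text_tokens)


def self_attention(latent_tokens):
    # Unconditional case: keys and values come from the latents themselves,
    # so no text context is needed.
    return attention(latent_tokens, latent_tokens, latent_tokens)
```

In the self-attention form the block has no dependency on text embeddings, which is consistent with the text encoder being removed.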
Model variant
- VRoid Diffusion
- This variant is a conditional text-to-image generator using OpenCLIP.
Note
- This model works only with the diffusers StableDiffusionPipeline. It will not work on A1111 WebUI.
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
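A slightly fuller sampling sketch follows, building on the two lines above. Several details are assumptions rather than statements from this card: that an empty prompt is the right input for the empty text encoder, that `guidance_scale=1.0` (which disables classifier-free guidance in `StableDiffusionPipeline`) is appropriate for an unconditional model, and that sampling at the trained 256x256 resolution gives the best results. The helper names `sampling_kwargs` and `generate` are hypothetical.

```python
MODEL_ID = "Mitsua/vroid-diffusion-test-unconditional"


def sampling_kwargs(num_images=4, steps=50):
    """Keyword arguments for the pipeline call (values are assumptions)."""
    return {
        "prompt": [""] * num_images,   # assumed: empty prompts, one per image
        "height": 256,                 # trained resolution from this card
        "width": 256,
        "num_inference_steps": steps,
        "guidance_scale": 1.0,         # assumed: no classifier-free guidance
    }


def generate(num_images=4, steps=50, seed=0, device="cpu"):
    """Load the pipeline and return a list of PIL images."""
    # Imported lazily so the helper above can be used without diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipeline = StableDiffusionPipeline.from_pretrained(MODEL_ID).to(device)
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipeline(generator=generator, **sampling_kwargs(num_images, steps)).images
```

Calling `generate()` downloads the weights on first use; each returned image can then be written out with `image.save("sample.png")`.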
Model Description
- Developed by: Abstract Engine.
- License: Mitsua Open RAIL-M License.
Uses
Direct Use
Image generation for research and educational purposes.
Out-of-Scope Use
Any deployed use case of the model.
Training Details
- Trained resolution : 256x256
- Batch Size : 48
- Steps : 45k
- LR : 1e-5 with warmup 1000 steps
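The hyperparameters above can be put in concrete terms with a small sketch. The card does not state the warmup shape or what happens after warmup, so a linear ramp followed by a constant rate is assumed here; it also does not say whether the batch size is per device or global, so the sample count below treats 48 as the number of images per optimizer step. The function names are hypothetical.

```python
BATCH_SIZE = 48
TOTAL_STEPS = 45_000
BASE_LR = 1e-5
WARMUP_STEPS = 1_000


def learning_rate(step):
    """Learning rate at a 0-indexed optimizer step.

    Assumed schedule: linear warmup to BASE_LR, then constant.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


def images_seen(steps):
    """Training images processed after the given number of steps."""
    return BATCH_SIZE * steps


if __name__ == "__main__":
    print(learning_rate(0), learning_rate(WARMUP_STEPS))
    print(images_seen(TOTAL_STEPS))  # 48 * 45,000 = 2,160,000
```

Under these assumptions the full run processes 2,160,000 images, and warmup covers the first 48,000 of them.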
Training Data
We use the full version of VRoid Image Dataset Lite with some modifications.