
Model Card for VRoid Diffusion Unconditional

This is an unconditional latent diffusion model that demonstrates how U-Net training affects the generated images.

  • The pretrained text encoder (OpenCLIP) is removed, but an empty text encoder is included for compatibility with StableDiffusionPipeline.
  • The VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License; training data: Public Domain/CC0 + Licensed).
  • The U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
    • The architecture of the U-Net model was modified to conform to unconditional image generation. Cross-attention blocks are replaced by self-attention blocks.
  • VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.
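
To illustrate the architectural change noted in the list above (cross-attention blocks replaced by self-attention blocks), here is a toy sketch in plain Python. It is not the model's actual implementation: the learned query/key/value projections are omitted, and the token values and the `attention` helper are hypothetical. The only point is that cross-attention reads its keys and values from text embeddings, while self-attention reads them from the image tokens themselves, so no text input is needed.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, context):
    """Scaled dot-product attention where keys and values are both `context`.

    Learned Q/K/V projections are left out to keep the sketch short.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out


# Toy values, purely illustrative.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # flattened latent positions
text_tokens = [[0.5, 0.5], [0.2, 0.8]]               # exists only in a conditional model

# Conditional U-Net block: image tokens attend to text embeddings.
cross = attention(image_tokens, text_tokens)

# Unconditional U-Net block: image tokens attend to each other.
self_out = attention(image_tokens, image_tokens)
```

In both cases the output has one vector per image token, which is why the replacement can keep the rest of the block's shape unchanged.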

Model variant

  • VRoid Diffusion
    • This is a conditional text-to-image generator using OpenCLIP.

Note

  • This model works only with the diffusers StableDiffusionPipeline. It does not work with the A1111 WebUI.
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
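
Once the pipeline is loaded, sampling might look like the sketch below. The settings are assumptions rather than documented recommendations: an empty prompt (the model is unconditional), `guidance_scale=1.0` (which turns off classifier-free guidance in StableDiffusionPipeline), and 256x256 output to match the trained resolution. The `generate_samples` helper and its defaults are hypothetical.

```python
MODEL_ID = "Mitsua/vroid-diffusion-test-unconditional"


def generate_samples(num_images=4, steps=50, seed=0):
    """Sample images with an empty prompt (assumed settings, see lead-in)."""
    # Imported here so the heavy dependencies load only when sampling.
    import torch
    from diffusers import StableDiffusionPipeline

    pipeline = StableDiffusionPipeline.from_pretrained(MODEL_ID)
    generator = torch.Generator().manual_seed(seed)  # reproducible noise
    result = pipeline(
        prompt="",                 # no text conditioning
        height=256,
        width=256,                 # matches the trained resolution
        num_inference_steps=steps,
        guidance_scale=1.0,        # classifier-free guidance disabled
        num_images_per_prompt=num_images,
        generator=generator,
    )
    return result.images           # list of PIL images


# Example call (downloads the model weights on first use):
# images = generate_samples()
# images[0].save("sample_0.png")
```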

Model Description

  • Developed by: Abstract Engine.
  • License: Mitsua Open RAIL-M License.

Uses

Direct Use

Image generation for research and educational purposes.

Out-of-Scope Use

Any deployed use case of the model.

Training Details

  • Training resolution: 256x256
  • Batch size: 48
  • Steps: 45k
  • Learning rate: 1e-5 with a 1000-step warmup
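
As a rough reading of these numbers, the sketch below computes how many images the U-Net saw during training and models the learning-rate warmup. The card states only the peak rate and the warmup length. The linear warmup shape, the constant rate afterwards, and the absence of gradient accumulation are assumptions, and the `learning_rate` helper is hypothetical.

```python
BATCH_SIZE = 48
TOTAL_STEPS = 45_000
PEAK_LR = 1e-5
WARMUP_STEPS = 1_000


def learning_rate(step):
    """Assumed schedule: linear warmup to PEAK_LR, then constant."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR


# Images processed over the whole run, assuming no gradient accumulation.
images_seen = BATCH_SIZE * TOTAL_STEPS  # 48 * 45,000 = 2,160,000

# Share of the run spent in warmup.
warmup_fraction = WARMUP_STEPS / TOTAL_STEPS
```

Under these assumptions the warmup covers roughly 2% of the run, so nearly all updates happen at the peak rate.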

Training Data

We use the full version of VRoid Image Dataset Lite with some modifications.

