license: other
datasets:
  - Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image

Model Card for VRoid Diffusion

This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.

  • The Text Encoder is from OpenCLIP ViT-H/14 (MIT License; training data: LAION-2B).
  • The VAE is from Mitsua Diffusion One (Mitsua Open RAIL-M License; training data: Public Domain/CC0 + Licensed).
  • The U-Net is trained from scratch using the full version of VRoid Image Dataset Lite with some modifications.
  • VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

Model Details

  • vroid_diffusion_test.safetensors
    • base variant.
  • vroid_diffusion_test_invert_red_blue.safetensors
    • "red" and "blue" in the captions are swapped.
    • "pink" and "skyblue" in the captions are swapped.
  • vroid_diffusion_test_monochrome.safetensors
    • all training images are converted to grayscale.
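
The two modified variants can be pictured as simple preprocessing steps on the training pairs. The following is a minimal sketch, not the author's actual pipeline: the `SWAPS` table, `swap_colors`, and `to_gray` are hypothetical names, the lower-case whole-word matching is an assumption, and the grayscale weights (ITU-R BT.601) are one common choice that the card does not confirm.

```python
import re

# Hypothetical swap table. The card only states that red<->blue and
# pink<->skyblue are swapped in the captions.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Word boundaries keep "blue" from matching inside "skyblue".
# Assumption: captions use lower-case color words.
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_colors(caption: str) -> str:
    """Swap all color words in one pass so that swaps do not undo each other."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], caption)


def to_gray(rgb):
    """Convert one (R, G, B) pixel to an 8-bit luma value (BT.601 weights, assumed)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


if __name__ == "__main__":
    print(swap_colors("red hair, blue eyes, skyblue dress"))
    print(to_gray((255, 0, 0)))
```

Doing the substitution in a single regex pass matters: two sequential `str.replace` calls (red to blue, then blue to red) would turn every color word back into "red". For whole images, Pillow's `Image.convert("L")` applies the same BT.601 weighting.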

Model Variant

Model Description

  • Developed by: Abstract Engine.
  • License: Mitsua Open RAIL-M License.

Uses

Direct Use

Text-to-Image generation for research and educational purposes.

Out-of-Scope Use

Any deployed use case of the model.

Training Details

  • Training resolution: 256x256
  • Batch size: 48
  • Steps: 45k
  • Learning rate: 1e-5 with 1000 warmup steps
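
To make these hyperparameters concrete, here is a small sketch of the learning-rate schedule and the total number of samples drawn during training. The card gives only the peak rate and the warmup length, so the linear warmup followed by a constant rate is an assumption, as is treating 48 as the global batch size with no gradient accumulation. `lr_at` is a hypothetical helper name.

```python
BASE_LR = 1e-5         # from the card
WARMUP_STEPS = 1000    # from the card
TOTAL_STEPS = 45_000   # "45k" from the card
BATCH_SIZE = 48        # from the card; assumed to be the global batch size


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed step.

    Assumed shape: linear warmup to BASE_LR, then constant.
    The card does not say which warmup or decay shape was used.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Training samples drawn over the whole run (with repetition across epochs).
samples_seen = BATCH_SIZE * TOTAL_STEPS

if __name__ == "__main__":
    for step in (0, 499, 999, 44_999):
        print(step, lr_at(step))
    print("samples seen:", samples_seen)
```

Under these assumptions the run draws 48 × 45,000 = 2,160,000 image-caption pairs in total.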

Training Data

We use the full version of VRoid Image Dataset Lite with some modifications.