|
--- |
|
license: other |
|
datasets: |
|
- Mitsua/vroid-image-dataset-lite |
|
library_name: diffusers |
|
pipeline_tag: text-to-image |
|
--- |
|
|
|
# Model Card for VRoid Diffusion Unconditional |
|
|
|
This is an unconditional latent diffusion model, built to demonstrate how U-Net training affects the generated images. |
|
|
|
- The pretrained text encoder (OpenCLIP) is removed; an empty text encoder is included only for compatibility with `StableDiffusionPipeline`. |
|
- The VAE is taken from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one) (Mitsua Open RAIL-M License; training data: Public Domain/CC0 + Licensed). |
|
- The U-Net is trained from scratch on the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications. |
|
- The U-Net architecture was modified for unconditional image generation: cross-attention blocks are replaced by self-attention blocks. |
|
- VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions. |
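
The replacement of cross-attention by self-attention mentioned in the list above can be illustrated with a minimal sketch. This is not the model's actual layer code: learned query/key/value projections and multiple heads are omitted, and the function names are hypothetical. It only shows where the keys and values come from in each case.

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors (no learned projections)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs

def cross_attention(image_tokens, text_tokens):
    # Conditional U-Net: image tokens attend to text-encoder outputs.
    return attention(image_tokens, text_tokens, text_tokens)

def self_attention(image_tokens):
    # Unconditional U-Net: image tokens attend only to each other.
    return attention(image_tokens, image_tokens, image_tokens)
```

With self-attention, the block no longer needs any text-encoder output, which is consistent with the text encoder being kept only as an empty placeholder.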
|
|
|
## Model variant |
|
- [VRoid Diffusion](https://huggingface.co/Mitsua/vroid-diffusion-test) |
|
  - This is a conditional text-to-image generator that uses OpenCLIP. |
|
|
|
## Note |
|
- This model works only with the diffusers `StableDiffusionPipeline`. It will not work in the A1111 WebUI. |
|
|
|
``` |
|
from diffusers import StableDiffusionPipeline |
|
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional") |
|
``` |
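
Once the pipeline loads, sampling might look like the sketch below. Several things here are assumptions rather than documented behaviour of this model: that an empty prompt is accepted (plausible given the empty text encoder), that sampling at the 256x256 training resolution is appropriate, and that `guidance_scale=1.0` is a sensible way to switch off classifier-free guidance, which has no meaning without text conditioning. The helper names and default values are illustrative.

```python
def sampling_kwargs(num_images=4, steps=50):
    """Keyword arguments for the pipeline call (illustrative defaults)."""
    return {
        "prompt": "",                # assumption: an empty prompt is accepted
        "height": 256,               # training resolution from this card
        "width": 256,
        "num_inference_steps": steps,
        "guidance_scale": 1.0,       # values <= 1 disable classifier-free guidance
        "num_images_per_prompt": num_images,
    }

def sample_unconditional(num_images=4, steps=50, seed=0):
    """Load the pipeline and return a list of PIL images."""
    # Heavy imports stay inside the function so the helper above works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    pipeline = StableDiffusionPipeline.from_pretrained(
        "Mitsua/vroid-diffusion-test-unconditional"
    )
    generator = torch.Generator().manual_seed(seed)  # reproducible starting noise
    result = pipeline(generator=generator, **sampling_kwargs(num_images, steps))
    return result.images
```

Calling `sample_unconditional()` downloads the weights on first use; saving the outputs is then a matter of `image.save(...)` on each returned PIL image.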
|
### Model Description |
|
|
|
- **Developed by:** Abstract Engine. |
|
- **License:** Mitsua Open RAIL-M License. |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
Image generation for research and educational purposes. |
|
|
|
### Out-of-Scope Use |
|
|
|
Any deployed use case of the model. |
|
|
|
## Training Details |
|
|
|
- Training resolution: 256x256 |
|
- Batch size: 48 |
|
- Steps: 45k |
|
- Learning rate: 1e-5 with 1000 warmup steps |
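
To make these numbers concrete, the sketch below computes how many samples the run processes and what the learning-rate curve might look like. The card states neither the warmup shape nor what happens after warmup, so a linear warmup followed by a constant rate is assumed. It is also assumed that 48 is the effective (global) batch size and that "45k" means exactly 45,000 steps. The name `lr_at` is illustrative.

```python
BATCH_SIZE = 48
TOTAL_STEPS = 45_000      # "45k" in the list above
PEAK_LR = 1e-5
WARMUP_STEPS = 1_000

def lr_at(step):
    """Learning rate at an optimizer step (assumed: linear warmup, then constant)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR

# Total samples processed over the run, counting repeats across epochs.
samples_seen = BATCH_SIZE * TOTAL_STEPS
```

Under these assumptions the run processes 2,160,000 samples in total.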
|
|
|
### Training Data |
|
|
|
We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications. |