---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
library_name: diffusers
pipeline_tag: text-to-image
---

# Model Card for VRoid Diffusion Unconditional

This is a latent unconditional diffusion model that demonstrates how U-Net training affects the generated images.

- The pretrained text encoder (OpenCLIP) is removed, but an empty text encoder is included for compatibility with `StableDiffusionPipeline`.
- The VAE is from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one), Mitsua Open RAIL-M License, Training Data: Public Domain/CC0 + Licensed.
- The U-Net is trained from scratch using the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
- The U-Net architecture was modified for unconditional image generation: cross-attention blocks are replaced by self-attention blocks.
- VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

## Model variant

- [VRoid Diffusion](https://huggingface.co/Mitsua/vroid-diffusion-test)
  - This is a conditional text-to-image generator using OpenCLIP.

## Note

- This model works only with the diffusers `StableDiffusionPipeline`. It will not work on A1111 WebUI.

```
from diffusers import StableDiffusionPipeline

# Load the unconditional checkpoint from the Hugging Face Hub
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
```

### Model Description

- **Developed by:** Abstract Engine.
- **License:** Mitsua Open RAIL-M License.

## Uses

### Direct Use

Image generation for research and educational purposes.

### Out-of-Scope Use

Any deployed use case of the model.

## Training Details

- Training resolution: 256x256
- Batch size: 48
- Steps: 45k
- LR: 1e-5 with 1000 warmup steps

### Training Data

We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
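
To make the Training Details figures concrete, the following is a minimal sketch of the learning-rate setting (1e-5 with 1000 warmup steps) and of how many samples the run processed. The card does not state the warmup shape or the schedule after warmup, so the linear ramp followed by a constant rate is an assumption, and `lr_at_step` is a hypothetical helper, not the training code used for this model.

```python
# Values taken from the Training Details list.
BASE_LR = 1e-5
WARMUP_STEPS = 1000
TOTAL_STEPS = 45_000
BATCH_SIZE = 48


def lr_at_step(step: int) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumption: linear warmup to BASE_LR, then constant.
    The card only says "1e-5 with warmup 1000 steps".
    """
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    return BASE_LR


# Total samples processed, counting repeats across epochs.
samples_seen = TOTAL_STEPS * BATCH_SIZE

if __name__ == "__main__":
    for step in (0, 499, 999, TOTAL_STEPS - 1):
        print(step, lr_at_step(step))
    print("samples seen:", samples_seen)
```

With a batch size of 48 and 45k steps, the run processes 45,000 × 48 = 2,160,000 samples in total.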