---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image
---
# Model Card for VRoid Diffusion
<!-- Provide a quick summary of what the model is/does. -->
This is a latent text-to-image diffusion model intended to demonstrate how U-Net training affects the generated images.
- The text encoder is from [OpenCLIP ViT-H/14](https://github.com/mlfoundations/open_clip), MIT License, Training Data: LAION-2B
- The VAE is from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one), Mitsua Open RAIL-M License, Training Data: Public Domain/CC0 + Licensed
- The U-Net is trained from scratch using the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
- VRoid is a trademark or registered trademark of Pixiv inc. in Japan and other regions.
## Model Details
- `vroid_diffusion_test.safetensors`
  - Base variant.
- `vroid_diffusion_test_invert_red_blue.safetensors`
  - `red` and `blue` in the captions are swapped.
  - `pink` and `skyblue` in the captions are swapped.
- `vroid_diffusion_test_monochrome.safetensors`
  - All training images are converted to grayscale.
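The caption swap used for the `invert_red_blue` variant can be illustrated with a minimal sketch. This is not the actual training code: the function name `swap_colors`, the `SWAPS` table, and the choice of case-sensitive, whole-word matching are assumptions made for illustration. Only the two swapped word pairs come from this card.

```python
import re

# Assumed swap table: the card only states that these word pairs are exchanged.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Match whole words only, so "skyblue" is not treated as "blue"
# and words such as "redhead" are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_colors(caption: str) -> str:
    """Swap the color words in a single pass so no word is swapped twice."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], caption)


print(swap_colors("a girl with red hair and skyblue eyes"))
```

Doing the replacement in one pass matters: two sequential `str.replace` calls (`red` to `blue`, then `blue` to `red`) would turn every color word back into `red`. This sketch is case-sensitive, so capitalized words such as `Red` are not swapped.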
### Model Description
- **Developed by:** Abstract Engine.
- **License:** Mitsua Open RAIL-M License.
## Uses
### Direct Use
Text-to-Image generation for research and educational purposes.
### Out-of-Scope Use
Any deployed use case of the model.
## Training Details
### Training Data
We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.