---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
library_name: diffusers
pipeline_tag: text-to-image
---

# Model Card for VRoid Diffusion Unconditional

<!-- Provide a quick summary of what the model is/does. -->

This is an unconditional latent diffusion model that demonstrates how U-Net training affects the generated images.

- The pretrained text encoder (OpenCLIP) is removed; an empty text encoder is included only for compatibility with `StableDiffusionPipeline`.
- The VAE is from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one) (Mitsua Open RAIL-M License; training data: public domain/CC0 + licensed material).
- The U-Net is trained from scratch on the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
  - The U-Net architecture was modified for unconditional image generation: cross-attention blocks are replaced with self-attention blocks (a conceptual sketch follows below this list).
- VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.
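
The difference between the two attention types can be illustrated with a small, self-contained PyTorch sketch. This is not the model's actual implementation; shapes and dimensions are simplified placeholders:

```
import torch
import torch.nn as nn

dim = 320                                    # hypothetical channel width
image_tokens = torch.randn(1, 32 * 32, dim)  # flattened latent feature map
text_tokens = torch.randn(1, 77, dim)        # stand-in for text embeddings

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

# Cross-attention (conditional Stable Diffusion U-Net):
# queries come from the image, keys/values come from the text embeddings.
cross_out, _ = attn(image_tokens, text_tokens, text_tokens)

# Self-attention (this unconditional variant):
# queries, keys and values all come from the image features, so no text is needed.
self_out, _ = attn(image_tokens, image_tokens, image_tokens)
```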

## Model variant
- [VRoid Diffusion](https://huggingface.co/Mitsua/vroid-diffusion-test)
  - A conditional text-to-image generator using OpenCLIP.
 
## Note
- This model works only with the diffusers `StableDiffusionPipeline`; it will not work with the A1111 WebUI.

```
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
```
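
For actual sampling, the snippet below is a minimal sketch. It assumes that the bundled empty text encoder accepts an empty prompt as a placeholder and that `guidance_scale=1.0` is used to disable classifier-free guidance, since there is no real conditioning:

```
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("Mitsua/vroid-diffusion-test-unconditional")
pipeline = pipeline.to("cuda" if torch.cuda.is_available() else "cpu")

# Unconditional sampling: the empty prompt only satisfies the pipeline interface
# (assumption), and guidance_scale=1.0 turns classifier-free guidance off.
image = pipeline("", num_inference_steps=50, guidance_scale=1.0).images[0]
image.save("vroid_sample.png")
```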
### Model Description

- **Developed by:** Abstract Engine.
- **License:** Mitsua Open RAIL-M License.

## Uses

### Direct Use

Image generation for research and educational purposes.

### Out-of-Scope Use

Any deployed use case of the model.

## Training Details

- Trained resolution: 256x256
- Batch size: 48
- Steps: 45k
- Learning rate: 1e-5 with 1,000 warmup steps (see the scheduler sketch below)
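
The warmup schedule can be set up with diffusers' scheduler helper, shown here as an illustrative sketch only. It assumes a constant learning rate after warmup and uses a placeholder module in place of the actual U-Net; the real training script is not part of this card:

```
import torch
from diffusers.optimization import get_scheduler

unet = torch.nn.Linear(4, 4)  # placeholder for the actual U-Net being trained
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

# 1,000 warmup steps, then (assumed) constant LR for the remaining ~45k steps.
lr_scheduler = get_scheduler(
    "constant_with_warmup",
    optimizer=optimizer,
    num_warmup_steps=1000,
)

for step in range(45_000):
    ...  # forward/backward on batches of 48 latents at 256x256
    optimizer.step()
    lr_scheduler.step()
    optimizer.zero_grad()
```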

### Training Data

We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.