---
license: other
datasets:
- Mitsua/vroid-image-dataset-lite
pipeline_tag: text-to-image
---
# Model Card for VRoid Diffusion

<!-- Provide a quick summary of what the model is/does. -->

This is a latent text-to-image diffusion model built to demonstrate how U-Net training affects the generated images.

- The text encoder is from [OpenCLIP ViT-H/14](https://github.com/mlfoundations/open_clip) (MIT License; training data: LAION-2B).
- The VAE is from [Mitsua Diffusion One](https://huggingface.co/Mitsua/mitsua-diffusion-one) (Mitsua Open RAIL-M License; training data: Public Domain/CC0 + Licensed).
- The U-Net is trained from scratch using the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.
- VRoid is a trademark or registered trademark of Pixiv Inc. in Japan and other regions.

## Model Details

- `vroid_diffusion_test.safetensors`
  - base variant.
- `vroid_diffusion_test_invert_red_blue.safetensors`
  - `red` and `blue` in the captions are swapped.
  - `pink` and `skyblue` in the captions are swapped.
- `vroid_diffusion_test_monochrome.safetensors`
  - all training images are converted to grayscale.
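
The caption modification behind the `invert_red_blue` variant can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `swap_color_words`, the `SWAPS` table, and the choice to match only lowercase whole words are assumptions made for illustration. The model card states only which word pairs are swapped.

```python
import re

# Hypothetical swap table built from the word pairs listed in this model card.
# How the original captions were actually rewritten is not documented.
SWAPS = {"red": "blue", "blue": "red", "pink": "skyblue", "skyblue": "pink"}

# Match whole words only, so "redhead" or "blueprint" are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b")


def swap_color_words(caption: str) -> str:
    """Swap each listed color word with its partner in a single pass.

    A single regex pass avoids the double-swap problem of chained
    str.replace calls (red -> blue -> red).
    """
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], caption)


if __name__ == "__main__":
    print(swap_color_words("a girl with red hair and blue eyes"))
    print(swap_color_words("pink skirt, skyblue ribbon"))
```

Under this assumption, a model trained on the swapped captions would be expected to associate the prompt word `red` with blue image content, which is what makes the variant useful for showing that the color mapping is learned by the U-Net.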


### Model Description

- **Developed by:** Abstract Engine.
- **License:** Mitsua Open RAIL-M License.

## Uses

### Direct Use

Text-to-Image generation for research and educational purposes.

### Out-of-Scope Use

Any deployed use case of the model.

## Training Details

### Training Data

We use the full version of [VRoid Image Dataset Lite](https://huggingface.co/datasets/Mitsua/vroid-image-dataset-lite) with some modifications.