(It would be greatly appreciated if someone could point me to a clean source of Tokyo 7th Sisters assets. I don't really want to scrape Twitter or reverse-engineer the game API.)

# Mask, Don't Negative Prompt: Dealing with undesirable parts of training images

## Introduction

Training images aren't always clean. Sometimes, when training for a given target, unrelated parts of the images, such as text, frames, or watermarks, will also be learned by the model.
There are several strategies that can be applied to this problem, each with shortcomings:

1. **Cropping**: Leave out the undesired parts. Modifies the source composition and is not applicable in some cases.
2. **Inpainting**: Preprocess the data and replace the undesirable parts with generated pixels. Requires a good inpainting prompt/model.
3. **Negative Prompts**: Train as-is and add negative prompts when generating new images. Requires the model to know how the undesirable parts map to the prompt.

Another simple strategy is effective:

4. **Masking**: Multiply the loss by a predefined mask.

This method [is](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/6700) [not](https://github.com/csyxwei/ELITE) new, but the most popular LoRA training script does not yet have built-in support for it.
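
To make the idea concrete, here is a minimal sketch of what "multiply the loss by a mask" means in a typical latent-diffusion training step. The function name and tensor shapes are illustrative assumptions, not taken from any particular training script.

```python
import torch.nn.functional as F

def masked_diffusion_loss(noise_pred, noise_target, mask):
    """Per-pixel weighted MSE in latent space.

    noise_pred, noise_target: (B, C, h, w) tensors from the denoiser.
    mask: (B, 1, h, w) tensor in [0, 1] at latent resolution;
          1 = contributes to the loss, 0 = ignored.
    """
    loss = F.mse_loss(noise_pred, noise_target, reduction="none")
    loss = loss * mask  # zero out the undesired regions
    # Average only over the kept elements so heavily-masked images
    # are not under-weighted relative to unmasked ones.
    return loss.sum() / (mask.sum() * noise_pred.shape[1] + 1e-8)
```

In an existing training loop this simply replaces the usual mean squared error between the predicted and target noise; everything else stays unchanged.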
## Experiment

[60 images](https://huggingface.co/datasets/gustproof/sd-data/blob/main/serizawa-momoka.zip) with card text and decorations of [Serizawa Momoka from Tokyo 7th Sisters](https://t7s.game-info.wiki/d/%b6%dc%c2%f4%a5%e2%a5%e2%a5%ab) were used.

[A masked LoRA](https://huggingface.co/gustproof/sd-models/blob/main/serizawa-momoka/checkpoints/srzwmmk-masked-v1.0-000050.safetensors) and [a plain unmasked LoRA](https://huggingface.co/gustproof/sd-models/blob/main/serizawa-momoka/checkpoints/srzwmmk-v1.0-000050.safetensors) were trained.

For the masked version, [a mask](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/mask-original.webp) was drawn over the source images using image editing software. Note that since the VAE has an 8x scaling factor, what the model effectively sees is the [8x8-pixelated version](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/mask.webp). Tags that only applied to the masked-away parts were removed.
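
The pixelation can be reproduced by shrinking the full-resolution mask to the latent grid (1/8 of the image size on each side). The sketch below is only an illustration of that downscaling; the actual training script may use a different interpolation mode, and the file name and function are placeholders.

```python
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import to_tensor

def mask_to_latent_resolution(path, image_size=(512, 512)):
    """Load a black/white mask and reduce it to latent resolution.

    The VAE downsamples by 8x, so each latent cell corresponds to an
    8x8 block of image pixels; the mask is averaged over each block.
    """
    mask = Image.open(path).convert("L").resize(image_size)
    mask = to_tensor(mask).unsqueeze(0)              # (1, 1, H, W), values in [0, 1]
    h, w = image_size[1] // 8, image_size[0] // 8
    return F.interpolate(mask, size=(h, w), mode="area")  # (1, 1, H/8, W/8)

# Example: latent_mask = mask_to_latent_resolution("mask-original.png")
```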
## Results

![xy compare](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/srmm.png)

Unlike negative prompts, the masked version suppresses the card text and decorations 100% of the time.

## Future work

* Automatic generation of masks with segmentation models (see the rough sketch below)
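
As a very rough illustration of that direction (not something used in this experiment), a background-removal model such as rembg, which is built on U2-Net segmentation, can produce a character-foreground keep-mask automatically. This is cruder than the hand-drawn mask, since it also masks away the background art, and whether it handles card decorations cleanly is exactly what would need to be evaluated.

```python
from PIL import Image
from rembg import remove  # U2-Net-based background removal

def auto_mask(path):
    """Build a rough keep-mask from the character foreground.

    White (255) = kept for the loss, black (0) = masked away.
    """
    image = Image.open(path).convert("RGB")
    cutout = remove(image)              # RGBA image with background made transparent
    alpha = cutout.getchannel("A")      # foreground confidence, 0-255
    return alpha.point(lambda a: 255 if a > 127 else 0)

# Example: auto_mask("card.png").save("card-mask.png")
```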