(It would be greatly appreciated if someone could point me to a clean source of Tokyo 7th Sisters assets. I don't really want to scrape Twitter or reverse-engineer the game API.)

# Mask, Don't Negative Prompt: Dealing with undesirable parts of training images

## Introduction

Training images aren't always clean. Sometimes, when training for a given target, unrelated parts in images such as text, frames, or watermarks will also be learned by the model. 
There are several strategies that can be applied to this problem, each with shortcomings:

1. **Cropping**: Leave out undesired parts. Modifies source composition, not applicable in some cases.
2. **Inpainting**: Preprocess the data and replace undesirable parts with generated pixels. Requires a good inpainting prompt / model.
3. **Negative Prompts**: Train as is and add negative prompts when generating new images. Requires the model to know how the undesirable parts map to the prompt.

Another simple strategy is effective:

4. **Masking**: Multiply the loss with a predefined mask.

This method [is](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/6700) [not](https://github.com/csyxwei/ELITE) new, but the most popular LoRA training script does not yet have built-in support for it.
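
As a rough illustration of what multiplying the loss with a mask means, here is a minimal NumPy sketch. It is not taken from any particular training script: the function name `masked_mse_loss`, the array shapes, and the choice to normalize by the number of unmasked elements are all assumptions made for this example. A real trainer would apply the same idea to PyTorch tensors.

```python
import numpy as np


def masked_mse_loss(pred, target, mask, eps=1e-8):
    """Mean squared error weighted by a spatial mask (illustrative sketch).

    pred, target: arrays of shape (C, H, W), e.g. predicted and true noise.
    mask: array of shape (H, W); 1 keeps a position, 0 ignores it.
    """
    sq_err = (pred - target) ** 2          # per-element error, shape (C, H, W)
    weighted = sq_err * mask[None, :, :]   # broadcast the mask over channels
    # Assumption: average over kept elements only, so the loss scale does not
    # shrink as more of the image is masked away.
    kept = float(mask.sum()) * pred.shape[0]
    return float(weighted.sum() / max(kept, eps))


# Example: a large error that sits entirely inside the masked-out region.
pred = np.zeros((4, 2, 2))
target = np.zeros((4, 2, 2))
target[:, 0, 0] = 5.0                      # "watermark" position
mask = np.ones((2, 2))
mask[0, 0] = 0.0                           # ignore that position
loss = masked_mse_loss(pred, target, mask)
```

Positions where the mask is 0 contribute nothing to the loss, so no gradient pushes the model toward reproducing them.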

## Experiment

The dataset consists of [60 images](https://huggingface.co/datasets/gustproof/sd-data/blob/main/serizawa-momoka.zip) of [Serizawa Momoka from Tokyo 7th Sisters](https://t7s.game-info.wiki/d/%b6%dc%c2%f4%a5%e2%a5%e2%a5%ab), all with card text and decorations.

[A masked LoRA](https://huggingface.co/gustproof/sd-models/blob/main/serizawa-momoka/checkpoints/srzwmmk-masked-v1.0-000050.safetensors) and [a plain unmasked LoRA](https://huggingface.co/gustproof/sd-models/blob/main/serizawa-momoka/checkpoints/srzwmmk-v1.0-000050.safetensors) were trained.

For the masked version, [a mask](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/mask-original.webp) was drawn over the source images using image editing software. Note that since the VAE has an 8x scaling factor, what the model actually sees is the [8x8 pixelated version](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/mask.webp). Tags that do not describe the parts masked away were removed.
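
To make the 8x scaling concrete, the sketch below reduces a pixel-space mask to latent resolution and then blows it back up to show the pixelated mask the model effectively trains against. The helper names `mask_to_latent` and `pixelate` are hypothetical, and averaging each 8x8 block is an assumption for illustration; the post does not say which downsampling method was used.

```python
import numpy as np


def mask_to_latent(mask, factor=8):
    """Reduce an (H, W) pixel mask to (H/factor, W/factor) by block averaging.

    Assumption: each latent position covers a factor x factor pixel block,
    and its mask value is the mean of that block.
    """
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask height and width must be multiples of factor")
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def pixelate(latent_mask, factor=8):
    """Upscale a latent mask back to pixel size by repeating each value."""
    return latent_mask.repeat(factor, axis=0).repeat(factor, axis=1)


# Example: a 16x16 mask with the top-left 8x8 block masked out.
pixel_mask = np.ones((16, 16))
pixel_mask[:8, :8] = 0.0
latent_mask = mask_to_latent(pixel_mask)   # shape (2, 2)
preview = pixelate(latent_mask)            # shape (16, 16), blocky
```

Under this averaging assumption, a mask edge that cuts through an 8x8 block ends up as a fractional weight. Thresholding to a hard 0/1 mask would be another reasonable choice.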


## Results
![xy compare](https://huggingface.co/gustproof/sd-models/resolve/main/posts/images/srmm.png)

The masked version works 100% of the time, unlike negative prompts.

## Future work
* Automatic generation of masks with segmentation models