

title: README
emoji: 📉
colorFrom: gray
colorTo: gray
sdk: static
pinned: false
license: mit

Update 2023.9.12

If you want to make a LoRA request, see this article.

CyberHarem is a non-profit technical team that works purely out of interest, so we do not charge fees of any kind. However, our computing resources and our team members' time are limited, so we cannot guarantee a delivery time for any model. We will complete requests as soon as circumstances allow, and we appreciate your understanding.

Update 2023.9.2

Two recent developments:

The v1.4 automated training process has been deployed, and model quality has improved significantly compared to earlier versions (for more technical details, see: https://civitai.com/articles/2064/2023-8-31-release-of-v14-training-automation-process). We are now thoroughly cleaning the datasets and retraining the models.
We now support LoRA training for characters from anime videos, and the entire process is highly automated.

What is this?

As you can see, this place is called CyberHarem, a centralized repository for anime waifu image datasets and LoRA models.

It's an interesting experiment in which all the datasets, models, model previews, and models published to civitai are fully auto-generated without any human intervention. To make this possible, we've done a lot of technical and data preparation, which you can find in our Organization - DeepGHS and the code on GitHub - DeepGHS.

Currently, we have collected character databases for several popular mobile games (see Supported Games of GChar Library) and crawled datasets of female characters from these games for training. In the future, we may include more characters, not only from mobile games but also from anime series. You can find your waifu with CyberHarem/find_my_waifu.

Where does the dataset come from? What's the format?

The datasets are automatically crawled from various major image websites such as ZeroChan, Anime-Pictures, Danbooru, and Rule34 (see Supported Sites of GChar Library).
Each dataset repository contains both the original data packs and images resized and aligned to a uniform size, along with image tags generated using the SmilingWolf/wd-v1-4-convnextv2-tagger-v2 model.

How are the models trained? What's the format?

LoRA models are trained in batches, each on its corresponding dataset. We use 7eu7d7's HCP-Diffusion training framework for the process.

How to use a1111's WebUI to generate images of anime waifus?

Go to the model repository.
Check the Model Card and choose a step that looks good visually.
Click Download on the right side to download the model package. The package contains two files: a .pt file and a LoRA file in .safetensors format.
You need to use both files together. Put the .pt file in the WebUI's embeddings directory and load the .safetensors file as a LoRA.
Use the trigger words (provided in the Model Card) and prompt text to generate images.
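
The steps above can be sketched as a small helper script. This is a minimal sketch, not an official tool: it assumes the default a1111 WebUI layout (`embeddings/` and `models/Lora/` under the WebUI root), and the function names, the file name `example_character`, and the LoRA weight of 0.8 are all hypothetical placeholders. The `<lora:name:weight>` syntax is the WebUI's own; take the actual trigger words from the Model Card.

```python
import shutil
from pathlib import Path


def install_model_package(package_dir, webui_root):
    """Copy the .pt embedding and the .safetensors LoRA into a WebUI install.

    Assumes the default a1111 layout: <webui_root>/embeddings and
    <webui_root>/models/Lora. Adjust the targets for a custom setup.
    """
    package_dir = Path(package_dir)
    webui_root = Path(webui_root)
    targets = {
        ".pt": webui_root / "embeddings",
        ".safetensors": webui_root / "models" / "Lora",
    }
    installed = {}
    for suffix, target_dir in targets.items():
        matches = sorted(package_dir.glob(f"*{suffix}"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one {suffix} file in {package_dir}, "
                f"found {len(matches)}"
            )
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 returns the path of the newly created file
        installed[suffix] = Path(shutil.copy2(matches[0], target_dir))
    return installed


def build_prompt(trigger_words, installed, extra_tags, lora_weight=0.8):
    """Combine trigger words, your own tags, and the LoRA tag into one prompt.

    trigger_words comes from the Model Card; the LoRA is referenced by its
    file name without the extension. The 0.8 weight is only an example.
    """
    lora_name = installed[".safetensors"].stem
    parts = [trigger_words, *extra_tags, f"<lora:{lora_name}:{lora_weight}>"]
    return ", ".join(parts)
```

For example, after unpacking a package into `./pkg`, calling `install_model_package("./pkg", "/path/to/stable-diffusion-webui")` and then `build_prompt("example_character", installed, ["1girl", "smile"])` would give a prompt you can paste into the txt2img tab.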

Why do some preview images not look very much like the original characters?

The prompt texts used for the preview images are generated automatically by clustering the feature information extracted from the training dataset. The seeds are also random, and the images are not selected or modified in any way, so some previews will inevitably resemble the character less closely than others.

In practice, according to our internal tests, most models that have this issue perform better in actual use than the preview images suggest. The only thing you might need to do is adjust the tags you use a bit.
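
To give a rough feel for how cluster-based prompts can drift away from a character's signature look, here is a toy illustration. It is not our actual pipeline: the greedy Jaccard clustering, the thresholds, the function names, and the sample tags are all assumptions made for this sketch. Each preview prompt here is simply the set of tags shared by most images in one cluster, so it describes a typical outfit or scene rather than the character as a whole.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_tag_sets(tag_sets, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of tag sets; element 0 is its seed
    for tags in tag_sets:
        for cluster in clusters:
            if jaccard(tags, cluster[0]) >= threshold:
                cluster.append(tags)
                break
        else:
            # no existing cluster matched, so this tag set seeds a new one
            clusters.append([tags])
    return clusters


def cluster_prompt(cluster, min_share=0.6):
    """Keep tags that appear in at least min_share of the cluster's images."""
    counts = Counter(tag for tags in cluster for tag in tags)
    kept = [t for t, n in counts.items() if n / len(cluster) >= min_share]
    # Most frequent first; ties broken alphabetically for stable output.
    kept.sort(key=lambda t: (-counts[t], t))
    return ", ".join(kept)


# Made-up tag sets standing in for four tagged training images.
sample = [
    {"1girl", "school_uniform", "smile"},
    {"1girl", "school_uniform", "sitting"},
    {"1girl", "swimsuit", "beach"},
    {"1girl", "swimsuit", "beach", "smile"},
]
prompts = [cluster_prompt(c) for c in cluster_tag_sets(sample)]
print(prompts)  # ['1girl, school_uniform', '1girl, beach, swimsuit']
```

Tags that appear in only a minority of a cluster's images (here `smile` and `sitting`) fall out of the prompt, which is one way an automatically built prompt can miss details you would add by hand.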