alea31415/bocchi-the-rock-character


	---
	license: creativeml-openrail-m
	---

	This is a low-quality Bocchi the Rock! (ぼっち・ざ・ろっく!) character model.
	Like my [yama-no-susume model](https://huggingface.co/alea31415/yama-no-susume), this model can generate multi-character scenes, not just images of a single character.
	Of course, the results are still hit-or-miss, but I think the success rate of getting the entire Kessoku Band right in one shot is already quite high,
	and otherwise you can always fall back on inpainting.
	Here are two examples:

	With inpainting
	Coming soon

	Without inpainting
	Coming soon


	### Characters

	The model knows 12 characters from Bocchi the Rock!.
	The resemblance to a character can be improved by describing their appearance in more detail in the prompt.

	Coming soon

	### Dataset description

	The dataset contains around 27K images with the following composition:
	- 7024 anime screenshots
	- 1630 fan arts
	- 18519 customized regularization images

	The model is trained with a specific weighting scheme to balance different concepts.
	For example, the three categories above have weights of 0.3, 0.25, and 0.45, respectively.
	Each category is itself split hierarchically into many sub-categories.
	For more details on the data preparation process, please refer to https://github.com/cyber-meow/anime_screenshot_pipeline.
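To give a feel for what these weights imply, here is a minimal sketch that turns the three category weights into a relative per-image repeat factor. It assumes that a category's weight is its share of all samples drawn and that this share is split evenly across the images in the category. It ignores the hierarchical sub-categories, and the function name `per_image_multipliers` is hypothetical, so this is not the exact computation used for this model.

```python
# Category sizes and target weights as stated in the dataset description.
counts = {"screenshots": 7024, "fanart": 1630, "regularization": 18519}
weights = {"screenshots": 0.30, "fanart": 0.25, "regularization": 0.45}


def per_image_multipliers(counts, weights):
    """Return a relative repeat factor per image for each category.

    Assumption (not from the model card): a category's weight is its share
    of all samples drawn, split evenly across the images in that category.
    """
    raw = {name: weights[name] / counts[name] for name in counts}
    smallest = min(raw.values())
    # Normalize so the least-repeated category has a multiplier of 1.0.
    return {name: value / smallest for name, value in raw.items()}


multipliers = per_image_multipliers(counts, weights)
for name, value in multipliers.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the small fan art category is repeated far more often per image than the regularization images, which is the kind of balancing the weighting scheme is meant to achieve.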


	### Training Details

	#### Trainer
	The model is trained using [EveryDream1](https://github.com/victorchall/EveryDream-trainer), as
	EveryDream seems to be the only trainer out there that supports sample weighting (through the use of `multiply.txt`).
	For future training, it makes sense to migrate to [EveryDream2](https://github.com/victorchall/EveryDream2trainer).

	#### Hardware and cost
	The model is trained on runpod using a 3090 GPU and cost me around 15 dollars.

	#### Hyperparameter specification

	- The model is trained for 48000 steps at batch size 4, learning rate 1e-6, resolution 512, and a conditional dropout rate of 10%.

	Note that because the weighting scheme translates into a different multiply value for each image,
	the notions of repeat and epoch mean something quite different here.
	For example, depending on the weighting, I have around 300K images in an epoch (some images are used multiple times),
	and therefore I did not even finish an entire epoch with the 48000 steps at batch size 4.
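The arithmetic behind this can be checked in a few lines. The sketch below uses the figures quoted above; `images_per_epoch` is the approximate "around 300K" value, so the resulting fraction is only a rough estimate.

```python
steps = 48_000
batch_size = 4
images_per_epoch = 300_000  # approximate size of one weighted epoch

# Each step consumes one batch of images.
samples_seen = steps * batch_size
epoch_fraction = samples_seen / images_per_epoch

print(f"samples seen: {samples_seen}")
print(f"fraction of one epoch: {epoch_fraction:.2f}")
```

With these numbers, training sees 192,000 samples, which is roughly two thirds of one weighted epoch.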

	### Failures

	- For the first 24000 steps I used the trigger words `Bfan1` and `Bfan2` for the two fans of Bocchi.
	However, these two words are too similar, and the model fails to generate different characters for them. Therefore I changed `Bfan2` to `Bofa2` at step 24000.


	### More Example Generations

	With inpainting
	Coming soon

	Without inpainting
	Coming soon

	Some failure cases
	Coming soon