Maykeye
/

MambaFaceKISS-hf

Model card Files Files and versions Community

MambaFaceKISS-hf / README.md

Maykeye

Initial commit

1a030c8 4 months ago

|

No virus

2.47 kB

	---
	license: apache-2.0
	datasets:
	- huggan/anime-faces
	---

	# Mamba face kiss

	## KISS

	This repo contains two Keep It Simple Stupid anime face generators that generates 64x64 faces from 8x8 provided images.

	Basic idea was to take 64x64 anime faces dataset(https://huggingface.co/datasets/huggan/anime-faces), resize it to 8x8, then teach the model to restore original images, intuition is that after that if new unseen images are provided, it will make some face.

	![Validation](./valid.png)

	Mamba is being fed a sequence `[A][A]...[A][SEP][B][B][B]...[B]` where there are 64 `[A]` that came from the 8x8 draft. there are 64x64 `[B]`s that are initially are upscaled draft(nearest neighbor) with addition of PAE. Model run several layers of mamba, and spits last 64x64 into RGB image. (`[SEP]` is not used for anything significant other than BERT has it to separate sentences, so I used it too as placeholder for command "Upscale from here")

	Two models are used.

	### RNN goess brr (one way)

	One(`imgen3test.ipynb` and `imgen3.py`) always feeds images from top-left pixel to bottom-right pixel row by row

	![Non-flip image](./krita-nonflip.png)


	### "Bi-directional"

	Another take(`imgen3test_flip.ipynb` and `imgen3_flip.py`) feed from top-left pixel to bottom-right pixel in every even layer and every odd layer sees upscaled images in reverse order

	![Flip image](./krita-flip.png)

	This flip version also uses way more parameters and different dtype. I didn't notice that much difference.


	#### Command line tool

	Simple script can be used to call the model on a single image

	```console
	$ cli_imgen3_flip ./krita/face1.png face1.out.png

	python cli_imgen3_flip.py ./krita/face1.png /tmp/face1.png
	Weight path is data/image-flip-weights-1024x4-torch.bfloat16.bin
	Loading the model
	Loading 8x8 input image from ./krita/face1.png
	Writing 64x64 image to /tmp/face1.png
	```

	It's not really good way to use, comparing to calling through jupyter it though: mamba2 is implemented using triton and it takes around 30 seconds to initialize the model each time (on Raider GE76).


	## Recreating

	Training is done in `imgen3(_flip)?.py`. Testing is in notebook. `Image_utils` should provide path to anime faces dataset.

	## Naming and configuring

	Name imgen3 comes from "image generation 3".
	Two other attemts are not that interesting to even backup them.

	I'm too lazy to pass configuration around so parameters are hardcoded in the beginning of the file.