alfredplpl/flux.1-dev-modern-anime-fp8-diffusers


	---
	library_name: diffusers
	license: other
	license_name: flux-1-dev-non-commercial-license
	license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
	---

	# FLUX.1 [dev] Modern Anime FP8 With Quanto

	![eyecatch](eyecatch.jpg)

	FLUX.1 [dev] Modern Anime FP8 With Quanto is an anime model whose weights are quantized to 8-bit float (FP8) with the Quanto library.
	With `enable_model_cpu_offload` set to `True`, the model loads in under 15 GB of VRAM.
	Otherwise, it loads in under 20 GB of VRAM.
	The model runs on an RTX 4090 or an NVIDIA L4.
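The VRAM figures above can be turned into a small pre-flight check. The sketch below is only an illustration: the helper names `choose_load_mode` and `detect_total_vram_gb` are hypothetical, and treating the quoted 15 GB and 20 GB figures as minimum card sizes is an assumption, since other processes may also occupy VRAM.

```python
from typing import Optional

# Thresholds taken from the figures quoted above (GB of VRAM).
# Assumption: the quoted upper bounds are used as minimum card sizes.
OFFLOAD_MIN_GB = 15.0   # enough when enable_model_cpu_offload is used
FULL_GPU_MIN_GB = 20.0  # enough to keep every component on the GPU


def choose_load_mode(total_vram_gb: float) -> str:
    """Return "gpu", "cpu_offload", or "insufficient" for a given VRAM size."""
    if total_vram_gb >= FULL_GPU_MIN_GB:
        return "gpu"
    if total_vram_gb >= OFFLOAD_MIN_GB:
        return "cpu_offload"
    return "insufficient"


def detect_total_vram_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in GB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    vram = detect_total_vram_gb()
    if vram is None:
        print("No CUDA device detected")
    else:
        print(f"{vram:.1f} GB VRAM -> {choose_load_mode(vram)}")
```

In the usage script, the flag could then be set with `enable_model_cpu_offload = choose_load_mode(vram) == "cpu_offload"` instead of being hard-coded.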

	## Usage
	- diffusers
	1. Install optimum-quanto.
	```bash
	pip install optimum-quanto
	```
	2. Run the script:
	```python
	# Reference 1: https://gist.github.com/AmericanPresidentJimmyCarter/873985638e1f3541ba8b00137e7dacd9
	# Reference 2: https://huggingface.co/twodgirl/Flux-dev-optimum-quant-qfloat8
	# Reference 2 by https://huggingface.co/twodgirl
	# Reference 3: https://huggingface.co/p1atdev/FLUX.1-schnell-t5-xxl-quanto

	prompt = "modern anime style, A close-up portrait of a young girl with green hair. Her hair is vibrant and shoulder-length, framing her face softly. She has large, expressive eyes that are slightly tilted upward, with a gentle and calm expression. Her facial features are delicate, with a small nose and soft lips. The background is simple, focusing attention on her face, with soft lighting that highlights her features. The overall style of the illustration is warm and inviting, with a soft color palette and a slightly dreamy atmosphere."
	enable_model_cpu_offload=True

	import torch
	from diffusers import FluxPipeline, FluxTransformer2DModel
	from optimum.quanto import QuantizedDiffusersModel, QuantizedTransformersModel
	from transformers import T5EncoderModel
	from huggingface_hub import snapshot_download

	snapshot_download(repo_id="alfredplpl/flux.1-dev-modern-anime-fp8",local_dir="./anime_fp8")

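	class QuantizedT5EncoderModel(QuantizedTransformersModel):
	    auto_class = T5EncoderModel
	    # Workaround: the Quanto loader builds the model via auto_class.from_config,
	    # which T5EncoderModel does not provide, so patch one in.
	    T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)

	class QuantizedFlux2DModel(QuantizedDiffusersModel):
	    base_class = FluxTransformer2DModel

	# Load the base pipeline without the two large components;
	# the quantized versions are attached afterwards.
	pipe = FluxPipeline.from_pretrained(
	    "black-forest-labs/FLUX.1-dev",
	    transformer=None,
	    text_encoder_2=None,
	    torch_dtype=torch.bfloat16,
	)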

	# Attach the FP8 transformer and T5 text encoder (unwrapped from the Quanto containers).
	pipe.transformer = QuantizedFlux2DModel.from_pretrained("./anime_fp8/transformer")._wrapped
	pipe.text_encoder_2 = QuantizedT5EncoderModel.from_pretrained("./anime_fp8/text_encoder_2")._wrapped
	pipe.vae = pipe.vae.to(torch.float32)
	# Optional: offload idle components to the CPU to reduce VRAM usage.
	if enable_model_cpu_offload:
	    pipe.enable_model_cpu_offload()
	else:
	    pipe.text_encoder_2 = pipe.text_encoder_2.to("cuda")
	    pipe.transformer = pipe.transformer.to("cuda")
	    pipe = pipe.to("cuda")

	image = pipe(
	    prompt,
	    height=1024,
	    width=1024,
	    guidance_scale=3.5,
	    num_inference_steps=50,
	    max_sequence_length=512,
	    generator=torch.Generator(device="cuda").manual_seed(0),
	).images[0]
	image.save("modern-anime-fp8.png")
	```

	## How to cast to FP8
	1. Install optimum-quanto.
	```bash
	pip install optimum-quanto
	```
	2. Run the script:
	```python
	import torch
	from safetensors.torch import save_file, load_file

	from diffusers import FluxTransformer2DModel
	from optimum.quanto import freeze, qfloat8, quantize, QuantizedDiffusersModel

	class QuantizedFlux2DModel(QuantizedDiffusersModel):
	    base_class = FluxTransformer2DModel

	transformer = FluxTransformer2DModel.from_single_file("modern-anime.safetensors", torch_dtype=torch.bfloat16)
	transformer = QuantizedFlux2DModel.quantize(transformer, weights=qfloat8)

	transformer.save_pretrained("transformer")
	```
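To get a rough sense of why the cast helps, the weight storage before and after quantization can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a measurement: the parameter counts are approximate assumptions that do not come from this model card, and the estimate ignores Quanto's per-tensor scales, activations, the VAE, and the CLIP text encoder.

```python
# Approximate parameter counts; assumptions for illustration only,
# not figures taken from this model card.
PARAMS = {
    "transformer": 12e9,      # FLUX.1 [dev] is described as a 12B-parameter model
    "text_encoder_2": 4.7e9,  # rough size of the T5-XXL encoder
}

# Storage cost per weight for each dtype.
BYTES_PER_PARAM = {"bfloat16": 2, "qfloat8": 1}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Estimated weight storage in GB for a given parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def total_size_gb(dtype: str) -> float:
    """Estimated combined weight storage of all listed components."""
    return sum(weight_size_gb(n, dtype) for n in PARAMS.values())


if __name__ == "__main__":
    for name, n in PARAMS.items():
        bf16 = weight_size_gb(n, "bfloat16")
        fp8 = weight_size_gb(n, "qfloat8")
        print(f"{name}: {bf16:.1f} GB (bfloat16) -> {fp8:.1f} GB (qfloat8)")
    print(f"total: {total_size_gb('bfloat16'):.1f} GB -> {total_size_gb('qfloat8'):.1f} GB")
```

Under these assumed counts, the two quantized components take roughly half the weight storage they would need in bfloat16, which is in line with the model fitting in under 20 GB of VRAM.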