---
library_name: diffusers
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
---

# FLUX.1 [dev] Modern Anime FP8 With Quanto

![eyecatch](eyecatch.jpg)

FLUX.1 dev Modern Anime FP8 With Quanto is an anime model quantized to 8-bit float (FP8) with the Quanto library.
With `enable_model_cpu_offload` set to True, it loads in less than 15 GB of VRAM; otherwise it needs less than 20 GB of VRAM.
It runs on an RTX 4090 or an NVIDIA L4.
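
If you are unsure which mode to use, a minimal sketch like the following (an assumption on my part, not part of the model card, and assuming a single CUDA device) picks `enable_model_cpu_offload` from the total GPU memory reported by PyTorch:

```python
import torch

# Minimal sketch: choose the offload mode from total GPU memory,
# following the guidance above (< 15 GB with offload, < 20 GB without).
free_bytes, total_bytes = torch.cuda.mem_get_info()
enable_model_cpu_offload = total_bytes < 20 * 1024**3
print(f"enable_model_cpu_offload = {enable_model_cpu_offload}")
```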

## Usage
- diffusers
1. Install optimum-quanto.
```bash
pip install optimum-quanto
```
2. Run the script:
```python
# Reference 1: https://gist.github.com/AmericanPresidentJimmyCarter/873985638e1f3541ba8b00137e7dacd9
# Reference 2: https://huggingface.co/twodgirl/Flux-dev-optimum-quant-qfloat8
# Reference 2 by https://huggingface.co/twodgirl
# Reference 3: https://huggingface.co/p1atdev/FLUX.1-schnell-t5-xxl-quanto

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from optimum.quanto import QuantizedDiffusersModel, QuantizedTransformersModel
from transformers import T5EncoderModel
from huggingface_hub import snapshot_download

prompt = "modern anime style, A close-up portrait of a young girl with green hair. Her hair is vibrant and shoulder-length, framing her face softly. She has large, expressive eyes that are slightly tilted upward, with a gentle and calm expression. Her facial features are delicate, with a small nose and soft lips. The background is simple, focusing attention on her face, with soft lighting that highlights her features. The overall style of the illustration is warm and inviting, with a soft color palette and a slightly dreamy atmosphere."
enable_model_cpu_offload = True  # True: < 15 GB VRAM, False: < 20 GB VRAM

# Download the FP8-quantized weights of this model.
snapshot_download(repo_id="alfredplpl/flux.1-dev-modern-anime-fp8", local_dir="./anime_fp8")

# Wrapper classes so optimum-quanto can load the quantized checkpoints.
class QuantizedT5EncoderModel(QuantizedTransformersModel):
    auto_class = T5EncoderModel

# Workaround: make sure the reconstructed T5 encoder is created in float16.
T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)

class QuantizedFlux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

# Load the base pipeline without the transformer and the T5 text encoder,
# then swap in the FP8-quantized versions.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    transformer=None,
                                    text_encoder_2=None,
                                    torch_dtype=torch.bfloat16)

pipe.transformer = QuantizedFlux2DModel.from_pretrained("./anime_fp8/transformer")._wrapped
pipe.text_encoder_2 = QuantizedT5EncoderModel.from_pretrained("./anime_fp8/text_encoder_2")._wrapped
pipe.vae = pipe.vae.to(torch.float32)  # keep the VAE in float32 for decoding

# Option: offload modules to CPU to reduce peak VRAM usage.
if enable_model_cpu_offload:
    pipe.enable_model_cpu_offload()
else:
    pipe.text_encoder_2 = pipe.text_encoder_2.to("cuda")
    pipe.transformer = pipe.transformer.to("cuda")
    pipe = pipe.to("cuda")

image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator(device="cuda").manual_seed(0)
).images[0]
image.save("modern-anime-fp8.png")
```
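
To check the VRAM figures above on your own hardware, a short sketch like this, run right after the generation call, reports the peak memory allocated through PyTorch (which can differ somewhat from the figure shown by nvidia-smi):

```python
import torch

# Report peak GPU memory after generation to compare against the numbers above.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak allocated VRAM: {peak_gib:.2f} GiB")
```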

## How to cast to FP8
1. Install optimum-quanto.
```bash
pip install optimum-quanto
```
2. Run the script:
```python
import torch

from diffusers import FluxTransformer2DModel
from optimum.quanto import qfloat8, QuantizedDiffusersModel

class QuantizedFlux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

# Load the bf16 transformer from a single-file checkpoint, then quantize its weights to FP8.
transformer = FluxTransformer2DModel.from_single_file("modern-anime.safetensors", torch_dtype=torch.bfloat16)
transformer = QuantizedFlux2DModel.quantize(transformer, weights=qfloat8)

# Save the quantized transformer (weights plus quantization map).
transformer.save_pretrained("transformer")
```
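
To confirm that the exported folder loads back correctly, a minimal sketch (reusing the `QuantizedFlux2DModel` wrapper defined above and mirroring the Usage section; the paths are the ones used in this example) is:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from optimum.quanto import QuantizedDiffusersModel

class QuantizedFlux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

# Reload the FP8 transformer saved above and plug it into the base pipeline.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                    transformer=None,
                                    torch_dtype=torch.bfloat16)
pipe.transformer = QuantizedFlux2DModel.from_pretrained("transformer")._wrapped
```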