---
license: apache-2.0
---

Quantized T5-XXL text encoder of FLUX.1[schnell], created with Hugging Face [optimum-quanto](https://github.com/huggingface/optimum-quanto).

### Quantize

```py
import torch
from transformers import T5EncoderModel
from optimum.quanto import (
    QuantizedTransformersModel,
    qfloat8_e4m3fn,
    qfloat8_e5m2,
    qint8,
    qint4,
)

REPO_NAME = "black-forest-labs/FLUX.1-schnell"
TEXT_ENCODER = "text_encoder_2"

model = T5EncoderModel.from_pretrained(
    REPO_NAME,
    subfolder=TEXT_ENCODER,
    torch_dtype=torch.bfloat16,
)

# Quantize and freeze the weights. Swap in qfloat8_e5m2, qint8, or qint4
# to trade output quality for memory.
qmodel = QuantizedTransformersModel.quantize(
    model,
    weights=qfloat8_e4m3fn,
)

qmodel.save_pretrained("./t5_xxl/qfloat8_e4m3fn")
```

### Load

`QuantizedTransformersModel` currently [does not support](https://github.com/huggingface/optimum-quanto/blob/601dc193ce0ed381c479fde54a81ba546bdf64d1/optimum/quanto/models/transformers_models.py#L151) loading a quantized model directly from the Hugging Face Hub, so the checkpoint must be loaded from a local path. `from_pretrained` also requires a concrete `auto_class`, hence the small subclass below:

```py
from transformers import AutoModelForTextEncoding
from optimum.quanto import QuantizedTransformersModel

MODEL_PATH = "./t5_xxl/qfloat8_e4m3fn"


# from_pretrained needs an auto_class to reinstantiate the model,
# so bind it to AutoModelForTextEncoding for the T5 encoder.
class QuantizedModelForTextEncoding(QuantizedTransformersModel):
    auto_class = AutoModelForTextEncoding


qmodel = QuantizedModelForTextEncoding.from_pretrained(MODEL_PATH)
```
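
To load weights that are hosted on the Hub, download them to a local directory first. A minimal sketch using `huggingface_hub.snapshot_download`; the repo id below is a placeholder for wherever the quantized checkpoint is published:

```py
from huggingface_hub import snapshot_download

# Placeholder repo id: point this at the repo hosting the quantized weights.
local_path = snapshot_download(repo_id="your-username/t5-xxl-qfloat8")

qmodel = QuantizedModelForTextEncoding.from_pretrained(local_path)
```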
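
### Use with FLUX.1[schnell]

A minimal sketch of plugging the quantized encoder into the `diffusers` `FluxPipeline`. It assumes the underlying `T5EncoderModel` is reachable through the wrapper's `_wrapped` attribute, an optimum-quanto implementation detail that may change:

```py
import torch
from diffusers import FluxPipeline

# _wrapped holds the underlying T5EncoderModel (optimum-quanto internal; assumption).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    text_encoder_2=qmodel._wrapped,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to save VRAM

image = pipe(
    "A cat holding a sign that says hello world",
    num_inference_steps=4,   # schnell is timestep-distilled for few steps
    guidance_scale=0.0,      # schnell does not use classifier-free guidance
    max_sequence_length=256,
).images[0]
image.save("flux-schnell.png")
```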