Support for variant fp16?
Hey guys, is there a VAE for this model that can support the `variant="fp16"` option when loading? I think it could potentially reduce load and inference time by a fair amount.
The VAE is already baked into the model, and the VAE that Animagine XL 3.1 uses is already fp16.
We use https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
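If you ever want to attach that VAE explicitly anyway, something along these lines works in diffusers (a sketch using the repo ids mentioned above; not needed here since it is already baked in):

```python
import torch
from diffusers import AutoencoderKL, DiffusionPipeline

# Load the fp16-fix VAE separately (normally unnecessary for this model).
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

# Pass it in so the pipeline uses this VAE instead of the bundled one.
pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    vae=vae,
    torch_dtype=torch.float16,
)
```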
I see. Is it possible to add ".fp16" to the file names? Currently it isn't possible to load this repo with `from_pretrained()` and `variant="fp16"` in diffusers, because files with those names don't exist. From the diffusers docs:
variant (`str`, *optional*):
Load weights from a specified variant filename such as `"fp16"` or `"ema"`.
So trying to load it with `variant="fp16"` prints the warning:
You are trying to load the model files of the variant=fp16, but no such modeling files are available. The default model files: ... will be loaded instead
We're running a remote server that fetches the model from this Hugging Face repo. I did try renaming the files to add ".fp16" and uploading everything to my own repo. While I could load the models with this method, I always end up with the error:
encoded_inputs["attention_mask"] = encoded_inputs["attention_mask"] + [0] * difference
OverflowError: cannot fit 'int' into an index-sized integer
I guess I am looking for advice on how to load Animagine XL 3.1 with `DiffusionPipeline.from_pretrained()` and `variant="fp16"` enabled. By default it will load as float32 and then convert to float16, whereas loading with `variant="fp16"` loads directly as float16 and saves some time.
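For context, the call I'm trying to make on the server looks roughly like this (repo id taken from this discussion):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,
    variant="fp16",  # currently triggers the "no such modeling files" warning quoted above
)
```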
Much appreciated!
The Animagine XL 3.1 model is already fp16.
You don't need to use `variant="fp16"`, because the `variant` argument is matched against the file name, like `sdxl_vae.fp16.safetensors`.
If you still want to make sure, try `torch_dtype=torch.float16` (you can check the implementation at https://huggingface.co/cagliostrolab/animagine-xl-3.1#%F0%9F%A7%A8-diffusers-installation).
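Roughly the snippet from the model card, as a starting point (the exact arguments there may differ slightly):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,  # the stored weights are already fp16, so nothing is lost here
    use_safetensors=True,
)
pipe.to("cuda")
```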
I did try `torch_dtype=torch.float16`. I believe the model loads as float32 and then converts to float16 afterwards for inference, which is slower than loading directly as float16.
There is a noticeably faster model load when `variant="fp16"` is used (like in the SDXL base example https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#%F0%9F%A7%A8-diffusers), compared to not having it on.
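Roughly the comparison I ran on SDXL base (a sketch; it assumes both sets of files are already in the local cache so only load time is measured, not download time):

```python
import time
import torch
from diffusers import DiffusionPipeline

def timed_load(**kwargs):
    # Load the pipeline and report how long from_pretrained takes.
    start = time.perf_counter()
    DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        **kwargs,
    )
    print(kwargs, f"{time.perf_counter() - start:.1f}s")

timed_load()                 # reads the default (fp32) files, then casts to fp16
timed_load(variant="fp16")   # reads the smaller *.fp16.safetensors files directly
```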
Why do you still believe the Animagine XL model loads as float32?
You can see in my image: would diffusers load a float32 model out of nothing?
The SDXL base repo has 2 variants; you can see the precision of each file below:
model.fp16.safetensors
model.safetensors
Maybe in SDXL base you need to add `variant="fp16"` because there are 2 variants, fp32 and fp16,
but this Animagine XL 3.1 repo only has 1 variant, which is fp16, and its file name is model.safetensors.
If you equate it with the model.safetensors in SDXL base, the difference in precision is clear.
That's why you believe Animagine XL 3.1 is loaded as fp32 first, but in fact the Animagine model in this repo only has 1 precision, which is fp16.
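If you want to verify it yourself, here is a rough way to check both the on-disk and the in-memory precision (the file path below is my assumption of the standard diffusers layout in this repo; check what it prints on your setup):

```python
import torch
from diffusers import DiffusionPipeline
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# 1) Stored precision of the UNet weights in this repo.
unet_file = hf_hub_download(
    "cagliostrolab/animagine-xl-3.1",
    "unet/diffusion_pytorch_model.safetensors",
)
with safe_open(unet_file, framework="pt") as f:
    key = next(iter(f.keys()))
    print("on disk:", f.get_tensor(key).dtype)

# 2) Precision the pipeline reports after a plain load (no torch_dtype given).
pipe = DiffusionPipeline.from_pretrained("cagliostrolab/animagine-xl-3.1")
print("in memory:", pipe.unet.dtype)
```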
Yeah, I agree with you that Animagine XL 3.1 is in fp16; if it were fp32 it would be over 10 GB.
My problem is that the diffusers library is dumb as hell and can't load `variant="fp16"` unless "fp16" is in the file name, and if I don't specify `variant="fp16"`, diffusers will still load it as fp32 even though the weights are fp16 (no error, just inefficient and slow). I'll try cloning this model repo, renaming everything to add ".fp16", and loading with the parameter `variant="fp16"`.
It works, I'm seeing a 50% reduction in model load time on my setup :)
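For anyone else who wants to do the same, the end state looks roughly like this (the mirror repo name below is a placeholder; the rename follows the diffusers variant filename pattern, e.g. unet/diffusion_pytorch_model.safetensors -> unet/diffusion_pytorch_model.fp16.safetensors):

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder repo id: a mirror of cagliostrolab/animagine-xl-3.1 with ".fp16"
# inserted into every weight filename.
pipe = DiffusionPipeline.from_pretrained(
    "your-username/animagine-xl-3.1-fp16",
    torch_dtype=torch.float16,
    variant="fp16",  # resolves now, since the *.fp16.safetensors files exist
)
```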