
Some other quantizations #1

by localAGI - opened

Hey, any chance you could add an fp16 variant of the model?

Does it make any difference when running it?

I am running on GPU. AFAIK the fp16 model would be around 28 GB, so it should do nicely with 80-90% offloading to a 24 GB VRAM card.
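
For what it's worth, here is a rough sketch of how an fp16 export could be produced with CTranslate2's Transformers converter; the checkpoint name and output directory below are just placeholders, not the actual repo names:

```python
# Sketch: convert a Transformers checkpoint to a float16 CTranslate2 model.
# "org/source-model" and "ct2-float16" are placeholder names.
import ctranslate2

converter = ctranslate2.converters.TransformersConverter(
    "org/source-model",    # hypothetical Hugging Face checkpoint to convert
    load_as_float16=True,  # load the weights in fp16 while converting
)
converter.convert("ct2-float16", quantization="float16")
```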

Might be able to do it.

Just not sure if partial offloading is supported with CTranslate2, and I am also not sure why you would want to load in fp16. fp16 would also be around 32 GB.
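
As far as I understand, CTranslate2 also lets you pick the compute type when loading, independently of how the model was saved, so an int8 export can already run with fp16 compute on GPU. Roughly like this (model directory and tokenizer name are placeholders):

```python
# Sketch: load a converted model on GPU and choose the compute type at load time.
# "ct2-int8" and "org/source-model" are placeholder names.
import ctranslate2
import transformers

generator = ctranslate2.Generator(
    "ct2-int8",                   # hypothetical converted model directory
    device="cuda",
    compute_type="int8_float16",  # int8 weights with float16 compute
)
tokenizer = transformers.AutoTokenizer.from_pretrained("org/source-model")

prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello"))
results = generator.generate_batch([prompt_tokens], max_length=64)
print(tokenizer.decode(results[0].sequences_ids[0]))
```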
