Is there a quantized version?

#5
by mrmikelevy - opened

I was hoping there would be a quantized version. I know it would be less accurate, but the performance gain might make up for it. I'm using the model for zero-shot classification.

Hi, nice idea! Do you want this for CPU workloads? The model already fits on a small GPU.

Yes. I've been using a Tesla T4 GPU, but the model is so small that moving it to CPU might be worth it. Unquantized, it runs about 5 times faster on GPU than on CPU. I think that if it were quantized, it would run on CPU at about the same speed as the unquantized model on GPU.
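Until an official quantized version exists, one way to test that guess is PyTorch's post-training dynamic quantization. It converts `nn.Linear` weights to int8 for CPU inference and needs no calibration data. This is a minimal sketch, not the model author's method. The thread doesn't name the model, so the sketch uses a hypothetical stack of `Linear` layers as a stand-in. It assumes you could pass a loaded `transformers` classification model to `quantize_dynamic` in place of `model`. It times both versions and reports the largest difference in output logits, so the speed/accuracy trade-off can be measured rather than estimated. Actual speedups depend on the CPU and the model.

```python
import time

import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

torch.manual_seed(0)

# Hypothetical stand-in model: the thread does not name the real model,
# so a few Linear layers (the part dynamic quantization targets) are used here.
model = nn.Sequential(
    nn.Linear(768, 768),
    nn.ReLU(),
    nn.Linear(768, 768),
    nn.ReLU(),
    nn.Linear(768, 3),  # e.g. 3 candidate labels
).eval()

# Dynamic quantization: Linear weights are converted to int8 once, and
# activations are quantized on the fly at inference time (CPU kernels).
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(32, 768)  # a dummy batch of 32 feature vectors


def bench(m: nn.Module, inputs: torch.Tensor, iters: int = 50) -> float:
    """Return the average seconds per forward pass after one warm-up call."""
    with torch.no_grad():
        m(inputs)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            m(inputs)
    return (time.perf_counter() - start) / iters


with torch.no_grad():
    # Largest absolute difference between fp32 and int8 outputs (accuracy cost).
    diff = (model(x) - qmodel(x)).abs().max().item()

print(f"fp32: {bench(model, x) * 1e3:.2f} ms/batch")
print(f"int8: {bench(qmodel, x) * 1e3:.2f} ms/batch")
print(f"max abs logit difference: {diff:.4f}")
```

If dynamic quantization alone doesn't close the gap with the T4, static quantization or an exported ONNX model run with ONNX Runtime are other options worth benchmarking the same way.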
