
Is there a quantized version?

#5
by mrmikelevy - opened

I was hoping there would be a quantized version. I know it would be less accurate, but the performance gain might make up for it. I'm doing zero-shot classification.

Hi, nice idea. Do you want this for CPU workloads? The model already fits on a small GPU.

Yes. I've been using a Tesla T4 GPU, but the model is small enough that moving it to CPU might be worth it. Unquantized, it runs about 5x faster on GPU; I think a quantized version would run at roughly the same speed on CPU.
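For CPU inference you don't necessarily need a published quantized checkpoint: PyTorch's dynamic quantization can convert the model's `nn.Linear` layers (which dominate transformer compute) to INT8 at load time. A minimal sketch below uses a small stand-in module so it runs anywhere; for the real model you would load it with `AutoModelForSequenceClassification.from_pretrained(...)` and pass it to `quantize_dynamic` the same way. This is an illustration of the technique, not something the model authors ship.

```python
import torch
import torch.nn as nn

# Stand-in for the transformer: dynamic quantization targets nn.Linear
# layers, which is where most of a DeBERTa forward pass is spent.
# For the real thing, replace `model` with the module returned by
# transformers.AutoModelForSequenceClassification.from_pretrained(...).
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
)

# Weights are converted to INT8 up front; activations are quantized
# dynamically at runtime. Everything stays on CPU.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)
```

Accuracy usually drops only slightly with dynamic INT8, so it's worth benchmarking against your T4 numbers before committing either way.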
