Why is the Inference API unavailable?

#8
by amgadhasan - opened

The Inference API for this model is unavailable. It would be nice if we could try the model out on the fly.

Thanks for the model btw

It's unavailable because this is a quantised GPTQ model, and Hugging Face doesn't support live inference on those.

But I have an unquantised version uploaded as well, and inference is enabled on this: https://huggingface.co/TheBloke/wizard-vicuna-13B-HF

Personally I've never found that live inference demo to be much good though, because it only returns about 10 tokens, which isn't nearly enough to test anything useful. But it's available at the above link if you want it.
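If you'd rather query the hosted Inference API directly instead of using the web demo, here's a minimal sketch. The model ID is the unquantised repo linked above; the endpoint shape and the `max_new_tokens` parameter follow the public `api-inference.huggingface.co` conventions for text-generation models, and the token value is a placeholder you'd replace with your own. This only builds the request (no network call), so treat it as illustrative rather than tested against the live service:

```python
import json

# Unquantised model from the link above; GPTQ repos aren't served by the API.
MODEL_ID = "TheBloke/wizard-vicuna-13B-HF"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

def build_request(prompt, token, max_new_tokens=200):
    """Return (url, headers, body) for a text-generation request.

    max_new_tokens asks for a longer completion than the ~10-token web
    demo default, though the hosted service may still cap it.
    """
    headers = {"Authorization": f"Bearer {token}"}  # token is a placeholder
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return API_URL, headers, json.dumps(payload)

# Build (but don't send) a request; pass the result to e.g. requests.post().
url, headers, body = build_request("What is GPTQ quantisation?", token="hf_xxx")
```

From there you'd POST `body` with those headers using any HTTP client and read the generated text out of the JSON response.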
