God bless you!

If you can, use the 3.0bpw TinyLlama 32K exl2 quant as the draft model for speculative decoding and get insane inference speed:
https://huggingface.co/models?sort=trending&search=LoneStriker+tinyllama+32k

Ooba doesn't support speculative decoding unfortunately, but ExUI and TabbyAPI do.
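
If you want to see the draft-model idea in action outside those UIs, here's a minimal sketch using transformers' assisted generation (same principle, different backend than exl2). The model names are just placeholder assumptions, not the quants linked above; the target and draft models need to share a tokenizer (e.g. a Llama-family target plus TinyLlama):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-13b-hf"          # assumed example target model
draft_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed example draft model

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# The small draft model proposes several tokens per step; the big target model
# verifies them in a single forward pass, so accepted tokens come out much faster
# while the output still follows the target model.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The speedup comes from the target model checking a whole batch of drafted tokens at once instead of generating them one by one, which is why a tiny, fast draft model like TinyLlama is a good fit.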
