God bless you!

If you can, use the 3.0bpw TinyLlama 32K exl2 quant as the draft model for speculative decoding and get insane inference speed:
https://huggingface.co/models?sort=trending&search=LoneStriker+tinyllama+32k

Ooba doesn't support speculative decoding unfortunately, but ExUI and TabbyAPI do.
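
If you want to see the draft-model idea in action outside those UIs, here's a minimal sketch using transformers' assisted generation (same principle, different backend than exl2). The model names are just placeholder assumptions, not the quants linked above; the target and draft models need to share a tokenizer (e.g. a Llama-family target plus TinyLlama):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "meta-llama/Llama-2-13b-hf"          # assumed example target model
draft_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed example draft model

tokenizer = AutoTokenizer.from_pretrained(target_name)
target = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# The small draft model proposes several tokens per step; the big target model
# verifies them in a single forward pass, so accepted tokens come out much faster
# while the output still follows the target model.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The speedup comes from the target model checking a whole batch of drafted tokens at once instead of generating them one by one, which is why a tiny, fast draft model like TinyLlama is a good fit.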
