Any chance/timeline for a q8 version?

by skyrien - opened

This model is awesome, even the q5 version! Though with an RTX 4090, I can't quite run the fp16 version properly, and offloading any layers at all seems to break it entirely.

Any chance for a q8 version?

Be about 8 hours.

PsiPi changed discussion status to closed

What sort of tokens/second do you get on the RTX 4090? How large are the images you're passing in?
