So far the best model I've tested. Just a little slow

#15
by victorx98 - opened

On my 3090 machine. Can't wait to see the results once the community starts fine-tuning it and optimizing the speed.

I would also like to try this model. Could you give me an overview of how you conducted the test?

Use llama.cpp: https://github.com/ggerganov/llama.cpp
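If you'd rather drive it from Python than the CLI, the llama-cpp-python bindings wrap the same engine. A minimal sketch, assuming you've run `pip install llama-cpp-python` and downloaded a GGUF file; the model path, context size, and layer-offload count below are placeholders, not values specific to this model:

```python
from llama_cpp import Llama

# Placeholder path: point this at whatever quantized GGUF you downloaded.
llm = Llama(
    model_path="./model-q4_K_M.gguf",
    n_ctx=2048,       # context window
    n_gpu_layers=35,  # layers offloaded to the GPU; raise until VRAM runs out
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=128)
print(out["choices"][0]["text"])
```

On a 3090, `n_gpu_layers` is the main speed knob: each layer kept on the GPU avoids a round trip to system RAM.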

Any quantized versions of this? Need smaller than 24GB.

The only one that fits in 24 GB is Q2_K. That's already pretty extreme quantization, with significant quality degradation, just so it can (barely) chug along on a top-of-the-line GPU.
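For a back-of-the-envelope check of what fits: file size scales roughly with parameter count times bits per weight, plus overhead for the KV cache and activations. A rough sketch; the 70B parameter count and the bits-per-weight figures are illustrative assumptions, not specs of this model:

```python
def approx_size_gib(n_params_billion: float, bits_per_weight: float) -> float:
    # parameters * bits per weight / 8 bits per byte, converted to GiB
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# Assumed effective bits-per-weight values for common llama.cpp quant types.
for name, bpw in [("Q2_K", 2.6), ("Q4_K_M", 4.8), ("F16", 16.0)]:
    print(f"{name}: ~{approx_size_gib(70, bpw):.1f} GiB for a 70B model")
```

By that estimate, Q2_K lands around 21 GiB, which is why it just squeezes into a 24 GB card while anything less aggressive does not.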

How come it gives an error saying the loaded weights have a different shape than the model ...
