So far the best model I've tested. Just a little slow

#15
by victorx98

On my 3090 machine. Can't wait to see the results once the community starts fine-tuning it and optimizing the speed.

I would also like to try this model. Could you give me an overview of how you conducted the test?

Use llama.cpp: https://github.com/ggerganov/llama.cpp
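In case it helps, here's a minimal sketch of what a quick test could look like using the llama-cpp-python bindings (`pip install llama-cpp-python`); the model path and prompt are placeholders, so swap in the GGUF file you actually downloaded:

```python
import time

from llama_cpp import Llama

# Hypothetical path to the GGUF file you downloaded for this model.
llm = Llama(
    model_path="./model.gguf",
    n_ctx=2048,        # context window for the test
    n_gpu_layers=-1,   # offload all layers to the GPU if they fit
)

# Run a single completion and time it to get a rough tokens/sec figure.
start = time.time()
out = llm("Explain what a transformer is in one sentence.", max_tokens=128)
elapsed = time.time() - start

print(out["choices"][0]["text"])
print(f"{out['usage']['completion_tokens'] / elapsed:.1f} tokens/sec")
```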

Any quantized versions of this? Need smaller than 24GB.

The only one that fits in 24GB is q2_0. That's already pretty extreme quantization with significant quality degradation, just so it can (barely) chug along on a top-of-the-line GPU.
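If you'd rather use a less aggressive quant than q2_0, one workaround (a sketch, not specific to this model) is to offload only part of the layers to the GPU and leave the rest on CPU RAM; the file name and layer count below are assumptions you'd tune for your own setup:

```python
from llama_cpp import Llama

# Hypothetical file name; use whichever quant you actually downloaded.
llm = Llama(
    model_path="./model-q4_0.gguf",
    n_gpu_layers=40,  # offload only as many layers as fit in 24GB VRAM;
                      # the remaining layers run on CPU (slower, but it works)
    n_ctx=2048,
)

print(llm("Hello!", max_tokens=32)["choices"][0]["text"])
```

Lowering `n_gpu_layers` trades speed for VRAM headroom, so you can start high and reduce it until the model loads without out-of-memory errors.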

How come it gives an error saying the loaded weights have a different shape than the model ...