Compatibility with llama-cpp-python and Ollama

#124 · opened by liashchynskyi

Hi there!

I've tried some quantized versions of this model and ran into an issue. I use llama-cpp-python for inference. When I ask a question, the model returns an endless stream of random characters (see the screenshot below). But when I build a local model from the same quantized GGUF with a Modelfile and run it through Ollama, everything works fine. So Ollama works while llama-cpp-python produces garbage output; a sketch of both setups follows. I've noticed the same behavior with a couple of other models, such as defog/llama-3-sqlcoder-8b.
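Here's a minimal sketch of the kind of call that produces the garbage output for me (the model path, context size, and prompt are placeholders for my actual setup):

```python
from llama_cpp import Llama

# Placeholder path to the quantized GGUF; in reality I point this at
# the same file I use for the Ollama Modelfile.
llm = Llama(
    model_path="./llama-3-sqlcoder-8b.Q4_K_M.gguf",
    n_ctx=4096,
    chat_format="llama-3",  # assumption: explicitly select the Llama 3 chat
                            # template; if omitted, llama-cpp-python tries to
                            # read the template from the GGUF metadata
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List all customers who placed an order."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```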

Is anyone else experiencing the same issue?
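For comparison, the Ollama side that works looks roughly like this (the filename and model name are placeholders):

```
# Modelfile pointing at the same quantized GGUF
FROM ./llama-3-sqlcoder-8b.Q4_K_M.gguf
```

```bash
ollama create sqlcoder -f Modelfile
ollama run sqlcoder "List all customers who placed an order."
```

If I understand correctly, Ollama applies a prompt template from the Modelfile or the model metadata, so I wonder whether the difference is prompt formatting rather than the quantization itself, but I'm not sure.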

[Screenshot: image.png — llama-cpp-python output showing the random characters]
