Ollama and llama.cpp

#13
by KeilahElla - opened

First of all, congrats on this incredibly good model. I'm running it on the CPU of my laptop and can get captions at a rate of about 2-3 images per minute. The caption quality is comparable to LLaVA 1.6 running at 4-bit quantization with Ollama; if anything, moondream hallucinates a little less than LLaVA.

Would you be interested in sharing this model in the Ollama library? Ollama (and its backend, llama.cpp) now supports a Vulkan backend, which means I would be able to run this on my laptop's iGPU. With LLaVA 1.6, the speedup is more than 2x.

@vikhyatk , I see that moondream is now in the Ollama library: https://www.ollama.com/library/moondream

Do you know which version this is? I prefer to use it through Ollama because it's much faster than transformers, but I also want to stay on the latest version and not fall behind.
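For anyone else wondering, a rough way to check what the Ollama library is serving is to inspect the model locally with the Ollama CLI. This is just a sketch: it assumes the `ollama` CLI is installed and uses the `moondream` model name from the library link above; the output won't map directly to a Hugging Face revision, but it does show the parameter count, quantization, and base weights, which you can compare against the model card.

```shell
# Sketch: inspect the moondream build that Ollama serves.
# Assumes the `ollama` CLI is installed and the daemon is running;
# falls back to a message if it is not.
if command -v ollama >/dev/null 2>&1; then
  ollama pull moondream                 # fetch/refresh the library build
  info=$(ollama show moondream)         # architecture, parameters, quantization
  ollama show moondream --modelfile     # Modelfile, including the FROM line
else
  info="ollama CLI not found"
fi
echo "$info"
```

Comparing the reported parameter count and quantization against the latest release notes is about the best you can do until the library page states the upstream version explicitly.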
