vikhyatk/moondream2 · Ollama and llama.cpp

Apr 28, 2024

First of all, congrats on this incredibly good model. I'm running it on my laptop's CPU and get captions at a rate of about 2-3 images per minute. The caption quality is comparable to LLaVA 1.6 running at 4-bit quantization with Ollama; moondream perhaps hallucinates a little less than LLaVA.

Would you be interested in sharing this model in the Ollama library? Ollama (and its backend, llama.cpp) now supports a Vulkan backend, which means I will be able to run this on my laptop's iGPU. With LLaVA 1.6, the speedup is more than 2x.

KeilahElla

May 19, 2024

@vikhyatk, I see that moondream is now in the Ollama library: https://www.ollama.com/library/moondream

Do you know which version this is? I prefer to use it through Ollama because it's much faster than transformers, but I also want to use the latest version and not fall behind.
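For anyone who wants to try the Ollama route from a script, here is a minimal sketch of captioning one image through Ollama's local REST API. It assumes the default endpoint `http://localhost:11434/api/generate` and the library tag `moondream`; the helper names `build_caption_request` and `caption_image` and the prompt text are made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(image_bytes: bytes,
                          prompt: str = "Describe this image.",
                          model: str = "moondream") -> dict:
    """Build the JSON payload for a single, non-streaming caption request."""
    return {
        "model": model,  # assumption: the tag used in the Ollama library
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a stream
    }


def caption_image(path: str, url: str = OLLAMA_URL) -> str:
    """Send one image file to the local Ollama server and return the caption."""
    with open(path, "rb") as f:
        payload = build_caption_request(f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example usage (requires a running Ollama server and a pulled model):
# print(caption_image("photo.jpg"))
```

This only shows the request shape; it does not tell you which moondream2 revision the library tag corresponds to.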

vikhyatk

Owner May 25, 2024

> @vikhyatk, I see that moondream is now in the Ollama library: https://www.ollama.com/library/moondream
>
> Do you know which version this is? I prefer to use it through Ollama because it's much faster than transformers, but I also want to use the latest version and not fall behind.

It may actually be an older version; I’m not sure how it gets updated. I will try to find out.

KeilahElla

May 25

It looks like llama.cpp now supports moondream2: https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md

I just tried it out on my laptop. With the Vulkan backend, it is approximately 4x faster than PyTorch inference on the CPU.
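If you want to check a speedup like this on your own machine, a rough sketch is to time the same set of images on each backend and compare images per minute. The functions `images_per_minute` and `speedup` below are hypothetical helpers, and `caption_fn` stands in for whatever call runs your backend; nothing here is specific to llama.cpp or PyTorch.

```python
import time
from typing import Any, Callable, Iterable


def images_per_minute(caption_fn: Callable[[Any], Any],
                      images: Iterable[Any]) -> float:
    """Run caption_fn over every image and return throughput in images/minute."""
    start = time.perf_counter()
    count = 0
    for image in images:
        caption_fn(image)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very fast (or empty) runs.
    return 60.0 * count / elapsed if elapsed > 0 else float("inf")


def speedup(baseline_ipm: float, candidate_ipm: float) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate_ipm / baseline_ipm


# Example with made-up numbers: 2.5 images/min on one backend versus
# 10 images/min on another gives a ratio of 4.0.
# print(speedup(2.5, 10.0))
```

Use the same images and prompt for both runs, and discard the first call if it includes model loading time.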

KeilahElla changed discussion status to closed May 25