How to run inference using llama.cpp

Hello,

In the unquantized version of GME (https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct), we can see how to extract text, image, or fused embeddings from the model. Can you please specify how such an embedding can be extracted from the provided GGUF models? For example, I was trying something like this: ./build/bin/llama-cli -m ./models/gme-Qwen2-VL-2B-Instruct.i1-Q4_K_M.gguf -i ../frames_Episode1/video_0100/frame_1.png -o output_embeddings.json, but it does not look like I can provide an input path that way. I am a newbie to GGUF, so apologies.

Thanks
-Touqeer

You need a) the vision part (the mmproj file, in the static quant repo) and b) a front-end that supports it (e.g., llama-server).

Can you please share a sample instruction?

Hmm, looking at llama-server, I think all vision code has been removed. You could try llama-llava-cli, or simply some other front-end to llama.cpp, such as koboldcpp, which definitely supports mmproj files.
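I haven't tested it with this model, but the usual invocation pattern for the llava-style CLIs is something like this (paths are placeholders; the mmproj file is the one from the static quant repo):

./build/bin/llama-llava-cli -m ./models/gme-Qwen2-VL-2B-Instruct.Q4_K_M.gguf --mmproj ./models/gme-Qwen2-VL-2B-Instruct.mmproj-fp16.gguf --image ./frame_1.png -p "describe the image"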

I used the vision encoder available in the other directory of models that you shared: https://huggingface.co/mradermacher/gme-Qwen2-VL-2B-Instruct-GGUF/tree/main
I am trying to get the embedding by specifying both the quantized GME model and the vision encoder, but even for a plain text embedding it says --mmproj is an invalid argument:
'./build/bin/llama-cli -m ./models/gme-Qwen2-VL-2B-Instruct.Q4_K_M.gguf --mmproj ./models/gme-Qwen2-VL-2B-Instruct.mmproj-fp16.gguf --embedding -p "Quantum computing"'. A complete example would be great if you can provide one. Thanks!
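llama-cli does not accept --mmproj, which is why you get that error. For text-only embeddings, llama.cpp ships a dedicated llama-embedding tool; a minimal sketch, untested with this particular model:

./build/bin/llama-embedding -m ./models/gme-Qwen2-VL-2B-Instruct.Q4_K_M.gguf -p "Quantum computing"

Depending on your llama.cpp version you may also need to pass a pooling option (e.g., --pooling mean) to get a single vector per prompt.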

Interesting, this worked fine for me and gave a good description of the image: './build/bin/llama-qwen2vl-cli -m ./models/gme-Qwen2-VL-2B-Instruct.Q4_K_M.gguf --mmproj ./models/gme-Qwen2-VL-2B-Instruct.mmproj-fp16.gguf --image ../frames/video_0000/frame_1.jpg -p "describe the image in details"'
