MMProj?

by nigeln - opened Apr 24, 2024

Apr 24, 2024

Could you upload the mmproj file as well?
Also were you able to successfully run this model on llama.cpp? I've seen others have issues.

djward888

Owner Apr 25, 2024

I'm sorry, I don't know what an mmproj file is. Would you mind explaining?
Yes, it ran great! I haven't had any issues.

mbak

Apr 26, 2024

@djward888 Thanks for uploading this! I think people keep asking for the mmproj but then not explaining because the mmproj file isn't described anywhere.

I think it's a CLIP model that's needed for multimodal inference with llama.cpp.

The --mmproj command-line argument is needed for llava-cli, for older llama.cpp builds (before it was removed there), and probably for other tools built on top of llama.cpp.
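As a rough illustration of where the mmproj file fits, here is a minimal Python sketch that assembles a llava-cli command line. The `-m`, `--mmproj`, `--image` and `-p` flags are the ones llava-cli accepts; the helper name `build_llava_cmd` and the mmproj and image filenames are made up for the example, so substitute whatever files you actually have.

```python
import shlex

def build_llava_cmd(model_path, mmproj_path, image_path, prompt,
                    binary="./llava-cli"):
    """Return the argument list for a llava-cli invocation (hypothetical helper)."""
    return [
        binary,
        "-m", model_path,          # the language model GGUF
        "--mmproj", mmproj_path,   # the CLIP/projector GGUF ("mmproj" file)
        "--image", image_path,     # image to describe
        "-p", prompt,              # text prompt
    ]

cmd = build_llava_cmd(
    "llava-llama-3-8b-v1_1.Q8_0.gguf",
    "mmproj-f16.gguf",       # hypothetical filename, not from this repo
    "photo.jpg",             # hypothetical image path
    "Describe this image.",
)

# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Without the `--mmproj` argument the model only ever sees the text prompt, which would explain why text-only use works fine with the GGUF alone.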

Here's another comment about it.

FYI: to utilize multimodality you have to specify a compatible model (in this case llava 7b) and its belonging mmproj model. The mmproj has to be in f-16

How are you able to run this without an mmproj file?

I tried the file from here with your GGUF and am only getting nonsensical results: repetition and hallucinations, with almost nothing about the actual image.

djward888

Owner Apr 26, 2024

You're welcome!
Oh I see, that finally makes sense. Thanks so much for explaining! I haven't actually tried it with images, but I've been running it for normal coding in Jan AI. It works quite well for text generation, I just have no need for multimodality.
How did I run it? I just imported the GGUF into Jan. I'm really not an expert on why it does or doesn't work though.

mbak

Apr 26, 2024

Ah, because this is a llava model, I assumed you were also using it for multimodal inference.

I noticed this model is using token

128001 '<|end_of_text|>'

for its EOS token instead of

128009 '<|eot_id|>'

This is a common problem for a lot of llama 3 models because it was in the initial llama 3 release. For anyone else seeing this, it can be fixed with a script included in llama.cpp by running

./gguf-py/scripts/gguf-set-metadata.py llava-llama-3-8b-v1_1.Q8_0.gguf tokenizer.ggml.eos_token_id 128009

Now the repetition problem went away and I'm getting slightly better answers (although I don't see how EOS could have affected that), but there's still lots of hallucination. Maybe multimodal just isn't quite there yet, or I need images at a specific resolution or something.
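For anyone who wants to check which EOS token id a GGUF currently carries, before or after changing it, here is a minimal pure-Python sketch that walks the metadata section of the file. It assumes the little-endian GGUF v2/v3 layout (magic, version, tensor count, key/value count, then the key/value pairs); the function name `read_gguf_metadata_value` is my own, and this is not the reader that ships with llama.cpp.

```python
import struct

# GGUF scalar value type ids mapped to struct formats (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9

def _read(f, fmt):
    """Read one fixed-size value described by a struct format string."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]

def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")

def _read_value(f, vtype):
    """Read a metadata value of the given GGUF type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def read_gguf_metadata_value(path, wanted_key):
    """Return the value stored under wanted_key, or None if the key is absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("this sketch only handles GGUF v2 and later")
        _read(f, "<Q")              # tensor count, not needed here
        kv_count = _read(f, "<Q")   # number of metadata key/value pairs
        for _ in range(kv_count):
            key = _read_string(f)
            # Every value is parsed, including the large token list, so this
            # can take a little while on a real model file.
            value = _read_value(f, _read(f, "<I"))
            if key == wanted_key:
                return value
    return None
```

Calling `read_gguf_metadata_value("llava-llama-3-8b-v1_1.Q8_0.gguf", "tokenizer.ggml.eos_token_id")` should give 128001 on the file as uploaded and 128009 once the script above has been run.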
