Is vision/multimodal input supported in this GGUF build?

#8
by grubwithu - opened

I can run text inference successfully, but the image upload button in llama-server's web UI appears disabled. Does the PR #24523 implementation include vision encoder support, or only text for now?

No, there is no vision projector for this build. Look here:

https://huggingface.co/unsloth/MiniMax-M3-GGUF/tree/main

There is no mmproj file in there.

If you try to create a vision projector file with PR #24523, the conversion fails like this:

  • INFO:hf-to-gguf:Loading model: MiniMax-M3-uncensored-heretic-aggressive
  • INFO:hf-to-gguf:Model architecture: MiniMaxM3SparseForConditionalGeneration
  • ERROR:hf-to-gguf:Model MiniMaxM3SparseForConditionalGeneration is not supported
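If you want to check a repo listing for a projector file without browsing it by hand, here is a minimal sketch. It assumes the common convention that projector files carry `mmproj` in their file name; the helper `find_mmproj_files` and the example listing are hypothetical and only for illustration, not the real contents of the repo above. A real listing could be fetched with `huggingface_hub.list_repo_files`, which is left out here so the snippet runs offline.

```python
from typing import Iterable, List


def find_mmproj_files(filenames: Iterable[str]) -> List[str]:
    """Return the GGUF files whose base name contains 'mmproj' (case-insensitive).

    Assumption: projector files follow the usual 'mmproj' naming convention.
    """
    matches = []
    for name in filenames:
        # Only look at the file name itself, not the folder it sits in.
        base = name.rsplit("/", 1)[-1].lower()
        if "mmproj" in base and base.endswith(".gguf"):
            matches.append(name)
    return matches


# Hypothetical listing, for illustration only.
example_listing = [
    "README.md",
    "config.json",
    "Q4_K_M/model-Q4_K_M-00001-of-00003.gguf",
]

print(find_mmproj_files(example_listing))  # prints []
```

An empty result means there is nothing to pass as a projector, which matches the disabled image upload button in the web UI.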

Thanks.

You're welcome.