Thanks a lot!
I was looking for a llava-1.6 model that fits on my 3090 :)
No worries. Please note that these are very early, and performance should improve once image splitting is taken into account.
I have no issues with performance; it works very well on my 3090. However, I have another question.
How do you use it?
I was only able to run it with llava-cli by passing a prompt.
I was not able to use any kind of chat (I tried "-i").
And it doesn't work in the server (it says it has no multimodal support).
It may be because I was using the Windows binary (the official one, though).
I can compile and run on Linux, but that won't solve the question of how to chat in llava-cli mode. Is that possible?
I use it with ./server primarily. This starts a web server at localhost:8080.
The command is ./server -m <path_to_model> --mmproj <path_to_mmproj-model-f16.gguf>
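Once it is up, you can use the built-in web UI in the browser, or post requests to it directly. As far as I remember from the server README, the /completion endpoint accepts base64-encoded image data that you reference from the prompt with an [img-ID] tag; the exact payload may differ between versions, but roughly:
curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{
  "prompt": "USER:[img-10]Describe the image in detail.\nASSISTANT:",
  "image_data": [{"data": "<base64 of the image>", "id": 10}],
  "n_predict": 256
}'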
To run it in the terminal, use it like this:
./llava-cli -m <path_to_model> --mmproj <path_to_mmproj-model-f16.gguf> --image path/to/an/image.jpg
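You can also pass the question directly with -p; on the build I tried, something like this worked:
./llava-cli -m <path_to_model> --mmproj <path_to_mmproj-model-f16.gguf> --image path/to/an/image.jpg -p "Describe the image in detail." --temp 0.1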
Unfortunately, interactive mode for llava-cli is not possible yet. I miss this functionality as well.
There is an open issue in the llama.cpp project about this problem: "fail to run llava in interactive mode" #4393 ( https://github.com/ggerganov/llama.cpp/issues/4393 )
According to a llama.cpp contributor in the thread for that issue (update from 10/Dec/2023):
llava-cli does not have instruct mode built in!
It's a one time prompt
A one-time prompt works for me.
The problem is that if I want to ask about multiple things, I need to run everything again.
And if I want to use a different image, I also need to load the model again.
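The only stopgap I found for multiple questions is to script repeated one-shot calls, something like the sketch below (paths are placeholders, and every call reloads the model, so it is slow):
for q in "What is in the image?" "What colours dominate?" "Is there any text?"; do
  ./llava-cli -m <path_to_model> --mmproj <path_to_mmproj-model-f16.gguf> --image path/to/an/image.jpg -p "$q"
done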
I was able to have an interactive chat with the running server, but I needed to compile it myself (not use the official binary release for Windows).
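In case it helps: the CUDA build flag has changed names across llama.cpp versions, so treat this as a rough sketch, but it is more or less what I used.
On Linux:
make LLAMA_CUBLAS=1
On Windows (with Visual Studio and the CUDA toolkit installed):
cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release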