Add llama.cpp support

#19
by KeilahElla - opened

Hugging Face Transformers is too slow for inference. Please add llama.cpp support.
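For reference, a minimal sketch of what llama.cpp-based inference would look like once a GGUF conversion exists — the model filename and the helper function are hypothetical; the CLI flags (`-m`, `-p`, `-n`, `-ngl`) are standard llama.cpp options:

```python
import shutil
import subprocess

def build_llama_cli_cmd(model_path, prompt, n_predict=128, n_gpu_layers=99):
    # assemble a llama.cpp CLI invocation:
    #   -m   path to the GGUF model file
    #   -p   the prompt
    #   -n   number of tokens to generate
    #   -ngl layers to offload to the GPU (Metal on Apple Silicon)
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(n_predict),
        "-ngl", str(n_gpu_layers),
    ]

# "model-q4_k_m.gguf" is a placeholder name, not an existing conversion
cmd = build_llama_cli_cmd("model-q4_k_m.gguf", "Describe this image.", n_predict=64)

# only attempt to run if the llama.cpp binary is actually installed
if shutil.which("llama-cli"):
    subprocess.run(cmd, check=True)
```

On Apple Silicon, `-ngl 99` offloads all layers to Metal, which is where llama.cpp's speed advantage over Transformers-on-MPS typically comes from.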

On an Apple Silicon M2 Max (32 GB) under torch I can run most of the multimodal modes (slowly), but not, so far, direct audio output (it does save an output.wav successfully). This requires limiting the parameters and not attempting more than one mode at once.
Seems to be all about VRAM size, which is no surprise.
Text response is good-to-high quality (content, not speed).
Image interpretation is good; sometimes excellent.
Audio files, if short, are read and summarised well.
Video, once the frame size is reduced, produces reasonable analysis and description.
Audio output creates an intelligible output.wav file but so far has not worked in direct mode.
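The "all about VRAM size" point can be sanity-checked with a back-of-the-envelope weight footprint — the function below is my own illustration, not anything from the model card; it counts weights only and ignores activations and the KV cache, which add more on top:

```python
def est_weight_memory_gb(n_params_billion, bytes_per_param=2):
    # weight-only footprint: parameter count * bytes per parameter,
    # converted to GiB (fp16/bf16 = 2 bytes, int4 quantised ~= 0.5)
    return n_params_billion * 1e9 * bytes_per_param / 2**30

fp16 = est_weight_memory_gb(7)                     # 7B model in fp16
q4 = est_weight_memory_gb(7, bytes_per_param=0.5)  # same model, 4-bit
```

A 7B model needs roughly 13 GiB for weights alone in fp16, versus about 3.3 GiB at 4-bit — which is why quantised llama.cpp builds fit comfortably in 32 GB of unified memory while full-precision multimodal pipelines strain it.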
