Add llama.cpp support #19
opened by KeilahElla
Hugging Face Transformers is too slow for inference. Please add llama.cpp support.
On an Apple Silicon M2 Max (32 GB) under torch, I can run most of the multimodal modes (slowly), but not, so far, direct audio output (it does save an output.wav successfully). This requires limiting the generation parameters and not attempting more than one mode at once.
It seems to be all about VRAM size, which is no surprise.
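To put rough numbers on that, here is a back-of-the-envelope sketch of how much memory the weights alone take at different quantization levels, which is the main reason a llama.cpp/GGUF port would help on 32 GB of unified memory. The 7B parameter count and the bits-per-weight figures are illustrative assumptions, not measurements of this model:

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (ignores KV cache
    and activation memory, which add on top of this)."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical ~7B-parameter model, purely for illustration.
N_PARAMS = 7e9
for label, bits in [("fp16 (Transformers default)", 16),
                    ("q8_0 (llama.cpp)", 8),
                    ("q4_K_M (llama.cpp, ~4.5 bpw)", 4.5)]:
    print(f"{label:32s} ~{weight_footprint_gb(N_PARAMS, bits):5.1f} GB")
```

At roughly 4.5 bits per weight the same weights fit in under a third of the fp16 footprint, which is what makes multimodal inference on a 32 GB machine far less of a squeeze.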
- Text responses are good-to-high quality (content, not speed).
- Image interpretation is good, sometimes excellent.
- Audio files, if short, are read and summarised well.
- Video, once the frame size is reduced, produces reasonable analysis and description.
- Audio output creates an intelligible output.wav file, but direct mode has not worked so far.