Add llama.cpp support

#19
by KeilahElla - opened

Hugging Face Transformers is too slow for inference. Please add llama.cpp support.
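For reference, a minimal sketch of what llama.cpp-based inference would look like once a GGUF conversion exists — the model filename and the helper function are hypothetical; the CLI flags (`-m`, `-p`, `-n`, `-ngl`) are standard llama.cpp options:

```python
import shutil
import subprocess

def build_llama_cli_cmd(model_path, prompt, n_predict=128, n_gpu_layers=99):
    # assemble a llama.cpp CLI invocation:
    #   -m   path to the GGUF model file
    #   -p   the prompt
    #   -n   number of tokens to generate
    #   -ngl layers to offload to the GPU (Metal on Apple Silicon)
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(n_predict),
        "-ngl", str(n_gpu_layers),
    ]

# "model-q4_k_m.gguf" is a placeholder name, not an existing conversion
cmd = build_llama_cli_cmd("model-q4_k_m.gguf", "Describe this image.", n_predict=64)

# only attempt to run if the llama.cpp binary is actually installed
if shutil.which("llama-cli"):
    subprocess.run(cmd, check=True)
```

On Apple Silicon, `-ngl 99` offloads all layers to Metal, which is where llama.cpp's speed advantage over Transformers-on-MPS typically comes from.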

On an Apple Silicon M2 Max (32 GB) under torch I can run most of the multimodal modes (slowly), but not, so far, direct audio output (it does save an output.wav successfully). This requires limiting the parameters and not attempting more than one mode at once.
Seems to be all about VRAM size, which is no surprise.
Text response is good-to-high quality (content, not speed).
Image interpretation is good; sometimes excellent.
Audio files, if short, are read and summarised well.
Video, once the frame size is reduced, produces reasonable analysis and description.
Audio output creates an intelligible output.wav file but so far has not worked in direct mode.
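The "all about VRAM size" point can be sanity-checked with a back-of-the-envelope weight footprint — the function below is my own illustration, not anything from the model card; it counts weights only and ignores activations and the KV cache, which add more on top:

```python
def est_weight_memory_gb(n_params_billion, bytes_per_param=2):
    # weight-only footprint: parameter count * bytes per parameter,
    # converted to GiB (fp16/bf16 = 2 bytes, int4 quantised ~= 0.5)
    return n_params_billion * 1e9 * bytes_per_param / 2**30

fp16 = est_weight_memory_gb(7)                     # 7B model in fp16
q4 = est_weight_memory_gb(7, bytes_per_param=0.5)  # same model, 4-bit
```

A 7B model needs roughly 13 GiB for weights alone in fp16, versus about 3.3 GiB at 4-bit — which is why quantised llama.cpp builds fit comfortably in 32 GB of unified memory while full-precision multimodal pipelines strain it.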
