Can you make ARM optimized quants too?

by mitsu89 - opened 5 days ago

5 days ago

Like Q4_0_4_4 GGUF. 7B models can run acceptably even on a good mid-range phone under KoboldAI in Termux, but the normal quants are slow on ARM.

Nurb4000

5 days ago

I suspect he means using the TPU that is embedded in most current ARM SoCs. Hardly anyone supports that yet.

That is something I'd love to see too. I've messed with it some, with limited success, but when it worked it was a great thing.

mitsu89

4 days ago

edited 4 days ago

You mean Google Tensor? Not really. I just want something like this that runs fast enough on my Poco X6 Pro under KoboldAI in Termux:

https://huggingface.co/SicariusSicariiStuff/EVA-UNIT-01_EVA-Qwen2.5-7B-v0.0_ARM

I don't know the technical details.

Nurb4000

4 days ago

No, I do not mean Google. Different companies call them different things (love standards), but modern ARM SoCs have on-board processing for AI. Some call it a TPU, some an NPU, and there are a few other names. Using that is the only way to get decent speed out of an ARM machine (unless you go Apple with a GPU). Unfortunately that part of the industry is still in disarray, so it's not 'plug and play' yet, the way NVIDIA is on x86. It's doable, but it's not easy. With luck that stabilizes and something like llama.cpp can include it in its driver set.

(And actually, the IoT-level TPU from Google isn't going to work either. It has its place in the AI world, but LLMs are not it.)

RichardErkhov

Owner 4 days ago

I have now joined forces with team mradermacher, so all GGUFs are going to be computed with them on their account. All the other quants, like BNB or AWQ, go on my account. So they might have this model in an ARM variant. Maybe I will resume doing GGUFs on my account later, but definitely not right now.

Nurb4000

4 days ago

I assume -> https://huggingface.co/mradermacher

RichardErkhov

Owner 4 days ago

You assume correctly.
