Run on LLamaCPP

#1
by erlonted - opened

How can I run this model in llama.cpp? I am using the latest version from GitHub, but it isn't working.

Works for me™
Alternatively you can try to build this branch: https://github.com/fairydreaming/llama.cpp/tree/dsv4

What error are you getting?

It also works for me

Thank you! It works for me in the latest version of llama.cpp. Unfortunately, it is not usable for agentic tasks, because the memory footprint of the context window is very large.

For example, with Qwen3.5 397b Q6 (or Nex N2 Pro) I can fit a context window of 262,000 tokens on my local machine with -ub 14000 -b 14000, which gives me pp=400 t/s, tg=11.5 t/s.
With this implementation in llama.cpp I can only fit ctx=92000 with -ub 128 -b 512, which gives me only pp=23 t/s, tg=14 t/s.
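
To put those pp numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the prompt-processing rate stays constant across the whole context, which is a simplification (in practice it usually drops as the context fills up). The function name `prefill_seconds` is made up for illustration and is not part of llama.cpp.

```python
def prefill_seconds(n_tokens: int, pp_tokens_per_s: float) -> float:
    """Seconds needed to process a prompt of n_tokens at a fixed pp rate.

    Assumption: the rate is constant over the whole prompt.
    """
    return n_tokens / pp_tokens_per_s


# Figures taken from the post above.
other_model = prefill_seconds(262_000, 400.0)  # ctx=262000 at pp=400 t/s
this_model = prefill_seconds(92_000, 23.0)     # ctx=92000 at pp=23 t/s

print(f"other model, full context: {other_model:.0f} s ({other_model / 60:.1f} min)")
print(f"this model, full context:  {this_model:.0f} s ({this_model / 60:.1f} min)")
# other model, full context: 655 s (10.9 min)
# this model, full context:  4000 s (66.7 min)
```

So under this assumption, filling the smaller 92k context here would take roughly six times longer than filling the 262k context on the other model, which is why it feels unusable for agentic work.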

It crashed on my Mac Studio with Metal. I got a kernel panic.
