Getting MTP working?

#7
by Ewere - opened

Has anyone been able to get MTP working with llama.cpp?

Yes, with: --spec-type draft-mtp --spec-draft-n-max 2
I can't really judge the performance, though: with CPU offload it is no faster than a regular pass without MTP.
Tested with an IQ3 quant.

MTP works, but on Apple silicon performance gets worse with it enabled. Can't speak for other hardware.

For me, on a workstation with 512 GB RAM and 40 GB VRAM, it is slower with MTP than without it.
