How to use an assistant model for MTP in llama.cpp?

#29
by Regrin - opened

How do I use an assistant model for MTP in llama.cpp?

My example uses the 31B model, but the same flags should work for every model in the Gemma 4 family:

      --model gemma-4-31B-it-qat-UD-Q4_K_XL.gguf
      --model-draft gemma-4-31b-it-qat-q4_0-assistant.gguf
      --spec-type draft-mtp
      --spec-draft-n-max 4

Tune the --spec-draft-* options (such as --spec-draft-n-max) to your liking.
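
If you want to try several draft lengths, here is a minimal Python sketch that assembles the command line from the flags above. The helper name `build_mtp_command` is hypothetical, and the binary name `llama-server` is an assumption (the flags above don't say which llama.cpp executable they are passed to), so swap in whichever one you actually run.

```python
import shlex


def build_mtp_command(model, draft_model, draft_n_max=4, binary="llama-server"):
    """Return the argv list for a llama.cpp run with an MTP assistant model.

    `binary` is an assumption; the flags themselves are copied from the post above.
    """
    if draft_n_max < 1:
        raise ValueError("draft_n_max must be at least 1")
    return [
        binary,
        "--model", model,
        "--model-draft", draft_model,
        "--spec-type", "draft-mtp",
        "--spec-draft-n-max", str(draft_n_max),
    ]


cmd = build_mtp_command(
    "gemma-4-31B-it-qat-UD-Q4_K_XL.gguf",
    "gemma-4-31b-it-qat-q4_0-assistant.gguf",
)
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Changing `draft_n_max` and re-running lets you compare generation speed for different draft lengths without retyping the whole command.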

The GGUFs should be available on Hugging Face; people have already converted them.

Thanks!
