How to use an assistant model for MTP in llama.cpp?

#29

by Regrin - opened 9 days ago

Discussion

Regrin

9 days ago

How to use an assistant model for MTP in llama.cpp?

lexrivera

8 days ago

•

edited 8 days ago

My example is from the 31B model, but it should be the same for every model in the Gemma 4 family:

      --model gemma-4-31B-it-qat-UD-Q4_K_XL.gguf
      --model-draft gemma-4-31b-it-qat-q4_0-assistant.gguf
      --spec-type draft-mtp
      --spec-draft-n-max 4

Tune the --spec-draft-* options to your liking.

GGUFs should be available on Hugging Face; people have already converted them.
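
For reference, the flags above can be combined into a single command line. This is only a sketch under a few assumptions: the post does not say which binary is used, so `llama-server` is a guess; both GGUF files are assumed to sit in the current directory; and the flags are copied exactly as written above, so check them against `--help` for your build. The script only prints the command (a dry run) and does not start anything.

```shell
#!/bin/sh
# Hypothetical locations: point these at wherever your GGUF files live.
MODEL="gemma-4-31B-it-qat-UD-Q4_K_XL.gguf"
DRAFT="gemma-4-31b-it-qat-q4_0-assistant.gguf"

# Value taken from the post above; tune it to your liking.
DRAFT_N_MAX=4

# Assumption: llama-server is the binary. The flags are copied from the post.
CMD="llama-server --model $MODEL --model-draft $DRAFT --spec-type draft-mtp --spec-draft-n-max $DRAFT_N_MAX"

# Dry run: print the assembled command so it can be checked before launching.
echo "$CMD"

# To launch for real, uncomment the next line (paths must not contain spaces):
# exec $CMD
```

Keeping `DRAFT_N_MAX` in a variable makes it easy to try a few values and compare generation speed.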

Regrin

8 days ago

Thanks!
