Special tokens don't match between draft model and target model
Hi, thanks for making this model quant! I'd love to use it and compare it to the non-fine-tuned Qwen2.5-Coder-0.5B, but unfortunately llama.cpp won't accept it as a matching draft model:
common_speculative_are_compatible: draft model special tokens must match target model to use speculation
common_speculative_are_compatible: tgt: bos = 151643 (0), eos = 151645 (0)
common_speculative_are_compatible: dft: bos = 11 (0), eos = 151645 (0)
srv load_model: the draft model './models/Qwen2.5-Coder-0.5B-QwQ-draft.Q8_0.gguf' is not compatible with the target model './models/QwQ-32B-Preview-Q4_K_M.gguf'
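From the log, the check boils down to comparing the special-token ids of the two models (the number in parentheses appears to be the add-token flag). A rough Python sketch of just that condition — the function name and dict shape are mine, not llama.cpp's:

```python
def special_tokens_match(tgt, dft):
    """tgt/dft are dicts like {"bos": (id, add_flag), "eos": (id, add_flag)}."""
    return tgt["bos"] == dft["bos"] and tgt["eos"] == dft["eos"]

# Values taken from the log above:
tgt = {"bos": (151643, 0), "eos": (151645, 0)}
dft = {"bos": (11, 0),     "eos": (151645, 0)}
print(special_tokens_match(tgt, dft))  # False -> "not compatible"
```

So the eos ids agree; only the draft model's bos id (11) differs from the target's (151643), and that alone is enough for llama.cpp to refuse speculation.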
I'm using QwQ quantized to Q4_K_M from this HF repo: bartowski/QwQ-32B-Preview-GGUF
And I'm using your Q8_0 quant.
I have asked the original model creator about it and this is their response:
Qwen models don't have any `bos_token`; probably the quantizer/GGUF converter is setting something to `bos_token`. In the above, 151643 should be `<|endoftext|>` and 11 is `,`. Probably you should regenerate the GGUF and set `<|endoftext|>` as the bos token.
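Regenerating isn't strictly required, by the way: GGUF metadata is a flat key/value section at the start of the file, so the id can in principle be patched in place (llama.cpp's gguf-py scripts can do this too, if I remember right). A minimal stdlib-only sketch of the layout and the patch, demonstrated on a synthetic two-entry metadata blob — all helper names are mine, and this glosses over details a real tool should handle:

```python
import struct

# Byte widths of the fixed-size GGUF value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _skip_value(buf, off, vtype):
    """Advance past one metadata value, returning the new offset."""
    if vtype in _FIXED:
        return off + _FIXED[vtype]
    if vtype == 8:  # string: uint64 length + UTF-8 bytes
        (n,) = struct.unpack_from("<Q", buf, off)
        return off + 8 + n
    if vtype == 9:  # array: uint32 element type, uint64 count, then elements
        etype, n = struct.unpack_from("<IQ", buf, off)
        off += 12
        for _ in range(n):
            off = _skip_value(buf, off, etype)
        return off
    raise ValueError(f"unhandled GGUF value type {vtype}")

def find_metadata(buf, wanted):
    """Return (value_offset, value_type) for metadata key `wanted`, else None."""
    assert buf[:4] == b"GGUF", "not a GGUF file"
    _version, _n_tensors, n_kv = struct.unpack_from("<IQQ", buf, 4)
    off = 24  # magic (4) + version (4) + tensor count (8) + kv count (8)
    for _ in range(n_kv):
        (klen,) = struct.unpack_from("<Q", buf, off)
        key = bytes(buf[off + 8 : off + 8 + klen]).decode("utf-8")
        off += 8 + klen
        (vtype,) = struct.unpack_from("<I", buf, off)
        off += 4
        if key == wanted:
            return off, vtype
        off = _skip_value(buf, off, vtype)
    return None

# Synthetic metadata section: a string entry, then the wrong bos id.
def _kv_u32(key, val):
    k = key.encode()
    return struct.pack("<Q", len(k)) + k + struct.pack("<II", 4, val)

blob = bytearray(
    b"GGUF" + struct.pack("<IQQ", 3, 0, 2)         # version 3, 0 tensors, 2 KVs
    + struct.pack("<Q", 12) + b"general.name"
    + struct.pack("<IQ", 8, 4) + b"qwen"           # string value "qwen"
    + _kv_u32("tokenizer.ggml.bos_token_id", 11)   # the wrong bos id
)

off, vtype = find_metadata(blob, "tokenizer.ggml.bos_token_id")
assert vtype == 4                                  # GGUF uint32
struct.pack_into("<I", blob, off, 151643)          # patch in <|endoftext|>'s id
print(struct.unpack_from("<I", blob, off)[0])      # 151643
```

On a real quant you'd read the file into a buffer (or mmap it), patch, and write back; the point is just that only four bytes of metadata need to change.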
Is this something you can change in your quant? Do you also see this issue on your environment? I'm mostly running llama.cpp that has been compiled on the same day, so I should be up to date.
Hi @Nindaleth
Thanks for testing this model. I've never used llama.cpp in the way you are using it, so I can't speak for that. But this is how the bos token looks in the quant:
I don't see any bos token being added to the quant. It uses the model's config, so if the bos is null, it should stay null. That said, can you use --bos-token-id to set the correct id to match the target model?
That parameter isn't available in llama.cpp AFAIK, but another one is, and it did the trick:
--override-kv tokenizer.ggml.bos_token_id=int:151643
Previously this KV was present only on the target model, which caused the loading issue; with this override I can now load the draft model as well, cool!
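For anyone else hitting this, the override slots into the server invocation roughly like this (a sketch based on the paths from this thread; adjust to your setup):

```shell
llama-server \
  -m  ./models/QwQ-32B-Preview-Q4_K_M.gguf \
  -md ./models/Qwen2.5-Coder-0.5B-QwQ-draft.Q8_0.gguf \
  --override-kv tokenizer.ggml.bos_token_id=int:151643
```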