No MTP layers?

#3
by soyalemujica - opened

I'm trying to quantize this into a GGUF model, but I think it doesn't have MTP layers? I'm using the standard convert_hf_to_gguf.py script from llama.cpp with no special arguments, just the usual flow: generate the GGUF in F16, then quantize to Q6_K and Q5_K_M. However, trying to pass MTP does not work.
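
One way to check whether a checkpoint ships MTP weights before converting is to look at the tensor names in its safetensors index. This is a minimal sketch, assuming the repo has a sharded `model.safetensors.index.json` and that MTP tensors carry `mtp` somewhere in their names (a naming assumption, not confirmed for this model). `find_mtp_tensors` is a hypothetical helper, not part of llama.cpp or safetensors.

```python
import json
from pathlib import Path


def find_mtp_tensors(index_path, marker="mtp"):
    """Return sorted tensor names from a safetensors index that contain `marker`.

    Assumption: MTP weights include `marker` in their names; adjust it
    if your model uses a different naming scheme.
    """
    index = json.loads(Path(index_path).read_text())
    # The index file maps each tensor name to the shard file that stores it.
    weight_map = index.get("weight_map", {})
    return sorted(name for name in weight_map if marker in name.lower())
```

Calling `find_mtp_tensors("model.safetensors.index.json")` and getting an empty list would suggest, under that naming assumption, that the checkpoint has no MTP tensors for the converter to pick up.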

https://huggingface.co/pearsonkyle/tmax-27b-imatrix-MTP-GGUF

The quantization there is too low for my needs. I need Q5_K_M or Q6_K; Q4_K_M always gets stuck in thinking loops.

I can make you one with the MTP and upload it once it's done, in about an hour. Also, I think you'd be surprised by the imatrix quants (probably the IQ4_XS) in my repo. They are specifically calibrated on usage logs from Claude Code, OpenCode, and Qwen Code, with a setting enabled to help parse special tokens such as the ones at the beginning of chat templates for tool calls. Each quant was used to solve 10 different repo issues, SWE-bench style, but using the nebius/rebench dataset. None appear to get stuck in loops or reach the max number of turns. That said, sampling with a repetition penalty > 1 can sometimes help keep these Qwen models from looping.
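
For anyone wondering what a repetition penalty > 1 does to sampling, here is a minimal sketch of the common formulation: logits of tokens that already appeared are divided by the penalty when positive and multiplied by it when negative, so they become less likely either way. `apply_repetition_penalty` is a hypothetical helper written for illustration; the exact behavior of a given runtime may differ (for example, by only looking at a recent window of tokens).

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.1):
    """Return a copy of `logits` with already-seen tokens made less likely.

    logits: list of floats, one score per vocabulary entry.
    previous_token_ids: token ids generated so far.
    penalty: values > 1 discourage repeats; 1.0 leaves logits unchanged.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    penalized = list(logits)
    for token_id in set(previous_token_ids):
        score = penalized[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized
```

With `penalty=1.0` the logits come back unchanged, which is why only values above 1 help against looping.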
