Request for additional REAP/Quant

#1
by TheOlRazzleDazzle - opened

Hi @pipenetwork, thanks for the rapid creation of GLM-5.2 variants!

I am targeting a hardware limit of 256 GB of memory on MLX. The REAP50 variant fits within that, but its perplexity divergence is quite discouraging. I haven't run the exact numbers, but I suspect there would be sufficient headroom for the model plus a 256k+ KV cache if you created a REAP40 variant with MXFP4 quantisation (provided REAP37 doesn't already fit).
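For reference, this is roughly how I'd sanity-check the headroom. It is only a back-of-the-envelope sketch: the parameter count, layer count, KV head count and head dimension below are hypothetical placeholders, not actual GLM-5.2 or REAP figures, and the KV formula assumes a standard grouped-query attention cache stored in 16-bit precision, which may not match the model's real attention layout. The one firm number is the MXFP4 cost: 4-bit elements plus one 8-bit shared scale per 32-element block, so about 4.25 bits per weight.

```python
GIB = 2**30


def mxfp4_weight_bytes(n_params: float, block_size: int = 32) -> float:
    """Approximate weight storage for MXFP4: 4-bit elements plus one
    8-bit shared scale per block of `block_size` elements."""
    bits_per_weight = 4 + 8 / block_size  # 4.25 bits for block_size=32
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for a plain GQA layout (assumption): one K and one V
    tensor per layer, each of shape [seq_len, n_kv_heads, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def fits(budget_gib: float, n_params: float, **kv_shape) -> bool:
    """True if weights plus KV cache stay within the memory budget."""
    total = mxfp4_weight_bytes(n_params) + kv_cache_bytes(**kv_shape)
    return total <= budget_gib * GIB


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    shape = dict(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=256 * 1024)
    n_params = 400e9  # hypothetical post-REAP parameter count
    print("weights GiB:", mxfp4_weight_bytes(n_params) / GIB)
    print("KV cache GiB:", kv_cache_bytes(**shape) / GIB)
    print("fits in 256 GiB:", fits(256, n_params, **shape))
```

This ignores activations, framework overhead and whatever share of unified memory the OS keeps for itself, so the real ceiling will be somewhat lower than the raw budget.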

Let me know if this would be feasible, or if you have other ideas for optimising the trade-off between model size and cohesiveness :)
