QAT as first-class citizen

#10
by ioannisnousias - opened

This is great work, and congrats on your first release!

Have you considered making quantization a first-class citizen, with quantization-aware training (QAT) as part of the GRPO loop, going all the way down to ternary weights? IMHO, this is where it's at for local AI.

A 122B QAT ternary MoE will likely outperform a 35B Q8 MoE, while occupying less RAM.
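
For the RAM half of that claim, here is a rough back-of-the-envelope sketch. It counts weights only and assumes flat bit widths: log2(3) ≈ 1.58 bits per weight as the ideal ternary packing, 2 bits per weight as a simpler practical packing, and exactly 8 bits per weight for Q8. Real formats add per-block scales and often keep some layers in higher precision, and the KV cache and activations come on top, so treat the numbers as estimates. The helper name `weight_gb` is made up for this example.

```python
import math

def weight_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes).

    Assumes every parameter is stored at the same bit width and ignores
    quantization scales, higher-precision layers, KV cache, and activations.
    """
    return params_billions * bits_per_weight / 8

# 122B ternary model at the information-theoretic minimum for 3 states.
ternary_ideal = weight_gb(122, math.log2(3))
# Same model with a simpler 2-bits-per-weight packing (assumed, not a specific format).
ternary_2bit = weight_gb(122, 2.0)
# 35B model at a flat 8 bits per weight.
q8 = weight_gb(35, 8.0)

print(f"122B ternary (ideal): {ternary_ideal:.1f} GB")  # 24.2 GB
print(f"122B ternary (2-bit): {ternary_2bit:.1f} GB")   # 30.5 GB
print(f"35B Q8:               {q8:.1f} GB")             # 35.0 GB
```

Under these assumptions the 122B ternary weights fit in less memory than the 35B Q8 weights even with the looser 2-bit packing. This says nothing about the quality half of the claim, which would need to be measured.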
