This is a very cool idea!

#3
by ddh0 - opened

Finetuning only the output tensor is very clever!! I wonder why this isn't already a commonly used technique; it makes so much sense. We don't need to change what the model thinks, only how it translates its thoughts into speech. Cheap, effective, and much less room for error.
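Concretely, training only the output tensor means freezing every weight except the final matrix that maps hidden states to vocabulary logits, then running gradient descent on that matrix alone. The sketch below illustrates the idea in plain Python under heavy simplifying assumptions. `BODY` is a hypothetical fixed lookup table standing in for the frozen transformer, the vocabulary has only four tokens, and the training pairs are invented. This is not the recipe used for this model, since the thread doesn't describe one.

```python
import math

# Hypothetical stand-in for the frozen model body: token id -> hidden vector.
# In a real LLM this would be the whole transformer stack; here it is a fixed table.
BODY = [
    [1.0, 0.2, -0.3],
    [-0.4, 1.0, 0.1],
    [0.2, -0.5, 1.0],
    [-0.6, -0.6, -0.6],
]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(head, hidden):
    # Output tensor: one row per vocabulary entry, dotted with the hidden vector.
    return [sum(w * h for w, h in zip(row, hidden)) for row in head]


def mean_loss(body, head, pairs):
    # Average next-token cross-entropy over (input token, target token) pairs.
    loss = 0.0
    for x, y in pairs:
        probs = softmax(head_logits(head, body[x]))
        loss -= math.log(probs[y])
    return loss / len(pairs)


def train_output_head(body, head, pairs, lr=0.1, epochs=300):
    """SGD on the output head only; `body` is read but never written."""
    for _ in range(epochs):
        for x, y in pairs:
            hidden = body[x]
            probs = softmax(head_logits(head, hidden))
            for v, row in enumerate(head):
                # d(cross-entropy)/d(logit_v) = p_v - 1[v == y]
                grad_logit = probs[v] - (1.0 if v == y else 0.0)
                for j in range(len(row)):
                    row[j] -= lr * grad_logit * hidden[j]
    return head


# Toy task: made-up "next token" targets for a 4-token vocabulary.
pairs = [(0, 1), (1, 2), (2, 3), (3, 0)]
body_snapshot = [row[:] for row in BODY]
head = [[0.0] * 3 for _ in range(4)]  # zero-initialised output head

before = mean_loss(BODY, head, pairs)
train_output_head(BODY, head, pairs)
after = mean_loss(BODY, head, pairs)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Because `BODY` is never written, what the model "thinks" stays fixed and only the mapping from hidden states to token probabilities moves, which matches the intuition above. In PyTorch the same idea is usually expressed by setting `requires_grad = False` on every parameter except the output projection. The attribute name of that layer depends on the model implementation.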

I haven't even tried this model yet (I'm quantizing it now); I'm just excitedly nerding out about this. Do you know if there is any prior research or literature on this technique?

Thanks :)

I didn't specifically look for any papers, though one could argue I was drawing on my 2023 self from around the time when I built MythoMax!

I noticed that Radermacher quants were made available a while ago. Just be careful with the IQ versions; Gemma 4 hates those for some reason.
