10b size

#1
by WesPro - opened

I think your model "grew" because it was made back when mergekit still had a bug that caused Gemma2-9b merges to end up bloated through no fault of the user. If you're interested in debloating it, you could re-run the merge .yml, but this time after deleting your old mergekit install and building a new, updated one from scratch, to rule out the possibility that some stale mergekit file is still causing problems. You could also follow the approach grimjim shared and described here: https://huggingface.co/posts/grimjim/968917199366229, which avoids having to redo the whole merge from the beginning.

Yes, I realized that older versions of transformers and mergekit would add lm_head weights to models even if they originally didn't have a separate lm_head (like Gemma 2 and Llama 3.2 3B/1B). I actually already have a non-bloated version of the model - I had kept it private before. It's the same model, but I removed the lm_head after loading it and then reuploaded/pushed it to the Hub: https://huggingface.co/Hastagaras/sunmoy-no-lmhead-test
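
For anyone who wants to do the same without redoing the merge, here's a rough sketch of the stripping step (untested; the paths and the single-file layout are assumptions, and a sharded checkpoint would also need its model.safetensors.index.json entries cleaned up):

```python
# Rough sketch: drop the redundant lm_head.weight from a merged Gemma 2 checkpoint.
# Assumes a single-file safetensors checkpoint; adjust the path for your own model.
from safetensors.torch import load_file, save_file

path = "my-merge/model.safetensors"  # hypothetical local path to the bloated merge

tensors = load_file(path)
# Gemma 2 ties lm_head to embed_tokens, so a separate lm_head.weight is pure bloat.
removed = tensors.pop("lm_head.weight", None)
print("removed lm_head.weight:", removed is not None)

save_file(tensors, path, metadata={"format": "pt"})
```

After that you can push the folder back to the Hub as usual (e.g. with huggingface_hub's upload_folder).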
