model has duplicated tensor layers
#1, opened by piotr25691
Gemma 2 is ordinarily supposed to have 2.67B parameters. However, due to a defect in mergekit, some tensor layers have been duplicated, creating a 3.2B-parameter model instead.
This is weird.
Thanks for pointing this out. It seems to be a bug in the way mergekit handles the Gemma architecture; I've seen this on a few Gemma-based merges. I manually removed the extra parameters and updated the config files to match, so the model is now down to the expected size and runs normally!
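For anyone who wants to check their own merge for the same problem, here is a rough sketch of how duplicated tensors could be spotted. It is not the procedure used for the fix above, and it assumes the duplicates are byte-identical copies stored under different names, which may not hold for every affected merge. The function names and the `(shape, raw_bytes)` input format are made up for illustration; loading the tensors from the checkpoint files is left out.

```python
import hashlib
from collections import defaultdict
from math import prod


def find_duplicate_tensors(tensors):
    """Group tensor names whose shape and raw bytes are identical.

    `tensors` maps a tensor name to a (shape, raw_bytes) pair.
    This input format is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for name, (shape, raw) in tensors.items():
        # Hash the contents so large tensors are compared cheaply.
        key = (tuple(shape), hashlib.sha256(raw).hexdigest())
        groups[key].append(name)
    # Keep only groups with more than one name, i.e. repeated contents.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def count_parameters(tensors):
    """Total number of elements across all tensors."""
    return sum(prod(shape) for shape, _ in tensors.values())


def count_redundant_parameters(tensors):
    """Elements saved by keeping a single copy from each duplicate group."""
    redundant = 0
    for names in find_duplicate_tensors(tensors):
        shape, _ = tensors[names[0]]
        redundant += (len(names) - 1) * prod(shape)
    return redundant
```

Comparing `count_parameters` against the expected size, and checking whether `count_redundant_parameters` accounts for the gap, gives a quick sanity check. Note that identical contents are not always a bug: some architectures intentionally share weights between tensors, so each reported group should be reviewed before anything is removed.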