Can't merge this model with other 7B's

#2
by son-of-man - opened

I think this model shows a lot of potential for merging but I can't seem to get it to work.
I can run the model on koboldcpp without problems, but when I try to merge it with other 7B models through mergekit, it fails with the following error: `RuntimeError: Tensor lm_head.weight required but not present in model Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B`
Could this be a fixable issue in the configuration, or would it require a full retraining for it to be compatible with other models?
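For reference, the missing tensor can be confirmed without downloading any weights by reading the safetensors index (a quick sketch, assuming the repo ships sharded safetensors with a `model.safetensors.index.json`; adjust the filename if it uses a single `model.safetensors` instead):

```python
import json
from huggingface_hub import hf_hub_download

# The index file maps every tensor name to its shard, so we can check
# for lm_head.weight without pulling the full 7B of weights.
index_path = hf_hub_download(
    "Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B",
    "model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

print("lm_head.weight present:", "lm_head.weight" in weight_map)
print("total tensors:", len(weight_map))
```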

Owner

Great catch! I never noticed an issue, but then again I haven't fully ventured into the realm of merging yet. After inspecting the blocks and layers, it looks like the model is indeed missing its lm_head. Strangely enough, it still works haha

I originally trained this on Unsloth using a Laser-QLoRA approach, so the lm_head must have gotten dropped during the fuse or something. So strange that it seemed to function for the most part. I'll need to do a retraining to fix the issue, but it's definitely worth it so that the model can be merged properly. Would love to see what results come from it!
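In the meantime, if anyone wants to experiment before the retrain lands, one possible stopgap is to rebuild the missing head from the input embeddings and save a patched copy (an untested sketch; Mistral-7B doesn't tie its embeddings by default, so this is only an approximation of the trained head, not a substitute for the retrain):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Severian/Mistral-v0.2-Nexus-Internal-Knowledge-Map-7B"

# With lm_head.weight absent from the checkpoint, transformers will warn
# and leave lm_head randomly initialized when loading.
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Copy the input embedding matrix into the head (a tied-embedding
# approximation) so mergekit finds the tensor it expects.
with torch.no_grad():
    model.lm_head.weight.copy_(model.model.embed_tokens.weight)

model.save_pretrained("nexus-7b-patched")
AutoTokenizer.from_pretrained(repo).save_pretrained("nexus-7b-patched")
```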

I'll get started on the retraining this weekend and have a new, more robust version available. I'll ping you once it's up! Thanks for reaching out

Wow, that really is strange indeed. I look forward to trying out the full version hahah

son-of-man changed discussion status to closed
