Can you increase LLAMA3 8b simply by duplicating some layers?

by Regrin - opened

Tell me, can you increase LLAMA3 8b simply by duplicating some layers?
Would this be of any use? I'd like a model at around 13b that would be relatively easy to train on the one hand and reasonably smart on the other. I hope such a transformation could preserve the model's performance while improving its prospects for further training.

Hey, unfortunately this self-merging technique performs poorly with small models like 8B. It has proven successful when combined with continued pre-training, as in SOLAR. You'd probably need to retrain the model if you want to reach that size without a massive loss of performance.
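For reference, here is a minimal sketch of what such a layer-duplication "self-merge" (depth up-scaling) could look like with plain transformers, assuming the standard LlamaForCausalLM layout where `model.model.layers` is an `nn.ModuleList` of decoder blocks. The checkpoint name, the SOLAR-style layer ranges, and the output path are placeholders for illustration, not values from this thread, and as noted above the result would still need continued pre-training to recover quality.

```python
# Sketch of a layer-duplication "self-merge" (depth up-scaling), assuming the
# Hugging Face transformers LlamaForCausalLM layout, where model.model.layers
# is an nn.ModuleList of decoder blocks. Checkpoint name, layer ranges, and
# output path are placeholders, not values from this discussion.
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B"   # assumed base checkpoint
OUTPUT_DIR = "llama-3-8b-upscaled"        # assumed output path

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

layers = model.model.layers  # 32 decoder blocks in the 8B model

# SOLAR-style overlapping split: keep blocks 0-23, then append copies of
# blocks 8-31, giving 48 blocks (~11-12B parameters). The exact ranges are
# a free choice, not something prescribed in the thread above.
first_part = [layers[i] for i in range(0, 24)]
second_part = [copy.deepcopy(layers[i]) for i in range(8, 32)]

model.model.layers = torch.nn.ModuleList(first_part + second_part)
model.config.num_hidden_layers = len(model.model.layers)

# Recent transformers versions index the KV cache by each attention module's
# layer_idx, so renumber the duplicated blocks to keep generation consistent.
for idx, layer in enumerate(model.model.layers):
    if hasattr(layer.self_attn, "layer_idx"):
        layer.self_attn.layer_idx = idx

model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
```

The overlapping ranges mimic SOLAR's depth up-scaling, where the middle layers appear twice; without further training, though, the duplicated model typically scores worse than the original 8B.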

And if you do this, will the model lose performance at all? If not, that's great! Then it would be possible to train it on GPT-4-generated datasets, and the result would be much better than with the 8b model.

Am I right that the 8b models have reached the limit of their capabilities?

I don't think they have. There's still a lot of performance you can squeeze out of 8B models.
