8b version?

#2
by ElvisM

Looks like an interesting model, but I'd really like an 8b version of it. 70b is too big for me.

Hey ElvisM, I hear you on that! A lot of folks, myself included, were wondering if an 8B version was in the works. tdrussel, who made the original model, had this to say on the model card:

Why no 8B?
I tried multiple times to train this on Llama 3 8B Instruct, using a variety of hyperparameters. It never worked well. The model took a huge hit to intelligence every time, to the point of being unusable. 70B fared much better. I don't know why, maybe 8B is just too small for this type of technique, and loses too much of the instruction-tuned smarts.

Doesn't sound like the fine-tuning was as effective at the smaller size. It might be possible with a different dataset, or the process itself might need to change somehow for 8B. Having used the 70B, I could see a lot of value in a smaller version of it.
