Great! Are other sizes (3B/7B/1.5B) planned?
Thank you, a draft collection is great! I have been missing exactly this so far, especially for these large models.
In my first test it doesn't give a speed advantage (it always depends a bit on the system), so it would be interesting to have other sizes like 7B and 3B.
In my experience with other models (72B, 32B), the best draft model sizes were the ones that combined good prediction (0.5B is too small for that) with being small enough not to steal too many resources (>=14B is too large). Mostly these were 3B and 7B; 1.5B was still partly OK.
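A rough way to see this size trade-off is the simple expected-speedup model from the speculative decoding literature (Leviathan et al., 2023). It assumes each drafted token is accepted independently with probability `alpha`, the draft proposes `gamma` tokens per step, and `cost_ratio` is the draft's forward-pass time relative to the target's. A bigger draft raises `alpha` but also raises `cost_ratio`. The sketch below is illustrative only: the function names and every `(alpha, cost_ratio)` pair in `CANDIDATES` are hypothetical, not measurements of any real model.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass.

    Assumes the draft proposes `gamma` tokens and each one is accepted
    independently with probability `alpha` (the i.i.d. simplification).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0  # every draft token accepted, plus one from the target
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Walltime speedup over plain decoding with the target alone.

    cost_ratio = (draft forward time) / (target forward time).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


# Hypothetical (alpha, cost_ratio) pairs for one large target; NOT measurements.
CANDIDATES = {
    "small draft": (0.50, 0.02),
    "mid draft": (0.75, 0.05),
    "large draft": (0.85, 0.25),
}

for name, (alpha, cost) in CANDIDATES.items():
    print(f"{name}: {expected_speedup(alpha, gamma=4, cost_ratio=cost):.2f}x")
```

With these made-up inputs the mid-sized draft comes out ahead: the small one is cheap but rejected too often, and the large one predicts well but eats most of the time it saves. Real `alpha` and cost values depend heavily on the hardware and workload, which matches the "it always depends a bit on the system" caveat.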
Yeah, I'm trying fine-tuning the 0.5B first and will test larger sizes after that. The 0.5B started off at only around 33% top-1 accuracy, so it will be interesting to see where it ends up after fine-tuning on around 500M-1B tokens, for comparison.
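Assuming the 33% figure is top-1 agreement with the target model's own greedy predictions (it could also be measured against the training data), under greedy verification it roughly equals the per-token acceptance rate. Under the independent-acceptance approximation, even a zero-cost draft would then average at most about 1 / (1 - 0.33) ≈ 1.5 tokens per target pass. The sketch below shows one way such numbers could be measured. The helpers `top1_agreement` and `accepted_prefix_len` are hypothetical names, and the toy logits and token ids are invented for illustration.

```python
from typing import Sequence


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value; Python's max() keeps the first one on ties."""
    return max(range(len(xs)), key=lambda i: xs[i])


def top1_agreement(
    draft_logits: Sequence[Sequence[float]],
    target_logits: Sequence[Sequence[float]],
) -> float:
    """Fraction of positions where draft and target pick the same argmax token.

    Both logit rows at position i are assumed to be computed on the same
    (teacher-forced) prefix.
    """
    if not draft_logits or len(draft_logits) != len(target_logits):
        raise ValueError("expected two non-empty logit sequences of equal length")
    matches = sum(argmax(d) == argmax(t) for d, t in zip(draft_logits, target_logits))
    return matches / len(draft_logits)


def accepted_prefix_len(draft_tokens: Sequence[int], target_tokens: Sequence[int]) -> int:
    """Number of drafted tokens kept under greedy verification.

    target_tokens[i] is the target's greedy choice given the prompt plus
    draft_tokens[:i]; acceptance stops at the first mismatch.
    """
    accepted = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        accepted += 1
    return accepted


# Toy inputs (hypothetical, not from any real model).
draft = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.1, 0.4]]
target = [[1.8, 0.2, 0.1], [0.1, 0.3, 2.2], [1.1, 0.0, 0.2]]
print(top1_agreement(draft, target))
print(accepted_prefix_len([11, 42, 7, 3], [11, 42, 9, 3]))
```

Averaging `accepted_prefix_len` over many drafting steps is arguably a more direct read on usefulness for speculative decoding than per-token accuracy alone, because one early mismatch discards the rest of the draft.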