7B or 8B?

#24
by amgadhasan - opened

Is this actually a 7B model? Its size indicates 8B+ parameters.

Google org

Hi! For clarity, many of those are embedding parameters, which we often do not count in the total parameter count for papers and releases. With respect to the emerging 7B class of open models, we've targeted the same use cases as other models in the 7B class from a hardware and software compatibility standpoint -- so it should be strictly transferable for many, if not all, 7B-class use cases.
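
For anyone who wants to check this counting convention themselves, here is a minimal sketch comparing the total parameter count with the count excluding the embedding table. It assumes the Hugging Face checkpoint name `google/gemma-7b` and the `transformers` library, and it needs enough RAM to load the full model:

```python
# Minimal sketch: count total parameters vs. parameters excluding the
# embedding table. Assumes the checkpoint "google/gemma-7b".
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

# With tied input/output embeddings, the shared weight is a single
# Parameter object, so parameters() counts it only once.
total = sum(p.numel() for p in model.parameters())
embedding = model.get_input_embeddings().weight.numel()

print(f"total parameters:          {total:,}")
print(f"excluding embedding table: {total - embedding:,}")
```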

trisfromgoogle changed discussion status to closed

@trisfromgoogle So, like I said in my comment on the other forum post: "I think they are trying to compete with Mistral-7B, so they are fudging the name to make it seem smaller than it actually is, because the more popular size class is 7B. If Google has a better explanation, they can pitch in here."

This all but confirms my suspicions. Can you clarify how many embedding parameters are in the model that take up so much space? I'd like to know the "true" size of the model, or more realistically, how the cut-down version without the embedding parameters compares to other models of similar size. That would give us a better idea of the model's real hardware requirements, given its awkward in-between size.

@trisfromgoogle Even excluding the embedding parameters (which still take part in computation, since the embedding table is shared with the output head via weight tying), the transformer blocks alone contain 7,751,248,896 parameters according to Gemma's technical paper. That is nearly 8 billion, so this model cannot be called a 7B model.
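
As a back-of-the-envelope check of these figures, assuming the config values reported in the Gemma technical paper (vocab size 256,128 and model width 3,072; treat these as illustrative):

```python
# Sanity check of the parameter counts discussed above, using config
# values attributed to the Gemma technical report (assumptions, not
# verified here): vocab_size=256128, d_model=3072.
vocab_size = 256_128
d_model = 3_072

embedding_params = vocab_size * d_model   # 786,825,216 (~0.79B); shared
                                          # with the output head via tying
block_params = 7_751_248_896              # non-embedding count cited above

print(f"embedding: {embedding_params:,}")
print(f"blocks:    {block_params:,}")
print(f"total:     {embedding_params + block_params:,}")  # ~8.54B
```

Under these assumptions, the embedding table adds roughly 0.79B parameters on top of the ~7.75B in the blocks, putting the full model at about 8.54B.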

Damn calling them out
