Question about the name: why is it 2b?

#36
opened by sh0416

I counted the number of parameters and got 3,030,460,416, which is 3.03 billion as far as I can tell.

Does gemma-2b mean the Gemma architecture with 2 billion parameters, or is there some other meaning behind the name?

from transformers import AutoModelForCausalLM

# Count every tensor in the checkpoint's state_dict
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
sum(x.numel() for x in model.state_dict().values())
# 3030460416

Did you exclude the vocabulary embedding weight and the LM head weight?
If so, the name makes sense, since those two weights account for about 1B parameters.
There is a fairness issue in counting parameters this way, though, since Gemma's vocabulary is about 5 times larger than that of other models.
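For reference, a minimal sketch of that exclusion (assuming the tied lm_head.weight appears as a separate entry in the state_dict, which the 3.03B count above suggests): filtering out the vocabulary embedding and LM head weights brings the total down to roughly 2B, matching the name.

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
state = model.state_dict()

total = sum(p.numel() for p in state.values())
# Vocabulary embedding and (tied) LM head: each is roughly
# vocab_size x hidden_size = 256000 x 2048 ≈ 524M parameters.
vocab_related = sum(p.numel() for name, p in state.items()
                    if "embed_tokens" in name or "lm_head" in name)

print(total)                  # ~3.03B when the tied lm_head is listed separately
print(total - vocab_related)  # ~1.98B, i.e. the "2B" in the model name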
