Question about the name: why is it 2b?
#36 opened by sh0416
I counted the number of parameters and got 3,030,460,416, which is 3.03 billion as far as I can tell.
Does gemma-2b mean the Gemma architecture with 2 billion parameters, or does the name have some other meaning?
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
# state_dict values are tensors, so use .numel() to count elements
sum(p.numel() for p in model.state_dict().values())
# 3030460416
```
Did you exclude the vocabulary embedding weight and the LM head weight?
If so, the name makes sense, since those two weights alone account for about 1B parameters.
There is also a fairness issue in comparing parameter counts this way, as Gemma's vocabulary is about five times larger than that of comparable models.
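The arithmetic behind that reply can be sketched as follows, assuming Gemma-2b's published config values (`vocab_size=256000`, `hidden_size=2048`) and that the 3.03B state_dict count above includes both the input embedding and an untied LM head:

```python
# Sketch of the parameter-count arithmetic; vocab_size and hidden_size are
# taken from the google/gemma-2b config and are assumptions of this example.
vocab_size = 256_000
hidden_size = 2_048

total = 3_030_460_416                 # state_dict count from the snippet above
embed = vocab_size * hidden_size      # input embedding matrix
lm_head = vocab_size * hidden_size    # output projection, same shape

vocab_params = embed + lm_head        # the "two weights" in the reply
non_embedding = total - vocab_params

print(vocab_params)    # 1048576000 -> roughly 1B vocabulary-related parameters
print(non_embedding)   # 1981884416 -> roughly 2B, matching the "2b" name
```

So the "2b" in the name appears to refer to the non-embedding parameter count, which is the convention several model families use when vocabulary matrices dominate the total.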