Number of parameters in the model

#16
by bivouac0 - opened

Is the "-33M" in the model supposed to mean 33 million parameters?
Using the standard method of counting trainable parameters:

```python
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

I get 68M parameters.
What am I missing?

See the comment about this in the paper. When testing the model we didn't use all tokens in the vocabulary, but to keep the code simple we didn't implement this in the HF version. The parameter count given in the paper corresponds to vocab_size=8,000.
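To see why the vocabulary size matters so much for the total, here is a small sketch of how the embedding matrix alone scales with vocab size. The `hidden_size` and `full_vocab` values below are purely illustrative assumptions, not taken from this model's config:

```python
# Illustrative only: how vocab size affects the parameter count through
# the token-embedding matrix (hidden_size and full_vocab are assumptions).
hidden_size = 512
full_vocab = 64_000
reduced_vocab = 8_000

def embedding_params(vocab_size, hidden_size):
    # Token-embedding matrix is vocab_size x hidden_size; an untied
    # output projection would add roughly the same amount again.
    return vocab_size * hidden_size

saved = embedding_params(full_vocab, hidden_size) - embedding_params(reduced_vocab, hidden_size)
print(f"Embedding params at full vocab:    {embedding_params(full_vocab, hidden_size)/1e6:.1f}M")
print(f"Embedding params at reduced vocab: {embedding_params(reduced_vocab, hidden_size)/1e6:.1f}M")
print(f"Difference: {saved/1e6:.1f}M")
```

With embeddings tied between input and output, shrinking the vocabulary removes this amount once; with untied embeddings it is removed twice, which is why the reported count can drop by tens of millions when only a subset of the vocabulary is used.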

Anyway, even after I reduced the vocab size to 8,000, the model still has more than 36M parameters. Can you please provide the model config?

Could you please provide your config? Thank you so much!
