Vocab_size of the model configuration is incorrect

#18
by robkirk - opened

In the model configuration for this (and the other OPT models) the vocab_size is 50272, but the tokenizer has a vocab size of 50265, which matches the original vocabulary here and the one on Hugging Face here. Could this be updated somehow (although I realise that could mess with checkpoints etc.)?

There's this issue on the transformers GitHub referencing the same thing.
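For reference, the mismatch is easy to see directly from the config and tokenizer. A minimal sketch, assuming the facebook/opt-125m checkpoint (chosen only because it is small; the larger OPT models report the same two numbers):

```python
from transformers import AutoConfig, AutoTokenizer

# facebook/opt-125m is used here purely for illustration; any facebook/opt-*
# checkpoint shows the same config/tokenizer size mismatch.
model_name = "facebook/opt-125m"

config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

print(config.vocab_size)  # 50272 -- size of the model's embedding matrix
print(len(tokenizer))     # 50265 -- number of tokens the tokenizer actually produces
```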

Hey @robkirk ,

Good question! I think you can find the answer here: https://github.com/huggingface/transformers/issues/17431#issuecomment-1224231170 (it was discussed on another GitHub issue).
