Why does the 72B model have a different vocab size compared with other models?

Opened by Mikasaka

I noticed that this 72B model has a vocab size of 152064, while the other models (7B, 4B, etc.) have a vocab size of 151936. Why is it designed this way?

I have a similar question. For Qwen 1.8B the documentation says the vocab size is 151851, and the tokenizer indeed has 151851 tokens, but in the model weights vocab_size is 151936. Can someone explain why that is? Thanks.
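For anyone who wants to see the discrepancy themselves, here is a minimal sketch that compares the tokenizer's token count with the vocab_size stored in each model config. The model IDs are placeholders; substitute whichever Qwen checkpoints you are using.

```python
# Minimal sketch: compare the tokenizer's token count with the config's
# vocab_size. The model IDs below are placeholders, not a confirmed list.
from transformers import AutoConfig, AutoTokenizer

for model_id in ("Qwen/Qwen-1_8B", "Qwen/Qwen-72B"):
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    # len(tokenizer) counts the tokens the tokenizer actually defines;
    # config.vocab_size is the (possibly padded) size of the embedding matrix.
    print(f"{model_id}: tokenizer={len(tokenizer)}, config.vocab_size={config.vocab_size}")
```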

Qwen org

The vocabularies are actually the same. The different vocab sizes come from our distributed training: for the larger models trained across more devices, the vocabulary is padded so that the embedding matrix splits evenly across them.
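As an illustration of how such padding typically works, here is a Megatron-style sketch. The padding multiples below are assumptions chosen to reproduce the observed numbers, not values confirmed by the Qwen team.

```python
# Sketch of Megatron-style vocab padding: round the true vocabulary up so the
# embedding matrix divides evenly across tensor-parallel ranks. The multiples
# below are illustrative assumptions, not confirmed Qwen training settings.
def pad_vocab_size(true_vocab_size: int, multiple: int) -> int:
    """Round true_vocab_size up to the nearest multiple."""
    return ((true_vocab_size + multiple - 1) // multiple) * multiple

true_vocab = 151851                     # tokens actually defined by the tokenizer
print(pad_vocab_size(true_vocab, 128))  # 151936, the size seen in the smaller models
print(pad_vocab_size(true_vocab, 512))  # 152064, the size seen in the 72B model
```

The padding rows correspond to token IDs the tokenizer never emits, so they only affect the shape of the embedding and output layers, not generation.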

jklj077 changed discussion status to closed

The problem is that vLLM checks the vocab sizes, and if they don't match, speculative decoding is not enabled. If you are going to pad, perhaps pad all models to the same vocab size.
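A minimal sketch (not vLLM's actual internal check, and with placeholder model IDs) of verifying that a draft and target model report the same vocab_size before trying speculative decoding:

```python
# Minimal sketch: check whether a target/draft pair report the same vocab_size,
# since vLLM will not enable speculative decoding if they differ.
# Model IDs are placeholders.
from transformers import AutoConfig

target_id = "Qwen/Qwen-72B"   # placeholder target model
draft_id = "Qwen/Qwen-1_8B"   # placeholder draft model

target_vocab = AutoConfig.from_pretrained(target_id, trust_remote_code=True).vocab_size
draft_vocab = AutoConfig.from_pretrained(draft_id, trust_remote_code=True).vocab_size

if target_vocab != draft_vocab:
    print(f"vocab mismatch: target={target_vocab}, draft={draft_vocab}")  # e.g. 152064 vs 151936
else:
    print(f"vocab sizes match: {target_vocab}")
```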
