Issue with Vocabulary Size Mismatch in all LLaMmlein_1B_chat_* PEFT Adapters


Dear LSX-UniWue Team,

First of all, thank you for providing the LLaMmlein models and their adapters!
While trying to integrate the LLaMmlein_1B_chat_x adapter with the base model LLaMmlein_1b, I encountered a size mismatch error related to the embedding layer and output head.

Specifically, the error occurs due to a difference in vocabulary size:

The base model's vocabulary size is 32,000, while the adapter expects a vocabulary size of 32,064.
Here’s the relevant error snippet:

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
  size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32064, 2048]) from checkpoint, the shape in current model is torch.Size([32000, 2048]).
  size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32064, 2048]) from checkpoint, the shape in current model is torch.Size([32000, 2048]).
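
For context, this is roughly how I attach the adapter. The snippet below is only a minimal sketch, and the repository ids in it are assumptions based on the model names above rather than verified identifiers:

# Minimal reproduction sketch; repo ids are assumptions, not verified.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, which has a 32,000-token vocabulary.
base = AutoModelForCausalLM.from_pretrained(
    "LSX-UniWue/LLaMmlein_1B",  # assumed base model repo id
    torch_dtype=torch.bfloat16,
)

# Attaching the chat adapter fails at this point, because the adapter
# checkpoint stores embed_tokens and lm_head with 32,064 rows.
model = PeftModel.from_pretrained(
    base,
    "LSX-UniWue/LLaMmlein_1B_chat_x",  # placeholder adapter repo id
)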

I suspect this is due to a discrepancy between the tokenizers or vocabulary configurations used when training the adapter. Could you clarify whether a specific tokenizer or additional steps are required to align the base model with the adapter?
Alternatively, if the adapter's larger vocabulary is intentional, what is the recommended way to handle the mismatch?
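
For what it is worth, the only workaround I have found so far is to resize the base model's embeddings to the adapter's vocabulary size before attaching the adapter. This is just a sketch under the assumption that the extra 64 rows correspond to tokens added for the chat fine-tuning; the repo ids are placeholders and the officially recommended steps may differ:

# Workaround sketch, not an official recipe; repo ids are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "LSX-UniWue/LLaMmlein_1B",  # assumed base model repo id
    torch_dtype=torch.bfloat16,
)

# Grow embed_tokens and lm_head from 32,000 to 32,064 rows, matching the
# shapes stored in the adapter checkpoint. The new rows are randomly
# initialized, so the tokenizer/setup used when training the adapter should
# be preferred once it is documented.
base.resize_token_embeddings(32064)

model = PeftModel.from_pretrained(
    base,
    "LSX-UniWue/LLaMmlein_1B_chat_x",  # placeholder adapter repo id
)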

Thank you for your time and support. I look forward to your guidance on resolving this issue.

Best regards,
David

Data Science@CAIDAS Uni Würzburg org

Hey David,

you are absolutely right: in the heat of the release we forgot to add inference scripts. I just added them to the READMEs of the instruct adapters!

Best,
jan
