Can't add special tokens to a model quantized with bitsandbytes

#50
by mattma1970 - opened

I'm trying to instruction-finetune phi-1.5 on the OpenOrca/SlimOrca dataset. I'm using a prompt format that relies on the special tokens ['[INST]','[/INST]','<>','','']. To get it to run locally on my RTX 4090 I need to quantize the model, and I'm using bitsandbytes 4-bit. When I add the special tokens and adjust the embedding size as follows:

```python
special_tokens_dict = {'additional_special_tokens': ['[INST]', '[/INST]']}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))
```
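
For context, the model and tokenizer are loaded in 4-bit roughly like this (a minimal sketch of my setup; the quantization config values are just what I happen to use and may need adjusting):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    quantization_config=bnb_config,
    device_map="auto",
)
```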

I get an error thrown by the resize_token_embeddings function saying that `to` only accepts floating point or complex dtypes. The problematic part of that function is:

```python
new_embeddings.to(
    embedding_layer.weight.device,
    dtype=embedding_layer.weight.dtype,
)
```

because embedding_layer.weight.dtype, which is taken from the last linear layer of the model, is torch.uint8. This must be because the model has been quantized. If I remove the quantization, the problem goes away, but the model's VRAM consumption then blows out.
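
For what it's worth, this is how I checked where the uint8 dtype comes from (assuming the standard get_input_embeddings / get_output_embeddings accessors on the causal LM class):

```python
# Quick diagnostic: inspect the weight dtypes that resize_token_embeddings will see.
print(model.get_input_embeddings().weight.dtype)
print(model.get_output_embeddings().weight.dtype)  # torch.uint8 in my case, since the layer was quantized
```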

Has anyone encountered this before and found a workaround?

Hello @mattma1970!

Do you need to resize the token embeddings? This model was trained with a larger vocabulary size precisely to allow new tokens to be added later.

Its vocabulary size is 51200, whereas the tokenizer's last token is indexed at 50294. Even though these extra 906 tokens count towards the model's final number of parameters, they were not embedded during training, so they can be used to index extra tokens and be fine-tuned.
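
In practice, something along these lines should work without resizing (a rough sketch, assuming the tokenizer stays within the model's vocabulary size):

```python
special_tokens_dict = {'additional_special_tokens': ['[INST]', '[/INST]']}
tokenizer.add_special_tokens(special_tokens_dict)

# The new token ids land in the already-allocated (but untrained) slots,
# so there is no need to call model.resize_token_embeddings(...).
assert len(tokenizer) <= model.config.vocab_size  # vocab_size is 51200
```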

gugarosa changed discussion status to closed
