Adding Special Tokens for Training

#10
by realtimeriddle - opened

Is there a good way to add special tokens when training? I get an error when I use the resize_token_embeddings() function because this model uses FrozenBNBEmbedding. I have already tried recreating the function with the frozen embeddings in mind, but I'm not sure that can work with this kind of model.

Please note: this code has been deprecated for 4 months now. See README for the updated version.

  1. You can look inside the existing tokenizer - there are already some essentially unused tokens that can be reused as special tokens (see the sketches after this list).
  2. You can resize the embeddings in the original model, then run the quantization notebook to get its 8-bit version (see README). This will take some effort.
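To make the two options concrete, here is a rough sketch of each; the checkpoint names are the standard GPT-J ones and <my_token> is only a placeholder, so adapt as needed.

```python
from transformers import AutoTokenizer

# Option 1 (sketch): reuse GPT-J's spare <|extratoken_N|> slots as special tokens.
# These strings should already be in the tokenizer, so the 8-bit embedding matrix
# keeps its size and nothing needs to be resized.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
sep_id = tokenizer.convert_tokens_to_ids("<|extratoken_1|>")
pad_id = tokenizer.convert_tokens_to_ids("<|extratoken_2|>")
print(sep_id, pad_id)  # existing ids, below the model's 50400-row embedding
```

```python
from transformers import AutoTokenizer, GPTJForCausalLM

# Option 2 (sketch): add genuinely new tokens to the full-precision checkpoint,
# resize its embeddings, save it, and only then run the quantization notebook
# from the README on the resized checkpoint.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.add_special_tokens({"additional_special_tokens": ["<my_token>"]})

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")  # the fp32/fp16 model, not the 8-bit one
model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained("gpt-j-6B-resized")
model.save_pretrained("gpt-j-6B-resized")  # point the quantization notebook at this folder
```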

However, the newly added tokens will not be trained because their embeddings are frozen. You can also implement a custom forward pass where the new tokens are stored in a separate torch.nn.Embedding layer that is not quantized and is therefore fully trainable. In that case, you will need to slightly modify the existing forward-pass code, which is something you will need to figure out yourself (see the note above).
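Here is a minimal sketch of that last idea, assuming GPT-J's usual layout (a 50400-row input embedding of width 4096 at model.transformer.wte); the ExtendedEmbedding class name and the wiring line are hypothetical, not part of the existing code.

```python
import torch
import torch.nn as nn

class ExtendedEmbedding(nn.Module):
    """Routes ids below orig_vocab_size to the frozen (quantized) embedding and
    ids at or above it to a small trainable nn.Embedding for the new tokens."""

    def __init__(self, frozen_embedding: nn.Module, orig_vocab_size: int,
                 num_new_tokens: int, embed_dim: int):
        super().__init__()
        self.frozen = frozen_embedding  # e.g. the existing FrozenBNBEmbedding, left untouched
        self.orig_vocab_size = orig_vocab_size
        self.new_embed = nn.Embedding(num_new_tokens, embed_dim)  # fully trainable

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        is_new = input_ids >= self.orig_vocab_size
        # Old ids go through the frozen table; new ids are clamped to a valid row
        # and their outputs are overwritten below.
        out = self.frozen(input_ids.clamp(max=self.orig_vocab_size - 1)).clone()
        if is_new.any():
            out[is_new] = self.new_embed(input_ids[is_new] - self.orig_vocab_size).to(out.dtype)
        return out

# Hypothetical wiring, using GPT-J's attribute names and sizes:
# model.transformer.wte = ExtendedEmbedding(model.transformer.wte, 50400,
#                                           num_new_tokens=2, embed_dim=4096)
```

Note that if the model should also predict the new tokens, the output head (lm_head) needs matching trainable rows as well, which this sketch does not cover.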

Yes, resizing the original model and then running the quantization notebook does seem to work.
Also, the new tokens are indeed not being trained. Thank you for clarifying.

Can we use <|extratoken_1|>, <|extratoken_2|>, etc.?

Maybe? I tried that back in October and I forget how it went. In any case, justheuristic was correct: the model was deprecated at the time, and I think the newer version of the transformers library has a better solution for training 8-bit models now anyway.
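For anyone landing here later, a hedged sketch of what that newer route looks like (this is my reading of it, not something confirmed in this thread): recent transformers releases can load the original checkpoint in 8-bit through bitsandbytes, and the peft library is commonly used on top to train small LoRA adapters instead of the frozen 8-bit weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Sketch only: load the original checkpoint in 8-bit via bitsandbytes,
# then fine-tune LoRA adapters rather than the frozen 8-bit weights.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_int8_training(model)
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```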

realtimeriddle changed discussion status to closed
