Is the tokenizer missing settings?

#4
by cmcmaster - opened

I'm having trouble fine-tuning the Galactica models. In particular, the tokenizer seems to be missing settings such as a defined padding token ("[PAD]"). For comparison, see how galai configures its tokenizer: https://github.com/paperswithcode/galai/blob/f056e1ad791f994428ca81e25683ed9656b6958f/galai/model.py#L85
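A possible workaround, as a minimal sketch rather than a confirmed fix: set the padding token on the loaded tokenizer before fine-tuning. The helper name `ensure_pad_token` and the checkpoint id `facebook/galactica-125m` are assumptions for illustration. `pad_token`, `add_special_tokens` and `resize_token_embeddings` are standard `transformers` calls. Whether "[PAD]" already exists in the Galactica vocabulary is not verified here, so the sketch resizes the embeddings only if a new token was actually added.

```python
def ensure_pad_token(tokenizer, model=None, pad_token="[PAD]"):
    """Define a padding token on `tokenizer` if it has none.

    Returns the number of tokens newly added to the vocabulary
    (0 if a pad token was already set or "[PAD]" was already in the vocab).
    """
    if tokenizer.pad_token is not None:
        return 0  # already configured, nothing to do

    # Registers pad_token as the padding token; returns how many
    # entries were appended to the vocabulary.
    num_added = tokenizer.add_special_tokens({"pad_token": pad_token})

    # Only grow the embedding matrix if the vocabulary actually grew.
    if num_added > 0 and model is not None:
        model.resize_token_embeddings(len(tokenizer))
    return num_added


def load_galactica_with_padding(checkpoint="facebook/galactica-125m"):
    """Hypothetical usage; the checkpoint id is an assumption."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    ensure_pad_token(tokenizer, model)
    return tokenizer, model
```

With a pad token defined, calls such as `tokenizer(texts, padding=True, return_tensors="pt")` should no longer fail for lack of a padding token.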

Here is a great article by Patrick von Platen (Hugging Face) that does an excellent job of explaining the details for another LLM (Bloom):
https://huggingface.co/blog/how-to-generate
