Update tokenizer_config.json

#2

This removes the extra tokens whose indices exceed the vocabulary size. Of those, only the pad token is actually used, so we follow the standard practice of reusing the EOS token as the PAD token.
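A minimal sketch of that change via the `transformers` API, assuming a hypothetical repo ID (`org/model`) and output directory; the PR edits `tokenizer_config.json` directly, but the effect is the same:

```python
from transformers import AutoTokenizer

# Hypothetical repo ID; substitute the actual model repository.
tokenizer = AutoTokenizer.from_pretrained("org/model")

# The model has no dedicated pad token, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token

# Writes the updated tokenizer_config.json to the given directory.
tokenizer.save_pretrained("local-dir")
```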

Further, we set model_max_length, which was previously unset, and change padding_side from 'right' to 'left', since the model is auto-regressive (see the sketch below).
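A sketch of why left padding matters for a causal LM; the repo ID and the 2048 context length are assumptions, not values taken from this PR:

```python
from transformers import AutoTokenizer

# Hypothetical repo ID and length; substitute the model's real values.
tokenizer = AutoTokenizer.from_pretrained("org/model")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.model_max_length = 2048
tokenizer.padding_side = "left"

# With left padding, every sequence in the batch ends at the same position,
# so an auto-regressive model generates continuations directly after the
# final prompt token instead of after a run of pad tokens.
batch = tokenizer(
    ["short prompt", "a somewhat longer prompt"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
```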

saattrupdan changed pull request status to closed