Can this be trained?

by umarbutler - opened Sep 18, 2024

Like any other Hugging Face tokenizer, this one appears to be trainable. However, I wanted to check whether there are any caveats in this implementation. If one were to train it on, say, the exact same dataset used to create the GPT-4o tokenizer, would the result still differ from the original tokenizer?
