Use V1 tokenizer instead

#10
opened by Rocketknight1
Rocketknight1 changed pull request title from "Upload tokenizer" to "Use V1 tokenizer instead"

There was an issue with the last PR - we used the V3 tokenizer, but this base model actually uses the V1 tokenizer. This should fix the issue!

@Rocketknight1 does it affect the vocab size? The model and tokenizer vocab sizes don't match, so the model is failing to load.

@lbathen can you give me some code to reproduce that issue? From here it looks like the tokenizer and the model both have a vocab size of 32000

@Rocketknight1 I confirmed that both now show the same vocab size of 32K. I had pulled the wrong revision :)

