This is a very small uncased WordPiece tokenizer with a vocabulary size of 2048, trained on the non-ASCII version of TinyStories (derived from the original TinyStories dataset).
The tokenizer is fitted strictly to that dataset and will probably not work well in any context other than children's stories.
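As a minimal sketch, a tokenizer with this configuration (uncased WordPiece, vocabulary of 2048) could be trained with the Hugging Face `tokenizers` library roughly as follows; the corpus sample and the choice of special tokens here are illustrative assumptions, not the exact training setup used for this model.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# Illustrative stand-in corpus; the real tokenizer was trained on TinyStories.
corpus = [
    "Once upon a time, there was a little dog.",
    "The little dog liked to play in the park.",
    "One day, the dog found a big red ball.",
]

# WordPiece model with an unknown-token fallback.
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))

# Lowercasing normalizer makes the tokenizer uncased.
tokenizer.normalizer = normalizers.Lowercase()
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Cap the vocabulary at 2048; special tokens here are assumed, not confirmed.
trainer = trainers.WordPieceTrainer(
    vocab_size=2048,
    special_tokens=["[UNK]", "[PAD]"],
)
tokenizer.train_from_iterator(corpus, trainer)

# Because of the lowercasing step, cased and uncased input encode identically.
print(tokenizer.encode("Once upon a time").tokens)
```

With so small a vocabulary, most rare words fall back to subword pieces, which is acceptable for the limited vocabulary of children's stories but would degrade quickly on general text.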