tdooms/TinyStories-2048


---
{}
---
This is a very small uncased tokenizer for the [non-ascii version of TinyStories](https://huggingface.co/datasets/tdooms/TinyStories), which is based on the [original TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories). It is a WordPiece tokenizer with a vocabulary of 2048 tokens.

The tokenizer is fitted strictly to this dataset and probably won't work well on anything other than children's stories.
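
To make the caveat above concrete, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece encoding. It is not this tokenizer's actual implementation: the `wordpiece_tokenize` helper, the `toy_vocab` entries, the `##` continuation prefix, and the `[UNK]` token are all assumptions made for illustration, and the real vocabulary and pre-tokenization rules may differ.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first WordPiece encoding (illustrative sketch)."""
    tokens = []
    # "Uncased" means the text is lowercased before matching.
    for word in text.lower().split():
        pieces, start = [], 0
        while start < len(word):
            end, match = len(word), None
            # Try the longest remaining substring first, then shrink it.
            while end > start:
                candidate = word[start:end]
                if start > 0:
                    candidate = prefix + candidate  # mark a word-internal piece
                if candidate in vocab:
                    match = candidate
                    break
                end -= 1
            if match is None:
                # No piece fits: the whole word becomes the unknown token.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


# Hypothetical toy vocabulary; the real one has 2048 entries.
toy_vocab = {"the", "cat", "play", "##ed", "##s", "[UNK]"}

in_domain = wordpiece_tokenize("The cat played", toy_vocab)
out_of_domain = wordpiece_tokenize("The derivative", toy_vocab)
print(in_domain)      # ['the', 'cat', 'play', '##ed']
print(out_of_domain)  # ['the', '[UNK]']
```

With a vocabulary this small, words common in children's stories map to one or two pieces, while words the vocabulary never saw either fragment into many short pieces or collapse to the unknown token, which is why out-of-domain text is likely to be encoded poorly.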