pika

🎉 You are looking at pika 4, which uses FinePhrase and a more efficient vocabulary size!

pika is a simple and public domain-like tokenizer.

Special Tokens

  • End-of-Sequence token: <|endoftext|> (ID 0)
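A minimal sketch of checking this mapping and appending the token to an encoded sequence. It assumes the tokenizer is hosted on the Hugging Face Hub as qikp/pika-4 and loads through the transformers AutoTokenizer class, which this card does not confirm. The helper names eos_id, append_eos and demo are made up for illustration.

```python
EOS_TOKEN = "<|endoftext|>"  # end-of-sequence token named in this card


def eos_id(tokenizer, token=EOS_TOKEN):
    """Return the ID the tokenizer assigns to the end-of-sequence token."""
    return tokenizer.convert_tokens_to_ids(token)


def append_eos(tokenizer, ids):
    """Return a copy of `ids` with the end-of-sequence ID appended."""
    return list(ids) + [eos_id(tokenizer)]


def demo():
    # Assumption: the repository loads via AutoTokenizer (needs network access).
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("qikp/pika-4")
    print(eos_id(tok))  # this card lists ID 0 for <|endoftext|>
    ids = tok("Hello world")["input_ids"]
    print(append_eos(tok, ids))
```

Calling demo() downloads the tokenizer files. The two helpers only rely on convert_tokens_to_ids, so they work with any tokenizer object that exposes that method.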

Training

pika was trained on a portion of format/explanation-1b-hq from FinePhrase.

Limitations

Extra special tokens aren't present; you'll have to add them manually if needed.
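As a rough sketch of adding one manually, the snippet below registers a padding token through the add_special_tokens method of transformers tokenizers. The token string <|pad|> is a hypothetical choice, not something pika defines, and the repository id qikp/pika-4 and AutoTokenizer loading are assumptions as well.

```python
def ensure_pad_token(tokenizer, pad_token="<|pad|>"):
    """Add a padding token if none is set; return how many tokens were added.

    "<|pad|>" is a hypothetical token string chosen for illustration.
    """
    if tokenizer.pad_token is not None:
        return 0  # a padding token already exists, nothing to do
    # add_special_tokens returns the number of tokens added to the vocabulary
    return tokenizer.add_special_tokens({"pad_token": pad_token})


def demo():
    # Assumption: the repository loads via AutoTokenizer (needs network access).
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("qikp/pika-4")
    added = ensure_pad_token(tok)
    print(added, tok.pad_token, tok.pad_token_id)
```

If the tokenizer is paired with a transformers model, the embedding matrix usually has to grow to match the new vocabulary, for example with model.resize_token_embeddings(len(tok)).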
