Conversion to tiktoken
#4, opened by koyfman
Not a general way that I know of, but the original gpt4 tokenizer is already in tiktoken format, which you can use. Do you have a specific tokenizer in mind?
> the original gpt4 tokenizer is already in tiktoken format,

Right, I was thinking more about training a HF tokenizer from scratch and creating a tiktoken model from that. Thanks!
I'd love to see that. The performance gains for semantic chunking with semchunk would be great, since tiktoken is much faster than transformers when it comes to tokenization.
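For anyone who wants to experiment with the idea above (train a HF tokenizer, then build a tiktoken model from it), here is a minimal sketch of one step, not a general converter. It assumes the HF tokenizer is a byte-level BPE in the GPT-2 style, so its vocab stores tokens as printable unicode strings, and that token ids follow merge priority. The function names `bytes_to_unicode` and `hf_vocab_to_mergeable_ranks` and the toy vocab are hypothetical, chosen only for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style table mapping each byte value to a printable unicode character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # The remaining bytes are shifted to code points 256 and above.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


def hf_vocab_to_mergeable_ranks(
    vocab: dict[str, int],
    special_tokens: frozenset[str] = frozenset(),
) -> dict[bytes, int]:
    """Turn a byte-level BPE vocab (token string -> id) into bytes -> rank.

    Assumption: ids reflect merge order, so they can be reused as ranks.
    Special tokens are skipped because tiktoken takes them separately.
    """
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    ranks: dict[bytes, int] = {}
    for token, token_id in vocab.items():
        if token in special_tokens:
            continue
        ranks[bytes(char_to_byte[c] for c in token)] = token_id
    return ranks


# Toy vocab for illustration only; "Ġ" is how byte-level BPE writes a space.
toy_vocab = {"h": 0, "i": 1, "Ġ": 2, "hi": 3, "Ġhi": 4, "<|endoftext|>": 5}
ranks = hf_vocab_to_mergeable_ranks(toy_vocab, frozenset({"<|endoftext|>"}))
```

The resulting dict is the shape that `tiktoken.Encoding(name=..., pat_str=..., mergeable_ranks=..., special_tokens=...)` expects for `mergeable_ranks`. Whether the two tokenizers then agree on every input is a separate question: `pat_str` would have to match the HF pre-tokenizer's splitting, the vocab should cover all 256 single bytes, and tiktoken infers merges from rank order rather than from the HF merges list, so comparing outputs on a sample corpus would be a sensible check.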