Create the tokenizer.json properly (with TemplateProcessing included).

#75
by Narsil HF staff - opened

Turns out, transformers will override the TemplateProcessing meaning in python land the bug is not seen.
https://github.com/huggingface/transformers/blame/main/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205

However in Rust land, the definition is taken exactly from the tokenizer.json file, therefore it's missing the TemplateProcessor.

Adding it would fix the issue in rust land without affecting python land (since the override is still there)

Cannot merge
This branch has merge conflicts in the following files:
  • tokenizer.json

Sign up or log in to comment