Missing tokenizer.model
#4, opened by Jeronymous
Hello,
Congrats and thanks for sharing CroissantLLM :)
We noticed that the "tokenizer.model" file is missing, which can cause issues in some workflows.
See for instance https://github.com/huggingface/transformers/issues/29137
Hello!
I think we only ever kept the fast version of the tokenizer (use_fast=True) and never had to rely on the original sentencepiece tokenizer.model file...
This is similar to what is done in https://huggingface.co/meta-llama/Meta-Llama-3-8B/.
I don't have any more files than you, sadly...
https://github.com/huggingface/transformers/issues/21289
Did you manage to solve this on your end?
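For reference, here is a minimal sketch of loading the tokenizer with the fast backend, which only needs tokenizer.json and not the sentencepiece tokenizer.model file. The repo id "croissantllm/CroissantLLMBase" is assumed to be the checkpoint discussed here:

```python
from transformers import AutoTokenizer

# Load the fast (Rust-backed) tokenizer. This path only requires
# tokenizer.json; the slow sentencepiece path would need tokenizer.model.
# "croissantllm/CroissantLLMBase" is an assumed repo id for illustration.
tok = AutoTokenizer.from_pretrained("croissantllm/CroissantLLMBase", use_fast=True)

assert tok.is_fast  # confirms the Rust backend was loaded

ids = tok("Congrats and thanks for sharing CroissantLLM :)")["input_ids"]
print(tok.decode(ids, skip_special_tokens=True))
```

Workflows that force use_fast=False (or tools that read tokenizer.model directly) are the ones that hit the missing-file error linked above.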
manu changed discussion status to closed