
Missing tokenizer.model

#4
by Jeronymous - opened

Hello,
Congrats and thanks for sharing CroissantLLM :)

We noticed that the "tokenizer.model" file is missing, which can cause issues in some workflows.
See for instance https://github.com/huggingface/transformers/issues/29137

CroissantLLM org

Hello!
I think we only ever kept the fast version of the tokenizer (use_fast=True) and never had to rely on the original SentencePiece tokenizer.model file...

This is similar to what is done in https://huggingface.co/meta-llama/Meta-Llama-3-8B/.
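A minimal sketch, assuming you have a local copy of the model repository. The helper available_tokenizer_backends below is hypothetical (it is not a transformers API) and only looks at file names. A fast tokenizer is serialized in tokenizer.json, while a slow SentencePiece-based tokenizer such as LlamaTokenizer needs tokenizer.model, so a fast-only repository reports just "fast".

```python
import os
import tempfile
from typing import List


def available_tokenizer_backends(model_dir: str) -> List[str]:
    """Hypothetical helper: infer usable tokenizer backends from file names only."""
    files = set(os.listdir(model_dir))
    backends = []
    # Fast tokenizers (Rust `tokenizers` library) are saved as tokenizer.json.
    if "tokenizer.json" in files:
        backends.append("fast")
    # The slow SentencePiece tokenizer is built from tokenizer.model.
    if "tokenizer.model" in files:
        backends.append("slow")
    return backends


# Usage: simulate a fast-only repository (no tokenizer.model).
with tempfile.TemporaryDirectory() as d:
    for name in ("tokenizer.json", "tokenizer_config.json"):
        open(os.path.join(d, name), "w").close()
    print(available_tokenizer_backends(d))  # ['fast']
```

The check does not validate file contents; it only shows why workflows that insist on the slow tokenizer (use_fast=False) cannot work when tokenizer.model is absent.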

Sadly, I don't have any more files than you do...

See also https://github.com/huggingface/transformers/issues/21289

Did you manage to solve this on your end?

manu changed discussion status to closed
