llama.cpp conversion problem report (about `tokenizer.json`)
I attempted to convert this model to GGUF using the `convert_hf_to_gguf.py` script from llama.cpp, but encountered an error:

```
FileNotFoundError: File not found: F:\OpensourceAI-models\SuperNova-Medius\tokenizer.model
Exception: data did not match any variant of untagged enum ModelWrapper at line 757443 column 3
```
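For anyone hitting the same error, a quick sanity check of the tokenizer files can narrow things down. The sketch below is a hypothetical helper (not part of llama.cpp): it only verifies that the files exist and that `tokenizer.json` is valid JSON. Note that the "untagged enum ModelWrapper" error comes from the stricter Rust `tokenizers` parser, which plain `json.loads` cannot reproduce, so a file can pass this check and still fail conversion.

```python
import json
from pathlib import Path

def check_tokenizer_files(model_dir: str) -> dict:
    """Report which tokenizer files exist and whether tokenizer.json is valid JSON.

    Hypothetical helper for triage only; it does not replicate the Rust
    `tokenizers` parser that emits the ModelWrapper error.
    """
    d = Path(model_dir)
    report = {
        "tokenizer.model": (d / "tokenizer.model").is_file(),
        "tokenizer.json": False,
    }
    tj = d / "tokenizer.json"
    if tj.is_file():
        try:
            json.loads(tj.read_text(encoding="utf-8"))
            report["tokenizer.json"] = True
        except json.JSONDecodeError:
            pass  # present but not even valid JSON
    return report
```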
After downloading `tokenizer.json` from Qwen2.5-14B and replacing this model's copy with it, I was able to convert the model to GGUF successfully.
I made a rough comparison of the two `tokenizer.json` files and found that they are mostly the same apart from some formatting differences. This model's `tokenizer.json` has one additional line, `"ignore_merges": false`, while the other parts appear unchanged.
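A rough comparison like this can be automated. The sketch below parses both files (which removes formatting differences) and recursively lists keys present in one but not the other; the two inline dicts are illustrative stand-ins for the parsed files, mirroring the extra `"ignore_merges"` flag described above.

```python
import json  # included so the sketch also works on real files via json.load

def diff_keys(a, b, path=""):
    """Recursively list keys present in dict `a` but missing from dict `b`."""
    missing = []
    if isinstance(a, dict) and isinstance(b, dict):
        for k, v in a.items():
            p = f"{path}.{k}" if path else k
            if k not in b:
                missing.append(p)
            else:
                missing.extend(diff_keys(v, b[k], p))
    return missing

# Illustrative stand-ins for json.load(open("tokenizer.json")) on each file;
# the problematic file carries an extra "ignore_merges" entry in its BPE model.
broken = {"model": {"type": "BPE", "ignore_merges": False, "vocab": {}}}
working = {"model": {"type": "BPE", "vocab": {}}}
print(diff_keys(broken, working))  # → ['model.ignore_merges']
```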
I am unsure of the reason behind this issue, nor do I know if others might encounter a similar problem. Therefore, I report it here for reference.
I appreciate the report. I'll loop in @bartowski, as he did our GGUF conversions.
@Crystalcareai I did chat with the fp16 GGUF, but it's not doing very well, pretty slow tbh.
AWQ with dataset calibration?
If you update `transformers` and `tokenizers`, this error should go away.
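For reference, the upgrade would look something like the following; the exact minimum versions needed for this model's `tokenizer.json` format are not stated in this thread, so treat this as a sketch rather than a pinned requirement.

```shell
# Upgrade the packages whose tokenizer.json parser rejects the newer format.
pip install --upgrade transformers tokenizers
```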
I actually did have a problem with the tokenizer, but I think because my Docker image had a more up-to-date version than my main OS, I got past it during conversion. So yeah, `tokenizers` and/or `transformers` definitely needs an update.
Thanks for the suggestions. I will close this topic later. 😊