Unusual tokenizer.json file size
#8, opened by AuriAetherwiing
The tokenizer.json in this repo is noticeably larger than the one from the original Qwen2.5-14B-Instruct (11.4 MB vs 7.03 MB). From personal experience, this looks like a transformers bug: it has happened to my models after using Axolotl, LLaMA-Factory, Mergekit, and Distillkit, across several different model architectures. It sometimes seems to have an adverse effect on the model's coherence, but strangely only sometimes. Thought you'd want to look into this.
For what it's worth, it doesn't seem to affect this particular model, or models finetuned on top of it, in any obvious way, but still.
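In case it helps narrow this down, here is a rough sketch of how one could check which part of the file grew. It only assumes that both tokenizer.json files are JSON objects with top-level sections; the function names (`section_sizes`, `compare_sections`, `load_tokenizer`) and the file paths passed on the command line are made up for illustration and are not part of any library.

```python
import json
import sys


def section_sizes(tokenizer: dict) -> dict:
    """Return the compact-JSON byte size of each top-level section."""
    return {
        key: len(
            json.dumps(value, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
        )
        for key, value in tokenizer.items()
    }


def compare_sections(original: dict, suspect: dict) -> list:
    """Return (section, original_bytes, suspect_bytes) rows, largest growth first."""
    before, after = section_sizes(original), section_sizes(suspect)
    rows = [
        (key, before.get(key, 0), after.get(key, 0))
        for key in sorted(set(before) | set(after))
    ]
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


def load_tokenizer(path: str) -> dict:
    """Load a tokenizer.json file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_tokenizers.py original/tokenizer.json suspect/tokenizer.json
    original = load_tokenizer(sys.argv[1])
    suspect = load_tokenizer(sys.argv[2])
    for name, old, new in compare_sections(original, suspect):
        print(f"{name:20s} {old:>12d} {new:>12d} {new - old:+d}")
```

Sizes are measured on a compact re-serialization, so differences in indentation or whitespace between the two files are ignored and only changes in the actual content show up. If one section accounts for most of the growth, that is probably the place to look.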