https://huggingface.co/Kooten/Mistral-Nemo-Instruct-2407-norefuse-OAS
Looking to get GGUFs of this model. The newest version of llama.cpp has the fixes needed.
Thank you.
queued, let's see what happens :)
unfortunately, the pretokenizer is not supported by llama.cpp (aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1)
According to https://github.com/ggerganov/llama.cpp/pull/8604, aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1 is the hash of the old tokenizer. The latest tokenizer, 63b97e4253352e6f357cc59ea5b583e3a680eaeaf2632188c2b952de2588485e, should be supported by llama.cpp.
The old tokenizer doesn't work because only the hash of the new one is hardcoded at https://github.com/ggerganov/llama.cpp/blob/de280085e7917dbb7f5753de5842ff4455f82a81/convert_hf_to_gguf.py#L600C23-L600C87.
convert_hf_to_gguf_update.py doesn't automatically download the latest tokenizer because https://huggingface.co/mistralai/Mistral-Nemo-Base-2407 is gated, so the download fails unless a Hugging Face token with access to this model is specified.
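For anyone wondering where these hashes come from: as far as I understand it, the convert script tokenizes a fixed test string, takes the SHA-256 of the string form of the resulting token ID list, and looks that digest up in a hardcoded table. Below is a minimal sketch of that idea, not the actual llama.cpp code. The function names and the table are hypothetical, and the "tekken" label for the new hash is my recollection of the name used in the PR, so treat it as an assumption.

```python
import hashlib

# Hypothetical lookup table: digest of the token IDs -> pre-tokenizer name.
# The hash is the "new tokenizer" hash quoted in this thread; the "tekken"
# label is an assumption, not verified against the converter source.
KNOWN_PRE_TOKENIZERS = {
    "63b97e4253352e6f357cc59ea5b583e3a680eaeaf2632188c2b952de2588485e": "tekken",
}


def pretokenizer_hash(token_ids):
    """Return the SHA-256 hex digest of the string form of a token ID list."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()


def identify_pretokenizer(token_ids, known=KNOWN_PRE_TOKENIZERS):
    """Return the pre-tokenizer name for these token IDs, or None if unknown."""
    return known.get(pretokenizer_hash(token_ids))
```

Any change in how the test string is split changes the token IDs and therefore the digest, so an old tokenizer.json yields a hash that is simply missing from the table, which matches the "not supported" error above.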
I don't think the pretokenizer is hardcoded in the usual sense of that word. in any case, different hashes mean different pretokenizers, so if llama.cpp has no support for the old one, it can't be done (properly) unless it's known why they differ. either way, the model would likely need to be updated.