https://huggingface.co/Kooten/Mistral-Nemo-Instruct-2407-norefuse-OAS

#163
by StatusQuo209 - opened

Looking to get GGUFs of this model. The newest version of llama.cpp has the fixes needed.

Thank you.

queued, let's see what happens :)

mradermacher changed discussion status to closed

unfortunately, the pretokenizer is not supported by llama.cpp (aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1)

According to https://github.com/ggerganov/llama.cpp/pull/8604, aa78fe8b04bc622b077520b1fb3d3a5c6f7a53dd375e2361e62599be3cf58de1 is the old tokenizer. The latest tokenizer, 63b97e4253352e6f357cc59ea5b583e3a680eaeaf2632188c2b952de2588485e, should be supported by llama.cpp. The reason the old tokenizer doesn't work is that only the hash of the new one is hardcoded at https://github.com/ggerganov/llama.cpp/blob/de280085e7917dbb7f5753de5842ff4455f82a81/convert_hf_to_gguf.py#L600C23-L600C87. The reason convert_hf_to_gguf_update.py doesn't automatically download the latest tokenizer is that https://huggingface.co/mistralai/Mistral-Nemo-Base-2407 is gated, so the download fails unless a Hugging Face token with access to this model is specified.
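For anyone curious how those hashes come about: llama.cpp's convert_hf_to_gguf.py fingerprints a pre-tokenizer by encoding a fixed check string with the model's tokenizer and hashing the resulting token ids, then comparing against a hardcoded table of known hashes. A minimal sketch of that idea, using hypothetical toy tokenizers instead of a real one (the real script uses a much longer check string and the model's actual tokenizer):

```python
import hashlib

# A short stand-in for the long check string llama.cpp encodes.
CHECK_TEXT = "Hello, world! 123"

def fingerprint(encode):
    # Hash the token-id sequence, as convert_hf_to_gguf.py does:
    # sha256 over the string representation of the ids.
    ids = encode(CHECK_TEXT)
    return hashlib.sha256(str(ids).encode()).hexdigest()

# Two toy "tokenizers" that split the same text into different ids,
# standing in for two different pre-tokenizer configurations.
def tok_a(text):
    return [ord(c) for c in text]

def tok_b(text):
    return [ord(c) + 1 for c in text]

print(fingerprint(tok_a) != fingerprint(tok_b))  # different pre-tokenizers -> different hashes
```

This is why an updated tokenizer in the repo produces a new hash: any change to how the check string is split changes the id sequence and therefore the fingerprint, and the converter refuses models whose fingerprint it doesn't recognize.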

I don't think the pretokenizer is hardcoded in the usual sense of that word. In any case, different hashes mean different pretokenizers, so if llama.cpp has no support for the old one, it can't be done (properly) unless it's known why they differ. Either way, the model would likely need to be updated.