there is no tokenizer.model file

#35
by zhaowei0315 - opened

Can you please provide this file?
An error occurred when I tried to convert a finetuned model to GGUF.

Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1278, in set_vocab
self._set_vocab_sentencepiece()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 567, in _set_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: model/tokenizer.model

@shenzhi-wang many thanks for your quick response, but I didn't find any relevant commands in your links.
I mean that I finetuned your model with Unsloth on my own dataset, but Unsloth throws an error when converting the finetuned model to GGUF.
Have you encountered such errors before?
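For context, this is roughly what my finetune.py does (a minimal sketch; the base model name and max_seq_length are placeholders, the finetuning itself is elided, and only the final save_pretrained_gguf call is exactly the one shown in the traceback below):

from unsloth import FastLanguageModel

# Placeholder setup: base model and max_seq_length are illustrative.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "shenzhi-wang/Llama3-8B-Chinese-Chat",
    max_seq_length = 8192,
    load_in_4bit = True,
)

# ... get_peft_model + SFTTrainer finetuning on my own dataset ...

# The call at finetune.py line 179 that triggers the GGUF conversion:
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")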

Unsloth logs:
==((====))== Unsloth: Conversion from QLoRA to GGUF information
\ /| [0] Installing llama.cpp will take 3 minutes.
O^O/ _/ \ [1] Converting HF to GGUF 16bits will take 3 minutes.
\ / [2] Converting GGUF 16bits to q4_k_m will take 20 minutes.
"-____-" In total, you will have to wait around 26 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: We must use f16 for non Llama and Mistral models.
Unsloth: [1] Converting model at model into f16 GGUF format.
The output location will be ./model-unsloth.F16.gguf
This will take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:** There are 2 possible reasons for this:
WARNING:hf-to-gguf:** - the model has not been added to convert-hf-to-gguf-update.py yet
WARNING:hf-to-gguf:** - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.
WARNING:hf-to-gguf:** ref: https://github.com/ggerganov/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh: c136ed14d01c2745d4f60a9596ae66800e2b61fa45643e72436041855ad4089d
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1278, in set_vocab
self._set_vocab_sentencepiece()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 567, in _set_vocab_sentencepiece
raise FileNotFoundError(f"File not found: {tokenizer_path}")
FileNotFoundError: File not found: model/tokenizer.model

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1281, in set_vocab
self._set_vocab_llama_hf()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 625, in _set_vocab_llama_hf
vocab = LlamaHfVocab(self.dir_model)
File "/home/xxx/jupyter/Ollama/llama.cpp/convert.py", line 577, in init
raise TypeError('Llama 3 must be converted with BpeVocab')
TypeError: Llama 3 must be converted with BpeVocab

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 2546, in
main()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 2531, in main
model_instance.set_vocab()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 1284, in set_vocab
self._set_vocab_gpt2()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 494, in _set_vocab_gpt2
tokens, toktypes, tokpre = self.get_vocab_base()
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 381, in get_vocab_base
tokpre = self.get_vocab_base_pre(tokenizer)
File "/home/xxx/jupyter/Ollama/llama.cpp/convert-hf-to-gguf.py", line 486, in get_vocab_base_pre
raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
Unsloth: Conversion completed! Output location: ./model-unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
main: build = 2887 (583fd6b0)
main: built with gcc (GCC) 10.2.0 for x86_64-redhat-linux
main: quantizing './model-unsloth.F16.gguf' to './model-unsloth.Q4_K_M.gguf' as Q4_K_M using 192 threads
gguf_init_from_file: invalid magic characters '
'
llama_model_quantize: failed to quantize: llama_model_loader: failed to load model from ./model-unsloth.F16.gguf

main: failed to quantize model from './model-unsloth.F16.gguf'
Traceback (most recent call last):
File "/home/xxx/jupyter/Ollama/finetune.py", line 179, in
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
File "/home/xxx/.conda/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 1381, in unsloth_save_pretrained_gguf
file_location = save_to_gguf(model_type, is_sentencepiece_model,
File "/home/xxx/.conda/envs/unsloth/lib/python3.10/site-packages/unsloth/save.py", line 1045, in save_to_gguf
raise RuntimeError(
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.

The links I provided do include instructions on how to convert models to GGUF. Please check them again carefully.

As for your error messages, I suggest converting the model to GGUF with BpeVocab.

And this might be helpful to you:
https://github.com/ggerganov/llama.cpp/issues/3256#issuecomment-1726639646

We have not changed the tokenizer, so you can use the tokenizer file of the original llama3-8b-instruct.
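For example, one way to pull the tokenizer files from the original model into your local model/ directory (a minimal sketch; it assumes the official meta-llama/Meta-Llama-3-8B-Instruct repo, which is gated, so you need to have accepted its license and be logged in to Hugging Face):

from huggingface_hub import hf_hub_download

# Assumption: the finetuned model lives in ./model and the tokenizer
# files come from the original (gated) Llama 3 instruct repo.
repo_id = "meta-llama/Meta-Llama-3-8B-Instruct"
for filename in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
    hf_hub_download(repo_id=repo_id, filename=filename, local_dir="model")

Note that Llama 3 uses a BPE tokenizer (tokenizer.json); there is no SentencePiece tokenizer.model for it, which is why the conversion has to go through the BPE vocab path.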

Hi,
Maybe you should use convert.py like this:
python llm/llama.cpp/convert.py <model_dir> --outtype f16 --outfile <output_name>.gguf --vocab-type bpe
(Replace <model_dir> with the directory containing the finetuned model and <output_name> with the desired output filename; --vocab-type bpe selects the BPE tokenizer path that Llama 3 requires.)
