tokenizer.model

#1
by AtenziaDptoIA - opened

Good evening,

Great news on the release of llama-3-sqlcoder-8b!

When attempting to generate the .gguf in FP32 from this new model, I've noticed the tokenizer.model file is absent. It would be great to be able to use the model in .gguf format without quantization. sqlcoder-7b-2 demonstrated very good performance on medium-sized schemas!

Thanks and regards

I converted it to GGUF format; it's in this repo: omeryentur/llama-3-sqlcoder-8b-GGUF. You can use it, or you can do it yourself. I leave the Colab notebook here: https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu

@omeryentur thanks for the answer!!

I'm using the notebook at https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu , but I'm still having the tokenizer issue:
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/content/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']


Am I doing something wrong?

Thanks so much and regards

Change this part of the code:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
to:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16} --vocab-type hfft --pad-vocab

@omeryentur thanks so much for the answer.

I have done that, but now I'm getting this error:
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/content/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['hfft']

This should work:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16} --vocab-type bpe
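For context on why bpe is the flag that works: Llama 3 checkpoints ship a Hugging Face tokenizer.json (a BPE tokenizer) instead of the SentencePiece tokenizer.model that earlier Llama models used, so the spm and hfft loaders find nothing to load. Below is a minimal illustrative sketch of that file-based detection, not convert.py's actual internals; the helper name is made up for this example.

```python
# Illustrative sketch (not llama.cpp's actual code): choose a convert.py
# --vocab-type value based on which tokenizer files the model directory has.
from pathlib import Path

def guess_vocab_type(model_dir: str) -> str:
    d = Path(model_dir)
    if (d / "tokenizer.model").is_file():
        # SentencePiece model file, shipped by Llama 1/2-style checkpoints
        return "spm"
    if (d / "tokenizer.json").is_file():
        # Hugging Face BPE tokenizer, what Llama 3 / llama-3-sqlcoder-8b ships
        return "bpe"
    raise FileNotFoundError(f"Could not find a tokenizer in {model_dir}")
```

With only tokenizer.json present, this returns "bpe", matching the flag that finally made the conversion go through.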

It is still executing, but it seems that it's working.

Thanks so much for the answers @omeryentur .

Regards

AtenziaDptoIA changed discussion status to closed
