tokenizer.model

#1
by AtenziaDptoIA - opened

Good evening,

Great news on the release of llama-3-sqlcoder-8b!

When attempting to generate the .gguf in FP32 from this new model, I've noticed the tokenizer.model file is absent. It would be great to be able to use the model in .gguf format without quantization. sqlcoder-7b-2 demonstrated very good performance on medium-sized schemas!

Thanks and regards

I converted it to GGUF format; it's in this repo: omeryentur/llama-3-sqlcoder-8b-GGUF. You can use it, or you can do it yourself. I leave the Colab notebook here: https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu

@omeryentur thanks for the answer!!

I'm using the notebook at https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu , but I'm still having the tokenizer issue:
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/content/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']


Am I doing something wrong?

Thanks so much and regards

Change this part of the code:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
to:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16} --vocab-type hfft --pad-vocab

@omeryentur thanks so much for the answer.

I have done that, but now I'm getting this error:
Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1714, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1671, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/content/llama.cpp/convert.py", line 1522, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/content/llama.cpp/convert.py", line 1512, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['hfft']

This should work:
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16} --vocab-type bpe
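For context on why bpe is the flag that works: Llama 3 checkpoints ship a Hugging Face tokenizer.json (a BPE tokenizer) instead of the SentencePiece tokenizer.model that earlier Llama models used, so the spm and hfft loaders find nothing to load. Below is a minimal illustrative sketch of that file-based detection, not convert.py's actual internals; the helper name is made up for this example.

```python
# Illustrative sketch (not llama.cpp's actual code): choose a convert.py
# --vocab-type value based on which tokenizer files the model directory has.
from pathlib import Path

def guess_vocab_type(model_dir: str) -> str:
    d = Path(model_dir)
    if (d / "tokenizer.model").is_file():
        # SentencePiece model file, shipped by Llama 1/2-style checkpoints
        return "spm"
    if (d / "tokenizer.json").is_file():
        # Hugging Face BPE tokenizer, what Llama 3 / llama-3-sqlcoder-8b ships
        return "bpe"
    raise FileNotFoundError(f"Could not find a tokenizer in {model_dir}")
```

With only tokenizer.json present, this returns "bpe", matching the flag that finally made the conversion go through.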

It is still executing, but it seems that it's working.

Thanks so much for the answers @omeryentur .

Regards

AtenziaDptoIA changed discussion status to closed
