Something isn't working for me.

#1
by Lewdiculous - opened

Not sure if the same issues as with Aurora, but I'm getting this when converting, both to FP16 and BF16:

INFO:hf-to-gguf:Loading model: Puppy_Purpose_0.69
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2562, in <module>
    main()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2547, in main
    model_instance.set_vocab()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 1288, in set_vocab
    self._set_vocab_sentencepiece()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 583, in _set_vocab_sentencepiece
    tokenizer.LoadFromFile(str(tokenizer_path))
  File "C:\Users\User\scoop\apps\miniconda3\current\envs\conver\Lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: C:\b\abs_f7cttiucvr\croot\sentencepiece_1684525347071\work\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

---

INFO:hf-to-gguf:Loading model: Puppy_Purpose_0.69
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2562, in <module>
    main()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2547, in main
    model_instance.set_vocab()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 1288, in set_vocab
    self._set_vocab_sentencepiece()
  File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 583, in _set_vocab_sentencepiece
    tokenizer.LoadFromFile(str(tokenizer_path))
  File "C:\Users\User\scoop\apps\miniconda3\current\envs\conver\Lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: C:\b\abs_f7cttiucvr\croot\sentencepiece_1684525347071\work\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
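
For what it's worth, the failure reproduces outside the converter too. The sketch below (the local path is just a placeholder for wherever the model is downloaded) makes the same LoadFromFile call that _set_vocab_sentencepiece does, and it hits the same internal parse error on this model's tokenizer.model:

import sentencepiece as spm

# Point this at the downloaded model folder; the path here is hypothetical.
tokenizer_path = r"D:\models\Puppy_Purpose_0.69\tokenizer.model"

sp = spm.SentencePieceProcessor()
try:
    sp.LoadFromFile(tokenizer_path)
    print("tokenizer.model parsed as a valid SentencePiece model")
except RuntimeError as err:
    # Same "model_proto->ParseFromArray" internal error as above: the file exists
    # but is not a SentencePiece protobuf (Llama 3 ships a BPE tokenizer instead).
    print("not a valid SentencePiece model:", err)
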
The Chaotic Neutrals org

This is the exact error I get on EVERY model; I have no clue what causes it. @Lewdiculous

The Chaotic Neutrals org

This model never saw local hardware; it was created entirely in the mergekit GUI on HF.

I just did cgato/L3-TheSpice-8b-v0.8.3 and that one doesn't have this issue, ugh.

The Chaotic Neutrals org

Are some of the configs modified compared to what it's expecting? It looks like it's not sure which tokenizer to select.

I am using the llama-bpe files fetched from convert-hf-to-gguf-update.py.
https://files.catbox.moe/45uddo.zip
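
If it helps narrow it down, a quick way to spot-check for drift is to diff the model's tokenizer_config.json against the llama-bpe reference copy. Both paths below are placeholders for wherever the files sit locally:

import json

# Placeholder paths: the llama-bpe reference fetched by convert-hf-to-gguf-update.py
# and the merged model's own copy.
ref_path = "models/tokenizers/llama-bpe/tokenizer_config.json"
model_path = "Puppy_Purpose_0.69/tokenizer_config.json"

with open(ref_path, encoding="utf-8") as f:
    ref = json.load(f)
with open(model_path, encoding="utf-8") as f:
    mdl = json.load(f)

# Print any key whose value differs between the two configs.
for key in sorted(set(ref) | set(mdl)):
    if ref.get(key) != mdl.get(key):
        print(f"{key}:\n  reference: {ref.get(key)!r}\n  model:     {mdl.get(key)!r}")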

I also remember having issues with ResplendentAI/Aurora_l3_8B, though I'm not sure if they were exactly these...

@SolidSnacke have you run into this with some models?

The Chaotic Neutrals org

@Lewdiculous

Left is from the files you sent, right is the HF tokenizer_config on the model. The official Instruct HF repo was also updated to the one on the right.
[image: tokenizer_config.json comparison, llama-bpe reference files (left) vs. the model repo (right)]

Official meta instruct repo:
[image: tokenizer_config.json from the official Meta Llama 3 Instruct repo]

Surprised that fetching the files now still gives the result on the left. It might be from the base model. I hadn't needed to check until now, tbh.

Honestly, I tried with both the llama-bpe config files and the original repo files and the result was the same.

The Chaotic Neutrals org

Yea I've had that issue before as well when doing merges.

The Chaotic Neutrals org

Wait so what did I do wrong? All of the constituent models had been quanted before...

The Chaotic Neutrals org

I'll have to tinker around with this tomorrow when I have time after work.

The Chaotic Neutrals org

Wait so what did I do wrong? All of the constituent models had been quanted before...

Nothing that I can tell.

Working so far for me:
cgato/L3-TheSpice-8b-v0.8.3
NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS

As examples, not sure if it applies at this stage.

The Chaotic Neutrals org

Working so far for me:
cgato/L3-TheSpice-8b-v0.8.3
NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS

As examples, not sure if it applies at this stage.

Both appear to be using the outdated version, like the left example you sent above.

Well then. We did try with both the new and the previous version.

Maybe we wait for clarifications on this then:
https://github.com/ggerganov/llama.cpp/issues/7129

Converting to GGUF worked for me after deleting tokenizer.model. I can upload some quants if you want.
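
Roughly what I ran, as a sketch (the output filename and the --outtype value are just what I'd expect here, so adjust as needed):

import subprocess
from pathlib import Path

model_dir = Path("Puppy_Purpose_0.69")  # local copy of the merged model

# Remove the stray SentencePiece file so the converter falls back to the
# BPE tokenizer in tokenizer.json instead of choking on tokenizer.model.
(model_dir / "tokenizer.model").unlink(missing_ok=True)

subprocess.run(
    [
        "python", "convert-hf-to-gguf.py", str(model_dir),
        "--outtype", "bf16",
        "--outfile", "Puppy_Purpose_0.69-BF16.gguf",
    ],
    check=True,
)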

Eeh. IT ACTUALLY WORKS! Huge, nbeer! Do you want to upload Puppy_Purpose_0.69 then? :D - But I don't mind doing it too.

No, go for it! Glad I could help :)

Turns out we're good to go. Hurray!

llm_load_print_meta: model size       = 14.96 GiB (16.00 BPW)
llm_load_print_meta: general.name     = Puppy_Purpose_0.69
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
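
As an extra sanity check (just a sketch, assuming the gguf-py GGUFReader API and the hypothetical output filename from above), the BOS/EOS IDs in the file's metadata line up with the log:

from gguf import GGUFReader

reader = GGUFReader("Puppy_Purpose_0.69-BF16.gguf")  # hypothetical output filename

for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id"):
    field = reader.fields[key]
    # Scalar metadata values are stored in the field's last part.
    print(key, "=", int(field.parts[-1][0]))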

I see you've been cooking many experiments; anything hot so far, in your opinion?


Update:

I might fall asleep before uploading; if that happens, it will come in the morning.

Update:

It's all uploaded.

The Chaotic Neutrals org

Thank you so much @nbeerbower. I've been trying to keep all model files together, since sometimes NOT having them causes errors; I never thought including an official Meta Llama 3 file would break anything.

@Lewdiculous Not really, I just shit out a bunch of models on intuition and let others test them 😝

tbf, RP isn't my main goal and I don't want to use Llama in production for licensing reasons... but I've been focused on improving ChatML support and leaderboard performance (Llama 3 seems to really like learning languages)

@jeiku no prob m8, I spent hours this morning trying to get llama3 quants working and the tokenizer is definitely a huge pain in the ass lol
