General discussion.

#1
by Lewdiculous - opened

This is considered a highly experimental model.

Feedback for authors is welcome if you can test.

@Lewdiculous I'm seeing an error with mmproj. Pretty much every model I try from you, including Puppy, will output the full 500 tokens, with the output turning to junk after 150-200 tokens. The only model I'm not seeing this issue with is Poppy Porpoise 0.72. I'm wondering, when you do the quants now, are you still changing the config files? I suspect there is some error manifesting in multimodal due to a bad stopping string/EOT token.
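
For reference, a quick way to see which stop token a quant actually carries, since a wrong EOS/EOT in the metadata would explain runaway generations like this. This is only a sketch: it assumes the `gguf` Python package from llama.cpp's gguf-py, and the filename is a placeholder.

```python
# Dump the BOS/EOS token IDs stored in a GGUF's metadata.
# For Llama 3 Instruct chats, generation should stop on <|eot_id|> (ID 128009);
# if the quant only knows <|end_of_text|> (128001) as EOS, it can run on to junk.
from gguf import GGUFReader

reader = GGUFReader("Puppy-Q4_K_M-imat.gguf")  # placeholder filename

for key in ("tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id"):
    field = reader.fields.get(key)
    if field is None:
        print(f"{key}: <missing>")
        continue
    # Scalar fields keep their value in the part indexed by field.data[0].
    print(f"{key}: {field.parts[field.data[0]][0]}")
```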

I do change the configs with the llama-bpe configs, I'll try to use your original configs instead.

Oh man, don't worry about it, I was just checking to see if there was any difference between the Poppy and Puppy quants. I already tried a straight quant last night and it was borked, but I waited to see yours before drawing any conclusions. I'll talk to Nitral when he gets off work to see if there's anything I've missed.

That image... now I want to try this one too...

@Nitral-AI All good, just for reference, in Poppy Porpoise 0.72 I also used the llama-bpe configs.

The model name and the image are so good though. I hope Puppy-chan makes a comeback.

@Lewdiculous found the issue but no word on the cause:

(two screenshots attached)

Somehow my EOS token is wrong, I'll take a look at the files to see if I can fix easily.

I've checked the configs and they are identical. Downloading repo now to see if manually changing tokenizer_config.json will fix it.
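
If it helps, here's a quick way to compare what each HF-side config claims the EOS is before re-converting. Nothing special assumed, just plain JSON reads; the model directory is a placeholder.

```python
# Print the EOS setting from each config file in the model folder.
# config.json / generation_config.json store eos_token_id (an int or a list),
# while tokenizer_config.json stores eos_token as a string such as "<|eot_id|>".
import json
from pathlib import Path

model_dir = Path("models/Puppy")  # placeholder local path

for name in ("config.json", "generation_config.json", "tokenizer_config.json"):
    path = model_dir / name
    if not path.exists():
        print(f"{name}: not present")
        continue
    data = json.loads(path.read_text())
    print(name, "->", data.get("eos_token_id", data.get("eos_token")))
```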

Alright. If that does it you can catbox me the configs for the next run to be proper.

@Lewdiculous Do you remember how we had to delete tokenizer.model? Did you also have to add the new Llama 3 config files to get conversion working? I did, and I've pretty much narrowed it down to that being the point of failure. Still downloading the last file because I missed it in the pull, but I'll know in a few minutes.

I resolved the mismatch of the EOS token, but it did not solve the issue.

> Did you also have to add the new Llama 3 config files to get conversion working?

Not necessarily, I think it was possible without replacing them, basically using your files as they are. The issue was that leftover file.

But this makes sense to me:

(screenshot attached)
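
For anyone hitting the same thing, the leftover-file issue is easy to guard against before converting: Llama 3 uses a BPE tokenizer (tokenizer.json), so a stale SentencePiece tokenizer.model left behind by a merge can send the converter down the wrong vocab path. A minimal sketch; the model folder path is a placeholder.

```python
# Remove a stale SentencePiece tokenizer.model before running the HF -> GGUF conversion.
from pathlib import Path

model_dir = Path("models/Puppy")  # placeholder local path
leftover = model_dir / "tokenizer.model"

if leftover.exists():
    print("Removing leftover SentencePiece file:", leftover)
    leftover.unlink()
else:
    print("No tokenizer.model present, nothing to clean up.")
```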

I ended up getting a new Q4_K_M with the corrected EOS, but it didn't seem to fix the issue. I would ask you to gen one (because I feel like I'm doing it wrong), but I'm not sure it would fix anything. Nitral is testing with some other mmproj now and it's having the same issue. I don't really understand why this is happening, as I made no significant changes to anything.

> @Nitral-AI All good, just for reference, in Poppy Porpoise 0.72 I also used the llama-bpe configs.

Hopefully the model is stopping for GGUF users then :kek:

@Lewdiculous maybe you could link me to a doc or something that explains the proper way to quant. I followed a llama.cpp issue, but it was just convert and quantize, and as I understand it there should be another step now?

@jeiku

To be fair all they gave us was the PR, but things are explained at least.

There's no need to set the vocab type manually anymore.

I added some guidance to the script page, see the Llama-3 warning.

You should be good just following this.

During the model download portion, if you want to replace configs, you can do so at that time in the models/{model} folder and the rest of the process should continue. Use the lossless version.

https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script

Are you performing all steps manually?


Honestly, if the configs are already good, just download the model, convert it to a BF16 GGUF using the hf-gguf script, then convert that to the quants you want.
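
As a reference for that two-step flow, here's a rough sketch driven from Python. Script and binary names vary between llama.cpp versions (convert-hf-to-gguf.py vs convert_hf_to_gguf.py, quantize vs llama-quantize), so treat the names and paths below as placeholders for whatever your checkout actually ships.

```python
# HF model folder -> lossless BF16 GGUF -> quantized GGUF.
import subprocess

model_dir = "models/Puppy"          # placeholder HF model folder
bf16_out = "Puppy-BF16.gguf"        # lossless intermediate
quant_out = "Puppy-Q4_K_M.gguf"     # final quant

# Step 1: convert the HF safetensors to a lossless BF16 GGUF.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", model_dir,
     "--outtype", "bf16", "--outfile", bf16_out],
    check=True,
)

# Step 2: quantize the BF16 GGUF down to the target type.
subprocess.run(["./llama-quantize", bf16_out, quant_out, "Q4_K_M"], check=True)
```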

There's the hf-gguf update script to download the configs, but you're saying these are wrong?

@Nitral-AI - The thing is, they changed the config in the Instruct model but the llama "documentation" - PR - says to use the llama-bpe configs fetched by the ...update.py script.

I'd figure if things changed the script would reflect the new configs if they are necessary for proper functionality.
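
If anyone does want to swap in those reference configs, the overwrite is just a file copy into the model folder before converting. The update script (convert-hf-to-gguf-update.py in recent llama.cpp checkouts) drops per-tokenizer copies under models/tokenizers/; the paths below are assumptions about that layout, not something the script does for you.

```python
# Copy the llama-bpe reference tokenizer configs over the model's own files.
import shutil
from pathlib import Path

llama_bpe = Path("llama.cpp/models/tokenizers/llama-bpe")  # fetched by the update script
model_dir = Path("models/Puppy")                           # placeholder model folder

for name in ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"):
    src = llama_bpe / name
    if src.exists():
        shutil.copy2(src, model_dir / name)
        print(f"replaced {name}")
```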

This comment has been hidden

I asked about that but it's fine if y'all prefer the included/upstream configs, doesn't matter for the process, just something I need to be aware of and I'll do it accordingly. I don't expect either way to cause issues.

But I also don't want to run broken quants anyway, so I'm in to get things right. Though so far there haven't been any tokenizer/context formatting issues reported.

I found that it doesn't make a difference whether I run the quant you did or the quant where I fixed the EOS on Puppy; it handles text fine either way. The only issue is when running it with the mmproj. So go ahead and leave it up, I've been running your imat quant for two days now with no issues in the text output.

Hid my previous comment, it seemed a bit snappy, which was not my intention. Tested both config setups with exl2 and I'm seeing essentially identical stopping behavior, given the tokens act in a similar manner. Maybe it doesn't matter, but I haven't done any kind of extensive long-context testing beyond just checking for stopping behavior.
