GGUF (Q4_K_M only) outputs gibberish

#3
by sergkisel3v - opened

GGUF outputs random characters in koboldcpp 1.67 or the latest oobabooga.
Testing with a split between a Tesla P40 and RAM, using Q4_K_M.
Enabling flash attention or disabling MMQ doesn't help.

Probably a Qwen2 support problem in general?

related:
https://github.com/LostRuins/koboldcpp/issues/909
https://github.com/ggerganov/llama.cpp/issues/7939

> koboldcpp 1.72

Are you from the future? The latest version is 1.67... I converted it first to bf16 and then to Q6_K, and it works in kobold 1.65.
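
For reference, the two-step conversion I mean looks roughly like this (paths and filenames are placeholders; depending on your llama.cpp checkout the script may be named `convert-hf-to-gguf.py` and the quantizer just `quantize`):

```shell
# Step 1: convert the HF checkpoint to a bf16 GGUF
python convert_hf_to_gguf.py ./model-dir --outtype bf16 --outfile model-bf16.gguf

# Step 2: quantize the bf16 GGUF down to Q6_K with llama.cpp's quantize tool
./llama-quantize model-bf16.gguf model-Q6_K.gguf Q6_K
```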

ah yeah, 1.67 - typo.

I checked with the latest pure llama.cpp and it doesn't work either.

> The latest version is 1.67... I converted it first to bf16 and then to Q6_K, and it works in kobold 1.65.

Are you splitting between GPU and CPU? Without splitting it appears to work. Also, maybe it works only on RTX cards.

0 layers on GPU, CuBLAS enabled.

Probably it's a problem with split or GPU-only inference.
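
For anyone wanting to reproduce this configuration, a launch along these lines is what I mean (the model filename is just an example; `--usecublas` and `--gpulayers` are koboldcpp's flags for CuBLAS and GPU layer count):

```shell
# CuBLAS enabled but all layers kept on CPU/RAM (0 offloaded to GPU)
python koboldcpp.py --model model-Q4_K_M.gguf --usecublas --gpulayers 0
```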

Also, I downloaded the GGUF from the alpindale repo.


I tried with flash attention both enabled and disabled.

It seems yet another person has this problem here: https://github.com/LostRuins/koboldcpp/issues/909

He said the model outputs gibberish on Q4_K_M with or without offloading.

Maybe the problem is in the Q4_K_M quants. I can't test Q6_K.


Q4_K_M on a different PC, RAM only.

Doesn't work either.

IIRC, that ^ in textgen is a sampling-settings issue. I remember getting it last year.

FWIW, the EXL2 quants work; the <3.0 quants sometimes output random Chinese text. I made myself an EXL2 5BPW and it's great. Feels like a dumber version of claude3-opus.

> IIRC, that ^ in textgen is a sampling-settings issue. I remember getting it last year.

I tried different sampler settings on different backends. It doesn't work no matter what.

Btw, I switched to IQ4_XS and it works great. The model is really good too, better than the Llama finetunes, I think.

Cool, hopefully that helps others using GGUF ^

Yeah it's a great model. I don't like any of the llama3 models so far.

sergkisel3v changed discussion title from GGUF outputs gibberish to GGUF (Q4_K_M only) outputs gibberish

I get similar issues with EXL2 4.5BPW... Not complete gibberish, but it will often switch up PoV. For example, it will sometimes speak in the first person ("I look at you"), sometimes in the third person ("She speaks to him"), and sometimes just doesn't speak correctly ("adjusts hands. wrinkles skirt."). Then it will randomly spew math problems in the response, e.g. "I look at you and smile 3+2=5 squares finished 'Hi there'".

Some weird stuff going on haha

Oh, I haven't had that issue. I assume you're using ChatML like the model card says?

> I get similar issues with EXL2 4.5BPW... Not complete gibberish, but it will often switch up PoV, sometimes speaking in the first person ("I look at you"), sometimes in the third person ("She speaks to him"), and sometimes just not speaking correctly ("adjusts hands. wrinkles skirt."). Then it will randomly spew math problems in the response.
>
> Some weird stuff going on haha

I was having the exact same issues. I read elsewhere about changing up sampler settings, and what @alpindale posted in a separate thread is exactly what fixed it for me.

He recommends using only Min-P ~0.06 and temperature ~1.1.

What I did was neutralize all samplers, then set Min-P to 0.1, dynamic temperature 0.8-1.6 with exponent 1.45, smoothing factor 0.25 with smoothing curve 1.85, and temperature last.
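
For anyone unsure what the Min-P filter actually does: it drops every token whose probability is below `min_p` times the probability of the most likely token, then renormalizes and samples. A plain-Python sketch (the function name and structure are mine, not any backend's code):

```python
import math
import random

def min_p_sample(logits, min_p=0.1, temperature=1.0):
    """Sketch of min-p sampling over raw logits."""
    # Softmax with temperature applied first
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p filter: keep tokens with prob >= min_p * (top token's prob)
    cutoff = min_p * max(probs)
    filtered = [p if p >= cutoff else 0.0 for p in probs]
    s = sum(filtered)
    weights = [p / s for p in filtered]
    # Sample from the surviving tokens
    return random.choices(range(len(probs)), weights=weights)[0]
```

With a spiky distribution almost everything gets filtered out, which is why a low Min-P plus moderate temperature tends to cut the gibberish tail without flattening the model's choices.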

The model immediately became unbelievably good, whereas before, with my normal Llama 3 sampler settings, it just... felt very off, despite still seeing how smart it could be.
