q3_k_xs broken?

by FlareRebellion - opened

I gave these a quick test, and at first impression q3_k_xs seems entirely incoherent, outputting only stuff like:

"" sleepannWhyaked habitsovery Queifr Edd警 drew已 tries系 dit analysisrieve+ Ul теverter eat digit progressive lowest Int slight Sem attorneyked switchingesoCTION "

q3_k_s on the other hand seems okay-ish.

(as an aside, I also see the q5_k_s occasionally messing up proper names. It seems to like to say 'Dawn' instead of 'Daphne'. Maybe it's an effect of the importance matrix training on wiki.train.raw? https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1/train?q=daphne https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1/train?q=dawn)
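One rough way to sanity-check this guess is to count how often each name appears in the calibration text. Below is a minimal Python sketch; the helper name `count_names` and the inline sample are made up for illustration, and it assumes you would swap in the contents of a local copy of wiki.train.raw. A lopsided count would only be suggestive, not proof that the importance matrix causes the swap.

```python
import re
from collections import Counter


def count_names(text, names):
    """Count case-sensitive, whole-word occurrences of each name in text."""
    counts = Counter()
    for name in names:
        # \b keeps "Dawn" from matching inside longer words like "Dawning"
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, text))
    return dict(counts)


# Tiny stand-in sample (hypothetical). For a real check, read the file instead:
#   text = open("wiki.train.raw", encoding="utf-8").read()
sample = "At dawn, Dawn met Daphne. Dawn left before Daphne's class."
print(count_names(sample, ["Dawn", "Daphne"]))
```

The match is case-sensitive on purpose, so the common noun "dawn" is not counted as the name "Dawn".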

Thank you. I can reproduce the issue myself. I've reported it upstream and deleted the Q3_K_XS quants for now.

> (as an aside, I also see the q5_k_s occasionally messing up proper names. It seems to like to say 'Dawn' instead of 'Daphne'. Maybe it's an effect of the importance matrix training on wiki.train.raw? https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1/train?q=daphne https://huggingface.co/datasets/wikitext/viewer/wikitext-2-raw-v1/train?q=dawn)

That's possible. We're still figuring it out; see https://github.com/ggerganov/llama.cpp/discussions/5006#discussioncomment-8166807

Yeah, that's a good discussion, thanks for the link.

I was trying to be sneaky and reproduce the issue with the iq2_xs in koboldcpp, but I couldn't. I'm not sure why those Dawn <=> Daphne mix-ups happened. But I'm really impressed with the quality of the 2-bit quant:

Narrate these events in vivid detail: At dawn Hermione Granger tries to steal the dark magic tome from Daphne, a Slytherin student. She catches Hermione, is slightly amused that Hermione didn't just ask to borrow it, and comes up with a creative punishment for penance.
In the dim light of dawn, Hermione Granger crept silently through the corridors of Hogwarts Castle, her heart pounding in her chest as she approached the room belonging to Daphne Greengrass, a Slytherin student who was rumored to possess an ancient and powerful Dark Magic Tome. The book held secrets that could aid Hermione greatly in her studies at Hogwarts, but obtaining it would require subterfuge. Quietly, she tried the door handle, finding it unlocked. With practiced ease, she slipped inside, her eyes scanning the shelves until they landed on the coveted tome. It sat there innocently amongst other books, but its cover bore intricate runes etched into its leather-bound surface, hinting at the dangerous knowledge contained within.

Just then, Daphne entered the room, catching Hermione mid-snatch attempt. A sly smile curled onto Daphne's lips as she realized what had transpired. "Well, well, if it isn't dear Miss Granger," Daphne taunted softly, clearly entertained by the situation. Instead of anger or outrage, she seemed only mildly amused that Hermione hadn't simply asked to borrow it. 

Daphne approached Hermione slowly, leaving no option for escape. Her slender fingers wrapped around the precious tome, pulling it out of Hermione's grasp effortlessly. "You know," Daphne began thoughtfully, eyeing Hermione intently, "I believe I have a better plan than reporting you."

The next day, Hermione found herself standing before the entirety of their potions class, forced to wear a garish bumblebee costume while reciting the correct incantations for each potion brewed under Professor Snape's watchful gaze. She blushed furiously every time someone snickered at her expense, but kept her composure - all thanks to her unwavering determination and her desire not to let Daphne get the satisfaction of humiliating her completely.

As days turned into weeks, this punishment became Hermione's daily reality, and yet she persevered. Though embarrassed and frustrated, she never faltered in her dedication towards learning despite the added challenge. After all, sometimes the most significant lessons are learned outside the pages of textbooks.

Gave me a good chuckle. But even though I can offload many more layers to the GPU with the iq2, it's still just as slow, so it's not really worth it, I think. Still, getting all this out of so few bits is really neat.

Artefact2 changed discussion status to closed
