A few broken ones...

#485
by DavidAU - opened

I'll redo the first undi model and see how it goes, then look at the others. Even the original model page remarks that the k-quants seem to be broken, so maybe this is an issue with that model. Either way, it will have killed it...

The L3-Snowstorm IQ1_S is pure poetry. It should be used to write dialogue for The Stanley Parable:

Hi, I'm 33-year-old and I don't have a lot of friends in my circle. I am a lot of people's friend's in the past. In addition, I was a lot of times my favorite friend in the past and I was always a favorite of the past in the past in the past in the past

Q: What is my name? A: 13 letters in length Q: What is the first letter of the alphabet? A: Each of the above is a way to be able to get a good answer, the 13th, first and last letters of the alphabet are each different ways to be able to get a good answer to each of the above are different ways to be...

Mixtral-8x7B-MoE-RP-Story seems fixed, I'll redo the other one. Will also tell undi :)

Excellent.
I only tried one of the "snow storm" models - there are several, all quantized around the same time.

Thank you again.
This really helps.

Here is some info from the lab:

1B models -> Lowest so far: IQ2_S
Llama-3.2-1B-Instruct-NEO-SI-FI-IQ2_S
Operating at 280 T/S via LLAMA-SERVER.EXE

Funny thing is, the same holds for 8B L3/L3.1 (non-MoE): they need IQ2_S or higher.
Same for the Mistral Nemo 12Bs.
More testing here still required.

Other archs run at IQ1_S, but need IQ1_M unless you modify/simplify the instructions.
The critical break point seems to be at IQ2_XXS or IQ2_XS - the performance jump is large here
(relative to the IQ1s; you move into almost normal operation).

MoEs track very differently (all sizes: 4X7, 8X7; default of 2 experts).
IQ1_S - most work, especially newer quants, and can even be used at this level.
IQ1_M - medium jump in power...

However, what is interesting is how T/S changes.
You only drop about 1 T/S each time you move up a quant.
E.g. at IQ1_S you might get 46 T/S for a 4X7 (an 8X7 is just 2-3 T/S slower), then 45 for IQ1_M, 44 for IQ2_XXS ...
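
To make the "about 1 T/S per quant step" observation concrete, here is a rough sketch in Python. The first three numbers (46, 45, 44) are the ones quoted above for a 4X7 MoE; the ladder order, the function name `estimate_tps` and anything past IQ2_XXS are assumptions for illustration, not measurements.

```python
# Quant ladder from smallest to larger; the ordering past IQ2_XXS is an assumption.
QUANT_LADDER = ["IQ1_S", "IQ1_M", "IQ2_XXS", "IQ2_XS", "IQ2_S"]


def estimate_tps(quant: str, base_tps: float = 46.0, drop_per_step: float = 1.0) -> float:
    """Linear guess: lose `drop_per_step` tokens/sec per step up the ladder.

    base_tps is the IQ1_S figure quoted in the post for a 4X7 MoE.
    """
    steps = QUANT_LADDER.index(quant)  # raises ValueError for an unknown quant
    return base_tps - drop_per_step * steps


if __name__ == "__main__":
    for q in QUANT_LADDER:
        print(f"{q:8s} ~{estimate_tps(q):.0f} T/S")
```

Values for IQ2_XS and IQ2_S are pure extrapolation from the three quoted points, so treat them as a starting guess to check against a real run.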

Next up: speculative decoding; that should break some stuff for sure!

F!uck me ... OMFG - THIS is spec decoding, at 30 T/S (TWO 8B models loaded):

(L3-8B models - Dark Planet 8B IQ4_XS + Dark Planet 8B Neo Imatrix IQ2_S)

I stood frozen at my desk, staring into the abyss of the office park outside our high-rise building. My fingers were still poised above the keyboard, the cursor blinking mockingly at me like some malignant eye. The air was thick with tension, the kind that precedes an announcement no one wants to hear. But this was different. This was personal.

As I gazed out over the sea of cubicles and fluorescent lights, something began to rumble beneath my feet, like thunder growing nearer. A low, ominous hum filled my ears, making my teeth ache. Suddenly, the sky scraper swayed, as if buffeted by a storm force wind. I grabbed onto the edge of my desk for support, my heart racing with alarm.

My coworkers gasped and screamed as we all stumbled about, our chairs scraping against tile floors. Glass shattered somewhere nearby, and a panicked shout rose up from below. It felt like a seismic event, a catastrophic shift in the earth's crust that had nothing to do with geology.

Then came the blast.
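
For anyone wondering what the two-model setup above is doing: in speculative decoding a small draft model (presumably the IQ2_S quant here) guesses a few tokens ahead, and the larger target model (presumably the IQ4_XS) checks them. Below is a minimal greedy sketch in Python. The "models" are toy functions standing in for real ones, and all names (`speculative_decode`, `greedy_decode`, `target`, `draft`) are made up for illustration; this is not how llama.cpp implements it, just the basic accept/reject idea.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding, used as the reference result."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies."""
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1) The cheap draft model proposes up to k tokens.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks each proposed token. A real engine does this
        #    in one batched pass, which is where the speedup comes from.
        for tok in proposal:
            expected = target(out)
            if tok == expected:
                out.append(tok)       # accepted draft token
            else:
                out.append(expected)  # rejected: keep the target's token
                break
            if len(out) >= goal:
                break
    return out[:goal]


# Toy stand-ins (assumptions): target counts up mod 10, draft is sometimes wrong.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(target, draft, [0], 6))
```

The point of the design is that the output matches what the target model would have produced on its own with greedy decoding; the draft model only affects speed, not the text. Sampling-based variants use a probabilistic accept rule instead of exact match, which this sketch leaves out.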
