Very sensitive to any repetition penalty!

by jukofyork - opened

If anybody tries to say the quants are broken - tell them to try a much lower (or no) repetition penalty...

Even with 1.05 it does all sorts of strange stuff like stopping mid-sentence, etc., and with a default of 1.1 or 1.2 it's hilariously lazy and bad! (See my post at the bottom of the PR.)
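For anyone wondering why such a small number can matter, here is a rough sketch of how a repetition penalty is commonly applied to the logits before sampling. It assumes the usual CTRL-style rule (divide positive logits of already-seen tokens by the penalty, multiply negative ones), which may not match every backend exactly. The logits and token ids are made up for illustration and are not taken from this model.

```python
import math

def apply_repetition_penalty(logits, recent_tokens, penalty):
    """Return a copy of `logits` with the penalty applied to tokens already seen.

    Assumed rule (CTRL-style): positive logits are divided by `penalty`,
    negative logits are multiplied by it, so both move toward "less likely".
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-token vocabulary; token 0 was generated recently.
logits = [5.0, 4.5, 1.0, -2.0]
recent = [0]

for penalty in (1.0, 1.05, 1.1, 1.2):
    probs = softmax(apply_repetition_penalty(logits, recent, penalty))
    print(f"penalty={penalty}: p(token 0)={probs[0]:.3f}")
```

In this toy case the previously seen token stops being the top choice once the penalty reaches 1.2, because its logit drops below the runner-up. If the model's top candidates sit close together, even a small penalty can flip the choice on common tokens such as punctuation, which would be consistent with the odd behaviour described above.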

@jukofyork Yes, the model is quite sensitive. I added the command I used for testing, which appears to produce reliable results except for IQ1_S. I think it may be worth generating the imatrix on the FP16 weights and for longer than 200 chunks.

I'm just downloading your IQ4_XS to try vs the Q4_0 I got from phymbert's repo.

It will likely be next week before I get a chance to download and recreate the imatrix and try Q4_K_M and/or Q5_K_M (which should just fit in 96GB VRAM).
