Very sensitive to any repetition penalty!
If anybody tries to say the quants are broken - tell them to try a much lower (or no) repetition penalty...
Even with 1.05 it does all sorts of strange stuff like stopping mid-sentence, etc., and with a default of 1.1 or 1.2 it's hilariously lazy and bad! (see my post at the bottom of the PR).
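For anyone wondering why such small values matter: below is a minimal sketch of how a repetition penalty is commonly applied to logits before sampling. It assumes the usual CTRL-style rule (divide positive logits, multiply negative ones); the function names and the toy logits are hypothetical and not taken from this PR, and the sampler actually in use may differ in detail.

```python
import math

def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of logits with the penalty applied to already-seen tokens."""
    out = list(logits)
    for tok in set(previous_tokens):
        # Positive logits are divided and negative ones multiplied, so a
        # seen token always becomes less likely when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax(logits):
    """Convert logits to probabilities (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a 3-token vocabulary; token 0 was already generated.
logits = [2.0, 1.0, 0.5]
for penalty in (1.0, 1.05, 1.2):
    probs = softmax(apply_repetition_penalty(logits, [0], penalty))
    print(f"penalty={penalty}: p(token 0)={probs[0]:.3f}")
```

The penalty hits every token already in the context, including common ones like punctuation and function words, which is one plausible reason a model that is sensitive to it starts cutting its output short.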
@jukofyork Yes, the model is quite sensitive. I added the command I used for testing, which appears to produce reliable results except for IQ1_S. I think it may be worth generating the imatrix on the FP16 weights and for longer than 200 chunks.
I'm just downloading your IQ4_XS to try vs the Q4_0 I got from phymbert's repo.
It will likely be next week before I get a chance to download and recreate the imatrix and try Q4_K_M and/or Q5_K_M (which should just fit in 96GB VRAM).