Very sensitive to any repetition penalty!

by jukofyork - opened Apr 13, 2024

Apr 13, 2024

If anybody tries to say the quants are broken - tell them to try a much lower (or no) repetition penalty...

Even with 1.05 it does all sorts of strange stuff like stopping mid-sentence, etc., and with a default of 1.1 or 1.2 it's hilariously lazy and bad! (See my post at the bottom of the PR.)
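
For anyone wondering why such a small number can matter, here is a minimal sketch of the commonly used repetition-penalty scheme: the logit of every token already in the context is divided by the penalty if positive, or multiplied by it if negative. This is an illustration under that assumption, not the exact code of any particular backend, and the toy logits and function names are made up for the example.

```python
import math


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of `logits` with already-seen token ids penalized."""
    out = list(logits)
    for tok in set(previous_tokens):
        # Positive logits are divided and non-positive ones multiplied, so a
        # seen token always becomes less likely when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical 4-token vocabulary; token 0 has already been generated.
    logits = [4.0, 3.5, 1.0, -2.0]
    seen = [0]
    for penalty in (1.0, 1.05, 1.2):
        probs = softmax(apply_repetition_penalty(logits, seen, penalty))
        print(penalty, [round(p, 3) for p in probs])
```

Because the penalty scales the logit itself, its effect grows with the size of the logit, so a model that produces large logits can have its top choice displaced by even a small value.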

dranger003

Owner Apr 13, 2024

edited Apr 13, 2024

@jukofyork Yes, the model is quite sensitive. I added the command I used for testing, which appears to produce reliable results except for IQ1_S. I think it may be worth generating the imatrix on the FP16 weights and for longer than 200 chunks.

jukofyork

Apr 13, 2024

I'm just downloading your IQ4_XS to try vs the Q4_0 I got from phymbert's repo.

It will likely be next week before I get a chance to download and recreate the imatrix and try Q4_K_M and/or Q5_K_M (which should just fit in 96GB VRAM).
