Anyone else encountering bad quantized(?) performance with Llama3-70B?

#37 opened by philjd

I've been trying to run Llama3 70B with int8 and NF4 quantization on a single A100, but the outputs seem to be quite broken.
Is anybody else encountering similar issues?

Example breakages include doubled commas, dates inserted in random places (even when, e.g., asking for a poem), and repeated words.

I've found a few other threads that suggest the Llama3 models may be particularly sensitive to quantization.
Unfortunately, I don't have a machine that can run the bfloat16 version.
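
For reference, here's a minimal sketch of how I'm loading the model (this assumes bitsandbytes quantization through transformers; the model id and the prompt are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder; adjust to the checkpoint you use

# NF4 (4-bit) quantization config; for the int8 case use load_in_8bit=True instead
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Write a short poem about the sea.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```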

I'm seeing the same issue with int8 quantization.

One workaround is to enable group-wise quantization with a group size of 128 or 64.
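
For example, GPTQ is one quantization scheme that exposes a group-size knob; a rough sketch using transformers' GPTQConfig (the checkpoint and calibration dataset here are just illustrative choices, not what the commenter necessarily used):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ with group-wise quantization; try group_size=128 or 64
gptq_config = GPTQConfig(
    bits=4,
    group_size=128,
    dataset="c4",        # calibration data used during quantization
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)
```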
