did you have to do anything special to quantize this one?

#1
by gghfez

Hey mate, I was trying to quantize this one yesterday with the convert.py script in exllamav2, but it always failed with this error:

 -- Resuming job
 !! Note: Overriding options with settings from existing job
 -- Input: llama-3-70B-Instruct-abliterated/
 -- Output: llama-3-70B-Instruct-abliterated-wip
 -- Using default calibration dataset
 -- Target bits per weight: 8.0 (decoder), 6 (head)
 -- Max shard size: 8192 MB
 -- Full model will be compiled to: exl2/llama-3-70B-Instruct-abliterated-exl2-8BPW
 -- Quantizing...
 -- Layer: model.layers.0 (Attention)
 -- Linear: model.layers.0.self_attn.q_proj -> 1:6b_32g s4, 6.13 bpw
 -- Linear: model.layers.0.self_attn.k_proj -> 1:6b_32g s4, 6.16 bpw
 !! Warning, difference of (0.015625, 0.015625) between unpacked and dequantized matrices
 -- Linear: model.layers.0.self_attn.v_proj -> 1:8b_32g s4, 8.16 bpw
 ## Quantization error (2)
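
For reference, a job with those settings would normally be launched with something along these lines (a rough sketch; the flag names are from exllamav2's convert.py and can change between versions, so check python convert.py --help). Rerunning the same command with the same -o directory is what produces the "Resuming job" line at the top of the log.

 # fp16 input, working directory for job state, and final compiled output dir;
 # -b is the target decoder bits per weight, -hb the bits for the output head
 python convert.py \
     -i llama-3-70B-Instruct-abliterated/ \
     -o llama-3-70B-Instruct-abliterated-wip \
     -cf exl2/llama-3-70B-Instruct-abliterated-exl2-8BPW \
     -b 8.0 \
     -hb 6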

Never had an issue with this before. Did you have to do anything special for this model to make it work?

Nothing special. Just make sure you have the latest exllamav2 version. It's also possible your fp16 download of the original model is corrupted, so you may need to download it again.
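
If you want to rule out a corrupted fp16 download before re-quantizing (updating exllamav2 itself is usually just pip install -U exllamav2, or a git pull plus reinstall for a source build), one option (a sketch, not something from this thread) is to hash each local safetensors shard and compare it against the SHA256 shown in the LFS details on the original model repo's file listing; any mismatch means that shard needs to be fetched again, e.g. with huggingface-cli download.

 import hashlib
 import pathlib

 # Print a SHA256 per shard; compare each against the value shown under
 # "LFS" for that file on the original model's repo page. A mismatch
 # means the local copy is corrupted or truncated.
 model_dir = pathlib.Path("llama-3-70B-Instruct-abliterated")
 for shard in sorted(model_dir.glob("*.safetensors")):
     digest = hashlib.sha256()
     with shard.open("rb") as f:
         for chunk in iter(lambda: f.read(1 << 20), b""):
             digest.update(chunk)
     print(shard.name, digest.hexdigest())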
