GPTQ Query

#3
by dsv147 - opened

Great model, thank you for it. If at all possible, please quantize it in GPTQ 4-bit. GPTQ is more resource-efficient than GGUF; I could fit twice as much context with a GPTQ model.
Thank you for your work
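
For reference, this is roughly what a 4-bit GPTQ quantization looks like with the AutoGPTQ library. A minimal sketch, assuming `TeeZee/DarkForest-20B-v2.0` as the source repo; the group size, calibration text, and output path are assumptions, not anything confirmed in this thread:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "TeeZee/DarkForest-20B-v2.0"  # assumed source repo id

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit, as requested
    group_size=128,  # common default; smaller groups trade VRAM for accuracy
    desc_act=False,  # act-order off: faster inference, slightly lower quality
)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# GPTQ is calibration-based: it quantizes against a set of tokenized samples
examples = [tokenizer("A representative calibration sentence.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("DarkForest-20B-v2.0-GPTQ", use_safetensors=True)
```

In practice the calibration set is usually a few hundred samples of text resembling the intended workload, not the single placeholder shown here.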

I don't need it anymore. I'm now using DarkForest-20B-v2.0-bpw4.0-h6-exl2 with ExLlamav2 at an 18944 context size on a 3090:
```
19:37:17-538056 INFO Loading "TeeZee_DarkForest-20B-v2.0-bpw4.0-h6-exl2"
19:37:25-196493 INFO LOADER: "ExLlamav2_HF"
19:37:25-198493 INFO TRUNCATION LENGTH: 18944
19:37:25-198493 INFO INSTRUCTION TEMPLATE: "Alpaca"
19:37:25-199492 INFO Loaded the model in 7.66 seconds.
Output generated in 3.74 seconds (11.23 tokens/s, 42 tokens, context 1152, seed 193801079)
```
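
For anyone who wants to load the same exl2 quant outside text-generation-webui, here is a minimal sketch with the exllamav2 Python API. The model path, sampler settings, and prompt are assumptions; the context length and Alpaca template are taken from the log above:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/TeeZee_DarkForest-20B-v2.0-bpw4.0-h6-exl2"  # assumed path
config.prepare()
config.max_seq_len = 18944  # context length reported in the log

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread layers across available VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # assumed sampler values
settings.top_p = 0.9

# Alpaca-style prompt, matching the instruction template from the log
prompt = "### Instruction:\nHello\n\n### Response:\n"
print(generator.generate_simple(prompt, settings, num_tokens=128))
```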

dsv147 changed discussion status to closed
