GPTQ Query

#3
by dsv147 - opened

Great model, thank you for it. If at all possible, please quantize it in GPTQ 4-bit. GPTQ is more resource-efficient than GGUF; I could fit twice as much context with a GPTQ model.
Thank you for your work
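
For reference, this is roughly what a 4-bit GPTQ quantization looks like with the AutoGPTQ library. A minimal sketch, assuming `TeeZee/DarkForest-20B-v2.0` as the source repo; the group size, calibration text, and output path are assumptions, not anything confirmed in this thread:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "TeeZee/DarkForest-20B-v2.0"  # assumed source repo id

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit, as requested
    group_size=128,  # common default; smaller groups trade VRAM for accuracy
    desc_act=False,  # act-order off: faster inference, slightly lower quality
)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# GPTQ is calibration-based: it quantizes against a set of tokenized samples
examples = [tokenizer("A representative calibration sentence.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("DarkForest-20B-v2.0-GPTQ", use_safetensors=True)
```

In practice the calibration set is usually a few hundred samples of text resembling the intended workload, not the single placeholder shown here.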

I don't need it anymore. I'm now using DarkForest-20B-v2.0-bpw4.0-h6-exl2 with ExLlamav2 at an 18944 context size on a 3090:
```
19:37:17-538056 INFO Loading "TeeZee_DarkForest-20B-v2.0-bpw4.0-h6-exl2"
19:37:25-196493 INFO LOADER: "ExLlamav2_HF"
19:37:25-198493 INFO TRUNCATION LENGTH: 18944
19:37:25-198493 INFO INSTRUCTION TEMPLATE: "Alpaca"
19:37:25-199492 INFO Loaded the model in 7.66 seconds.
Output generated in 3.74 seconds (11.23 tokens/s, 42 tokens, context 1152, seed 193801079)
```
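
For anyone who wants to load the same exl2 quant outside text-generation-webui, here is a minimal sketch with the exllamav2 Python API. The model path, sampler settings, and prompt are assumptions; the context length and Alpaca template are taken from the log above:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/TeeZee_DarkForest-20B-v2.0-bpw4.0-h6-exl2"  # assumed path
config.prepare()
config.max_seq_len = 18944  # context length reported in the log

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread layers across available VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # assumed sampler values
settings.top_p = 0.9

# Alpaca-style prompt, matching the instruction template from the log
prompt = "### Instruction:\nHello\n\n### Response:\n"
print(generator.generate_simple(prompt, settings, num_tokens=128))
```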

dsv147 changed discussion status to closed
