Any chance of a 3.5bpw?
#1 opened by smcleod
Howdy Mike,
Thanks for quantising these models, it's really appreciated!
I was just wondering if there's any chance you'd be able to do a 3.5bpw?
I have 1x RTX 3090 and 2x RTX A4000, which gives me a total of 56GB of VRAM. I figure 3.5bpw would be about as high as I could go with ~16-32K of context (4-bit cache).
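As a rough sanity check on that 56GB budget, the sketch below estimates weight and KV-cache memory. It assumes Command R+ 08-2024 has about 104B parameters, 64 layers, 8 KV heads and a head dimension of 128 (recalled from the published model config, not stated in this thread), and the helper names are hypothetical. It ignores activation buffers, per-GPU overhead and the scale factors a quantized cache also stores, so real usage will be somewhat higher.

```python
# Rough VRAM estimate for an EXL2 quant plus a 4-bit KV cache.
# ASSUMPTION: the model dimensions below are approximate and not taken from this thread.

GIB = 2**30

def weight_gib(n_params: float, bpw: float) -> float:
    """Weight memory in GiB: parameters x bits per weight, converted to bytes."""
    return n_params * bpw / 8 / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bits_per_elem: float) -> float:
    """KV-cache memory in GiB: one K and one V vector per layer, KV head and token."""
    elems = 2 * n_layers * n_kv_heads * head_dim * ctx_len
    return elems * bits_per_elem / 8 / GIB

# Hypothetical Command R+ dimensions (assumed; check the model's config.json).
N_PARAMS = 104e9
N_LAYERS = 64
N_KV_HEADS = 8
HEAD_DIM = 128

budget = 24 + 16 + 16  # RTX 3090 + 2x RTX A4000, in GiB
weights = weight_gib(N_PARAMS, 3.5)
cache = kv_cache_gib(N_LAYERS, N_KV_HEADS, HEAD_DIM, 32 * 1024, 4)

print(f"weights ~{weights:.1f} GiB, 32K Q4 cache ~{cache:.1f} GiB, "
      f"headroom ~{budget - weights - cache:.1f} GiB of {budget} GiB")
```

Under these assumptions the weights come to roughly 42.4 GiB and a 32K 4-bit cache to about 2 GiB, leaving around 11-12 GiB spread across three cards for everything else.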
Starting now. Do you want h6 or h8 (6- or 8-bit output head)?
Legend! I think h6 would be fine. Thanks 🙏
Glad you said that, because h6 is what I decided to upload. It should finish uploading in ~2 hours and will be accessible here once complete: https://huggingface.co/MikeRoz/c4ai-command-r-plus-08-2024-3.5bpw-h6-exl2
Thanks, I really appreciate that.
It's up.
MikeRoz changed discussion status to closed
Fits like a glove with 32K context!
It's not the fastest, but it works pretty well:
INFO: Metrics (ID: e00542d82c61496093eea52f00e3b6c0): 603 tokens generated in 70.06 seconds (Queue: 0.0 s, Process: 0 cached tokens and 1814 new tokens at 299.96 T/s, Generate: 9.42 T/s, Context: 1814 tokens)
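The metrics line can be sanity-checked with a little arithmetic. Assuming the total time is just prompt processing (1814 tokens at 299.96 T/s) plus generation (603 tokens at 9.42 T/s), with no other overhead (the queue time is 0.0 s), the two phases should add up to the reported 70.06 s:

```python
# Numbers copied from the INFO: Metrics line above.
PROMPT_TOKENS = 1814
PROMPT_TPS = 299.96
GEN_TOKENS = 603
GEN_TPS = 9.42
REPORTED_TOTAL_S = 70.06

prefill_s = PROMPT_TOKENS / PROMPT_TPS  # time spent processing the prompt
decode_s = GEN_TOKENS / GEN_TPS         # time spent generating new tokens
total_s = prefill_s + decode_s          # assumes no other overhead

print(f"prefill {prefill_s:.2f} s + decode {decode_s:.2f} s = {total_s:.2f} s "
      f"(reported {REPORTED_TOTAL_S} s)")
```

The totals agree (about 6.05 s + 64.01 s), so the "not the fastest" impression comes almost entirely from the ~9.4 T/s generation rate rather than from prompt processing.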
Thanks again!