Quantized GGUF or exl2?

#1 by MLDataScientist - opened

Hi @Undi95,

Thank you for uploading this. Do you know of any quantized versions of DBRX Instruct? It has been two days since the release, but there are still no quantized GGUF or exl2 versions of this model. I have 36GB of VRAM and 96GB of RAM. It would be interesting to run the model locally (e.g. Q3_K_M, or exl2 at 3.0 bpw).

Thanks!

Owner

https://github.com/ggerganov/llama.cpp/issues/6344

Llama.cpp doesn't support it yet, so no GGUF.
I don't think ExLlama supports it either, sadly.

That is very unfortunate. Looking forward to a 4-bit quantization to run it locally.

Turboderp recently added support for DBRX! ExLlamaV2 should now work with it:
https://github.com/turboderp/exllamav2/issues/388#issuecomment-2027971687

Turbo also uploaded a bunch of quants for both base and instruct models:
https://huggingface.co/turboderp/dbrx-base-exl2
https://huggingface.co/turboderp/dbrx-instruct-exl2
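For anyone wanting to try those quants, here is a minimal sketch of loading one with the ExLlamaV2 Python API and generating from a raw prompt. The local directory, branch name, prompt, and sampler values are placeholders, not anything from this thread, so adjust them to whatever you actually download:

```python
# Minimal sketch: load a DBRX exl2 quant with ExLlamaV2 and generate.
# Assumes a quant branch (e.g. 3.0bpw) has already been downloaded locally,
# for example with:
#   huggingface-cli download turboderp/dbrx-instruct-exl2 \
#       --revision 3.0bpw --local-dir ./dbrx-instruct-3.0bpw

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./dbrx-instruct-3.0bpw"  # placeholder path to the quant
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache so autosplit can size it
model.load_autosplit(cache)               # spread weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple("What is DBRX?", settings, 200))
```

Note that generate_simple sends the prompt as-is; for the instruct model you would normally apply its chat template to the prompt first.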

Owner

Thanks for the heads up!
