Quantized GGUF or exl2?

#1 by MLDataScientist - opened

Hi @Undi95,

Thank you for uploading this. Do you know of any quantized versions of DBRX Instruct? It has been two days since the release, but there are still no quantized GGUF or exl2 versions of this model. I have 36GB of VRAM and 96GB of RAM. It would be interesting to run the model locally (e.g. Q3_K_M, or exl2 at 3.0 bpw).

Thanks!

Owner

https://github.com/ggerganov/llama.cpp/issues/6344

Llama.cpp doesn't support it yet, so no GGUF.
I don't think ExLlama supports it either, sadly.

That is very unfortunate. Looking forward to a 4-bit quantization to run it locally.

Turboderp recently added support for DBRX! ExLlamaV2 should now work with it:
https://github.com/turboderp/exllamav2/issues/388#issuecomment-2027971687

Turbo also uploaded a bunch of quants for both base and instruct models:
https://huggingface.co/turboderp/dbrx-base-exl2
https://huggingface.co/turboderp/dbrx-instruct-exl2
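For anyone wanting to try those quants, here is a minimal sketch of loading one with the ExLlamaV2 Python API and generating from a raw prompt. The local directory, branch name, prompt, and sampler values are placeholders, not anything from this thread, so adjust them to whatever you actually download:

```python
# Minimal sketch: load a DBRX exl2 quant with ExLlamaV2 and generate.
# Assumes a quant branch (e.g. 3.0bpw) has already been downloaded locally,
# for example with:
#   huggingface-cli download turboderp/dbrx-instruct-exl2 \
#       --revision 3.0bpw --local-dir ./dbrx-instruct-3.0bpw

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./dbrx-instruct-3.0bpw"  # placeholder path to the quant
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache so autosplit can size it
model.load_autosplit(cache)               # spread weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.top_p = 0.9

print(generator.generate_simple("What is DBRX?", settings, 200))
```

Note that generate_simple sends the prompt as-is; for the instruct model you would normally apply its chat template to the prompt first.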

Owner

Thanks for the heads up!
