Can you provide a quantized version? For example, one that can be run quantized through llama.cpp.
#22 opened by edisonzf2020
Hi @edisonzf2020, thanks for your question!
We are working with the community on enabling more quantized versions of the model. A few examples to follow:
- The MLX community has a 4-bit quantized version (https://huggingface.co/mlx-community/dbrx-instruct-4bit) that can run on a beefy Apple M2 chip (see the sketch after this list)
- We are working with the llama.cpp folks to enable DBRX; you can follow the work here: https://github.com/ggerganov/llama.cpp/issues/6344
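
For the MLX route, here is a minimal sketch of loading the 4-bit weights with the `mlx-lm` package; the prompt and generation parameters are illustrative assumptions, not part of the official DBRX docs:

```python
# A minimal sketch of running the community 4-bit DBRX on Apple Silicon.
# Assumes `pip install mlx-lm` and enough unified memory for the 4-bit weights;
# the prompt and sampling parameters below are illustrative, not prescriptive.
from mlx_lm import load, generate

# Downloads the quantized weights from the Hub on first use.
model, tokenizer = load("mlx-community/dbrx-instruct-4bit")

prompt = "Explain mixture-of-experts models in two sentences."
# `generate` returns the completion as a string; verbose=True also streams it.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```

Once llama.cpp support lands (see the issue linked above), the equivalent path would be a GGUF conversion of the weights loaded through llama.cpp or its Python bindings.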
I'll close this discussion for now, but please re-open it if the approaches above don't answer your question.
hanlintang changed discussion status to closed