Can you provide a quantized version? For example, one that can be run quantized through llama.cpp.
#22 opened by edisonzf2020
Hi @edisonzf2020, thanks for your question!
We are working with the community on enabling more quantized versions of the model. A few examples to follow:
- The MLX community has a 4-bit quantized version (https://huggingface.co/mlx-community/dbrx-instruct-4bit) that can run on a beefy Apple M2 chip (see the sketch after this list)
- We are working with the llama.cpp folks to enable DBRX; you can follow the work here: https://github.com/ggerganov/llama.cpp/issues/6344
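
For the MLX route, here is a minimal sketch of loading the 4-bit weights with the `mlx-lm` package; the prompt and generation parameters are illustrative assumptions, not part of the official DBRX docs:

```python
# A minimal sketch of running the community 4-bit DBRX on Apple Silicon.
# Assumes `pip install mlx-lm` and enough unified memory for the 4-bit weights;
# the prompt and sampling parameters below are illustrative, not prescriptive.
from mlx_lm import load, generate

# Downloads the quantized weights from the Hub on first use.
model, tokenizer = load("mlx-community/dbrx-instruct-4bit")

prompt = "Explain mixture-of-experts models in two sentences."
# `generate` returns the completion as a string; verbose=True also streams it.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(response)
```

Once llama.cpp support lands (see the issue linked above), the equivalent path would be a GGUF conversion of the weights loaded through llama.cpp or its Python bindings.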
I'll close this discussion for now, but please re-open it if the approaches above don't answer your question.
hanlintang changed discussion status to closed