GPTQ version?

#4
opened by mer0mingian

Hi jphme,
thanks for providing the model! Would it be possible to provide a GPTQ-quantized version, e.g. in collaboration with TheBloke?
For people who want to run the model cost-effectively and don't have their own hardware to host it, this would remove many barriers.
Cheers

We're planning on doing a GGUF conversion, if that helps? Should probably be ready tomorrow.

Thanks. Haven't worked with that, but happy to try. :)

@mer0mingian here we go :) https://huggingface.co/morgendigital/Llama-2-13b-chat-german-GGUF/tree/main

Either run inference with llama.cpp directly, or use one of the popular tools like text-generation-webui, koboldcpp, etc.
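
If you haven't worked with GGUF before, here's a minimal sketch of local inference using the llama-cpp-python bindings. The model filename is an assumption; check the repo's file list and pick the quantization level you want (e.g. Q4_K_M):

```python
from llama_cpp import Llama

# Load the downloaded GGUF file. The exact filename is an assumption --
# substitute whichever quantized file you pulled from the repo above.
llm = Llama(
    model_path="llama-2-13b-chat-german.Q4_K_M.gguf",
    n_ctx=2048,  # context window size
)

# Run a single completion; stop once the model starts a new question.
output = llm(
    "Frage: Was ist die Hauptstadt von Deutschland? Antwort:",
    max_tokens=64,
    stop=["Frage:"],
)
print(output["choices"][0]["text"])
```

Lower-bit quantizations trade some quality for less RAM, so a 13B model at Q4 should fit comfortably on most consumer machines, even CPU-only.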
