ask #12
opened by ReD2401

Hello TheBloke, I hope you are well.

Thank you for all your effort! \o/

I'm happy to take a look. There are some complications with GPT-J models: llama.cpp can't load them, and the latest and best 4-bit CPU quantisation code doesn't work with them either. It would be possible to use them in GPT4ALL-Chat, though, or there's a CLI version that also supports GPT-J on CPU.
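
If it helps, here's a rough sketch of what CPU inference via the gpt4all Python bindings looks like. The model filename is just an example GPT-J-based GGML file from the GPT4All catalogue, not the model from this thread:

```python
# Minimal sketch using the gpt4all Python bindings (pip install gpt4all).
# The filename is an example GPT-J-based GGML model from the GPT4All
# catalogue, not this thread's model -- swap in whatever GGML file you have.
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy.bin")  # GPT-J architecture, runs on CPU

with model.chat_session():
    reply = model.generate("What does 4-bit quantisation do?", max_tokens=128)
    print(reply)
```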

GPTQ 4-bit for GPU should be possible, though it's not as well supported.
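
If you want to try the GPU route yourself, here's a rough sketch of 4-bit GPTQ quantisation through the GPTQConfig integration in transformers (it needs optimum and auto-gptq installed, and the EleutherAI/gpt-j-6b model ID is just a stand-in):

```python
# Rough sketch of 4-bit GPTQ quantisation via transformers (needs optimum + auto-gptq).
# "EleutherAI/gpt-j-6b" is a stand-in model ID -- point this at the repo you want.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantise to 4 bits, calibrating on samples from the C4 dataset.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",                 # quantisation itself runs on the GPU
    quantization_config=quant_config,
)

# Save the quantised weights for later GPU inference.
model.save_pretrained("gpt-j-6b-GPTQ-4bit")
tokenizer.save_pretrained("gpt-j-6b-GPTQ-4bit")
```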

I will give it a go and let you know!

Thank you so much, I really appreciate it!

Why did you edit the model out of your comment? Do you not want me to look at it any more? Or did someone else do it?
