Released a 4-bit GPTQ to make it easier for folks to try it out!

#3
by flashvenom - opened

Nah, the ggml (llama.cpp) version is far better: it can use a range of precisions such as 2, 3, 4, 5, 6, or 8 bit, and it still works on any PC, even without a GPU, unlike that ancient 4-bit GPTQ.

flashvenom changed discussion status to closed

Thanks @flashvenom, that was fast. Somehow the URL gives a 404 error.

I deleted mine since @TheBloke released models as well. No point having two copies to confuse folks.
