
Necessary Hardware for Operating the 34B Model

#40
by blurjp - opened

I currently use a 4090, but the inference process is extremely slow. Is it impractical to expect this model to run efficiently on just a single 4090?

Did you solve that? I have the same problem.

You can use these 2-bit versions made with QuIP#. Inference is slower than usual, but it should work on a single 4090: at 2 bits per weight, a 34B model's weights take roughly 8.5 GB, well within the 4090's 24 GB of VRAM (at fp16 they would need about 68 GB). See the loading sketch after the links.

https://huggingface.co/KnutJaegersberg/Tess-M-34B-2bit
https://huggingface.co/KnutJaegersberg/orca-mini-70b-2bit
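
In case it helps anyone landing here, this is a minimal loading sketch, assuming the checkpoint can be loaded through plain transformers with trust_remote_code (QuIP# checkpoints sometimes need the quip-sharp package instead, so check the model card for exact instructions):

```python
# Sketch: loading a 2-bit QuIP# checkpoint on a single GPU.
# Assumes the repo ships its dequantization code for trust_remote_code;
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KnutJaegersberg/Tess-M-34B-2bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # compute dtype; the weights stay 2-bit
    device_map="auto",          # place the model on the 4090
    trust_remote_code=True,     # QuIP# loading lives in custom repo code
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```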
