Model's VRAM Utilisation

#10 by thefaheem

I just want to know how much VRAM is required to run this model.

Can anyone tell me?

9 GB to 10 GB of VRAM for a 13B 4-bit model.
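For context, a rough back-of-the-envelope estimate (mine, not from the thread): 13B parameters at 4 bits each come to about 6.5 GB for the weights alone, and the KV cache, activations, and CUDA context add a couple more gigabytes, which lands in the 9 to 10 GB range quoted above. A minimal sketch, where the overhead figure is an assumption and real usage varies with context length and batch size:

```python
# Rough VRAM estimate for a quantized model. This is a sketch, not a
# measurement; the overhead term is an assumed ballpark figure.
def estimate_vram_gb(n_params_billion: float, bits_per_weight: int,
                     overhead_gb: float = 2.5) -> float:
    # Weight memory: parameters * bits per weight, converted to gigabytes.
    weight_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    # Overhead covers the KV cache, activations, and CUDA context.
    return weight_gb + overhead_gb

print(estimate_vram_gb(13, 4))  # ~9.0 GB, consistent with the range above
```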

Can you help me run this model locally, using Transformers or any other method, so that I can do inference on my own machine?

Look up AutoGPTQ. It is still in development and not yet fully stable, but it does work.
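For anyone landing here before the README is updated, here is a minimal sketch of loading a 4-bit GPTQ model with AutoGPTQ's `from_quantized` API. The repo ID below is a placeholder (substitute this model's actual repository), and options like the Triton back-end may differ by AutoGPTQ version:

```python
# A minimal sketch of 4-bit GPTQ inference with AutoGPTQ.
# Assumes `pip install auto-gptq transformers` and a CUDA GPU.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/model-13B-GPTQ"  # placeholder: use the real repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    use_safetensors=True,  # the repo ships safetensors weights
    device="cuda:0",
    use_triton=False,      # Triton kernels are optional and Linux-only
)

prompt = "Explain what 4-bit quantization does:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```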

I will add instructions to the README today or tomorrow, so check back then if you can't figure it out yourself.
