Specs for inference

#5
by mzhadigerov - opened

What is the size of VRAM required to run it for inference?

Roughly 14 GB of VRAM.

Use quantized GGUF/GGML/AWQ models if you want to run on machines with lower computational resources.

Yeah, then it will need roughly 6 GB of VRAM.
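For anyone wondering where numbers like these come from, here is a back-of-envelope sketch. It assumes a 7B-parameter model (my assumption, not stated in the thread) and counts weight memory only; the KV cache and activations add a few extra GB on top, which is why the 4-bit figure lands near 6 GB rather than 3.5 GB.

```python
BYTES_PER_GB = 1e9  # using decimal GB for a rough estimate

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the weights; KV cache and activations add more."""
    return num_params * bits_per_param / 8 / BYTES_PER_GB

# Assuming a 7B-parameter model:
fp16_gb = weight_memory_gb(7e9, 16)  # 14.0 GB -- matches the "roughly 14 GB" above
int4_gb = weight_memory_gb(7e9, 4)   # 3.5 GB; runtime overhead pushes this toward ~6 GB
print(fp16_gb, int4_gb)
```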

Can you suggest the smallest SageMaker instance I can use for deployment? For some reason, loading the model via the provided sample notebook fails on an ml.g5.12xlarge instance, even though its VRAM should be enough based on your suggestion.

@smrazaabbas you have to use the 4-bit quantized version. It should work then.
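A minimal sketch of loading a model in 4-bit with Transformers and bitsandbytes, in case it helps. The model id is a placeholder (swap in the actual repo id), it needs a CUDA GPU, and the exact config values are assumptions on my part, not something confirmed in this thread.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder -- use the actual model repo id

# Store weights in 4-bit, compute in fp16 (a common default; adjust as needed).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
```

This is the kind of loading code that should fit comfortably on a single-GPU instance once the weights are 4-bit.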
