CPU or GPU Inference

#54
by eggie5 - opened

Is this doing CPU inference?

Hugging Chat org

The model is not running in the Space itself; the Space is just a web app that proxies calls to the HF Inference API.
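For context, a minimal sketch of what such a proxied call can look like, using the public HF Inference API over HTTP. The model ID, token, and generation parameters here are placeholders for illustration, not the Space's actual configuration:

```python
import os
import requests

# Hypothetical example: forward a prompt to the HF Inference API.
# The model ID and token are placeholders, not HuggingChat's real config.
API_URL = "https://api-inference.huggingface.co/models/OpenAssistant/oasst-sft-6-llama-30b-xor"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def generate(prompt: str) -> str:
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 200, "temperature": 0.7},
    }
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    # The API returns a list of dicts with a "generated_text" field.
    return response.json()[0]["generated_text"]

print(generate("What hardware does HuggingChat run on?"))
```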

Hugging Chat org

It's running on GPUs.

coyotte508 changed discussion status to closed

Any details on the inference setup?

eggie5 changed discussion status to open
Hugging Chat org

g5 instances from aws currently

@julien-c cool, glad to hear you don't need A100s to get speed like that. Are you using the base model https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor with no special quantization/distillation on top?
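As a side note, the linked repo ships XOR-encoded weights that must first be combined with the original LLaMA weights. Once reconstructed, a plain fp16 load with no quantization would look roughly like this sketch; the local path is a placeholder, not a real repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the XOR weights have already been decoded against the original
# LLaMA-30B weights into a local directory (placeholder path below).
MODEL_PATH = "/models/oasst-sft-6-llama-30b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16,  # plain fp16, no quantization or distillation
    device_map="auto",          # shard across available GPUs (e.g. a g5 instance)
)

inputs = tokenizer("Hello, how fast is this?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```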

> g5 instances from aws currently

Hello @julien-c,

Could you please share the parameters you used to start the server? I also use a g5 instance for inference; the speed is good, but not as good as this demo.

yes, not Docker but the CLI
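If this refers to text-generation-inference's `text-generation-launcher` CLI (an assumption; the thread doesn't name the server), a typical launch plus a client call might look like the sketch below. The flags, shard count, port, and model path are illustrative, not the demo's confirmed configuration:

```python
import requests

# Assumed server launch via TGI's CLI (illustrative, not the demo's config):
#   text-generation-launcher --model-id /models/oasst-sft-6-llama-30b \
#       --num-shard 4 --port 8080
# --num-shard splits the model across GPUs, e.g. the 4 GPUs of a g5.12xlarge.

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain GPU sharding in one sentence.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
)
resp.raise_for_status()
# TGI's /generate endpoint returns a JSON object with "generated_text".
print(resp.json()["generated_text"])
```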
