Inference API

#1
by gsaivinay - opened

Hello,

Can we replace the previous Pythia model with this latest iteration for the Inference API widgets?

@sonatasv are you a member of the openassistant org? Thanks!

@julien-c Hello,

No, I'm not. I'm just looking for the default Inference API to play with the model.

OK, thanks! Was just curious.

@julien-c - on a completely different note, I saw you mention that you are running the Hugging Face Chat (https://huggingface.co/spaces/huggingchat/chat-ui) backend server models on AWS g5 instances. Would you be able to share the server launch configuration parameters? I'm also using g5 instances with quantization, but my generation speed is not as good as Hugging Face Chat's.

Hugging Face Chat is lightning fast.
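For context, the chat-ui backend talks to a text-generation-inference (TGI) server, so a launch command on a g5 instance would look something like the sketch below. The model ID and all numeric limits here are illustrative assumptions, not the actual HuggingChat configuration:

```shell
# Hypothetical TGI launch on an AWS g5 instance (single A10G GPU).
# Model ID and limits are example values; tune them for your GPU memory.
text-generation-launcher \
  --model-id OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 \
  --num-shard 1 \
  --quantize bitsandbytes \
  --max-input-length 1024 \
  --max-total-tokens 2048 \
  --max-batch-total-tokens 8192
```

One possible factor in the speed gap: bitsandbytes int8 quantization in TGI is generally slower at generation than running in fp16 (or using a GPTQ-quantized model), so the quantization mode itself may account for part of the difference.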
