GPU for inference

by vt404v2 - opened

Chat with h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2 here looks very fast. Can you please tell me what GPU you are using for inference? I get about 6.5 tokens/s with a 500-token prompt and 32 new tokens on an A100 80GB.

We are hosting the model on an A100 80GB using the awesome inference repository from Hugging Face.
Actually, the GPU is even shared with the other 7B model.
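
For anyone comparing numbers like the 6.5 tokens/s mentioned above, here is a minimal sketch of how such a figure could be measured. It is not the setup used for the hosted chat. `tokens_per_second`, `seconds_for`, and `fake_generate` are hypothetical names introduced for illustration, and `fake_generate` is only a stand-in that you would swap for your own model call returning the number of newly generated tokens.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], warmup: int = 1, runs: int = 3) -> float:
    """Average decode throughput over `runs` timed calls.

    `generate` is assumed to run one full generation and return the
    number of NEW tokens it produced (prompt tokens are not counted).
    """
    # Warm-up calls are excluded from timing (first calls are often slower).
    for _ in range(warmup):
        generate()

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate()
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def seconds_for(new_tokens: int, tps: float) -> float:
    """Wall-clock time implied by a given throughput."""
    return new_tokens / tps


def fake_generate() -> int:
    """Hypothetical stand-in for a real model call: sleeps, then reports 32 new tokens."""
    time.sleep(0.01)
    return 32


if __name__ == "__main__":
    print(f"stand-in throughput: {tokens_per_second(fake_generate):.1f} tokens/s")
    # At 6.5 tokens/s, 32 new tokens take a little under 5 seconds.
    print(f"32 tokens at 6.5 tokens/s: {seconds_for(32, 6.5):.2f} s")
```

Note that with only 32 new tokens, the time spent processing a 500-token prompt is included in the measurement, so the reported tokens/s can look lower than the steady-state decoding speed.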

Thanks, it works for me

vt404v2 changed discussion status to closed
