GPU for inference

#3
by vt404v2 - opened

The chat with h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2 here https://gpt-gm.h2o.ai/ looks very fast. Can you please tell me what GPU you are using for inference? I get about 6.5 tokens/s with a 500-token prompt and 32 new tokens on an A100 80GB.
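For reference, the tokens/s figure above is just generated tokens over wall-clock decode time. A minimal sketch of that calculation (the 4.9 s elapsed time is a hypothetical value chosen to match the reported ~6.5 tok/s):

```python
def tokens_per_second(new_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock time."""
    return new_tokens / elapsed_s

# Example: 32 new tokens in ~4.9 s of decoding gives roughly 6.5 tok/s.
print(round(tokens_per_second(32, 4.9), 1))
```

Note that with a 500-token prompt the prefill time is included in the wall clock, so short generations like 32 tokens understate the steady-state decode speed.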

H2O.ai org

We are hosting the model on an A100 80GB using the awesome inference repository from Hugging Face: https://github.com/huggingface/text-generation-inference.
Actually, the GPU is even shared with the other 7B model.
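For anyone trying the same setup: text-generation-inference serves a `/generate` endpoint that accepts a JSON body with `inputs` and a `parameters` object (e.g. `max_new_tokens`). A minimal sketch of building such a request body; the host and port in the comment are deployment-specific placeholders, not the values used here:

```python
import json

def build_generate_payload(prompt: str, max_new_tokens: int = 32) -> str:
    # text-generation-inference's /generate endpoint expects a body of the
    # form {"inputs": ..., "parameters": {...}}; max_new_tokens caps generation.
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    )

# POST this body to http://<host>:<port>/generate on your own deployment.
print(build_generate_payload("What GPU is this running on?"))
```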

Thanks, it works for me

vt404v2 changed discussion status to closed
