Type of hardware for inference

Opened by jilijeanlouis

Hi, and congrats on your work!
What type of hardware are you running inference on?

I'm running out of memory (OOM) on a 24 GB VRAM card, even with 4-bit quantization, using the 7B model.

I'm able to load the 7B model in 4-bit (it will require a fairly recent card, IMO) on an NVIDIA A10G, using 6.9 GB of VRAM (out of 24 GB).
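
For anyone hitting the same OOM, here's a minimal sketch of what 4-bit loading can look like with transformers + bitsandbytes. The model id is a placeholder for whatever checkpoint this Space actually uses, and VRAM usage will grow with context length during generation.

```python
# Minimal sketch: loading a ~7B causal LM in 4-bit on a single 24 GB GPU.
# The model id below is a placeholder; substitute the checkpoint this Space uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/some-7b-model"  # placeholder, not the actual repo

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute keeps generation fast on an A10G
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU
)

prompt = "What type of hardware are you running inference on?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```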

The token generation rate in this Space differs from the sample ipynb. What causes the difference? Is there some optimization done on the inference endpoint?

We are using https://github.com/huggingface/text-generation-inference to power the inference backend for this Space.
The model is currently sharded with tensor parallelism across 4x A10s.
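
For completeness, querying a TGI backend from Python can look something like the sketch below. The URL and generation parameters are placeholders; the multi-GPU tensor-parallel sharding is configured server-side when TGI is launched (the `--num-shard` launcher option, if I recall correctly), not by the client.

```python
# Minimal sketch: querying a running text-generation-inference server over HTTP.
# The URL is a placeholder; sharding is configured server-side at launch.
import requests

TGI_URL = "http://localhost:8080"  # placeholder for the actual backend endpoint

payload = {
    "inputs": "What type of hardware are you running inference on?",
    "parameters": {
        "max_new_tokens": 64,
        "temperature": 0.7,
    },
}

resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```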
