Inference hardware?

#5
by urimerhav - opened

I was wondering: what is the minimal GPU needed to even support inference? And for a prompt of, say, 1000 tokens, roughly how long does it take to generate 1000 tokens on a V100, an A100, or any other benchmark datapoint?

BigScience Workshop org

The 3B model is 6GB in fp16, so you will need 6GB of GPU memory plus some extra for activations when running tokens through it. A V100 with 32GB should therefore work. I don't have an exact datapoint, but generating 1000 tokens on a 32GB V100 may take around 1-3 minutes.
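The 6GB figure follows from fp16 using 2 bytes per parameter. A minimal back-of-the-envelope sketch (the function name and the decimal-GB convention are my own, and it covers weights only, not activations or KV cache):

```python
def fp16_weight_memory_gb(num_params: float) -> float:
    """Rough fp16 weight footprint: 2 bytes per parameter, in decimal GB."""
    return num_params * 2 / 1e9

# 3B parameters -> ~6 GB of weights alone
print(fp16_weight_memory_gb(3e9))
```

Real usage needs headroom on top of this for activations and framework overhead, which is why a 32GB card is comfortable for a 6GB model.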
