Inference hardware?

#5
by urimerhav - opened

I was wondering: what is the minimal GPU needed to even support inference? And for a prompt of, say, 1000 tokens, roughly how long does it take to generate 1000 tokens on a V100, an A100, or any other benchmark datapoint?

BigScience Workshop org

The 3B model is 6GB in fp16, so you will need 6GB of GPU memory plus some extra for activations when running tokens through it. A V100 with 32GB should therefore work. I don't have an exact datapoint, but generating 1000 tokens on a 32GB V100 may take around 1-3 minutes.
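The 6GB figure follows from fp16 using 2 bytes per parameter. A minimal back-of-the-envelope sketch (the function name and the decimal-GB convention are my own, and it covers weights only, not activations or KV cache):

```python
def fp16_weight_memory_gb(num_params: float) -> float:
    """Rough fp16 weight footprint: 2 bytes per parameter, in decimal GB."""
    return num_params * 2 / 1e9

# 3B parameters -> ~6 GB of weights alone
print(fp16_weight_memory_gb(3e9))
```

Real usage needs headroom on top of this for activations and framework overhead, which is why a 32GB card is comfortable for a 6GB model.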
