How many tokens per second when running DeepSeek-V2 (236B) as an inference model on 8×A100?

#7
by harvin-cn - opened

I want to deploy DeepSeek-V2 (236B) on 8×A100, but inference speed is a concern, and I can't find throughput numbers in the README. Thanks!

I tried it; I get about 30 tokens/s. @harvin-cn
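For anyone who wants to benchmark throughput on their own hardware, here is a minimal sketch. The timing helper is generic; the commented `model.generate` usage is an assumption about a typical Hugging Face `transformers` setup, not the exact setup used above:

```python
import time

def tokens_per_second(num_new_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second."""
    return num_new_tokens / elapsed_s

# Hypothetical usage with a loaded model/tokenizer (not run here):
#   start = time.perf_counter()
#   out = model.generate(**inputs, max_new_tokens=256)
#   elapsed = time.perf_counter() - start
#   new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
#   print(f"{tokens_per_second(new_tokens, elapsed):.1f} tokens/s")

# Example with made-up numbers: 256 new tokens in 8.5 s
print(round(tokens_per_second(256, 8.5), 1))  # ≈ 30.1
```

Note that throughput depends heavily on batch size, sequence length, quantization, and the serving framework, so single-request numbers like the 30 tokens/s above are only a rough guide.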
