GPU memory usage/requirement?

#2
by Bilibili - opened

Thanks for this work!

Since the original StarCoder requires 60+ GB of GPU RAM for inference, I wonder about the ct2fast version: could the model run inference on a V100-32G?

It requires around 16.5 GB of VRAM with the int8 setting.
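For context, a minimal sketch of loading the converted model with int8 quantization via the CTranslate2 Python API (the local model path and the generation settings here are illustrative assumptions, not something stated in this thread):

```python
import ctranslate2
from transformers import AutoTokenizer

# Assumed local path to the converted CTranslate2 model directory.
model_dir = "ct2fast-starcoder"

# compute_type="int8" loads the weights quantized to 8-bit (~16.5 GB VRAM).
generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8")

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
prompt = "def fibonacci(n):"
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

# Greedy decoding; the returned sequences include the prompt tokens by default.
results = generator.generate_batch([tokens], max_length=64, sampling_topk=1)
output_ids = tokenizer.convert_tokens_to_ids(results[0].sequences[0])
print(tokenizer.decode(output_ids))
```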


Thanks! Have you tested the performance loss from int8_float16 quantization, e.g. on HumanEval Python?


I tried with 8 GB, 16 GB, and 20 GB of VRAM, but it only worked once I had 24 GB.

I would assume you need around 17 GB of VRAM (GPU) or 17 GB of CPU RAM. Additionally, depending on context length and batch size, further memory usage is expected.
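To verify the actual footprint on your own GPU, you could query NVML after loading the model; a short sketch, assuming the nvidia-ml-py package is installed:

```python
import pynvml

# Query total and used memory on GPU 0 after the model has been loaded.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {info.used / 1024**3:.1f} GiB of {info.total / 1024**3:.1f} GiB")
pynvml.nvmlShutdown()
```

Running this before and after generation would also show how much extra memory the KV cache takes for your context length and batch size.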
