GPU memory usage/requirement?
#2 by Bilibili - opened
Thanks for this work!
Since the original StarCoder requires 60+ GB of GPU RAM for inference, I wonder how much the c2fast version needs. Could the model run inference on a V100-32G?
It requires around 16.5 GB of VRAM with the int8 setting.
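For reference, a minimal sketch of running the converted model with CTranslate2's int8 compute type. The local model path and the prompt are illustrative; adjust them to your setup:

```python
import ctranslate2
from transformers import AutoTokenizer

# Load the CTranslate2-converted model with int8 weights to cut VRAM usage.
# "ct2fast-starcoder" is a placeholder for the local path of the converted model.
generator = ctranslate2.Generator(
    "ct2fast-starcoder", device="cuda", compute_type="int8"
)
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")

prompt = "def fibonacci(n):"
# CTranslate2 expects string tokens, not token ids.
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch([tokens], max_length=128)
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(results[0].sequences[0])))
```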
michaelfeil changed discussion status to closed
michaelfeil changed discussion status to open
Thanks! Have you tested the performance loss of int8_float16 quantization, e.g. on HumanEval Python?
No.
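For anyone who wants to measure this themselves, a rough sketch using the human-eval package; `generate_one_completion` is a hypothetical helper that would wrap the CTranslate2 generator from the snippet above:

```python
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Placeholder: call the quantized model here and return the completion.
    ...

problems = read_problems()
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
]
write_jsonl("samples.jsonl", samples)
# Then score with: evaluate_functional_correctness samples.jsonl
```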
michaelfeil changed discussion status to closed
I tried with 8 GB, 16 GB, and 20 GB, but it did not work until 24 GB of VRAM.
I would assume you need around 17 GB of VRAM (GPU) or 17 GB of RAM. Additionally, depending on context length and batch size, further memory usage is expected.
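As a rough sanity check of that figure, a back-of-the-envelope estimate assuming StarCoder's ~15.5B parameters (the parameter count is my assumption; actual usage adds activations, KV cache, and runtime overhead):

```python
# Rough weight-memory estimate, assuming StarCoder has ~15.5B parameters.
params = 15.5e9
for name, bytes_per_param in [("float16", 2), ("int8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.1f} GiB for the weights alone")
# int8 -> ~14.4 GiB of weights; the observed ~16.5-17 GB also covers
# activations and the KV cache, which grow with context length and batch size.
```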