Model inference speed

#2 opened by halsayed

Hi Core42 team,

Thanks for creating this open-source Arabic model. I tested the model using the example provided on the model card and got very slow performance. I'm using an A100 80GB, so I would expect much better results than those shown in the image below. Is this correct?

[Attached image: jais_performance.png]

Inception org

@halsayed Thanks for using Jais. You may get better inference speed using 2 x A100 80GB GPUs: the model has ~30B parameters stored in float32, so the weights take roughly 30 × 4 ≈ 120 GB, and all layers of the model fit in GPU memory only across 2 GPUs.
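
For reference, a minimal sketch of loading the model sharded across both GPUs with `device_map="auto"` (which requires the `accelerate` package). The model id and prompt are assumptions based on the model card; substitute the ones from the card you are using:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "core42/jais-30b-v1"  # assumed id; use the one from the model card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard layers across all visible GPUs (needs `accelerate`)
    trust_remote_code=True,  # Jais ships custom modeling code on the Hub
)

prompt = "عاصمة دولة الإمارات هي"  # assumed example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, every layer stays on a GPU instead of being offloaded to CPU, which is what typically causes the slow generation seen on a single card.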

@samta-kamboj Thanks, increasing the GPU count solved the problem. Has there been any attempt to quantize the model and reduce its VRAM footprint?
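
For readers with the same question, a minimal sketch of on-the-fly 4-bit quantization with `bitsandbytes` via `BitsAndBytesConfig`, which would shrink the weights from ~120 GB in float32 to roughly 15-20 GB. The model id is an assumption, and whether the custom Jais modeling code is compatible with bitsandbytes quantization should be verified:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "core42/jais-30b-v1"  # assumed id

# 4-bit NF4 quantization (pip install bitsandbytes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights stay 4-bit
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # quantize weights as they are loaded
    device_map="auto",
    trust_remote_code=True,
)
```

Quantization trades some output quality for memory, so it is worth comparing generations against the full-precision model before relying on it.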
