Inference: CUDA out of memory error

#145
by Tecena - opened

Hi,
I fine-tuned the mistral-7B-v0.1 model and stored it in S3. Now I want to run inference: when I load the model with my inference script, the model loads, but I then get a CUDA out-of-memory error.
I have already upgraded the instance type to ml.g5.8xlarge and am using a batch size of 1, but I am still facing the same issue.
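For context, here is roughly what my loading code looks like. This is a minimal sketch: the model directory is a placeholder for where the S3 checkpoint gets downloaded, and it assumes the standard transformers `from_pretrained` API (with accelerate installed for `device_map`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: local path where the fine-tuned S3 checkpoint is downloaded
MODEL_DIR = "/opt/ml/model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)

# Loading without torch_dtype defaults to float32, which needs
# roughly 28 GB for 7B parameters; passing torch.float16 halves
# the footprint to about 14 GB.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    device_map="auto",  # place the weights on the GPU automatically
)
model.eval()
```

I wonder if the default float32 load is the problem, since 7B parameters at 4 bytes each is about 28 GB, which is more than the 24 GB A10G GPU on a g5.8xlarge, but I am not sure.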

Could you please help me resolve this issue so the model loads successfully?
Thank you
