Inference: CUDA out of memory error
#145 · opened by Tecena
Hi,
I have fine-tuned the mistral-7B-v0.1 model and stored it in S3, and now I want to run inference. When I load the model with my inference script, I run into a CUDA out-of-memory error.
I have already increased the instance type to ml.g5.8xlarge and the batch size is 1, but I am still facing the same issue.
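In case it helps, here is a simplified sketch of what the loading step in my inference script does (the path is a placeholder; the actual script may differ slightly):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder for the local path where the fine-tuned artifacts
# from S3 are unpacked.
model_path = "/opt/ml/model"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Without an explicit torch_dtype, the weights load in fp32, so a
# 7B-parameter model needs roughly 28 GB of GPU memory before any
# activations are allocated.
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")
```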
Could you please help me figure out how to resolve this and load the model successfully?
Thank you