Inference: CUDA out of memory error
#145 · opened by Tecena
Hi,
I have fine-tuned the mistral-7B-v0.1 model and stored it in S3, and now I want to run inference. When I load the model with my inference script, I run into a CUDA out-of-memory error.
I have already increased the instance type to ml.g5.8xlarge and the batch size is 1, but I am still facing the same issue.
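In case it helps, here is a simplified sketch of what the loading step in my inference script does (the path is a placeholder; the actual script may differ slightly):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder for the local path where the fine-tuned artifacts
# from S3 are unpacked.
model_path = "/opt/ml/model"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Without an explicit torch_dtype, the weights load in fp32, so a
# 7B-parameter model needs roughly 28 GB of GPU memory before any
# activations are allocated.
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")
```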
Could you please help me figure out how to resolve this and load the model successfully?
Thank you