#173
by SwatiM - opened

I am unable to use the endpoint; I am getting this error:

```
W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB (GPU 0; 22.20 GiB total capacity; 1.88 GiB already allocated; 115.12 MiB free; 1.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
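The log itself is telling: PyTorch has allocated and reserved only about 1.88 GiB of the 22.20 GiB card, yet only 115.12 MiB is reported free, so most of the GPU memory appears to be held outside this worker (for example by another MMS worker process on the same instance). As a first diagnostic step, here is a minimal sketch using standard torch.cuda calls (nothing endpoint-specific is assumed) to confirm what the allocator actually sees:

```python
import torch

# Free/total memory as seen by the CUDA driver -- this counts memory held
# by ALL processes on the GPU, not just this one (values are in bytes).
free_b, total_b = torch.cuda.mem_get_info(0)
print(f"driver view: {free_b / 1024**3:.2f} GiB free of {total_b / 1024**3:.2f} GiB")

# What this process's caching allocator holds.
print(f"allocated by PyTorch: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")
print(f"reserved by PyTorch:  {torch.cuda.memory_reserved(0) / 1024**3:.2f} GiB")

# Detailed per-pool breakdown, useful for spotting fragmentation.
print(torch.cuda.memory_summary(device=0))
```

If the driver reports almost no free memory while PyTorch has reserved very little, the memory is held by another process, and no allocator setting inside this script (max_split_size_mb, empty_cache, etc.) will get it back.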

Below are some options I tried:

- setting PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb, e.g. os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:24"; I also tried max_split_size_mb values of 64, 128, 512, and 1024 (see the sketch after this list for where this setting has to go)
- setting PYTORCH_CUDA_ALLOC_CONF to the other memory-management options, e.g. "heuristic"
- calling torch.cuda.empty_cache() in the inference script
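One caveat that may explain why the first option had no effect: PYTORCH_CUDA_ALLOC_CONF is read when PyTorch initializes the CUDA caching allocator, so assigning os.environ["PYTORCH_CUDA_ALLOC_CONF"] after torch has already touched the GPU does nothing. On a hosted endpoint it usually has to be set in the container/endpoint environment (for SageMaker, the model's Environment settings) rather than mid-script. Below is a minimal sketch of how the pieces could fit together, assuming a generic SageMaker-style handler (model_fn/predict_fn and the model.pt filename are illustrative, not the actual endpoint code):

```python
import os

# Must run before torch initializes CUDA; in a real endpoint, prefer setting
# this in the container environment so it is guaranteed to be early enough.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch


def model_fn(model_dir):
    # Hypothetical loader: assumes a whole serialized model saved as model.pt.
    model = torch.load(os.path.join(model_dir, "model.pt"), map_location="cuda:0")
    return model.eval()


def predict_fn(inputs, model):
    # no_grad() keeps autograd from holding intermediate buffers in memory.
    with torch.no_grad():
        output = model(inputs.to("cuda:0"))
    result = output.cpu()
    del output
    # empty_cache() returns cached, UNREFERENCED blocks to the driver; it
    # cannot free tensors that are still referenced, hence the del above.
    torch.cuda.empty_cache()
    return result
```

Note that torch.cuda.empty_cache() only releases cached blocks that no live tensor references, which is why calling it alone often does not change anything; references to GPU tensors have to be dropped first.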

Any help or references would be really appreciated. Looking forward to it. Thanks!
