Tags: Text Generation · Transformers · Safetensors · llama · text-generation-inference

2x GPU but only one is being used

#22
by ecaglar - opened

I am trying to run an LLM, but even when I choose 2x H100, only one GPU is utilized, and then I get the error below. Any idea?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.11 GiB of which 168.50 MiB is free. Process 3311833 has 78.93 GiB memory in use. Of the allocated memory 78.31 GiB is allocated by PyTorch, and 189.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
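The allocator hint at the end of the traceback can be tried as a stopgap, though the root problem here is that the whole model is landing on GPU 0. A minimal sketch of that hint, set before launching the server (the 512 MiB split size is an arbitrary example value, not from the thread):

```sh
# Reduce allocator fragmentation, per the suggestion in the OOM message.
# 512 is an example value; tune for your workload.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```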

Can you share your docker run command? It seems to hang for me when using 4x A100 80GB.
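For reference, a minimal sketch of a text-generation-inference launch that shards across two GPUs (the model id, port, and volume path are placeholder examples, not taken from the thread):

```sh
# Sketch: request tensor parallelism across 2 GPUs with --num-shard.
# Model id and volume path are example values.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-2-70b-hf \
  --num-shard 2
```

If sharding is not in effect, TGI loads the full model onto a single device, which matches the OOM on GPU 0 reported above.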
