Getting an issue with CUDA

#10 opened by LLMHackathonNYC

Hey,

I've deployed an instance of meditron-70b on 2× A100 GPUs, and when testing the endpoint I keep getting the following CUDA error. Are there any workarounds or solutions?

Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.
[Screenshot of the error, 2024-04-09]

I got the same error when trying to access the dedicated endpoint with an API key request: 'Request failed during generation: Server error: Unexpected <class 'RuntimeError'>: captures_underway == 0 INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1699449201336/work/c10/cuda/CUDACachingAllocator.cpp":2939, please report a bug to PyTorch.'

Hi @LLMHackathonNYC @marichka-dobko , Thanks for reporting. We've taken a look and recommend selecting quantization: EETQ (in place of Bitsandbytes) to help resolve the error reported. Please let us know how it goes. Thanks again!
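For anyone reproducing this locally rather than through the managed endpoint UI, here is a rough sketch of the equivalent change using the text-generation-inference Docker launcher: swap the `--quantize` value from `bitsandbytes` to `eetq`. The model ID, shard count, and port are assumptions based on the setup described above (meditron-70b on 2× A100), not a confirmed working config.

```shell
# Sketch: launch TGI with EETQ quantization instead of bitsandbytes.
# --model-id, --num-shard, and the port are assumptions for this 2xA100 setup.
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id epfl-llm/meditron-70b \
  --num-shard 2 \
  --quantize eetq
```

The same setting is exposed in the Inference Endpoints UI as the "quantization" option, so no redeploy from scratch should be needed there.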

When we run it with EETQ instead of bitsandbytes, it fails to start. Could you share the specific config that works for you?
[Screenshot of the failed startup, 2024-04-11]
