Multi-GPU inferencing
#18 opened by cjj2003
Is it possible to do inference on a multi-GPU setup? I have been unsuccessful using just the demonstration code; it fails with this error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
../aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
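For what it's worth, this assertion usually fires during sampling, when the next-token probabilities contain inf/nan; on multi-GPU setups that is often a device- or dtype-placement problem. A minimal sketch of sharded loading with transformers follows (the model id is a placeholder, and `device_map="auto"` assumes accelerate is installed):

```python
# Minimal multi-GPU loading sketch with transformers + accelerate.
# "your-org/your-model" is a placeholder; substitute the actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to fit across GPUs
    device_map="auto",           # shards layers across all visible GPUs
)

# Inputs go to the device of the first shard; accelerate routes the rest.
inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```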
Try TabbyAPI with this quant :)
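If you go that route, TabbyAPI serves an OpenAI-compatible endpoint, so a client call might look like the sketch below (the base URL, port, API key, and model name are all assumptions; check your TabbyAPI config):

```python
# Hypothetical client call against a local TabbyAPI instance.
# Base URL, port, API key, and model name are assumptions, not verified defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5000/v1", api_key="dummy-key")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; TabbyAPI serves whatever model it loaded
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```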
Yes, you can use the vLLM framework for multi-GPU inference.
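For example, a minimal vLLM sketch (the model id is a placeholder; set `tensor_parallel_size` to the number of GPUs you want to split across):

```python
# Minimal vLLM multi-GPU sketch; "your-org/your-model" is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-model",   # placeholder model id
    tensor_parallel_size=2,        # split the model across 2 GPUs
)
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["Hello, world"], params)
print(outputs[0].outputs[0].text)
```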