multi GPU inferencing

#18
by cjj2003 - opened

Is it possible to run inference on a multi-GPU setup? I have been unsuccessful using the demonstration code; it fails with this error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
../aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
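As the traceback itself warns, CUDA errors are reported asynchronously, so the stack trace may point at the wrong call. A first debugging step (suggested by the error message) is to re-run with `CUDA_LAUNCH_BLOCKING=1` so kernel launches become synchronous and the trace lands on the failing call. A minimal sketch, where the inline `python -c` stands in for your actual demo script:

```shell
# With CUDA_LAUNCH_BLOCKING=1, kernel launches run synchronously,
# so the Python stack trace points at the call that actually failed.
# In practice you would run the real script instead, e.g.:
#   CUDA_LAUNCH_BLOCKING=1 python demo.py
CUDA_LAUNCH_BLOCKING=1 python -c 'import os; print(os.environ["CUDA_LAUNCH_BLOCKING"])'
```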

Try TabbyAPI with this quant :)

Yes, you can use the vLLM framework for multi-GPU inference.
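A minimal sketch of multi-GPU inference with vLLM, which shards the model across GPUs via tensor parallelism when `tensor_parallel_size` > 1. The model name and prompt below are placeholders, not from this thread; note that `tensor_parallel_size` must evenly divide the model's attention-head count, so a power of two is the usual safe choice.

```python
# Sketch of multi-GPU inference with vLLM using tensor parallelism.
# Model name and prompt are illustrative placeholders.

def pick_tensor_parallel_size(num_gpus: int) -> int:
    """Largest power of two <= num_gpus. tensor_parallel_size must
    evenly divide the model's attention-head count, and a power of
    two usually satisfies that constraint."""
    tp = 1
    while tp * 2 <= num_gpus:
        tp *= 2
    return tp

if __name__ == "__main__":
    import torch
    from vllm import LLM, SamplingParams

    # Shard across all visible GPUs (rounded down to a power of two).
    tp = pick_tensor_parallel_size(torch.cuda.device_count())
    llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=tp)

    params = SamplingParams(temperature=0.8, max_tokens=128)
    for out in llm.generate(["Hello, my name is"], params):
        print(out.outputs[0].text)
```

Launching the demo script unchanged on a multi-GPU box will not shard the model by itself; the framework has to be told to parallelize, which is what `tensor_parallel_size` does here.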
