CUDA Assertion Errors with Concurrent Requests on XTTS-v2
#107 · opened by Chan-Y
Environment
- Model: coqui/XTTS-v2
- GPU: NVIDIA RTX 3080 Laptop (16GB VRAM)
- Operating System: Windows
- PyTorch: 2.6.0+cu124
- Torchaudio: 2.6.0+cu124
- Transformers: 4.46.1
- CUDA: 12.4
Issue Description
When multiple concurrent requests arrive in quick succession, the model consistently hits CUDA errors that can only be cleared by restarting the program. The errors appear to originate in indexing operations and cuBLAS calls.
Error Message
```
device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call
CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
```
Assertion errors from Indexing.cu:1369:
```
block: [0,0,0], thread: [92-95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
```
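Because the assert is reported asynchronously, the traceback currently points at a later API call rather than the failing op. I plan to re-run with blocking kernel launches to get an accurate traceback (a minimal debugging sketch; the variable must be set before CUDA is initialized):

```python
# Force synchronous CUDA kernel launches so the failing op shows up in the
# Python traceback instead of a later, unrelated call.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before torch touches CUDA

import torch  # imported after the variable is set; then reproduce as usual
```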
Reproduction Steps
- Run the XTTS-v2 model on a local machine
- Send multiple inference requests within a short time window (a minimal repro sketch follows this list)
- The error occurs consistently under load
- The only fix is to restart the program
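For reference, this is roughly how the requests reach the model. It is a simplified sketch: here I load XTTS-v2 through the coqui TTS Python API and fire requests from a thread pool, and the texts, reference clip, and worker count are placeholders for what my server actually does.

```python
# Simplified repro: several threads call the same XTTS-v2 instance at once.
from concurrent.futures import ThreadPoolExecutor

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

def synthesize(i: int) -> str:
    out_path = f"out_{i}.wav"
    tts.tts_to_file(
        text=f"Test sentence number {i}.",
        speaker_wav="reference.wav",  # placeholder reference clip
        language="en",
        file_path=out_path,
    )
    return out_path

if __name__ == "__main__":
    # Firing several requests at the same model instance from multiple threads
    # reliably triggers the device-side assert on my machine.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(synthesize, range(8))))
```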
Expected Behavior
- The model should handle concurrent requests without CUDA assertion errors
- Multiple requests should be handled gracefully, without requiring a program restart
Questions
- Are there known issues with concurrent processing using PyTorch 2.6.0?
- Are there recommended batch size or request rate limitations for this model?
- Is there a way to implement request queuing or rate limiting to prevent these errors? (A lock-based workaround I am considering is sketched after this list.)
- Could this be related to CUDA 12.4 compatibility issues?
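On the queuing question above: the simplest workaround I can think of is to serialize inference with a lock, so concurrent requests effectively wait their turn on the single model instance. A minimal sketch, assuming the same TTS API loading as in the repro above; the function and argument names are mine, not part of XTTS-v2:

```python
# Workaround sketch: allow only one inference at a time on the shared model.
# Concurrent callers block on the lock and run one after another.
import threading

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
_infer_lock = threading.Lock()

def synthesize_serialized(text: str, speaker_wav: str, out_path: str) -> str:
    with _infer_lock:
        tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                        language="en", file_path=out_path)
    return out_path
```

This trades throughput for stability; if it holds up, a bounded queue in front of the lock would add proper backpressure instead of unbounded waiting.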
Additional Context
The error seems to indicate an out-of-range index in an indexing operation during concurrent processing. The assertion `srcIndex < srcSelectDimSize` suggests possible race conditions or memory-access violations during parallel execution.
The error occurs with the latest stable versions of PyTorch and torchaudio (2.6.0+cu124), which should theoretically be compatible with this workload. Any guidance on handling concurrent requests or implementing proper request management would be greatly appreciated.
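If serializing requests is not enough, the fallback I am considering is to isolate the model in a worker process, so a corrupted CUDA context only requires respawning the worker rather than restarting the whole server. This is a rough sketch under my own assumptions (spawn start method, a single worker, a placeholder job format), not something XTTS-v2 provides:

```python
# Sketch: run XTTS-v2 in a child process fed by a queue; if a device-side
# assert corrupts the CUDA context, only this process needs to be respawned.
import multiprocessing as mp

def worker(jobs, results):
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    while True:
        text, speaker_wav, out_path = jobs.get()
        try:
            tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                            language="en", file_path=out_path)
            results.put(("ok", out_path))
        except RuntimeError as exc:  # e.g. "CUDA error: device-side assert triggered"
            results.put(("error", str(exc)))
            break  # exit so the parent can spawn a fresh process / CUDA context

if __name__ == "__main__":
    mp.set_start_method("spawn")  # default on Windows, required for CUDA elsewhere
    jobs, results = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(jobs, results), daemon=True)
    proc.start()
    jobs.put(("Hello from the worker process.", "reference.wav", "out.wav"))
    print(results.get())
```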