Text-to-Speech
coqui

CUDA Assertion Errors with Concurrent Requests on XTTS-v2

#107
by Chan-Y - opened

Environment

  • Model: coqui/XTTS-v2
  • GPU: NVIDIA RTX 3080 Laptop (16GB VRAM)
  • Operating System: Windows
  • PyTorch: 2.6.0+cu124
  • Torchaudio: 2.6.0+cu124
  • Transformers: 4.46.1
  • CUDA: 12.4

Issue Description

When processing multiple concurrent requests in quick succession, the model consistently encounters CUDA errors that can only be resolved by restarting the program. The error appears to be related to indexing operations and CUBLAS execution.

Error Message

device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call
CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Assertion errors in Indexing.cu:1369:
block: [0,0,0], thread: [92-95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

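Because device-side asserts are reported asynchronously, the Python traceback above may point at an unrelated later call. Re-running with synchronous kernel launches can localize the op that actually failed. A minimal sketch (the variable must be set before any CUDA context is created):

```python
import os

# Device-side asserts surface at some later API call by default, so the
# traceback is misleading. Forcing synchronous kernel launches makes the
# traceback point at the operation that actually triggered the assert.
# This must be set before torch initializes its CUDA context.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after setting the variable above
```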
Reproduction Steps

  1. Run the XTTS-v2 model on a local machine
  2. Send multiple inference requests within a short time window
  3. The error occurs consistently under load
  4. The only way to recover is to restart the program
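For reference, the failure shows up under a concurrent-request pattern like the following sketch. `synthesize` is a placeholder for the actual XTTS-v2 inference call (stubbed out here so the harness itself is runnable):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def synthesize(text: str) -> str:
    """Placeholder for the real XTTS-v2 inference call.

    In the failing setup this would run GPU inference on shared model
    state; here it just sleeps to stand in for a slow operation.
    """
    time.sleep(0.05)
    return f"audio for: {text}"

# Fire several requests within a short window, as in the repro steps above.
texts = [f"request {i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(synthesize, texts))

print(len(results))  # with the stub, all 8 requests complete
```

With the real model in place of the stub, overlapping calls trigger the assertion errors described above.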

Expected Behavior

  • Model should handle concurrent requests without CUDA assertion errors
  • Graceful handling of multiple requests without requiring program restart

Questions

  1. Are there known issues with concurrent processing using PyTorch 2.6.0?
  2. Are there recommended batch size or request rate limitations for this model?
  3. Is there a way to implement request queuing or rate limiting to prevent these errors?
  4. Could this be related to CUDA 12.4 compatibility issues?

Additional Context

The error seems to indicate a problem with array indexing during concurrent processing. The assertion `srcIndex < srcSelectDimSize` means an index was out of range for the tensor being read, which suggests race conditions or memory corruption on shared model state during parallel execution.

The error occurs with the latest stable versions of PyTorch and torchaudio (2.6.0+cu124), which should theoretically be compatible with this workload. Any guidance on handling concurrent requests or implementing proper request management would be greatly appreciated.
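One request-management approach under consideration: a dedicated worker thread with a request queue, which serializes inference while letting callers submit from anywhere. A sketch under the same assumption that `infer` stands in for the actual XTTS call:

```python
import queue
import threading

def infer(text: str) -> str:
    # Stand-in for the real XTTS-v2 inference call.
    return f"audio:{text}"

# Each queued item is (text, reply_queue); None is a shutdown sentinel.
requests: queue.Queue = queue.Queue()

def worker() -> None:
    # Single consumer: all inference runs on one thread, in arrival order,
    # so the model is never touched by two requests at once.
    while True:
        item = requests.get()
        if item is None:
            break
        text, reply = item
        reply.put(infer(text))

t = threading.Thread(target=worker, daemon=True)
t.start()

def synthesize(text: str) -> str:
    reply: queue.Queue = queue.Queue(maxsize=1)
    requests.put((text, reply))
    return reply.get()  # block until the worker finishes this request

outputs = [synthesize(f"req {i}") for i in range(3)]
requests.put(None)
t.join()
print(outputs)
```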
