CUDA out-of-memory error when using Sentence Transformers on a Tesla V100-PCIE-32GB
Hello, I'm facing a CUDA out-of-memory error while trying to embed documents (each under 4096 tokens).
I'm loading the model with Sentence Transformers on a Tesla V100-PCIE-32GB GPU.
Here is the error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 31.74 GiB total capacity; 23.36 GiB already allocated; 11.06 MiB free; 23.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Any idea how to solve this? Or does my GPU simply not have enough memory?
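For reference, the allocator hint the error message mentions can be set through the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch, assuming it is set before the first CUDA allocation in the process (the 128 MiB split size and the model name are just example values):

import os

# Assumption: this must run before any CUDA memory is allocated,
# so set it before loading the model.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-model-name", device="cuda")  # placeholder model name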
Hi, I encountered similar issues when quantizing this embedder to 4-bit and embedding a series of texts of various lengths. I only have a 16GB V100 GPU, and the longest text in the series is under 1k words. I noticed that the GPU memory wasn't released even after moving the model outputs to CPU and deleting them, then running gc.collect() and torch.cuda.empty_cache(). I'd appreciate any ideas or suggestions.
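For context, the cleanup sequence described above looks roughly like this (a sketch; model and texts stand in for the actual objects):

import gc
import torch

with torch.no_grad():
    emb = model.encode(texts, convert_to_tensor=True, device="cuda")
    emb_cpu = emb.cpu()   # copy the result off the GPU
del emb                   # drop the GPU-side reference
gc.collect()              # collect any lingering Python references
torch.cuda.empty_cache()  # return cached blocks to the driver

Note that torch.cuda.empty_cache() only releases memory that is cached but no longer allocated; any tensor still referenced on the GPU stays allocated, which may be why the memory appears not to be released.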
@cc-wei
Looks like you are doing batch processing. I faced the same issue: even with a batch size as small as 4, the problem persists.
Try embedding each chunk one at a time:
import time

import torch

# model and device are assumed to be defined earlier
# (a SentenceTransformer instance and a CUDA device).
def embed_texts(batch_texts, batch_number):
    embeddings = []
    with torch.no_grad():
        batch_time = time.time()
        # Process each chunk individually rather than in batches
        for text in batch_texts:
            embedding = model.encode([text], prompt_name="web_search_query",
                                     convert_to_tensor=True, device=device)
            embeddings.append(embedding.cpu())  # move each result off the GPU immediately
    elapsed_time = time.time() - batch_time
    print(f"Time taken for embedding batch {batch_number}: {elapsed_time:.2f} seconds")
    return torch.cat(embeddings)
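Calling this over your chunks might look like the following (a sketch; all_chunks and the batch split are placeholders for however you group your documents):

batch_size = 32
all_embeddings = []
for i in range(0, len(all_chunks), batch_size):
    batch = all_chunks[i : i + batch_size]
    all_embeddings.append(embed_texts(batch, batch_number=i // batch_size))
all_embeddings = torch.cat(all_embeddings)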