CUDA out of memory during model fine-tuning

#224
by AMCalejandro - opened

Hi,

I keep hitting a 'CUDA out of memory' error coming from a matrix multiplication. I am allocating 100 GB of memory to my GPU for a job running on SLURM, but I do not seem to be able to overcome this issue.

I am trying to fine-tune the model to distinguish between dosage-sensitive and dosage-insensitive genes.

The output of the run:

 Traceback (most recent call last)
Cell In[16], line 3
      1 # cross-validate gene classifier
      2 all_roc_auc, roc_auc, roc_auc_sd, mean_fpr, mean_tpr, confusion, label_dicts \
----> 3     = cross_validate(subsampled_train_dataset, targets, labels, nsplits, subsample_size, training_args, freeze_layers, training_output_dir, 1)

...


File /gpfs/gsfs12/users/martinezcarraa2/conda/envs/hackgneformer/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py:325, in BertSelfAttention.forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    322     past_key_value = (key_layer, value_layer)
    324 # Take the dot product between "query" and "key" to get the raw attention scores.
--> 325 attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
    327 if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
    328     query_length, key_length = query_layer.shape[2], key_layer.shape[2]

OutOfMemoryError: CUDA out of memory. Tried to allocate 12.50 GiB (GPU 0; 11.92 GiB total capacity; 2.47 GiB already allocated; 9.05 GiB free; 2.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Useful info regarding CUDA memory:

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|             CUDA OOMs: 1            |         cudaMalloc retries: 1       |
|===========================================================================|
| Metric                 | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory       | 1728 MiB   | 2528 MiB   | 41874 GiB  | 41872 GiB |
|   from large pool      | 1708 MiB   | 2508 MiB   | 41578 GiB  | 41576 GiB |
|   from small pool      | 20 MiB     | 35 MiB     | 295 GiB    | 295 GiB   |
|---------------------------------------------------------------------------|
| Active memory          | 1728 MiB   | 2528 MiB   | 41874 GiB  | 41872 GiB |
|   from large pool      | 1708 MiB   | 2508 MiB   | 41578 GiB  | 41576 GiB |
|   from small pool      | 20 MiB     | 35 MiB     | 295 GiB    | 295 GiB   |
|---------------------------------------------------------------------------|
| Requested memory       | 1727 MiB   | 2527 MiB   | 41860 GiB  | 41858 GiB |
|   from large pool      | 1707 MiB   | 2507 MiB   | 41564 GiB  | 41563 GiB |
|   from small pool      | 20 MiB     | 35 MiB     | 295 GiB    | 295 GiB   |
|---------------------------------------------------------------------------|
| GPU reserved memory    | 2618 MiB   | 4734 MiB   | 4734 MiB   | 2116 MiB  |
|   from large pool      | 2594 MiB   | 4694 MiB   | 4694 MiB   | 2100 MiB  |
|   from small pool      | 24 MiB     | 40 MiB     | 40 MiB     | 16 MiB    |
|---------------------------------------------------------------------------|
| Non-releasable memory  | 91432 KiB  | 843 MiB    | 12055 GiB  | 12055 GiB |
|   from large pool      | 87498 KiB  | 838 MiB    | 11748 GiB  | 11748 GiB |
|   from small pool      | 3934 KiB   | 12 MiB     | 307 GiB    | 307 GiB   |
|---------------------------------------------------------------------------|
| Allocations            | 193        | 301        | 2642 K     | 2641 K    |
|   from large pool      | 16         | 72         | 1554 K     | 1554 K    |
|   from small pool      | 177        | 247        | 1087 K     | 1087 K    |
|---------------------------------------------------------------------------|
| Active allocs          | 193        | 301        | 2642 K     | 2641 K    |
|   from large pool      | 16         | 72         | 1554 K     | 1554 K    |
|   from small pool      | 177        | 247        | 1087 K     | 1087 K    |
|---------------------------------------------------------------------------|
| GPU reserved segments  | 22         | 48         | 48         | 26        |
|   from large pool      | 10         | 28         | 28         | 18        |
|   from small pool      | 12         | 20         | 20         | 8         |
|---------------------------------------------------------------------------|
| Non-releasable allocs  | 18         | 48         | 1512 K     | 1512 K    |
|   from large pool      | 6          | 27         | 879 K      | 879 K     |
|   from small pool      | 12         | 25         | 633 K      | 633 K     |
|---------------------------------------------------------------------------|
| Oversize allocations   | 0          | 0          | 0          | 0         |
|---------------------------------------------------------------------------|
| Oversize GPU segments  | 0          | 0          | 0          | 0         |
|===========================================================================|
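As the OOM message above itself suggests, when reserved memory is much larger than allocated memory, fragmentation in the caching allocator can be a factor, and setting max_split_size_mb via the PYTORCH_CUDA_ALLOC_CONF environment variable is one thing to try. A minimal sketch; the value 128 and the script name are illustrative, not from this thread:

```shell
# Set the allocator config before the Python process starts;
# max_split_size_mb caps the block size the caching allocator will
# split, which can reduce fragmentation. 128 is an illustrative value.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# run_finetune.py is a hypothetical script name
python run_finetune.py
```

Whether this helps depends on the allocation pattern; in this case the single 12.50 GiB request exceeds the free memory, so reducing the batch size is the more direct fix.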

Thank you for your interest in Geneformer! The memory error you are getting is from the GPU memory, not the CPU memory. You can reduce the batch size as needed to fit your hardware. Please note that changing the batch size may affect the training efficacy. You can optimize the other hyperparameters given your batch size requirement.
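To see why batch size is the right knob here: the matmul that fails in BertSelfAttention produces the raw attention-score tensor of shape (batch, heads, seq_len, seq_len), so its memory grows linearly with batch size and quadratically with sequence length. A rough back-of-the-envelope sketch; the head count and sequence length below are illustrative, not Geneformer's actual config:

```python
def attention_score_bytes(batch: int, heads: int, seq_len: int,
                          dtype_bytes: int = 4) -> int:
    """Bytes needed for one layer's raw attention-score tensor
    (batch, heads, seq_len, seq_len) in fp32."""
    return batch * heads * seq_len * seq_len * dtype_bytes


if __name__ == "__main__":
    # Illustrative values only; substitute your model's config.
    for batch in (32, 16, 8):
        gib = attention_score_bytes(batch, heads=4, seq_len=2048) / 2**30
        print(f"batch={batch:>2}: {gib:.2f} GiB per layer")
        # halving the batch halves this allocation: 2.00 / 1.00 / 0.50 GiB
```

In the Hugging Face Trainer, the batch size is set with the per_device_train_batch_size field of TrainingArguments; lowering it (optionally while raising gradient_accumulation_steps to keep the effective batch size) is the usual way to fit a fixed GPU.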

ctheodoris changed discussion status to closed
