huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: WARNING Serializing object of type dict that is 589920 bytes
  0%|          | 0/70340 [00:00
Error in sys.excepthook:
Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/syntax.py", line 496, in tokens_to_spans
    _token_type, token = next(tokens)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1673, in print
    extend(render(renderable, render_options))
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1309, in render
    yield from self.render(render_output, _options)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1305, in render
    for render_output in iter_render:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/constrain.py", line 29, in __rich_console__
    yield from console.render(self.renderable, child_options)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1305, in render
    for render_output in iter_render:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/panel.py", line 175, in __rich_console__
    lines = console.render_lines(renderable, child_options, style=style)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1345, in render_lines
    lines = list(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/segment.py", line 292, in split_and_crop_lines
    for segment in segments:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1305, in render
    for render_output in iter_render:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/padding.py", line 97, in __rich_console__
    lines = console.render_lines(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1345, in render_lines
    lines = list(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/segment.py", line 292, in split_and_crop_lines
    for segment in segments:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1309, in render
    yield from self.render(render_output, _options)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/console.py", line 1305, in render
    for render_output in iter_render:
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/syntax.py", line 598, in __rich_console__
    segments = Segments(self._get_syntax(console, options))
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/segment.py", line 668, in __init__
    self.segments = list(segments)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/syntax.py", line 626, in _get_syntax
    text = self.highlight(processed_code, self.line_range)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/syntax.py", line 508, in highlight
    text.append_tokens(tokens_to_spans())
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/rich/text.py", line 970, in append_tokens
    for content, style in tokens:
RuntimeError: generator raised StopIteration

Original exception was:
Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py", line 765, in <module>
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py", line 678, in main
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1409, in train
    return inner_training_loop(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1651, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2349, in training_step
    loss = self.compute_loss(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2381, in compute_loss
    outputs = model(**inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 468, in forward
    return self.coil_forward(
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 256, in coil_forward
    outputs_lab, label_embeddings, _, _ = self.forward_label_embeddings(None, None, desc_input_ids = desc_input_ids, desc_attention_mask = desc_attention_mask, return_hidden_states = True, desc_inputs_embeds = desc_inputs_embeds)
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 408, in forward_label_embeddings
    outputs = self.label_model(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
    encoder_outputs = self.encoder(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 607, in forward
    layer_outputs = layer_module(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 493, in forward
    self_attention_outputs = self.attention(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 423, in forward
    self_outputs = self.self(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 355, in forward
    attention_probs = self.dropout(attention_probs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1279, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 10.76 GiB total capacity; 3.28 GiB already allocated; 61.69 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
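The tokenizers warning at the top of the log names its own fix: set `TOKENIZERS_PARALLELISM` before the process forks (e.g. before DataLoader workers are spawned). A minimal sketch of that, assuming it runs before any tokenizer is created:

```python
import os

# Must execute before `tokenizers` spins up its Rust thread pool, i.e.
# before the first tokenizer call; the variable is read once. "false"
# disables tokenizer-internal parallelism, which is usually acceptable
# when DataLoader workers already parallelize tokenization.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting it in the launching shell (`export TOKENIZERS_PARALLELISM=false`) works equally well and avoids any ordering concerns inside the script.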
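The final `CUDA out of memory` error points at `PYTORCH_CUDA_ALLOC_CONF` for the fragmentation case it describes ("reserved >> allocated"). A hedged sketch of that suggestion; the 128 MiB threshold is an illustrative value, not taken from the log, and the variable must be set before the first CUDA allocation. Reducing the per-device batch size (or using gradient accumulation) remains the more direct fix:

```python
import os

# Caps the size of cached blocks the CUDA caching allocator is allowed
# to split, which can reduce fragmentation of reserved memory. The
# 128 MiB value is an example, not a recommendation; tune per workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Note also that the failure occurred in replica 0 under `nn.DataParallel`, whose default device carries extra overhead from gathered outputs, so device 0 typically runs out of memory first.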