Error during knowledge distillation with T5 for Question Answering

#1
by haurajahra - opened

Hello everyone,

Has anyone performed knowledge distillation for Question Answering using a T5 model? I am currently researching knowledge distillation for Question Answering and have encountered the following error:

RuntimeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

18 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/t5/modeling_t5.py in forward(self, hidden_states, mask, key_value_states, position_bias, past_key_value, layer_head_mask, query_length, use_cache, output_attentions)
529
530 # compute scores
--> 531 scores = torch.matmul(
532 query_states, key_states.transpose(3, 2)
533 ) # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
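
The message suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so the stack trace points at the real failing operation; a minimal way to do that (just a sketch, it has to happen before anything touches the GPU) is:

import os

# Must be set before the first CUDA call in the process, so put this at the
# very top of the notebook and restart the runtime before training again.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

After that, calling trainer.train() again should report the actual line where the assert fires.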

The teacher model is idT5-base (Indonesian T5), and the student model is T5-small.
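
One thing I am unsure about: the teacher and student ship with different tokenizers, and as far as I know a device-side assert in the embedding/attention path is often caused by token IDs that are out of range for the model's vocabulary. A quick sanity check (illustrative only; the idT5 Hub id below is an assumption, replace it with the checkpoint you actually use) would be:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

teacher_name = "muchad/idt5-base"  # assumed Hub id for idT5-base; adjust to the real checkpoint
student_name = "t5-small"

tokenizer = AutoTokenizer.from_pretrained(teacher_name)
student = AutoModelForSeq2SeqLM.from_pretrained(student_name)

print("tokenizer vocab size:  ", len(tokenizer))
print("student embedding rows:", student.get_input_embeddings().num_embeddings)

# If the tokenizer can emit IDs >= the student's embedding size, the GPU
# lookup fails with a device-side assert. Resizing the student's embeddings
# to match the tokenizer is one way to rule this out:
# student.resize_token_embeddings(len(tokenizer))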

Do you have any ideas on what I should do?
