Loss without grad_fn when using transformers Trainer suite

#10 by syboomsysy

I tried to finetune the model on the alpaca dataset. However, when I launched training, I found that the model returned the loss directly (does the model have a loss function built in?), and the loss had no grad_fn attached, so an error occurred during the backward pass. Could anyone tell me the cause of this problem?

Here are my package versions:
torch 2.0.0+cu117
transformers 4.35.2
peft 0.6.2

and here is a snapshot of my code:

[screenshot of training code]
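In case the image doesn't load, here is a rough sketch of the equivalent setup; the model name, hyperparameters, and dataset field names below are placeholders rather than my exact values:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# placeholder model name; any causal LM checkpoint
model_name = "my-base-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# wrap the frozen base model with LoRA adapters via peft
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, peft_config)

# tokenize the alpaca dataset (the "text" column holds the full prompt)
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_checkpointing=True,  # turning this off makes the error go away
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # backward() fails here: the loss has no grad_fn
```

The data collator duplicates the input ids as labels, and the model computes the loss internally whenever labels are passed, which is why the Trainer receives a loss directly.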

and finally, the error log:

[screenshot of error log]

Update: gradient checkpointing seems to be the crux; training runs fine if I set it to False.
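For anyone who hits the same thing: my understanding is that with gradient checkpointing on a mostly frozen PEFT model, the inputs to the checkpointed blocks don't require grad, so the recomputed forward produces a loss detached from the autograd graph. Two workarounds that are commonly suggested (both are public transformers APIs, though I've only verified the behavior on my own setup):

```python
# Workaround 1: force the embedding outputs to require grad, so the
# checkpointed segments stay connected to the autograd graph.
model.enable_input_require_grads()

# Workaround 2 (transformers >= 4.35): switch to PyTorch's non-reentrant
# checkpoint implementation, which handles inputs that don't require grad.
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```

Either way you keep the memory savings of gradient checkpointing instead of disabling it outright.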
