use_cache=False changes behavior

#14
by edmond - opened

Hello, generating sequences token by token changes the result: it finally gives the right result, i.e. the argmax of each new token at every step. Setting use_cache=False also produces the correct output.
I know this was already mentioned in https://github.com/huggingface/transformers/issues/31425,
but I was wondering whether anyone already knows a quick fix here. I really need a way to generate correct sequences without sacrificing inference speed.
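For anyone who wants to reproduce this, here is a minimal sketch comparing the cached and uncached paths under greedy decoding. The model name is a placeholder, not the actual checkpoint in question; any causal LM loaded via Transformers should work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; substitute the model that shows the discrepancy.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt")

with torch.no_grad():
    # Fast path: reuses the KV cache between decoding steps.
    out_cached = model.generate(
        **inputs, max_new_tokens=20, do_sample=False, use_cache=True
    )
    # Slow path: re-encodes the full sequence at every step.
    out_uncached = model.generate(
        **inputs, max_new_tokens=20, do_sample=False, use_cache=False
    )

# If these disagree, the divergence comes from the cached path.
print(torch.equal(out_cached, out_uncached))
print(tokenizer.decode(out_cached[0]))
print(tokenizer.decode(out_uncached[0]))
```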

edmond changed discussion status to closed
