Is the CausalLM model from HuggingFace truly causal?
#46
by Cyrile - opened
Hello, I have a technical question about training for text generation in a causal manner. I noticed that the training objective is cross-entropy computed over a simple shift of the input_ids. The attention mechanism is causal thanks to the mask, but the feed-forward part is non-causal, am I correct? If so, isn't the way the model is trained in the HuggingFace library incorrect? Shouldn't we apply cross-entropy only to the prediction of the last token, or else put a causal mask on the MLP part as well?
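For context, the "shift of the input_ids" the question refers to can be sketched as follows. This is a minimal numpy illustration of the standard next-token objective, not the library's actual implementation; the variable names and toy data are made up:

```python
import numpy as np

# Toy sequence of token ids and random logits from a hypothetical model.
input_ids = np.array([5, 2, 7, 1])
vocab_size = 10
rng = np.random.default_rng(0)
logits = rng.standard_normal((len(input_ids), vocab_size))

# The shift: the logits at position t are scored against the token at t+1,
# so we drop the last logit row and the first label token.
shift_logits = logits[:-1]      # predictions for positions 0 .. n-2
shift_labels = input_ids[1:]    # targets are the *next* tokens

# Cross-entropy per position: log-softmax, then pick out the target logit.
log_probs = shift_logits - np.log(np.exp(shift_logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(shift_labels)), shift_labels].mean()
print(loss)
```

Every position contributes a next-token prediction to the loss, which is what makes training dense rather than last-token-only.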
Cyrile changed discussion status to closed.
Excuse me, I made a mistake and this is wrong.
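For readers landing on this closed thread: the likely resolution is that the transformer feed-forward block is applied position-wise, i.e. independently to each token, so it cannot mix information across positions and therefore cannot break causality. A quick numerical check, assuming a generic two-layer position-wise MLP (a sketch, not any particular model's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W1 = rng.standard_normal((d, 4 * d))
W2 = rng.standard_normal((4 * d, d))

def mlp(x):
    # Position-wise feed-forward: each row (one token) is transformed
    # independently; there is no interaction between rows.
    return np.maximum(x @ W1, 0) @ W2

x = rng.standard_normal((5, d))   # 5 token embeddings
y = mlp(x)

x2 = x.copy()
x2[-1] += 1.0                     # perturb only the *last* token
y2 = mlp(x2)

# Outputs at earlier positions are unchanged: no future information leaks.
print(np.allclose(y[:-1], y2[:-1]))   # True
```

Only attention mixes tokens, which is why the causal mask on attention alone is sufficient and the shifted cross-entropy over all positions is a valid training objective.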