"It is strongly recommended to train Gemma2 models with the `eager` attention implementation "

#10 by JaronTHU - opened

Why is the `eager` attention implementation preferred?

Reference: https://github.com/huggingface/transformers/blob/v4.42.0/src/transformers/models/gemma2/modeling_gemma2.py#L1046

"It is strongly recommended to train Gemma2 models with the eager attention implementation "
f"instead of {self.config._attn_implementation}. Use eager with AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')."

I get it, thank you~

JaronTHU changed discussion status to closed
