The base model doesn't generate coherently

#9
by migtissera - opened

I'm having major issues with fine-tuning this model. Is the base model bricked?

Google org

Hey @migtissera , could you try with the latest transformers release (v4.42.3) and let us know if it fixes your problem? We have validated that the model fine-tunes correctly with this version.

Google org

We also recommend passing attn_implementation='eager' when loading the model, so that eager attention is used instead of Flash Attention; this improves the results.
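For reference, the recommendation above can be applied when loading the model with transformers. A minimal sketch (the model id below is only an illustrative placeholder; substitute the checkpoint you are actually fine-tuning):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id for illustration; replace with your checkpoint.
model_id = "google/gemma-2-9b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # use eager attention instead of Flash Attention
    torch_dtype="auto",
)
```

Make sure transformers is up to date (v4.42.3 or later) before loading, since older releases may not fine-tune this model correctly.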
