The base model doesn't generate coherently

#9
by migtissera - opened

I'm having major issues with fine-tuning this model. Is the base model bricked?

Google org

Hey @migtissera , could you try with the latest transformers release (v4.42.3) and let us know if it fixes your problem? We have validated that the model fine-tunes correctly with this version.

Google org

We also recommend passing attn_implementation='eager' when loading the model, so that eager attention is used instead of Flash Attention; this improves the results.
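For reference, the recommendation above can be applied when loading the model with transformers. A minimal sketch (the model id below is only an illustrative placeholder; substitute the checkpoint you are actually fine-tuning):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id for illustration; replace with your checkpoint.
model_id = "google/gemma-2-9b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    attn_implementation="eager",  # use eager attention instead of Flash Attention
    torch_dtype="auto",
)
```

Make sure transformers is up to date (v4.42.3 or later) before loading, since older releases may not fine-tune this model correctly.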
