The base model doesn't generate coherently
#9 by migtissera - opened
I'm having major issues with fine-tuning this model. Is the base model bricked?
Hey @migtissera, could you try with the latest transformers release (v4.42.3) and let us know if it fixes your problem? We have validated that the model fine-tunes correctly with this version.
We also recommend setting attn_implementation='eager' in the configuration to use eager attention instead of Flash Attention, which improves the results.
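For reference, a minimal sketch of loading the model with eager attention as suggested above — the model id here is a placeholder for whatever checkpoint you are fine-tuning, and transformers >= 4.42.3 is assumed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the actual checkpoint you are fine-tuning.
model_id = "your-org/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Use eager attention instead of Flash Attention, per the recommendation above.
    attn_implementation="eager",
)
```

With the model loaded this way, the rest of your fine-tuning setup (e.g. a `Trainer` or custom loop) should work unchanged.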