Fix weights by putting the right value in `lm_head.weight`

#3
by sgugger - opened

There was probably a bug in the initial conversion script that created these models: the checkpoints they ship store different values for `lm_head.weight` and `model.decoder.embed_tokens.weight`, even though those weights are supposed to be tied.

This was not a problem until now, because the model was tied after the load and the (wrong) value of `lm_head.weight` was replaced by the value of `model.decoder.embed_tokens.weight`. It no longer works if we tie the weights before the load, however, since the value picked might be the one from `lm_head.weight`, depending on how the weights are tied.
As far as I can see, these models stop generating properly on Transformers main.

This should fix the bug without any side effects.
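
For anyone who wants to repair such a checkpoint by hand, here is a minimal sketch of the idea, assuming a plain `pytorch_model.bin` state dict with the tensor keys named above (the file name and keys may differ for other checkpoints):

```python
# Minimal sketch: overwrite the stale lm_head.weight with the tied
# embedding matrix so both tensors agree before any load-time tying.
import torch

path = "pytorch_model.bin"  # hypothetical local checkpoint path
state_dict = torch.load(path, map_location="cpu")

embed = state_dict["model.decoder.embed_tokens.weight"]
if not torch.equal(state_dict["lm_head.weight"], embed):
    # Put the right value in lm_head.weight, as this PR does.
    state_dict["lm_head.weight"] = embed.clone()
    torch.save(state_dict, path)
```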

Language Technology Research Group at the University of Helsinki org

Thanks! I can't merge, but this fixes the issues for these models.

Thank you very much. I'm a bit confused, though.
I want to convert a Marian MT model (from Tatoeba-Challenge) to PyTorch so that I can use it locally with HF.
In order to apply this fix, should I make changes to `MarianMTModel`, or to the conversion script as well?

Language Technology Research Group at the University of Helsinki org

If you use the latest release of transformers, the conversion should work out of the box! Does it not?
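
For reference, a quick sanity check on a locally converted model might look like this (the path is a placeholder for wherever the conversion script wrote its output):

```python
# Load a locally converted Marian model and check that it still generates.
from transformers import MarianMTModel, MarianTokenizer

model_dir = "./converted-marian-model"  # hypothetical local path
tokenizer = MarianTokenizer.from_pretrained(model_dir)
model = MarianMTModel.from_pretrained(model_dir)

batch = tokenizer(["Hello, world!"], return_tensors="pt")
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```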

tiedeman changed pull request status to merged
